The Fisher Neyman-Pearson Theories
Description: A detailed account of whether the Fisher theory and the Neyman-Pearson theory are one and the same, and of how they differ.
1. Introduction
The formulation and philosophy of hypothesis testing as we know it today was
largely created by three men: R.A. Fisher (1890-1962), J. Neyman (1894-1981), and
E.S. Pearson (1895-1980) in the period 1915-1933. Since then it has expanded into
one of the most widely used quantitative methodologies, and has found its way into
nearly all areas of human endeavor. It is a fairly commonly held view that the
theories due to Fisher on the one hand, and to Neyman and Pearson on the other, are
quite distinct. This is reflected in the fact that separate terms are often used (although
somewhat inconsistently) to designate the two approaches: significance testing for
Fisher's and hypothesis testing for that of Neyman and Pearson. But are they really
that different?
It is interesting to see what Fisher, Neyman, and Pearson themselves have to say
about this question. Fisher frequently attacked the Neyman-Pearson (NP) approach as
completely inappropriate to the testing of scientific hypotheses (although perhaps
suitable in the context of acceptance sampling). In his last book "Statistical Methods and
Scientific Inference" (3rd ed., published posthumously in 1973, to which we shall
refer as SMSI), he writes (p. 103):
"The examples elaborated in the foregoing sections of numerical discrepancies
constitute only one aspect of the deep-seated difference in point of view".
On the other hand, Neyman (1976) stated that he "is not aware of a conceptual
difference between a 'test of a statistical hypothesis' and a 'test of significance' and
[that he] uses these terms interchangeably".
Pearson (1974) took an intermediate position by acknowledging the existence of
differences but claiming that they were of little importance in practice. After referring
to inference as "the manner in which we bring the theory of probability into gear with
the way our mind works in reaching decisions and practical conclusions", he
continues: "If, as undoubtedly seems the case, the same mechanism of this 'putting into gear
operation' does not work for everyone in identical ways, this does not seem to
matter".
In the present paper, written just ten years after the death of the last protagonist, I
examine yet another possibility: that important differences do exist but that it may be
possible to formulate a unified theory that combines the best features of both
approaches.
Since both are concerned with the testing of hypotheses, it is convenient
here to ignore this terminological distinction and to use the term "hypothesis
testing" regardless of whether the testing is carried out in a Fisherian or
Neyman-Pearsonian mode.
For the sake of completeness it should be said that in addition to the Fisher and
Neyman-Pearson theories there exist still other philosophies of testing of which we
shall mention only two.
There is Bayesian hypothesis testing, which, on the basis of stronger assumptions,
permits assigning probabilities to the various hypotheses being considered. All three
authors were very hostile to this formulation and were in fact motivated in their work
by a desire to rid hypothesis testing of the need to assume a prior distribution over the
available hypotheses.
Finally, in certain important situations tests can be obtained by an approach also
due to Fisher for which he used the term fiducial. Most comparisons of Fisher's work
on hypothesis testing with that of Neyman and Pearson (see for example Morrison and
Henkel (1970), Steger (1971), Spielman (1974, 1978), Carlson (1976), Barnett (1982))
do not include a discussion of the fiducial argument, which most statisticians have
found difficult to follow. Although Fisher himself viewed fiducial considerations to be
a very important part of his statistical thinking, this topic can easily be split off from
other aspects of his work, and we shall here not consider either the fiducial or the
Bayesian approach any further.
It seems appropriate to conclude this introduction with two personal statements.
(i) I was a student of Neyman's and later for many years his colleague. As a
result I am fairly familiar with his thinking. On the other hand, I have seriously
studied Fisher's work only in recent years and, perhaps partly for this reason, have found
his ideas much harder to understand. I shall therefore try to follow Fisher's advice to
a correspondent (Bennett, 1990, p. 221):
"If you must write about someone else's work it is, I feel sure, worth taking even
more than a little trouble to avoid misrepresenting him. One safeguard is to use actual
quotations from his writing."
(ii) Some of the Fisher-Neyman debate is concerned with issues studied in depth
by philosophers of science. (See for example Braithwaite (1953), Hacking (1965),
Kyburg (1974), and Seidenfeld (1979).) I am not a philosopher, and the present paper
is written from a statistical, not a philosophical, point of view.
Although the main substantive papers (NP 1928 and 1933a) were written jointly by
Neyman and Pearson, their collaboration stopped soon after Neyman left
Pearson's Department to set up his own program in Berkeley. After that, the
debate was carried on primarily by Fisher and Neyman.
2. Testing Statistical Hypotheses
The modern theory of testing hypotheses began with Student's discovery of the
t-distribution in 1908. This was followed by Fisher with a series of papers culminating
in his book "Statistical Methods for Research Workers" (1925), in which he created a
new paradigm for hypothesis testing. He greatly extended the applicability of the t-test
(to the two-sample problem and the testing of regression coefficients), and generalized
it to the testing of hypotheses in the analysis of variance. He advocated 5% as the
standard level (with 1% as a more stringent alternative); and through applying this new
methodology to a variety of practical examples he established it as a highly popular
statistical approach for many fields of science.
A question that Fisher did not raise was the origin of his test statistics: why these
rather than some others? This is the question that Neyman and Pearson considered
and which (after some preliminary work in NP (1928)) they answered in NP (1933a).
Their solution involved not only the hypothesis but also a class of possible
alternatives, and the probabilities of two kinds of error: false rejection (Error I) and false
acceptance (Error II). The "best" test was one that minimized P_A(Error II) subject to
a bound on P_H(Error I), the latter being the significance level of the test. They
completely solved this problem for the case of testing a simple (i.e. single distribution)
hypothesis against a simple alternative by means of the Neyman-Pearson Lemma. For
more complex situations the theory required additional concepts, and working out the
details of this NP-program was an important concern of mathematical statistics in the
following decades.
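The two error probabilities can be made concrete with a small numerical sketch. The setting below is an assumption chosen for illustration and is not taken from the NP papers: n independent normal observations with known unit variance, the simple hypothesis H: mean 0, and the simple alternative A: mean mu_alt > 0. In this setting the likelihood ratio increases with the sample mean, so the test given by the Neyman-Pearson Lemma rejects when the sample mean exceeds a cutoff. The function names are hypothetical, and the two-sided competitor is included only to show that a different test of the same level has a larger probability of Error II against this alternative.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def np_test_cutoff(n: int, alpha: float) -> float:
    """Cutoff c with P_H(sample mean > c) = alpha when X_i ~ N(0, 1) under H."""
    return STD_NORMAL.inv_cdf(1.0 - alpha) / sqrt(n)


def error_probabilities(n: int, alpha: float, mu_alt: float) -> tuple[float, float]:
    """Return (P_H(Error I), P_A(Error II)) for the test 'reject if mean > c'."""
    c = np_test_cutoff(n, alpha)
    # The sample mean is N(0, 1/n) under H and N(mu_alt, 1/n) under A.
    p_error_1 = 1.0 - NormalDist(0.0, 1.0 / sqrt(n)).cdf(c)  # false rejection
    p_error_2 = NormalDist(mu_alt, 1.0 / sqrt(n)).cdf(c)     # false acceptance
    return p_error_1, p_error_2


def two_sided_error_2(n: int, alpha: float, mu_alt: float) -> float:
    """P_A(Error II) for a competitor of the same level: reject if |mean| > c2."""
    c2 = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) / sqrt(n)
    mean_under_a = NormalDist(mu_alt, 1.0 / sqrt(n))
    return mean_under_a.cdf(c2) - mean_under_a.cdf(-c2)


if __name__ == "__main__":
    n, alpha, mu_alt = 9, 0.05, 0.5  # illustrative values only
    p1, p2 = error_probabilities(n, alpha, mu_alt)
    print("P_H(Error I), one-sided test: ", p1)
    print("P_A(Error II), one-sided test:", p2)
    print("P_A(Error II), two-sided test:", two_sided_error_2(n, alpha, mu_alt))
```

Both tests hold P_H(Error I) at the same bound; the comparison of the last two printed values illustrates what "best" means in the NP formulation for this particular pair of simple hypotheses.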
The NP introduction to the two kinds of error contained a brief statement that was
to become the focus of much later debate. "Without hoping to know whether each
separate hypothesis is true or false", the authors wrote, "we may search for rules to
govern our behavior with regard to them, in following which we insure that, in the
long run of experience, we shall not be too often wrong." And in this and the
following paragraph they refer to a test (i.e. a rule to reject or accept the hypothesis) as "a
rule of behavior".
3. Inductive Inference vs. Inductive Behavior
Fisher (1932) started a paper entitled "Inverse probability and the use of
likelihood" with the statement: "Logicians have long distinguished two modes of human
reasoning, under the respective names of deductive and inductive reasoning. In
inductive reasoning we attempt to argue from the particular, which is typically a body of
observational material, to the general, which is typically a theory applicable to future
experience."
He developed his ideas in more detail in a 1935 paper, "The logic of inductive
inference", where he explains:
"... everyone who does habitually attempt the difficult task of making sense of
figures is, in fact, essaying a logical process of the kind we call inductive, in that he is
attempting to draw inferences from the particular to the general. Such inferences we
recognize to be uncertain inferences ..." He continues in the next paragraph:
"Although some uncertain inferences can be rigorously expressed in terms of
mathematical probability, it does not follow that mathematical probability is an
adequate concept for the rigorous expression of uncertain inferences of every kind. The
inferences of the classical theory of probability are all deductive in character. They
are statements about the behaviour of individuals, or samples, or sequences of samples,
drawn from populations which are fully known. More generally, however, a
mathematical quantity of a different kind, which I have termed mathematical
likelihood, appears to take its place [i.e. the place of probability] as a measure of rational
belief when we are reasoning from the sample to the population."
The paper was presented at a meeting of the Royal Statistical Society and was not
well received. The last discussant was Neyman, who began in a very complimentary
vein. He then suggested that some readers might react by thinking: "What an
interesting problem is raised! How could I develop it further?", but, he continues, "I
personally seem to have another kind of psychology and can't help thinking: What an
interesting way of asking and answering questions, but can't I do it differently?"
More specifically, Neyman asks: "granted that the conception of likelihood is
independent of the classical theory of probability, isn't it possible to construct a theory
of mathematical statistics which would be based solely upon the theory of probability
(thus independent of the conception of likelihood) and be adequate from the point of
view of practical statistical work?"
And later, still more directly: "Now what could be considered as a sufficiently
simple and unquestionable principle in statistical work? I think the basic conception
here is the conception of frequency of errors in judgement." He points out that this
idea applies to both hypothesis testing and estimation and completes the paragraph
with the statement that "the complex of results in this direction may be considered as
a system of mathematical statistics alternative to that of Professor Fisher, and entirely
based on the classical theory of probability."
Of Fisher, L.J. Savage (1976), in his insightful overview of Fisher's great
accomplishments, "On rereading R.A. Fisher", wrote: "Fisher burned even more than the rest
of us, it seems to me, to be original, right, important, famous, and respected." One can
then imagine Fisher's reaction to this attack on his cherished and ambitious attempt to
put scientific thinking on an entirely new basis.
Neyman's message was: We have no need for your inductive inference and its new
concept of likelihood. The problem can be solved in a very satisfactory manner using
only the classical theory of probability and deductive arguments, by minimizing the
probability of errors [i.e. of wrong conclusions].
Both Neyman and Fisher considered the distinction between "inductive behavior"
and "inductive inference" to lie at the center of their disagreement. In fact, in writing
retrospectively about the dispute, Neyman (1961) said that "the subject of the dispute
may be symbolized by the opposing terms 'inductive reasoning' and 'inductive
behavior'". That Fisher also assigned a central role to this distinction is indicated by
his statement in SMSI (p. 7) that "there is something horrifying in the ideological
movement represented by the doctrine that reasoning, properly speaking, cannot be
applied to empirical data to lead to inferences valid in the real world."
Actually, the interpretation of acceptance or rejection of a hypothesis as behavior
or inference, as a decision or conclusion, is largely a matter of terminology which
diverts attention from the more central issue: whether only deductive arguments are
needed to reach the desired end, or whether there is a need also for induction, a
concept which inspired Fisher while for Neyman it was imbued with an aura of suspect
mysticism. This issue had in fact a long history and a resolution (albeit in a
deterministic rather than a stochastic setting) in the description of the scientific method as
"hypothetico-deductive". According to this view of science, induction is required in
deciding on the experiment to be performed, in the formulation of the model and the
hypothesis, while the testing of the model and the hypothesis can be carried out
deductively.
Surprisingly, Fisher himself seemed to view the situation somewhat similarly, when
in his 1939 obituary of Student he wrote: "Many mathematicians must possess the
penetration necessary to perceive, when confronted with concrete experimental results,
that it must be possible to use them, by rigorously objective calculations, to throw light
on the plausibility or otherwise of the interpretations that suggest themselves. A few
must also possess the pertinacity needed to convert this intuition into such a completed
procedure as we know as a test of significance. It is, I believe, nothing but an illusion
to think that this process can ever be reduced to a self-contained mathematical theory
of tests of significance. Constructive imagination, together with much knowledge
based on experience of data of the same kind, must be exercised before deciding on
what hypotheses are worth testing, and in what respects. Only when this fundamental
thinking has been accomplished can the problem be given a mathematical form."
4. Errors of the Second Kind
Fisher did not respond immediately to the attack Neyman had mounted in his
discussion of Fisher's paper. However, in a note in Nature (1935b), which was ostensibly
a reply to an only tangentially related statement by Karl Pearson, he lashed out at
Neyman and E.S. Pearson without, however, mentioning their names. Karl Pearson, in
a letter to Nature, had complained that his χ²-test of goodness of fit was not a rule for
deciding whether or not to reject a possibly correct hypothesis, but rather an attempt to
see whether a distribution, although not expected to be exactly correct, would provide
an adequate fit. After a relatively mild rejection of this position, Fisher adds in a last
paragraph: "For the logical fallacy of believing that a hypothesis has been proved to
be true, merely because it is not contradicted by the available facts, has no more right
to insinuate itself in statistical than in other kinds of scientific reasoning. Yet it does
so only too frequently. Indeed, the 'error of accepting an hypothesis when it is false'
has been specially named by some writers 'errors of the second kind'. It would,
therefore, add greatly to the clarity with which the tests of significance are regarded if
it were generally understood that tests of significance, when used accurately, are
capable of rejecting or invalidating hypotheses, in so far as these are contradicted by the
data; but that they are never capable of establishing them as certainly true. In fact that
'errors of the second kind' are committed only by those who misunderstand the
nature and application of tests of significance."
After this outburst, the dispute appeared to die down. Undoubtedly it helped that
in 1938 Neyman, the more combative of Fisher's opponents, left London for Berkeley,
thereby removing the irritation of offices in the same building and frequent encounters
at meetings of the Royal Statistical Society. Then, twenty years after the Nature
article, Fisher (1955) published a paper devoted entirely to an attack on the point of view
expressed in numerous papers by Neyman, Pearson, Wald and Bartlett.
The first introductory sections suggest two reasons for Fisher's writing such a
paper at that time. He begins by describing the progress that had been made during
the present century "in the business of interpreting observational data, so as to obtain
a better understanding of the real world". He mentions in particular "the use of better
mathematics and more comprehensive ideas in mathematical statistics"; "the new
theory of experimental design"; and "a more complete understanding ... of the
structure and peculiarities of inductive logic".
"Much that I have to say", Fisher continues, "will not command universal assent.
I know this for it is just because I find myself in disagreement with some of the modes
of exposition of this new subject which have from time to time been adopted, that I
have taken this opportunity of expressing a different point of view."
What Fisher was referring to are developments that had occurred since the
publication of his early papers and the two books, "Statistical Methods for Research
Workers" (1925) and "The Design of Experiments" (1935c). His methods had been
enormously successful; his tests, the analysis of variance, the experimental designs had
become the staple of working statisticians. His books had reached a wide public (by
1946, Statistical Methods had reached the 10th edition), but, and this must have been
tremendously galling to him, his philosophical approach had not found acceptance.
On the one hand, his central concept of fiducial inference had found few adherents; on
the other, perhaps even more annoying, developments growing out of Neyman's
philosophy had been grafted onto his framework and were highly successful. There had
been considerable elaboration of the NP theory of optimal tests; more importantly, the
idea of power (1 - P(Error II)) was generally accepted as a concept of interest in itself
and as the basis for sample size determination; and finally Neyman's philosophy of
inductive behavior had been formalized by Wald into a comprehensive theory of
Statistical Decision Functions.
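How power serves as the basis for sample size determination can be sketched in a simple case. The example assumes a one-sided z-test for a normal mean with known standard deviation sigma, and a shift delta that the experimenter wishes to detect; this setting, the function names, and the numerical values are illustrative assumptions and do not come from the paper. Under these assumptions the power at shift delta is 1 - Phi(z_alpha - delta * sqrt(n) / sigma), and solving for n gives the usual closed-form sample size.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def power_one_sided_z(n: int, alpha: float, delta: float, sigma: float = 1.0) -> float:
    """Power, 1 - P(Error II), of the level-alpha one-sided z-test at shift delta."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(z_alpha - delta * sqrt(n) / sigma)


def sample_size_for_power(alpha: float, target_power: float,
                          delta: float, sigma: float = 1.0) -> int:
    """Smallest n whose power at shift delta is at least target_power."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_power = STD_NORMAL.inv_cdf(target_power)
    return ceil(((z_alpha + z_power) * sigma / delta) ** 2)


if __name__ == "__main__":
    # Illustrative values: level 5%, desired power 80%, shift of half a
    # standard deviation.
    n = sample_size_for_power(alpha=0.05, target_power=0.80, delta=0.5)
    print("required n:", n)
    print("power at n - 1:", power_one_sided_z(n - 1, 0.05, 0.5))
    print("power at n:    ", power_one_sided_z(n, 0.05, 0.5))
```

The calculation needs a specified alternative (here the shift delta), which is exactly the ingredient that the NP formulation adds to Fisher's framework.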
An additional stimulus for Fisher's paper appears to have been a suggestion by
George Barnard, which Fisher acknowledges in a letter of Feb. 9, 1954 (Bennett,
1990, p. 9): "I find, looking up the old papers, that I can now understand, much better
than before, the early work of Neyman, or Neyman and Pearson, in the light of what
you said the other afternoon, for it now seems clear to me, as it did not before, that
Neyman, thinking all the time of acceptance procedures, was under the
misapprehension that my own work on estimation had only the same end in view."
Fisher accepts in the introduction to his 1955 paper that "there is no difference to
matter in the field of mathematical analysis [i.e. typically the different approaches lead
to essentially the same methods], but", he says, "there is a clear difference in logical
point of view". He then acknowledges his debt to Barnard and strikes a theme which
will be dominant in his discussion of these issues from now on: "I owe to Professor
Barnard ... the penetrating observation that this difference in point of view originated
when Neyman, thinking that he was correcting and improving my own early work on
tests of significance, as a means to 'the improvement of natural knowledge', in fact
reinterpreted them in terms of that technological and commercial apparatus which is
known as an acceptance procedure."
With this remark, Fisher cedes to Neyman's idea of inductive behavior the lower
spheres of technology and commerce, while reserving his own deeper, more difficult,
and hence less understood and accepted idea of inductive inference for scientific work.
One must admit that the NP terms behavior, error, acceptance, and rejection, and their
extension by Wald to decision and loss function, encourage such an interpretation.
More specifically, Fisher's attack in the paper under discussion concentrated on
three targets: repeated sampling from the same population; errors of the second kind;
and inductive behavior. Neyman replied in the following year with a "Note on an
article by Sir Ronald Fisher". The year 1956 also saw the publication of Fisher's last
book (SMSI), which sets out once more in full his own position and his criticism of
the opposing view, and the next year Neyman followed with a paper, "'Inductive
behavior' as a basic concept of philosophy of science". The exchange ended with a
last flurry: a paper by Fisher (1960) entitled "Scientific thought and the refinement of
human reason" and Neyman's reply the following year: "Silver Jubilee of my dispute
with Fisher".
It is tempting to quote some of the interesting and colorful statements that can be
found in these publications, but in fact not much new ground was covered. At the end
of his life Fisher continued to feel strongly that the ideas conveyed by the terms rules
of behavior and its long-run consequences, particularly errors of the second kind, had
no place in scientific inference.
5. Conditional Inference
While Fisher's approach to testing included no consideration of power, the NP
approach failed to pay attention to an important concern raised by Fisher. In order to
discuss this issue we must begin by considering briefly the different meanings Fisher
and Neyman attach to probability.
For Neyman, the idea of probability is fairly straightforward: It represents an
idealization of long-run frequency in a long sequence of repetitions under constant
conditions. (See for example Neyman (1952, p. 27) and Neyman (1957, p. 9).) Later
(Neyman (1977)), he points out that by the law of large numbers this idea permits an
extension: that if a sequence of independent events is observed, each with probability p of
success, then the long-run success frequency will be approximately p even if the
events are not identical. This property greatly adds to the appeal and applicability of a
frequentist probability. In particular it is the way in which Neyman came to interpret
the value of a significance level.
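This long-run reading of a significance level can be illustrated by simulation. The sketch below is an assumption-laden illustration, not a procedure from the paper: it runs many independent one-sided z-tests, each on a different experiment (its own sample size and known standard deviation, both drawn at random), with the hypothesis true every time. Each test has the same probability alpha of a false rejection, so by the law of large numbers the observed frequency of false rejections should be close to alpha even though the experiments are not identical. The function name, the ranges for n and sigma, and the seed are hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def false_rejection_frequency(num_tests: int, alpha: float, seed: int = 0) -> float:
    """Fraction of true hypotheses rejected over many non-identical level-alpha tests."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z_cut = NormalDist().inv_cdf(1.0 - alpha)
    rejections = 0
    for _ in range(num_tests):
        # Every experiment is different: its own sample size and spread.
        n = rng.randint(5, 50)
        sigma = rng.uniform(0.5, 3.0)
        # Data are generated with the hypothesis (mean 0) true.
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        z = (sum(sample) / n) / (sigma / sqrt(n))
        if z > z_cut:
            rejections += 1  # a false rejection (Error I)
    return rejections / num_tests


if __name__ == "__main__":
    print("observed frequency of false rejections:",
          false_rejection_frequency(num_tests=20000, alpha=0.05))
```

The printed frequency varies with the seed and the number of tests; it is the closeness to alpha over a long run, not any single outcome, that the frequentist interpretation refers to.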
On the other hand, the meaning of probability is a problem with which Fisher
grappled throughout his life and, not surprisingly, his views too underwent some
changes. The concept at which he eventually arrived is much broader than Neyman's.
"In a statement of probability", he says on p. 113 of SMSI, "the predicand, which
may be conceived as an object, as an event, or as a proposition, is asserted to be one
of a set of a number, however large, of like entities of which a known proportion, P,
have some relevant characteristic, not possessed by the remainder. It is further
asserted that no subset of the entire set, having a different proportion, can be