The Fisher Neyman-Pearson Theories: a detailed introduction to Fisher's theory

File name: The Fisher Neyman-Pearson Theories

Category: Education

Development tool:

File size: 2 MB

Downloads: 0

Upload date: 2019-08-23

Provider: hit****

Detailed description: A detailed look at whether the Fisher theory and the Neyman-Pearson theory are the same thing, and what distinguishes them.

1. Introduction

The formulation and philosophy of hypothesis testing as we know it today was largely created by three men: R.A. Fisher (1890-1962), J. Neyman (1894-1981), and E.S. Pearson (1895-1980) in the period 1915-1933. Since then it has expanded into one of the most widely used quantitative methodologies, and has found its way into nearly all areas of human endeavor. It is a fairly commonly held view that the theories due to Fisher on the one hand, and to Neyman and Pearson on the other, are quite distinct. This is reflected in the fact that separate terms are often used (although somewhat inconsistently) to designate the two approaches: significance testing for Fisher's and hypothesis testing for that of Neyman and Pearson. But are they really that different?

It is interesting to see what Fisher, Neyman, and Pearson themselves have to say about this question. Fisher frequently attacked the Neyman-Pearson (NP) approach as completely inappropriate to the testing of scientific hypotheses (although perhaps suitable in the context of acceptance sampling). In his last book "Statistical Methods and Scientific Inference" (3rd ed., published posthumously in 1973, to which we shall refer as SMSI), he writes (p. 103): "The examples elaborated in the foregoing sections of numerical discrepancies constitute only one aspect of the deep-seated difference in point of view."

On the other hand, Neyman (1976) stated that he "is not aware of a conceptual difference between a 'test of a statistical hypothesis' and a 'test of significance' and [that he] uses these terms interchangeably."

Pearson (1974) took an intermediate position by acknowledging the existence of differences but claiming that they were of little importance in practice.
After referring to inference as "the manner in which we bring the theory of probability into gear with the way our mind works in reaching decisions and practical conclusions", he continues: "If, as undoubtedly seems the case, the same mechanism of this 'putting into gear operation' does not work for everyone in identical ways, this does not seem to matter."

In the present paper, written just ten years after the death of the last protagonist, I examine yet another possibility: that important differences do exist but that it may be possible to formulate a unified theory that combines the best features of both approaches.

Since both are concerned with the testing of hypotheses, it is convenient here to ignore this terminological distinction and to use the term "hypothesis testing" regardless of whether the testing is carried out in a Fisherian or Neyman-Pearsonian mode.

For the sake of completeness it should be said that in addition to the Fisher and Neyman-Pearson theories there exist still other philosophies of testing, of which we shall mention only two. There is Bayesian hypothesis testing, which, on the basis of stronger assumptions, permits assigning probabilities to the various hypotheses being considered. All three authors were very hostile to this formulation and were in fact motivated in their work by a desire to rid hypothesis testing of the need to assume a prior distribution over the available hypotheses.

Finally, in certain important situations tests can be obtained by an approach also due to Fisher for which he used the term fiducial. Most comparisons of Fisher's work on hypothesis testing with that of Neyman and Pearson (see for example Morrison and Henkel (1970), Steger (1971), Spielman (1974, 1978), Carlson (1976), Barnett (1982)) do not include a discussion of the fiducial argument, which most statisticians have found difficult to follow.
Although Fisher himself viewed fiducial considerations to be a very important part of his statistical thinking, this topic can easily be split off from other aspects of his work, and we shall here not consider either the fiducial or the Bayesian approach any further.

It seems appropriate to conclude this introduction with two personal statements.

(i) I was a student of Neyman's and later for many years his colleague. As a result I am fairly familiar with his thinking. On the other hand, I have seriously studied Fisher's work only in recent years and, perhaps partly for this reason, have found his ideas much harder to understand. I shall therefore try to follow Fisher's advice to a correspondent (Bennett, 1990, p. 221): "If you must write about someone else's work it is, I feel sure, worth taking even more than a little trouble to avoid misrepresenting him. One safeguard is to use actual quotations from his writing."

(ii) Some of the Fisher-Neyman debate is concerned with issues studied in depth by philosophers of science. (See for example Braithwaite (1953), Hacking (1965), Kyburg (1974), and Seidenfeld (1979).) I am not a philosopher, and the present paper is written from a statistical, not a philosophical, point of view.

Although the main substantive papers (NP 1928 and 1933a) were written jointly by Neyman and Pearson, their collaboration stopped soon after Neyman left Pearson's Department to set up his own program in Berkeley. After that, the debate was carried on primarily by Fisher and Neyman.

2. Testing Statistical Hypotheses

The modern theory of testing hypotheses began with Student's discovery of the t-distribution in 1908. This was followed by Fisher with a series of papers culminating in his book "Statistical Methods for Research Workers" (1925), in which he created a new paradigm for hypothesis testing.
He greatly extended the applicability of the t-test (to the two-sample problem and the testing of regression coefficients), and generalized it to the testing of hypotheses in the analysis of variance. He advocated 5% as the standard level (with 1% as a more stringent alternative); and through applying this new methodology to a variety of practical examples he established it as a highly popular statistical approach for many fields of science.

A question that Fisher did not raise was the origin of his test statistics: why these rather than some others? This is the question that Neyman and Pearson considered and which (after some preliminary work in NP (1928)) they answered in NP (1933a). Their solution involved not only the hypothesis but also a class of possible alternatives, and the probabilities of two kinds of error: false rejection (Error I) and false acceptance (Error II). The "best" test was one that minimized P_A(Error II) subject to a bound on P_H(Error I), the latter being the significance level of the test. They completely solved this problem for the case of testing a simple (i.e. single distribution) hypothesis against a simple alternative by means of the Neyman-Pearson Lemma. For more complex situations the theory required additional concepts, and working out the details of this NP-program was an important concern of mathematical statistics in the following decades.

The NP introduction to the two kinds of error contained a brief statement that was to become the focus of much later debate. "Without hoping to know whether each separate hypothesis is true or false", the authors wrote, "we may search for rules to govern our behavior with regard to them, in following which we insure that, in the long run of experience, we shall not be too often wrong." And in this and the following paragraph they refer to a test (i.e. a rule to reject or accept the hypothesis) as "a rule of behavior".

3. Inductive Inference vs. Inductive Behavior

Fisher (1932) started a paper entitled "Inverse probability and the use of likelihood" with the statement: "Logicians have long distinguished two modes of human reasoning, under the respective names of deductive and inductive reasoning. In inductive reasoning we attempt to argue from the particular, which is typically a body of observational material, to the general, which is typically a theory applicable to future experience."

He developed his ideas in more detail in a 1935 paper, "The logic of inductive inference", where he explains: "everyone who does habitually attempt the difficult task of making sense of figures is, in fact, essaying a logical process of the kind we call inductive, in that he is attempting to draw inferences from the particular to the general. Such inferences we recognize to be uncertain inferences ..." He continues in the next paragraph: "Although some uncertain inferences can be rigorously expressed in terms of mathematical probability, it does not follow that mathematical probability is an adequate concept for the rigorous expression of uncertain inferences of every kind. The inferences of the classical theory of probability are all deductive in character. They are statements about the behaviour of individuals, or samples, or sequences of samples, drawn from populations which are fully known. More generally, however, a mathematical quantity of a different kind, which I have termed mathematical likelihood, appears to take its place [i.e. the place of probability] as a measure of rational belief when we are reasoning from the sample to the population."

The paper was presented at a meeting of the Royal Statistical Society and was not well received. The last discussant was Neyman, who began in a very complimentary vein. He then suggested that some readers might react by thinking: "What an interesting problem is raised! How could I develop it further?", but, he continues, "I personally seem to have another kind of psychology and can't help thinking: What an interesting way of asking and answering questions, but can't I do it differently?"

More specifically, Neyman asks: "granted that the conception of likelihood is independent of the classical theory of probability, isn't it possible to construct a theory of mathematical statistics which would be based solely upon the theory of probability (thus independent of the conception of likelihood) and be adequate from the point of view of practical statistical work?" And later, still more directly: "Now what could be considered as a sufficiently simple and unquestionable principle in statistical work? I think the basic conception here is the conception of frequency of errors in judgement." He points out that this idea applies to both hypothesis testing and estimation and completes the paragraph with the statement that "the complex of results in this direction may be considered as a system of mathematical statistics alternative to that of Professor Fisher, and entirely based on the classical theory of probability."

Of Fisher, L.J. Savage (1976), in his insightful overview of Fisher's great accomplishments, "On rereading R.A. Fisher", wrote: "Fisher burned even more than the rest of us, it seems to me, to be original, right, important, famous, and respected." One can then imagine Fisher's reaction to this attack on his cherished and ambitious attempt to put scientific thinking on an entirely new basis. Neyman's message was: We have no need for your inductive inference and its new concept of likelihood. The problem can be solved in a very satisfactory manner using only the classical theory of probability and deductive arguments, by minimizing the probability of errors [i.e. of wrong conclusions].

Both Neyman and Fisher considered the distinction between "inductive behavior" and "inductive inference" to lie at the center of their disagreement.
In fact, in writing retrospectively about the dispute, Neyman (1961) said that "the subject of the dispute may be symbolized by the opposing terms 'inductive reasoning' and 'inductive behavior'." That Fisher also assigned a central role to this distinction is indicated by his statement in SMSI (p. 7) that "there is something horrifying in the ideological movement represented by the doctrine that reasoning, properly speaking, cannot be applied to empirical data to lead to inferences valid in the real world."

Actually, the interpretation of acceptance or rejection of a hypothesis as behavior or inference, as a decision or conclusion, is largely a matter of terminology which diverts attention from the more central issue: whether only deductive arguments are needed to reach the desired end, or whether there is a need also for induction, a concept which inspired Fisher while for Neyman it was imbued with an aura of suspect mysticism. This issue had in fact a long history and a resolution (albeit in a deterministic rather than a stochastic setting) in the description of the scientific method as "hypothetico-deductive". According to this view of science, induction is required in deciding on the experiment to be performed, in the formulation of the model and the hypothesis, while the testing of the model and the hypothesis can be carried out deductively.

Surprisingly, Fisher himself seemed to view the situation somewhat similarly, when in his 1939 obituary of Student he wrote: "Many mathematicians must possess the penetration necessary to perceive, when confronted with concrete experimental results, that it must be possible to use them, by rigorously objective calculations, to throw light on the plausibility or otherwise of the interpretations that suggest themselves. A few must also possess the pertinacity needed to convert this intuition into such a completed procedure as we know as a test of significance. It is, I believe, nothing but an illusion to think that this process can ever be reduced to a self-contained mathematical theory of tests of significance. Constructive imagination, together with much knowledge based on experience of data of the same kind, must be exercised before deciding on what hypotheses are worth testing, and in what respects. Only when this fundamental thinking has been accomplished can the problem be given a mathematical form."

4. Errors of the Second Kind

Fisher did not respond immediately to the attack Neyman had mounted in his discussion of Fisher's paper. However, in a note in Nature (1935b), which was ostensibly a reply to an only tangentially related statement by Karl Pearson, he lashed out at Neyman and E.S. Pearson without however mentioning their names. Karl Pearson, in a letter to Nature, had complained that his χ²-test of goodness of fit was not a rule for deciding whether or not to reject a possibly correct hypothesis, but rather an attempt to see whether a distribution, although not expected to be exactly correct, would provide an adequate fit. After a relatively mild rejection of this position, Fisher adds in a last paragraph: "For the logical fallacy of believing that a hypothesis has been proved to be true, merely because it is not contradicted by the available facts, has no more right to insinuate itself in statistical than in other kinds of scientific reasoning. Yet it does so only too frequently. Indeed, the 'error of accepting an hypothesis when it is false' has been specially named by some writers 'errors of the second kind'. It would, therefore, add greatly to the clarity with which the tests of significance are regarded if it were generally understood that tests of significance, when used accurately, are capable of rejecting or invalidating hypotheses, in so far as these are contradicted by the data; but that they are never capable of establishing them as certainly true. In fact that 'errors of the second kind' are committed only by those who misunderstand the nature and application of tests of significance."

After this outburst, the dispute appeared to die down. Undoubtedly it helped that in 1938 Neyman, the more combative of Fisher's opponents, left London for Berkeley, thereby removing the irritation of offices in the same building and frequent encounters at meetings of the Royal Statistical Society. Then, twenty years after the Nature article, Fisher (1955) published a paper devoted entirely to an attack on the point of view expressed in numerous papers by Neyman, Pearson, Wald and Bartlett.

The first introductory sections suggest two reasons for Fisher's writing such a paper at that time. He begins by describing the progress that had been made during "the present century" in the business of interpreting observational data, so as to obtain a better understanding of the real world. He mentions in particular "the use of better mathematics and more comprehensive ideas in mathematical statistics"; "the new theory of experimental design"; and "a more complete understanding ... of the structure and peculiarities of inductive logic".

"Much that I have to say", Fisher continues, "will not command universal assent. I know this for it is just because I find myself in disagreement with some of the modes of exposition of this new subject which have from time to time been adopted, that I have taken this opportunity of expressing a different point of view."

What Fisher was referring to are developments that had occurred since the publication of his early papers and the two books, "Statistical Methods for Research Workers" (1925) and "The Design of Experiments" (1935c). His methods had been enormously successful; his tests, the analysis of variance, the experimental designs had become the staple of working statisticians. His books had reached a wide public.
(By 1946, Statistical Methods had reached its 10th edition.) But - and this must have been tremendously galling to him - his philosophical approach had not found acceptance. On the one hand, his central concept of fiducial inference had found few adherents; on the other, perhaps even more annoying, developments growing out of Neyman's philosophy had been grafted onto his framework and were highly successful. There had been considerable elaboration of the NP theory of optimal tests; more importantly, the idea of power (1 - P(Error II)) was generally accepted as a concept of interest in itself and as the basis for sample size determination; and finally Neyman's philosophy of inductive behavior had been formalized by Wald into a comprehensive theory of Statistical Decision Functions.

An additional stimulus for Fisher's paper appears to have been a suggestion by George Barnard, which Fisher acknowledges in a letter of Feb. 9, 1954 (Bennett (1990), p. 9): "I find, looking up the old papers, that I can now understand, much better than before, the early work of Neyman, or Neyman and Pearson, in the light of what you said the other afternoon, for it now seems clear to me, as it did not before, that Neyman, thinking all the time of acceptance procedures, was under the misapprehension that my own work on estimation had only the same end in view."

Fisher accepts in the introduction to his 1955 paper that "there is no difference to matter in the field of mathematical analysis [i.e. typically the different approaches lead to essentially the same methods] ... but", he says, "there is a clear difference in logical point of view". He then acknowledges his debt to Barnard and strikes a theme which will be dominant in his discussion of these issues from now on: "I owe to Professor Barnard ... the penetrating observation that this difference in point of view originated when Neyman, thinking that he was correcting and improving my own early work on tests of significance, as a means to 'the improvement of natural knowledge', in fact reinterpreted them in terms of that technological and commercial apparatus which is known as an acceptance procedure."

With this remark, Fisher cedes to Neyman's idea of inductive behavior the lower spheres of technology and commerce, while reserving his own deeper, more difficult, and hence less understood and accepted idea of inductive inference for scientific work. One must admit that the NP terms behavior, error, acceptance, and rejection, and their extension by Wald to decision and loss function, encourage such an interpretation.

More specifically, Fisher's attack in the paper under discussion concentrated on three targets: repeated sampling from the same population; errors of the second kind; and inductive behavior. Neyman replied in the following year with a "Note on an article by Sir Ronald Fisher". The year 1956 also saw the publication of Fisher's last book (SMSI), which sets out once more in full his own position and his criticism of the opposing view, and the next year Neyman followed with a paper, "'Inductive behavior' as a basic concept of philosophy of science". The exchange ended with a last flurry: a paper by Fisher (1960) entitled "Scientific thought and the refinement of human reason" and Neyman's reply the following year: "Silver Jubilee of my dispute with Fisher".

It is tempting to quote some of the interesting and colorful statements that can be found in these publications, but in fact not much new ground was covered. At the end of his life Fisher continued to feel strongly that the ideas conveyed by the terms rules of behavior and its long-run consequences, particularly errors of the second kind, had no place in scientific inference.

5. Conditional Inference

While Fisher's approach to testing included no consideration of power, the NP approach failed to pay attention to an important concern raised by Fisher. In order to discuss this issue we must begin by considering briefly the different meanings Fisher and Neyman attach to probability.

For Neyman, the idea of probability is fairly straightforward: It represents an idealization of long-run frequency in a long sequence of repetitions under constant conditions. (See for example Neyman (1952, p. 27) and Neyman (1957, p. 9).) Later (Neyman (1977)), he points out that by the law of large numbers this idea permits an extension: that if a sequence of independent events is observed, each with probability p of success, then the long-run success frequency will be approximately p even if the events are not identical. This property greatly adds to the appeal and applicability of a frequentist probability. In particular it is the way in which Neyman came to interpret the value of a significance level.

On the other hand, the meaning of probability is a problem with which Fisher grappled throughout his life and, not surprisingly, his views too underwent some changes. The concept at which he eventually arrived is much broader than Neyman's. "In a statement of probability", he says on p. 113 of SMSI, "the predicand, which may be conceived as an object, as an event, or as a proposition, is asserted to be one of a set of a number, however large, of like entities of which a known proportion, P, have some relevant characteristic, not possessed by the remainder. It is further asserted that no subset of the entire set, having a different proportion, can be
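
Neyman's long-run reading of a significance level, described in Section 5, can be illustrated with a small simulation. The sketch below is an illustration only and does not come from the paper: the function names, the choice of a one-sided z-test with known standard deviation, and the ranges used to vary the problems are all assumptions made for this example. Every simulated test concerns a different problem (its own null mean, standard deviation, and sample size), the null hypothesis is true each time, and the fraction of false rejections (Error I) should settle near the chosen level.

```python
import math
import random


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_sided_z_test_rejects(sample, mu0, sigma, alpha):
    """Reject H: mean = mu0 in favor of mean > mu0 at level alpha.

    Assumes normal data with known standard deviation sigma
    (an assumption of this sketch, chosen to keep the test exact).
    """
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    p_value = 1.0 - normal_cdf(z)
    return p_value <= alpha


def long_run_false_rejection_rate(num_tests, alpha, seed=0):
    """Fraction of true hypotheses rejected over many non-identical tests."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(num_tests):
        # Each test is a different problem; these ranges are arbitrary.
        mu0 = rng.uniform(-10.0, 10.0)
        sigma = rng.uniform(0.5, 5.0)
        n = rng.randint(5, 50)
        # Data are generated under the hypothesis, so any rejection is Error I.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if one_sided_z_test_rejects(sample, mu0, sigma, alpha):
            rejections += 1
    return rejections / num_tests


if __name__ == "__main__":
    rate = long_run_false_rejection_rate(num_tests=20000, alpha=0.05, seed=1)
    print(f"long-run false rejection frequency: {rate:.4f}")
```

The design choice worth noting is that the tests are deliberately not repetitions of one experiment. Only the probability of a false rejection is held constant at alpha, which is the extension Neyman draws from the law of large numbers.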
