04 November 2019

George E P Box - Collected Quotes

"Statistical criteria should (1) be sensitive to change in the specific factors tested, (2) be insensitive to changes, of a magnitude likely to occur in practice, in extraneous factors." (George E P Box, 1955)

"The method of least squares is used in the analysis of data from planned experiments and also in the analysis of data from unplanned happenings. The word 'regression' is most often used to describe analysis of unplanned data. It is the tacit assumption that the requirements for the validity of least squares analysis are satisfied for unplanned data that produces a great deal of trouble." (George E P Box, "Use and Abuse of Regression", 1966)

"To find out what happens to a system when you interfere with it you have to interfere with it (not just passively observe it)." (George E P Box, "Use and Abuse of Regression", 1966)

"Because we can never be sure that a postulated model is entirely appropriate, we must proceed in such a manner that inadequacies can be taken account of and their implications considered as we go along. To do this we must regard statistical analysis, which is a step in the major iteration […] as itself an iteration. To be on firm ground we must do more than merely postulate a model; we must build and test a tentative model at each stage of the investigation."(George E P Box & George C Tjao, "Bayesian Inference in Statistical Analysis", 1973)

"In moving from conjecture to experimental data, (D), experiments must be designed which make best use of the experimenter's current state of knowledge and which best illuminate his conjecture. In moving from data to modified conjecture, (A), data must be analyzed so as to accurately present information in a manner which is readily understood by the experimenter." (George E P Box & George C Tjao, "Bayesian Inference in Statistical Analysis", 1973)

"Statistical methods are tools of scientific investigation. Scientific investigation is a controlled learning process in which various aspects of a problem are illuminated as the study proceeds. It can be thought of as a major iteration within which secondary iterations occur. The major iteration is that in which a tentative conjecture suggests an experiment, appropriate analysis of the data so generated leads to a modified conjecture, and this in turn leads to a new experiment, and so on." (George E P Box & George C Tjao, "Bayesian Inference in Statistical Analysis", 1973)

"The process [of statistical analysis] usually begins by the postulating of a model worthy to be tentatively entertained. The data analyst will have arrived at this tentative model in cooperation with the scientific investigator. They will choose it 'So that, in the light of the then available knowledge, it best takes account of relevant phenomena in the simplest way possible. it will usually contain unknown parameters. Given the data the analyst can now make statistical inferences about the parameters conditional on the correctness of this first tentative model. These inferences form part of the conditional analysis. If the model is correct, they provide all there is to know about the problem under study, given the data." (George E P Box & George C Tjao, "Bayesian Inference in Statistical Analysis", 1973)

"A man in daily muddy contact with field experiments could not be expected to have much faith in any direct assumption of independently distributed normal errors." (George E P Box, "Science and Statistics", Journal of the American Statistical Association 71, 1976)

"Competent statisticians will be front line troops in our war for survival-but how do we get them? I think there is now a wide readiness to agree that what we want are neither mere theorem provers nor mere users of a cookbook. A proper balance of theory and practice is needed and, most important, statisticians must learn how to be good scientists; a talent which has to be acquired by experience and example." (George E P Box, "Science and Statistics", Journal of the American Statistical Association 71, 1976)

"In applying mathematics to subjects such as physics or statistics we make tentative assumptions about the real world which we know are false but which we believe may be useful nonetheless. The physicist knows that particles have mass and yet certain results, approximating what really happens, may be derived from the assumption that they do not. Equally, the statistician knows, for example, that in nature there never was a normal distribution, there never was a straight line, yet with normal and linear assumptions, known to be false, he can often derive results which match, to a useful approximation, those found in the real world." (George E P Box, "Science and Statistics", Journal of the American Statistical Association, Vol. 71, 1976)

"Major advances in science and in the science of statistics in particular, usually occur, therefore, as the result of the theory-practice iteration. [...] For the theory-practice iteration to work, the scientist must be, as it were, mentally ambidextrous; fascinated equally on the one hand by possible meanings, theories, and tentative models to be induced from data and the practical reality of the real world, and on the other with the factual implications deducible from tentative theories, models and hypotheses." (George E P Box, "Science and Statistics", Journal of the American Statistical Association, Vol. 71, 1976)

"One important idea is that science is a means whereby learning is achieved, not by mere theoretical speculation on the one hand, nor by the undirected accumulation of practical facts on the other, but rather by a motivated iteration between theory and practice.” (George E P Box, "Science and Statistics", Journal of the American Statistical Association 71, 1976)

"Since all models are wrong the scientist cannot obtain a 'correct' one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity." (George E P Box, "Science and Statistics", Journal of the American Statistical Association 71, 1976)

"Since all models are wrong the scientist must be alert to what is importantly wrong. It is inappropriate to be concerned about mice when there are tigers abroad." (George E P Box, "Science and Statistics", Journal of the American Statistical Association 71, 1976)

"I am continually surprised that statisticians, even good ones, still seem to ignore this iterative aspect of investigation and talk as if the movement from an initial (perhaps ill-posed) question, to design, to data collection, to analysis of the data, to 'the answer' were a one-shot affair. The wise investigator expends his effort not in one grand design (necessarily conceived at a time when he knows least about unfolding reality), but in a series of smaller designs, analyzing, modifying, and getting new ideas as he goes. This iterative aspect of research has a profound influence on almost everything the investigator and the statistician do, and it has been the source of much misunderstanding. Just as the rules that govern mathematical iteration are very different from those that govern solutions in closed form, so the rules that ought to apply to the statistics of most real scientific investigations are different, broader, and vaguer than those that might apply to a single decision or to a single test of hypothesis." (George E P Box, "Some Problems of Statistics and Everyday Life", Journal of the American Statistical Association, Vol. 74 (365), 1979)

"It is widely recognized that the advancement of learning does not proceed by conjecture alone, nor by observation alone, but by an iteration involving both. Certainly, scientific investigation proceeds by such iteration. Examination of empirical data inspires a tentative explanation which, when further exposed to reality, may lead to its modification. This modified explanation is again put in jeopardy by further exposure to reality, and so on, in a continued alternation between induction and deduction." (George E P Box, "Some Problems of Statistics and Everyday Life", Journal of the American Statistical Association, Vol. 74 (365), 1979)

"Models, of course, are never true, but fortunately it is only necessary that they be useful. For this it is usually needful only that they not be grossly wrong. I think rather simple modifications of our present models will prove adequate to take account of most realities of the outside world. The difficulties of computation which would have been a barrier in the past need not deter us now." (George E P Box, "Some Problems of Statistics and Everyday Life", Journal of the American Statistical Association, Vol. 74 (365), 1979)

"Please can Data Analysts get themselves together again and become whole Statisticians before it is too late? Before they, their employers, and their clients forget the other equally important parts of the job statisticians should be doing, such as designing investigations and building models?" (George E P Box, "Some Problems of Statistics and Everyday Life", Journal of the American Statistical Association, Vol. 74 (365), 1979)

"When the statistician looks at the outside world, he cannot, for example, rely on finding errors that are independently and identically distributed in approximately normal distributions. In particular, most economic and business data are collected serially and can be expected, therefore, to be heavily serially dependent. So is much of the data collected from the automatic instruments which are becoming so common in laboratories these days. Analysis of such data, using procedures such as standard regression analysis which assume independence, can lead to gross error. Furthermore, the possibility of contamination of the error distribution by outliers is always present and has recently received much attention. More generally, real data sets, especially if they are long, usually show inhomogeneity in the mean, the variance, or both, and it is not always possible to randomize." (George E P Box, "Some Problems of Statistics and Everyday Life", Journal of the American Statistical Association, Vol. 74 (365), 1979)

"A mechanistic model has the following advantages: 1. It contributes to our scientific understanding of the phenomenon under study. 2. It usually provides a better basis for extrapolation (at least to conditions worthy of further experimental investigation if not through the entire range of all input variables). 3. It tends to be parsimonious (i. e, frugal) in the use of parameters and to provide better estimates of the response." (George E P Box, "Empirical Model-Building and Response Surfaces", 1987)

"Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." (George E P Box, "Empirical Model-Building and Response Surfaces", 1987)

“The fact that [the model] is an approximation does not necessarily detract from its usefulness because models are approximations. All models are wrong, but some are useful.” (George E P Box, 1987)

"A first analysis of experimental results should, I believe, invariably be conducted using flexible data analytical techniques - looking at graphs and simple statistics - that so far as possible allow the data to 'speak for themselves'. The unexpected phenomena that such a approach often uncovers can be of the greatest importance in shaping and sometimes redirecting the course of an ongoing investigation." (George Box, "Signal to Noise Ratios, Performance Criteria, and Transformations", Technometrics 30, 1988)

"Statistics is, or should be, about scientific investigation and how to do it better, but many statisticians believe it is a branch of mathematics." (George E P Box, Commentary, Technometrics 32, 1990)

"Acceleration of knowledge generation also emphasizes the need for lifelong education. The trained teacher, scientist or engineer can no longer regard what they have learned at the university as supplying their needs for the rest of their lives." (George E P Box, "Total Quality: Its Origins and its Future", 1995)

"Scientific method is concerned with efficient ways of generating knowledge." (George E P Box, "Total Quality: Its Origins and its Future", 1995)

"The central limit theorem says that, under conditions almost always satisfied in the real world of experimentation, the distribution of such a linear function of errors will tend to normality as the number of its components becomes large. The tendency to normality occurs almost regardless of the individual distributions of the component errors. An important proviso is that several sources of error must make important contributions to the overall error and that no particular source of error dominate the rest." (George E P Box et al, "Statistics for Experimenters: Design, discovery, and innovation" 2nd Ed., 2005)

"Two things explain the importance of the normal distribution: (1) The central limit effect that produces a tendency for real error distributions to be 'normal like'. (2) The robustness to nonnormality of some common statistical procedures, where 'robustness' means insensitivity to deviations from theoretical normality." (George E P Box et al, "Statistics for Experimenters: Design, discovery, and innovation" 2nd Ed., 2005)

"All models are approximations. Essentially, all models are wrong, but some are useful. However, the approximate nature of the model must always be borne in mind." (George E P Box & Norman R Draper, "Response Surfaces, Mixtures, and Ridge Analyses", 2007)
