"The method of least squares is used in the analysis of data from planned experiments and also in the analysis of data from unplanned happenings. The word 'regression' is most often used to describe analysis of unplanned data. It is the tacit assumption that the requirements for the validity of least squares analysis are satisfied for unplanned data that produces a great deal of trouble." (George E P Box, "Use and Abuse of Regression", 1966)
"To find out what happens to a system when you interfere with it you have to interfere with it (not just passively observe it)." (George E P Box, "Use and Abuse of Regression", 1966)
"Because we can never be sure that a postulated model is
entirely appropriate, we must proceed in such a manner that inadequacies can be
taken account of and their implications considered as we go along. To do this
we must regard statistical analysis, which is a step in the major iteration […]
as itself an iteration. To be on firm ground we must do more than merely
postulate a model; we must build and test a tentative model at each stage of
the investigation."
"In moving from conjecture to experimental data, (D),
experiments must be designed which make best use of the experimenter's current
state of knowledge and which best illuminate his conjecture. In moving from
data to modified conjecture, (A), data must be analyzed so as to accurately
present information in a manner which is readily understood by the
experimenter."
"Statistical methods are tools of scientific investigation. Scientific investigation is a controlled learning process in which various aspects of a problem are illuminated as the study proceeds. It can be thought of as a major iteration within which secondary iterations occur. The major iteration is that in which a tentative conjecture suggests an experiment, appropriate analysis of the data so generated leads to a modified conjecture, and this in turn leads to a new experiment, and so on." (George E P Box & George C Tjao, "Bayesian Inference in Statistical Analysis", 1973)
"The process [of statistical analysis] usually begins by the postulating of a model worthy to be tentatively entertained. The data analyst will have arrived at this tentative model in cooperation with the scientific investigator. They will choose it 'So that, in the light of the then available knowledge, it best takes account of relevant phenomena in the simplest way possible. it will usually contain unknown parameters. Given the data the analyst can now make statistical inferences about the parameters conditional on the correctness of this first tentative model. These inferences form part of the conditional analysis. If the model is correct, they provide all there is to know about the problem under study, given the data." (George E P Box & George C Tjao, "Bayesian Inference in Statistical Analysis", 1973)
"Since small differences in probability cannot be appreciated by the human mind, there seems little point in being excessively precise about uncertainty." (George E P Box & G C Tiao, "Bayesian inference in statistical analysis", 1973)
"A man in daily muddy contact with field experiments could not be expected to have much faith in any direct assumption of independently distributed normal errors." (George E P Box, "Science and Statistics", Journal of the American Statistical Association 71, 1976)"Competent statisticians will be front line troops in our war for survival-but how do we get them? I think there is now a wide readiness to agree that what we want are neither mere theorem provers nor mere users of a cookbook. A proper balance of theory and practice is needed and, most important, statisticians must learn how to be good scientists; a talent which has to be acquired by experience and example." (George E P Box, "Science and Statistics", Journal of the American Statistical Association 71, 1976)
"In applying mathematics to subjects such as physics or statistics we make tentative assumptions about the real world which we know are false but which we believe may be useful nonetheless. The physicist knows that particles have mass and yet certain results, approximating what really happens, may be derived from the assumption that they do not. Equally, the statistician knows, for example, that in nature there never was a normal distribution, there never was a straight line, yet with normal and linear assumptions, known to be false, he can often derive results which match, to a useful approximation, those found in the real world." (George E P Box, "Science and Statistics", Journal of the American Statistical Association, Vol. 71, 1976)
"Major advances in science and in the science of statistics in particular, usually occur, therefore, as the result of the theory-practice iteration. [...] For the theory-practice iteration to work, the scientist must be, as it were, mentally ambidextrous; fascinated equally on the one hand by possible meanings, theories, and tentative models to be induced from data and the practical reality of the real world, and on the other with the factual implications deducible from tentative theories, models and hypotheses." (George E P Box, "Science and Statistics", Journal of the American Statistical Association, Vol. 71, 1976)
"Since all models are wrong the scientist cannot obtain a 'correct' one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity." (George E P Box, "Science and Statistics", Journal of the American Statistical Association 71, 1976)
"Since all models are wrong the scientist must be alert to what is importantly wrong. It is inappropriate to be concerned about mice when there are tigers abroad." (George E P Box, "Science and Statistics", Journal of the American Statistical Association 71, 1976)
"I am continually surprised that statisticians, even good ones,
still seem to ignore this iterative aspect of investigation and talk as if the
movement from an initial (perhaps ill-posed) question, to design, to data
collection, to analysis of the data, to 'the answer' were a one-shot affair.
The wise investigator expends his effort not in one grand design (necessarily
conceived at a time when he knows least about unfolding reality), but in a
series of smaller designs, analyzing, modifying, and getting new ideas as he
goes. This iterative aspect of research has a profound influence on almost
everything the investigator and the statistician do, and it has been the source
of much misunderstanding. Just as the rules that govern mathematical iteration
are very different from those that govern solutions in closed form, so the
rules that ought to apply to the statistics of most real scientific investigations
are different, broader, and vaguer than those that might apply to a single
decision or to a single test of hypothesis."
"It is widely recognized that the advancement of learning does not proceed by conjecture alone, nor by observation alone, but by an iteration involving both. Certainly, scientific investigation proceeds by such iteration. Examination of empirical data inspires a tentative explanation which, when further exposed to reality, may lead to its modification. This modified explanation is again put in jeopardy by further exposure to reality, and so on, in a continued alternation between induction and deduction." (George E P Box, "Some Problems of Statistics and Everyday Life", Journal of the American Statistical Association, Vol. 74 (365), 1979)
"Models, of course, are never true, but fortunately it is only necessary that they be useful. For this it is usually needful only that they not be grossly wrong. I think rather simple modifications of our present models will prove adequate to take account of most realities of the outside world. The difficulties of computation which would have been a barrier in the past need not deter us now." (George E P Box, "Some Problems of Statistics and Everyday Life", Journal of the American Statistical Association, Vol. 74 (365), 1979)
"Please can Data Analysts get themselves together again and
become whole Statisticians before it is too late? Before they, their employers,
and their clients forget the other equally important parts of the job
statisticians should be doing, such as designing investigations and building
models?"
"When the statistician looks at the outside world, he cannot,
for example, rely on finding errors that are independently and identically
distributed in approximately normal distributions. In particular, most economic
and business data are collected serially and can be expected, therefore, to be
heavily serially dependent. So is much of the data collected from the automatic
instruments which are becoming so common in laboratories these days. Analysis
of such data, using procedures such as standard regression analysis which
assume independence, can lead to gross error. Furthermore, the possibility of contamination
of the error distribution by outliers is always present and has recently
received much attention. More generally, real data sets, especially if they are
long, usually show inhomogeneity in the mean, the variance, or both, and it is
not always possible to randomize." (George E P Box, "Some Problems of Statistics
and Everyday Life", Journal of the American Statistical Association, Vol. 74 (365),
1979)
"A mechanistic model has the following advantages: 1. It contributes to our scientific understanding of the phenomenon under study. 2. It usually provides a better basis for extrapolation (at least to conditions worthy of further experimental investigation if not through the entire range of all input variables). 3. It tends to be parsimonious (i. e, frugal) in the use of parameters and to provide better estimates of the response." (George E P Box, "Empirical Model-Building and Response Surfaces", 1987)
"Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." (George E P Box, "Empirical Model-Building and Response Surfaces", 1987)“The fact that [the model] is an approximation does not necessarily detract from its usefulness because models are approximations. All models are wrong, but some are useful.” (George E P Box, 1987)
"Acceleration of knowledge generation also emphasizes the need for lifelong education. The trained teacher, scientist or engineer can no longer regard what they have learned at the university as supplying their needs for the rest of their lives." (George E P Box, "Total Quality: Its Origins and its Future", 1995)
"Scientific method is concerned with efficient ways of
generating knowledge."
"Statistics is, or should be, about scientific investigation and how to do it better, but many statisticians believe it is a branch of mathematics." (George Box, AmStat News 2000)
"Two things explain the importance of the normal distribution: (1) The central limit effect that produces a tendency for real error distributions to be 'normal like'. (2) The robustness to nonnormality of some common statistical procedures, where 'robustness' means insensitivity to deviations from theoretical normality." (George E P Box et al, "Statistics for Experimenters: Design, discovery, and innovation" 2nd Ed., 2005)
"All models are approximations. Essentially, all models are wrong, but some are useful. However, the approximate nature of the model must always be borne in mind." (George E P Box & Norman R Draper, "Response Surfaces, Mixtures, and Ridge Analyses", 2007)