
26 August 2025

Emile Cheysson - Collected Quotes

"If statistical graphics, although born just yesterday, extends its reach every day, it is because it replaces long tables of numbers and it allows one not only to embrace at glance the series of phenomena, but also to signal the correspondence or anomalies, to find the causes, to identify the laws." (Émile Cheysson, circa 1877)

"It is this combination of observation at the foundation and geometry at the summit that I wished to express by naming this method Geometric Statistics. It cannot be subject to the usual criticisms directed at the use of pure mathematics in economic matters, which are said to be too complex to be confined within a formula." (Emile Cheysson, "La Statistique géométrique", 1888)

"It then becomes a method of graphical interpolation or extrapolation, which involves hypothetically extending a curve within or beyond the range of known data points, assuming the continuity of its pattern. In this way, one can fill in gaps in past observations and even probe the depths of the future." (Emile Cheysson, "La Statistique géométrique", 1888)

"This method is what I call Geometric Statistics. But despite its somewhat forbidding name-which I’ll explain in a moment - it is not a mathematical abstraction or a mere intellectual curiosity accessible only to a select few. It is intended, if not for all merchants and industrialists, then at least for that elite who lead the masses behind them. Practice is both its starting point and its destination. It was inspired in me more than fifteen years ago by the demands of the profession, and if I’ve decided to present it today, it’s because I’ve since verified its advantages through various applications, both in private industry and in public service." (Emile Cheysson, "La Statistique géométrique", 1888)

"Graphical statistics thus possess a variety of resources that it deploys depending on the case, in order to find the most expressive and visually appealing way to depict the phenomenon. One must especially avoid trying to convey too much at once and becoming obscure by striving for completeness. Its main virtue - or one might say, its true reason for being - is clarity. If a diagram becomes so cluttered that it loses its clarity, then it is better to use the numerical table it was meant to translate." (Emile Cheysson, "Albume de statistique graphique", 1889)

"This method not only has the advantage of appealing to the senses as well as to the intellect, and of illustrating facts and laws to the eye that would be difficult to uncover in long numerical tables. It also has the privilege of escaping the obstacles that hinder the easy dissemination of scientific work - obstacles arising from the diversity of languages and systems of weights and measures among different nations. These obstacles are unknown to drawing. A diagram is not German, English, or Italian; everyone immediately grasps its relationships of scale, area, or color. Graphical statistics are thus a kind of universal language, allowing scholars from all countries to freely exchange their ideas and research, to the great benefit of science itself." (Emile Cheysson, "Albume de statistique graphique", 1889)

"Today, there is hardly any field of human activity that does not make use of graphical statistics. Indeed, it perfectly meets a dual need of our time: the demand for information that is both rapid and precise. Graphical methods fulfill these two conditions wonderfully. They allow us not only to grasp an entire series of phenomena at a glance, but also to highlight relationships or anomalies, identify causes, and extract underlying laws. They advantageously replace long tables of numbers, so that - without compromising the precision of statistics - they broaden and popularize its benefits." (Emile Cheysson, "Albume de statistique graphique", 1889)

"When a law is contained in figures, it is buried like metal in an ore; it is necessary to extract it. This is the work of graphical representation. It points out the coincidences, the relationships between phenomena, their anomalies, and we have seen what a powerful means of control it puts in the hands of the statistician to verify new data, discover and correct errors with which they have been stained." (Emile Cheysson, "Les methods de la statistique", 1890)

Sources: Bibliothèque nationale de France

25 August 2025

On Significance (2000-2009)

"When significance tests are used and a null hypothesis is not rejected, a major problem often arises - namely, the result may be interpreted, without a logical basis, as providing evidence for the null hypothesis." (David F Parkhurst, "Statistical Significance Tests: Equivalence and Reverse Tests Should Reduce Misinterpretation", BioScience Vol. 51 (12), 2001)

"If you flip a coin three times and it lands on heads each time, it's probably chance. If you flip it a hundred times and it lands on heads each time, you can be pretty sure the coin has heads on both sides. That's the concept behind statistical significance - it's the odds that the correlation (or other finding) is real, that it isn't just random chance." (T Colin Campbell, "The China Study", 2004)

"Many statistics texts do not mention this and students often ask, ‘What if you get a probability of exactly 0.05?’ Here the result would be considered not significant, since significance has been defined as a probability of less than 0.05 (<0.05). Some texts define a significant result as one where the probability is less than or equal to 0.05 ( 0.05). In practice this will make very little difference, but since Fisher proposed the ‘less than 0.05’ definition, which is also used by most scientific publications, it will be used here." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"The dual meaning of the word significant brings into focus the distinction between drawing a mathematical inference and practical inference from statistical results." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"A common statistical error is to summarize comparisons by statistical significance and then draw a sharp distinction between significant and nonsignificant results. The approach of summarizing by statistical significance has a number of pitfalls, most of which are covered in standard statistics courses but one that we believe is less well known. We refer to the fact that changes in statistical significance are not themselves significant. A small change in a group mean, a regression coefficient, or any other statistical quantity can be neither statistically significant nor practically important, but such a change can lead to a large change in the significance level of that quantity relative to a null hypothesis." (Andrew Gelman & Hal Stern, "The Difference between 'Significant' and 'Not Significant' Is Not Itself Statistically Significant", The American Statistician Vol. 60 (4), 2006

"A type of error used in hypothesis testing that arises when incorrectly rejecting the null hypothesis, although it is actually true. Thus, based on the test statistic, the final conclusion rejects the Null hypothesis, but in truth it should be accepted. Type I error equates to the alpha (α) or significance level, whereby the generally accepted default is 5%." (Lynne Hambleton, "Treasure Chest of Six Sigma Growth Methods, Tools, and Best Practices", 2007)

"For the study of the topology of the interactions of a complex system it is of central importance to have proper random null models of networks, i.e., models of how a graph arises from a random process. Such models are needed for comparison with real world data. When analyzing the structure of real world networks, the null hypothesis shall always be that the link structure is due to chance alone. This null hypothesis may only be rejected if the link structure found differs significantly from an expectation value obtained from a random model. Any deviation from the random null model must be explained by non-random processes." (Jörg Reichardt, "Structure in Complex Networks", 2009)

On Significance (1950-1974)

"In the examples we have given [...] our judgment whether P was small enough to justify us in suspecting a significant difference [...] has been more or less intuitive. Most people would agree [...] that a probability of .0001 is so small that the evidence is very much in favour. . . . Suppose we had obtained P = 0.1. [...] Where, if anywhere, can we draw the line? The odds against the observed event which influence a decision one way or the other depend to some extent on the caution of the investigator. Some people (not necessarily statisticians) would regard odds of ten to one as sufficient. Others would be more conservative and reserve judgment until the odds were much greater. It is a matter of personal taste." (G U Yule & M G Kendall, "An introduction to the theoryof statistics" 14th ed., 1950)

"It will, of course, happen but rarely that the proportions will be identical, even if no real association exists. Evidently, therefore, we need a significance test to reassure ourselves that the observed difference of proportion is greater than could reasonably be attributed to chance. The significance test will test the reality of the association, without telling us anything about the intensity of association. It will be apparent that we need two distinct things: (a) a test of significance, to be used on the data first of all, and (b) some measure of the intensity of the association, which we shall only be justified in using if the significance test confirms that the association is real." (Michael J Moroney, "Facts from Figures", 1951)

"The main purpose of a significance test is to inhibit the natural enthusiasm of the investigator." (Frederick Mosteller, "Selected Quantitative Techniques", 1954)

"Null hypotheses of no difference are usually known to be false before the data are collected [...] when they are, their rejection or acceptance simply reflects the size of the sample and the power of the test, and is not a contribution to science." (I Richard Savage, "Nonparametric Statistics", Journal of the American Statistical Association 52, 1957)

"[...] to make measurements and then ignore their magnitude would ordinarily be pointless. Exclusive reliance on tests of significance obscures the fact that statistical significance does not imply substantive significance." (I Richard Savage, "Nonparametric Statistics", Journal of the American Statistical Association 52, 1957)

"[...] the tests of null hypotheses of zero differences, of no relationships, are frequently weak, perhaps trivial statements of the researcher’s aims [...] in many cases, instead of the tests of significance it would be more to the point to measure the magnitudes of the relationships, attaching proper statements of their sampling variation. The magnitudes of relationships cannot be measured in terms of levels of significance." (Leslie Kish, "Some statistical problems in research design", American Sociological Review 24, 1959)

"There are instances of research results presented in terms of probability values of ‘statistical significance’ alone, without noting the magnitude and importance of the relationships found. These attempts to use the probability levels of significance tests as measures of the strengths of relationships are very common and very mistaken." (Leslie Kish, "Some statistical problems in research design", American Sociological Review 24, 1959)

"The null-hypothesis significance test treats ‘acceptance’ or ‘rejection’ of a hypothesis as though these were decisions one makes. But a hypothesis is not something, like a piece of pie offered for dessert, which can be accepted or rejected by a voluntary physical action. Acceptance or rejection of a hypothesis is a cognitive process, a degree of believing or disbelieving which, if rational, is not a matter of choice but determined solely by how likely it is, given the evidence, that the hypothesis is true." (William W Rozeboom, "The fallacy of the null–hypothesis significance test", Psychological Bulletin 57, 1960)

"The null hypothesis of no difference has been judged to be no longer a sound or fruitful basis for statistical investigation. […] Significance tests do not provide the information that scientists need, and, furthermore, they are not the most effective method for analyzing and summarizing data." (Cherry A Clark, "Hypothesis Testing in Relation to Statistical Methodology", Review of Educational Research Vol. 33, 1963)

"[...] the test of significance has been carrying too much of the burden of scientific inference. It may well be the case that wise and ingenious investigators can find their way to reasonable conclusions from data because and in spite of their procedures. Too often, however, even wise and ingenious investigators [...] tend to credit the test of significance with properties it does not have." (David Bakan, "The test of significance in psychological research", Psychological Bulletin 66, 1966)

"[...] we need to get on with the business of generating [...] hypotheses and proceed to do investigations and make inferences which bear on them, instead of [...] testing the statistical null hypothesis in any number of contexts in which we have every reason to suppose that it is false in the first place." (David Bakan, "The test of significance in psychological research", Psychological Bulletin 66, 1966)

"Significance levels are usually computed and reported, but power and confidence limits are not. Perhaps they should be." (Amos Tversky & Daniel Kahneman, "Belief in the law of small numbers", Psychological Bulletin 76(2), 1971)

"The emphasis on significance levels tends to obscure a fundamental distinction between the size of an effect and its statistical significance." (Amos Tversky & Daniel Kahneman, "Belief in the law of small numbers", Psychological Bulletin 76(2), 1971)

"[...] too many users of the analysis of variance seem to regard the reaching of a mediocre level of significance as more important than any descriptive specification of the underlying averages. Our thesis is that people have strong intuitions about random sampling; that these intuitions are wrong in fundamental respects; that these intuitions are shared by naive subjects and by trained scientists; and that they are applied with unfortunate consequences in the course of scientific inquiry. We submit that people view a sample randomly drawn from a population as highly representative, that is, similar to the population in all essential characteristics. Consequently, they expect any two samples drawn from a particular population to be more similar to one another and to the population than sampling theory predicts, at least for small samples." (Amos Tversky & Daniel Kahneman, "Belief in the law of small numbers", Psychological Bulletin 76(2), 1971)

24 August 2025

On Numbers: .05 level

"The results of t shows that P is between .02 and .05. The result must be judged significant, though barely so [...] we find... t=1.844 [with 13 df, P = 0.088]. The difference between the regression coefficients, though relatively large, cannot be regarded as significant." (Ronald A Fisher, "Statistical Methods for Research Workers", 1925) 

"The value for which P=0.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation ought to be considered significant or not. [...] If P is between .1 and .9 there is certainly no reason to suspect the hypothesis tested. If it is below .02 it is strongly indicated that the hypothesis fails to account for the whole of the facts. Belief in the hypothesis as an accurate representation of the population sampled is confronted by the logical disjunction: Either the hypothesis is untrue, or the value of χ2 has attained by chance an exceptionally high value. The actual value of P obtainable from the table by interpolation indicates the strength of the evidence against the hypothesis. A value of χ2 exceeding the 5 per cent. point is seldom to be disregarded." (Ronald A Fisher, "Statistical Methods for Research Workers", 1925) 

"The attempts that have been made to explain the cogency of tests of significance in scientific research, by reference to hypothetical frequencies of possible statements, based on them, being right or wrong, thus seem to miss the essential nature of such tests. A man who 'rejects' a hypothesis provisionally, as a matter of habitual practice, when the significance is at the 1% level or higher, will certainly be mistaken in not more than 1% of such decisions. For when the hypothesis is correct he will be mistaken in just 1% of these cases, and when it is incorrect he will never be mistaken in rejection. This inequality statement can therefore be made. However, the calculation is absurdly academic, for in fact no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas. Further, the calculation is based solely on a hypothesis, which, in the light of the evidence, is often not believed to be true at all, so that the actual probability of erroneous decision, supposing such a phrase to have any meaning, may be much less than the frequency specifying the level of significance." (Ronald A Fisher, "Statistical Methods and Scientific Inference", 1956)

"[...] blind adherence to the .05 level denies any consideration of alternative strategies, and it is a serious impediment to the interpretation of data." (James K Skipper Jr. et al, "The sacredness of .05: A note concerning the uses of statistical levels of significance in social science", The American Sociologist 2, 1967)

"The current obsession with .05 [...] has the consequence of differentiating significant research findings and those best forgotten, published studies from unpublished ones, and renewal of grants from termination. It would not be difficult to document the joy experienced by a social scientist when his F ratio or t value yields significance at .05, nor his horror when the table reads 'only' .10 or .06. One comes to internalize the difference between .05 and .06 as 'right' vs. 'wrong', 'creditable' vs. 'embarrassing', 'success' vs. 'failure'." (James K Skipper Jr. et al, "The sacredness of .05: A note concerning the uses of statistical levels of significance in social science", The American Sociologist 2, 1967)

"Rejection of a true null hypothesis at the 0.05 level will occur only one in 20 times. The overwhelming majority of these false rejections will be based on test statistics close to the borderline value. If the null hypothesis is false, the inter-ocular traumatic test ['hit between the eyes'] will often suffice to reject it; calculation will serve only to verify clear intuition." (Ward Edwards et al,Bayesian Statistical Inference for Psychological Research", 1992)

"[...] they [confidence limits] are rarely to be found in the literature. I suspect that the main reason they are not reported is that they are so embarrassingly large!" (Jacob Cohen,The earth is round" (p<.05)", American Psychologist 49, 1994)

"After four decades of severe criticism, the ritual of null hypothesis significance testing—mechanical dichotomous decisions around a sacred .05 criterion - still persist. This article reviews the problems with this practice [...] 'What’s wrong with [null hypothesis significance testing]? Well, among many other things, it does not tell us what we want to know, and we so much want to know what we want to know that, out of desperation, we nevertheless believe that it does!" (Jacob Cohen,The earth is round" (p<.05)", American Psychologist 49, 1994)

"It’s a commonplace among statisticians that a chi-squared test (and, really, any p-value) can be viewed as a crude measure of sample size: When sample size is small, it’s very difficult to get a rejection" (that is, a p-value below 0.05), whereas when sample size is huge, just about anything will bag you a rejection. With large n, a smaller signal can be found amid the noise. In general: small n, unlikely to get small p-values. Large n, likely to find something. Huge n, almost certain to find lots of small p-values." (Andrew Gelman,The sample size is huge, so a p-value of 0.007 is not that impressive", 2009)

"There is a growing realization that reported 'statistically significant' claims in statistical publications are routinely mistaken. Researchers typically express the confidence in their data in terms of p-value: the probability that a perceived result is actually the result of random variation. The value of p" (for 'probability') is a way of measuring the extent to which a data set provides evidence against a so-called null hypothesis. By convention, a p- value below 0.05 is considered a meaningful refutation of the null hypothesis; however, such conclusions are less solid than they appear." (Andrew Gelman & Eric Loken,The Statistical Crisis in Science", American Scientist Vol. 102(6), 2014)

15 August 2025

On Estimates (2010-2019)

"A second approach to statistical inference is estimation, which focuses on finding the best point estimate of the population parameter that’s of greatest interest; it also gives an interval estimate of that parameter, to signal how close our point estimate is likely to be to the population value." (Geoff Cumming, "Understanding the New Statistics", 2012)

"Meta-analysis is a set of techniques for the quantitative analysis of results from two or more studies on the same or similar issues. […] Meta-analytic thinking is estimation thinking that considers any result in the context of past and potential future results on the same question. It focuses on the cumulation of evidence over studies." (Geoff Cumming, "Understanding the New Statistics", 2012)

"Meta-analytic thinking is the consideration of any result in relation to previous results on the same or similar questions, and awareness that combination with future results is likely to be valuable. Meta-analytic thinking is the application of estimation thinking to more than a single study. It prompts us to seek meta-analysis of previous related studies at the planning stage of research, then to report our results in a way that makes it easy to include them in future meta-analyses. Meta-analytic thinking is a type of estimation thinking, because it, too, focuses on estimates and uncertainty." (Geoff Cumming, "Understanding the New Statistics", 2012)

"A good estimator has to be more than just consistent. It also should be one whose variance is less than that of any other estimator. This property is called minimum variance. This means that if we run the experiment several times, the 'answers' we get will be closer to one another than 'answers' based on some other estimator." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"An estimate (the mathematical definition) is a number derived from observed values that is as close as we can get to the true parameter value. Useful estimators are those that are 'better' in some sense than any others." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Estimators are functions of the observed values that can be used to estimate specific parameters. Good estimators are those that are consistent and have minimum variance. These properties are guaranteed if the estimator maximizes the likelihood of the observations." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"GIGO is a famous saying coined by early computer scientists: garbage in, garbage out. At the time, people would blindly put their trust into anything a computer output indicated because the output had the illusion of precision and certainty. If a statistic is composed of a series of poorly defined measures, guesses, misunderstandings, oversimplifications, mismeasurements, or flawed estimates, the resulting conclusion will be flawed." (Daniel J Levitin, "Weaponized Lies", 2017)

"One final warning about the use of statistical models (whether linear or otherwise): The estimated model describes the structure of the data that have been observed. It is unwise to extend this model very far beyond the observed data." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"One kind of probability - classic probability - is based on the idea of symmetry and equal likelihood […] In the classic case, we know the parameters of the system and thus can calculate the probabilities for the events each system will generate. […] A second kind of probability arises because in daily life we often want to know something about the likelihood of other events occurring […]. In this second case, we need to estimate the parameters of the system because we don’t know what those parameters are. […] A third kind of probability differs from these first two because it’s not obtained from an experiment or a replicable event - rather, it expresses an opinion or degree of belief about how likely a particular event is to occur. This is called subjective probability […]." (Daniel J Levitin, "Weaponized Lies", 2017)

"Samples give us estimates of something, and they will almost always deviate from the true number by some amount, large or small, and that is the margin of error. […] The margin of error does not address underlying flaws in the research, only the degree of error in the sampling procedure. But ignoring those deeper possible flaws for the moment, there is another measurement or statistic that accompanies any rigorously defined sample: the confidence interval." (Daniel J Levitin, "Weaponized Lies", 2017)

"The margin of error is how accurate the results are, and the confidence interval is how confident you are that your estimate falls within the margin of error." (Daniel J Levitin, "Weaponized Lies", 2017)

On Estimates (2000-2009)

"[…] we underestimate the share of randomness in about everything […] The degree of resistance to randomness in one’s life is an abstract idea, part of its logic counterintuitive, and, to confuse matters, its realizations nonobservable." (Nassim N Taleb, "Fooled by Randomness", 2001)

"Most long-range forecasts of what is technically feasible in future time periods dramatically underestimate the power of future developments because they are based on what I call the 'intuitive linear' view of history rather than the 'historical exponential' view." (Ray Kurzweil, "The Singularity is Near", 2005)

"[myth:] Accuracy is more important than precision. For single best estimates, be it a mean value or a single data value, this question does not arise because in that case there is no difference between accuracy and precision. (Think of a single shot aimed at a target.) Generally, it is good practice to balance precision and accuracy. The actual requirements will differ from case to case." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"As uncertainties of scientific data values are nearly as important as the data values themselves, it is usually not acceptable that a best estimate is only accompanied by an estimated uncertainty. Therefore, only the size of nondominant uncertainties should be estimated. For estimating the size of a nondominant uncertainty we need to find its upper limit, i.e., we want to be as sure as possible that the uncertainty does not exceed a certain value." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Before best estimates are extracted from data sets by way of a regression analysis, the uncertainties of the individual data values must be determined.In this case care must be taken to recognize which uncertainty components are common to all the values, i.e., those that are correlated (systematic)." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"[myth:] Counting can be done without error. Usually, the counted number is an integer and therefore without (rounding) error. However, the best estimate of a scientifically relevant value obtained by counting will always have an error. These errors can be very small in cases of consecutive counting, in particular of regular events, e.g., when measuring frequencies." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Due to the theory that underlies uncertainties an infinite number of data values would be necessary to determine the true value of any quantity. In reality the number of available data values will be relatively small and thus this requirement can never be fully met; all one can get is the best estimate of the true value." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"It is the aim of all data analysis that a result is given in form of the best estimate of the true value. Only in simple cases is it possible to use the data value itself as result and thus as best estimate." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"It is the nature of an uncertainty that it is not known and can never be known, whether the best estimate is greater or less than the true value." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"The methodology of feedback design is borrowed from cybernetics (control theory). It is based upon methods of controlled system model’s building, methods of system states and parameters estimation (identification), and methods of feedback synthesis. The models of controlled system used in cybernetics differ from conventional models of physics and mechanics in that they have explicitly specified inputs and outputs. Unlike conventional physics results, often formulated as conservation laws, the results of cybernetical physics are formulated in the form of transformation laws, establishing the possibilities and limits of changing properties of a physical system by means of control." (Alexander L Fradkov, "Cybernetical Physics: From Control of Chaos to Quantum Control", 2007)

On Estimates (1975-1999)

"Pencil and paper for construction of distributions, scatter diagrams, and run-charts to compare small groups and to detect trends are more efficient methods of estimation than statistical inference that depends on variances and standard errors, as the simple techniques preserve the information in the original data." (W Edwards Deming, "On Probability as Basis for Action", American Statistician, Volume 29, Number 4, November 1975)

 "Overemphasis on tests of significance at the expense especially of interval estimation has long been condemned." (David R Cox, "The role of significance tests", Scandanavian Journal of Statistics 4, 1977

"The central point is that statistical significance is quite different from scientific significance and that therefore estimation [...] of the magnitude of effects is in general essential regardless of whether statistically significant departure from the null hypothesis is achieved." (David R Cox, "The role of significance tests", Scandanavian Journal of Statistics 4, 1977)

"There are considerable dangers in overemphasizing the role of significance tests in the interpretation of data." (David R Cox, "The role of significance tests", Scandanavian Journal of Statistics 4, 1977)

"In physics it is usual to give alternative theoretical treatments of the same phenomenon. We construct different models for different purposes, with different equations to describe them. Which is the right model, which the 'true' set of equations? The question is a mistake. One model brings out some aspects of the phenomenon; a different model brings out others. Some equations give a rougher estimate for a quantity of interest, but are easier to solve. No single model serves all purposes best." (Nancy Cartwright, "How the Laws of Physics Lie", 1983)

"Probability is the mathematics of uncertainty. Not only do we constantly face situations in which there is neither adequate data nor an adequate theory, but many modem theories have uncertainty built into their foundations. Thus learning to think in terms of probability is essential. Statistics is the reverse of probability (glibly speaking). In probability you go from the model of the situation to what you expect to see; in statistics you have the observations and you wish to estimate features of the underlying model." (Richard W Hamming, "Methods of Mathematics Applied to Calculus, Probability, and Statistics", 1985)

"It has been widely felt, probably for thirty years and more, that significance tests are overemphasized and often misused and that more emphasis should be put on estimation and prediction. While such a shift of emphasis does seem to be occurring, for example in medical statistics, the continued very extensive use of significance tests is on the one hand alarming and on the other evidence that they are aimed, even if imperfectly, at some widely felt need." (David R Cox, "Some general aspects of the theory of statistics", International Statistical Review 54, 1986) 

"A mechanistic model has the following advantages: 1. It contributes to our scientific understanding of the phenomenon under study. 2. It usually provides a better basis for extrapolation (at least to conditions worthy of further experimental investigation if not through the entire range of all input variables). 3. It tends to be parsimonious (i. e, frugal) in the use of parameters and to provide better estimates of the response." (George E P Box, "Empirical Model-Building and Response Surfaces", 1987)

"A tendency to drastically underestimate the frequency of coincidences is a prime characteristic of innumerates, who generally accord great significance to correspondences of all sorts while attributing too little significance to quite conclusive but less flashy statistical evidence." (John A Paulos, "Innumeracy: Mathematical Illiteracy and its Consequences", 1988)

"The zeta function is probably the most challenging and mysterious object of modern mathematics, in spite of its utter simplicity. [...] The main interest comes from trying to improve the Prime Number Theorem, i.e., getting better estimates for the distribution of the prime numbers. The secret to the success is assumed to lie in proving a conjecture which Riemann stated in 1859 without much fare, and whose proof has since then become the single most desirable achievement for a mathematician." (Martin C Gutzwiller, "Chaos in Classical and Quantum Mechanics", 1990)

"When the distributions of two or more groups of univariate data are skewed, it is common to have the spread increase monotonically with location. This behavior is monotone spread. Strictly speaking, monotone spread includes the case where the spread decreases monotonically with location, but such a decrease is much less common for raw data. Monotone spread, as with skewness, adds to the difficulty of data analysis. For example, it means that we cannot fit just location estimates to produce homogeneous residuals; we must fit spread estimates as well. Furthermore, the distributions cannot be compared by a number of standard methods of probabilistic inference that are based on an assumption of equal spreads; the standard t-test is one example. Fortunately, remedies for skewness can cure monotone spread as well." (William S Cleveland, "Visualizing Data", 1993))

"A model for simulating dynamic system behavior requires formal policy descriptions to specify how individual decisions are to be made. Flows of information are continuously converted into decisions and actions. No plea about the inadequacy of our understanding of the decision-making processes can excuse us from estimating decision-making criteria. To omit a decision point is to deny its presence - a mistake of far greater magnitude than any errors in our best estimate of the process." (Jay W Forrester, "Policies, decisions and information sources for modeling", 1994)

"In constructing a model, we always attempt to maximize its usefulness. This aim is closely connected with the relationship among three key characteristics of every systems model: complexity, credibility, and uncertainty. This relationship is not as yet fully understood. We only know that uncertainty (predictive, prescriptive, etc.) has a pivotal role in any efforts to maximize the usefulness of systems models. Although usually (but not always) undesirable when considered alone, uncertainty becomes very valuable when considered in connection to the other characteristics of systems models: in general, allowing more uncertainty tends to reduce complexity and increase credibility of the resulting model. Our challenge in systems modelling is to develop methods by which an optimal level of allowable uncertainty can be estimated for each modelling problem." (George J Klir & Bo Yuan, "Fuzzy Sets and Fuzzy Logic: Theory and Applications", 1995)

"Delay time, the time between causes and their impacts, can highly influence systems. Yet the concept of delayed effect is often missed in our impatient society, and when it is recognized, it’s almost always underestimated. Such oversight and devaluation can lead to poor decision making as well as poor problem solving, for decisions often have consequences that don’t show up until years later. Fortunately, mind mapping, fishbone diagrams, and creativity/brainstorming tools can be quite useful here." (Stephen G Haines, "The Manager's Pocket Guide to Strategic and Business Planning", 1998)

"Accurate estimates depend at least as much upon the mental model used in forming the picture as upon the number of pieces of the puzzle that have been collected." (Richards J Heuer Jr, "Psychology of Intelligence Analysis", 1999)

On Estimates (1950-1974)

"A good estimator will be unbiased and will converge more and more closely (in the long run) on the true value as the sample size increases. Such estimators are known as consistent. But consistency is not all we can ask of an estimator. In estimating the central tendency of a distribution, we are not confined to using the arithmetic mean; we might just as well use the median. Given a choice of possible estimators, all consistent in the sense just defined, we can see whether there is anything which recommends the choice of one rather than another. The thing which at once suggests itself is the sampling variance of the different estimators, since an estimator with a small sampling variance will be less likely to differ from the true value by a large amount than an estimator whose sampling variance is large." (Michael J Moroney, "Facts from Figures", 1951)

"The enthusiastic use of statistics to prove one side of a case is not open to criticism providing the work is honestly and accurately done, and providing the conclusions are not broader than indicated by the data. This type of work must not be confused with the unfair and dishonest use of both accurate and inaccurate data, which too commonly occurs in business. Dishonest statistical work usually takes the form of: (1) deliberate misinterpretation of data; (2) intentional making of overestimates or underestimates; and (3) biasing results by using partial data, making biased surveys, or using wrong statistical methods." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1951)

"Statistics is the fundamental and most important part of inductive logic. It is both an art and a science, and it deals with the collection, the tabulation, the analysis and interpretation of quantitative and qualitative measurements. It is concerned with the classifying and determining of actual attributes as well as the making of estimates and the testing of various hypotheses by which probable, or expected, values are obtained. It is one of the means of carrying on scientific research in order to ascertain the laws of behavior of things - be they animate or inanimate. Statistics is the technique of the Scientific Method." (Bruce D Greenschields & Frank M Weida, "Statistics with Applications to Highway Traffic Analyses", 1952)

"We realize that if someone just 'grabs a handful', the individuals in the handful almost always resemble one another (on the average) more than do the members of a simple random sample. Even if the 'grabs' [sampling] are randomly spread around so that every individual has an equal chance of entering the sample, there are difficulties. Since the individuals of grab samples resemble one another more than do individuals of random samples, it follows (by a simple mathematical argument) that the means of grab samples resemble one another less than the means of random samples of the same size. From a grab sample, therefore, we tend to underestimate the variability in the population, although we should have to overestimate it in order to obtain valid estimates of variability of grab sample means by substituting such an estimate into the formula for the variability of means of simple random samples. Thus using simple random sample formulas for grab sample means introduces a double bias, both parts of which lead to an unwarranted appearance of higher stability." (Frederick Mosteller et al, "Principles of Sampling", Journal of the American Statistical Association Vol. 49 (265), 1954)

"We must know more about a plan than the probabilities of selection. We must know also the procedure by which  to draw the sampling units, and the formula or procedure by which to calculate the estimate." (William E Deming, "Sample Design in Business Research", 1960)

"The most commonly occurring weakness in the application of Fisherian methods is, I think, undue emphasis on tests of significance, and failure to recognize that in many types of experimental work, estimates of the treatment effects, together with estimates of the error to which they are subject, are the quantities of primary interest." (Frances Yates, "Sir Ronald Fisher and the Design of Experiments", Biometrics Vol. 20, 1964) 

"The usefulness of the models in constructing a testable theory of the process is severely limited by the quickly increasing number of parameters which must be estimated in order to compare the predictions of the models with empirical results" (Anatol Rapoport, "Prisoner's Dilemma: A study in conflict and cooperation", 1965)

"Never make a calculation until you know the answer: Make an estimate before every calculation, try a simple physical argument (symmetry! invariance! conservation!) before every derivation, guess the answer to every puzzle. Courage: no one else needs to know what the guess is. Therefore make it quickly, by instinct. A right guess reinforces this instinct. A wrong guess brings the refreshment of surprise. In either case, life as a spacetime expert, however long, is more fun!" (Edwin F Taylor & John A Wheeler, "Spacetime Physics", 1966) 

On Estimates (-1949)

"If Nicolaus Copernicus, the distinguished and incomparable master, in this work had not been deprived of exquisite and faultless instruments, he would have left us this science far more well-established. For he, if anybody, was outstanding and had the most perfect understanding of the geometrical and arithmetical requisites for building up this discipline. Nor was he in any respect inferior to Ptolemy; on the contrary, he surpassed him greatly in certain fields, particularly as far as the device of fi t ness and compendious harmony in hypotheses is concerned. And his apparently absurd opinion that the Earth revolves does not obstruct this estimate, because a circular motion designed to go on uniformly about another point than the very center of the circle, as actually found in the Ptolemaic hypotheses of all the planets except that of the Sun, offends against the very basic principles of our discipline in a far more absurd and intolerable way than does the attributing to the Earth one motion or another which, being a natural motion, turns out to be imperceptible. There does not at all arise from this assumption so many unsuitable consequences as most people think." (Tycho Brahe, [letter to Christopher Rothman] 1587)

"The Author of nature has not given laws to the universe, which, like the institutions of men, carry in themselves the elements of their own destruction. He has not per mitted, in his works, any symptom of infancy or of old age, or any sign by which we may estimate either their future or their past duration. He may put an end, as he no doubt gave a beginning, to the present system, at some determinate period; but we may safely conclude, that this great catastrophe will not be brought about by any of the laws now existing, and that it is not indicated by anything which we perceive." (John Playfair, "Illustrations of the Huttonian Theory of the Earth", 1802)

"The scientific value of a theory of this kind, in which we make so many assumptions, and introduce so many adjustable constants, cannot be estimated merely by its numerical agreement with certain sets of experiments. If it has any value it is because it enables us to form a mental image of what takes place in a piece of iron during magnetization." (James C Maxwell, "Treatise on Electricity and Magnetism" Vol. II, 1873)

"It [probability] is the very guide of life, and hardly can we take a step or make a decision of any kind without correctly or incorrectly making an estimation of probabilities." (William S Jevons, "The Principles of Science: A Treatise on Logic and Scientific Method", 1874)

"A statistical estimate may be good or bad, accurate or the reverse; but in almost all cases it is likely to be more accurate than a casual observer’s impression, and the nature of things can only be disproved by statistical methods." (Arthur L Bowley, "Elements of Statistics", 1901)

"Great numbers are not counted correctly to a unit, they are estimated; and we might perhaps point to this as a division between arithmetic and statistics, that whereas arithmetic attains exactness, statistics deals with estimates, sometimes very accurate, and very often sufficiently so for their purpose, but never mathematically exact." (Arthur L Bowley, "Elements of Statistics", 1901)

"Some of the common ways of producing a false statistical argument are to quote figures without their context, omitting the cautions as to their incompleteness, or to apply them to a group of phenomena quite different to that to which they in reality relate; to take these estimates referring to only part of a group as complete; to enumerate the events favorable to an argument, omitting the other side; and to argue hastily from effect to cause, this last error being the one most often fathered on to statistics. For all these elementary mistakes in logic, statistics is held responsible." (Sir Arthur L Bowley, "Elements of Statistics", 1901)

"[...] no one knows better than the engineer the need of discrimination between the sure ground of known data and formal logic, on the one hand - as exemplified, say, by mathematical operations - and acts of judgment on the other; and no one has learned through wider experience than the engineer the need of applying his conclusions in the light of that component part which, of necessity, has been dependent on estimate and judgment." (William F Durand, Transactions of The American Society of Mechanical Engineers Vol.47, [address] 1925)

On Sampling (2000-2019)

"Statisticians can calculate the probability that such random samples represent the population; this is usually expressed in terms of sampling error [...]. The real problem is that few samples are random. Even when researchers know the nature of the population, it can be time-consuming and expensive to draw a random sample; all too often, it is impossible to draw a true random sample because the population cannot be defined. This is particularly true for studies of social problems. [...] The best samples are those that come as close as possible to being random." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"There are two problems with sampling - one obvious, and  the other more subtle. The obvious problem is sample size. Samples tend to be much smaller than their populations. [...] Obviously, it is possible to question results based on small samples. The smaller the sample, the less confidence we have that the sample accurately reflects the population. However, large samples aren't necessarily good samples. This leads to the second issue: the representativeness of a sample is actually far more important than sample size. A good sample accurately reflects (or 'represents') the population." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"First, if you already know that the population from which your sample has been taken is normally distributed (perhaps you have data for a variable that has been studied before), you can assume the distribution of sample means from this population will also be normally distributed. Second, the central limit theorem […] states that the distribution of the means of samples of about 25 or more taken from any population will be approximately normal, provided the population is not grossly non-normal (e.g. a population that is bimodal). Therefore, provided your sample size is sufficiently large you can usually do a parametric test. Finally, you can examine your sample. Although there are statistical tests for normality, many statisticians have cautioned that these tests often indicate the sample is significantly non normal even when a t-test will still give reliable results." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Unfortunately, the only way to estimate the appropriate minimum sample size needed in an experiment is to know, or have good estimates of, the effect size and standard deviation of the population(s). Often the only way to estimate these is to do a pilot experiment with a sample. For most tests there are formulae that use these (sample) statistics to give the appropriate sized sample for a desired power." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Traditional statistics is strong in devising ways of describing data and inferring distributional parameters from sample. Causal inference requires two additional ingredients: a science-friendly language for articulating causal knowledge, and a mathematical machinery for processing that knowledge, combining it with data and drawing new causal conclusions about a phenomenon." (Judea Pearl, "Causal inference in statistics: An overview", Statistics Surveys 3, 2009)

"Why are you testing your data for normality? For large sample sizes the normality tests often give a meaningful answer to a meaningless question (for small samples they give a meaningless answer to a meaningful question)." (Greg Snow, "R-Help", 2014)

"The closer that sample-selection procedures approach the gold standard of random selection - for which the definition is that every individual in the population has an equal chance of appearing in the sample - the more we should trust them. If we don’t know whether a sample is random, any statistical measure we conduct may be biased in some unknown way." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"A popular misconception holds that the era of Big Data means the end of a need for sampling. In fact, the proliferation of data of varying quality and relevance reinforces the need for sampling as a tool to work efficiently with a variety of data, and minimize bias. Even in a Big Data project, predictive models are typically developed and piloted with samples." (Peter C Bruce & Andrew G Bruce, "Statistics for Data Scientists: 50 Essential Concepts", 2016)


Steve McKillup - Collected Quotes

"A correlation between two variables means they vary together. A positive correlation means that high values of one variable are associated with high values of the other, while a negative correlation means that high values of one variable are associated with low values of the other." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Accuracy is the closeness of a measured value to the true value. Precision is the ‘spread’ or variability of repeated measures of the same value." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Correlation is an exploratory technique used to examine whether the values of two variables are significantly related, meaning whether the values of both variables change together in a consistent way. (For example, an increase in one may be accompanied by a decrease in the other.) There is no expectation that the value of one variable can be predicted from the other, or that there is any causal relationship between them." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Designing a well-controlled, appropriately replicated and realistic experiment has been described by some researchers as an ‘art’. It is not, but there are often several different ways to test the same hypothesis, and hence several different experiments that could be done. Consequently, it is difficult to set a guide to designing experiments beyond an awareness of the general principles." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Even an apparently well-designed mensurative or manipulative experiment may still suffer from a lack of realism." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"First, if you already know that the population from which your sample has been taken is normally distributed (perhaps you have data for a variable that has been studied before), you can assume the distribution of sample means from this population will also be normally distributed. Second, the central limit theorem […] states that the distribution of the means of samples of about 25 or more taken from any population will be approximately normal, provided the population is not grossly non-normal (e.g. a population that is bimodal). Therefore, provided your sample size is sufficiently large you can usually do a parametric test. Finally, you can examine your sample. Although there are statistical tests for normality, many statisticians have cautioned that these tests often indicate the sample is significantly non normal even when a t-test will still give reliable results." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Graphs may reveal patterns in data sets that are not obvious from looking at lists or calculating descriptive statistics. Graphs can also provide an easily understood visual summary of a set of results." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Inaccurate and imprecise measurements or a poor or unrealistic sampling design can result in the generation of inappropriate hypotheses. Measurement errors or a poor experimental design can give a false or misleading outcome that may result in the incorrect retention or rejection of an hypothesis." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"It has often been said, ‘There is no such thing as a perfect experiment.’ One inherent problem is that, as a design gets better and better, the cost in time and equipment also increases, but the ability to actually do the experiment decreases. An absolutely perfect design may be impossible to carry out. Therefore, every researcher must choose a design that is ‘good enough’ but still practical. There are no rules for this – the decision on design is in the hands of the researcher, and will be eventually judged by their colleagues who examine any report from the work." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"It is important to realise that Type 1 error can only occur when the null hypothesis applies. There is absolutely no risk if the null hypothesis is false. Unfortunately, you are most unlikely to know if the null hypothesis applies or not - if you did know, you would not be doing an experiment to test it! If the null hypothesis applies, the risk of Type 1 error is the same as the probability level you have chosen" (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Linear correlation analysis assumes that the data are random representatives taken from the larger population of values for each variable, which are normally distributed and have been measured on a ratio, interval or ordinal scale. A scatter plot of these variables will have what is called a bivariate normal distribution. If the data are not normally distributed, or the relationship does not appear to be linear, they may be able to be analysed by nonparametric tests for correlation [...]" (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Many statistics texts do not mention this and students often ask, ‘What if you get a probability of exactly 0.05?’ Here the result would be considered not significant, since significance has been defined as a probability of less than 0.05 (<0.05). Some texts define a significant result as one where the probability is less than or equal to 0.05 ( 0.05). In practice this will make very little difference, but since Fisher proposed the ‘less than 0.05’ definition, which is also used by most scientific publications, it will be used here." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"No hypothesis or theory can ever be proven - one day there may be evidence that rejects it and leads to a different explanation (which can include all the successful predictions of the previous hypothesis).Consequently we can only falsify or disprove hypotheses and theories – we can never ever prove them." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"One of the nastiest pitfalls is appearing to have a replicated manipulative experimental design, which really is not replicated." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"One way of generating hypotheses is to collect data and look for patterns. Often, however, it is difficult to see any pattern from a set of data, which may just be a list of numbers. Graphs and descriptive statistics are very useful for summarising and displaying data in ways that may reveal patterns." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Parametric tests are designed for analyzing data from a known distribution, and the majority assume a normally distributed population. Although parametric tests are quite robust to departures from normality, and major ones can often be reduced by transformation, there are some cases where the population is so grossly non-normal that parametric testing is unwise. In these cases a powerful analysis can often still be done by using a non-parametric test." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Sample statistics like the mean, variance, standard deviation, and especially the standard error of the mean are estimates of population statistics that can be used to predict the range within which 95% of the means of a particular sample size will occur. Knowing this, you can use a parametric test to estimate the probability that a sample mean is the same as an expected value, or the probability that the means of two samples are from the same population." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Statistical tests are just a way of working out the probability of obtaining the observed, or an even more extreme, difference among samples (or between an observed and expected value) if a specific hypothesis (usually the null of no difference) is true. Once the probability is known, the experimenter can make a decision about the difference, using criteria that are uniformly used and understood." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"The essential features of the ‘hypothetico-deductive’ view of scientific method are that a person observes or samples the natural world and uses all the information available to make an intuitive, logical guess, called an hypothesis, about how the system functions. The person has no way of knowing if their hypothesis is correct - it may or may not apply. Predictions made from the hypothesis are tested, either by further sampling or by doing experiments. If the results are consistent with the predictions then the hypothesis is retained. If they are not, it is rejected, and a new hypothesis formulated." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"The unavoidable problem with using probability to help you make a decision is that there is always a chance of making a wrong decision and you have no way of telling when you have done this." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"The use of a t-test makes three assumptions. The first is that the data are normally distributed. The second is that each sample has been taken at random from its respective population and the third is that for an independent sample test, the variances are the same. It has, however, been shown that t-tests are actually very ‘robust’ – that is, they will still generate statistics that approximate the t distribution and give realistic probabilities even when the data show considerable departure from normality and when sample variances are dissimilar." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Unfortunately, the only way to estimate the appropriate minimum sample size needed in an experiment is to know, or have good estimates of, the effect size and standard deviation of the population(s). Often the only way to estimate these is to do a pilot experiment with a sample. For most tests there are formulae that use these (sample) statistics to give the appropriate sized sample for a desired power." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"When expected frequencies are small, the calculated chi-square statistic is inaccurate and tends to be too large, therefore indicating a lower than appropriate probability, which increases the risk of Type 1 error. It used to be recommended that no expected frequency in a goodness of fit test should be less than five, but this has been relaxed somewhat in the light of more recent research, and it is now recommended that no more than 20% of expected frequencies should be less than five." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Whenever you make a decision based on the probability of a result, there is a risk of either a Type 1 or a Type 2 error. There is only a risk of Type 1 error when the null hypothesis applies, and the risk is the chosen probability level. There is only a risk of Type 2 error when the null hypothesis is false." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

"Without compromising the risk of Type 1 error, the only way a researcher can reduce the risk of Type 2 error to an acceptable level and therefore ensure sufficient power is to increase their sample size. Every researcher has to ask themselves the question, ‘What sample size do I need to ensure the risk of Type 2 error is low and therefore power is high?’ This is an important question because samples are usually costly to take, so there is no point in increasing sample size past the point where power reaches an acceptable level." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)

09 August 2025

David R Cox - Collected Quotes

"Exact truth of a null hypothesis is very unlikely except in a genuine uniformity trial." (David R Cox, "Some problems connected with statistical inference", Annals of Mathematical Statistics 29, 1958) 

"Assumptions that we make, such as those concerning the form of the population sampled, are always untrue." (David R Cox, "Some problems connected with statistical inference", Annals of Mathematical Statistics 29, 1958) 

"Overemphasis on tests of significance at the expense especially of interval estimation has long been condemned." (David R Cox, "The role of significance tests", Scandanavian Journal of Statistics 4, 1977) 

"There are considerable dangers in overemphasizing the role of significance tests in the interpretation of data."  (David R Cox, "The role of significance tests", Scandanavian Journal of Statistics 4, 1977) 

"In any particular application, graphical or other informal analysis may show that consistency or inconsistency with H0 is so clear cut that explicit calculation of p is unnecessary." (David R Cox, "The role of significance tests", Scandanavian Journal of Statistics 4, 1977) 

"The central point is that statistical significance is quite different from scientific significance and that therefore estimation [...] of the magnitude of effects is in general essential regardless of whether statistically significant departure from the null hypothesis is achieved." (David R Cox, "The role of significance tests", Scandanavian Journal of Statistics 4, 1977) 

"At a simpler level, some elementary but important suggestions for the clarity of graphs are as follows: (i) the axes should be clearly labelled with the names of the variables and the units of measurement; (ii) scale breaks should be used for false origins; (iii) comparison of related diagrams should be made easy, for example by using identical scales of measurement and placing diagrams side by side; (iv) scales should be arranged so that systematic and approximately linear relations are plotted at roughly 45° to the x-axis; (v) legends should make diagrams as nearly self-explanatory, i.e. independent of the text, as is feasible; (vi) interpretation should not be prejudiced by the technique of presentation, for example by superimposing thick smooth curves on scatter diagrams of points faintly reproduced." (David R Cox,"Some Remarks on the Role in Statistics of Graphical Methods", Applied Statistics 27 (1), 1978)

"Most graphs used in the analysis of data consist of points arising in effect from distinct individuals, although there are certainly other possibilities, such as the use of lines dual to points. In many cases of exploratory analysis, however, the display of supplementary information attached to some or all of the points will be crucial for successful interpretation. The primary co-ordinate axes should, of course, be chosen to express the main dependence explicitly, if not initially certainly in the final presentation of conclusions." (David R Cox,"Some Remarks on the Role in Statistics of Graphical Methods", Applied Statistics 27 (1), 1978)

"So far as is feasible, diagrams should be planned so that (a) departures from "standard" conditions should be revealed as departures from linearity, or departures from totally random scatter, or as departures of contours from circular form; (b) different points should have approximately independent errors; (c) points should have approximately equal errors, preferably known and indicated, or, if equal errors cannot be achieved, major differences in the precision of individual points should be indicated, at least roughly; (d) individual points should have clearcut interpretation; (e) variables plotted should have clearcut physical interpretation; (f) any non-linear transformations applied should not accentuate uninteresting ranges; (g) any reasonable invariance should be exploited." (David R Cox,"Some Remarks on the Role in Statistics of Graphical Methods", Applied Statistics 27 (1), 1978)

"There are two general decisions to be made when displaying supplementary information, the first concerning the amount of such information and the second the precise technique to be used. The amount of supplementary information that it is sensible to show depends on the number of points. The possibility of showing such information only for relatively extreme points and possibly for a sample of the more central points should be considered when the number of points is large; thus in a probability plot of contrasts from a large factorial experiment it may be enough to label only the more extreme values." (David R Cox,"Some Remarks on the Role in Statistics of Graphical Methods", Applied Statistics 27 (1), 1978)

"It is very bad practice to summarise an important investigation solely by a value of P." (David R Cox, "Statistical significance tests", British Journal of Clinical Pharmacology 14, 1982) 

"The criterion for publication should be the achievement of reasonable precision and not whether a significant effect has been found." (David R Cox, "Statistical significance tests", British Journal of Clinical Pharmacology 14, 1982) 

"The continued very extensive use of significance tests is alarming." (David R Cox, "Some general aspects of the theory of statistics", International Statistical Review 54, 1986) 

"It has been widely felt, probably for thirty years and more, that significance tests are overemphasized and often misused and that more emphasis should be put on estimation and prediction. While such a shift of emphasis does seem to be occurring, for example in medical statistics, the continued very extensive use of significance tests is on the one hand alarming and on the other evidence that they are aimed, even if imperfectly, at some widely felt need." (David R Cox, "Some general aspects of the theory of statistics", International Statistical Review 54, 1986) 

"Most real life statistical problems have one or more nonstandard features. There are no routine statistical questions; only questionable statistical routines." (David R Cox)

28 July 2025

Statistical Tools IV: Urns

"The early experts in probability theory were forever talking about drawing colored  balls out of 'urns' . This was not because people are really interested in jars or boxes full of a mixed-up lot of colored balls, but because those urns full of balls could often be designed so that they served as useful and illuminating models of important real situations. In fact, the urns and balls are not themselves supposed real. They are fictitious and idealized urns and balls, so that the probability of drawing out any one ball is just the same as for any other." (Warren Weaver, "Lady Luck: The Theory of Probability". 1963) 

"The urn model is to be the expression of three postulates: (1) the constancy of a probability distribution, ensured by the solidity of the vessel, (2) the random-character of the choice, ensured by the narrowness of the mouth, which is to prevent visibility of the contents and any consciously selective choice, (3) the independence of successive choices, whenever the drawn balls are put back into the urn. Of course in abstract probability and statistics the word 'choice' can be avoided and all can be done without any reference to such a model. But as soon as the abstract theory is to be applied, random choice plays an essential role." (Hans Freudenthal, "The Concept and the Role of the Model in Mathematics and Natural and Social Sciences", 1961)

"Specifically, it seems to me preferable to use, systematically: 'random' for that which is the object of the theory of probability […]; I will therefore say random process, not stochastic process. 'stochastic' for that which is valid 'in the sense of the calculus of probability': for instance; stochastic independence, stochastic convergence, stochastic integral; more generally, stochastic property, stochastic models, stochastic interpretation, stochastic laws; or also, stochastic matrix, stochastic distribution, etc. As for 'chance', it is perhaps better to reserve it for less technical use: in the familiar sense of'by chance', 'not for a known or imaginable reason', or (but in this case we should give notice of the fact) in the sense of, 'with equal probability' as in 'chance drawings from an urn', 'chance subdivision', and similar examples." (Bruno de Finetti, "Theory of Probability", 1974)

"Statisticians talk about populations. In probability books, the equivalent concept is an urn with numbered balls as a prototype for a population. In fact, when sampling from populations, it is customary to number the population and pretend the population is an urn from which we are drawing the sample." (Juana Sánchez, "Probability for Data Scientists", 2020)

"Many people mistakenly think that the defining property of a simple random sample is that every unit has an equal chance of being in the sample. However, this is not the case. A simple random sample of n units from a population of N means that every possible col‐lection of n of the N units has the same chance of being selected. A slight variant of this is the simple random sample with replacement, where the units/marbles are returned to the urn after each draw. This method also has the property that every sample of n units from a population of N is equally likely to be selected. The difference, though, is that there are more possible sets of n units because the same marble can appear more than once in the sample." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"Several key assumptions enter into this urn model, such as the assumption that the vaccine is ineffective. It’s important to keep track of the reliance on these assumptions because our simulation study gives us an approximation of the rarity of an outcome like the one observed only under these key assumptions." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

"The urn model is a simple abstraction that can be helpful for understanding variation.This model sets up a container (an urn, which is like a vase or a bucket) full of identical marbles that have been labeled, and we use the simple action of drawing marbles from the urn to reason about sampling schemes, randomized controlled experiments, and measurement error. For each of these types of variation, the urn model helps us estimate the size of the variation using either probability or simulation." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

05 July 2025

Michael J Moroney

"A statistical analysis, properly conducted, is a delicate dissection of uncertainties, a surgery of suppositions." (Michael J Moroney, "Facts from Figures", 1951)

"Historically, Statistics is no more than State Arithmetic, a system of computation by which differences between individuals are eliminated by the taking of an average. It has been used - indeed, still is used - to enable rulers to know just how far they may safely go in picking the pockets of their subjects." (Michael J Moroney, "Facts from Figures", 1951)

"If you are young, then I say: Learn something about statistics as soon as you can. Don’t dismiss it through ignorance or because it calls for thought. [...] If you are older and already crowned with the laurels of success, see to it that those under your wing who look to you for advice are encouraged to look into this subject. In this way you will show that your arteries are not yet hardened, and you will be able to reap the benefits without doing overmuch work yourself. Whoever you are, if your work calls for the interpretation of data, you may be able to do without statistics, but you won’t do as well." (Michael J Moroney, "Facts from Figures", 1951)

"Statistics is not the easiest subject to teach, and there are those to whom anything savoring of mathematics is regarded as for ever anathema." (Michael J Moroney, "Facts from Figures", 1951)

"The statistician’s job is to draw general conclusions from fragmentary data. Too often the data supplied to him for analysis are not only fragmentary but positively incoherent, so that he can do next to nothing with them. Even the most kindly statistician swears heartily under his breath whenever this happens." (Michael J Moroney, "Facts from Figures", 1951)

"There is more than a germ of truth in the suggestion that, in all society where statisticians thrive, liberty and individuality are likely to be emasculated." (Michael J Moroney, "Facts from Figures", 1951)

30 March 2025

On Mistakes, Blunders and Errors II: Statistics and Probabilities

“It is a capital mistake to theorize before you have all the evidence. It biases the judgment.” (Sir Arthur C Doyle, “A Study in Scarlet”, 1887)

“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” (Sir Arthur C Doyle, “The Adventures of Sherlock Holmes”, 1892)

"What real and permanent tendencies there are lie hid beneath the shifting superfices of chance, as it were a desert in which the inexperienced traveller mistakes the temporary agglomerations of drifting sand for the real configuration of the ground" (Francis Y Edgeworth, 1898)

"Some of the common ways of producing a false statistical argument are to quote figures without their context, omitting the cautions as to their incompleteness, or to apply them to a group of phenomena quite different to that to which they in reality relate; to take these estimates referring to only part of a group as complete; to enumerate the events favorable to an argument, omitting the other side; and to argue hastily from effect to cause, this last error being the one most often fathered on to statistics. For all these elementary mistakes in logic, statistics is held responsible." (Sir Arthur L Bowley, "Elements of Statistics", 1901)

"If the chance of error alone were the sole basis for evaluating methods of inference, we would never reach a decision, but would merely keep increasing the sample size indefinitely." (C West Churchman, "Theory of Experimental Inference", 1948)

"There are instances of research results presented in terms of probability values of ‘statistical significance’ alone, without noting the magnitude and importance of the relationships found. These attempts to use the probability levels of significance tests as measures of the strengths of relationships are very common and very mistaken." (Leslie Kish, "Some statistical problems in research design", American Sociological Review 24, 1959)

"Poor statistics may be attributed to a number of causes. There are the mistakes which arise in the course of collecting the data, and there are those which occur when those data are being converted into manageable form for publication. Still later, mistakes arise because the conclusions drawn from the published data are wrong. The real trouble with errors which arise during the course of collecting the data is that they are the hardest to detect." (Alfred R Ilersic, "Statistics", 1959)

"The rounding of individual values comprising an aggregate can give rise to what are known as unbiased or biased errors. [...]The biased error arises because all the individual figures are reduced to the lower 1,000 [...] The unbiased error is so described since by rounding each item to the nearest 1,000 some of the approximations are greater and some smaller than the original figures. Given a large number of such approximations, the final total may therefore correspond very closely to the true or original total, since the approximations tend to offset each other. [...] With biased approximations, however, the errors are cumulative and their aggregate increases with the number of items in the series." (Alfred R Ilersic, "Statistics", 1959)

"While it is true to assert that much statistical work involves arithmetic and mathematics, it would be quite untrue to suggest that the main source of errors in statistics and their use is due to inaccurate calculations." (Alfred R Ilersic, "Statistics", 1959)

"No observations are absolutely trustworthy. In no field of observation can we entirely rule out the possibility that an observation is vitiated by a large measurement or execution error. If a reading is found to lie a very long way from its fellows in a series of replicate observations, there must be a suspicion that the deviation is caused by a blunder or gross error of some kind. [...] One sufficiently erroneous reading can wreck the whole of a statistical analysis, however many observations there are." (Francis J Anscombe, "Rejection of Outliers", Technometrics Vol. 2 (2), 1960)

"The most important and frequently stressed prescription for avoiding pitfalls in the use of economic statistics, is that one should find out before using any set of published statistics, how they have been collected, analysed and tabulated. This is especially important, as you know, when the statistics arise not from a special statistical enquiry, but are a by-product of law or administration. Only in this way can one be sure of discovering what exactly it is that the figures measure, avoid comparing the non-comparable, take account of changes in definition and coverage, and as a consequence not be misled into mistaken interpretations and analysis of the events which the statistics portray." (Ely Devons, "Essays in Economics", 1961)

"The problem of error has preoccupied philosophers since the earliest antiquity. According to the subtle remark made by a famous Greek philosopher, the man who makes a mistake is twice ignorant, for he does not know the correct answer, and he does not know that he does not know it." (Félix Borel, "Probability and Certainty", 1963)

"He who accepts statistics indiscriminately will often be duped unnecessarily. But he who distrusts statistics indiscriminately will often be ignorant unnecessarily. There is an accessible alternative between blind gullibility and blind distrust. It is possible to interpret statistics skillfully. The art of interpretation need not be monopolized by statisticians, though, of course, technical statistical knowledge helps. Many important ideas of technical statistics can be conveyed to the non-statistician without distortion or dilution. Statistical interpretation depends not only on statistical ideas but also on ordinary clear thinking. Clear thinking is not only indispensable in interpreting statistics but is often sufficient even in the absence of specific statistical knowledge. For the statistician not only death and taxes but also statistical fallacies are unavoidable. With skill, common sense, patience and above all objectivity, their frequency can be reduced and their effects minimised. But eternal vigilance is the price of freedom from serious statistical blunders." (W Allen Wallis & Harry V Roberts, "The Nature of Statistics", 1965)

"The calculus of probability can say absolutely nothing about reality [...] We have to stress this point because these attempts assume many forms and are always dangerous. In one sentence: to make a mistake of this kind leaves one inevitably faced with all sorts of fallacious arguments and contradictions whenever an attempt is made to state, on the basis of probabilistic considerations, that something must occur, or that its occurrence confirms or disproves some probabilistic assumptions." (Bruno de Finetti, "Theory of Probability", 1974)

"Mistakes arising from retrospective data analysis led to the idea of experimentation, and experience with experimentation led to the idea of controlled experiments and then to the proper design of experiments for efficiency and credibility. When someone is pushing a conclusion at you, it's a good idea to ask where it came from - was there an experiment, and if so, was it controlled and was it relevant?" (Robert Hooke, "How to Tell the Liars from the Statisticians", 1983)

"There are no mistakes. The events we bring upon ourselves, no matter how unpleasant, are necessary in order to learn what we need to learn; whatever steps we take, they’re necessary to reach the places we’ve chosen to go." (Richard Bach, "The Bridge across Forever", 1984)

"Correlation and causation are two quite different words, and the innumerate are more prone to mistake them than most." (John A Paulos, "Innumeracy: Mathematical Illiteracy and its Consequences", 1988)

"When you want to use some data to give the answer to a question, the first step is to formulate the question precisely by expressing it as a hypothesis. Then you consider the consequences of that hypothesis, and choose a suitable test to apply to the data. From the result of the test you accept or reject the hypothesis according to prearranged criteria. This cannot be infallible, and there is always a chance of getting the wrong answer, so you try and reduce the chance of such a mistake to a level which you consider reasonable." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"Exploratory regression methods attempt to reveal unexpected patterns, so they are ideal for a first look at the data. Unlike other regression techniques, they do not require that we specify a particular model beforehand. Thus exploratory techniques warn against mistakenly fitting a linear model when the relation is curved, a waxing curve when the relation is S-shaped, and so forth." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Most statistical models assume error free measurement, at least of independent (predictor) variables. However, as we all know, measurements are seldom if ever perfect. Particularly when dealing with noisy data such as questionnaire responses or processes which are difficult to measure precisely, we need to pay close attention to the effects of measurement errors. Two characteristics of measurement which are particularly important in psychological measurement are reliability and validity." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995)

"We can consider three broad classes of statistical pitfalls. The first involves sources of bias. These are conditions or circumstances which affect the external validity of statistical results. The second category is errors in methodology, which can lead to inaccurate or invalid results. The third class of problems concerns interpretation of results, or how statistical results are applied (or misapplied) to real world issues." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995) 

"This notion of 'being due' - what is sometimes called the gambler’s fallacy - is a mistake we make because we cannot help it. The problem with life is that we have to live it from the beginning, but it makes sense only when seen from the end. As a result, our whole experience is one of coming to provisional conclusions based on insufficient evidence: read ing the signs, gauging the odds." (John Haigh," Taking Chances: Winning With Probability", 1999)

"Big numbers warn us that the problem is a common one, compelling our attention, concern, and action. The media like to report statistics because numbers seem to be 'hard facts' - little nuggets of indisputable truth. [...] One common innumerate error involves not distinguishing among large numbers. [...] Because many people have trouble appreciating the differences among big numbers, they tend to uncritically accept social statistics (which often, of course, feature big numbers)." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Compound errors can begin with any of the standard sorts of bad statistics - a guess, a poor sample, an inadvertent transformation, perhaps confusion over the meaning of a complex statistic. People inevitably want to put statistics to use, to explore a number's implications. [...] The strengths and weaknesses of those original numbers should affect our confidence in the second-generation statistics." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

 "A major problem with many studies is that the population of interest is not adequately defined before the sample is drawn. Don’t make this mistake. A second major source of error is that the sample proves to have been drawn from a different population than was originally envisioned." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"The difference between 'statistically significant' and 'not statistically significant' is not in itself necessarily statistically significant. By this, I mean more than the obvious point about arbitrary divisions, that there is essentially no difference between something significant at the 0.049 level or the 0.051 level. I have a bigger point to make. It is common in applied research–in the last couple of weeks, I have seen this mistake made in a talk by a leading political scientist and a paper by a psychologist–to compare two effects, from two different analyses, one of which is statistically significant and one which is not, and then to try to interpret/explain the difference. Without any recognition that the difference itself was not statistically significant." (Andrew Gelman, "The difference between ‘statistically significant’ and ‘not statistically significant’ is not in itself necessarily statistically significant", 2005)

"[…] an outlier is an observation that lies an 'abnormal' distance from other values in a batch of data. There are two possible explanations for the occurrence of an outlier. One is that this happens to be a rare but valid data item that is either extremely large or extremely small. The other is that it isa mistake – maybe due to a measuring or recording error." (Alan Graham, "Developing Thinking in Statistics", 2006)

"Many scientists who work not just with noise but with probability make a common mistake: They assume that a bell curve is automatically Gauss's bell curve. Empirical tests with real data can often show that such an assumption is false. The result can be a noise model that grossly misrepresents the real noise pattern. It also favors a limited view of what counts as normal versus non-normal or abnormal behavior. This assumption is especially troubling when applied to human behavior. It can also lead one to dismiss extreme data as error when in fact the data is part of a pattern." (Bart Kosko, "Noise", 2006) 

"A naive interpretation of regression to the mean is that heights, or baseball records, or other variable phenomena necessarily become more and more 'average' over time. This view is mistaken because it ignores the error in the regression predicting y from x. For any data point xi, the point prediction for its yi will be regressed toward the mean, but the actual yi that is observed will not be exactly where it is predicted. Some points end up falling closer to the mean and some fall further." (Andrew Gelman & Jennifer Hill, "Data Analysis Using Regression and Multilevel/Hierarchical Models", 2007)

"If there is an outlier there are two possibilities: The model is wrong – after all, a theory is the basis on which we decide whether a data point is an outlier (an unexpected value) or not. The value of the data point is wrong because of a failure of the apparatus or a human mistake. There is a third possibility, though: The data point might not be an actual  outlier, but part of a (legitimate) statistical fluctuation." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"In error analysis the so-called 'chi-squared' is a measure of the agreement between the uncorrelated internal and the external uncertainties of a measured functional relation. The simplest such relation would be time independence. Theory of the chi-squared requires that the uncertainties be normally distributed. Nevertheless, it was found that the test can be applied to most probability distributions encountered in practice." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Another kind of error possibly related to the use of the representativeness heuristic is the gambler’s fallacy, otherwise known as the law of averages. If you are playing roulette and the last four spins of the wheel have led to the ball’s landing on black, you may think that the next ball is more likely than otherwise to land on red. This cannot be. The roulette wheel has no memory. The chance of black is just what it always is. The reason people tend to think otherwise may be that they expect the sequence of events to be representative of random sequences, and the typical random sequence at roulette does not have five blacks in a row." (Jonathan Baron, "Thinking and Deciding" 4th Ed, 2008)

"[…] humans make mistakes when they try to count large numbers in complicated systems. They make even greater errors when they attempt - as they always do - to reduce complicated systems to simple numbers." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"There is a growing realization that reported 'statistically significant' claims in statistical publications  are routinely mistaken. Researchers typically express the confidence in their data in terms of p-value: the probability that a perceived result is actually the result of random variation. The value of p (for 'probability') is a way of measuring the extent to which a data set provides evidence against a so-called null hypothesis. By convention, a p- value below 0.05 is considered a meaningful refutation of the null hypothesis; however, such conclusions are less solid than they appear." (Andrew Gelman & Eric Loken, "The Statistical Crisis in Science", American Scientist Vol. 102(6), 2014)

"Using a sample to estimate results in the full population is common in data analysis. But you have to be careful, because even small mistakes can quickly become big ones, given that each observation represents many others. There are also many factors you need to consider if you want to make sure your inferences are accurate." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"The central limit conjecture states that most errors are the result of many small errors and, as such, have a normal distribution. The assumption of a normal distribution for error has many advantages and has often been made in applications of statistical models." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Variance is error from sensitivity to fluctuations in the training set. If our training set contains sampling or measurement error, this noise introduces variance into the resulting model. [...] Errors of variance result in overfit models: their quest for accuracy causes them to mistake noise for signal, and they adjust so well to the training data that noise leads them astray. Models that do much better on testing data than training data are overfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Statistical models have two main components. First, a mathematical formula that expresses a deterministic, predictable component, for example the fitted straight line that enables us to make a prediction [...]. But the deterministic part of a model is not going to be a perfect representation of the observed world [...] and the difference between what the model predicts, and what actually happens, is the second component of a model and is known as the residual error - although it is important to remember that in statistical modelling, ‘error’ does not refer to a mistake, but the inevitable inability of a model to exactly represent what we observe." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"If we don’t understand the statistics, we’re likely to be badly mistaken about the way the world is. It is all too easy to convince ourselves that whatever we’ve seen with our own eyes is the whole truth; it isn’t. Understanding causation is tough even with good statistics, but hopeless without them. [...] And yet, if we understand only the statistics, we understand little. We need to be curious about the world that we see, hear, touch, and smell, as well as the world we can examine through a spreadsheet." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Premature enumeration is an equal-opportunity blunder: the most numerate among us may be just as much at risk as those who find their heads spinning at the first mention of a fraction. Indeed, if you’re confident with numbers you may be more prone than most to slicing and dicing, correlating and regressing, normalizing and rebasing, effortlessly manipulating the numbers on the spreadsheet or in the statistical package - without ever realizing that you don’t fully understand what these abstract quantities refer to. Arguably this temptation lay at the root of the last financial crisis: the sophistication of mathematical risk models obscured the question of how, exactly, risks were being measured, and whether those measurements were something you’d really want to bet your global banking system on." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)


"Always expect to find at least one error when you proofread your own statistics. If you don’t, you are probably making the same mistake twice." (Cheryl Russell)
