21 April 2024

Alexander von Humboldt - Collected Quotes

"Whatever relates to extent and quantity may be represented by geometrical figures. Statistical projections which speak to the senses without fatiguing the mind, possess the advantage of fixing the attention on a great number of important facts." (Alexander von Humboldt, 1811)

"Regardless of communication between man and man, speech is a necessary condition for the thinking of the individual in solitary seclusion. In appearance, however, language develops only socially, and man understands himself only once he has tested the intelligibility of his words by trial upon others." (Wilhelm von Humboldt, "On Language", 1836)

"At the limits of circumscribed knowledge, as from some lofty island shore, the eye delights to penetrate to distant regions." (Alexander von Humboldt, "Cosmos: A Sketch of a Physical Description of the Universe", 1845)

"Conclusions based upon analogies may fill up a portion of the vast chasm which separates the certain results of a mathematical natural philosophy from conjectures verging on the extreme, and therefore obscure and barren confines of all scientific development of mind." (Alexander von Humboldt, "Cosmos: A Sketch of a Physical Description of the Universe", 1845)

"Impressions change with the varying movements of the mind, and we are led by a happy illusion to believe that we receive from the external world that with which we have ourselves invested it." (Alexander von Humboldt, "Cosmos: A Sketch of a Physical Description of the Universe", 1845)

"Nature, in the manifold signification of the word - whether considered as the universality of all that is and ever will be - as the inner moving force of all phenomena, or as their mysterious prototype - reveals itself to the simple mind and feelings of man as something earthly, and closely allied to himself."(Alexander von Humboldt, "Cosmos: A Sketch of a Physical Description of the Universe", 1845)

"The mere accumulation of unconnected observations of details, devoid of generalization of ideas, may doubtlessly have tended to create and foster the deeply rooted prejudice, that the study of the exact sciences must necessarily chill the feelings, and diminish the nobler enjoyments attendant upon a contemplation of nature." (Alexander von Humboldt, "Views of Nature: Or Contemplation of the Sublime Phenomena of Creation", 1850)

"The philosophical study of nature rises above the requirements of mere delineation, and does not consist in the sterile accumulation of isolated facts. The active and inquiring spirit of man may therefore be occasionally permitted to escape from the present into the domain of the past, to conjecture that which cannot yet be clearly determined, and thus to revel amid the ancient and ever-recurring myths of geology." (Alexander von Humboldt, "Views of Nature: Or Contemplation of the Sublime Phenomena of Creation", 1850)

"In order to depict nature in its exalted sublimity, we must not dwell exclusively on its external manifestations, but we must trace its image, reflected in the mind of man, at one time filling the dreamy land of physical myths with forms of grace and beauty, and at another developing the noble germ of artistic creations." (Alexander von Humboldt, "Cosmos: A Sketch of a Physical Description of the Universe" Vol. 2, 1869)

"Nature considered rationally, that is to say, submitted to the process of thought, is a unity in diversity of phenomena; a harmony, blending together all created things, however dissimilar in form and attributes; one great whole animated by the breath of life." (Alexander von Humboldt)

"The most dangerous worldview is the worldview of those who have not viewed the world." (Alexander von Humboldt [attributed to])

"With the simplest statements of scientific facts there must ever mingle a certain eloquence. Nature herself is sublimely eloquent. The stars as they sparkle in the firmament fill us with delight and ecstasy, and yet they all move in orbits marked out with mathematical precision." (Alexander von Humboldt)

Wilhelm von Humboldt - Collected Quotes

"All situations in which the interrelationships between extremes are involved are the most interesting and instructive." (Wilhelm von Humboldt, "The Limits of State Action", 1792)

"It is the principle of necessity towards which, as to their ultimate centre, all the ideas advanced in this essay immediately converge. In abstract theory the limits of this necessity are determined solely by considerations of man’s proper nature as a human being; but in the application we have to regard, in addition, the individuality of man as he actually exists. This prinsciple of necessity should, I think, prescribe the grand fundamental rule to which every effort to act on human beings and their manifold relations should be invariably conformed. For it is the only thing which conducts to certain and unquestionable results. The consideration of the useful, which might be opposed to it, does not admit of any true and unswerving decision." (Wilhelm von Humboldt, "The Limits of State Action", 1792)

"To inquire and to create; - these are the grand centres around which all human pursuits revolve, or at least to these objects do they all more or less directly refer." (Wilhelm von Humboldt, "The Limits of State Action", 1792)

"The diversity of languages is not a diversity of signs and sounds but a diversity of views of the world." (Wilhelm von Humboldt, 1820)

"Results are nothing; the energies which produce them and which again spring from them are everything." (Wilhelm von Humboldt,  "On Language", 1836)

"As soon as one stops searching for knowledge, or if one imagines that it need not be creatively sought in the depths of the human spirit but can be assembled extensively by collecting and classifying facts, everything is irrevocably and forever lost." (Wilhelm von Humboldt)

"[...] languages are not really means for representing already known truths, but are rather instruments for discovering previously unrecognised ones." (Wilhelm von Humboldt)

"The mutual interdependence of thought and word illuminates clearly the truth that languages are not really means for representing already known truths, but are rather instruments for discovering previously unrecognised ones. The differences between languages are not those of sounds and signs but those of differing  worldviews […] objective truth always rises from the entire energy of subjective individuality." (Wilhelm von Humboldt)

"True enjoyment comes from activity of the mind and exercise of the body; the two are ever united." (Wilhelm von Humboldt)

Armand Julin - Collected Quotes

"Among the whole sciences, statistics belongs to the social sciences and through its method it is linked to logic."  (Armand Julin, "Summary for a Course of Statistics, General and Applied", 1910)

[Fr:] "Dans l'ensemble des sciences, la statistique appartient aux sciences sociales et par sa méthode elle se rattache à la logique." (Armand Julin, "Précis du cours de statistique, générale et appliquée", 1910)

"Graphical statistics can be defined as: 'the expression of statistical facts by means of geometric processes' (Levasseur) Its general usefulness consists of replacing figures which, by their multiplicity, confuse memory, with a figure whose general appearance can be discovered all at once and, by speaking to the eyes, is more easily engraved in the memory." (Armand Julin, "Summary for a Course of Statistics, General and Applied", 1910)

[Fr:] "La statistique graphique peut être définie : «l'expression des faits statistiques au moyen de procédés géométriques» (Levasseur) Son utilité générale consiste à substituer aux chiffres qui, par leur multiplicité, confondent la mémoire, une figure dont l'allure générale se découvre d'un seul coup et, en parlant aux yeux, se grave plus facilement dans le souvenir."  (Armand Julin, "Précis du cours de statistique, générale et appliquée", 1910)

"[...] statistics is the science that, through calculation, leads to an understanding of the characteristics of human societies, and its purpose is the study of masses through the enumeration of the units that compose them." (Armand Julin, "Summary for a Course of Statistics, General and Applied", 1910)

[Fr:] "[...] la statistique est la science qui, par le calcul, arrive à la connaissance des caractères des sociétés humaines et dont l'objet est V étude des masses au moyen du dénombrement des unités qui les composent." (Armand Julin, "Précis du cours de statistique, générale et appliquée", 1910)

"The fundamental principle: Accepting as true only what is demonstrated to be sincere is the basis of this part of the criticism. There is no external sign of sincerity; the emphasis on sincerity is not applicable to statistics. In well-done statistics we try to introduce control points, for example by repeating the same questions in different forms; the comparison of data thus makes it possible to control the degree of sincerity." (Armand Julin, "Summary for a Course of Statistics, General and Applied", 1910)

[Fr:] "Le principe fondamental : N'admettre comme vrai que ce qui est démontré sincère est à la base de cette partie de la critique. Il n'existe pas de signe extérieur de la sincérité; l'accent de sincérité n'est pas applicable à la statistique. Dans les statistiques bien faites on s'efforce d'introduire des points de contrôle, par exemple en répétant les mêmes questions sous des formes différentes ; la confrontation des données permet ainsi de contrôler le degré de sincérité." (Armand Julin, "Précis du cours de statistique, générale et appliquée", 1910)

"The essential quality of graphic representations is clarity. If the diagram fails to give a clearer impression than the tables of figures it replaces, it is useless. To this end, we will avoid complicating the diagram by including too much data." (Armand Julin, "Summary for a Course of Statistics, General and Applied", 1910)

[Fr:] "La qualité essentielle des représentations graphiques est la clarté. Si le diagramme ne parvient pas à donner une impression plus nette que les tableaux de chiffres qu'il remplace, il est inutile. Dans ce but, on évitera de compliquer le diagramme en y portant des données trop nombreuses." (Armand Julin, "Précis du cours de statistique, générale et appliquée", 1910)

"The succession of phases of any statistical operation proves that the inductive method is used in statistics: 1) Recognize that a phenomenon can be observed using statistics; 2) Observe the exact nature of the phenomenon; 3) Record observations; 4) Group facts of the same nature in ad hoc tables; 5) Carry out the analysis of the observations; 6) Make the necessary calculations (totals, averages, proportions); 7) Discover similar phenomena, find causal links, indicate trends and laws; 8) Present and publish the results." (Armand Julin, "Summary for a Course of Statistics, General and Applied", 1910)

[Fr:] "La succession des phases de toute opération statistique prouve qu'on emploie en statistique la méthode inductive: 1) Reconnaître qu'un phénomème peut être observé au moyen de la statistique; 2) Observer la nature exacte du phénomène ; 3) Enregistrer les observations; 4) Grouper les faits de même nature dans des tableaux ad hoc; 5) Procéder au dépouillement des observations ; 6) Faire les calculs nécessaires (totaux, moyennes, proportions); 7) Découvrir les phénomènes semblables, trouver les liens de causalité, indiquer les tendances et les lois ; 8) Exposer et publier les résultats."  (Armand Julin, "Précis du cours de statistique, générale et appliquée", 1910)

"Graphic statistic has a role to play of its own; it is not the servant of numerical statistics but it cannot pretend, on the other hand, to precede or displace it." (Armand Julin) [?]

On Dispersion II: Statistics

"The term dispersion is used to indicate the facts that within a given group, the items differ from one another in size or in other words, there is lack of uniformity in their sizes." (Willford I King, "The Elements of Statistical Method", 1912)

"[…] statistical literacy. That is, the ability to read diagrams and maps; a 'consumer' understanding of common statistical terms, as average, percent, dispersion, correlation, and index number."  (Douglas Scates, "Statistics: The Mathematics for Social Problems", 1943)

"The fact that index numbers attempt to measure changes of items gives rise to some knotty problems. The dispersion of a group of products increases with the passage of time, principally because some items have a long-run tendency to fall while others tend to rise. Basic changes in the demand is fundamentally responsible. The averages become less and less representative as the distance from the period increases." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"Dispersion or spread is the degree of the scatter or variation of the variables about a central value." (Bertram C Brookes & W F L Dick, "Introduction to Statistical Method", 1969)

"Linear regression assumes that in the population a normal distribution of error values around the predicted Y is associated with each X value, and that the dispersion of the error values for each X value is the same. The assumptions imply normal and similarly dispersed error distributions." (Fred C Pampel, "Linear Regression: A primer", 2000)

"The flaw in the classical thinking is the assumption that variance equals dispersion. Variance tends to exaggerate outlying data because it squares the distance between the data and their mean. This mathematical artifact gives too much weight to rotten apples. It can also result in an infinite value in the face of impulsive data or noise. [...] Yet dispersion remains an elusive concept. It refers to the width of a probability bell curve in the special but important case of a bell curve. But most probability curves don't have a bell shape. And its relation to a bell curve's width is not exact in general. We know in general only that the dispersion increases as the bell gets wider. A single number controls the dispersion for stable bell curves and indeed for all stable probability curves - but not all bell curves are stable curves."  (Bart Kosko, "Noise", 2006)

"Two clouds of uncertainty may have the same center, but one may be much more dispersed than the other. We need a way of looking at the scatter about the center. We need a measure of the scatter. One such measure is the variance. We take each of the possible values of error and calculate the squared difference between that value and the center of the distribution. The mean of those squared differences is the variance." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Dispersion is the measure of the variation of the items." (Arthur L Bowley)

"Dispersion is a measure of the extent to which the individual items vary." (Lewis R Connor)

On Dispersion I: Trivia

"Either all things proceed from one intelligent source and come together as in one body, and the part ought not to find fault with what is done for the benefit of the whole; or there are only atoms, and nothing else than a mixture and dispersion. Why, then, art thou disturbed? Say to this ruling faculty, Art thou dead, art thou corrupted, art thou playing the hypocrite, art thou become a beast, dost thou herd and feed with the rest?" (Marcus Aurelius, "Meditations". cca. 121–180 AD)

"Look at everything that exists, and observe that it is already in dissolution and change, and as it were putrefaction or dispersion, or that everything is so constituted in nature as to die." (Marcus Aurelius, "Meditations". cca. 121–180 AD)

"To know the quantum mechanical state of a system implies, in general, only statistical restrictions on the results of measurements. It seems interesting to ask if this statistical element be thought of as arising, as in classical statistical mechanics, because the states in question are averages over better defined states for which individually the results would be quite determined. These hypothetical 'dispersion free' states would be specified not only by the quantum mechanical state vector but also by additional 'hidden variables' - 'hidden' because if states with prescribed values of these variables could actually be prepared, quantum mechanics would be observably inadequate." (John S Bell, "On the problem of hidden variables in quantum mechanics" [in "Reviews of Modern Physics"], 1966)

"[...] the influence of a single butterfly is not only a fine detail-it is confined to a small volume. Some of the numerical methods which seem to be well adapted for examining the intensification of errors are not suitable for studying the dispersion of errors from restricted to unrestricted regions. One hypothesis, unconfirmed, is that the influence of a butterfly's wings will spread in turbulent air, but not in calm air." (Edward N Lorenz, [talk] 1972)

"Determinism was eroded during the nineteenth century and a space was cleared for autonomous laws of chance. The idea of human nature was displaced by a model of normal people with laws of dispersion. These two transformations were parallel and fed into each other. Chance made the world seem less capricious; it was legitimated because it brought order out of chaos. The greater the level of indeterminism in our conception of the world and of people, the higher the expected level of control." (Ian Hacking, "The Taming of Chance", 1990)

"The flaw in the classical thinking is the assumption that variance equals dispersion. Variance tends to exaggerate outlying data because it squares the distance between the data and their mean. This mathematical artifact gives too much weight to rotten apples. It can also result in an infinite value in the face of impulsive data or noise. [...] Yet dispersion remains an elusive concept. It refers to the width of a probability bell curve in the special but important case of a bell curve. But most probability curves don't have a bell shape. And its relation to a bell curve's width is not exact in general. We know in general only that the dispersion increases as the bell gets wider. A single number controls the dispersion for stable bell curves and indeed for all stable probability curves - but not all bell curves are stable curves."  (Bart Kosko, "Noise", 2006)

"We can simplify the relationships between fragility, errors, and antifragility as follows. When you are fragile, you depend on things following the exact planned course, with as little deviation as possible - for deviations are more harmful than helpful. This is why the fragile needs to be very predictive in its approach, and, conversely, predictive systems cause fragility. When you want deviations, and you don’t care about the possible dispersion of outcomes that the future can bring, since most will be helpful, you are antifragile. Further, the random element in trial and error is not quite random, if it is carried out rationally, using error as a source of information. If every trial provides you with information about what does not work, you start zooming in on a solution - so every attempt becomes more valuable, more like an expense than an error. And of course you make discoveries along the way." (Nassim N Taleb, "Antifragile: Things that gain from disorder", 2012)

"Two clouds of uncertainty may have the same center, but one may be much more dispersed than the other. We need a way of looking at the scatter about the center. We need a measure of the scatter. One such measure is the variance. We take each of the possible values of error and calculate the squared difference between that value and the center of the distribution. The mean of those squared differences is the variance." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

On Statistics: Statistical Fallacies III

"No doubt statistics can be easily misinterpreted; and are often very misleading when first applied to new problems. But many of the worst fallacies involved in the misapplications of statistics are definite and can be definitely exposed, till at last no one ventures to repeat them even when addressing an uninstructed audience: and on the whole arguments which can be reduced to statistical forms, though still in a backward condition, are making more sure and more rapid advances than any others towards obtaining the general acceptance of all who have studied the subjects to which they refer." (Alfred Marshall, "Principles of Economics", 1890)

"Without an adequate understanding of the statistical methods, the investigator in the social sciences may be like the blind man groping in a dark room for a black cat that is not there. The methods of Statistics are useful in an over-widening range of human activities in any field of thought in which numerical data may be had." (Frederick E Croxton & Dudley J Cowden, "Practical Business Statistics", 1937)

"Science of Statistics is the useful servant but only of great value to those who understand its proper use." (Willford I King, "The Elements of Statistical Method", 1912)

"Many people use statistics as a drunkard uses a street lamp - for support rather than illumination. It is not enough to avoid outright falsehood; one must be on the alert to detect possible distortion of truth. One can hardly pick up a newspaper without seeing some sensational headline based on scanty or doubtful data." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"He who accepts statistics indiscriminately will often be duped unnecessarily. But he who distrusts statistics indiscriminately will often be ignorant unnecessarily. There is an accessible alternative between blind gullibility and blind distrust. It is possible to interpret statistics skillfully. The art of interpretation need not be monopolized by statisticians, though, of course, technical statistical knowledge helps. Many important ideas of technical statistics can be conveyed to the non-statistician without distortion or dilution. Statistical interpretation depends not only on statistical ideas but also on ordinary clear thinking. Clear thinking is not only indispensable in interpreting statistics but is often sufficient even in the absence of specific statistical knowledge. For the statistician not only death and taxes but also statistical fallacies are unavoidable. With skill, common sense, patience and above all objectivity, their frequency can be reduced and their effects minimised. But eternal vigilance is the price of freedom from serious statistical blunders." (W Allen Wallis & Harry V Roberts, "The Nature of Statistics", 1965)

"[…] it is not enough to say: 'There's error in the data and therefore the study must be terribly dubious'. A good critic and data analyst must do more: he or she must also show how the error in the measurement or the analysis affects the inferences made on the basis of that data and analysis." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The use of statistical methods to analyze data does not make a study any more 'scientific', 'rigorous', or 'objective'. The purpose of quantitative analysis is not to sanctify a set of findings. Unfortunately, some studies, in the words of one critic, 'use statistics as a drunk uses a street lamp, for support rather than illumination'. Quantitative techniques will be more likely to illuminate if the data analyst is guided in methodological choices by a substantive understanding of the problem he or she is trying to learn about. Good procedures in data analysis involve techniques that help to (a) answer the substantive questions at hand, (b) squeeze all the relevant information out of the data, and (c) learn something new about the world." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Typically, data analysis is messy, and little details clutter it. Not only confounding factors, but also deviant cases, minor problems in measurement, and ambiguous results lead to frustration and discouragement, so that more data are collected than analyzed. Neglecting or hiding the messy details of the data reduces the researcher's chances of discovering something new." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

On Laws V: The Law of Statistical Regularity

"This statistical regularity in moral affairs fully establishes their being under the presidency of law. Man is seen to be an enigma only as an individual: in the mass he is a mathematical problem." (Robert Chambers, "Vestiges of the Natural History of Creation", 1844) 

"the law of statistical regularity lays down that the moderately large number of items chosen at random from a large group are almost sure on the average to possess the characteristics of the large group." (Willford I King, "The Elements of Statistical Method", 1912)

"The principle underlying sampling is that a set of objects taken at random from a larger group tends to reproduce the characteristics of that larger group: this is called the Law of Statistical Regularity. There are exceptions to this rule, and a certain amount of judgment must be exercised, especially when there are a few abnormally large items in the larger group. With erratic data, the accuracy of sampling can often be tested by comparing several samples. On the whole, the larger the sample the more closely will it tend to resemble the population from which it is taken; too small a sample would not give reliable results." (Lewis R Connor, "Statistics in Theory and Practice", 1932)

"It is the task of science, as a collective human undertaking, to describe from the external side, (on which alone agreement is possible), such statistical regularity as there is in a world in which every event has a unique aspect, and to indicate where possible the limits of such description. It is not part of its task to make imaginative interpretation of the internal aspect of reality - what it is like, for example, to be a lion, an ant or an ant hill, a liver cell, or a hydrogen ion. The only qualification is in the field of introspective psychology in which each human being is both observer and observed, and regularities may be established by comparing notes. Science is thus a limited venture. It must act as if all phenomena were deterministic at least in the sense of determinable probabilities." (Sewall Wright, "Gene and Organism", American Naturalist 87, 1953)

"The epistemological value of probability theory is based on the fact that chance phenomena, considered collectively and on a grand scale, create non-random regularity." (Andrey Kolmogorov, "Limit Distributions for Sums of Independent Random Variables", 1954)

"Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." (Charles A E Goodhart, "Problems of Monetary Management: the U.K. Experience", 1975)

 "The law of statistical regularity lays down that a group of objects chosen at random from a larger group tends to possess the characteristics of that large group (universe)." (Lewis R Connor)

On Averages II

"An average value is a single value within the range of the data that is used to represent all of the values in the series. Since an average is somewhere within the range of the data, it is sometimes called a measure of central value." (Frederick E Croxton & Dudley J Cowden, "Practical Business Statistics", 1937)

"An average is a single value selected from a group of values to represent them in some way, a value which is supposed to stand for whole group of which it is part, as typical of all the values in the group." (Albert E Waugh, "Elements of Statistical Methods" 3rd Ed., 1952)

"An average does not tell the full story. It is hardly fully representative of a mass unless we know the manner in which the individual items scatter around it. A further description of the series is necessary if we are to gauge how representative the average is." (George Simpson & Fritz Kafka, "Basic Statistics", 1952)

"An average is sometimes called a 'measure of central tendency' because individual values of the variable usually cluster around it. Averages are useful, however, for certain  types of data in which there is little or no central tendency." (William A Spirr & Charles P Bonini, "Statistical Analysis for Business Decisions" 3rd Ed., 1967)

"The most widely used mathematical tools in the social sciences are statistical, and the prevalence of statistical methods has given rise to theories so abstract and so hugely complicated that they seem a discipline in themselves, divorced from the world outside learned journals. Statistical theories usually assume that the behavior of large numbers of people is a smooth, average 'summing-up' of behavior over a long period of time. It is difficult for them to take into account the sudden, critical points of important qualitative change. The statistical approach leads to models that emphasize the quantitative conditions needed for equilibrium-a balance of wages and prices, say, or of imports and exports. These models are ill suited to describe qualitative change and social discontinuity, and it is here that catastrophe theory may be especially helpful." (Alexander Woodcock & Monte Davis, "Catastrophe Theory", 1978)

"The arithmetic mean has another familiar property that will be useful to remember. The sum of the deviations of the values from their mean is zero, and the sum of the squared deviations of the values about the mean is a minimum. That is to say, the sum of the squared deviations is less than the sum of the squared deviations about any other value." (Charles T Clark & Lawrence L Schkade, "Statistical Analysis for Administrative Decisions", 1979)

"Having NUMBERSENSE means: (•) Not taking published data at face value; (•) Knowing which questions to ask; (•) Having a nose for doctored statistics. [...] NUMBERSENSE is that bit of skepticism, urge to probe, and desire to verify. It’s having the truffle hog’s nose to hunt the delicacies. Developing NUMBERSENSE takes training and patience. It is essential to know a few basic statistical concepts. Understanding the nature of means, medians, and percentile ranks is important. Breaking down ratios into components facilitates clear thinking. Ratios can also be interpreted as weighted averages, with those weights arranged by rules of inclusion and exclusion. Missing data must be carefully vetted, especially when they are substituted with statistical estimates. Blatant fraud, while difficult to detect, is often exposed by inconsistency." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"What is so unconventional about the statistical way of thinking? First, statisticians do not care much for the popular concept of the statistical average; instead, they fixate on any deviation from the average. They worry about how large these variations are, how frequently they occur, and why they exist. [...] Second, variability does not need to be explained by reasonable causes, despite our natural desire for a rational explanation of everything; statisticians are frequently just as happy to pore over patterns of correlation. [...] Third, statisticians are constantly looking out for missed nuances: a statistical average for all groups may well hide vital differences that exist between these groups. Ignoring group differences when they are present frequently portends inequitable treatment. [...] Fourth, decisions based on statistics can be calibrated to strike a balance between two types of errors. Predictably, decision makers have an incentive to focus exclusively on minimizing any mistake that could bring about public humiliation, but statisticians point out that because of this bias, their decisions will aggravate other errors, which are unnoticed but serious. [...] Finally, statisticians follow a specific protocol known as statistical testing when deciding whether the evidence fits the crime, so to speak. Unlike some of us, they don’t believe in miracles. In other words, if the most unusual coincidence must be contrived to explain the inexplicable, they prefer leaving the crime unsolved." (Kaiser Fung, "Numbers Rule the World", 2010) 

"A very different - and very incorrect - argument is that successes must be balanced by failures (and failures by successes) so that things average out. Every coin flip that lands heads makes tails more likely. Every red at roulette makes black more likely. […] These beliefs are all incorrect. Good luck will certainly not continue indefinitely, but do not assume that good luck makes bad luck more likely, or vice versa." (Gary Smith, "Standard Deviations", 2014)
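Smith's point can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: a fair coin, independent flips, and a hypothetical helper named heads_rate_after_streak, which appears nowhere in the book. It estimates how often heads comes up right after three heads in a row.

```python
import random

def heads_rate_after_streak(n_flips=200_000, streak=3, seed=42):
    """Estimate P(heads) on the flip right after `streak` consecutive heads.

    Assumes a fair coin and independent flips (illustrative assumptions).
    """
    rng = random.Random(seed)
    flips = [rng.random() < 0.5 for _ in range(n_flips)]  # True = heads
    follow_ups = [
        flips[i]
        for i in range(streak, n_flips)
        if all(flips[i - streak:i])  # the previous `streak` flips were all heads
    ]
    return sum(follow_ups) / len(follow_ups)

rate = heads_rate_after_streak()
print(f"Share of heads right after 3 heads in a row: {rate:.3f}")
```

If earlier heads made tails "due", the printed share would sit clearly below one half. Under the independence assumption it should stay close to 0.5.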

"Average deviation is the average amount of scatter of the items in a distribution from either the mean or the median, ignoring the signs of the deviations. The average that is taken of the scatter is an arithmetic mean, which accounts for the fact that this measure is often called the mean deviation." (Charles T Clark & Lawrence L Schkade)
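The definition above translates almost word for word into code. This is a minimal sketch, assuming a plain list of numbers. The function name average_deviation and its center parameter are my own choices for illustration, not the authors'.

```python
from statistics import mean, median

def average_deviation(values, center="mean"):
    """Mean of the absolute deviations from the mean or from the median."""
    c = mean(values) if center == "mean" else median(values)
    # Signs are ignored by taking absolute values, then an arithmetic mean is taken.
    return mean([abs(x - c) for x in values])

data = [1, 2, 3, 10]
print(average_deviation(data, center="mean"))
print(average_deviation(data, center="median"))
```

For skewed data such as this, the two centers give different results, which is why it helps to state which one was used.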

14 April 2024

Misquoted: Andrew Lang's Using Statistics for Support rather than Illumination

The quote comes from a speech given by Andrew Lang in 1910 (see [3]) and is referenced in several other places (see [4], [5], [6]) without the source being specified:

"Politicians use statistics in the same way that a drunk uses lamp-posts - for support rather than illumination." (Andrew Lang, [speech] 1910)

I like this quote because it captures through a metaphor one of the misuses of Statistics: people looking to support their beliefs, faith, opinions, convictions and/or biases (by twisting the numbers) rather than to change their minds (illumination). Here Statistics refers mainly to data obtained by statistical methods, no matter whether they were obtained by simple aggregations or more complex techniques (including data visualization). It applies to politics and business as well as to daily life situations. In extremis, it has to do with cherry-picking or selective filtering of the data.

The Quote Investigator [1] traces the metaphor behind the quote back to Alfred E Housman who, in a 1903 translation of Marcus Manilius' Astronomicon, referring to manuscripts (abbreviated MS for singular and MSS for plural), said:

"And critics who treat MS evidence as rational men treat all evidence, and test it by reason and by the knowledge which they have acquired, these are blamed for rashness and capriciousness by gentlemen who use MSS as drunkards use lamp-posts, - not to light them on their way but to dissimulate their instability." [2] (Alfred E Housman, 1903)

Here are the three references mentioned above via the Quote Investigator (though there seem to be other earlier sources whose text is not publicly available):

"I shall try not to use statistics as a drunken man uses lamp-posts, for support rather than for illumination;" [4] (Francis Yeats-Brown, 1937)

 "For blue books are particularly prone to use their statistics not as a living record of social progress but (to quote a deservedly immortal phrase of Andrew Lang) ‘as a drunken man uses lamp-posts - for support rather than for illumination’." [5] (G A N Lowndes, 1937)

"Their case, as has been amply proved by these recapitulations to-night, is a very lame case indeed. The few new facts which the Debate has elicited from that side of the House have been used by them, as was said in another connection, as a drunken man uses lampposts - more for support than for illumination." [6] (McEwene Hansard, [speech] 1937) 

Several similar formulations of the metaphor can be found in other later works:

"Many use statistics as a drunken man uses a lamp-post - for support rather than for illumination." (The Lancet, 1941)

"Many people use statistics as a drunkard uses a street lamp - for support rather than illumination. It is not enough to avoid outright falsehood; one must be on the alert to detect possible distortion of truth. One can hardly pick up a newspaper without seeing some sensational headline based on scanty or doubtful data." (Anna C Rogers, "Graphic Charts Handbook", 1961)

 "Of course, you all know the old story that some people use statistics the way an inebriate uses a lamppost - for support rather than for illumination. It is not really that bad at all times. Statistics are indeed used for illumination, the difficulty is that everybody is trying to illuminate a different point." (Hyman L Lewis, [in Gerhard Bry's "Business Cycle Indicators for States and Regions"] 1961)

"The use of statistical methods to analyze data does not make a study any more 'scientific', 'rigorous', or 'objective'. The purpose of quantitative analysis is not to sanctify a set of findings. Unfortunately, some studies, in the words of one critic, 'use statistics as a drunk uses a street lamp, for support rather than illumination'. Quantitative techniques will be more likely to illuminate if the data analyst is guided in methodological choices by a substantive understanding of the problem he or she is trying to learn about. Good procedures in data analysis involve techniques that help to (a) answer the substantive questions at hand, (b) squeeze all the relevant information out of the data, and (c) learn something new about the world." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Beware of the man who, like the drunk with a lamppost, uses numbers for support rather than for illumination." (Lawrence Malkin, "The National Debt", 1987)

"It is worth mentioning the statement of a statistician which says 'Statistics should not be used as a blind man uses a lamp-post for support instead of illumination.'" (Padmalochan Hazarika, "A Textbook of Business Statistics", 2007)

"Statistics should not be used as a blind man uses a lamp post for support rather than for illumination." (C B Gupta & Vijay Gupta, "Introduction to Statistical Methods", 2009)

"An unsophisticated forecaster uses statistics as a drunken man uses lamp-posts - for support rather than for illumination." (Anthony Carpi & Anne Egger, "The Process of Science", 2010) [attributed in current form to Lang]

"Statistics, as such, do not prove anything. They are simply tools in the hands of the statisticians. If a statistician misuses the data, then the blame lies squarely on him and not on the subject matter. A competent doctor can cure a disease by making good use of the medicine but the same medicine in the hands of an incompetent doctor becomes a poison. The fault in this case is not of the medicine but of the unqualified doctor. In the same way, Statistics is never faulty but the fault lies with the users. In fact, Statistics should not be relied upon blindly nor distrusted outright. 'Statistics should not be used as a blind man uses a lamp post for support rather than for illumination, whereas its real purpose is to serve as illumination and not as a support.'" (TR Jain & VK Ohri, "Introductory Microeconomics for Class 11", 2023)

Other attributions of the quote are given to Mark Twain, David Ogilvy and others:

"People commonly use statistics like a drunk uses a lamppost: for support rather than for illumination." (R Preston McAfee, "Competitive Solutions: The Strategist's Toolkit", 2009) [attributed to Mark Twain]

 "I notice increasing reluctance on the part of marketing executives to use judgement; they are coming to rely too much on research and they use it as a drunkard uses a lamp post - for support rather than for illumination." (David Ogilvy, "Confessions of an Advertising Man", 1971) 

Variations of the metaphor entered other fields as well:

"No business can safely run with accounts that are being used principally for support rather than for illumination." (Mark Thomas et al, "The Complete CEO", 2006)


References:
[1] Quote Investigator (2014) People Use Statistics as a Drunk Uses a Lamppost - For Support Rather Than Illumination
[2] Marcus Manilius translated by Alfred E Housman (1903) Astronomicon
[3] Alan L. Mackay (1977) The Harvest of a Quiet Eye
[4] Francis Yeats-Brown (1937) Lancer at Large
[5] G A N Lowndes (1937) The Silent Social Revolution: An Account of the Expansion of Public Education in England and Wales 1895-1935
[6] McEwene Hansard (1937) speech

13 April 2024

On Significance II

 "Science usually amounts to a lot more than blind trial and error. Good statistics consists of much more than just significance tests; there are more sophisticated tools available for the analysis of results, such as confidence statements, multiple comparisons, and Bayesian analysis, to drop a few names. However, not all scientists are good statisticians, or want to be, and not all people who are called scientists by the media deserve to be so described." (Robert Hooke, "How to Tell the Liars from the Statisticians", 1983)

"The idea of statistical significance is valuable because it often keeps us from announcing results that later turn out to be nonresults. A significant result tells us that enough cases were observed to provide reasonable assurance of a real effect. It does not necessarily mean, though, that the effect is big enough to be important." (Robert Hooke, "How to Tell the Liars from the Statisticians", 1983)

"A tendency to drastically underestimate the frequency of coincidences is a prime characteristic of innumerates, who generally accord great significance to correspondences of all sorts while attributing too little significance to quite conclusive but less flashy statistical evidence." (John A Paulos, "Innumeracy: Mathematical Illiteracy and its Consequences", 1988)

"A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that’s the only way you can take it in formal hypothesis testing), is always false in the real world. [...] If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what’s the big deal about rejecting it?" (Jacob Cohen, "Things I Have Learned (So Far)", American Psychologist, 1990)
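Cohen's argument can be made concrete with a little arithmetic. The sketch below assumes a one-sample z-test with known standard deviation and an observed mean that sits exactly a tiny, fixed distance (0.01 standard deviations) from the null value. The function name and the numbers are hypothetical and chosen only for illustration.

```python
import math

def two_sided_p_value(effect_in_sd, n):
    """Two-sided p-value of a one-sample z-test.

    Assumes the observed mean lies exactly `effect_in_sd` standard
    deviations from the null value and that the standard deviation is known.
    """
    z = effect_in_sd * math.sqrt(n)          # z grows with the square root of n
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

for n in (100, 10_000, 1_000_000):
    print(n, two_sided_p_value(0.01, n))
```

The same negligible effect is nowhere near significant with a hundred observations and overwhelmingly significant with a million, because the test statistic scales with the square root of the sample size.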

"Statistical significance testing can involve a tautological logic in which tired researchers, having collected data on hundreds of subjects, then conduct a statistical test to evaluate whether there were a lot of subjects, which the researchers already know, because they collected the data and know they are tired. This tautology has created considerable damage as regards the cumulation of knowledge." (Bruce Thompson, "Two and One-Half Decades of Leadership in Measurement and Evaluation", Journal of Counseling & Development 70 (3), 1992)

"[…] an honest exploratory study should indicate how many comparisons were made […] most experts agree that large numbers of comparisons will produce apparently statistically significant findings that are actually due to chance. The data torturer will act as if every positive result confirmed a major hypothesis. The honest investigator will limit the study to focused questions, all of which make biologic sense. The cautious reader should look at the number of ‘significant’ results in the context of how many comparisons were made." (James L Mills, "Data torturing", New England Journal of Medicine, 1993)
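To get a feel for why the number of comparisons matters, here is a rough sketch. It assumes that all comparisons are independent and that every null hypothesis is true, which is a simplification of my own and not something stated in Mills' article. The function name is likewise hypothetical.

```python
def chance_of_false_positive(n_comparisons, alpha=0.05):
    """Probability of at least one 'significant' result arising by chance alone.

    Assumes independent comparisons, each tested at level `alpha`,
    with every null hypothesis actually true.
    """
    return 1 - (1 - alpha) ** n_comparisons

for m in (1, 5, 20, 100):
    print(m, round(chance_of_false_positive(m), 3))
```

With twenty such comparisons the chance of at least one spurious finding is already close to two in three, which is why the count of comparisons belongs in the report.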

"When significance tests are used and a null hypothesis is not rejected, a major problem often arises - namely, the result may be interpreted, without a logical basis, as providing evidence for the null hypothesis." (David F Parkhurst, "Statistical Significance Tests: Equivalence and Reverse Tests Should Reduce Misinterpretation", BioScience Vol. 51 (12), 2001)

"If you flip a coin three times and it lands on heads each time, it's probably chance. If you flip it a hundred times and it lands on heads each time, you can be pretty sure the coin has heads on both sides. That's the concept behind statistical significance - it's the odds that the correlation (or other finding) is real, that it isn't just random chance." (T Colin Campbell, "The China Study", 2004)

"A type of error used in hypothesis testing that arises when incorrectly rejecting the null hypothesis, although it is actually true. Thus, based on the test statistic, the final conclusion rejects the Null hypothesis, but in truth it should be accepted. Type I error equates to the alpha (α) or significance level, whereby the generally accepted default is 5%." (Lynne Hambleton, "Treasure Chest of Six Sigma Growth Methods, Tools, and Best Practices", 2007)

"For the study of the topology of the interactions of a complex system it is of central importance to have proper random null models of networks, i.e., models of how a graph arises from a random process. Such models are needed for comparison with real world data. When analyzing the structure of real world networks, the null hypothesis shall always be that the link structure is due to chance alone. This null hypothesis may only be rejected if the link structure found differs significantly from an expectation value obtained from a random model. Any deviation from the random null model must be explained by non-random processes." (Jörg Reichardt, "Structure in Complex Networks", 2009)

On Significance III

"Given the important role that correlation plays in structural equation modeling, we need to understand the factors that affect establishing relationships among multivariable data points. The key factors are the level of measurement, restriction of range in data values (variability, skewness, kurtosis), missing data, nonlinearity, outliers, correction for attenuation, and issues related to sampling variation, confidence intervals, effect size, significance, sample size, and power." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"There are three possible reasons for [the] absence of predictive power. First, it is possible that the models are misspecified. Second, it is possible that the model’s explanatory factors are measured at too high a level of aggregation [...] Third, [...] the search for statistically significant relationships may not be the strategy best suited for evaluating our model’s ability to explain real world events [...] the lack of predictive power is the result of too much emphasis having been placed on finding statistically significant variables, which may be overdetermined. Statistical significance is generally a flawed way to prune variables in regression models [...] Statistically significant variables may actually degrade the predictive accuracy of a model [...] [By using] models that are constructed on the basis of pruning undertaken with the shears of statistical significance, it is quite possible that we are winnowing our models away from predictive accuracy." (Michael D Ward et al, "The perils of policy by p-value: predicting civil conflicts" Journal of Peace Research 47, 2010)

"Another way to secure statistical significance is to use the data to discover a theory. Statistical tests assume that the researcher starts with a theory, collects data to test the theory, and reports the results - whether statistically significant or not. Many people work in the other direction, scrutinizing the data until they find a pattern and then making up a theory that fits the pattern." (Gary Smith, "Standard Deviations", 2014)

"These practices - selective reporting and data pillaging - are known as data grubbing. The discovery of statistical significance by data grubbing shows little other than the researcher’s endurance. We cannot tell whether a data grubbing marathon demonstrates the validity of a useful theory or the perseverance of a determined researcher until independent tests confirm or refute the finding. But more often than not, the tests stop there. After all, you won’t become a star by confirming other people’s research, so why not spend your time discovering new theories? The data-grubbed theory consequently sits out there, untested and unchallenged." (Gary Smith, "Standard Deviations", 2014)

"With fast computers and plentiful data, finding statistical significance is trivial. If you look hard enough, it can even be found in tables of random numbers." (Gary Smith, "Standard Deviations", 2014)

"In short, statistical significance does not mean your result has any practical significance. As for statistical insignificance, it doesn’t tell you much. A statistically insignificant difference could be nothing but noise, or it could represent a real effect that can be pinned down only with more data." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"Statistical significance is a concept used by scientists and researchers to set an objective standard that can be used to determine whether or not a particular relationship 'statistically' exists in the data. Scientists test for statistical significance to distinguish between whether an observed effect is present in the data (given a high degree of probability), or just due to chance. It is important to note that finding a statistically significant relationship tells us nothing about whether a relationship is a simple correlation or a causal one, and it also can’t tell us anything about whether some omitted factor is driving the result." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"Statistical significance refers to the probability that something is true. It’s a measure of how probable it is that the effect we’re seeing is real (rather than due to chance occurrence), which is why it’s typically measured with a p-value. P, in this case, stands for probability. If you accept p-values as a measure of statistical significance, then the lower your p-value is, the less likely it is that the results you’re seeing are due to chance alone." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

On Significance I

 "What the use of P [the significance level] implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred." (Harold Jeffreys, "Theory of Probability", 1939)

"As usual we may make the errors of I) rejecting the null hypothesis when it is true, II) accepting the null hypothesis when it is false. But there is a third kind of error which is of interest because the present test of significance is tied up closely with the idea of making a correct decision about which distribution function has slipped furthest to the right. We may make the error of III) correctly rejecting the null hypothesis for the wrong reason." (Frederick Mosteller, "A k-Sample Slippage Test for an Extreme Population", The Annals of Mathematical Statistics 19, 1948)

"Errors of the third kind happen in conventional tests of differences of means, but they are usually not considered, although their existence is probably recognized. It seems to the author that there may be several reasons for this among which are 1) a preoccupation on the part of mathematical statisticians with the formal questions of acceptance and rejection of null hypotheses without adequate consideration of the implications of the error of the third kind for the practical experimenter, 2) the rarity with which an error of the third kind arises in the usual tests of significance." (Frederick Mosteller, "A k-Sample Slippage Test for an Extreme Population", The Annals of Mathematical Statistics 19, 1948)

"If significance tests are required for still larger samples, graphical accuracy is insufficient, and arithmetical methods are advised. A word to the wise is in order here, however. Almost never does it make sense to use exact binomial significance tests on such data - for the inevitable small deviations from the mathematical model of independence and constant split have piled up to such an extent that the binomial variability is deeply buried and unnoticeable. Graphical treatment of such large samples may still be worthwhile because it brings the results more vividly to the eye." (Frederick Mosteller & John W Tukey, "The Uses and Usefulness of Binomial Probability Paper", Journal of the American Statistical Association 44, 1949)

"It will, of course, happen but rarely that the proportions will be identical, even if no real association exists. Evidently, therefore, we need a significance test to reassure ourselves that the observed difference of proportion is greater than could reasonably be attributed to chance. The significance test will test the reality of the association, without telling us anything about the intensity of association. It will be apparent that we need two distinct things: (a) a test of significance, to be used on the data first of all, and (b) some measure of the intensity of the association, which we shall only be justified in using if the significance test confirms that the association is real." (Michael J Moroney, "Facts from Figures", 1951)

"The main purpose of a significance test is to inhibit the natural enthusiasm of the investigator." (Frederick Mosteller, "Selected Quantitative Techniques", 1954)

"The null-hypothesis significance test treats ‘acceptance’ or ‘rejection’ of a hypothesis as though these were decisions one makes. But a hypothesis is not something, like a piece of pie offered for dessert, which can be accepted or rejected by a voluntary physical action. Acceptance or rejection of a hypothesis is a cognitive process, a degree of believing or disbelieving which, if rational, is not a matter of choice but determined solely by how likely it is, given the evidence, that the hypothesis is true." (William W Rozeboom, "The fallacy of the null–hypothesis significance test", Psychological Bulletin 57, 1960)

"The null hypothesis of no difference has been judged to be no longer a sound or fruitful basis for statistical investigation. […] Significance tests do not provide the information that scientists need, and, furthermore, they are not the most effective method for analyzing and summarizing data." (Cherry A Clark, "Hypothesis Testing in Relation to Statistical Methodology", Review of Educational Research Vol. 33, 1963)

