20 March 2022

On Inquiry XI: Inquiry in Data Science I

"There is no inquiry which is not finally reducible to a question of Numbers; for there is none which may not be conceived of as consisting in the determination of quantities by each other, according to certain relations." (Auguste Comte, "The Positive Philosophy", 1830)

"[…] the data with which any scientific inquiry has to do are trivialities in some other bearing than that one in which they are of account." (Thorstein Veblen, "The Place of Science in Modern Civilisation and Other Essays", 1906)

"The postulate of randomness thus resolves itself into the question, 'of what population is this a random sample?' which must frequently be asked by every practical statistician." (Ronald Fisher, "On the Mathematical Foundations of Theoretical Statistics", Philosophical Transactions of the Royal Society of London Vol. A222, 1922)

"Statistics are numerical statements of facts in any department of inquiry, placed in relation to each other; statistical methods are devices for abbreviating and classifying the statements and making clear the relations." (Arthur L Bowley, "An Elementary Manual of Statistics", 1934)

"Only by the analysis and interpretation of observations as they are made, and the examination of the larger implications of the results, is one in a satisfactory position to pose new experimental and theoretical questions of the greatest significance." (John A Wheeler, "Elementary Particle Physics", American Scientist, 1947)

"Errors of the third kind happen in conventional tests of differences of means, but they are usually not considered, although their existence is probably recognized. It seems to the author that there may be several reasons for this among which are 1) a preoccupation on the part of mathematical statisticians with the formal questions of acceptance and rejection of null hypotheses without adequate consideration of the implications of the error of the third kind for the practical experimenter, 2) the rarity with which an error of the third kind arises in the usual tests of significance." (Frederick Mosteller, "A k-Sample Slippage Test for an Extreme Population", The Annals of Mathematical Statistics 19, 1948)

"Almost any sort of inquiry that is general and not particular involves both sampling and measurement […]. Further, both the measurement and the sampling will be imperfect in almost every case. We can define away either imperfection in certain cases. But the resulting appearance of perfection is usually only an illusion." (Frederick Mosteller et al, "Principles of Sampling", Journal of the American Statistical Association Vol. 49 (265), 1954)

"The most important maxim for data analysis to heed, and one which many statisticians seem to have shunned, is this: ‘Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.’ Data analysis must progress by approximate answers, at best, since its knowledge of what the problem really is will at best be approximate." (John W Tukey, "The Future of Data Analysis", Annals of Mathematical Statistics, Vol. 33, No. 1, 1962)

"At root what is needed for scientific inquiry is just receptivity to data, skill in reasoning, and yearning for truth. Admittedly, ingenuity can help too." (Willard V O Quine, "The Web of Belief", 1970)

"The purpose of models is not to fit the data but to sharpen the questions." (Samuel Karlin, 1983)

"Statistical models for data are never true. The question whether a model is true is irrelevant. A more appropriate question is whether we obtain the correct scientific conclusion if we pretend that the process under study behaves according to a particular statistical model." (Scott Zeger, "Statistical reasoning in epidemiology", American Journal of Epidemiology, 1991)

"[…] an honest exploratory study should indicate how many comparisons were made […] most experts agree that large numbers of comparisons will produce apparently statistically significant findings that are actually due to chance. The data torturer will act as if every positive result confirmed a major hypothesis. The honest investigator will limit the study to focused questions, all of which make biologic sense. The cautious reader should look at the number of ‘significant’ results in the context of how many comparisons were made." (James L Mills, "Data torturing", New England Journal of Medicine, 1993)

"Consideration needs to be given to the most appropriate data to be collected. Often the temptation is to collect too much data and not give appropriate attention to the most important. Filing cabinets and computer files world-wide are filled with data that have been collected because they may be of interest to someone in future. Most is never of interest to anyone and if it is, its existence is unknown to those seeking the information, who will set out to collect the data again, probably in a trial better designed for the purpose. In general, it is best to collect only the data required to answer the questions posed, when setting up the trial, and plan another trial for other data in the future, if necessary." (P Portmann & H Ketata, "Statistical Methods for Plant Variety Evaluation", 1997)
