29 October 2024

David G Green - Collected Quotes

"Although it might be intuitively apparent that a system is complex, defining complexity has proved difficult to pin down with numerous definitions on record. As yet there is no agreed theory of complexity. Much of the mathematics is intractable and computer simulation plays a major part." (Terry R J Bossomaier & David G Green, 2000) 

"Although many natural phenomena may result from the interaction of complex entities, the details of the components may be unimportant. In the discussion of neural networks, the individual neuron turns out to be a highly sophisticated biological system. But the collective properties of neurons may be captured by spin-glass models, in which the neuron is simplified to a binary quantity […] " (Terry R J Bossomaier & David G Green, 2000) 

"Interaction: the other major source of complexity is the interaction of many autonomous, adaptive agents. Again, there are many questions to ask about the agents, the nature of the interaction and the circumstances in which complex surface phenomena result. (Terry R J Bossomaier & David G. Green, 2000)

"Iteration: fractals and chaos result from repetition of simple operations. These generating rules produce complex phenomena. There are many interesting questions to ask about how to describe the processes, how to measure the resulting complexity, whether we can work backwards from the reult to the rules and so on. (Terry R J Bossomaier & David G. Green, 2000)

"Of course what we would all like to see is a general theory of complex systems or complexity. Despite several promising candidates the selection process is still under way. Maybe there is no universal theory, but there are certainly common paradigms and methods which have proved to be useful across a wide area." (Terry R J Bossomaier & David G Green, 2000) 

" The evident power of simple heuristics […] teaches us the important lesson that global behavior patterns, and social organization, can emerge out of local interactions. Organisms do not necessarily need to have an over-riding plan nor do they require awareness of the large-scale. Grand patterns and processes can emerge as the nett effect of small-scale, local behavior.

"If entropy must increase, then how is it possible (say) for all the variety of the living world to persist? The usual answer to the above question is that living systems are open systems, not closed, so the law does not apply locally. However this answer is somewhat unsatisfying. In effect all systems are open systems, since everything interacts with its surroundings to some degree." (David G Green, 2000) 

"The really crucial question in multi-object systems is whether local interactions do grow into large-scale patterns." (David G Green, 2000) 

"The self-similarity on different scales arises because growth often involves iteration of simple, discrete processes (e.g. branching). These repetitive processes can often be summarized as sets of simple rules." (David G Green, 2000) 

27 October 2024

Paul Klee - Collected Quotes

"Our initial perplexity before nature is explained by our seeing at first the small outer branches and not penetrating to the main branches or the trunk. But once this is realized, one will perceive a repetition of the whole law even in the outermost leaf and turn it to good use." (Paul Klee, [diary entry] 1904)

"When looking at any significant work of art, remember that a more significant one probably has had to be sacrificed." (Paul Klee, [diary entry] 1904)

"The beautiful, which is perhaps inseparable from art, is not after all tied to the subject, but to the pictorial representation. In this way and in no other does art overcome the ugly without avoiding it." (Paul Klee, [diary entry] 1905)

"Things are not quite so simple with 'pure' art as it is dogmatically claimed. In the final analysis, a drawing simply is no longer a drawing, no matter how self-sufficient its execution may be. It is a symbol, and the more profoundly the imaginary lines of projection meet higher dimensions, the better. In this sense I shall never be a pure artist as the dogma defines him. We higher creatures are also mechanically produced children of God, and yet intellect and soul operate within us in completely different dimensions." (Paul Klee, [diary entry] 1905)

"Nature can afford to be prodigal in everything, the artist must be frugal down to the last detail."  Paul Klee, [diary entry] 1909)

"First of all, the art of living; then as my ideal profession, poetry and philosophy, and as my real profession, plastic arts; in the last resort, for lack of income, illustrations." (Paul Klee, cca. 1910

"Graphic work as the expressive movement of the hand holding the recording pencil.... is so fundamentally different from dealing with tone and color that one can use this technique quite well in the dark, even in the blackest night. On the other hand, tone (movement from light to dark) presupposes some light, and color presupposes a great deal of light." (Paul Klee, 1912)

"We document, explain, justify, construct, organize: these are good things, but we do not succeed in coming to the whole [...]. But we may as well calm down: construction is not absolute. Our virtue is this: by cultivating the exact we have laid the foundations for a science of art, including the unknown X." (Paul Klee, "Statement of 1917"

"A tendency toward the abstract is inherent in linear expression: graphic imagery being confined to outlines has a fairy-like quality and at the same time can achieve great precision." (Paul Klee, "Creative Credo", 1920)

"Things appear to assume a broader and more diversified meaning, often seemingly contradicting the rational experience of yesterday. There is a striving to emphasize the essential character of the accidental." (Paul Klee, "Creative Credo", 1920)

"For the artist communication with nature remains the most essential condition. The artist is human; himself nature; part of nature within natural space." (Paul Klee, 1923)

"It is possible that a picture will move far away from Nature and yet find its way back to reality. The faculty of memory, experience at a distance produces pictorial associations." (Paul Klee, cca. 1925)

"Thought is the medially between earth and world. The broader the magnitude of his reach, the more painful man's tragic limitation. Thought is the medially between earth and world. The broader the magnitude of his reach, the more painful man's tragic limitation. To get where motion is interminate." ( Paul Klee, "Pedagogical Sketch Book, 1925)

"The longer a line, the more of the time element it contains. Distance is time whereas a surface is apprehended more in terms of the moment." (Paul Klee, "Exact Experiments in the Realm of Art", 1927)

"What had already been done for music by the end of the eighteenth century has at last been begun for the pictorial arts. Mathematics and physics furnished the means in the form of rules to be followed and to be broken. In the beginning it is wholesome to be concerned with the functions and to disregard the finished form. Studies in algebra, in geometry, in mechanics characterize teaching directed towards the essential and the functional, in contrast to apparent. One learns to look behind the façade, to grasp the root of things. One learns to recognize the undercurrents, the antecedents of the visible. One learns to dig down, to uncover, to find the cause, to analyze." (Paul Klee, "Bauhaus prospectus", 1929)

"Art should be like a holiday: something to give a man the opportunity to see things differently and to change his point of view." (Paul Klee)

"It is interesting to observe how real the object remains, in spite of all abstractions." (Paul Klee)

26 October 2024

Richard B Braithwaite - Collected Quotes

"It has been a fortunate fact in the modern history of physical science that the scientist constructing a new theoretical system has nearly always found that the mathematics [...] required [...] had already been worked out by pure mathematicians for their own amusement [...] The moral for statesmen would seem to be that, for proper scientific 'planning' , pure mathematics should be endowed fifty years ahead of scientists." (Richard B Braithwaite, "Scientific Explanation: A Study of the Function of Theory, Probability and Law in Science", 1953)

"[...] no batch of observations, however large, either definitively rejects or definitively fails to reject the hypothesis H0." (Richard B Braithwaite, "Scientific Explanation: A Study of the Function of Theory, Probability and Law in Science", 1953)

"The peaks of science may appear to be floating in the clouds, but their foundations are in the hard facts of experience." (Richard B Braithwaite, "Scientific Explanation: A Study of the Function of Theory, Probability and Law in Science", 1953)

"The peculiarity of [...] statistical hypotheses is that they are not conclusively refutable by any experience." (Richard B Braithwaite, "Scientific Explanation: A Study of the Function of Theory, Probability and Law in Science", 1953)

"The ultimate justification for any scientific belief will depend upon the main purpose for which we think scientifically - that of predicting and thereby controlling the future." (Richard B Braithwaite, "Scientific Explanation: A Study of the Function of Theory, Probability and Law in Science", 1953)

"The world is not made up of empirical facts with the addition of the laws of nature: what we call the laws of nature are conceptual devices by which we organize our empirical knowledge and predict the future." (Richard B Braithwaite, "Scientific Explanation: A Study of the Function of Theory, Probability and Law in Science", 1953)

Howard Wainer - Collected Quotes

"Although arguments can be made that high data density does not imply that a graphic will be good, nor one with low density bad, it does reflect on the efficiency of the transmission of information. Obviously, if we hold clarity and accuracy constant, more information is better than less. One of the great assets of graphical techniques is that they can convey large amounts of information in a small space." (Howard Wainer, "How to Display Data Badly", The American Statistician Vol. 38(2), 1984) 

"The essence of a graphic display is that a set of numbers having both magnitudes and an order are represented by an appropriate visual metaphor - the magnitude and order of the metaphorical representation match the numbers. We can display data badly by ignoring or distorting this concept." (Howard Wainer, "How to Display Data Badly", The American Statistician Vol. 38(2), 1984)

"The standard error of most statistics is proportional to 1 over the square root of the sample size. God did this, and there is nothing we can do to change it." (Howard Wainer, "Improving Tabular Displays, With NAEP Tables as Examples and Inspirations", Journal of Educational and Behavioral Statistics Vol 22 (1), 1997)

"[…] a graph is nothing but a visual metaphor. To be truthful, it must correspond closely to the phenomena it depicts: longer bars or bigger pie slices must correspond to more, a rising line must correspond to an increasing amount. If a graphical depiction of data does not faithfully follow this principle, it is almost sure to be misleading. But the metaphoric attachment of a graphic goes farther than this. The character of the depiction ism a necessary and sufficient condition for the character of the data. When the data change, so too must their depiction; but when the depiction changes very little, we assume that the data, likewise, are relatively unchanging. If this convention is not followed, we are usually misled." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"A graphic display has many purposes, but it achieves its highest value when it forces us to see what we were not expecting." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Nothing that had been produced before was even close. Even today, after more than two centuries of graphical experience, Playfair’s graphs remain exemplary standards for clearcommunication of quantitative phenomena. […] Graphical forms were available before Playfair, but they were rarely used to plot empirical information." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Oftentimes a statistical graphic provides the evidence for a plausible story, and the evidence, though perhaps only circumstantial, can be quite convincing. […] But such graphical arguments are not always valid. Knowledge of the underlying phenomena and additional facts may be required." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Placing a fact within a context increases its value greatly. […] . An efficacious way to add context to statistical facts is by embedding them in a graphic. Sometimes the most helpful context is geographical, and shaded maps come to mind as examples. Sometimes the most helpful context is temporal, and time-based line graphs are the obvious choice. But how much time? The ending date (today) is usually clear, but where do you start? The starting point determines the scale. […] The starting point and hence the scale are determined by the questions that we expect the graph to answer." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Simpson’s Paradox can occur whenever data are aggregated. If data are collapsed across a subclassification (such as grades, race, or age), the overall difference observed may not represent what is going on. Standardization can help correct this, but nothing short of random assignment of individuals to groups will prevent the possibility of yet another subclassificatiion, as yet unidentified, changing things around again. But I believe that knowing of the possibility helps us, so that we can contain the enthusiasm of our impulsive first inferences." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"The appearance, and hence the perception, of any statistical graphic is massively influenced by the choice of scale. If the scale of the vertical axis is too narrow relative to the scale of the horizontal axis, random meanders look meaningful." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"The difficult task of properly setting the scale of a graph remains difficult but not mysterious. There is agreement among experts spanning two hundred years. The default option should be to choose a scale that fills the plot with data. We can deviate from this under circumstances when it is better not to fill the plot with data, but those circumstances are usually clear. It is important to remember that the sin of using too small a scale is venial; the viewer can correct it. The sin of using too large a scale cannot be corrected without access to the original data; it can be mortal." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Usually the effectiveness of a good display increases with the complexity of the data. When there are only a few points, almost anything will do; even a pie chart with only three or four categories is usually comprehensible." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Thus when we look at, or prepare, a time-based statistical graphic, it is important to ask what is the right time scale, the right context, for the questions of greatest interest. The answer to this question is sometimes complex, but the very act of asking it provides us with some protection against surprises." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"The only thing we know for sure about a missing data point is that it is not there, and there is nothing that the magic of statistics can do change that. The best that can be managed is to estimate the extent to which missing data have influenced the inferences we wish to draw." (Howard Wainer, "14 Conversations About Three Things", Journal of Educational and Behavioral Statistics Vol. 35(1, 2010)

"For an analyst to willfully avoid learning about the science is akin to malfeasance. Of course, it is likely that a deep understanding both of the science and of data analytic methods does not reside in the same person. When it does not, data analysis should be done jointly. It is my understanding that data mining is not often done as a team. This is unfortunate, for then it is too easy to miss what might have been found." (Howard Wainer, Comment, Journal of Computational and Graphical Statistics Vol. 20(1), 2011)

"Too often there is a disconnect between the people who run a study and those who do the data analysis. This is as predictable as it is unfortunate. If data are gathered with particular hypotheses in mind, too often they (the data) are passed on to someone who is tasked with testing those hypotheses and who has only marginal knowledge of the subject matter. Graphical displays, if prepared at all, are just summaries or tests of the assumptions underlying the tests being done. Broader displays, that have the potential of showing us things that we had not expected, are either not done at all, or their message is not able to be fully appreciated by the data analyst." (Howard Wainer, Comment, Journal of Computational and Graphical Statistics Vol. 20(1), 2011)

24 October 2024

Clay Helberg - Collected Quotes

"Another key element in making informative graphs is to avoid confounding design variation with data variation. This means that changes in the scale of the graphic should always correspond to changes in the data being represented." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995) 

"Another trouble spot with graphs is multidimensional variation. This occurs where two-dimensional figures are used to represent one-dimensional values. What often happens is that the size of the graphic is scaled both horizontally and vertically according to the value being graphed. However, this results in the area of the graphic varying with the square of the underlying data, causing the eye to read an exaggerated effect in the graph." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995) 

"It may be helpful to consider some aspects of statistical thought which might lead many people to be distrustful of it. First of all, statistics requires the ability to consider things from a probabilistic perspective, employing quantitative technical concepts such as 'confidence', 'reliability', 'significance'. This is in contrast to the way non-mathematicians often cast problems: logical, concrete, often dichotomous conceptualizations are the norm: right or wrong, large or small, this or that." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995) 

"[...] many non-mathematicians hold quantitative data in a sort of awe. They have been lead to believe that numbers are, or at least should be, unquestionably correct." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995) 

"Most statistical models assume error free measurement, at least of independent (predictor) variables. However, as we all know, measurements are seldom if ever perfect. Particularly when dealing with noisy data such as questionnaire responses or processes which are difficult to measure precisely, we need to pay close attention to the effects of measurement errors. Two characteristics of measurement which are particularly important in psychological measurement are reliability and validity." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995) 

"Remember that a p-value merely indicates the probability of a particular set of data being generated by the null model - it has little to say about the size of a deviation from that model (especially in the tails of the distribution, where large changes in effect size cause only small changes in p-values)." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995)

"There are a number of ways that statistical techniques can be misapplied to problems in the real world. Three of the most common hazards are designing experiments with insufficient power, ignoring measurement error, and performing multiple comparisons." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995)

"We can consider three broad classes of statistical pitfalls. The first involves sources of bias. These are conditions or circumstances which affect the external validity of statistical results. The second category is errors in methodology, which can lead to inaccurate or invalid results. The third class of problems concerns interpretation of results, or how statistical results are applied (or misapplied) to real world issues." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995) 

References:
[1] Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995

20 October 2024

On Probability (2000 - )

"In the laws of probability theory, likelihood distributions are fixed properties of a hypothesis. In the art of rationality, to explain is to anticipate. To anticipate is to explain." (Eliezer S. Yudkowsky, "A Technical Explanation of Technical Explanation", 2005)

"I have always thought that statistical design and sampling from populations should be the first courses taught, but all elementary courses I know of start with statistical methods or probability. To me, this is putting the cart before the horse!" (Walter Federer, "A Conversation with Walter T Federer", Statistical Science Vol 20, 2005)

"For some scientific data the true value cannot be given by a constant or some straightforward mathematical function but by a probability distribution or an expectation value. Such data are called probabilistic. Even so, their true value does not change with time or place, making them distinctly different from  most statistical data of everyday life." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"In fact, H [entropy] measures the amount of uncertainty that exists in the phenomenon. If there were only one event, its probability would be equal to 1, and H would be equal to 0 - that is, there is no uncertainty about what will happen in a phenomenon with a single event because we always know what is going to occur. The more events that a phenomenon possesses, the more uncertainty there is about the state of the phenomenon. In other words, the more entropy, the more information." (Diego Rasskin-Gutman, "Chess Metaphors: Artificial Intelligence and the Human Mind", 2009)

"The four questions of data analysis are the questions of description, probability, inference, and homogeneity. [...] Descriptive statistics are built on the assumption that we can use a single value to characterize a single property for a single universe. […] Probability theory is focused on what happens to samples drawn from a known universe. If the data happen to come from different sources, then there are multiple universes with different probability models.  [...] Statistical inference assumes that you have a sample that is known to have come from one universe." (Donald J Wheeler," Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

"When statisticians, trained in math and probability theory, try to assess likely outcomes, they demand a plethora of data points. Even then, they recognize that unless it’s a very simple and controlled action such as flipping a coin, unforeseen variables can exert significant influence." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Entropy is a measure of amount of uncertainty or disorder present in the system within the possible probability distribution. The entropy and amount of unpredictability are directly proportional to each other." (G Suseela & Y Asnath V Phamila, "Security Framework for Smart Visual Sensor Networks", 2019)

On Probability (1975 - 1999)

"Of course, we know the laws of trial and error, of large numbers and probabilities. We know that these laws are part of the mathematical and mechanical fabric of the universe, and that they are also at play in biological processes. But, in the name of the experimental method and out of our poor knowledge, are we really entitled to claim that everything happens by chance, to the exclusion of all other possibilities?" (Albert Claude, "The Coming of Age of the Cell", Science, 1975)

"We often use the ideas of chance, likelihood, or probability in everyday language. For example, 'It is unlikely to rain today', 'The black horse will probably win the next race', or 'A playing card selected at random from a pack is unlikely to be the ace of spades' . Each of these remarks, if accepted at face value, is likely to reflect the speaker's expectation based on experience gained in the same position, or similar positions, on many  previous occasions. In order to be quantitative about probability, we focus on this aspect of repeatable situations." (Peter Lancaster, "Mathematics: Models of the Real World", 1976)

"The theory of probability is the only mathematical tool available to help map the unknown and the uncontrollable. It is fortunate that this tool, while tricky, is extraordinarily powerful and convenient." (Benoit Mandelbrot, "The Fractal Geometry of Nature", 1977)

"In decision theory, mathematical analysis shows that once the sampling distribution, loss function, and sample are specified, the only remaining basis for a choice among different admissible decisions lies in the prior probabilities. Therefore, the logical foundations of decision theory cannot be put in fully satisfactory form until the old problem of arbitrariness (sometimes called 'subjectiveness') in assigning prior probabilities is resolved." (Edwin T Jaynes, "Prior Probabilities", 1978)

"Another reason for the applied statistician to care about Bayesian inference is that consumers of statistical answers, at least interval estimates, commonly interpret them as probability statements about the possible values of parameters. Consequently, the answers statisticians provide to consumers should be capable of being interpreted as approximate Bayesian statements." (Donald B Rubin, "Bayesianly justifiable and relevant frequency calculations for the applied statistician", Annals of Statistics 12(4), 1984)

"In the path-integral formulation, the essence of quantum physics may be summarized with two fundamental rules: (1). The classical action determines the probability amplitude for a specific chain of events to occur, and (2) the probability that either one or the other chain of events occurs is determined by the probability amplitudes corresponding to the two chains of events. Finding these rules represents a stunning achievement by the founders of quantum physics." (Anthony Zee, "Fearful Symmetry: The Search for Beauty in Modern Physics", 1986)

"In the design of experiments, one has to use some informal prior knowledge. How does one construct blocks in a block design problem for instance? It is stupid to think that use is not made of a prior. But knowing that this prior is utterly casual, it seems ludicrous to go through a lot of integration, etc., to obtain ‘exact’ posterior probabilities resulting from this prior. So, I believe the situation with respect to Bayesian inference and with respect to inference, in general, has not made progress. Well, Bayesian statistics has led to a great deal of theoretical research. But I don’t see any real utilizations in applications, you know. Now no one, as far as I know, has examined the question of whether the inferences that are obtained are, in fact, realized in the predictions that they are used to make." (Oscar Kempthorne, "A conversation with Oscar Kempthorne", Statistical Science vol. 10, 1995)

"Events may appear to us to be random, but this could be attributed to human ignorance about the details of the processes involved." (Brain S Everitt, "Chance Rules", 1999)

On Convergence IV

"Once more, an invariably-recurring lesson of geological history, at whatever point its study is taken up: the lesson of the almost infinite slowness of the modification of living forms. The lines of the pedigrees of living things break off almost before they begin to converge." (Thomas H Huxley, On the Formation of Coal, 1870)

"Analytic functions are those that can be represented by a power series, convergent within a certain region bounded by the so-called circle of convergence. Outside of this region the analytic function is not regarded as given a priori ; its continuation into wider regions remains a matter of special investigation and may give very different results, according to the particular case considered." (Felix Klein, "Sophus Lie", [lecture] 1893)

"Particular landforms or surface morphologies may be generated, in some cases, by several different processes, sets of environmental controls, or developmental histories. This convergence to similar forms despite variations in processes and controls is called equifinality." (Jonathan Phillips, "Simplexity and the Reinvention of Equifinality", Geographical Analysis Vol. 29 (1), 1997)

"The underlying reason for convergence seems to be that all organisms are under constant scrutiny of natural selection and are also subject to the constraints of the physical and chemical factors that severely limit the action of all inhabitants of the biosphere. Put simply, convergence shows that in a real world not all things are possible." (Simon C Morris, "The Crucible of Creation", 1998)

"Equifinality is the principle which states that morphology alone cannot be used to reconstruct the mode of origin of a landform on the grounds that identical landforms can be produced by a number of alternative processes, process assemblages or process histories. Different processes may lead to an apparent similarity in the forms produced. For example, sea-level change, tectonic uplift, climatic change, change in source of sediment or water or change in storage may all lead to river incision and a convergence of form." (Olav Slaymaker, "Equifinality", 2004)

"Convergence is, in my opinion, not only deeply fascinating but, curiously, it is as often overlooked. More importantly, it hints at the existence of a deeper structure to biology. It helps us to delineate a metaphorical map across which evolution must navigate. In this sense the Darwinian mechanisms and the organic substrate we call life are really a search engine to discover particular solutions, including intelligence and - risky thought - perhaps deeper realities?" (Simon C Morris,  "Aliens like us?", Astronomy and Geophysics Vol. 46 (4), 2005)

"Sometimes the most important fit statistic you can get is ‘convergence not met’ - it can tell you something is wrong with your model." (Oliver Schabenberger, "Applied Statistics in Agriculture Conference", 2006)

On Regression IV

"One feature [...] which requires much more justification than is usually given, is the setting up of unplausible null hypotheses. For example, a statistician may set out a test to see whether two drugs have exactly the same effect, or whether a regression line is exactly straight. These hypotheses can scarcely be taken literally." (Cedric A B Smith, "Book review of Norman T. J. Bailey: Statistical Methods in Biology", Applied Statistics 9, 1960)

"Stepwise regression is probably the most abused computerized statistical technique ever devised. If you think you need stepwise regression to solve a particular problem you have, it is almost certain that you do not. Professional statisticians rarely use automated stepwise regression." (Leland Wilkinson, "SYSTAT", 1984)

"Someone has characterized the user of stepwise regression as a person who checks his or her brain at the entrance of the computer center." (Dick R Wittink, "The application of regression analysis", 1988)

"Multiple regression, like all statistical techniques based on correlation, has a severe limitation due to the fact that correlation doesn't prove causation. And no amount of measuring of 'control' variables can untangle the web of causality. What nature hath joined together, multiple regression cannot put asunder." (Richard Nisbett, "2014 : What scientific idea is ready for retirement?", 2013)

"Regression does not describe changes in ability that happen as time passes […]. Regression is caused by performances fluctuating about ability, so that performances far from the mean reflect abilities that are closer to the mean." (Gary Smith, "Standard Deviations", 2014)

"We encounter regression in many contexts - pretty much whenever we see an imperfect measure of what we are trying to measure. Standardized tests are obviously an imperfect measure of ability. [...] Each experimental score is an imperfect measure of “ability,” the benefits from the layout. To the extent there is randomness in this experiment - and there surely is - the prospective benefits from the layout that has the highest score are probably closer to the mean than was the score." (Gary Smith, "Standard Deviations", 2014)

"When a trait, such as academic or athletic ability, is measured imperfectly, the observed differences in performance exaggerate the actual differences in ability. Those who perform the best are probably not as far above average as they seem. Nor are those who perform the worst as far below average as they seem. Their subsequent performances will consequently regress to the mean." (Gary Smith, "Standard Deviations", 2014)

"Regression describes the relationship between an exploratory variable (i.e., independent) and a response variable (i.e., dependent). Exploratory variables are also referred to as predictors and can have a frequency of more than 1. Regression is being used within the realm of predictions and forecasting. Regression determines the change in response variable when one exploratory variable is varied while the other independent variables are kept constant. This is done to understand the relationship that each of those exploratory variables exhibits." (Danish Haroon, "Python Machine Learning Case Studies", 2017)

On Statistics: Bayesian Statistics

"Another reason for the applied statistician to care about Bayesian inference is that consumers of statistical answers, at least interval estimates, commonly interpret them as probability statements about the possible values of parameters. Consequently, the answers statisticians provide to consumers should be capable of being interpreted as approximate Bayesian statements." (Donald B Rubin, "Bayesianly justifiable and relevant frequency calculations for the applied statistician", Annals of Statistics 12(4), 1984)

"The practicing Bayesian is well advised to become friends with as many numerical analysts as possible." (James Berger, "Statistical Decision Theory and Bayesian Analysis", 1985)

"Subjective probability, also known as Bayesian statistics, pushes Bayes' theorem further by applying it to statements of the type described as 'unscientific' in the frequency definition. The probability of a theory (e.g. that it will rain tomorrow or that parity is not violated) is considered to be a subjective 'degree of belief - it can perhaps be measured by seeing what odds the person concerned will offer as a bet. Subsequent experimental evidence then modifies the initial degree of belief, making it stronger or weaker according to whether the results agree or disagree with the predictions of the theory in question." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"In the design of experiments, one has to use some informal prior knowledge. How does one construct blocks in a block design problem for instance? It is stupid to think that use is not made of a prior. But knowing that this prior is utterly casual, it seems ludicrous to go through a lot of integration, etc., to obtain 'exact' posterior probabilities resulting from this prior. So, I believe the situation with respect to Bayesian inference and with respect to inference, in general, has not made progress. Well, Bayesian statistics has led to a great deal of theoretical research. But I don't see any real utilizations in applications, you know. Now no one, as far as I know, has examined the question of whether the inferences that are obtained are, in fact, realized in the predictions that they are used to make." (Oscar Kempthorne, "A conversation with Oscar Kempthorne", Statistical Science, 1995)

"Bayesian computations give you a straightforward answer you can understand and use. It says there is an X% probability that your hypothesis is true-not that there is some convoluted chance that if you assume the null hypothesis is true, you’ll get a similar or more extreme result if you repeated your experiment thousands of times. How does one interpret THAT!" (Steven Goodman, "Bayes offers a new way to make sense of numbers", Science 19, 1999)

"Bayesian methods are complicated enough, that giving researchers user-friendly software could be like handing a loaded gun to a toddler; if the data is crap, you won’t get anything out of it regardless of your political bent." (Brad Carlin, "Bayes offers a new way to make sense of numbers", Science 19, 1999)

"I sometimes think that the only real difference between Bayesian and non-Bayesian hierarchical modelling is whether random effects are labeled with Greek or Roman letters." (Peter Diggle, "Comment on Bayesian analysis of agricultural field experiments", Journal of Royal Statistical Society B vol. 61, 1999)

"I believe that there are many classes of problems where Bayesian analyses are reasonable, mainly classes with which I have little acquaintance." (John Tukey, "The life and professional contributions of John W. Tukey, The Annals of Statistics", Vol 30, 2001)

"Bayesian statistics give us an objective way of combining the observed evidence with our prior knowledge (or subjective belief) to obtain a revised belief and hence a revised prediction of the outcome of the coin’s next toss. [...] This is perhaps the most important role of Bayes’s rule in statistics: we can estimate the conditional probability directly in one direction, for which our judgment is more reliable, and use mathematics to derive the conditional probability in the other direction, for which our judgment is rather hazy. The equation also plays this role in Bayesian networks; we tell the computer the forward  probabilities, and the computer tells us the inverse probabilities when needed." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"We thus echo the classical Bayesian literature in concluding that ‘noninformative prior information’ is a contradiction in terms. The flat prior carries information just like any other; it represents the assumption that the effect is likely to be large. This is often not true. Indeed, the signal-to-noise ratio s is often very low and then it is necessary to shrink the unbiased estimate. Failure to do so by inappropriately using the flat prior causes overestimation of effects and subsequent failure to replicate them." (Erik van Zwet & Andrew Gelman, "A proposal for informative default priors scaled by the standard error of estimates", The American Statistician 76, 2022)

Andrew Gelman - Collected Quotes

"The idea of optimization transfer is very appealing to me, especially since I have never succeeded in fully understanding the EM algorithm." (Andrew Gelman, "Discussion", Journal of Computational and Graphical Statistics vol 9, 2000)

"The difference between 'statistically significant' and 'not statistically significant' is not in itself necessarily statistically significant. By this, I mean more than the obvious point about arbitrary divisions, that there is essentially no difference between something significant at the 0.049 level or the 0.051 level. I have a bigger point to make. It is common in applied research–in the last couple of weeks, I have seen this mistake made in a talk by a leading political scientist and a paper by a psychologist–to compare two effects, from two different analyses, one of which is statistically significant and one which is not, and then to try to interpret/explain the difference. Without any recognition that the difference itself was not statistically significant." (Andrew Gelman, "The difference between ‘statistically significant’ and ‘not statistically significant’ is not in itself necessarily statistically significant", 2005)

"A naive interpretation of regression to the mean is that heights, or baseball records, or other variable phenomena necessarily become more and more 'average' over time. This view is mistaken because it ignores the error in the regression predicting y from x. For any data point xi, the point prediction for its yi will be regressed toward the mean, but the actual yi that is observed will not be exactly where it is predicted. Some points end up falling closer to the mean and some fall further." (Andrew Gelman & Jennifer Hill, "Data Analysis Using Regression and Multilevel/Hierarchical Models", 2007)

"You might say that there’s no reason to bother with model checking since all models are false anyway. I do believe that all models are false, but for me the purpose of model checking is not to accept or reject a model, but to reveal aspects of the data that are not captured by the fitted model." (Andrew Gelman, "Some thoughts on the sociology of statistics", 2007)

"It’s a commonplace among statisticians that a chi-squared test (and, really, any p-value) can be viewed as a crude measure of sample size: When sample size is small, it’s very difficult to get a rejection (that is, a p-value below 0.05), whereas when sample size is huge, just about anything will bag you a rejection. With large n, a smaller signal can be found amid the noise. In general: small n, unlikely to get small p-values. Large n, likely to find something. Huge n, almost certain to find lots of small p-values." (Andrew Gelman, "The sample size is huge, so a p-value of 0.007 is not that impressive", 2009)

"The arguments I lay out are, briefly, that graphs are a distraction from more serious analysis; that graphs can mislead in displaying compelling patterns that are not statistically significant and that could easily enough be consistent with chance variation; that diagnostic plots could be useful in the development of a model but do not belong in final reports; that, when they take the place of tables, graphs place the careful reader one step further away from the numerical inferences that are the essence of rigorous scientific inquiry; and that the effort spent making flashy graphics would be better spent on the substance of the problem being studied." (Andrew Gelman et al, "Why Tables Are Really Much Better Than Graphs", Journal of Computational and Graphical Statistics, Vol. 20(1), 2011)

"Graphs are gimmicks, substituting fancy displays for careful analysis and rigorous reasoning. It is basically a trade-off: the snazzier your display, the more you can get away with a crappy underlying analysis. Conversely, a good analysis does not need a fancy graph to sell itself. The best quantitative research has an underlying clarity and a substantive importance whose results are best presented in a sober, serious tabular display. And the best quantitative researchers trust their peers enough to present their estimates and standard errors directly, with no tricks, for all to see and evaluate." (Andrew Gelman et al, "Why Tables Are Really Much Better Than Graphs", Journal of Computational and Graphical Statistics, Vol. 20(1), 2011)

"Providing the right comparisons is important, numbers on their own make little sense, and graphics should enable readers to make up their own minds on any conclusions drawn, and possibly see more. On the Infovis side, computer scientists and designers are interested in grabbing the readers' attention and telling them a story. When they use data in a visualization (and data-based graphics are only a subset of the field of Infovis), they provide more contextual information and make more effort to awaken the readers' interest. We might argue that the statistical approach concentrates on what can be got out of the available data and the Infovis approach uses the data to draw attention to wider issues. Both approaches have their value, and it would probably be best if both could be combined." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks", Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"To put it simply, we communicate when we display a convincing pattern, and we discover when we observe deviations from our expectations. These may be explicit in terms of a mathematical model or implicit in terms of a conceptual model. How a reader interprets a graphic will depend on their expectations. If they have a lot of background knowledge, they will view the graphic differently than if they rely only on the graphic and its surrounding text." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks", Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"[…] we do see a tension between the goal of statistical communication and the more general goal of communicating the qualitative sense of a dataset. But graphic design is not on one side or another of this divide. Rather, design is involved at all stages, especially when several graphics are combined to contribute to the overall picture, something we would like to see more of." (Andrew Gelman & Antony Unwin, "Tradeoffs in Information Graphics", Journal of Computational and Graphical Statistics, 2013)

"Yes, it can sometimes be possible for a graph to be both beautiful and informative […]. But such synergy is not always possible, and we believe that an approach to data graphics that focuses on celebrating such wonderful examples can mislead people by obscuring the tradeoffs between the goals of visual appeal to outsiders and statistical communication to experts." (Andrew Gelman & Antony Unwin, "Tradeoffs in Information Graphics", Journal of Computational and Graphical Statistics, 2013) 

"Flaws can be found in any research design if you look hard enough. […] In our experience, it is good scientific practice to refine one's research hypotheses in light of the data. Working scientists are also keenly aware of the risks of data dredging, and they use confidence intervals and p-values as a tool to avoid getting fooled by noise. Unfortunately, a by-product of all this struggle and care is that when a statistically significant pattern does show up, it is natural to get excited and believe it. The very fact that scientists generally don't cheat, generally don't go fishing for statistical significance, makes them vulnerable to drawing strong conclusions when they encounter a pattern that is robust enough to cross the p < 0.05 threshold." (Andrew Gelman & Eric Loken, "The Statistical Crisis in Science", American Scientist Vol. 102(6), 2014)

"There are many roads to statistical significance; if data are gathered with no preconceptions at all, statistical significance can obviously be obtained even from pure noise by the simple means of repeatedly performing comparisons, excluding data in different ways, examining different interactions, controlling for different predictors, and so forth. Realistically, though, a researcher will come into a study with strong substantive hypotheses, to the extent that, for any given data set, the appropriate analysis can seem evidently clear. But even if the chosen data analysis is a deterministic function of the observed data, this does not eliminate the problem posed by multiple comparisons." (Andrew Gelman & Eric Loken, "The Statistical Crisis in Science", American Scientist Vol. 102(6), 2014)

"There is a growing realization that reported 'statistically significant' claims in statistical publications  are routinely mistaken. Researchers typically express the confidence in their data in terms of p-value: the probability that a perceived result is actually the result of random variation. The value of p (for 'probability') is a way of measuring the extent to which a data set provides evidence against a so-called null hypothesis. By convention, a p- value below 0.05 is considered a meaningful refutation of the null hypothesis; however, such conclusions are less solid than they appear." (Andrew Gelman & Eric Loken, "The Statistical Crisis in Science", American Scientist Vol. 102(6), 2014)

"I agree with the general message: 'The right variables make a big difference for accuracy. Complex statistical methods, not so much.' This is similar to something Hal Stern told me once: the most important aspect of a statistical analysis is not what you do with the data, it’s what data you use." (Andrew Gelman, "The most important aspect of a statistical analysis is not what you do with the data, it’s what data you use", 2018)

"We thus echo the classical Bayesian literature in concluding that ‘noninformative prior information’ is a contradiction in terms. The flat prior carries information just like any other; it represents the assumption that the effect is likely to be large. This is often not true. Indeed, the signal-to-noise ratios is often very low and then it is necessary to shrink the unbiased estimate. Failure to do so by inappropriately using the flat prior causes overestimation of effects and subsequent failure to replicate them." (Erik van Zwet & Andrew Gelman, "A proposal for informative default priors scaled by the standard error of estimates", The American Statistician 76, 2022)

"Taking a model too seriously is really just another way of not taking it seriously at all." (Andrew Gelman)

19 October 2024

Michael Friendly - Collected Quotes

"Like good writing, producing an effective graphical display requires an understanding of purpose - what is to be communicated, and to whom." (Michael Friendly, "Gallery of Data Visualization", 1991)

"It is common to think of statistical graphics and data visualization as relatively modern developments in statistics. In fact, the graphic representation of quantitative information has deep roots. These roots reach into the histories of the earliest map-making and visual depiction, and later into thematic cartography, statistics and statistical graphics, medicine, and other fields. Along the way, developments in technologies (printing, reproduction) mathematical theory and practice, and empirical observation and recording, enabled the wider use of graphics and new advances in form and content." (Michael Friendly. "A brief history of data visualization", 2006)

"The graphic portrayal of quantitative information has deep roots. These roots reach into histories of thematic cartography, statistical graphics, and data visualization, which are intertwined with each other." (Michael Friendly. "Milestones in the history of thematic cartography, statistical graphics, and data visualization", 2008) 

"Algorithmic calculation can give only pseudo-random numbers, but some methods come closer than others in behaving like quantities that are truly random, such as numbers obtained from tossing a very large number of dice." (Michael Friendly. "Milestones in the history of thematic cartography, statistical graphics, and data visualization", 2008) 

"But to a ballet dancer, the art is in getting all the body parts to do those things in sync with a musical score to tell a wordless story of emotion entirely through change in position over time. In data visualization, as in physics and ballet, motion is a manifestation of the relation between time and space, and so the recording and display of motion added time as a fourth dimension to the abstract world of data." (Michael Friendly. "Milestones in the history of thematic cartography, statistical graphics, and data visualization", 2008) 

"Correlation does not imply causation: often some other missing third variable is influencing both of the variables you are correlating. […] The need for a scatterplot arose when scientists had to examine bivariate relations between distinct variables directly. As opposed to other graphic forms - pie charts, line graphs, and bar charts - the scatterplot offered a unique advantage: the possibility to discover regularity in empirical data (shown as points) by adding smoothed lines or curves designed to pass 'not through, but among them', so as to pass from raw data to a theory-based description, analysis, and understanding." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"However, just as in cooking, the details matter: the wrong spice can ruin the stew. In graphing data, different methods or graphical features can make it easier or harder to perceive and understand relationships or comparisons from the same data." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"Indeed, among all forms of statistical graphics, the scatterplot may be considered the most versatile and generally useful invention in the entire history of statistical graphics. Essential characteristics of a scatterplot are that two quantitative variables are measured on the same observational units (workers); the values are plotted as points referred to perpendicular axes; and the goal is to show something about the relation between these variables, typically how the ordinate variable, y, varies with the abscissa variable, x." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021

"Its primary function was to make previously invisible phenomena subject to direct inspection in a graphic display […] The graphic method had another function, that of communication to the scientific community and educated readers. These displays made complex phenomena palpable and concrete." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"Our central questions in this book are 'How did the graphic depiction of numbers arise?' and more importantly, 'Why?' What led to the key innovations in graphs and diagrams that are commonplace today? What were the circumstances or scientific problems that made visual depiction more useful than mere words and numbers? Finally, how did these graphic inventions make a difference in comprehending natural and social phenomena and communicating that understanding?" (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"[...] scatterplots had advantages over earlier graphic forms: the ability to see clusters, patterns, trends, and relations in a cloud of points. Perhaps most importantly, it allowed the addition of visual annotations (point symbols, lines, curves, enclosing contours, etc.) to make those relationships more coherent and tell more nuanced stories." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"The general principles of starting with a well-defined question, engaging in careful observation, and then formulating hypotheses and assessing the strength of evidence for and against them became known as the scientific method." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"The plotting of real data had a remarkable, and largely unanticipated, benefit. It often forced viewers to see what they hadn’t expected. The frequency with which this happened gave birth to the empirical modern approach to science which welcomes the plotting of observed data values with the goal of investigating suggestive patterns." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"Visual displays of empirical information are too often thought to be just compact summaries that, at their best, can clarify a muddled situation. This is partially true, as far as it goes, but it omits the magic. […] sometimes, albeit too rarely, the combination of critical questions addressed by important data and illuminated by evocative displays can achieve a transcendent, and often wholly unexpected, result. At their best, visualizations can communicate emotions and feelings in addition to cold, hard facts."  (Michael Friendly. "Milestones in the history of thematic cartography, statistical graphics, and data visualization", 2008) 

"We are accustomed to intellectual diffusion taking place from the natural and physical sciences into the social sciences; certainly that is the direction taken for both calculus and the scientific method. But statistical graphics in particular, and statistics in general, took the reverse route." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"We live on islands surrounded by seas of data. Some call it 'big data'. In these seas live various species of observable phenomena. Ideas, hypotheses, explanations, and graphics also roam in the seas of data and can clarify the waters or allow unsupported species to die. These creatures thrive on visual explanation and scientific proof. Over time new varieties of graphical species arise, prompted by new problems and inner visions of the fishers in the seas of data." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

On Method VI (Scientific Method)

"[...] scientific method is simply the attempt to acquire knowledge of general laws directly or indirectly by experience, by the use of our five senses. The only limitations that can be assigned to the applicability of this process are those due to the character of experience. Anything that is logically related to experience by discoverable laws and is capable of description in general terms can be dealt with by the scientific method." (Arthur D Ritchie, "Scientific Method: An Inquiry Into the Character and Validity of Natural Laws", 1923)

"Science attempts to establish an understanding of all types of phenomena. Many different explanations can sometimes be given that agree qualitatively with experiments or observations. However, when theory and experiment quantitatively agree, then we can usually be more confident in the validity of the theory. In this manner mathematics becomes an integral part of the scientific method." (Richard Haberman, "Mathematical Models: Mechanical Vibrations, Population Dynamics, and Traffic Flow", 1998)

"Scientific method is not much different from our day-to-day ways of learning about the world. Without really thinking about the steps or the standards, common sense invokes the same process of evidence and reasoning as scientists more explicitly follow." (Peter Kosso, "A Summary of Scientific Method", 2011)

"Scientific method is the gateway into scientific discoveries that in turn prompt technological advances and cultural influences." (Hugh G Gauch Jr., "Scientific Method in Brief", 2012)

"The traditional scientific method is hypothesis driven. The researcher formulates a theory of how the world works, and then seeks to support or reject this hypothesis based on data." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Its primary function was to make previously invisible phenomena subject to direct inspection in a graphic display […] The graphic method had another function, that of communication to the scientific community and educated readers. These displays made complex phenomena palpable and concrete." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"The general principles of starting with a well-defined question, engaging in careful observation, and then formulating hypotheses and assessing the strength of evidence for and against them became known as the scientific method." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"We are accustomed to intellectual diffusion taking place from the natural and physical sciences into the social sciences; certainly that is the direction taken for both calculus and the scientific method. But statistical graphics in particular, and statistics in general, took the reverse route." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

06 October 2024

On Construction VII: Mental Models

"Physics is the attempt at the conceptual construction of a model of the real world and its lawful structure." (Albert Einstein, [letter to Moritz Schlick] 1931)

"[…] the process of scientific discovery may be regarded as a form of art. This is best seen in the theoretical aspects of Physical Science. The mathematical theorist builds up on certain assumptions and according to well understood logical rules, step by step, a stately edifice, while his imaginative power brings out clearly the hidden relations between its parts. A well-constructed theory is in some respects undoubtedly an artistic production." (Ernest Rutherford, 1932)

"In the realm of physics it is perhaps only the theory of relativity which has made it quite clear that the two essences, space and time, entering into our intuition, have no place in the world constructed by mathematical physics. Colours are thus 'really' not even æther-vibrations, but merely a series of values of mathematical functions in which occur four independent parameters corresponding to the three dimensions of space, and the one of time." (Hermann Weyl, "Space, Time, Matter", 1952)

"In physics it is usual to give alternative theoretical treatments of the same phenomenon. We construct different models for different purposes, with different equations to describe them. Which is the right model, which the 'true' set of equations? The question is a mistake. One model brings out some aspects of the phenomenon; a different model brings out others. Some equations give a rougher estimate for a quantity of interest, but are easier to solve. No single model serves all purposes best." (Nancy Cartwright, "How the Laws of Physics Lie", 1983)

"Physics is like that. It is important that the models we construct allow us to draw the right conclusions about the behaviour of the phenomena and their causes. But it is not essential that the models accurately describe everything that actually happens; and in general it will not be possible for them to do so, and for much the same reasons. The requirements of the theory constrain what can be literally represented. This does not mean that the right lessons cannot be drawn. Adjustments are made where literal correctness does not matter very much in order to get the correct effects where we want them; and very often, as in the staging example, one distortion is put right by another. That is why it often seems misleading to say that a particular aspect of a model is false to reality: given the other constraints that is just the way to restore the representation." (Nancy Cartwright, "How the Laws of Physics Lie", 1983)

"[…] most earlier attempts to construct a theory of complexity have overlooked the deep link between it and networks. In most systems, complexity starts where networks turn nontrivial. No matter how puzzled we are by the behavior of an electron or an atom, we rarely call it complex, as quantum mechanics offers us the tools to describe them with remarkable accuracy. The demystification of crystals-highly regular networks of atoms and molecules-is one of the major success stories of twentieth-century physics, resulting in the development of the transistor and the discovery of superconductivity. Yet, we continue to struggle with systems for which the interaction map between the components is less ordered and rigid, hoping to give self-organization a chance." (Albert-László Barabási, "Linked: How Everything Is Connected to Everything Else and What It Means for Business, Science, and Everyday Life", 2002)

"Just as physicists have created models of the atom based on observed data and intuitive synthesis of the patterns in their data, so must designers create models of users based on observed behaviors and intuitive synthesis of the patterns in the data. Only after we formalize such patterns can we hope to systematically construct patterns of interaction that smoothly match the behavior patterns, mental models, and goals of users. Personas provide this formalization." (Alan Cooper et al, "About Face 3: The Essentials of Interaction Design", 2007)

"There are actually two sides to the success of mathematics in explaining the world around us (a success that Wigner dubbed ‘the unreasonable effectiveness of mathematics’), one more astonishing than the other. First, there is an aspect one might call ‘active’. When physicists wander through nature’s labyrinth, they light their way by mathematics - the tools they use and develop, the models they construct, and the explanations they conjure are all mathematical in nature. This, on the face of it, is a miracle in itself. […] But there is also a ‘passive’ side to the mysterious effectiveness of mathematics, and it is so surprising that the 'active' aspect pales by comparison. Concepts and relations explored by mathematicians only for pure reasons - with absolutely no application in mind—turn out decades (or sometimes centuries) later to be the unexpected solutions to problems grounded in physical reality!" (Mario Livio, "Is God a Mathematician?", 2011)

On Construction VIII: Science

"Science gains from it [the pendulum] more than one can expect. With its huge dimensions, the apparatus presents qualities that one would try in vain to communicate by constructing it on a small [scale], no matter how carefully. Already the regularity of its motion promises the most conclusive results. One collects numbers that, compared with the predictions of theory, permit one to appreciate how far the true pendulum approximates or differs from the abstract system called 'the simple pendulum'." (Jean-Bernard-Léon Foucault, "Demonstration Experimentale du Movement de Rotation de la Terre", 1851)

"The invention of a new symbol is a step in the advancement of civilisation. Why were the Greeks, in spite of their penetrating intelligence and their passionate pursuit of Science, unable to carry Mathematics farther than they did? and why, having formed the conception of the Method of Exhaustions, did they stop short of that of the Differential Calculus? It was because they had not the requisite symbols as means of expression. They had no Algebra. Nor was the place of this supplied by any other symbolical language sufficiently general and flexible; so that they were without the logical instruments necessary to construct the great instrument of the Calculus." (George H Lewes "Problems of Life and Mind", 1873)

"We shall call this universal organizational science the 'Tektology'. The literal translation of this word from the Greek is 'the theory of construction'. 'Construction' is the most generaI and suitable synonym for the modern concept of 'organization'. [...] The aim of tektology is to systematize organizational experience; this science is clearly empirical and should draw its conclusions by way of induction." (Alexander Bogdanov, "Tektology: The Universal Organizational Science" Vol. I, 1913)

"It would be a mistake to suppose that a science consists entirely of strictly proved theses, and it would be unjust to require this. […] Science has only a few apodeictic propositions in its catechism: the rest are assertions promoted by it to some particular degree of probability. It is actually a sign of a scientific mode of thought to find satisfaction in these approximations to certainty and to be able to pursue constructive work further in spite of the absence of final confirmation." (Sigmund Freud, "Introductory Lectures on Psycho-Analysis", 1916)

"Science is a magnificent force, but it is not a teacher of morals. It can perfect machinery, but it adds no moral restraints to protect society from the misuse of the machine. It can also build gigantic intellectual ships, but it constructs no moral rudders for the control of storm tossed human vessel. It not only fails to supply the spiritual element needed but some of its unproven hypotheses rob the ship of its compass and thus endangers its cargo." (William J Bryan, "Undelivered Trial Summation Scopes Trial", 1925)

"Science aims at constructing a world which shall be symbolic of the world of commonplace experience." (Sir Arthur S Eddington, "The Nature of the Physical World", 1928)

"No doctrinal system in physical science, or indeed perhaps in any science, will alter its content of its own accord. Here we always need the pressure of outer circumstances. Indeed the more intelligible and comprehensive a theoretical system is the more obstinately it will resist all attempts at reconstruction or expansion." (Max Planck, "Where is Science Going?", 1932)

"A scientist, whether theorist or experimenter, puts forward statements, or systems of statements, and tests them step by step. In the field of the empirical sciences, more particularly, he constructs hypotheses, or systems of theories, and tests them against experience by observation and experiment." (Karl Popper, "The Logic of Scientific Discovery", 1934)

"Mathematics as an expression of the human mind reflects the active will, the contemplative reason, and the desire for aesthetic perfection. Its basic elements are logic and intuition, analysis and construction, generality and individuality. Though different traditions may emphasize different aspects, it is only the interplay of these antithetic forces and the struggle for their synthesis that constitute the life, usefulness, and supreme value of mathematical science." (Richard Courant & Herbert Robbins, "What Is Mathematics?", 1941)

"A theoretical science unaware that those of its constructs considered relevant and momentous are destined eventually to be framed in concepts and words that have a grip on the educated community and become part and parcel of the general world picture - a theoretical science [...]" (Erwin Schrödinger, "Are There Quantum Jumps?", The British Journal for the Philosophy of Science Vol. 3, 1952)

"[...] sciences do not try to explain, they hardly even try to interpret, they mainly make models. By a model is meant a mathematical construct which, with the addition of certain verbal interpretations, describes observed phenomena. The justification of such a mathematical construct is solely and precisely that it is expected to work - that is, correctly to describe phenomena from a reasonably wide area. Furthermore, it must satisfy certain aesthetic criteria - that is, in relation to how much it describes, it must be rather simple." (John von Neumann, "Method in the physical sciences", 1955)

"We realize, however, that all scientific laws merely represent abstractions and idealizations expressing certain aspects of reality. Every science means a schematized picture of reality, in the sense that a certain conceptual construct is unequivocally related to certain features of order in reality […]" (Ludwig von Bertalanffy, "General System Theory", 1968)

"Many people believe that reasoning, and therefore science, is a different activity from imagining. But this is a fallacy […] Reasoning is constructed with movable images just as certainly as poetry is." (Jacob Bronowski, "Visionary Eye", 1978)

"[…] the pursuit of science is more than the pursuit of understanding. It is driven by the creative urge, the urge to construct a vision, a map, a picture of the world that gives the world a little more beauty and coherence than it had before." (John A Wheeler, "Geons, Black Holes, and Quantum Foam: A Life in Physics", 1998)

On Constructs VII - Mind

"Ideas are substitutions which require a secondary process when what is symbolized by them is translated into the images and experiences it replaces; and this secondary process is frequently not performed at all, generally only performed to a very small extent. Let anyone closely examine what has passed in his mind when he has constructed a chain of reasoning, and he will be surprised at the fewness and faintness of the images which have accompanied the ideas." (George H Lewes "Problems of Life and Mind", 1873)

"The mind of man, learning consciously and unconsciously lessons of experience, gradually constructs a mental image of its surroundings - as the mariner draws a chart of strange coasts to guide him in future voyages, and to enable those that follow after him to sail the same seas with ease and safety." (William C Dampier, "The Recent Development of Physical Science", 1904)

"[…] we can only study Nature through our senses - that is […] we can only study the model of Nature that our senses enable our minds to construct; we cannot decide whether that model, consistent though it be, represents truly the real structure of Nature; whether, indeed, there be any Nature as an ultimate reality behind its phenomena." (William C Dampier, "The Recent Development of Physical Science", 1904)

"While the stuff from which our world picture is build is yielded exclusively from the sense organs as organs of the mind, so that every man's world picture is and always remains a construct of his mind and cannot be proved to have any other existence […]" (Erwin Schrodinger, "What is Life?", 1944)

"Science begins with the world we have to live in, accepting its data and trying to explain its laws. From there, it moves toward the imagination: it becomes a mental construct, a model of a possible way of interpreting experience." (Northrop Frye, "The Educated Imagination", 1964)

"We never have any understanding of any subject matter except in terms of our own mental constructs of 'things' and 'happenings' of that subject matter." (Douglas T Ross, "Structured analysis (SA): A language for communicating ideas", IEEE Transactions on Software Engineering Vol. 3 (1), 1977)

"Perhaps we all lose our sense of reality to the precise degree to which we are engrossed in our own work, and perhaps that is why we see in the increasing complexity of our mental constructs a means for greater understanding, even while intuitively we know that we shall never be able to fathom the imponderables that govern our course through life." (Winfried G Sebald, "The Rings of Saturn", 1995)

"The seemingly stable scene you normally see is really a mental model that you construct - the eyes are actually darting all around, producing a retinal image as jerky as an amateur video, and some of what you thought you saw was instead filled in from memory." (William H Calvin, "How Brains Think", 1996)

"A general limitation of the human mind is its imperfect ability to reconstruct past states of knowledge, or beliefs that have changed. Once you adopt a new view of the world (or any part of it), you immediately lose much of your ability to recall what you used to believe before your mind changed." (Daniel Kahneman, "Thinking, Fast and Slow", 2011)

Abraham Kaplan - Collected Quotes

"By and large, then, the important terms of any science are significant because of their semantics, not their syntax: they are not notational, but reach out to the world which gives the science its subject-matter. The meaning of such terms results rom a process of conceptualization of the subject-matter." (Abraham Kaplan, "The Conduct of Inquiry: Methodology for Behavioral Science", 1964)

"Concepts, then, mark out the paths by which we may move most freely in the logical space. They identify nodes or junctions in the network of relationships, termini at which we can halt while preserving the maximum range of choice as to where to go next." (Abraham Kaplan, "The Conduct of Inquiry: Methodology for Behavioral Science", 1964)

"Constructs are terms which, though not observational either directly or indirectly, may be applied and even defined on the basis of the observables. [...] Constructs, in other words, have systemic as well as observational meaning specified by horizontal rather than vertical connections Strictly speaking, the difference between constructs and theoretical terms can be localized in the nature of vertical connections alone:  for constructs the relation to observations is definitional, while for theoretical terms it is a matter of empirical fact." (Abraham Kaplan, "The Conduct of Inquiry: Methodology for Behavioral Science", 1964)

"Give a small boy a hammer, and he will find that everything he encounters needs pounding. It comes as no particular surprise to discover that a scientist formulates problems in a way which requires for their solution just those techniques in which he himself is especially skilled." (Abraham Kaplan, "The Conduct of Inquiry: Methodology for Behavioral Science", 1964)

"In consequence, we are caught up in a paradox, one which might be called the paradox of conceptualization. The proper concepts are needed to formulate a good theory, but we need a good theory to arrive at the proper concepts." (Abraham Kaplan, "The Conduct of Inquiry: Methodology for Behavioral Science", 1964)

"Measurement, we have seen, always has an element of error in it. The most exact description or prediction that a scientist can make is still only approximate." (Abraham Kaplan, "The Conduct of Inquiry: Methodology for Behavioral Science", 1964)

"Methods are techniques sufficient general to be common to all sciences, or to a significant part of them. Alternatively, they are logical or philosophical principles sufficiently specific to relate especially to science as distinguished from other human enterprises and interests. Thus, methods include such procedures as forming concepts and hypotheses making observations and measurements, performing experiments, building models and theories, providing explanations, and making predictions. The aim of methodology, then, is to describe and analyze these methods, throwing light on their limitations and resources, clarifying their presupposition and consequences, relating their potentialities to the twilight zone at the frontiers of knowledge." (Abraham Kaplan, "The Conduct of Inquiry: Methodology for Behavioral Science", 1964)

"[…] statistical techniques are tools of thought, and not substitutes for thought." (Abraham Kaplan, "The Conduct of Inquiry: Methodology for Behavioral Science", 1964)

"The autonomy of inquiry is in no way incompatible with the mature dependency of the several sciences on one another. Nor does this autonomy imply that the individual scientist is accountable only to himself." (Abraham Kaplan, "The Conduct of Inquiry: Methodology for Behavioral Science", 1964)

"We are caught up in a paradox, one which might be called the paradox of conceptualization. The proper concepts are needed to formulate a good theory, but we need a good theory to arrive at the proper concepts." (Abraham Kaplan, "The Conduct of Inquiry: Methodology for Behavioral Science", 1964)

"What knowledge requires of experience, and what experience provides, is an independence of our mere think-so. The pleasure principle governing the life of the infant gives way to the reality principle as wishes encounter obstacles to their fulfilment. The word 'object', it has been said, can be understood as referring to that which objects. That is objective which insists on its own rights regardless of our wishes, and only experience can transmit its claims to us." (Abraham Kaplan, "The Conduct of Inquiry: Methodology for Behavioral Science", 1964)

"What we call 'intuition' is any logic-in-use which is (1) preconscious, and (2) outside the inference schema for which we reconstructions. We speak of intuition, in short, M hen neither we nor the discoverer himself knows quite how he arrives at his discoveries, while the frequency or pattern of their occurrence makes us reluctant to ascribe them merely to chance." (Abraham Kaplan, "The Conduct of Inquiry: Methodology for Behavioral Science", 1964)

Book available in the Internet Archive.

05 October 2024

From Parts to Wholes (1980-1989)

"The principle that whole entities exhibit properties which are meaningful only when attributed to the whole, not to its parts - e.g. the smell of ammonia. Every model of human activity system exhibits properties as a whole entity which derive from it component activities and their structure, but cannot be reduced to them." (Peter Checkland, "Systems Thinking, Systems Practice", 1981)

"[Hierarchy is] the principle according to which entities meaningfully treated as wholes are built up of smaller entities which are themselves wholes […] and so on. In hierarchy, emergent properties denote the levels." (Peter Checkland, "Systems Thinking, Systems Practice", 1981)

"Degradation of a system as a whole does not mean that all its elements are beginning to disintegrate. Regress is a contradictory process: the whole falls apart but certain elements in it may progress. What is more, a system as a whole may progress while certain of its elements fall into decay. Thus, the progressive development of biological forms as a whole goes hand in hand with the degradation of certain species." (Alexander Spirkin, "Dialectical Materialism", 1983)

"Structure is the type of connection between the elements of a whole. […] . Structure is a composite whole, or an internally organised content. […] Structure implies not only the position of its elements in space but also their movement in time, their sequence and rhythm, the law of mutation of a process. So structure is actually the law or set of laws that determine a system's composition and functioning, its properties and stability." (Alexander Spirkin, "Dialectical Materialism", 1983)

"The defining attribute of harmony is a relationship between the elements of the whole in which the development of one of them is a condition for the development of the others or vice versa. In art, harmony may be understood as a form of relationship in which each element, while retaining a relative independence, contributes greater expressiveness to the whole and, at the same time and because of this, more fully expresses its own essence. Beauty may be defined as harmony of all the parts, united by that to which they belong in such a way that nothing can be added or taken away or changed without detriment to the whole." (Alexander Spirkin, "Dialectical Materialism", 1983)

02 October 2024

On Numbers: Large Numbers II

"A good description of the data summarizes the systematic variation and leaves residuals that look structureless. That is, the residuals exhibit no patterns and have no exceptionally large values, or outliers. Any structure present in the residuals indicates an inadequate fit. Looking at the residuals laid out in an overlay helps to spot patterns and outliers and to associate them with their source in the data." (Christopher H Schrnid, "Value Splitting: Taking the Data Apart", 1991)

"Skewness is a measure of symmetry. For example, it's zero for the bell-shaped normal curve, which is perfectly symmetric about its mean. Kurtosis is a measure of the peakedness, or fat-tailedness, of a distribution. Thus, it measures the likelihood of extreme values." (John L Casti, "Reality Rules: Picturing the world in mathematics", 1992)

"Data that are skewed toward large values occur commonly. Any set of positive measurements is a candidate. Nature just works like that. In fact, if data consisting of positive numbers range over several powers of ten, it is almost a guarantee that they will be skewed. Skewness creates many problems. There are visualization problems. A large fraction of the data are squashed into small regions of graphs, and visual assessment of the data degrades. There are characterization problems. Skewed distributions tend to be more complicated than symmetric ones; for example, there is no unique notion of location and the median and mean measure different aspects of the distribution. There are problems in carrying out probabilistic methods. The distribution of skewed data is not well approximated by the normal, so the many probabilistic methods based on an assumption of a normal distribution cannot be applied." (William S Cleveland, "Visualizing Data", 1993)

"The logarithm is one of many transformations that we can apply to univariate measurements. The square root is another. Transformation is a critical tool for visualization or for any other mode of data analysis because it can substantially simplify the structure of a set of data. For example, transformation can remove skewness toward large values, and it can remove monotone increasing spread. And often, it is the logarithm that achieves this removal." (William S Cleveland, "Visualizing Data", 1993)

"Factoring big numbers is a strange kind of mathematics that closely resembles the experimental sciences, where nature has the last and definitive word. […] as with the experimental sciences, both rigorous and heuristic analyses can be valuable in understanding the subject and moving it forward. And, as with the experimental sciences, there is sometimes a tension between pure and applied practitioners." (Carl B Pomerance, "A Tale of Two Sieves", The Notices of the American Mathematical Society 43, 1996)

"Clearly, the mean is greatly influenced by extreme values, but it can be appropriate for many situations where extreme values do not arise. To avoid misuse, it is essential to know which summary measure best reflects the data and to use it carefully. Understanding the situation is necessary for making the right choice. Know the subject!" (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

"Big numbers warn us that the problem is a common one, compelling our attention, concern, and action. The media like to report statistics because numbers seem to be 'hard facts' - little nuggets of indisputable truth. [...] One common innumerate error involves not distinguishing among large numbers. [...] Because many people have trouble appreciating the differences among big numbers, they tend to uncritically accept social statistics (which often, of course, feature big numbers)." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Use a logarithmic scale when it is important to understand percent change or multiplicative factors. […] Showing data on a logarithmic scale can cure skewness toward large values." (Naomi B Robbins, "Creating More effective Graphs", 2005) 

"Outliers or influential data points can be defined as data values that are extreme or atypical on either the independent (X variables) or dependent (Y variables) variables or both. Outliers can occur as a result of observation errors, data entry errors, instrument errors based on layout or instructions, or actual extreme values from self-report data. Because outliers affect the mean, the standard deviation, and correlation coefficient values, they must be explained, deleted, or accommodated by using robust statistics." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"Comparisons are the lifeblood of empirical studies. We can’t determine if a medicine, treatment, policy, or strategy is effective unless we compare it to some alternative. But watch out for superficial comparisons: comparisons of percentage changes in big numbers and small numbers, comparisons of things that have nothing in common except that they increase over time, comparisons of irrelevant data. All of these are like comparing apples to prunes." (Gary Smith, "Standard Deviations", 2014)

"It is not enough to give a single summary for a distribution - we need to have an idea of the spread, sometimes known as the variability. [...] The range is a natural choice, but is clearly very sensitive to extreme values [...] In contrast the inter-quartile range (IQR) is unaffected by extremes. This is the distance between the 25th and 75th percentiles of the data and so contains the ‘central half’ of the numbers [...] Finally the standard deviation is a widely used measure of spread. It is the most technically complex measure, but is only really appropriate for well-behaved symmetric data since it is also unduly influenced by outlying values." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

On Statisticians (2000 -)

"Statistics is, or should be, about scientific investigation and how to do it better, but many statisticians believe it is a branch of mathematics." (George Box, AmStat News 2000)

"We statisticians must accept much of the blame for cavalier attitudes toward Type I errors. When we teach practitioners in other scientific fields that multiplicity is not important, they believe us, and feel free to thrash their data set mercilessly, until it finally screams “uncle” and relinquishes significance. The recent conversion of the term 'data mining' to mean a statistical good rather than a statistical evil also contributes to the problem." (Peter H Westfall, "Applied Statistics in Agriculture", Proceedings of the 13th annual conference), 2001)

"At its core statistics is not about cleverness and technique, but rather about honesty. Its real contribution to society is primarily moral, not technical. It is about doing the right thing when interpreting empirical information. Statisticians are not the world’s best computer scientists, mathematicians, or scientific subject matter specialists. We are (potentially, at least) the best at the principled collection, summarization, and analysis of data." (Stephen B Vardeman & Max D Morris, "Statistics and Ethics: Some Advice for Young Statisticians", The American Statistician vol 57, 2003)

"Today [...] we have high-speed computers and prepackaged statistical routines to perform the necessary calculations [...] statistical software will no more make one a statistician than would a scalpel turn one into a neurosurgeon. Allowing these tools to do our thinking for us is a sure recipe for disaster." (Phillip Good & Hardin James, "Common Errors in Statistics and How to Avoid Them", 2003)

"Things are changing. Statisticians now recognize that computer scientists are making novel contributions while computer scientists now recognize the generality of statistical theory and methodology. Clever data mining algorithms are more scalable than statisticians ever thought possible. Formal statistical theory is more pervasive than computer scientists had realized." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004

"Statistics - A subject which most statisticians find difficult but which many physicians are experts on." (Stephen Senn, "Statistical Issues in Drug Development" 2nd Ed, 2007)

"[...] statisticians are constantly looking out for missed nuances: a statistical average for all groups may well hide vital differences that exist between these groups. Ignoring group differences when they are present frequently portends inequitable treatment." (Kaiser Fung, "Numbers Rule the World", 2010)

"What is so unconventional about the statistical way of thinking? First, statisticians do not care much for the popular concept of the statistical average; instead, they fixate on any deviation from the average. They worry about how large these variations are, how frequently they occur, and why they exist. [...] Second, variability does not need to be explained by reasonable causes, despite our natural desire for a rational explanation of everything; statisticians are frequently just as happy to pore over patterns of correlation. [...] Third, statisticians are constantly looking out for missed nuances: a statistical average for all groups may well hide vital differences that exist between these groups. Ignoring group differences when they are present frequently portends inequitable treatment. [...] Fourth, decisions based on statistics can be calibrated to strike a balance between two types of errors. Predictably, decision makers have an incentive to focus exclusively on minimizing any mistake that could bring about public humiliation, but statisticians point out that because of this bias, their decisions will aggravate other errors, which are unnoticed but serious. [...] Finally, statisticians follow a specific protocol known as statistical testing when deciding whether the evidence fits the crime, so to speak. Unlike some of us, they don’t believe in miracles. In other words, if the most unusual coincidence must be contrived to explain the inexplicable, they prefer leaving the crime unsolved." (Kaiser Fung, "Numbers Rule the World", 2010)

"The p-value is a concept so misaligned with intuition that no civilian can hold it firmly in mind. Nor can many statisticians." (Matt Briggs, "Why do statisticians answer silly questions that no one ever asks?", Significance Vol. 9(1), 2012)

"Diagrams furnish only approximate information. They do not add anything to the meaning of the data and, therefore, are not of much use to a statistician or research worker for further mathematical treatment or statistical analysis. On the other hand, graphs are more obvious, precise and accurate than the diagrams and are quite helpful to the statistician for the study of slopes, rates of change and estimation, (interpolation and extrapolation), wherever possible." (S C Gupta & Indra Gupta, "Business Statistics", 2013)

"Good design is an important part of any visualization, while decoration (or chart-junk) is best omitted. Statisticians should also be careful about comparing themselves to artists and designers; our goals are so different that we will fare poorly in comparison." (Hadley Wickham, "Graphical Criticism: Some Historical Notes", Journal of Computational and Graphical Statistics Vol. 22(1), 2013) 

"Missing data is the blind spot of statisticians. If they are not paying full attention, they lose track of these little details. Even when they notice, many unwittingly sway things our way. Most ranking systems ignore missing values." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Statisticians set a high bar when they assign a cause to an effect. [...] A model that ignores cause–effect relationships cannot attain the status of a model in the physical sciences. This is a structural limitation that no amount of data - not even Big Data - can surmount." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"When statisticians, trained in math and probability theory, try to assess likely outcomes, they demand a plethora of data points. Even then, they recognize that unless it’s a very simple and controlled action such as flipping a coin, unforeseen variables can exert significant influence." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Optimization is more than finding the best simulation results. It is itself a complex and evolving field that, subject to certain information constraints, allows data scientists, statisticians, engineers, and traders alike to perform reality checks on modeling results." (Chris Conlan, "Automated Trading with R: Quantitative Research and Platform Development", 2016)

"The tricky part is that there aren’t really any hard- and- fast rules when it comes to identifying outliers. Some economists say an outlier is anything that’s a certain distance away from the mean, but in practice it’s fairly subjective and open to interpretation. That’s why statisticians spend so much time looking at data on a case-by-case basis to determine what is - and isn’t - an outlier." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"The job of the statistician is to formulate an inventory of all those things that matter in order to obtain a representative sample. Researchers have to avoid the tendency to capture variables that are easy to identify or collect data on - sometimes the things that matter are not obvious or are difficult to measure." (Daniel J Levitin, "Weaponized Lies", 2017)

"To be any good, a sample has to be representative. A sample is representative if every person or thing in the group you’re studying has an equally likely chance of being chosen. If not, your sample is biased. […] The job of the statistician is to formulate an inventory of all those things that matter in order to obtain a representative sample. Researchers have to avoid the tendency to capture variables that are easy to identify or collect data on - sometimes the things that matter are not obvious or are difficult to measure." (Daniel J Levitin, "Weaponized Lies", 2017)

"Some scientists (e.g., econometricians) like to work with mathematical equations; others (e.g., hard-core statisticians) prefer a list of assumptions that ostensibly summarizes the structure of the diagram. Regardless of language, the model should depict, however qualitatively, the process that generates the data - in other words, the cause-effect forces that operate in the environment and shape the data generated." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"Statisticians are sometimes dismissed as bean counters. The sneering term is misleading as well as unfair. Most of the concepts that matter in policy are not like beans; they are not merely difficult to count, but difficult to define. Once you’re sure what you mean by 'bean', the bean counting itself may come more easily. But if we don’t understand the definition, then there is little point in looking at the numbers. We have fooled ourselves before we have begun."(Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)
