20 November 2025

On Statistics (2010-2019)

"Need to consider outliers as they can affect statistics such as means, standard deviations, and correlations. They can either be explained, deleted, or accommodated" (using either robust statistics or obtaining additional data to fill-in). Can be detected by methods such as box plots, scatterplots, histograms or frequency distributions." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"Outliers or influential data points can be defined as data values that are extreme or atypical on either the independent" (X variables) or dependent" (Y variables) variables or both. Outliers can occur as a result of observation errors, data entry errors, instrument errors based on layout or instructions, or actual extreme values from self-report data. Because outliers affect the mean, the standard deviation, and correlation coefficient values, they must be explained, deleted, or accommodated by using robust statistics." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"There are several key issues in the field of statistics that impact our analyses once data have been imported into a software program. These data issues are commonly referred to as the measurement scale of variables, restriction in the range of data, missing data values, outliers, linearity, and nonnormality." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"There are three possible reasons for [the] absence of predictive power. First, it is possible that the models are misspecified. Second, it is possible that the model’s explanatory factors are measured at too high a level of aggregation [...] Third, [...] the search for statistically significant relationships may not be the strategy best suited for evaluating our model’s ability to explain real world events [...] the lack of predictive power is the result of too much emphasis having been placed on finding statistically significant variables, which may be overdetermined. Statistical significance is generally a flawed way to prune variables in regression models [...] Statistically significant variables may actually degrade the predictive accuracy of a model [...] [By using] models that are constructed on the basis of pruning undertaken with the shears of statistical significance, it is quite possible that we are winnowing our models away from predictive accuracy." (Michael D Ward et al, "The perils of policy by p-value: predicting civil conflicts" Journal of Peace Research 47, 2010)

"What is so unconventional about the statistical way of thinking? First, statisticians do not care much for the popular concept of the statistical average; instead, they fixate on any deviation from the average. They worry about how large these variations are, how frequently they occur, and why they exist. [...] Second, variability does not need to be explained by reasonable causes, despite our natural desire for a rational explanation of everything; statisticians are frequently just as happy to pore over patterns of correlation. [...] Third, statisticians are constantly looking out for missed nuances: a statistical average for all groups may well hide vital differences that exist between these groups. Ignoring group differences when they are present frequently portends inequitable treatment. [...] Fourth, decisions based on statistics can be calibrated to strike a balance between two types of errors. Predictably, decision makers have an incentive to focus exclusively on minimizing any mistake that could bring about public humiliation, but statisticians point out that because of this bias, their decisions will aggravate other errors, which are unnoticed but serious. [...] Finally, statisticians follow a specific protocol known as statistical testing when deciding whether the evidence fits the crime, so to speak. Unlike some of us, they don’t believe in miracles. In other words, if the most unusual coincidence must be contrived to explain the inexplicable, they prefer leaving the crime unsolved." (Kaiser Fung, "Numbers Rule the World", 2010)

"Statistics is the discipline of using data samples to support claims about populations." (Allen B Downey,"Think Stats: Probability and Statistics for Programmers", 2011)

"Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions." (Ron Larson & Betsy Farber,"Elementary Statistics: Picturing the World" 5th Ed., 2011)

"Most importantly, much of statistics involves clear thinking rather than numbers. And much, at least much of the statistical principles that reporters can most readily apply, is good sense." (Victor Cohn & Lewis Cope, "News & Numbers: A writer’s guide to statistics" 3rd Ed, 2012)

"Once these different measures of performance are consolidated into a single number, that statistic can be used to make comparisons […] The advantage of any index is that it consolidates lots of complex information into a single number. We can then rank things that otherwise defy simple comparison […] Any index is highly sensitive to the descriptive statistics that are cobbled together to build it, and to the weight given to each of those components. As a result, indices range from useful but imperfect tools to complete charades." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"Regression analysis, like all forms of statistical inference, is designed to offer us insights into the world around us. We seek patterns that will hold true for the larger population. However, our results are valid only for a population that is similar to the sample on which the analysis has been done." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"Statistical cognition is concerned with obtaining cognitive evidence about various statistical techniques and ways to present data. It’s certainly important to choose an appropriate statistical model, use the correct formulas, and carry out accurate calculations. It’s also important, however, to focus on understanding, and to consider statistics as communication between researchers and readers." (Geoff Cumming, "Understanding the New Statistics", 2012)

"Statistical cognition is the empirical study of how people understand, and misunderstand, statistical concepts and presentations." (Geoff Cumming, "Understanding the New Statistics", 2012)

"Statistical inference is really just the marriage of two concepts that we’ve already discussed: data and probability" (with a little help from the central limit theorem)." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"Statistical inference is the drawing of conclusions about the world" (more specifically: about some population) from our sample data." (Geoff Cumming, "Understanding the New Statistics", 2012)

"Statistics cannot be any smarter than the people who use them. And in some cases, they can make smart people do dumb things." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"[… ] statistics is about understanding the role that variability plays in drawing conclusions based on data. […] Statistics is not about numbers; it is about data - numbers in context. It is the context that makes a problem meaningful and something worth considering." (Roxy Peck et al, "Introduction to Statistics and Data Analysis" 4th Ed., 2012)

"[…] statistics is a method of pursuing truth. At a minimum, statistics can tell you the likelihood that your hunch is true in this time and place and with these sorts of people. This type of pursuit of truth, especially in the form of an event’s future likelihood, is the essence of psychology, of science, and of human evolution." (Arthhur Aron et al, "Statistics for Phsychology" 6th Ed., 2012)

"Statistics is the scientific discipline that provides methods to help us make sense of data. […] The field of statistics teaches us how to make intelligent judgments and informed decisions in the presence of uncertainty and variation." (Roxy Peck & Jay L Devore, "Statistics: The Exploration and Analysis of Data" 7th Ed, 2012)

"The central limit theorem tells us that in repeated samples, the difference between the two means will be distributed roughly as a normal distribution." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"The four questions of data analysis are the questions of description, probability, inference, and homogeneity. [...] Descriptive statistics are built on the assumption that we can use a single value to characterize a single property for a single universe. […] Probability theory is focused on what happens to samples drawn from a known universe. If the data happen to come from different sources, then there are multiple universes with different probability models. [...] Statistical inference assumes that you have a sample that is known to have come from one universe." (Donald J Wheeler," Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

"The law of large numbers is a law of mathematical statistics. It states that when random samples are sufficiently large they match the population extremely closely. […] The 'law' of small numbers is a widespread human misconception that even small samples match the population closely." (Geoff Cumming, "Understanding the New Statistics", 2012)

"The predictive value of any theory relies on the constancy of the underlying relations among variables. Our analyses also fail to fully capture systems that develop chaos, in which the tiniest change in the initial conditions may produce entirely different end results, prohibiting any long- term predictions. Mathematicians have developed statistics and probability to deal with such shortcomings, but mathematics itself is limited […]" (Mario Livio, "Why Math Works", ["The Best Writing of Mathematics: 2012"] 2012)

"Complexity has the propensity to overload systems, making the relevance of a particular piece of information not statistically significant. And when an array of mind-numbing factors is added into the equation, theory and models rarely conform to reality." (Lawrence K Samuels, "Defense of Chaos: The Chaology of Politics, Economics and Human Action", 2013)

"Part of a meaningful quantitative analysis is to look at models and try to figure out their deficiencies and the ways in which they can be improved. A more subtle challenge for statistical methods is to explore systematically potential modeling errors in order to assess the quality of the model predictions. This kind of uncertainty about the adequacy of a model or model family is not only relevant for econometricians outside the model but potentially also for agents inside the models." (Lars P Hansen, "Uncertainty Outside and Inside Economic Models", [Nobel lecture] 2013)

"Statistics is the art and science of designing studies and analyzing the data that those studies produce. Its ultimate goal is translating data into knowledge and understanding of the world around us. In short, statistics is the art and science of learning from data." (Alan Agresti & Christine Franklin,"Statistics: The Art and Science of Learning from Data" 3rd Ed., 2013)

"The field of economics is not exempt from the consequences of chaos and complexity. Marketplaces are indeterminate; value is subjective; and outcomes are subject to interpretation. Economic forecasting is just as nebulous, being based on the probability of statistical information that may or may not be accurate." (Lawrence K Samuels, "Defense of Chaos", 2013)

"Another way to secure statistical significance is to use the data to discover a theory. Statistical tests assume that the researcher starts with a theory, collects data to test the theory, and reports the results - whether statistically significant or not. Many people work in the other direction, scrutinizing the data until they find a pattern and then making up a theory that fits the pattern." (Gary Smith, "Standard Deviations", 2014)

"Don’t just do the calculations. Use common sense to see whether you are answering the correct question, the assumptions are reasonable, and the results are plausible. If a statistical argument doesn’t make sense, think about it carefully - you may discover that the argument is nonsense." (Gary Smith, "Standard Deviations", 2014)

"In general, when building statistical models, we must not forget that the aim is to understand something about the real world. Or predict, choose an action, make a decision, summarize evidence, and so on, but always about the real world, not an abstract mathematical world: our models are not the reality - a point well made by George Box in his oft-cited remark that all models are wrong, but some are useful"." (David Hand, "Wonderful examples, but let's not close our eyes", Statistical Science 29, 2014)

"In the absence of clear information - in the absence of reliable statistics - people did what they had always done: filtered available information through the lens of their worldview." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Statistics is a science that helps us make decisions and draw conclusions in the presence of variability." (Douglas C Montgomery & George C Runger,"Applied Statistics and Probability for Engineers" 6th Ed., 2014)

"These practices - selective reporting and data pillaging - are known as data grubbing. The discovery of statistical significance by data grubbing shows little other than the researcher’s endurance. We cannot tell whether a data grubbing marathon demonstrates the validity of a useful theory or the perseverance of a determined researcher until independent tests confirm or refute the finding. But more often than not, the tests stop there. After all, you won’t become a star by confirming other people’s research, so why not spend your time discovering new theories? The data-grubbed theory consequently sits out there, untested and unchallenged." (Gary Smith, "Standard Deviations", 2014)

"Typical symbols used in mathematics are operationals, groupings, relations, constants, variables, functions, matrices, vectors, and symbols used in set theory, logic, number theory, probability, and statistics. Individual symbols may not have much effect on a mathematician’s creative thinking, but in groups they acquire powerful connections through similarity, association, identity, resemblance and repeated imagery. ¿ey may even create thoughts that are below awareness." (Joseph Mazur. "Enlightening symbols: a short history of mathematical notation and its hidden powers", 2014)

"When statisticians, trained in math and probability theory, try to assess likely outcomes, they demand a plethora of data points. Even then, they recognize that unless it’s a very simple and controlled action such as flipping a coin, unforeseen variables can exert significant influence." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"With fast computers and plentiful data, finding statistical significance is trivial. If you look hard enough, it can even be found in tables of random numbers." (Gary Smith, "Standard Deviations", 2014)

"A system which is usually composed of large number of possibly heterogeneous interacting agents, which are seen to exhibit emergent behavior. Emergence implies that system level behavior" (macro level) cannot be inferred from observation of individual level behavior of its constituents" (micro level). This absence of explicit links between the micro and macro levels makes complex systems especially difficult to analyze using traditional statistical and analytical techniques to study the dynamics of behavior. One typically requires the use of bottom up simulation based methods to study such systems. Complex systems are ubiquitous - markets, societies, social networks, the Internet, weather, ecosystems, are just a few examples." (Stephen E Glavin & Abhijit Sengupta, "Modelling of Consumer Goods Markets: An Agent-Based Computational Approach", Handbook of Research on Managing and Influencing Consumer Behavior, 2015)

"Even properly done statistics can’t be trusted. The plethora of available statistical techniques and analyses grants researchers an enormous amount of freedom when analyzing their data, and it is trivially easy to ‘torture the data until it confesses’." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"Statistics is an integral part of the quantitative approach to knowledge. The field of statistics is concerned with the scientific study of collecting, organizing, analyzing, and drawing conclusions from data." (Kandethody M Ramachandran & Chris P Tsokos,"Mathematical Statistics with Applications in R" 2nd Ed., 2015)

"The closer that sample-selection procedures approach the gold standard of random selection - for which the definition is that every individual in the population has an equal chance of appearing in the sample - the more we should trust them. If we don’t know whether a sample is random, any statistical measure we conduct may be biased in some unknown way." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

 "Statistics can be defined as a collection of techniques used when planning a data collection, and when subsequently analyzing and presenting data." (Birger S Madsen,"Statistics for Non-Statisticians", 2016)

"Statistics is the science of collecting, organizing, and interpreting numerical facts, which we call data. […] Statistics is the science of learning from data." (Moore McCabe & Alwan Craig,"The Practice of Statistics for Business and Economics" 4th Ed., 2016)

"Optimization is more than finding the best simulation results. It is itself a complex and evolving field that, subject to certain information constraints, allows data scientists, statisticians, engineers, and traders alike to perform reality checks on modeling results." (Chris Conlan, "Automated Trading with R: Quantitative Research and Platform Development", 2016)

"Random means without reason - unpredictable - lawless. That little word random describes a key difference between ordinary classical mechanics and quantum mechanics. […] In classical physics only ignorance of the fine details or lack of control over them causes statistical randomness […] In principle, though not in practice, randomness is absent from classical physics." (Hans C von Baeyer, "QBism: The future of quantum physics", 2016)

"Statistics is the science of collecting, organizing, and interpreting numerical facts, which we call data. […] Statistics is the science of learning from data." (Moore McCabe & Alwan Craig, "The Practice of Statistics for Business and Economics" 4th Ed., 2016)

"The foundations of a discipline are inseparable from the rules of its game, without which there is no discipline, just idle talk. The foundations of science reside in its epistemology, meaning that they lie in the mathematical formulation of knowledge, structured experimentation, and statistical characterization of validity. Rules impose limitations. These may be unpleasant, but they arise from the need to link ideas in the mind to natural phenomena. The mature scientist must overcome the desire for intuitive understanding and certainty, and must live with stringent limitations and radical uncertainty." (Edward R Dougherty, "The Evolution of Scientific Knowledge: From certainty to uncertainty", 2016)

"Many of us feel intimidated by numbers and so we blindly accept the numbers we’re handed. This can lead to bad decisions and faulty conclusions. We also have a tendency to apply critical thinking only to things we disagree with. In the current information age, pseudo-facts masquerade as facts, misinformation can be indistinguishable from true information, and numbers are often at the heart of any important claim or decision. Bad statistics are everywhere." (Daniel J Levitin, "Weaponized Lies", 2017)

"Most of us have difficulty figuring probabilities and statistics in our heads and detecting subtle patterns in complex tables of numbers. We prefer vivid pictures, images, and stories. When making decisions, we tend to overweight such images and stories, compared to statistical information. We also tend to misunderstand or misinterpret graphics." (Daniel J Levitin, "Weaponized Lies", 2017)

"One final warning about the use of statistical models" (whether linear or otherwise): The estimated model describes the structure of the data that have been observed. It is unwise to extend this model very far beyond the observed data." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"One way to lie with statistics is to compare things - datasets, populations, types of products - that are different from one another, and pretend that they’re not. As the old idiom says, you can’t compare apples with oranges." (Daniel J Levitin, "Weaponized Lies", 2017)

"Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, statistics is about providing a measure of confidence in any conclusions." (Michael Sullivan, "Statistics: Informed Decisions Using Data", 5th Ed., 2017)

"Statistics is the science that relates data to specific questions of interest. This includes devising methods to gather data relevant to the question, methods to summarize and display the data to shed light on the question, and methods that enable us to draw answers to the question that are supported by the data." (William M Bolstad & James M Curran, "Introduction to Bayesian Statistics" 3rd Ed., 2017)

"Statistics, because they are numbers, appear to us to be cold, hard facts. It seems that they represent facts given to us by nature and it’s just a matter of finding them. But it’s important to remember that people gather statistics. People choose what to count, how to go about counting, which of the resulting numbers they will share with us, and which words they will use to describe and interpret those numbers. Statistics are not facts. They are interpretations. And your interpretation may be just as good as, or better than, that of the person reporting them to you." (Daniel J Levitin, "Weaponized Lies", 2017)

"The central limit conjecture states that most errors are the result of many small errors and, as such, have a normal distribution. The assumption of a normal distribution for error has many advantages and has often been made in applications of statistical models." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"What properties should a good statistical estimator have? Since we are dealing with probability, we start with the probability that our estimate will be very close to the true value of the parameter. We want that probability to become greater and greater as we get more and more data. This property is called consistency. This is a statement about probability. It does not say that we are sure to get the right answer. It says that it is highly probable that we will be close to the right answer." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"When we use algebraic notation in statistical models, the problem becomes more complicated because we cannot 'observe' a probability and know its exact number. We can only estimate probabilities on the basis of observations." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Again, classical statistics only summarizes data, so it does not provide even a language for asking [a counterfactual] question. Causal inference provides a notation and, more importantly, offers a solution. As with predicting the effect of interventions [...], in many cases we can emulate human retrospective thinking with an algorithm that takes what we know about the observed world and produces an answer about the counterfactual world." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"Creating effective visualizations is hard. Not because a dataset requires an exotic and bespoke visual representation - for many problems, standard statistical charts will suffice. And not because creating a visualization requires coding expertise in an unfamiliar programming language [...]. Rather, creating effective visualizations is difficult because the problems that are best addressed by visualization are often complex and ill-formed. The task of figuring out what attributes of a dataset are important is often conflated with figuring out what type of visualization to use. Picking a chart type to represent specific attributes in a dataset is comparatively easy. Deciding on which data attributes will help answer a question, however, is a complex, poorly defined, and user-driven process that can require several rounds of visualization and exploration to resolve." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Just as they did thirty years ago, machine learning programs" (including those with deep neural networks) operate almost entirely in an associational mode. They are driven by a stream of observations to which they attempt to fit a function, in much the same way that a statistician tries to fit a line to a collection of points. Deep neural networks have added many more layers to the complexity of the fitted function, but raw data still drives the fitting process. They continue to improve in accuracy as more data are fitted, but they do not benefit from the 'super-evolutionary speedup'. " (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"Some scientists" (e.g., econometricians) like to work with mathematical equations; others" (e.g., hard-core statisticians) prefer a list of assumptions that ostensibly summarizes the structure of the diagram. Regardless of language, the model should depict, however qualitatively, the process that generates the data - in other words, the cause-effect forces that operate in the environment and shape the data generated." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"Any fool can fit a statistical model, given the data and some software. The real challenge is to decide whether it actually fits the data adequately. It might be the best that can be obtained, but still not good enough to use." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"Estimates based on data are often uncertain. If the data were intended to tell us something about a wider population (like a poll of voting intentions before an election), or about the future, then we need to acknowledge that uncertainty. This is a double challenge for data visualization: it has to be calculated in some meaningful way and then shown on top of the data or statistics without making it all too cluttered." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"This common view of statistics as a basic ‘bag of tools’ is now facing major challenges. First, we are in an age of data science, in which large and complex data sets are collected from routine sources such as traffic monitors, social media posts and internet purchases, and used as a basis for technological innovations such as optimizing travel routes, targeted advertising or purchase recommendation systems [...]. Statistical training is increasingly seen as just one necessary component of being a data scientist, together with skills in data management, programming and algorithm development, as well as proper knowledge of the subject matter." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)
