"Building statistical models is just like this. You take a real situation with real data, messy as this is, and build a model that works to explain the behavior of real data." (Martha Stocking, New York Times, 2000)"
"Numeracy is the ability to process, interpret and communicate numerical, quantitative, spatial, statistical, even mathematical information, in ways that are appropriate for a variety of contexts, and that will enable a typical member of the culture or subculture to participate effectively in activities that they value." (Jeff Evans,"Adults´ Mathematical Thinking and Emotion", 2000)
"One of the remarkable aspects of the distribution of prime numbers is their tendency to exhibit global regularity and local irregularity. The prime numbers behave like the ‘ideal gases"’which physicists are so fond of. Considered from an external point of view, the distribution is -in broad terms -deterministic, but as soon as we try to describe the situation at a given point, statistical fluctuations occur as in a game of chance where it is known that on average the heads will match the tail but where, at any one moment, the next throw cannot be predicted." (Gerald Tenenbaum & Michael M France,"The Prime Numbers and Their Distribution", 2000)
"The role of graphs in probabilistic and statistical modeling is threefold: (1) to provide convenient means of expressing substantive assumptions; (2) to facilitate economical representation of joint probability functions; and (3) to facilitate efficient inferences from observations." (Judea Pearl, "Causality: Models, Reasoning, and Inference", 2000)
"All human knowledge - including statistics - is created through people's actions; everything we know is shaped by our language, culture, and society. Sociologists call this the social construction of knowledge. Saying that knowledge is socially constructed does not mean that all we know is somehow fanciful, arbitrary, flawed, or wrong. For example, scientific knowledge can be remarkably accurate, so accurate that we may forget the people and social processes that produced it." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)
"Innumeracy - widespread confusion about basic mathematical ideas - means that many statistical claims about social problems don't get the critical attention they deserve. This is not simply because an innumerate public is being manipulated by advocates who cynically promote inaccurate statistics. Often, statistics about social problems originate with sincere, well-meaning people who are themselves innumerate; they may not grasp the full implications of what they are saying. Similarly, the media are not immune to innumeracy; reporters commonly repeat the figures their sources give them without bothering to think critically about them." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)
"One goal of statistics is to extract information from the data about the underlying mechanism producing the data. The greatest plus of data modeling is that it produces a simple and understandable picture of the relationship between the input variables and responses." (Leo Breiman, "Statistical Modeling: The Two Cultures", Statistical Science 16(3), 2001)
"Statisticians can calculate the probability that such random samples represent the population; this is usually expressed in terms of sampling error [...]. The real problem is that few samples are random. Even when researchers know the nature of the population, it can be time-consuming and expensive to draw a random sample; all too often, it is impossible to draw a true random sample because the population cannot be defined. This is particularly true for studies of social problems. [...] The best samples are those that come as close as possible to being random." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)
"Statistical mechanics is the science of predicting the observable properties of a many-body system by studying the statistics of the behaviour of its individual constituents, be they atoms, molecules, photons etc. It provides the link between macroscopic and microscopic states. […] classical thermodynamics. This is a subject dealing with the very large. It describes the world that we all see in our daily lives, knows nothing about atoms and molecules and other very small particles, but instead treats the universe as if it were made up of large-scale continua. […] quantum mechanics. This is the other end of the spectrum from thermodynamics; it deals with the very small. It recognises that the universe is made up of particles: atoms, electrons, protons and so on. One of the key features of quantum mechanics, however, is that particle behaviour is not precisely determined" (if it were, it would be possible to compute, at least in principle, all past and future behaviour of particles, such as might be expected in a classical view). Instead, the behaviour is described through the language of probabilities." (A Mike Glazer & Justin S Wark, "Statistical Mechanics: A survival guide", 2001)
"The goals in statistics are to use data to predict and to get information about the underlying data mechanism. Nowhere is it written on a stone tablet what kind of model should be used to solve problems involving data. To make my position clear, I am not against data models per se. In some situations they are the most appropriate way to solve the problem. But the emphasis needs to be on the problem and on the data." (Leo Breiman, "Statistical Modeling: The Two Cultures", Statistical Science 16(3), 2001)
"The roots of statistics, as in science, lie in working with data and checking theory against data." (Leo Breiman, "Statistical Modeling: The Two Cultures", Statistical Science 16(3), 2001)
"There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools." (Leo Breiman, "Statistical Modeling: The Two Cultures", Statistical Science 16(3), 2001)
"Every statistical procedure relies on certain assumptions for correctness. Errors in testing hypotheses come about either because the assumptions underlying the chosen test are not satisfied or because the chosen test is less powerful than other competing procedures."(Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)
"Most statistical procedures rely on two fundamental assumptions: that the observations are independent of one another and that they are identically distributed. If your methods of collection fail to honor these assumptions, then your analysis must fail also." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)
"One can be highly functionally numerate without being a mathematician or a quantitative analyst. It is not the mathematical manipulation of numbers" (or symbols representing numbers) that is central to the notion of numeracy. Rather, it is the ability to draw correct meaning from a logical argument couched in numbers. When such a logical argument relates to events in our uncertain real world, the element of uncertainty makes it, in fact, a statistical argument." (Eric R Sowey,"The Getting of Wisdom: Educating Statisticians to Enhance Their Clients' Numeracy", The American Statistician 57(2), 2003)
"The greatest error associated with the use of statistical procedures is to make the assumption that one single statistical methodology can suffice for all applications. […] But one methodology can never be better than another, nor can estimation replace hypothesis testing or vice versa. Every methodology has a proper domain of application and another set of applications for which it fails. Every methodology has its drawbacks and its advantages, its assumptions, and its sources of error." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)
"The vast majority of errors in Statistics - and not incidentally, in most human endeavors - arise from a reluctance (or even an inability) to plan. Some demon (or demonic manager) seems to be urging us to cross the street before we’ve had the opportunity to look both ways. Even on those rare occasions when we do design an experiment, we seem more obsessed with the mechanics than with the concepts that underlie it." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)
"Use statistics as a guide to decision making rather than a mandate." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)
"A smaller model with fewer covariates has two advantages: it might give better predictions than a big model and it is more parsimonious" (simpler). Generally, as you add more variables to a regression, the bias of the predictions decreases and the variance increases. Too few covariates yields high bias; this called underfitting. Too many covariates yields high variance; this called overfitting. Good predictions result from achieving a good balance between bias and variance. […] fiding a good model involves trading of fit and complexity." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)
"In the physics of complex systems we can introduce a statistical concept, a measure of randomness, called entropy. In a quiet equilibrium, like hot onion soup sitting in a thermos bottle with no escaping heat, the entropy remains constant in time. However, in violent nonequilibrium processes, like shattering glass or explosions, the entropy always increases. Essentially, entropy, as a measure of randomness, will always increase when a very ordered initial condition leads to a very disordered final state through the normal laws of physics. The fact that entropy at best stays the same in equilibrium, or increases in all other processes, is called the second law of thermodynamics." (Leon M Lederman & Christopher T Hill, "Symmetry and the Beautiful Universe", 2004)
"Probability is a mathematical language for quantifying uncertainty." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)
"Statistical inference, or 'learning' as it is called in computer science, is the process of using data to infer the distribution that generated the data." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)
"Statistics depend on collecting information. If questions go unasked, or if they are asked in ways that limit responses, or if measures count some cases but exclude others, information goes ungathered, and missing numbers result. Nevertheless, choices regarding which data to collect and how to go about collecting the information are inevitable." (Joel Best, "More Damned Lies and Statistics: How numbers confuse public issues", 2004)
"[…] studying methods for parametric models is useful for two reasons. First, there are some cases where background knowledge suggests that a parametric model provides a reasonable approximation. […] Second, the inferential concepts for parametric models provide background for understanding certain nonparametric methods." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)
"The frequentist point of view is based on the following postulates:" (F1) Probability refers to limiting relative frequencies. Probabilities are objective properties of the real world." (F2) Parameters are i xed, unknown constants. Because they are not fluctuating, no useful probability statements can be made about parameters." (F3) Statistical procedures should be designed to have well-defined long run frequency properties." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)
"Things are changing. Statisticians now recognize that computer scientists are making novel contributions while computer scientists now recognize the generality of statistical theory and methodology. Clever data mining algorithms are more scalable than statisticians ever thought possible. Formal statistical theory is more pervasive than computer scientists had realized." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)
"One way of generating hypotheses is to collect data and look for patterns. Often, however, it is difficult to see any pattern from a set of data, which may just be a list of numbers. Graphs and descriptive statistics are very useful for summarising and displaying data in ways that may reveal patterns." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)
"Probability is sometimes called the language of statistics. […] The probability of an event occurring might be described as the likelihood of it happening. […] In a formal sense the word "probability" is used only when an event or experiment is repeatable and the long term likelihood of a certain outcome can be determined." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)
"Quantum physicists today are reconciled to randomness at the individual event level, but to expect causality to underlie statistical quantum phenomena is reasonable. Suppose a person shakes an ink pen such that ink spots are formed on a white wall, in what appears for all intents and purposes, randomly. Let us further suppose the random ink spots accumulate to form precise pictures of different known persons' faces every time. We will not regard the overall result to be a happenchance; we are apt to suspect there must be a 'method' to the person who is shaking the ink pen." (Ravi Gomatam) [response to Nobel Laureate Steven Weinberg's article "Einstein's Mistakes", Physics Today Vol. 59 (4), 2005]
"Statistics is the branch of mathematics that uses observations and measurements called data to analyze, summarize, make inferences, and draw conclusions based on the data gathered." (Allan G Bluman,"Probability Demystified", 2005)
"The basic idea of going from an estimate to an inference is simple. Drawing the conclusion with confidence, and measuring the level of confidence, is where the hard work of professional statistics comes in." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)
"The use of a t-test makes three assumptions. The first is that the data are normally distributed. The second is that each sample has been taken at random from its respective population and the third is that for an independent sample test, the variances are the same. It has, however, been shown that t-tests are actually very ‘robust’ – that is, they will still generate statistics that approximate the t distribution and give realistic probabilities even when the data show considerable departure from normality and when sample variances are dissimilar." (Steve McKillup, "Statistics Explained: An Introductory Guide for Life Scientists", 2005)
"Two things explain the importance of the normal distribution:" (1) The central limit effect that produces a tendency for real error distributions to be 'normal like'." (2) The robustness to nonnormality of some common statistical procedures, where 'robustness' means insensitivity to deviations from theoretical normality." (George E P Box et al, "Statistics for Experimenters: Design, discovery, and innovation" 2nd Ed., 2005)
"With our heads spinning in the world of coincidence and chaos, we nevertheless must make decisions and take steps into the minefield of our future. To avoid explosive missteps, we rely on data and statistical reasoning to inform our thinking." (Michael Starbird, "Coincidences, Chaos, and All That Math Jazz", 2005)
"An essential feature of mathematics and statistics, particularly at a higher level, is the use of shorthand notation for a variety of concepts and measures. While this can be a strength in terms of providing conciseness and precision, statistical notation often proves to be an obstacle for learners in the early stages of learning." (Alan Graham, "Developing Thinking in Statistics", 2006)
"However statistically improbable the entity you seek to explain by invoking a designer, the designer himself has got to be at least as improbable." (Richard Dawkins, "The God Delusion", 2006)
"It is common to think of statistical graphics and data visualization as relatively modern developments in statistics. In fact, the graphic representation of quantitative information has deep roots. These roots reach into the histories of the earliest map-making and visual depiction, and later into thematic cartography, statistics and statistical graphics, medicine, and other fields. Along the way, developments in technologies (printing, reproduction) mathematical theory and practice, and empirical observation and recording, enabled the wider use of graphics and new advances in form and content." (Michael Friendly. "A brief history of data visualization", 2006)
"It makes no sense to seek a single best way to represent knowledge - because each particular form of expression also brings its particular limitations. For example, logic-based systems are very precise, but they make it hard to do reasoning with analogies. Similarly, statistical systems are useful for making predictions, but do not serve well to represent the reasons why those predictions are sometimes correct." (Marvin Minsky, "The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind", 2006)
"Physically, the stability of the dynamics is characterized by the sensitivity to initial conditions. This sensitivity can be determined for statistically stationary states, e.g. for the motion on an attractor. If this motion demonstrates sensitive dependence on initial conditions, then it is chaotic. In the popular literature this is often called the 'Butterfly Effect', after the famous 'gedankenexperiment' of Edward Lorenz: if a perturbation of the atmosphere due to a butterfly in Brazil induces a thunderstorm in Texas, then the dynamics of the atmosphere should be considered as an unpredictable and chaotic one. By contrast, stable dependence on initial conditions means that the dynamics is regular." (Ulrike Feudel et al, "Strange Nonchaotic Attractors", 2006)
"Sometimes the most important fit statistic you can get is ‘convergence not met’ - it can tell you something is wrong with your model." (Oliver Schabenberger, "Applied Statistics in Agriculture Conference", 2006)
"Statistics can certainly pronounce a fact, but they cannot explain it without an underlying context, or theory. Numbers have an unfortunate tendency to supersede other types of knowing. […] Numbers give the illusion of presenting more truth and precision than they are capable of providing." (Ronald J Baker, "Measure what Matters to Customers: Using Key Predictive Indicators", 2006)
"What sets statistics apart from the rest of mathematics is that in statistics events occur under conditions of uncertainty. Whereas in pure mathematics all even numbers possess the property of evenness, a statistical variable may take a range of different values that are usually unpredictable in advance." (Alan Graham, "Developing Thinking in Statistics", 2006)
"Frequentist statistics assumes that there is a 'true' state of the world" (e.g. the difference between species in predation probability) which gives rise to a distribution of possible experimental outcomes. The Bayesian framework says instead that the experimental outcome - what we actually saw happen - is the truth, while the parameter values or hypotheses have probability distributions. The Bayesian framework solves many of the conceptual problems of frequentist statistics: answers depend on what we actually saw and not on a range of hypothetical outcomes, and we can legitimately make statements about the probability of different hypotheses or parameter values." (Ben Bolker, "Ecological Models and Data in R", 2007)"
"Most modern statistics uses an approach called maximum likelihood estimation, or approximations to it. For a particular statistical model, maximum likelihood finds the set of parameters" (e.g. seed removal rates) that makes the observed data" (e.g. the particular outcomes of predation trials) most likely to have occurred. Based on a model for both the deterministic and stochastic aspects of the data, we can compute the likelihood" (the probability of the observed outcome) given a particular choice of parameters. We then find the set of parameters that makes the likelihood as large as possible, and take the resulting maximum likelihood estimates" (MLEs) as our best guess at the parameters." (Ben Bolker, "Ecological Models and Data in R", 2007)
"Normally distributed variables are everywhere, and most classical statistical methods use this distribution. The explanation for the normal distribution’s ubiquity is the Central Limit Theorem, which says that if you add a large number of independent samples from the same distribution the distribution of the sum will be approximately normal." (Ben Bolker, "Ecological Models and Data in R", 2007)
"[…] statistical thinking, though powerful, is never as easy or automatic as simply plugging numbers into formulas. In order to use statistical methods appropriately, you need to understand their logic, not just the computing rules." (Ann E Watkins et al,"Statistics in Action: Understanding a World of Data", 2007)
"The dichotomy of mathematical vs. statistical modeling says more about the culture of modeling and how different disciplines go about thinking about models than about how we should actually model ecological systems. A mathematician is more likely to produce a deterministic, dynamic process model without thinking very much about noise and uncertainty" (e.g. the ordinary differential equations that make up the Lotka-Volterra predator prey model). A statistician, on the other hand, is more likely to produce a stochastic but static model, that treats noise and uncertainty carefully but focuses more on static patterns than on the dynamic processes that produce them" (e.g. linear regression)." (Ben Bolker, "Ecological Models and Data in R", 2007)
"It is impossible to construct a model that provides an entirely accurate picture of network behavior. Statistical models are almost always based on idealized assumptions, such as independent and identically distributed" (i.i.d.) interarrival times, and it is often difficult to capture features such as machine breakdowns, disconnected links, scheduled repairs, or uncertainty in processing rates." (Sean Meyn, "Control Techniques for Complex Networks", 2008)
"Put simply, statistics is a range of procedures for gathering, organizing, analyzing and presenting quantitative data. […] Essentially […], statistics is a scientific approach to analyzing numerical data in order to enable us to maximize our interpretation, understanding and use. This means that statistics helps us turn data into information; that is, data that have been interpreted, understood and are useful to the recipient. Put formally, for your project, statistics is the systematic collection and analysis of numerical data, in order to investigate or discover relationships among phenomena so as to explain, predict and control their occurrence." (Reva B Brown & Mark Saunders, "Dealing with Statistics: What You Need to Know", 2008)
"[…] statistics is the key discipline for predicting the future or for making inferences about the unknown, or for producing convenient summaries of data." (David J Hand,"Statistics: A Very Short Introduction", 2008)
"[Statistics] is the technology of extracting meaning from data." (David J Hand,"Statistics: A Very Short Introduction", 2008)
"Probability is the science of uncertainty. It provides precise mathematical rules for understanding and analyzing our own ignorance." (Michael J Evans & Jeffrey S Rosenthal, "Probability and Statistics: The Science of Uncertainty", 2009)
"Statistics is the art of learning from data. It is concerned with the collection of data, their subsequent description, and their analysis, which often leads to the drawing of conclusions." (Sheldon M Ross,"Introductory Statistics" 3rd Ed., 2009)
"Traditional statistics is strong in devising ways of describing data and inferring distributional parameters from sample. Causal inference requires two additional ingredients: a science-friendly language for articulating causal knowledge, and a mathematical machinery for processing that knowledge, combining it with data and drawing new causal conclusions about a phenomenon." (Judea Pearl, "Causal inference in statistics: An overview", Statistics Surveys 3, 2009)