22 April 2021

On Data (2000-2009)

"Building statistical models is just like this. You take a real situation with real data, messy as this is, and build a model that works to explain the behavior of real data." (Martha Stocking, New York Times, 2000)

"Data are generally collected as a basis for action. However, unless potential signals are separated from probable noise, the actions taken may be totally inconsistent with the data. Thus, the proper use of data requires that you have simple and effective methods of analysis which will properly separate potential signals from probable noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"No matter what the data, and no matter how the values are arranged and presented, you must always use some method of analysis to come up with an interpretation of the data.
While every data set contains noise, some data sets may contain signals. Therefore, before you can detect a signal within any given data set, you must first filter out the noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"While all data contain noise, some data contain signals. Before you can detect a signal, you must filter out the noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"[…] you simply cannot make sense of any number without a contextual basis. Yet the traditional attempts to provide this contextual basis are often flawed in their execution. [...] Data have no meaning apart from their context. Data presented without a context are effectively rendered meaningless." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"The greatest plus of data modeling is that it produces a simple and understandable picture of the relationship between the input variables and responses [...] different models, all of them equally good, may give different pictures of the relation between the predictor and response variables [...] One reason for this multiplicity is that goodness-of-fit tests and other methods for checking fit give a yes–no answer. With the lack of power of these tests with data having more than a small number of dimensions, there will be a large number of models whose fit is acceptable. There is no way, among the yes–no methods for gauging fit, of determining which is the better model." (Leo Breiman, "Statistical modeling: The two cultures" Statistical Science 16(3), 2001)

"The more data we have, the more likely we are to drown in it." (Nassim N Taleb, "Fooled by Randomness", 2001)

"Data is a fact of life. As time goes by, we collect more and more data, making our original reason for collecting the data harder to accomplish. We don't collect data just to waste time or keep busy; we collect data so that we can gain knowledge, which can be used to improve the efficiency of our organization, improve profit margins, and on and on. The problem is that as we collect more data, it becomes harder for us to use the data to derive this knowledge. We are being suffocated by this raw data, yet we need to find a way to use it." (Seth Paul et al. "Preparing and Mining Data with Microsoft SQL Server 2000 and Analysis Services", 2002)

"Good communication is not just data transfer. You need to show people something that addresses their anxieties, that accepts their anger, that is credible in a very gut-level sense, and that evokes faith in the vision." (John Kotter, "The Heart of Change: Real-Life Stories of How People Change Their Organizations", 2002)

"Data is transformed into graphics to understand. A map, a diagram are documents to be interrogated. But understanding means integrating all of the data. In order to do this it’s necessary to reduce it to a small number of elementary data. This is the objective of the 'data treatment' be it graphic or mathematic." (Jacques Bertin, [interview] 2003)

"The use of computers shouldn't ignore the objectives of graphics, that are: 
 1) Treating data to get information. 
 2) Communicating, when necessary, the information obtained." (Jacques Bertin, [interview] 2003)

"Thought, without the data on which to structure that thought, leads nowhere." (Victor J Stenger, "Has Science Found God?: The Latest Results in the Search for Purpose in the Universe", 2003)

"[...] because observations are all we have, we take them seriously. We choose hard data and the framework of mathematics as our guides, not unrestrained imagination or unrelenting skepticism, and seek the simplest yet most wide-reaching theories capable of explaining and predicting the outcome of today’s and future experiments." (Brian Greene, "The Fabric of the Cosmos", 2004)

"The only way to look into the future is use theories since conclusive data is only available about the past." (Clayton Christensen et al., "Seeing What’s Next: Using the Theories of Innovation to Predict Industry Change", 2004)

"[...] when data is presented in certain ways, the patterns can be readily perceived. If we can understand how perception works, our knowledge can be translated into rules for displaying information. Following perception‐based rules, we can present our data in such a way that the important and informative patterns stand out. If we disobey the rules, our data will be incomprehensible or misleading." (Colin Ware, "Information Visualization: Perception for Design" 2nd Ed., 2004)

"The best scientists aren't the ones who know the most data; they're the ones who know what they're looking for." (Noam Chomsky, [Guardian] 2005)

"With our heads spinning in the world of coincidence and chaos, we nevertheless must make decisions and take steps into the minefield of our future. To avoid explosive missteps, we rely on data and statistical reasoning to inform our thinking." (Michael Starbird, "Coincidences, Chaos, and All That Math Jazz", 2005)

"Perception requires imagination because the data people encounter in their lives are never complete and always equivocal. [...] We also use our imagination and take shortcuts to fill gaps in patterns of nonvisual data. As with visual input, we draw conclusions and make judgments based on uncertain and incomplete information, and we conclude, when we are done analyzing the patterns, that our picture is clear and accurate. But is it?" (Leonard Mlodinow, "The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"Traditional statistics is strong in devising ways of describing data and inferring distributional parameters from sample. Causal inference requires two additional ingredients: a science-friendly language for articulating causal knowledge, and a mathematical machinery for processing that knowledge, combining it with data and drawing new causal conclusions about a phenomenon." (Judea Pearl, "Causal inference in statistics: An overview", Statistics Surveys 3, 2009)
