26 October 2024

Howard Wainer - Collected Quotes

"Although arguments can be made that high data density does not imply that a graphic will be good, nor one with low density bad, it does reflect on the efficiency of the transmission of information. Obviously, if we hold clarity and accuracy constant, more information is better than less. One of the great assets of graphical techniques is that they can convey large amounts of information in a small space." (Howard Wainer, "How to Display Data Badly", The American Statistician Vol. 38(2), 1984) 

"The essence of a graphic display is that a set of numbers having both magnitudes and an order are represented by an appropriate visual metaphor - the magnitude and order of the metaphorical representation match the numbers. We can display data badly by ignoring or distorting this concept." (Howard Wainer, "How to Display Data Badly", The American Statistician Vol. 38(2), 1984)

"The standard error of most statistics is proportional to 1 over the square root of the sample size. God did this, and there is nothing we can do to change it." (Howard Wainer, "Improving Tabular Displays, With NAEP Tables as Examples and Inspirations", Journal of Educational and Behavioral Statistics Vol. 22(1), 1997)
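
As a small illustration of the square-root relationship in the quote above, here is a minimal Python sketch for the simplest case, the standard error of a sample mean. The function names, the normal population, and the chosen values of sigma and n are assumptions made for the example; they are not taken from Wainer's paper.

```python
import math
import random
import statistics


def standard_error_of_mean(sigma: float, n: int) -> float:
    """Theoretical standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def simulated_standard_error(sigma: float, n: int, trials: int = 2000, seed: int = 0) -> float:
    """Estimate the standard error empirically.

    Draws `trials` samples of size `n` from a normal population
    (an assumption made for this sketch) and returns the standard
    deviation of the resulting sample means.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


if __name__ == "__main__":
    sigma = 10.0
    for n in (25, 100, 400):
        theory = standard_error_of_mean(sigma, n)
        simulated = simulated_standard_error(sigma, n)
        print(f"n={n:4d}  theory={theory:.3f}  simulated={simulated:.3f}")
```

Each fourfold increase in sample size only halves the theoretical standard error, which is why precision gets expensive quickly.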

"[…] a graph is nothing but a visual metaphor. To be truthful, it must correspond closely to the phenomena it depicts: longer bars or bigger pie slices must correspond to more, a rising line must correspond to an increasing amount. If a graphical depiction of data does not faithfully follow this principle, it is almost sure to be misleading. But the metaphoric attachment of a graphic goes farther than this. The character of the depiction is a necessary and sufficient condition for the character of the data. When the data change, so too must their depiction; but when the depiction changes very little, we assume that the data, likewise, are relatively unchanging. If this convention is not followed, we are usually misled." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"A graphic display has many purposes, but it achieves its highest value when it forces us to see what we were not expecting." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Nothing that had been produced before was even close. Even today, after more than two centuries of graphical experience, Playfair’s graphs remain exemplary standards for clear communication of quantitative phenomena. […] Graphical forms were available before Playfair, but they were rarely used to plot empirical information." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Oftentimes a statistical graphic provides the evidence for a plausible story, and the evidence, though perhaps only circumstantial, can be quite convincing. […] But such graphical arguments are not always valid. Knowledge of the underlying phenomena and additional facts may be required." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Placing a fact within a context increases its value greatly. […] An efficacious way to add context to statistical facts is by embedding them in a graphic. Sometimes the most helpful context is geographical, and shaded maps come to mind as examples. Sometimes the most helpful context is temporal, and time-based line graphs are the obvious choice. But how much time? The ending date (today) is usually clear, but where do you start? The starting point determines the scale. […] The starting point and hence the scale are determined by the questions that we expect the graph to answer." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Simpson’s Paradox can occur whenever data are aggregated. If data are collapsed across a subclassification (such as grades, race, or age), the overall difference observed may not represent what is going on. Standardization can help correct this, but nothing short of random assignment of individuals to groups will prevent the possibility of yet another subclassification, as yet unidentified, changing things around again. But I believe that knowing of the possibility helps us, so that we can contain the enthusiasm of our impulsive first inferences." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)
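
The reversal described in the quote above can be reproduced with a handful of counts. The following Python sketch uses invented numbers (two hypothetical groups, "A" and "B", split into hypothetical "easy" and "hard" strata) purely for illustration; none of the data or names come from Wainer's book. It also shows one simple form of standardization, reweighting each group's stratum rates by the combined stratum sizes.

```python
from fractions import Fraction

# Hypothetical (successes, total) counts, invented for this illustration.
counts = {
    "A": {"easy": (9, 10), "hard": (30, 100)},
    "B": {"easy": (80, 100), "hard": (2, 10)},
}


def stratum_rates(group: str) -> dict:
    """Success rate within each stratum of one group."""
    return {s: Fraction(k, n) for s, (k, n) in counts[group].items()}


def overall_rate(group: str) -> Fraction:
    """Success rate after collapsing across strata (the aggregated view)."""
    successes = sum(k for k, _ in counts[group].values())
    total = sum(n for _, n in counts[group].values())
    return Fraction(successes, total)


def standardized_rate(group: str) -> Fraction:
    """Direct standardization: weight each stratum rate by the
    combined size of that stratum across both groups."""
    weights = {
        s: sum(counts[g][s][1] for g in counts) for s in counts[group]
    }
    rates = stratum_rates(group)
    weighted = sum(rates[s] * weights[s] for s in weights)
    return weighted / sum(weights.values())


if __name__ == "__main__":
    for g in counts:
        per_stratum = {s: float(r) for s, r in stratum_rates(g).items()}
        print(g, per_stratum,
              "overall:", round(float(overall_rate(g)), 3),
              "standardized:", float(standardized_rate(g)))
```

With these counts, group A has the higher rate in both strata, yet group B looks better once the strata are collapsed, because the groups are spread very differently across the easy and hard cases. Standardizing to a common stratum mix restores the within-stratum ordering, but, as the quote warns, only for the subclassification we happened to know about.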

"The appearance, and hence the perception, of any statistical graphic is massively influenced by the choice of scale. If the scale of the vertical axis is too narrow relative to the scale of the horizontal axis, random meanders look meaningful." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"The difficult task of properly setting the scale of a graph remains difficult but not mysterious. There is agreement among experts spanning two hundred years. The default option should be to choose a scale that fills the plot with data. We can deviate from this under circumstances when it is better not to fill the plot with data, but those circumstances are usually clear. It is important to remember that the sin of using too small a scale is venial; the viewer can correct it. The sin of using too large a scale cannot be corrected without access to the original data; it can be mortal." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)
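
One way to read the "fill the plot with data" default from the quote above is as a rule for picking axis limits from the data range. The Python sketch below is an assumption-laden illustration: the helper names, the 5% padding, and the sample values are all hypothetical and are not taken from the book.

```python
def data_filling_limits(values, padding=0.05):
    """Axis limits that hug the data range, plus a small margin.

    `padding` is the margin on each side as a fraction of the data
    range (5% here is an arbitrary choice for this sketch).
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant data: fall back to a nominal span so the axis has width.
        span = abs(lo) or 1.0
    margin = span * padding
    return lo - margin, hi + margin


def fill_fraction(values, axis_limits):
    """Share of the axis length actually occupied by the data range."""
    lo, hi = axis_limits
    return (max(values) - min(values)) / (hi - lo)


if __name__ == "__main__":
    values = [98.0, 99.0, 100.0, 102.0]  # hypothetical measurements
    tight = data_filling_limits(values)
    wide = (0.0, 110.0)  # a scale that is far too large for these data
    print("tight limits:", tight, "fill:", round(fill_fraction(values, tight), 3))
    print("wide limits: ", wide, "fill:", round(fill_fraction(values, wide), 3))
```

On the wide scale the data occupy only a few percent of the axis, so any variation is flattened out of sight, which is the uncorrectable case the quote calls potentially mortal.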

"Usually the effectiveness of a good display increases with the complexity of the data. When there are only a few points, almost anything will do; even a pie chart with only three or four categories is usually comprehensible." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Thus when we look at, or prepare, a time-based statistical graphic, it is important to ask what is the right time scale, the right context, for the questions of greatest interest. The answer to this question is sometimes complex, but the very act of asking it provides us with some protection against surprises." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"The only thing we know for sure about a missing data point is that it is not there, and there is nothing that the magic of statistics can do to change that. The best that can be managed is to estimate the extent to which missing data have influenced the inferences we wish to draw." (Howard Wainer, "14 Conversations About Three Things", Journal of Educational and Behavioral Statistics Vol. 35(1), 2010)

"For an analyst to willfully avoid learning about the science is akin to malfeasance. Of course, it is likely that a deep understanding both of the science and of data analytic methods does not reside in the same person. When it does not, data analysis should be done jointly. It is my understanding that data mining is not often done as a team. This is unfortunate, for then it is too easy to miss what might have been found." (Howard Wainer, Comment, Journal of Computational and Graphical Statistics Vol. 20(1), 2011)

"Too often there is a disconnect between the people who run a study and those who do the data analysis. This is as predictable as it is unfortunate. If data are gathered with particular hypotheses in mind, too often they (the data) are passed on to someone who is tasked with testing those hypotheses and who has only marginal knowledge of the subject matter. Graphical displays, if prepared at all, are just summaries or tests of the assumptions underlying the tests being done. Broader displays, that have the potential of showing us things that we had not expected, are either not done at all, or their message is not able to be fully appreciated by the data analyst." (Howard Wainer, Comment, Journal of Computational and Graphical Statistics Vol. 20(1), 2011)

