"Statisticians are sometimes dismissed as bean counters. The sneering term is misleading as well as unfair. Most of the concepts that matter in policy are not like beans; they are not merely difficult to count, but difficult to define. Once you’re sure what you mean by 'bean', the bean counting itself may come more easily. But if we don’t understand the definition, then there is little point in looking at the numbers. We have fooled ourselves before we have begun." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)
"The whole discipline of statistics is built on measuring or counting things. […] it is important to understand what is being measured or counted, and how. It is surprising how rarely we do this. Over the years, as I found myself trying to lead people out of statistical mazes week after week, I came to realize that many of the problems I encountered were because people had taken a wrong turn right at the start. They had dived into the mathematics of a statistical claim - asking about sampling errors and margins of error, debating if the number is rising or falling, believing, doubting, analyzing, dissecting - without taking the time to understand the first and most obvious fact: What is being measured, or counted? What definition is being used?" (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)
"There are many ways for error to creep into facts and figures that seem entirely straightforward. Quantities can be miscounted. Small samples can fail to accurately reflect the properties of the whole population. Procedures used to infer quantities from other information can be faulty. And then, of course, numbers can be total bullshit, fabricated out of whole cloth in an effort to confer credibility on an otherwise flimsy argument. We need to keep all of these things in mind when we look at quantitative claims. They say the data never lie - but we need to remember that the data often mislead." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)
"Mathematicians extract a shape’s homology from its chain complex, which provides structured data about the shape’s component parts and their boundaries - exactly what you need to describe holes in every dimension. […] The definition of homology is rigid enough that a computer can use it to find and count holes, which helps establish the rigor typically required in mathematics. It also allows researchers to use homology for an increasingly popular pursuit: analyzing data." (Kelsey Houston-Edwards, "How Mathematicians Use Homology to Make Sense of Topology", Quanta Magazine, 2021)
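As a rough illustration of how a computer can count holes from a chain complex, the sketch below computes the first two Betti numbers (connected components and one-dimensional loops) of a triangle from its boundary matrices. It is not taken from the quoted article: the names `rank`, `betti_numbers`, `D1`, and `D2_FILLED` are hypothetical, and ranks are taken over the rationals, so torsion is ignored.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix over the rationals, by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    pivots = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot = next(
            (r for r in range(pivots, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        # Clear the entries below the pivot.
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots


def betti_numbers(n_vertices, d1, d2=None):
    """Return (b0, b1) for a complex with vertices, edges, and optional faces.

    d1: boundary matrix edges -> vertices (rows are vertices, columns edges).
    d2: boundary matrix faces -> edges (rows are edges, columns faces), or None.
    """
    n_edges = len(d1[0]) if d1 else 0
    r1 = rank(d1) if d1 else 0
    r2 = rank(d2) if d2 else 0
    b0 = n_vertices - r1        # connected components
    b1 = (n_edges - r1) - r2    # independent loops not filled in by faces
    return b0, b1


# Triangle on vertices a, b, c with edges ab, bc, ac (each column is end - start).
D1 = [
    [-1, 0, -1],  # a
    [1, -1, 0],   # b
    [0, 1, 1],    # c
]
# Boundary of the face abc, written in the edge order (ab, bc, ac): ab + bc - ac.
D2_FILLED = [[1], [1], [-1]]

hollow = betti_numbers(3, D1)            # edges only: one loop
filled = betti_numbers(3, D1, D2_FILLED)  # face added: the loop is filled in
print(hollow, filled)
```

The hollow triangle has one component and one loop; adding the face removes the loop, which is the sense in which homology "counts holes".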
"Adjusting scale is an important practice in data visualization. While the log transform is versatile, it doesn’t handle all situations where skew or curvature occurs. For example, at times the values are all roughly the same order of magnitude and the log transformation has little impact. Another transformation to consider is the square root transformation, which is often useful for count data." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)
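A small sketch can make the point about scale concrete. It is not from the quoted book: the sample values and the helper names `rescale` and `max_shift` are made up for illustration. It rescales a sample to [0, 1] before and after a transformation and reports how far any point moves; a small shift means the transformation barely changed the shape of the data.

```python
import math


def rescale(values):
    """Map values linearly onto [0, 1] so shapes can be compared."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def max_shift(values, transform):
    """Largest change in relative position caused by a transformation."""
    before = rescale(values)
    after = rescale([transform(v) for v in values])
    return max(abs(a - b) for a, b in zip(before, after))


# Hypothetical samples: one spans several orders of magnitude, one does not.
wide = [1, 10, 100, 1000]
narrow = [200, 250, 300, 350, 400]

print("log shift, wide range:  ", round(max_shift(wide, math.log10), 3))
print("log shift, narrow range:", round(max_shift(narrow, math.log10), 3))

# Square root applied to made-up count data: large counts are pulled in
# less aggressively than with a log, and zero counts remain defined.
counts = [0, 1, 4, 9, 16, 100]
print([math.sqrt(c) for c in counts])
```

On the wide sample the log moves points substantially, while on the narrow sample the relative positions stay close to where they started, which matches the quote's remark that the log transformation has little impact when values share an order of magnitude.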
"With qualitative data, the bar plot serves a similar role to the histogram. The bar plot gives a visual presentation of the “popularity” or frequency of different groups. However, we cannot interpret the shape of the bar plot in the same way as a histogram. Tails and symmetry do not make sense in this setting. Also, the frequency of a category is represented by the height of the bar, and the width carries no information. The two bar charts that follow display identical information about the number of breeds in a category; the only difference is in the width of the bars. In the extreme, the rightmost plot eliminates the bars entirely and represents each count by a single dot." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)