"A study that leaves out data is waving a big red flag. A decision to include or exclude data sometimes makes all the difference in the world. This decision should be based on the relevance and quality of the data, not on whether the data support or undermine a conclusion that is expected or desired."
"A very different - and very incorrect - argument is that successes must be balanced by failures (and failures by successes) so that things average out. Every coin flip that lands heads makes tails more likely. Every red at roulette makes black more likely. […] These beliefs are all incorrect. Good luck will certainly not continue indefinitely, but do not assume that good luck makes bad luck more likely, or vice versa."
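The independence behind this point is easy to check by simulation. In this sketch (all numbers hypothetical), a fair coin's chance of landing heads is unchanged by a preceding streak of three heads:

```python
import random

random.seed(1)

# Simulate many fair coin flips and check whether a flip that follows
# three heads in a row is any less likely to be heads.
flips = [random.random() < 0.5 for _ in range(200_000)]  # True = heads

after_streak = [flips[i] for i in range(3, len(flips))
                if flips[i - 3] and flips[i - 2] and flips[i - 1]]

p = sum(after_streak) / len(after_streak)
print(f"P(heads | three heads in a row) ≈ {p:.3f}")  # stays near 0.5
```

The conditional frequency hovers around one half: past flips carry no information about the next one, which is exactly why the "law of averages" fails.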
"Another way to secure statistical significance is to use the data to discover a theory. Statistical tests assume that the researcher starts with a theory, collects data to test the theory, and reports the results - whether statistically significant or not. Many people work in the other direction, scrutinizing the data until they find a pattern and then making up a theory that fits the pattern."
"Comparisons are the lifeblood of empirical studies. We can’t determine if a medicine, treatment, policy, or strategy is effective unless we compare it to some alternative. But watch out for superficial comparisons: comparisons of percentage changes in big numbers and small numbers, comparisons of things that have nothing in common except that they increase over time, comparisons of irrelevant data. All of these are like comparing apples to prunes."
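A toy illustration of the percentage-change trap: a huge percentage gain on a small base can be dwarfed in absolute terms by a modest gain on a large base. The figures below are invented for illustration:

```python
# Hypothetical profits: a small firm doubles while a large firm grows 2%.
small_before, small_after = 1_000, 2_000            # +100%
large_before, large_after = 10_000_000, 10_200_000  # +2%

pct_small = (small_after - small_before) / small_before * 100
pct_large = (large_after - large_before) / large_before * 100
abs_small = small_after - small_before
abs_large = large_after - large_before

print(f"small firm: +{pct_small:.0f}% (+{abs_small:,})")
print(f"large firm: +{pct_large:.0f}% (+{abs_large:,})")
```

The small firm's 100% gain is 2,000 dollars; the large firm's 2% gain is 200,000 dollars. Quoting only the percentages invites the apples-to-prunes comparison the quote warns about.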
"Data clusters are everywhere, even in random data. Someone who looks for an explanation will inevitably find one, but a theory that fits a data cluster is not persuasive evidence. The found explanation needs to make sense and it needs to be tested with uncontaminated data."
"Data without theory can fuel a speculative stock market bubble or create the illusion of a bubble where there is none. How do we tell the difference between a real bubble and a false alarm? You know the answer: we need a theory. Data are not enough. […] Data without theory is alluring, but misleading."
"Don’t just do the calculations. Use common sense to see whether you are answering the correct question, the assumptions are reasonable, and the results are plausible. If a statistical argument doesn’t make sense, think about it carefully - you may discover that the argument is nonsense."
"Graphs should not be mere decoration, to amuse the easily bored. A useful graph displays data accurately and coherently, and helps us understand the data. Chartjunk, in contrast, distracts, confuses, and annoys. Chartjunk may be well-intentioned, but it is misguided. It may also be a deliberate attempt to mystify."
"How can we tell the difference between a good theory and quackery? There are two effective antidotes: common sense and fresh data. If it is a ridiculous theory, we shouldn’t be persuaded by anything less than overwhelming evidence, and even then be skeptical. Extraordinary claims require extraordinary evidence. Unfortunately, common sense is an uncommon commodity these days, and many silly theories have been seriously promoted by honest researchers."
"If somebody ransacks data to find a pattern, we still need a theory that makes sense. On the other hand, a theory is just a theory until it is tested with persuasive data."
"[…] many gamblers believe in the fallacious law of averages because they are eager to find a profitable pattern in the chaos created by random chance."
"Numbers are not inherently tedious. They can be illuminating, fascinating, even entertaining. The trouble starts when we decide that it is more important for a graph to be artistic than informative."
"Provocative assertions are provocative precisely because they are counterintuitive - which is a very good reason for skepticism."
"Regression does not describe changes in ability that happen as time passes […]. Regression is caused by performances fluctuating about ability, so that performances far from the mean reflect abilities that are closer to the mean."
"Remember that even random coin flips can yield striking, even stunning, patterns that mean nothing at all. When someone shows you a pattern, no matter how impressive the person’s credentials, consider the possibility that the pattern is just a coincidence. Ask why, not what. No matter what the pattern, the question is: Why should we expect to find this pattern?"
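One way to see how striking random patterns can be is to measure the longest streak in a run of fair coin flips: in 1,000 flips, streaks around ten in a row are routine, though they look anything but random. A minimal sketch:

```python
import random

random.seed(0)

def longest_run(seq):
    """Length of the longest run of identical consecutive outcomes."""
    best = cur = 1
    for a, b in zip(seq, seq[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best

# 1,000 fair flips; the longest streak is typically near log2(1000) ≈ 10.
flips = [random.choice("HT") for _ in range(1000)]
print("longest streak:", longest_run(flips))
```

A ten-flip streak would convince many observers that the coin is hot, yet here it arises from pure chance, with nothing to explain.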
"Self-selection bias occurs when people choose to be in the data - for example, when people choose to go to college, marry, or have children. […] Self-selection bias is pervasive in 'observational data', where we collect data by observing what people do. Because these people chose to do what they are doing, their choices may reflect who they are. This self-selection bias could be avoided with a controlled experiment in which people are randomly assigned to groups and told what to do."
"Self-selection bias occurs when we compare people who made different choices without thinking about why they made these choices. […] Our conclusions would be more convincing if choice was removed […]"
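The point about removing choice can be illustrated with a toy simulation (all quantities hypothetical): a treatment with no effect at all looks beneficial in observational data when healthier people are more likely to choose it, while random assignment shows no gap:

```python
import random

random.seed(2)

N = 100_000
people = [random.gauss(0, 1) for _ in range(N)]  # latent "health"

def outcome(health):
    # Outcome depends only on latent health plus noise; in this
    # hypothetical world the "treatment" itself does nothing.
    return health + random.gauss(0, 1)

# Observational data: healthier people are more likely to opt in,
# so choosers and non-choosers differ before any treatment occurs.
chose = [h > random.gauss(0, 1) for h in people]
obs_treated = [outcome(h) for h, c in zip(people, chose) if c]
obs_control = [outcome(h) for h, c in zip(people, chose) if not c]

# Randomized experiment: assignment ignores health entirely.
assign = [random.random() < 0.5 for _ in range(N)]
rct_treated = [outcome(h) for h, a in zip(people, assign) if a]
rct_control = [outcome(h) for h, a in zip(people, assign) if not a]

mean = lambda xs: sum(xs) / len(xs)
print(f"observational gap: {mean(obs_treated) - mean(obs_control):+.2f}")
print(f"randomized gap:    {mean(rct_treated) - mean(rct_control):+.2f}")
```

The observational comparison shows a large "treatment effect" that is entirely self-selection; randomization removes the choice and the effect vanishes.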
"The omission of zero magnifies the ups and downs in the data, allowing us to detect changes that might otherwise be ambiguous. However, once zero has been omitted, the graph is no longer an accurate guide to the magnitude of the changes. Instead, we need to look at the actual numbers." (Gary Smith, "Standard Deviations", 2014)
"These practices - selective reporting and data pillaging - are known as data grubbing. The discovery of statistical significance by data grubbing shows little other than the researcher’s endurance. We cannot tell whether a data grubbing marathon demonstrates the validity of a useful theory or the perseverance of a determined researcher until independent tests confirm or refute the finding. But more often than not, the tests stop there. After all, you won’t become a star by confirming other people’s research, so why not spend your time discovering new theories? The data-grubbed theory consequently sits out there, untested and unchallenged."
"We are genetically predisposed to look for patterns and to believe that the patterns we observe are meaningful. […] Don’t be fooled into thinking that a pattern is proof. We need a logical, persuasive explanation and we need to test the explanation with fresh data."
"We are hardwired to make sense of the world around us - to notice patterns and invent theories to explain these patterns. We underestimate how easily patterns can be created by inexplicable random events - by good luck and bad luck."
"We are seduced by patterns and we want explanations for these patterns. When we see a string of successes, we think that a hot hand has made success more likely. If we see a string of failures, we think a cold hand has made failure more likely. It is easy to dismiss such theories when they involve coin flips, but it is not so easy with humans. We surely have emotions and ailments that can cause our abilities to go up and down. The question is whether these fluctuations are important or trivial."
"We naturally draw conclusions from what we see […]. We should also think about what we do not see […]. The unseen data may be just as important, or even more important, than the seen data. To avoid survivor bias, start in the past and look forward."
"We encounter regression in many contexts - pretty much whenever we see an imperfect measure of what we are trying to measure. Standardized tests are obviously an imperfect measure of ability. [...] Each experimental score is an imperfect measure of 'ability', the benefits from the layout. To the extent there is randomness in this experiment - and there surely is - the prospective benefits from the layout that has the highest score are probably closer to the mean than was the score." (Gary Smith, "Standard Deviations", 2014)
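Regression toward the mean falls out of exactly this score-equals-ability-plus-luck structure. In this hypothetical sketch, the top scorers on one test average noticeably closer to the mean on a retest, even though no one's ability changed:

```python
import random

random.seed(3)

# Each score is an imperfect measure: fixed ability plus luck.
abilities = [random.gauss(100, 10) for _ in range(10_000)]
test1 = [a + random.gauss(0, 10) for a in abilities]
test2 = [a + random.gauss(0, 10) for a in abilities]

# Pick the 100 highest scorers on test 1 and look at their retest.
top = sorted(range(10_000), key=lambda i: test1[i])[-100:]
m1 = sum(test1[i] for i in top) / 100
m2 = sum(test2[i] for i in top) / 100

print(f"top 100 on test 1 averaged {m1:.1f}, then {m2:.1f} on test 2")
```

The top group's first scores reflect both high ability and good luck; on the retest the luck washes out, so their average falls back toward the population mean of 100 without any decline in ability.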
"With fast computers and plentiful data, finding statistical significance is trivial. If you look hard enough, it can even be found in tables of random numbers."
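A quick sketch of that claim: correlate one column of random numbers against many other columns of random numbers, and a handful of pairs clear the conventional 5% significance threshold by chance alone (0.361 is the approximate two-sided 5% critical value of the correlation coefficient for n = 30):

```python
import math
import random

random.seed(4)

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return sxy / (sx * sy)

n, k = 30, 200
target = [random.gauss(0, 1) for _ in range(n)]
THRESHOLD = 0.361  # ~5% two-sided critical value of r for n = 30

# Test 200 columns of pure noise against the target column.
hits = 0
for _ in range(k):
    noise = [random.gauss(0, 1) for _ in range(n)]
    if abs(corr(target, noise)) > THRESHOLD:
        hits += 1

print(f"'significant' correlations found: {hits} of {k}")
```

Roughly one test in twenty comes up "significant" even though every column is random noise, which is why a significant result dredged from many comparisons shows endurance rather than evidence.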