Professors offer six ways to tease the truth out of stats
Betsey Stevenson and Justin Wolfers, a pair of professors in the public policy and economics departments at the University of Michigan, weren’t addressing sports when they wrote Six Ways to Separate Lies From Statistics for Bloomberg.com last week.
But you may have noticed that sports and statistics aren’t exactly strangers. Sportswriters can learn a lot from the piece.
The news hook for Stevenson and Wolfers was a study that found major errors in a 2010 Harvard study that purported to show a correlation between high national debt and slow growth. The Harvard study has been widely used to support austerity measures.
Sportswriters rarely deal with issues as momentous as whether a government should enact austerity measures in an attempt to get a slow economy moving.
I’m kidding, of course. Sports is way more important than that stuff.
And with the influence of the sabermetric movement in baseball and its spread to other sports, we increasingly see arguments that arise from complex data. Anyone who made it through the third grade ought to be comfortable comparing yards-per-carry, points-per-game or batting averages, but what’s a liberal arts major to do when presented a thesis that grew out of sophisticated number crunching?
Stevenson and Wolfers address this, and it’s as if they were talking about old-school sportswriters who dismiss statistical analysis by saying the statheads should get their heads out of a spreadsheet—located, no doubt, in mom’s basement—and watch a game:
Given the complexity, it’s understandable that people might fall for the old aphorism that “liars figure and figures lie,” that you can say anything with statistics. But this is silly. You can say anything in English, too. Indeed, our nation’s opinion pages are filled with slanted nonsense written entirely in English.
Sports pages too.
Here, in brief, are their six rules for separating “the useful research from the dross.” You should read the piece for the full explanations:
- Focus on how robust a finding is, meaning that different ways of looking at the evidence point to the same conclusion.
- Don’t confuse “statistically significant” with something actually mattering.
- Be wary of scholars using high-powered statistical techniques as a bludgeon to silence critics who are not specialists.
- Don’t fall into the trap of thinking about an empirical finding as “right” or “wrong.” At best, data provide an imperfect guide.
- Don’t mistake correlation for causation.
- Always ask “so what?”
I would put the last two at the top. Mistaking correlation for causation is the single biggest mistake sportswriters make with stats. Think of the truism that football teams should “establish the run” because teams that outrush their opponents tend to win games. In fact, teams that are ahead in games tend to run a lot, so the truism is backwards: Winning causes an advantage in rushing yards, not vice versa.
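For readers who like to see it rather than take it on faith, here is a minimal sketch of that reversed causation. All the numbers are invented for illustration: the toy model decides the winner first, by factors that have nothing to do with rushing, and only then hands the winner extra rushing yards as it sits on its lead. Rushing never causes winning here, yet the correlation shows up anyway.

```python
import random

random.seed(1)

games = []
for _ in range(10_000):
    # The outcome is decided by everything *except* rushing (a coin flip here).
    team_wins = random.random() < 0.5
    # Both teams would rush for roughly similar yardage on their own...
    base_rush = random.gauss(110, 25)
    opponent_rush = random.gauss(110, 25)
    # ...but the team with the lead runs more to kill the clock,
    # and the trailing team abandons the run. Effect, not cause.
    lead_rush = random.gauss(40, 15) if team_wins else random.gauss(-40, 15)
    rush_margin = base_rush + lead_rush - opponent_rush
    games.append((team_wins, rush_margin))

# How often does the team that outrushed its opponent win?
outrushed = [won for won, margin in games if margin > 0]
win_rate_when_outrushing = sum(outrushed) / len(outrushed)
print(f"Win rate when outrushing the opponent: {win_rate_when_outrushing:.0%}")
```

Run it and the team that outrushes its opponent wins a large majority of the time, even though the simulation's rushing yards are pure consequence of the score. A naive reading of the same data would say "establish the run."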
The question “so what?” comes in very handy when amateur statisticians—many of whom work in the truck for TV broadcasters—start throwing numbers at you.
I’d add a related question: What’s the context? As David Grabiner pointed out in a blog post titled “The Sabermetric Manifesto”:
No statistic can be useful without proper context, a measure of opportunities. There were more crimes committed in New York than in Boston last year, but this doesn’t say much about the relative safety of the cities; to make such a comparison, you would need to compare crime rates.
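Grabiner's point reduces to simple arithmetic: divide the count by the opportunities. Here is a quick sketch with made-up totals and populations (not real crime data) showing how the city with more crimes can still be the safer one per resident:

```python
# Hypothetical figures, invented purely to illustrate counts vs. rates.
crimes = {"New York": 85_000, "Boston": 18_000}
population = {"New York": 8_500_000, "Boston": 650_000}

rates = {}
for city in crimes:
    # Crimes per 100,000 residents: the count scaled by opportunities.
    rates[city] = crimes[city] / population[city] * 100_000
    print(f"{city}: {crimes[city]:,} crimes, "
          f"{rates[city]:.0f} per 100,000 residents")
```

With these invented numbers, New York logs nearly five times as many crimes but comes out with the lower rate, because it has thirteen times the population. The raw count answers a different question than the one being asked.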