Simpson’s Paradox — Another Example of the Power of Understanding
Posted by: Mathew Crawford in Economics, Education, Mathematics
An old friend of mine from my high school math team, Natalie Haynes, recently mentioned that she impressed her boss by identifying a Simpson’s paradox that helped her company better understand some important data. Simpson’s paradox is unfortunately very common. As soon as data gets even slightly complex (which happens very quickly once you include variables that aren’t perfectly correlated), Simpson’s paradox necessarily lurks in some slicing of the data, so long as you look at the data from the “wrong angle”. Even worse, that wrong angle very often masquerades as the right angle because it’s usually a “simple angle”.
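To make the “wrong angle” concrete, here is a minimal Python sketch using invented counts (nothing here comes from Natalie’s data or any real study). Treatment A beats treatment B within each severity group, yet B looks far better once severity is ignored, because A was mostly given to the severe cases. The names `DATA`, `stratum_rates`, and `pooled_rate` are hypothetical and exist only for this illustration.

```python
from fractions import Fraction

# Hypothetical (successes, trials) counts for two treatments, split by a
# lurking variable (case severity). All numbers are invented for illustration.
DATA = {
    "A": {"mild": (9, 10), "severe": (30, 100)},
    "B": {"mild": (80, 100), "severe": (2, 10)},
}


def rate(successes: int, trials: int) -> Fraction:
    """Exact success rate, so comparisons are not muddied by rounding."""
    return Fraction(successes, trials)


def stratum_rates(group: str) -> dict[str, Fraction]:
    """Success rate of `group` within each severity stratum (the 'right angle')."""
    return {stratum: rate(*counts) for stratum, counts in DATA[group].items()}


def pooled_rate(group: str) -> Fraction:
    """Success rate of `group` after collapsing severity (the 'simple angle')."""
    successes = sum(k for k, _ in DATA[group].values())
    trials = sum(n for _, n in DATA[group].values())
    return rate(successes, trials)


if __name__ == "__main__":
    for stratum in ("mild", "severe"):
        a = stratum_rates("A")[stratum]
        b = stratum_rates("B")[stratum]
        print(f"{stratum:>6}: A={float(a):.1%}  B={float(b):.1%}")
    print(f"pooled: A={float(pooled_rate('A')):.1%}  B={float(pooled_rate('B')):.1%}")
```

Running it shows A ahead by ten points in both the mild and severe rows (90.0% vs 80.0%, 30.0% vs 20.0%) but far behind in the pooled row (35.5% vs 74.5%). Nothing about either treatment changed between the rows; only the angle did.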
The prevalence of Simpson’s paradox motivated the way I wrote the MIST Academy conditional probability lessons, which present one example of Simpson’s paradox and then challenge the reader to unravel a storyline problem in which the reader’s opinion of the data is typically anchored by the “wrong angle” and a lack of understanding of Bayesian statistics.
Natalie’s mention of her triumph spotting a Simpson’s paradox at work also helped me recognize something I’ve been doing for a while almost instinctively and without thought: I mentally correct for Simpson’s paradox or imagine potential corrections.
I vaguely recall the point at which I began to understand grammar rules well. By “well” I specifically mean that I started spotting grammar mistakes everywhere: newspapers and magazines in particular, not to mention advertisements and anywhere else people write in complete sentences. As recognition of errors became more natural and automatic, it began to irritate me that I spotted quite so many errors by professional writers. Then came acceptance, yadda yadda yadda.
The same evolution of understanding, recognition, and acceptance took place as I learned about conditional probability and analogous statistical concepts. Now I see poor (or at least hastily accepted) interpretations of data everywhere. It’s almost ubiquitous. Several times while in college I spotted Simpson’s paradox in papers posted on doors in the Psychology building and offered to talk to the researchers about the problem, which was the first time I really and truly got the first-hand feeling that not all scientists like to discover that they might be wrong. To be fair, though, most of my conversations with scientists about these kinds of data problems have gone much better.
Here’s an example from a professional economist:
Does that mean that college isn’t worth it? Not exactly. In fact, given the crappy economy, a college degree is more valuable than ever, a point that Levitt makes in a recent Freakonomics Radio podcast. The most telling statistic as to the value of college: the unemployment rate among college graduates is less than half (4.5%) than people with only a high school diploma (9.7%). (See the BLS employment status table here.)
There may be arguments in favor of the high tuition costs at universities, but this one fails hard, and it’s disturbing to see a highly noted economist (and one whose works I enjoy) use it as the “most telling statistic”. Other variables need to be considered. On average, college graduates have higher IQs, more money, and a better understanding of the structure of the society they live in to begin with than those with less education, so there is a clear selection bias. If college involved nothing but football games and keg stands, we would still expect college graduates to achieve higher levels of employment. Even the nature of the jobs considered matters. Recently the increase in minimum wage knocked over a million jobs out of the economy. Few if any of those jobs included college-educated workers, but plenty included those with a high school diploma, setting up a clear conditional that must be resolved before the statistics above are particularly meaningful. Really, these only begin to scratch the surface of my best arguments that important conditionals are missing in this case; they’re just the ones that are easiest to point out in one paragraph. A whole book could be written on the conditionals that affect the above statistics, and it’s damning to the economics community that few economists seem to recognize that. This point should at the very least be mentioned, and the moment it gets mentioned the economist would have to step back and reevaluate the story told by the statistics.
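The keg-stand point can be made concrete with a toy model. The Python sketch below uses invented headcounts and rates (they are not BLS figures) and builds in, purely by assumption, that college has zero effect on employment: within each hypothetical background stratum, graduates and non-graduates are unemployed at exactly the same rate. Because graduates are drawn mostly from the advantaged stratum, the aggregate comparison still shows graduates with less than half the unemployment rate of non-graduates. The names `UNEMPLOYMENT_BY_BACKGROUND`, `HEADCOUNT`, and `aggregate_unemployment` are illustrative only.

```python
from fractions import Fraction

# Hypothetical population built so that college has NO effect on employment:
# within each background stratum, graduates and non-graduates face exactly
# the same unemployment rate. All numbers are invented for illustration.
UNEMPLOYMENT_BY_BACKGROUND = {
    "advantaged": Fraction(4, 100),
    "disadvantaged": Fraction(12, 100),
}

# How many people of each background end up in each education group.
# Graduates come mostly from the advantaged stratum (selection bias).
HEADCOUNT = {
    "graduate": {"advantaged": 900, "disadvantaged": 100},
    "high_school_only": {"advantaged": 200, "disadvantaged": 800},
}


def aggregate_unemployment(group: str) -> Fraction:
    """Overall unemployment rate of an education group, ignoring background."""
    people = HEADCOUNT[group]
    unemployed = sum(n * UNEMPLOYMENT_BY_BACKGROUND[bg] for bg, n in people.items())
    return unemployed / sum(people.values())


if __name__ == "__main__":
    for group in HEADCOUNT:
        print(f"{group:>16}: {float(aggregate_unemployment(group)):.1%}")
```

With these made-up numbers it prints 4.8% for graduates and 10.4% for the high-school-only group: a gap shaped much like the one quoted above, produced entirely by who goes to college rather than by anything college does. Conditioning on background (the per-stratum rates) makes the gap vanish.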
It may certainly be the case that college is economically worth the high costs, but any assertion that these statistics anchor the debate is absurd.
I see it everywhere. Eventually I fell prey to acceptance. I see it everywhere. I sometimes complain about it to anyone who will listen (not that often). I do nothing about it.
Well, I did eventually write some really cool classes about it for my students. Now it’s a standard part of the curriculum I use with students as young as 10 years old. These really are some of the more creative classes I’ve written, so I’m particularly proud of them. But the reason I spent so much time thinking about them (and plan to spend even more in the near future) is that I’m shaking off the acceptance that the world needs to be this way. It doesn’t. And there is opportunity!
People will always misinterpret statistics, but motivated problem solvers can find ways to minimize the impact of the problem. I once considered building a medical data consulting service that I figured would be based primarily on hunting down and extinguishing the damage caused by Simpson’s paradox and analogous Bayesian misunderstandings in health care/medicine — which is quite extensive and costly both in terms of dollars and lives! I’ve decided to both blog about this idea and encourage others to take up this cause professionally so that I don’t have to change vocations to make a difference.
Health care is a slam dunk example. You’d have a hard time convincing me that we can’t save billions of dollars annually in health care costs if we set some really smart people about the task of ferreting out all the problematic interpretations of data. But there are examples of damaging data misinterpretations all over the world — including particularly valuable problems to solve in the business world. In fact, an old friend of mine just found one…