Wednesday, 11 September 2013

More on individuals and averages

I seem to keep hitting on this idea: the individual may not be well described by the average. In fact, it's possible for no individual to be well described by an average. It's an important point because it really strikes at the heart of where a lot of science reporting goes wrong.

I'm not the only one saying this: Jamil Zaki, a psychologist at Stanford, has a great post on a Scientific American blog going into detail about exactly this idea. His point is that psychology deals with averages, and sometimes there's a lot of variation around those averages that isn't often reported.

Zaki only discusses psychology in his article, but of course the idea extends beyond that. Any science that deals with populations and tries to extract generalized information from them carries the same caveat. "Populations," as I'm using the term, don't even have to be people; they could be animals or even stars, or events, or days. In other words, most of science, including all of economics and medicine, is covered here.

So the weather in one season in one part of the world may not be well described by a global average temperature. (It was minus 20 here yesterday! What happened to global warming, eh?) Your risk of cardiac disease may not be well described by the average for other people with similar habits and backgrounds to you. It's even true that, if you smoke, the decrease in your lifespan may not be well described by the average decrease in life expectancy for smokers.

The average is simply one measure of a population. It might be a good way of describing things; it might not be. As an example, I could take the average height of my family. Adding myself, my spouse, and our toddler, and dividing by three gives me something around four and a half feet. That's not anywhere near any of our heights; in this case the average is simply an irrelevant measure.
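To make the family example concrete, here is a small Python sketch. The three heights are made-up values chosen so the average comes out at four and a half feet; they are assumptions for illustration, not real measurements.

```python
from statistics import mean

# Hypothetical heights in feet (assumed values, chosen only for illustration).
heights_ft = {"adult_1": 5.75, "adult_2": 5.25, "toddler": 2.5}

average = mean(list(heights_ft.values()))

# How far each person's height is from the family average.
gaps = {name: abs(height - average) for name, height in heights_ft.items()}

print(f"average height: {average:.2f} ft")
for name, gap in gaps.items():
    print(f"{name}: {gap:.2f} ft away from the average")
```

With these numbers, even the closest family member is three quarters of a foot away from the average, so the average describes nobody in particular.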

More commonly, the average isn't a bad measure per se, it's just incomplete. What you usually need is the average, plus some indication of how spread out the population is around that average. The standard deviation is one such measure.
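As a rough illustration of why the spread matters, the sketch below compares two invented populations that share the same average. The numbers are hypothetical, and I'm using the population standard deviation (`pstdev` from Python's standard library) simply as one convenient measure of spread.

```python
from statistics import mean, pstdev

# Two made-up populations with the same average but very different spread.
tight = [49, 50, 50, 50, 51]
wide = [10, 30, 50, 70, 90]

for label, values in (("tight", tight), ("wide", wide)):
    # pstdev treats the list as the whole population, not a sample.
    print(f"{label}: mean={mean(values)}, std dev={pstdev(values):.2f}")
```

Both lists average 50, but in the first one every value sits within one unit of that figure, while in the second most values are nowhere near it. Reporting only the average would make the two look identical.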

Any reputable scientific paper will have calculated many measures for the population it's looking at. Here's a list of, among other things, various measures that can be applied to a population. It's a little bewildering, which is likely why most media reports focus on one number and strip away the complicating details.

What to make of all this? Well, don't start smoking. Even if there is a certain amount of variance in the data, it's foolish to assume that you'll be an outlier. For well-established health issues, the average is, more often than not, a good guide.

Moving outside of that, if the study is new it's always worth asking: what's the variation around the average they're reporting? We looked at a study a while back in which a connection between autism and induced labour was reported; the actual research paper showed that the variance around their average results was so large that it threatened to undermine the conclusions.

Knowing when an average is a bad measure is a little harder. Often when this happens the person reporting the average is deliberately using a poor measure to make themselves look better. Economic data is a prime example. Whenever you see the GDP per person, unadjusted for inflation, you can safely discount that number as worthless. The average simply isn't a good measure for the typical person's income. Politicians report it, though, because it's a quantity that governments can reliably increase through monetary policy, even if life for the typical person hasn't changed.

The key point here is that populations are complex. Any time you see them reduced to a single number, it's worth asking, "What am I missing here?" And, as Zaki points out, it is not always about you.
