
Week 17: Regression to the Mean

Why the champion team probably won't win next season, and why early results in science are so often wrong.

 

What is regression to the mean? Definition and examples.

If your favorite team won the championship last year, what does that mean for their chances of winning next season? This is an important question, often with money or pride on the line (The League, anyone?). To the extent the win was due to skill (the team is in good condition, has a top coach, etc.), it signals that they're more likely to win next year. But the more the win was due to luck (other teams embroiled in a drug scandal, a favourable draw, draft picks that turned out well, etc.), the less likely they are to win again. The reason is a statistical phenomenon called regression to the mean.

What is regression to the mean?

Suppose you run some tests and get a range of results (some extremely good, some extremely bad, and some in the middle). Because there’s some chance involved in each result, when you rerun the tests that came out extremely good or extremely bad, their new results are likely to be closer to the middle. That’s regression to the mean.

A toy example

Imagine you’re a teacher and you set your students a true/false test with 100 questions. Your students, clever as they are, flip a coin to choose each answer: heads = true; tails = false. You would expect the average test score to be 50. Of course, through sheer luck, some students will score substantially above 50 and some substantially below. If you naively took the top-performing 10% of students and gave them a second test, which they answered with the same strategy, their mean score would again be expected to be close to 50. Thus your top-performing students would “regress” all the way back to the mean of all students who took the original test.
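The coin-flipping class is easy to simulate. Here is a minimal Python sketch of it. The function name, the class size of 1,000 and the fixed random seed are illustrative assumptions, not part of the original example.

```python
import random
from statistics import mean


def coin_flip_class(num_students=1000, num_questions=100, top_fraction=0.10, seed=0):
    """Simulate the coin-flipping class (class size and seed are assumptions)."""
    rng = random.Random(seed)

    def guess_test():
        # Every answer is a fair coin flip, so each question is right half the time.
        return sum(rng.random() < 0.5 for _ in range(num_questions))

    # First test: everyone guesses.
    first = [guess_test() for _ in range(num_students)]

    # Pick out the scores of the top 10% on the first test.
    top_count = int(num_students * top_fraction)
    top_first = sorted(first, reverse=True)[:top_count]

    # Second test for those students. Their scores are pure luck, so the
    # retest is just a fresh set of guesses, unrelated to the first result.
    top_second = [guess_test() for _ in range(top_count)]

    return mean(first), mean(top_first), mean(top_second)


if __name__ == "__main__":
    everyone, top_then, top_now = coin_flip_class()
    print(f"Everyone, first test:  {everyone:.1f}")
    print(f"Top 10%, first test:   {top_then:.1f}")
    print(f"Top 10%, second test:  {top_now:.1f}")
```

On a typical run the top 10% score well above 50 on the first test and land back near 50 on the second, right where the whole class started.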

If, on the other hand, there’s no chance involved in your students’ test scores, you would expect no regression to the mean: the same students would top both the first and the second test. Most situations fall between these two extremes, so you should expect some regression to the mean. How much depends on how much chance is involved, that is, on how noisy the results are.
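To see the in-between cases, here is a rough Python sketch that treats each score as skill plus luck. The model (normally distributed skill and luck), the function name and all the numbers are assumptions made for illustration. It reports how much of the top group's lead over the class average disappears on a retest: 0 means no regression, 1 means they fall all the way back.

```python
import random
from statistics import mean


def regression_fraction(skill_sd, noise_sd, num_students=5000, top_fraction=0.10, seed=1):
    """Share of the top group's first-test lead that is lost on a retest.

    Assumed toy model: score = fixed skill + fresh luck on every test.
    At least one of skill_sd and noise_sd must be greater than zero.
    """
    rng = random.Random(seed)

    # Each student's skill stays the same across both tests.
    skills = [rng.gauss(50, skill_sd) for _ in range(num_students)]

    # Luck is drawn afresh for each test.
    first = [s + rng.gauss(0, noise_sd) for s in skills]
    second = [s + rng.gauss(0, noise_sd) for s in skills]

    # Select the top 10% using the first test only.
    top_count = int(num_students * top_fraction)
    top = sorted(range(num_students), key=lambda i: first[i], reverse=True)[:top_count]

    overall = mean(first)
    top_first = mean(first[i] for i in top)
    top_second = mean(second[i] for i in top)

    return (top_first - top_second) / (top_first - overall)


if __name__ == "__main__":
    print("All skill:   ", round(regression_fraction(skill_sd=5, noise_sd=0), 2))
    print("Half and half:", round(regression_fraction(skill_sd=5, noise_sd=5), 2))
    print("All luck:    ", round(regression_fraction(skill_sd=0, noise_sd=5), 2))
```

In this toy model the noisier the scores, the larger the fraction: with no luck the top group keeps its whole lead, with pure luck it loses essentially all of it, and an even mix loses roughly half.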

Other examples of regression to the mean

In science

If one trial suggests that chemical YK7483 outperforms all other treatments for lymphatic filariasis (looking this up is not for the faint-hearted), you shouldn’t put all your faith in that result. When YK7483 is tested a second time, its result is likely to be closer to the mean. If you took the first result at face value and didn’t account for the fact that it will likely regress to the mean, you’d misplace your money. In one systematic study of this effect, John Ioannidis analyzed "49 of the most highly regarded research findings in medicine over the previous 13 years" and found that 16% of the studies were contradicted, 16% had effects that were smaller in the second study than in the first, 24% remained largely unchallenged and only 44% were replicated. And recall, these are the most highly regarded research findings, which you would expect to be more reliable than just any old sample.

In life

Your organisation has a great quarter, meeting and exceeding all the targets set. If part of that success was down to luck and the underlying reasons for its performance are unchanged, it will probably do less well the next quarter.

All of that might be a bit depressing, but consider that the opposite is also true. Abnormally bad events are likely to be less bad the next time they happen! If last year was a horrible year for you, you should expect things to get better. If your favorite team finished in last place in the previous season, they will probably do better this year!

Also check out

  1. Why the tails come apart, LessWrong
  2. Hot hand fallacy, Wikipedia
  3. Law of large numbers, Wikipedia

The teaching example is paraphrased from the Wikipedia entry.
