What are signal and noise?
The signal is the meaningful information that you’re actually trying to detect. The noise is the random, unwanted variation that interferes with the signal. To get a sense of this, imagine trying to tune into a radio station. Ok, you don’t use radio anymore, so imagine your dad can’t reach you for help setting up his Spotify and is trying to tune into a radio station instead. He turns the dial, but at first it only picks up white noise. After a few frustrating minutes, he manages to pick up a signal and tune into a station.
The same is true in statistics: there is something you’re actually trying to measure (say, how many Americans want to leave for Canada), but the data could be noisy (for example, because they include everyone who just makes a trip over the border to buy affordable medication). Noisy data are data from which it is hard to determine the true effect.
Examples of signal vs noise
If I speak German, most people will hear no signal, just noise, although Claus can detect the actual signal.
How accurate are the polls in predicting the election? If the data are noisy (for example, because of a small sample size, low external validity, or a small effect size), movements in the poll numbers won’t track real changes in the chance of a different President.
Does money make you happier? The signal (the correlation between income and happiness) would be noisy because of confounders and reverse causation: you’d expect people who earn more to be happier because they are in positions of higher social status and have better working conditions, and being happier could itself cause people to become rich. It turns out there is some signal amongst the noise, though.
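To make the polling example a bit more concrete, here is a rough sketch in Python of how sample size changes the amount of noise. Everything in it is made up for illustration: the 52% "true" support figure, the sample sizes, and the function names simulate_poll and poll_spread are all assumptions, not real polling data or anyone's actual method. It simulates lots of polls of the same imaginary electorate and measures how much the results bounce around.

```python
import random
import statistics


def simulate_poll(true_support, sample_size, rng):
    """Ask `sample_size` simulated voters and return the share who say yes."""
    votes = sum(rng.random() < true_support for _ in range(sample_size))
    return votes / sample_size


def poll_spread(true_support, sample_size, n_polls=500, seed=0):
    """Standard deviation of the results across many simulated polls.

    A bigger spread means more noise around the same underlying signal.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = [
        simulate_poll(true_support, sample_size, rng) for _ in range(n_polls)
    ]
    return statistics.stdev(results)


# Hypothetical signal: 52% of voters really do back the candidate.
TRUE_SUPPORT = 0.52

small = poll_spread(TRUE_SUPPORT, sample_size=50)
large = poll_spread(TRUE_SUPPORT, sample_size=2000)

print(f"spread across polls of 50 people:   {small:.3f}")
print(f"spread across polls of 2000 people: {large:.3f}")
```

The signal is identical in both cases. Only the noise changes, and with 50 respondents the poll-to-poll wobble is large enough to swamp a two-point lead. This only covers sampling noise, so it says nothing about low external validity or the other problems mentioned above.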
Also check out
- The Signal and the Noise: Why So Many Predictions Fail – but Some Don't, Nate Silver
- Effect size, Wikipedia