Why Bayesian Statistics
Bayesian analysis changed how I think about statistics. So why do I feel like a sleazy car salesman when I pitch it?
Because I’m selling a tool without knowing what problem you’re trying to solve. Like any tool, Bayes has limits. So instead of listing features, I’ll show you a problem where it shines.
The problem
When COVID hit, my team at Enveritas helped governments in developing countries measure how fast the disease spread. We wanted to know: what percentage of people in Jimma, Ethiopia had COVID?
Early on, no one knew. Governments couldn’t decide what to do without data.
We chose antibody tests. People with antibodies had been infected. Test weekly, track the spread.
The base rate fallacy
One problem: antibody tests are not perfect. They’re actually quite shitty.
Take a good test: it catches everyone who has antibodies (0% false negatives) and only flags 5% of healthy people by mistake (5% false positives).
You test positive. What’s the chance you had COVID?
Most people say 95%. Wrong. It depends on how common COVID is.
If half the population has it, then yes—95% chance. But if only 2% have it (early pandemic levels), your odds drop to 29%.
The base rate matters. Our brains skip it. That’s the base rate fallacy.
Enter Bayes
Bayes’ theorem fixes this:
P(COVID | +) = P(+ | COVID) × P(COVID) / P(+)
Where:
- COVID: you actually had the disease
- +: you tested positive
Our assumptions:
Numerator:
- P(COVID) = 2%
- P(+ | COVID) = 100% — the test catches all cases
Denominator:
We break down P(+):
P(+) = P(+ | COVID) × P(COVID) + P(+ | no COVID) × P(no COVID)
- P(+ | no COVID) = 5% — false positive rate
- P(no COVID) = 98%
Plug it in:
P(COVID | +) = (100% × 2%) / (100% × 2% + 5% × 98%) ≈ 29%
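The arithmetic above fits in a few lines of Python, using the same numbers from the text:

```python
# Bayes' theorem with the article's numbers.
p_covid = 0.02              # base rate: 2% of the population infected
p_pos_given_covid = 1.00    # 0% false negatives: the test catches all cases
p_pos_given_healthy = 0.05  # 5% false positives

# P(+) via the law of total probability
p_pos = p_pos_given_covid * p_covid + p_pos_given_healthy * (1 - p_covid)

p_covid_given_pos = p_pos_given_covid * p_covid / p_pos
print(f"{p_covid_given_pos:.0%}")  # prints "29%"
```

Change `p_covid` to 0.50 and the same code prints 95%, which is the answer most people expect.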
A positive test means only 29% odds you had COVID. Crazy low for a “good” test.
How can a test with 5% false positives and 0% false negatives tell you so little?
If you expected more certainty, you fell for the base rate fallacy. Most people do.¹
Why this happens
When COVID is rare (2%), false positives outnumber true positives. The math is simple: 5% of the 98% without COVID is bigger than 100% of the 2% with it.
Here’s an example. Imagine an island of 100 people where COVID never arrived—call it COFREE. Test everyone: you get about 5 positives, all false.
Now one person gets infected. Test again: still about 5 false positives, plus 1 true positive. Six positive tests total. Which one is real? You can’t tell. That’s why a positive test only means a 1-in-6 chance (≈17%) of actual infection.
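The island scenario can be checked with expected counts (the test results themselves would be random, but the expected values match the story):

```python
# Expected test outcomes on a 100-person island with one infection,
# using the same test: 100% sensitivity, 5% false positive rate.
population = 100
infected = 1
healthy = population - infected

true_positives = infected * 1.00   # the test catches every real case
false_positives = healthy * 0.05   # ~5 of the 99 healthy people flagged anyway

p_infected_given_positive = true_positives / (true_positives + false_positives)
print(f"{p_infected_given_positive:.0%}")  # prints "17%" — roughly 1 in 6
```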
How Bayes saved our project
Back to Jimma. We couldn’t just count positives—that would overcount infections. We couldn’t ignore the data either.
Bayes let us flip the question. Instead of “what’s the chance someone has COVID given a positive test?” we asked: “given these test results, what’s the true infection rate?”
We built a model that accounted for test errors. It gave us the real infection rate from the messy data, with honest error bars.
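Here is a minimal sketch of that flipped question, not the model we actually shipped: a grid approximation with a uniform prior over prevalence, where the likelihood folds in the test's error rates. The survey numbers (89 positives out of 1,000 tests) and the function name are illustrative.

```python
def posterior_prevalence(k, n, sens=1.00, fpr=0.05, points=1001):
    """P(prevalence | k positives out of n tests), on a grid of prevalences."""
    grid = [i / (points - 1) for i in range(points)]
    weights = []
    for prev in grid:
        # Chance any one test comes back positive at this prevalence:
        p_pos = sens * prev + fpr * (1 - prev)
        # Binomial likelihood (up to a constant); uniform prior, so
        # the posterior is just the normalized likelihood.
        weights.append(p_pos**k * (1 - p_pos)**(n - k))
    total = sum(weights)
    return grid, [w / total for w in weights]

grid, post = posterior_prevalence(k=89, n=1000)
mean = sum(g * p for g, p in zip(grid, post))
# The posterior mean lands near 4% — well below the raw 8.9% positive
# rate, because the model knows some positives are false.
```

The posterior is a full distribution, so the "X% to Y%" range we gave policymakers falls out of it directly: sum probability mass outward from the peak until you cover, say, 90%.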
That’s what I love about Bayes: it forces you to state your assumptions, update them with evidence, and face your uncertainty. No hiding behind p-values.
We told policymakers: “Prevalence is probably between X% and Y%. Here’s our confidence. Here’s what would change it.” Better than one number that could be off by a factor of ten.
If you’ve ever looked at a test result and wondered “what does this really mean?”—reach for Bayes. Not because it’s fancy. Because it makes you count what you know and what you don’t.
1. Aubrey Clayton’s Bernoulli’s Fallacy traces how this error has wrecked scientific research for centuries.