Imagine you’ve just sent flyers for an event. You wanted to know if people were more likely to respond if you used blue instead of black, so you sent copies in both colors. You tracked who got which so you could tie it back. The signup date has come and gone, and you can now see the results.

Color | Signups |
---|---|
Black | 100 |
Blue | 25 |
Well, that seems conclusive. 80% of our signups came from people who got the black version, so it’s time to order up a bunch of black ink for our next event. People clearly like it more, so blue ink can take a long walk off a short pier, as far as we’re concerned.
Except… now that you think about it, you didn’t have that much blue ink in the first place, so you made fewer copies. Maybe we need to take a look at how many of each we sent as well.
Color | Sent | Signups | Response Rate |
---|---|---|---|
Black | 1,000 | 100 | 10% |
Blue | 100 | 25 | 25% |
Well, glad we looked: the blue version had a much higher response rate, which will offset the extra cost of color printing. But we came awfully close to drawing the wrong conclusion. What happened?
It’s a fairly common mistake, but one we need to be wary of. We were only looking at the total number of signups, and by that measure black won easily. However, we neglected the base population, that is, how many flyers of each color were sent in the first place. The question we were (unintentionally) answering was “given that someone signed up, how likely is it they received a black or blue flyer?”, when what we wanted to answer was “given the color of flyer someone received, how likely are they to sign up?”.
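To make the difference between those two questions concrete, here’s a quick sketch using nothing but the numbers from the tables above:

```python
# Flyer counts taken from the tables above
sent = {"black": 1000, "blue": 100}
signups = {"black": 100, "blue": 25}

total_signups = sum(signups.values())

for color in sent:
    # P(color | signup): share of all signups that came from this color
    p_color_given_signup = signups[color] / total_signups
    # P(signup | color): response rate among people who got this color
    p_signup_given_color = signups[color] / sent[color]
    print(
        f"{color}: P(color | signup) = {p_color_given_signup:.0%}, "
        f"P(signup | color) = {p_signup_given_color:.0%}"
    )
```

Black dominates the first probability (80% of signups) simply because we mailed ten times as many black flyers, while the second probability, the one we actually care about, favors blue (25% vs 10%).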
This type of thinking may show up in more consequential areas as well. For instance, we may learn that 53% of those killed in car accidents were wearing a seatbelt, and conclude that wearing one is basically a coin flip when it comes to safety. But when we learn that roughly 90% of occupants wear one (source), this changes our view considerably: 10% of the population making up nearly half the fatalities is pretty stark. Even before seeing the raw numbers, we can infer from the population proportions that there’s likely a strong causal effect.
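A quick back-of-the-envelope calculation shows just how stark. Using the illustrative figures above, and assuming belted and unbelted occupants spend roughly the same amount of time on the road, the per-person fatality rates work out like this:

```python
# Illustrative figures from the paragraph above; a rough sketch, not a rigorous analysis
belted_share_of_population = 0.90   # ~90% of occupants wear a seatbelt
belted_share_of_fatalities = 0.53   # 53% of fatalities were belted

unbelted_share_of_population = 1 - belted_share_of_population
unbelted_share_of_fatalities = 1 - belted_share_of_fatalities

# Relative fatality rate per person: share of fatalities divided by share of population
belted_rate = belted_share_of_fatalities / belted_share_of_population        # ~0.59
unbelted_rate = unbelted_share_of_fatalities / unbelted_share_of_population  # ~4.7

print(f"Unbelted occupants die at roughly {unbelted_rate / belted_rate:.1f}x the rate of belted ones")
# -> roughly 8x, before controlling for any of the confounders below
```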
We’d want to take the time to break things down by age, vehicle type, speed, time of day, etc. to control for other things that could be impacting the data. That’s well out of scope for this example, but doing subgroup analysis is always a sound idea.
So next time you see a report showing the percentage breakdown of a population, be sure to ask whether those numbers are really telling the story you’re being told they are, or whether there’s further context that would explain a group’s odds of showing up there in the first place.