Sample Size and Percentages: What to Expect

People are really good at comparing numbers. If you have 1,000 apples and I have 10, you can tell me pretty quickly that you have 990 more apples than I do. You can also put that into some context - we know intuitively that that’s a lot of apples - which helps too.

Percentages are numbers too, so they should be just as easy to compare. If Jack takes part in a Euchre tournament and wins 60% of his games, then he’s pretty good. Right?

Well, maybe. Firstly, we need to define how we’re looking at that number. If we’re simply asking how Jack played, then we use the 60% as what we call a descriptive statistic. That is, it tells us a story about what happened. If we want to use it to measure how we think he’ll do in the future, then it becomes a predictive statistic, and its usefulness will depend on how many games he played.

In this case, it turns out he played 10. Knowing that, we can calculate what’s called the Standard Error. “Error” in this case doesn’t mean a mistake; rather, it refers to the range we expect for the win percentage. If you’ve ever seen a poll with a +/- number, it’s the same idea - they expect that between x and y percent of the population agree with whatever they’re polling about. (Polls usually quote a margin of about two standard errors, but the principle is the same.)

Boring Mathy Stuff: If you don’t care to know the math, feel free to skip ahead. The standard error is defined as:

sqrt(Win Percentage * (1 - Win Percentage) / Games )

If we plug in Jack’s numbers, we get:

sqrt((.6 * .4) / 10)
sqrt(0.24 / 10)
sqrt(0.024)
0.155
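
If you’d rather let a computer do the arithmetic, here’s a minimal Python sketch of the formula above. The function names (standard_error and win_range) are made up for this illustration and don’t come from any library; the range is simply the win percentage plus or minus one standard error.

```python
import math


def standard_error(win_pct, games):
    """Standard error of a win percentage given as a fraction (0.6 = 60%)."""
    return math.sqrt(win_pct * (1 - win_pct) / games)


def win_range(win_pct, games):
    """Lower and upper range: win percentage minus/plus one standard error."""
    se = standard_error(win_pct, games)
    return win_pct - se, win_pct + se


# Jack: 60% wins over 10 games
se = standard_error(0.6, 10)
low, high = win_range(0.6, 10)
print(f"standard error: {se:.1%}")        # standard error: 15.5%
print(f"range: {low:.1%} to {high:.1%}")  # range: 44.5% to 75.5%
```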

We can then use the standard error along with the win percentage to get our range of what we think his talent really is.

Games	Wins	Win Percentage	Standard Error	Lower Range	Higher Range
10	6	60.0%	15.5%	44.5%	75.5%

So based on 10 games, we expect that if Jack played 1,000 games, his winning percentage would be somewhere between 44.5% and 75.5%. This makes sense - 60% isn’t that far off from 50%, which is just a coin flip. It’s more, though, so we think he’s probably a winning player, although we still think there’s a chance he’s not.

Technical aside: In a real-world situation, we’d look at a lot more data - such as the winning percentage of all Euchre players, quality of opponents, quality of partners, etc. And if most players fell in a lesser range - say 40% to 50% - we’d apply additional fancy math to bring our expectations in line with the general populace. We’re going to gloss over that here, just know there are other ways of estimating talent.

We know intuitively that 10 games isn’t a lot. So we have another tournament (it’s the Midwest, after all, these things happen all the time) and Jack plays another 10 games. Conveniently for our illustration, he wins exactly six more, so now we can simply double our totals. Let’s see what that does to our estimates:

Games	Wins	Win Percentage	Standard Error	Lower Range	Higher Range
20	12	60.0%	11.0%	49.0%	71.0%

Now that we have more information, our standard error has shrunk. We think it’s a lot less likely that he’s a losing player, although we also think it’s less likely that he’s a superstar. That said, a range of +/- 11% is still not super-helpful. Let’s see how that changes as we add more games to Jack’s totals.

Games	Wins	Win Percentage	Standard Error	Lower Range	Higher Range
10	6	60.0%	15.5%	44.5%	75.5%
20	12	60.0%	11.0%	49.0%	71.0%
30	18	60.0%	8.9%	51.1%	68.9%
40	24	60.0%	7.7%	52.3%	67.7%
50	30	60.0%	6.9%	53.1%	66.9%
60	36	60.0%	6.3%	53.7%	66.3%
70	42	60.0%	5.9%	54.1%	65.9%
80	48	60.0%	5.5%	54.5%	65.5%
90	54	60.0%	5.2%	54.8%	65.2%
100	60	60.0%	4.9%	55.1%	64.9%
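
For anyone who wants to rebuild this table, here’s a rough Python sketch. It assumes, as in the illustration, that Jack wins exactly 60% of his games at every step; the variable names are just for this example.

```python
import math

WIN_PCT = 0.6  # assumption from the illustration: Jack always wins 6 of every 10


def standard_error(win_pct, games):
    return math.sqrt(win_pct * (1 - win_pct) / games)


rows = []
for games in range(10, 101, 10):
    se = standard_error(WIN_PCT, games)
    wins = round(WIN_PCT * games)
    rows.append((games, wins, se, WIN_PCT - se, WIN_PCT + se))

print("Games\tWins\tStandard Error\tLower Range\tHigher Range")
for games, wins, se, low, high in rows:
    print(f"{games}\t{wins}\t{se:.1%}\t{low:.1%}\t{high:.1%}")
```

Because the number of games sits under a square root, the error shrinks slowly: it takes four times as many games to cut the standard error in half (15.5% at 10 games versus 7.7% at 40).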

The more games we add, the more we think it likely that Jack’s true talent is around 60%. Now, in this example, he’s Mr. Consistent, winning 6 out of every 10 games no matter what, so we’re in no duh, Sherlock territory. Let’s take a look at a slightly more realistic scenario, this time tracking his overall winning percentage after every ten games. To do this, we’ll simulate the data - imagine Dr. Strange trapping Dormammu in a time loop, only in this case we’re creating a digital copy of Jack and making him play Euchre over and over again, each time with slightly different results. (No Jacks were harmed in the making of this data.)

Games	Win Percentage	Standard Error	Lower Range	Higher Range
10	40.0%	15.5%	24.5%	55.5%
20	50.0%	11.2%	38.8%	61.2%
30	53.3%	9.1%	44.2%	62.4%
40	55.0%	7.9%	47.1%	62.9%
50	60.0%	6.9%	53.1%	66.9%
60	56.7%	6.4%	50.3%	63.1%
70	58.6%	5.9%	52.7%	64.5%
80	57.5%	5.5%	52.0%	63.0%
90	60.0%	5.2%	54.8%	65.2%
100	57.0%	5.0%	52.0%	62.0%
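
Here’s one way a simulation like this could be sketched in Python. This is not the code behind the table above; it assumes each game is an independent coin flip weighted to a true talent of 60%, and simulate_run is a hypothetical name. Your numbers will differ from the table because the results are random.

```python
import math
import random


def simulate_run(true_talent=0.6, games=100, step=10, seed=None):
    """Play `games` games and record the running win % every `step` games."""
    rng = random.Random(seed)  # pass a seed to get a repeatable run
    wins = 0
    checkpoints = []
    for game in range(1, games + 1):
        if rng.random() < true_talent:  # virtual Jack wins this game
            wins += 1
        if game % step == 0:
            win_pct = wins / game
            se = math.sqrt(win_pct * (1 - win_pct) / game)
            checkpoints.append((game, win_pct, se))
    return checkpoints


print("Games\tWin Percentage\tStandard Error\tLower Range\tHigher Range")
for game, win_pct, se in simulate_run(seed=1):
    print(f"{game}\t{win_pct:.1%}\t{se:.1%}\t{win_pct - se:.1%}\t{win_pct + se:.1%}")
```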

In this example, Jack hit 100 games with a 57% win percentage, but we expect him to be between 52% and 62% going forward. But since we’re making our virtual Jack play over and over again, let’s take a look at another set of 100 games.

Technical aside: What we’re now looking at is how we expect someone with a true 60% win talent to perform in a random set of 100 games. We’re looking to illustrate something else, but just bear in mind we’re hijacking one process to show another.

Games	Win Percentage	Standard Error	Lower Range	Higher Range
10	60.0%	15.5%	44.5%	75.5%
20	60.0%	11.0%	49.0%	71.0%
30	66.7%	8.6%	58.1%	75.3%
40	67.5%	7.4%	60.1%	74.9%
50	70.0%	6.5%	63.5%	76.5%
60	73.3%	5.7%	67.6%	79.0%
70	68.6%	5.5%	63.0%	74.1%
80	67.5%	5.2%	62.3%	72.7%
90	66.7%	5.0%	61.7%	71.6%
100	64.0%	4.8%	59.2%	68.8%

Well, this one looks more promising, anyway - 64% success rate, and we think he’s at least a 59% player. But our last sim was at 57%, so we had a 7 point swing. This shows why sample size is so critical - we can get very different numbers within even 100 games, even though in both cases he’s a 60% winner.

To get a better feel for this, let’s make our virtual Jack play 100 games of Euchre 1,000 times. We can then see how much his win % varies by looking at all of our sims at once. The blue line is the average of all our sims put together.

Notice how our range gets smaller the more we play, and how our simulated game range matches closely with our standard error. This drives home our key takeaway: samples always have some randomness, even at what most of us would consider to be a large size. (We think 100 is large, but in this context it’s not as large as we’d like.) So when you see a percentage and want to predict what it will do in the future, do so knowing that the amount of data behind it matters for making that prediction.
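
To check that claim yourself, here’s a rough Python sketch of the 1,000-run experiment. It is a stand-in for whatever produced the original chart, assuming independent games and a true talent of 60%; simulate_many is a made-up name. It compares the spread (standard deviation) of the simulated win percentages at each checkpoint with the standard error the formula predicts.

```python
import math
import random
import statistics


def simulate_many(true_talent=0.6, games=100, step=10, runs=1000, seed=0):
    """Return {games_played: [running win % from each run]} at every checkpoint."""
    rng = random.Random(seed)
    results = {n: [] for n in range(step, games + 1, step)}
    for _ in range(runs):
        wins = 0
        for game in range(1, games + 1):
            if rng.random() < true_talent:
                wins += 1
            if game in results:
                results[game].append(wins / game)
    return results


results = simulate_many()
print("Games\tAverage\tSimulated Spread\tStandard Error")
for n, pcts in results.items():
    average = statistics.mean(pcts)
    spread = statistics.pstdev(pcts)    # how much the sims actually varied
    theory = math.sqrt(0.6 * 0.4 / n)   # what the formula says to expect
    print(f"{n}\t{average:.1%}\t{spread:.1%}\t{theory:.1%}")
```

The two right-hand columns should land close to each other at every checkpoint, though not exactly, since the sims are random too.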

Appendix: We can see how the standard error changes with both the success rate and sample size. Note that it’s largest at 50%, since that’s our standard coin toss. Also note that it’s symmetrical - 0.9 and 0.1 have the same values. Since the formula uses success rate * (1 - success rate), 0.9 * 0.1 and 0.1 * 0.9 give the same result. This is also why 0 and 1 have no error: 1 * 0 is zero.

Success Rate \ Sample Size	50	100	150	200	250	300	350	400	450	500	550	600
0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%
5.0%	3.1%	2.2%	1.8%	1.5%	1.4%	1.3%	1.2%	1.1%	1.0%	1.0%	0.9%	0.9%
10.0%	4.2%	3.0%	2.4%	2.1%	1.9%	1.7%	1.6%	1.5%	1.4%	1.3%	1.3%	1.2%
15.0%	5.0%	3.6%	2.9%	2.5%	2.3%	2.1%	1.9%	1.8%	1.7%	1.6%	1.5%	1.5%
20.0%	5.7%	4.0%	3.3%	2.8%	2.5%	2.3%	2.1%	2.0%	1.9%	1.8%	1.7%	1.6%
25.0%	6.1%	4.3%	3.5%	3.1%	2.7%	2.5%	2.3%	2.2%	2.0%	1.9%	1.8%	1.8%
30.0%	6.5%	4.6%	3.7%	3.2%	2.9%	2.6%	2.4%	2.3%	2.2%	2.0%	2.0%	1.9%
35.0%	6.7%	4.8%	3.9%	3.4%	3.0%	2.8%	2.5%	2.4%	2.2%	2.1%	2.0%	1.9%
40.0%	6.9%	4.9%	4.0%	3.5%	3.1%	2.8%	2.6%	2.4%	2.3%	2.2%	2.1%	2.0%
45.0%	7.0%	5.0%	4.1%	3.5%	3.1%	2.9%	2.7%	2.5%	2.3%	2.2%	2.1%	2.0%
50.0%	7.1%	5.0%	4.1%	3.5%	3.2%	2.9%	2.7%	2.5%	2.4%	2.2%	2.1%	2.0%
55.0%	7.0%	5.0%	4.1%	3.5%	3.1%	2.9%	2.7%	2.5%	2.3%	2.2%	2.1%	2.0%
60.0%	6.9%	4.9%	4.0%	3.5%	3.1%	2.8%	2.6%	2.4%	2.3%	2.2%	2.1%	2.0%
65.0%	6.7%	4.8%	3.9%	3.4%	3.0%	2.8%	2.5%	2.4%	2.2%	2.1%	2.0%	1.9%
70.0%	6.5%	4.6%	3.7%	3.2%	2.9%	2.6%	2.4%	2.3%	2.2%	2.0%	2.0%	1.9%
75.0%	6.1%	4.3%	3.5%	3.1%	2.7%	2.5%	2.3%	2.2%	2.0%	1.9%	1.8%	1.8%
80.0%	5.7%	4.0%	3.3%	2.8%	2.5%	2.3%	2.1%	2.0%	1.9%	1.8%	1.7%	1.6%
85.0%	5.0%	3.6%	2.9%	2.5%	2.3%	2.1%	1.9%	1.8%	1.7%	1.6%	1.5%	1.5%
90.0%	4.2%	3.0%	2.4%	2.1%	1.9%	1.7%	1.6%	1.5%	1.4%	1.3%	1.3%	1.2%
95.0%	3.1%	2.2%	1.8%	1.5%	1.4%	1.3%	1.2%	1.1%	1.0%	1.0%	0.9%	0.9%
100.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%
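
A grid like the one above can be rebuilt with a few lines of Python. This is a sketch under the same formula, with success rates in 5% steps and sample sizes from 50 to 600; the names are just for illustration.

```python
import math


def standard_error(rate, n):
    """Standard error for a success rate (as a fraction) and sample size n."""
    return math.sqrt(rate * (1 - rate) / n)


sample_sizes = list(range(50, 601, 50))
rates = [pct / 100 for pct in range(0, 101, 5)]

print("Success Rate\t" + "\t".join(str(n) for n in sample_sizes))
for rate in rates:
    cells = "\t".join(f"{standard_error(rate, n):.1%}" for n in sample_sizes)
    print(f"{rate:.1%}\t{cells}")
```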