The Binomial Distribution Explained
You guess all 20 questions on your multiple-choice exam. What are the chances that you pass?
Which student hasn’t been there? It’s exam time and the multiple-choice questions are waiting for you. It can’t be that hard to guess your way to success, can it?
An exam has 20 questions. Each question has four choices. Exactly one choice is correct for each question. You need at least 10 correct answers to pass. What is the probability of passing if you pick one random answer for each question?
Before you read on (or scroll down to check the answer), I want you to make a rough guess.
This is a common problem that can be described using the Binomial Distribution. And we encounter a lot of such problems in our everyday lives.
For example:
- You roll a die 3 times. What are the chances of getting exactly one six? What are the chances of getting at least one six?
- You play the lottery every day for one year. What are the chances of winning at least once?
- You take 3 penalty shots. What are the chances of scoring at least twice?
- The probability of having a certain disease is 1 %. What are the chances that in a group of 10 people at least two people have the disease?
All of these problems can be described using a Binomial Distribution. Before we look at it mathematically, let’s look at it from a more general perspective.
What do we need for a Binomial Distribution?
- one base experiment which will be repeated many times (n times).
- only two outcomes in the base experiment: success or no success; property fulfilled or not fulfilled.
- a success probability p that is the same in every repetition, with repetitions that are independent of each other (the outcome of one does not influence the next).
What are we looking for?
We are looking for the probability of a certain number of successes. The order in which the successes happen does not matter. We don’t care whether we answered the first five questions or the last five questions on our exam correctly. What matters is that we answered exactly five questions correctly. The goal is to pass the exam — not to pass the exam by answering the first x questions correctly.
The Exam Problem — Mathematically
To understand the problem, let’s look at a simplified version of a multiple-choice exam first. Let’s say our exam only has three questions and for each question, there are four possible answers out of which exactly one is correct and the other three are incorrect. You are only allowed to mark one answer per question and you do follow this rule. You pass the exam if you answer at least two questions correctly. Let’s call this event A and find out how to calculate the probability for this event.
If you have no clue on how to solve such a problem, the easiest or most intuitive way to start is by drawing a probability tree.
For every question, there are two outcomes: Either you answer correctly or you don’t. If you pick a random answer, the probability of guessing the right answer is one out of four, 1/4, or 0.25. Consequently, the probability of guessing wrong is a lot higher at 3/4 or 0.75.
After answering the first question, we answer the second question. Again, we have the same two possible outcomes: Correct answer or incorrect answer. We repeat the process for the third question.
This leads to the following probability tree.
Theoretically, we could continue drawing this tree for as many exam questions as we like.
Now we can use this tree to calculate probabilities. Before you look at the calculation, I encourage you to always make a rough guess of what you think the probability will be. You might be surprised, but we humans have a strong tendency to misjudge probabilities!
To calculate the probability for a given path, we only need to multiply the numbers along the path.
The probability for guessing all three questions correctly is thus
$$\frac{1}{4} \cdot \frac{1}{4} \cdot \frac{1}{4} = \frac{1}{64} \approx 0.016$$
The probability for getting the first and the second answer correct and the third incorrect is
$$\frac{1}{4} \cdot \frac{1}{4} \cdot \frac{3}{4} = \frac{3}{64} \approx 0.047$$
What about getting the first one incorrect and the other two correct? As the calculation shows,
$$\frac{3}{4} \cdot \frac{1}{4} \cdot \frac{1}{4} = \frac{3}{64} \approx 0.047$$
the probability is the same as the probability before. This makes sense because multiplication is commutative, i.e. the order in which we multiply the numbers does not matter.
Now we go back to our original question: We want to know what the chances are of passing, this means of getting at least two questions right. We don’t care which ones we get correct as long as we get at least two correct.
To calculate this probability, all we need to do is mark all interesting paths, calculate the probability for each of them, and add these probabilities. Here, four paths are interesting: the three paths with exactly two correct answers and the one path with three correct answers.
This leads to a probability of
$$P(A) = 3 \cdot \frac{3}{64} + \frac{1}{64} = \frac{10}{64} \approx 0.156$$
of passing this exam by guessing. That’s probably a lot lower than you may have guessed, isn’t it?
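The path-by-path calculation for the three-question exam can also be checked by brute force. The following is only a minimal sketch: the names `path_probability`, `paths`, and `passing_paths` are made up for illustration, and exact fractions are used so that the result can be compared directly with the hand calculation.

```python
from fractions import Fraction
from itertools import product

P_CORRECT = Fraction(1, 4)  # chance of guessing one question right
P_WRONG = Fraction(3, 4)    # chance of guessing one question wrong


def path_probability(path):
    """Multiply the branch probabilities along one path of the tree."""
    prob = Fraction(1)
    for correct in path:
        prob *= P_CORRECT if correct else P_WRONG
    return prob


# Every path through the tree: one True/False (correct/incorrect) per question.
paths = list(product([True, False], repeat=3))

# Event A: at least two of the three answers are correct.
passing_paths = [path for path in paths if sum(path) >= 2]
p_pass = sum(path_probability(path) for path in passing_paths)

print(len(paths), len(passing_paths))  # 8 4
print(p_pass, float(p_pass))           # 5/32 0.15625
```

The same enumeration would need 2^20 (more than a million) paths for the 20-question exam, which is why a formula is worth having.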
In the same way, we could also solve our original exam problem with 20 questions even though we would have to draw a massive tree!
And ain’t nobody got time for that.
This is where the Binomial Distribution comes into play. But before we introduce it, we will draw imaginary trees in our head and make some considerations.
Drawing an Imaginary Tree in our Heads
Let us first define a random variable X. A random variable is a variable that depends on randomness. In our case, X is the number of successes, whatever a success may be. Maybe that’s getting an answer right. Maybe it’s being a girl. Maybe it’s winning the lottery.
In our case, success means getting the correct answer, so X is the number of correctly answered questions.
Let’s say we are first looking for the probability of getting exactly 0 answers right, P(X=0). Remember that our exam consists of 20 questions with four answers each, of which exactly one is correct.
In our (imaginary) probability tree, this event is visualized by the path on the right with twenty incorrect answers in a row.
Thus the probability for this event is given by
$$P(X=0) = \left(\frac{3}{4}\right)^{20} \approx 0.0032$$
Next up we may look at the probability of getting one answer correct and everything else wrong. This is already a bit more complicated as there is not just one single path anymore that fulfills this condition, but many.
Let’s first consider the possibility of getting the first answer right and everything else wrong. So we first go to the left and then always to the right and get the probability
$$\frac{1}{4} \cdot \left(\frac{3}{4}\right)^{19} \approx 0.0011$$
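What happens if instead of getting the first answer right, we get the second answer right?
$$\frac{3}{4} \cdot \frac{1}{4} \cdot \left(\frac{3}{4}\right)^{18} = \frac{1}{4} \cdot \left(\frac{3}{4}\right)^{19}$$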
As we are still multiplying the same numbers (just in a different order), this probability doesn’t change.
No matter at which position we get the answer right, the probability will always be the same as we only change the order of the multiplication. There are 20 possibilities to position the one correct answer; we say that we choose one question out of twenty. In our (imaginary) probability tree, we have 20 paths with the same probability each. Thus the probability of getting exactly one answer right is
$$P(X=1) = 20 \cdot \frac{1}{4} \cdot \left(\frac{3}{4}\right)^{19} \approx 0.0211$$
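To see that the position of the single correct answer really does not matter, here is a small sketch that multiplies out each of the 20 paths separately. The helper name `single_path` is hypothetical and only serves this illustration.

```python
import math

n = 20                       # number of exam questions
p_right, p_wrong = 0.25, 0.75


def single_path(position):
    """Probability of the path where only question `position` is correct."""
    factors = [p_right if i == position else p_wrong for i in range(n)]
    return math.prod(factors)


# One path per possible position of the single correct answer.
path_probs = [single_path(position) for position in range(n)]
all_equal = all(math.isclose(q, path_probs[0]) for q in path_probs)

p_x0 = p_wrong ** n          # all 20 answers wrong
p_x1 = sum(path_probs)       # exactly one answer right: add the 20 paths

print(all_equal)                       # True
print(round(p_x0, 4), round(p_x1, 4))  # 0.0032 0.0211
```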
Now let’s look at the probability of getting exactly two answers right. To start, we might calculate the probability of getting the first two answers correct and everything else incorrect.
$$\left(\frac{1}{4}\right)^{2} \cdot \left(\frac{3}{4}\right)^{18} \approx 0.00035$$
As we may have already guessed, again, it doesn’t matter which two questions we get right, the probability for getting two specific questions correct and the rest incorrect will always be the same. What’s more interesting now is to figure out the number of paths in our tree.
There are 20 questions in total, 18 of which we will answer incorrectly and 2 correctly. There are $20!$ possibilities to order 20 questions. Out of these 20 questions, 18 will be incorrect. These 18 can be ordered in $18!$ different ways without changing the path. The two correct answers can be ordered in $2!$ ways.
Thus, there are
$$\frac{20!}{18! \cdot 2!} = \frac{20 \cdot 19}{2} = 190$$
possible paths to get exactly two questions correct. Seems like a pretty big number, doesn’t it? Write them down, if you don’t believe me.
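We now have 190 paths in which exactly two answers are correct. The probability of getting exactly two answers right is thus given by
$$P(X=2) = 190 \cdot \left(\frac{1}{4}\right)^{2} \cdot \left(\frac{3}{4}\right)^{18} \approx 0.067$$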
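If you would rather let a computer write the paths down, the following sketch lists every pair of questions that could be the two correct ones. Each pair stands for exactly one path in the tree. The variable names are illustrative assumptions, not part of the derivation.

```python
from itertools import combinations
from math import factorial

n_questions = 20

# Each path with exactly two correct answers is identified by the
# pair of question numbers that were answered correctly.
pairs = list(combinations(range(1, n_questions + 1), 2))

# The counting argument from the text: 20! / (18! * 2!).
by_formula = factorial(20) // (factorial(18) * factorial(2))

# Every such path has the same probability, so multiply by the path count.
p_x2 = len(pairs) * 0.25 ** 2 * 0.75 ** 18

print(len(pairs), by_formula)  # 190 190
print(pairs[:3])               # [(1, 2), (1, 3), (1, 4)]
print(round(p_x2, 4))          # 0.0669
```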
The Binomial Coefficient
One big step in generalizing this is to understand the number of paths in the probability tree. Our above consideration can be generalized to the so-called binomial coefficient. Instead of wanting to calculate the number of paths for getting exactly 2 answers right, we want to calculate it for k answers.
With the same reasoning as above, we get
$$\frac{20!}{k! \cdot (20-k)!}$$
We could make this even more general and substitute the 20 with n. This gives us the general formula for the binomial coefficient.
$$\binom{n}{k} = \frac{n!}{k! \cdot (n-k)!}$$
In the probability tree as described above, it is the number of paths with exactly k successes.
In the literature, the binomial coefficient is usually described as the number of ways to pick k people out of n.
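The factorial formula translates almost literally into code. This is a minimal sketch with a hypothetical function name `binomial_coefficient`; Python's standard library (version 3.8 or later) already ships the same quantity as `math.comb`, which is used here only as a cross-check.

```python
import math


def binomial_coefficient(n, k):
    """Number of paths with exactly k successes in n repetitions."""
    if not 0 <= k <= n:
        return 0  # no way to have fewer than 0 or more than n successes
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


print(binomial_coefficient(20, 1))   # 20
print(binomial_coefficient(20, 2))   # 190
print(binomial_coefficient(20, 10))  # 184756

# Cross-check against the built-in implementation.
matches_builtin = all(
    binomial_coefficient(20, k) == math.comb(20, k) for k in range(21)
)
print(matches_builtin)               # True
```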
The Binomial Distribution Probability Function
I hope you’re still with me. Because we can now put everything together and finally get our binomial distribution!
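We can now generalize this to getting exactly k answers right. To get exactly k answers out of 20 right, we have to get k answers right and 20-k answers wrong. This happens with a probability of
$$P(X=k) = \binom{20}{k} \cdot \left(\frac{1}{4}\right)^{k} \cdot \left(\frac{3}{4}\right)^{20-k}$$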
because again, we multiply the number of paths with the probability of getting k answers right and the probability of getting 20-k answers wrong!
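For a general n, this leads to
$$P(X=k) = \binom{n}{k} \cdot \left(\frac{1}{4}\right)^{k} \cdot \left(\frac{3}{4}\right)^{n-k}$$
or, even more general, if we don’t know the probabilities yet:
$$P(X=k) = \binom{n}{k} \cdot p^{k} \cdot (1-p)^{n-k}$$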
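As a rough sketch, the general formula fits into a single function. The name `binomial_pmf` and its argument order are assumptions made for this example; the function simply multiplies the number of paths by the probability of one such path.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k): exactly k successes in n independent tries with success chance p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# The exam: n = 20 questions, p = 1/4 chance of guessing each one right.
probs = [binomial_pmf(k, 20, 0.25) for k in range(21)]

print(round(probs[0], 4), round(probs[1], 4), round(probs[2], 4))  # 0.0032 0.0211 0.0669

# All possible outcomes together must have probability 1.
print(abs(sum(probs) - 1.0) < 1e-12)  # True

# The most likely number of correct guesses.
most_likely = max(range(21), key=lambda k: probs[k])
print(most_likely, round(probs[most_likely], 4))  # 5 0.2023
```

The first three values agree with the probabilities worked out by hand for 0, 1, and 2 correct answers.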
The Cumulative Distribution Function
We can now calculate the probability of getting exactly k answers correct for any number of exam questions and any success probability.
But what we really want to know is the probability of at least 10 correct answers or however many correct answers it takes to pass the exam.
To calculate the probability for up to k correct answers, we only need to add the probabilities for 0, 1, 2,…k correct answers.
$$P(X \le k) = \sum_{i=0}^{k} \binom{n}{i} \cdot p^{i} \cdot (1-p)^{n-i}$$
Why? Because in our probability tree, these events are all made of different paths, so to calculate the combined probability of all of these events, we can simply add them up.
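Adding up the probabilities for 0, 1, 2,…k successes can be sketched as a short loop. The function names `binomial_pmf` and `binomial_cdf` are assumptions for illustration. As an example, the code computes the chance of at most two correct guesses on the 20-question exam.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k): exactly k successes in n tries."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k): add the probabilities for 0, 1, ..., k successes."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


# At most two correct answers when guessing on 20 questions.
p_at_most_two = binomial_cdf(2, 20, 0.25)
print(round(p_at_most_two, 4))            # 0.0913

# Up to n successes covers every possible outcome.
print(abs(binomial_cdf(20, 20, 0.25) - 1.0) < 1e-12)  # True
```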
And with this knowledge, we can answer our initial question.
Finally: Will you pass?
We can finally calculate the probability of answering at least 10 questions correctly. To calculate this probability we could either add up the probabilities of answering 10, 11, 12,…20 questions correctly, or add up the probabilities of answering 0, 1,…9 questions correctly and subtract the result from 1.
No matter which option we prefer, we end up with the following probability:
$$P(X \ge 10) = 1 - P(X \le 9) = 1 - \sum_{k=0}^{9} \binom{20}{k} \cdot \left(\frac{1}{4}\right)^{k} \cdot \left(\frac{3}{4}\right)^{20-k} \approx 0.0139$$
This means that the probability of passing is roughly 1.4 %. If 100 students guess their way through the complete exam, we would expect 1.4 of them to pass on average, that is, one or two students.
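Both routes to the result can be checked numerically. This is a minimal sketch; `binomial_pmf`, `direct`, and `complement` are names chosen for this example only.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k): exactly k successes in n tries."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p, needed = 20, 0.25, 10

# Option 1: add the probabilities for 10, 11, ..., 20 correct answers.
direct = sum(binomial_pmf(k, n, p) for k in range(needed, n + 1))

# Option 2: add the probabilities for 0, 1, ..., 9 and subtract from 1.
complement = 1 - sum(binomial_pmf(k, n, p) for k in range(needed))

print(round(direct, 4), round(complement, 4))  # 0.0139 0.0139

# Expected number of passing students among 100 pure guessers.
print(round(100 * direct, 1))                  # 1.4
```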
What was your guess? Was it close to the correct answer?
About the Author: Maike is a coffee-fueled Mathematician who loves to teach, talk and write about Math. If you want to support her work, feel free to buy her a coffee: https://www.buymeacoffee.com/maikeelisa