The One-Sentence Proof in Multiple Sentences

Fermat’s theorem on sums of two squares had famously been proven in just one single sentence.

The One-Sentence Proof in Multiple Sentences
Image from Unsplash

Fermat’s theorem on sums of two squares had famously been proven in just one single sentence.

Can prime numbers be written as the sum of two squares? For 13 this is possible (2²+3²), but for example, 3 cannot be written in this way. Pierre de Fermat came up with a theorem on this topic, for which Don Zagier, an American mathematician gave a proof, which astonishingly is just one sentence long.

Numberphile [1] [2] made a great video on the one-sentence proof of Zagier but left out some details. I’ll give an overview of the points already covered in the video of Numberphile but will also explain the left out steps.

The theorem

Pierre de Fermat, a French mathematician of the seventeenth century, thought about under which conditions, primes could be written as the sum of two squares. For example, as already mentioned 13 is a prime, which can be written as the sum of two squares, namely 2² and 3². Also, 17 is a prime satisfying this rule as we have 17=1²+4².

In general, Fermat was interested in the question, whether or not for a given prime number, let’s call it p, two other arbitrary numbers, let’s call them a and b, can be found, such that p=a²+b².

Imagine the problem as a game: You give me a prime number p. My task is to tell you whether or not this prime is expressible as the sum of two squares. One possibility to decide whether the answer is yes or no, is to test all possibilities for aand b.

This method is feasible for small prime numbers. However, even computers struggle to play the game with this method for very large prime numbers. So it would be much more convenient, so to say, to have a better, a more direct, strategy.

Fermat wrote down the first hundred prime numbers which can be written as the sum of two squares and found a surprising pattern: It seemed that exactly those prime numbers p can be written as the sum of two squares, for which p-1 is divisible by 4. If we check this conjecture with our previous examples 13 and 17, the hypothesis seems to hold. So the theorem, which we will prove in the following paragraphs is:

A prime number p is expressible as the sum of two squares if and only if, p-1 is divisible by 4.

Unfortunately, Fermat did not prove it — he somehow has a reputation of leaving things unproven — and also other mathematicians struggled to come up with a proof. About 50 years ago Don Zagier gave a proof, not the first one, but unbelievably a proof in just one sentence. I’ll give a detailed version of the famous one-sentence proof to illustrate all thought processes and all formal checks needed to understand the proof. The original proof is at the end of this story.

In the next few paragraphs, p is a prime number satisfying the preconditions of the theorem. So p is a prime and p-1 is divisible by 4, which means p-1=4k or equivalently p=4k-1 for some natural number k. Our task is it to show, that there indeed are some numbers a and b such that p=a²+b².

The ingenious idea

Zagier starts his proof by looking at the solutions of the equation p=x²+4yz. In the original proof, you’ll find

So S is the set of all triples of numbers x, y and z for which the equation p=x²+4yz is true.

Don’t puzzle your head over how Zagier came up with the idea to look at exactly this equation. If this step is logical to you and you would have come up with the exact same idea, congratulations, you probably have a promising career in mathematics ahead.

Recipes for new solutions

In the equation p=x²+4yz it can be seen that from a given solution (x,y,z), a new solution can be obtained by just swapping the roles of y and z. It is also apparent that if this process is applied again, one will get back the old solution. This is called an involution. So the transformation (x,y,z) → (x,z,y) is an involution and solutions come in pairs.

However, it could also be that if our involution (x,y,z) → (x,z,y) is applied nothing changes, namely if y is equal to z. Now suppose that there is such a solution where y is equal to z. This would mean we could write p=x²+4yy or equivalently p=x²+(2y)² and voilà, we have written p as the sum of two squares.

The argument above illustrates that the whole question, whether or not is expressible as the sum of two squares boils down to the question, whether or not (x,y,z) → (x,z,y) has a fixed point (i.e. a point where nothing changes) in the set S (remember: S is the set of solutions to p=x²+4yz).

This is the point where they take a, quote, “leap of faith”, in the Numberphile video. In the video, they assume that S contains an odd number of solutions. Why is this helpful? If S contains an odd number of solutions then, there is a solution (x,y,y) because if for all solutions y and z were different, the number of solutions would be even, as they come in pairs. So because S contains an odd number of solutions, there is a solution (x,y,y) and therefore p is expressible as the sum of two squares.

But why is |S| odd?

If you followed the sketch of the proof to this point, you have almost understood everything there is to understand in the original proof. What’s left to show is that the number of solutions in S is indeed odd.

For this matter, Zagier has another clever trick up his sleeve. He gives a different involution on S, which looks slightly more complicated:

With some simple high-school algebra, it can be checked that I is indeed an involution. That means, starting with (x,y,z) and applying I twice, we get back (x,y,z).

Also, it is easy to check that if (x,y,z) is a solution to the defining equation of S(that means x²+4yz=p), then in all cases applying I yields again a solution.

So, also produces solutions to p=x²+4yz in pairs. Assume, I has exactly one single fixed point. Assume for a second, we could show that. That would mean that in S all solutions are paired with a different solution with the exception of a single one which is paired with itself. If we could show that, we’d immediately get that S contains an odd number of elements.

And that’s exactly the path we’ll take. We are going to show that I has exactly one fixed point and therefore |S| is odd.

On the fixed points of I

Zagier came up with the more complicated involution not to make this story longer, but because it’s easier to investigate the fixed points of I. When considering fixed points, we want to know for which solutions the involution Ispits out the exact same solution we put in.

If you fiddle around with the definition of I it will turn out that only if you start with a solution (x,y,z) satisfying the second case of I’s definition (i.e. y-z<x<2y), you will again end up with a solution satisfying the case in which you started. So if there is a fixed point (and we really want one), it has to be a solution that satisfies the condition of case two.

So, if (x,y,z) satisfies case two and is a fixed point then we get that x=2y-x and therefore x=y.

If we replace y by x in our equation we get p=x²+4xz. You can see that in this case p is divisible by x. Because p is prime x has to be 1. Therefore we have x=y=1.

As p is our prime and x and y are just 1, also z has to be a concrete number. We can put everything we know into an equation and solve for z:

So for our given prime p, our involution I has exactly one fixed point on S. As we have seen, I produces the solutions in pairs, where most pairs consists of two different solutions. Only one pair contains the same solution twice. In other words, there is an even number of solutions (the pairs) and one extra solution (the fixed point). Therefore, we end up with an odd number of solutions for the equation p=x²+4yz. This is, as described above, sufficient to prove that p is expressible as the sum of the squares.

The original one-sentence proof

As to this point all pieces of the one-sentence proof have been given, let’s have a look at the original proof of Zagier which unbelievably is really just one sentence.

The involution on the finite set
defined by
has exactly one fixed point, so |S| is odd and the involution defined by (x,y,z) → (x,z,y) also has a fixed point.

♡ Q.E.D.