Einstein’s Theory of Relativity

Explained as simply as possible, but no simpler.

Based on a photo titled “Albert Einstein by Doris Ulmann”. Source: Wikimedia Commons. Public Domain.
“Everything should be made as simple as possible, but no simpler.”

The special theory of relativity is without question one of the most important discoveries in the history of science and second only to Newton’s discovery of the laws of mechanics in its importance to physics. In spite of this, special relativity is poorly understood and there is an abundance of misinformation on the internet and in the media on the subject. This is not helped by a mostly undeserved reputation for being too difficult for most people to understand.

In reality, the basic ideas aren’t that difficult to understand. This article will explain some of those basic ideas by taking a direct path through the history of physics since Galileo, showing why the laws of physics as they were understood in the 19th century had to be adjusted, showing how special relativity arose from that adjustment, and exploring some of the consequences of that new theory.

Reference frames, covariance, and Galilean relativity

The basic idea of relativity is that two different observers who are in motion relative to each other must agree on the laws of physics. When two different observers are in relative motion, they are said to be in different reference frames, and when their relative velocity is constant, those reference frames are said to be inertial. When all observers in all inertial reference frames agree on a physical theory, then that theory is said to be covariant. We will only consider inertial frames.

Suppose that an observer who is at rest with respect to frame S is standing at the origin of a coordinate system (x,y,z) and that an observer at rest with respect to frame S′ is standing at the origin of a coordinate system (x′,y′,z′). If the observer in S sees the origin of the S′ coordinates moving to the right with constant velocity V, then the two reference frames are said to be in standard configuration:

Source: Wikimedia Commons. Public domain.

We will always assume that S and S′ are in standard configuration.

Suppose that an observer in S notes that an event takes place at point P at time T₀ and that another event takes place at point Q at time T₁, and let L be the distance between P and Q and ΔT=T₁-T₀. Suppose that the same events are seen by an observer in S′, separated by distance L′ and with the second event occurring ΔT′ seconds after the first. Before Einstein, the following assumptions were made:

  • Distance is absolute: L=L′
  • Time is absolute: ΔT=ΔT′

Lengths are given by the Pythagorean theorem: L²=(Δx)²+(Δy)²+(Δz)² where Δx, Δy, and Δz are the displacements in the x, y, and z directions. Quantities that take the same numerical value in all inertial frames are said to be invariant.

Let x, y, z, and t be the position and time coordinates of the coordinate system attached to S and x′, y′, z′, and t′ the position and time coordinates of the coordinate system attached to S′. These assumptions about distances and time intervals imply that these coordinate systems are related by the following rule:
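Reconstructed from the substitutions x′=x-Vt and t′=t that are used later in this section, the rule for two frames in standard configuration can be written as:

```latex
\[
x' = x - Vt, \qquad y' = y, \qquad z' = z, \qquad t' = t
\]
```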

This is called the Galilean transform. Galilean relativity is the theory that the laws of physics are covariant with respect to the Galilean transform. We’ll check that this is true for Newtonian physics. Suppose that Newton’s law is known to be true in frame S′:
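In one dimension, and assuming a particle of constant mass m (a symbol introduced here for illustration), Newton’s second law in S′ would read:

```latex
\[
F'(x', t') = m \, \frac{d^{2} x'}{dt'^{2}}
\]
```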

We are ignoring the y′ and z′ coordinates for simplicity.

Here, F′(x′,t′) is an arbitrary force that is measured experimentally, and F(x,t) is the expression for the same force according to frame S. Suppose that the observer in S′ determined F′(x′,t′) by checking numerical values for x′ and t′. To find F(x,t), the observer in S checks the same numerical values for x and t. Since it’s the same force, and also since S does not experience any acceleration relative to S′, the observer in S must get the same results, F(x,t)=F′(x′,t′). By plugging x′=x-Vt and t′=t into the derivative, we get:

It follows that:
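A sketch of these two steps, assuming the same constant mass m in both frames: because V is constant, the term Vt vanishes after two time derivatives, and the law keeps its form in S.

```latex
\[
\frac{d^{2} x'}{dt'^{2}} = \frac{d^{2}}{dt^{2}}\,(x - Vt) = \frac{d^{2} x}{dt^{2}}
\qquad \Longrightarrow \qquad
F(x, t) = m \, \frac{d^{2} x}{dt^{2}}
\]
```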

So Newtonian physics is covariant with respect to the Galilean transform. But is this true for all laws of physics?

Consider Maxwell’s equations in a region of space free of charges or currents:
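In SI units, the source-free equations can be written as below. The ordering is an assumption, chosen so that the third line is Faraday’s law, which is the equation whose curl leads to the wave equation for E.

```latex
\begin{align*}
\nabla \cdot \mathbf{E} &= 0 \\
\nabla \cdot \mathbf{B} &= 0 \\
\nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t} \\
\nabla \times \mathbf{B} &= \mu_{0} \varepsilon_{0} \, \frac{\partial \mathbf{E}}{\partial t}
\end{align*}
```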

Take the curl of the third line and use the vector calculus identity ∇⨯(∇⨯E)=∇(∇⋅E)-∇²E= -∇²E because ∇⋅E=0. Then the electric field vector E obeys the wave equation:
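Written out in SI units with c defined as 1/√(μ₀ε₀), the steps are roughly as follows:

```latex
\[
\nabla \times (\nabla \times \mathbf{E})
  = -\frac{\partial}{\partial t} (\nabla \times \mathbf{B})
  = -\mu_{0} \varepsilon_{0} \, \frac{\partial^{2} \mathbf{E}}{\partial t^{2}}
\quad \Longrightarrow \quad
\nabla^{2} \mathbf{E} = \frac{1}{c^{2}} \frac{\partial^{2} \mathbf{E}}{\partial t^{2}},
\qquad c = \frac{1}{\sqrt{\mu_{0} \varepsilon_{0}}}
\]
```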

It can be shown by the same process that the magnetic field vector B will also obey the wave equation. This equation predicts that a disturbance in the electric field will propagate with constant speed c, the speed of light. Consider an electric field that has only one component, E, perpendicular to the x-axis, and which does not depend on y or z. Suppose that the wave equation is obeyed in frame S′:

We need this to transform into:
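For a single field component E that depends only on position along the x-axis and on time, the equation assumed in S′ (first line) and the form that covariance would require in S (second line) can be sketched as:

```latex
\begin{align*}
\frac{\partial^{2} E}{\partial x'^{2}} &= \frac{1}{c^{2}} \frac{\partial^{2} E}{\partial t'^{2}} \\
\frac{\partial^{2} E}{\partial x^{2}} &= \frac{1}{c^{2}} \frac{\partial^{2} E}{\partial t^{2}}
\end{align*}
```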

Let’s see if this happens with the Galilean transform.

The derivatives transform according to the chain rule:

And this means that under the Galilean transform, the wave equation as seen in frame S′ transforms into the following when seen from S:
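As a worked sketch, with x′=x-Vt and t′=t the chain rule gives the first line below, and substituting into the one-dimensional wave equation gives the second. The extra terms proportional to V are what spoil covariance.

```latex
\begin{align*}
\frac{\partial}{\partial x'} &= \frac{\partial}{\partial x},
\qquad
\frac{\partial}{\partial t'} = \frac{\partial}{\partial t} + V \frac{\partial}{\partial x} \\
\left( 1 - \frac{V^{2}}{c^{2}} \right) \frac{\partial^{2} E}{\partial x^{2}}
  - \frac{2V}{c^{2}} \frac{\partial^{2} E}{\partial x \, \partial t}
  &= \frac{1}{c^{2}} \frac{\partial^{2} E}{\partial t^{2}}
\end{align*}
```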

This presents a problem: observers in different inertial frames will disagree about the law governing the propagation of a ray of light. To resolve this, we have no choice but to conclude that at least one of the following propositions is true:

  • Maxwell’s equations are wrong.
  • There is only one special reference frame in which Maxwell’s equations are true, namely the rest frame of the so-called luminiferous aether.
  • The Galilean transform is wrong, and therefore so are the underlying assumptions about space and time.

We can discard the first proposition immediately: Maxwell’s equations have been confirmed by experiment to very high precision. The second can be discarded in light of the several decades in the second half of the 19th century during which physicists tried and failed to detect the aether. This leaves only the third option.

The Lorentz Transform and Einstein’s theory of relativity

In 1892, Hendrik Lorentz published a paper in which he showed that the transformation under which Maxwell’s equations are covariant is:

Where γ is called the Lorentz factor:
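For frames in standard configuration, the transformation and the Lorentz factor take the following standard form, which is also the result derived later in this article:

```latex
\[
x' = \gamma \, (x - Vt), \qquad y' = y, \qquad z' = z, \qquad
t' = \gamma \left( t - \frac{V x}{c^{2}} \right),
\qquad
\gamma = \frac{1}{\sqrt{1 - V^{2}/c^{2}}}
\]
```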

Newton’s laws are also covariant under this transform with the appropriate modification, but that will be discussed in the sequel to this article.

This is called the Lorentz transform. Unfortunately Lorentz did not give a correct physical interpretation, since he incorrectly attributed it to the motion of the Earth relative to the luminiferous aether.

Einstein gave the correct interpretation in his 1905 paper On the Electrodynamics of Moving Bodies, and that interpretation is the foundation of what is now called special relativity. He started with the following two postulates:

  • The laws of physics are the same in all inertial reference frames.
  • The speed of light has the same value in all inertial reference frames, that is, it is invariant.

We can use these to derive the Lorentz transform, but doing so will require making some changes to how we understand space and time.

Time Dilation

Let S′ be the rest frame of a train that is in standard configuration with S, the rest frame of someone standing on the platform. An experiment is carried out on the train in which some physical process takes place over a time interval Δt′. We will show that the observer on the platform will see that same physical process take place over a time interval Δt, where Δt and Δt′ are related by:

Since γ>1 whenever V is nonzero, this is called time dilation. Suppose that according to the person conducting the experiment on the train, a laser pulse leaves point A′, travels directly upwards and then reflects from a mirror at point B′, and then returns to a detector at point C′, which is right next to A′.

The total distance traveled by the laser pulse was 2h, and the speed was c, so:

Now let’s think about what the observer on the platform sees. While the laser pulse travels at constant speed from the emitter to the mirror and then back to the detector, the train is also moving to the right. To the observer on the platform, the path of the laser pulse is a triangle:

Not to scale.

From the Pythagorean Theorem, the length of the line AB is given by:

The total path length is twice this amount, and also, since the speed of light is the same in both reference frames, the total path length must be equal to cΔt, so:

Now we solve for Δt in terms of Δt′ by eliminating h. From the formula for Δt′, we have h=cΔt′/2, so by plugging this in and squaring both sides of the formula for Δt, we get:
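The whole light-clock argument can be collected in one place. This is a reconstruction from the description above, with h the height of the mirror and AB the length of one leg of the triangular path seen from the platform:

```latex
\begin{align*}
\Delta t' &= \frac{2h}{c} \\
AB &= \sqrt{h^{2} + \left( \frac{V \Delta t}{2} \right)^{2}} \\
c \, \Delta t &= 2 \sqrt{h^{2} + \left( \frac{V \Delta t}{2} \right)^{2}} \\
c^{2} (\Delta t)^{2} &= 4h^{2} + V^{2} (\Delta t)^{2} = c^{2} (\Delta t')^{2} + V^{2} (\Delta t)^{2} \\
\Delta t &= \frac{\Delta t'}{\sqrt{1 - V^{2}/c^{2}}} = \gamma \, \Delta t'
\end{align*}
```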

This shows that, because the speed of light must be invariant, time is dilated between reference frames. This explains why moving clocks appear to run slower: If a clock on the train ticks every second, then the interval between ticks is dilated according to someone watching from the tracks.

Demonstration: Muon decay

A muon is a subatomic particle that is identical to an electron in every way except its mass: A muon is about 207 times heavier. The weak interaction (one of the four fundamental forces) causes a muon to decay into an electron and two other particles called an electron antineutrino and a muon neutrino:
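In standard particle notation, the decay of a negatively charged muon described above is:

```latex
\[
\mu^{-} \;\to\; e^{-} + \bar{\nu}_{e} + \nu_{\mu}
\]
```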

Muons have a mean lifetime of about 2.2 microseconds, which corresponds to a half-life of about 1.5 microseconds, meaning that if you have a sample of 100 muons, it will take about 1.5 microseconds for half of them to decay into electrons.

Muons are produced in the upper atmosphere when cosmic rays strike gas molecules, at an altitude of about 15 kilometers. Muon detectors at sea level typically detect one muon per square centimeter per minute, and their mean velocity when they are detected at sea level is about 0.995c. If we ignore relativity, we find that the time it takes for a muon to reach sea level is 15,000m/0.995c~50 microseconds, or about 33 half-lives. Since the flux of muons at sea level is about 1/min∙cm², this means that the flux of muons being produced at altitude would have to be about 2³³/min∙cm² (roughly 10¹⁰), which isn’t a very realistic number.

But let’s see what happens when we do consider special relativity. When it comes to particle decay, it doesn’t matter how much time passes for you, the observer. What matters is how much time passes for the particles, and if those particles are moving much faster than you then, because of time dilation, when a time interval Δt elapses for you, a shorter time interval Δt′=Δt/γ elapses for the particles.

For V=0.995c, γ~10. So while to you it looks like the muons take about 50 microseconds to reach sea level, for the muons it only takes about 5 microseconds, or about 3.3 half-lives. This means that, if the flux of muons at sea level is 1/min∙cm², then the flux of muons produced at 15km is about 10/min∙cm², which is a much more reasonable number.
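The arithmetic in this demonstration can be checked with a short Python sketch. It treats 2.2 microseconds as the muon’s mean lifetime τ (equivalently a half-life of τ ln 2, about 1.5 microseconds), which is an assumption about how that figure should be read, and the function and variable names are illustrative only.

```python
import math

C = 299_792_458.0        # speed of light in m/s
ALTITUDE_M = 15_000.0    # production altitude quoted in the text
BETA = 0.995             # muon speed as a fraction of c, from the text
TAU_S = 2.2e-6           # assumed mean lifetime of the muon in seconds


def lorentz_factor(beta: float) -> float:
    """Return gamma for a speed given as a fraction of c."""
    return 1.0 / math.sqrt(1.0 - beta * beta)


def surviving_fraction(elapsed_s: float, tau_s: float = TAU_S) -> float:
    """Fraction of muons that have not decayed after elapsed_s of their own time."""
    return math.exp(-elapsed_s / tau_s)


gamma = lorentz_factor(BETA)
t_ground = ALTITUDE_M / (BETA * C)   # travel time measured on the ground
t_muon = t_ground / gamma            # travel time experienced by the muons

print(f"gamma                 = {gamma:.2f}")
print(f"ground-frame time     = {t_ground * 1e6:.1f} microseconds")
print(f"muon-frame time       = {t_muon * 1e6:.2f} microseconds")
print(f"survival, no dilation = {surviving_fraction(t_ground):.2e}")
print(f"survival, dilated     = {surviving_fraction(t_muon):.3f}")
```

Under these assumptions roughly one muon in ten survives the trip once time dilation is included, compared with about one in ten billion without it.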

The muons, for their part, see the ground moving towards them with speed 0.995c. The muons will see themselves reaching the ground (or the ground reaching them, as it were) after only 5 microseconds, but how could they have traveled 15,000 meters in 5 microseconds at a speed of just 0.995c? The answer is that they didn’t.

Length Contraction

An observer at rest in frame S sees a particle moving with velocity V in the x-direction pass a post at point A at time t=0, and then at time Δt she sees the particle pass a post at point B, with both points on the x-axis separated by length L. The rest frame of the particle is S′, and in S′ the particle is stationary and the two posts, separated by length L′, are approaching the particle with velocity -V. The first post passes the particle at time t′=0 and the second passes the particle at time Δt′=Δt/γ. Since L=VΔt and L′=VΔt′, we see that L′=L/γ. This means that the distance between the posts is contracted in the frame in which the posts are moving.

This answers the question posed at the end of the last section: the muons don’t have to travel the 15,000 meter distance that is seen by an observer on the ground. In the rest frame of the muons, the distance is only about 1,500 meters.
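As a quick consistency check, the sketch below (variable names are illustrative) contracts the 15 km distance by γ and confirms that the ground, approaching at 0.995c, covers the contracted distance in about 5 microseconds of muon time.

```python
import math

C = 299_792_458.0             # speed of light in m/s
BETA = 0.995                  # relative speed as a fraction of c, from the text
GROUND_DISTANCE_M = 15_000.0  # distance measured by an observer on the ground

gamma = 1.0 / math.sqrt(1.0 - BETA**2)

# Distance between the production point and sea level in the muon rest frame.
contracted_m = GROUND_DISTANCE_M / gamma

# Time for the ground to reach the muon, measured in the muon rest frame.
crossing_time_s = contracted_m / (BETA * C)

print(f"contracted distance = {contracted_m:.0f} m")
print(f"crossing time       = {crossing_time_s * 1e6:.2f} microseconds")
```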

This refers to the contraction of space according to the observer in the rest frame of a moving object. There is also a reciprocal version of this principle, which says that moving objects appear to be contracted. Let S′ be the rest frame of a rod that appears to be moving with velocity V in the x-direction according to an observer in frame S. The observer has no way of knowing whether she is moving relative to the rod or the rod is moving relative to her. If she were moving relative to the rod, then she would see space as being contracted, so the rod would appear to be shorter than its rest length. This is exactly equivalent to saying that the rod appears to be shorter because it is moving, since there is no way for her to know whether she or the rod is moving.

The Lorentz Transform

Now we can prove that the Lorentz transform relates the coordinate systems of the two reference frames in standard configuration. We will show that:

Since the velocity of S′ relative to S is constant and entirely in the x-direction, by symmetry, it must be true that y′=y and z′=z.

To proceed, we now define a new quantity called the spacetime interval (Δs)²:
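With the sign convention used in the rest of this article, where the time term is positive, the definition reads:

```latex
\[
(\Delta s)^{2} = (c \, \Delta t)^{2} - (\Delta x)^{2} - (\Delta y)^{2} - (\Delta z)^{2}
\]
```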

Be aware that other conventions exist for the minus signs.

We’ll show that the spacetime interval is invariant, meaning that (Δs)²=(Δs′)² for all pairs of reference frames S and S′.

Suppose that a person standing still in frame S sends out a laser pulse that travels distance L in time Δt, and with respect to frame S′, the pulse travels distance L′ in time Δt′. Then c²=(L/Δt)²=(L′/Δt′)² from the invariance of c. Then we have:
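A reconstruction of this step, using the Pythagorean expression for L and L′ in terms of the coordinate displacements:

```latex
\begin{align*}
(c \, \Delta t)^{2} - L^{2} &= 0 = (c \, \Delta t')^{2} - (L')^{2} \\
(c \, \Delta t')^{2} - (\Delta x')^{2} - (\Delta y')^{2} - (\Delta z')^{2}
  &= (c \, \Delta t)^{2} - (\Delta x)^{2} - (\Delta y)^{2} - (\Delta z)^{2}
\end{align*}
```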

The rest of the derivation of the Lorentz transform follows the derivation that Einstein used in his popular book on relativity. Drop the delta symbols in the second line of the above, and since y′=y and z′=z, these terms cancel out of the equation. Then we can write:

We can use this to write:

Add the second equation to the first to get an equation for x′ and subtract the first equation from the second to get an equation for ct′:

Then make the following assignments:

So we get a linear system for x′ and ct′:

The origin of the primed coordinate system has velocity V and therefore we can set its position vector as (Vt,0,0), so let x′=0 coincide with x=Vt. Then the first equation gives:

Now the system of equations becomes:

To solve for a, plug these into the expression for the invariance of the spacetime interval, c²(t′)²-(x′)²=c²t²-x²:

This means that:

Then we get the Lorentz transform when we substitute this into the formulas that we found for x′ and ct′:
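The chain of steps above can be sketched in one place as follows. The symbols λ, μ, and b are names assumed here for the intermediate constants, and only a is named in the text. The condition λμ=1 follows from the invariance of the interval.

```latex
\begin{gather*}
(x' - ct')(x' + ct') = (x - ct)(x + ct) \\
x' - ct' = \lambda \, (x - ct), \qquad x' + ct' = \mu \, (x + ct), \qquad \lambda \mu = 1 \\
x' = \frac{\lambda + \mu}{2} \, x - \frac{\lambda - \mu}{2} \, ct, \qquad
ct' = \frac{\lambda + \mu}{2} \, ct - \frac{\lambda - \mu}{2} \, x \\
a = \frac{\lambda + \mu}{2}, \qquad b = \frac{\lambda - \mu}{2} \\
x' = a x - b \, ct, \qquad ct' = a \, ct - b x \\
0 = a V t - b \, c t \quad \Longrightarrow \quad b = \frac{a V}{c} \\
x' = a \, (x - Vt), \qquad ct' = a \left( ct - \frac{V x}{c} \right) \\
c^{2} t'^{2} - x'^{2} = a^{2} \left( 1 - \frac{V^{2}}{c^{2}} \right) \left( c^{2} t^{2} - x^{2} \right)
\quad \Longrightarrow \quad
a = \frac{1}{\sqrt{1 - V^{2}/c^{2}}} = \gamma \\
x' = \gamma \, (x - Vt), \qquad t' = \gamma \left( t - \frac{V x}{c^{2}} \right)
\end{gather*}
```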

So we have successfully derived the Lorentz transform from physical principles.
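A numerical spot check can make the result more concrete. The sketch below applies the transform for frames in standard configuration to one arbitrarily chosen event and compares the spacetime interval (measured from the origin) in both frames. The function names and the sample event are illustrative assumptions.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def lorentz_transform(t: float, x: float, v: float) -> tuple[float, float]:
    """Map an event (t, x) in S to (t', x') in S' moving at velocity v along x."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    t_prime = gamma * (t - v * x / C**2)
    x_prime = gamma * (x - v * t)
    return t_prime, x_prime


def interval_squared(t: float, x: float) -> float:
    """Spacetime interval between the origin and the event (t, x)."""
    return (C * t) ** 2 - x**2


# An arbitrary sample event and relative velocity (assumed values).
t, x = 2.0e-6, 150.0
v = 0.6 * C

t_p, x_p = lorentz_transform(t, x, v)
print(f"interval in S  = {interval_squared(t, x):.6f} m^2")
print(f"interval in S' = {interval_squared(t_p, x_p):.6f} m^2")
```

The two printed values should agree up to floating-point rounding, even though t′ and x′ individually differ from t and x.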

Demonstration: The classical limit

The Lorentz transform looks very different from the Galilean transform. How could physicists have been this incorrect for such a long time?

Consider a fighter jet travelling just above the speed of sound relative to an observer in a nearby control tower, with V=350m/s. Then V²/c²~1.4×10⁻¹². The best way to approximate the value of γ for small values of V²/c² is to use the binomial approximation, which says that:

This gives a good estimation for γ:
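Written out, the binomial approximation and the resulting estimate for the jet are as follows, using the rounded value V²/c²~1.4×10⁻¹² from above:

```latex
\[
(1 + x)^{n} \approx 1 + n x \quad \text{for } |x| \ll 1,
\qquad
\gamma = \left( 1 - \frac{V^{2}}{c^{2}} \right)^{-1/2}
  \approx 1 + \frac{V^{2}}{2 c^{2}}
  \approx 1 + 7 \times 10^{-13}
\]
```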

So for velocities close to the speed of sound, non-relativistic physics is accurate to better than one part per trillion (12 decimal places). And of course, even this “low” speed was practically inaccessible to anyone conducting experiments before the 20th century and certainly would never have been encountered by anyone in their daily lives, so it’s easy to see why it took almost 300 years from the time of Galileo before anyone noticed that something was wrong.
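The size of the correction can be checked numerically. Because γ-1 is far smaller than ordinary floating-point rounding near 1, this sketch computes it with `math.expm1` and `math.log1p`, a choice made here for numerical accuracy rather than something taken from the article.

```python
import math

C = 299_792_458.0  # speed of light in m/s
V = 350.0          # jet speed in m/s, from the text

beta_sq = (V / C) ** 2

# gamma - 1 = exp(-0.5 * ln(1 - beta^2)) - 1, evaluated without cancellation.
gamma_minus_one = math.expm1(-0.5 * math.log1p(-beta_sq))

# Leading term of the binomial approximation: gamma - 1 ~ V^2 / (2 c^2).
binomial_estimate = 0.5 * beta_sq

print(f"V^2/c^2             = {beta_sq:.3e}")
print(f"gamma - 1 (exact)   = {gamma_minus_one:.6e}")
print(f"gamma - 1 (approx.) = {binomial_estimate:.6e}")
```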

Spacetime

It cannot be emphasized strongly enough that time dilation and length contraction are properties of space and time themselves. They are not the result of forces that cause clocks to run slower depending on who’s looking at them, nor does moving at relativistic speed induce forces that stretch or compress objects. They are also not the result of a measurement error or optical illusion that causes observers in different frames to misjudge the length of an object or the rate at which a clock ticks. When observers in different frames report different lengths for measuring rods or different frequencies for ticking clocks, they are all correct because lengths and time intervals are not invariant, and that’s just how space and time work.

Classical physics is formulated in three-dimensional Euclidean space, E₃, the set of all ordered triples of real numbers (x,y,z) combined with enough topological structure to make things like “distance” and “point” meaningful, as well as a function called the Euclidean metric, which says that the distance between two points P₁=(x₁,y₁,z₁) and P₂=(x₂,y₂,z₂) is:
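For the two points P₁ and P₂ defined above, the Euclidean distance is the usual Pythagorean expression:

```latex
\[
L = \sqrt{(x_{2} - x_{1})^{2} + (y_{2} - y_{1})^{2} + (z_{2} - z_{1})^{2}}
\]
```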

So in classical physics, if an event takes place at point P₁ at time t₁ and then a second event takes place at P₂ at time t₂ where P₁ and P₂ are separated by distance L and time Δt=t₂-t₁, the best we can say is that the two events took place L meters apart and the second took place Δt seconds after the first. This is what it means when we say that space and time in classical physics are “separate”: there is no coherent way to assign a single number as the “distance” between two events in classical spacetime.

Now, do we actually live in Euclidean space? We do not. If spacetime were Euclidean, then the Galilean transform would be the correct relationship between coordinates of different reference frames, so distances would be invariant with respect to a change of reference frame. But this is false because of length contraction. This raises the question of what kind of space we actually do live in.

Consider the set of all spacetime points (x,y,z,t), but this time suppose that the “distance” between two points is the spacetime interval. If event s₁ takes place at position (x₁,y₁,z₁) and time t₁, and another event s₂ takes place at position (x₂,y₂,z₂) at time t₂, then the “distance” between them is given by:
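Using the same sign convention as the spacetime interval defined earlier, the “distance” between the two events can be written as:

```latex
\[
(\Delta s)^{2} = c^{2} (t_{2} - t_{1})^{2} - (x_{2} - x_{1})^{2} - (y_{2} - y_{1})^{2} - (z_{2} - z_{1})^{2}
\]
```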

The name for the function that gives the spacetime interval between two events is the Minkowski metric, after Hermann Minkowski. Minkowski, who was in fact one of Einstein’s college professors, was the one who actually formalized the concept of spacetime. So rather than a three-dimensional Euclidean space with one “extra” dimension of time, we instead live in four-dimensional Minkowski spacetime. The implications of this are huge and several of them will have to wait for the follow-up to this article. But to close out this article, let’s talk about the most famous one.

Demonstration: Mass-energy equivalence, E=mc²

One of the most famous consequences of special relativity is that rest mass is equivalent to energy. The rest mass of a particle is its mass as measured in the frame in which the particle is not moving. This section is meant to provide a justification, though not a formal proof, for this claim.

I will first argue from physical principles that a particle can travel at light speed if and only if it has no rest mass. The actual proof will have to wait for the follow-up to this article, which will be about some applications of relativistic physics.

Suppose that a particle appears to travel at light speed in frame S, and that it covers a distance L in time Δt so that cΔt=L, so (Δs)²=(cΔt)²-L²=0. But by the invariance of the spacetime interval, (Δs′)²=(Δs)², so in any other reference frame S′, (Δs′)²=(cΔt′)²-(L′)²=0 so L′/Δt′=c, so the speed of the particle is c in every inertial frame. This means that it does not have a rest frame, and therefore it is not physically meaningful to say that it has a rest mass. This satisfies the “only if” part of the claim.

Now suppose that there is a frame in which the particle is at rest and has zero mass. Then the particle may as well not exist: it is at rest so it has no momentum to transfer to other particles, and it has no mass so it is impossible for any other particle to transfer momentum to it, so there is no way for this particle to interact with anything in the universe. Since we are only interested in particles that have a physically meaningful existence, we can say that there is no frame in which a massless particle is at rest, so all massless particles must travel at light speed. This satisfies the “if” part of the claim.

We’ll see in the sequel that relativity causes momentum and energy to work differently than how we are used to thinking about them, but the basic conservation laws still hold: momentum and energy are still conserved in each reference frame.

A positron is a subatomic particle that is the anti-particle to the electron. It is identical to an electron in every way, except for having opposite charge. It is known from experiment that when a particle and an anti-particle encounter each other, they annihilate each other and produce radiation. The formula for this is:
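In standard notation, electron-positron annihilation into two photons is written:

```latex
\[
e^{+} + e^{-} \;\to\; \gamma + \gamma
\]
```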

Where e⁺ means a positron, e⁻ means an electron, and γ means a photon, so two photons are produced. Consider the case where the electron and positron are both at rest and in contact with each other the instant before annihilation occurs. The total mass in the system is 2mₑ, twice the mass of an electron, and the total momentum in the system is zero. But after annihilation, the total rest mass is zero because the resulting photons travel at light speed. Where does the mass go, and where does the energy come from?

Since momentum is conserved, after annihilation, the total momentum is still zero, so the two photons have momentum with the same magnitude p and opposite directions. While photons do not have rest mass, they do still have kinetic energy, which we can write as E=pc.

Experiments have found that after the annihilation, the total kinetic energy of the two photons is about 1.637×10⁻¹³ Joules. The total rest mass of an electron and a positron is about 1.822×10⁻³⁰ kilograms. When this total rest mass is multiplied by c², we get (2mₑ)c²=1.637×10⁻¹³ Joules, which matches the total energy released. The same relationship between the energy and the rest mass occurs when this experiment is repeated with protons and antiprotons, neutrons and antineutrons, muons and anti-muons, and so on.
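The product (2mₑ)c² is easy to verify. The sketch below uses the CODATA electron mass of about 9.109×10⁻³¹ kg and the defined value of c. Both constants are supplied here as assumptions and are not quoted in the article.

```python
C = 299_792_458.0                 # speed of light in m/s (exact by definition)
ELECTRON_MASS_KG = 9.1093837e-31  # CODATA electron mass (assumed input)
JOULES_PER_MEV = 1.602176634e-13  # one megaelectronvolt in joules

# Rest energy of an electron-positron pair, E = (2 m_e) c^2.
total_mass_kg = 2 * ELECTRON_MASS_KG
rest_energy_j = total_mass_kg * C**2
rest_energy_mev = rest_energy_j / JOULES_PER_MEV

print(f"total rest mass = {total_mass_kg:.4e} kg")
print(f"rest energy     = {rest_energy_j:.4e} J")
print(f"rest energy     = {rest_energy_mev:.3f} MeV")
```

The result, about 1.637×10⁻¹³ Joules or 1.022 MeV, agrees with the measured photon energy quoted above.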

Since energy is conserved, there must have been as much energy in the system before the annihilation as after. This means that we could hypothesize that the energy before the annihilation was stored as the mass of the electron and positron, with the amount of energy stored being E=mc², and that the annihilation process turned this energy into the kinetic energy of the photons. This would account for both the appearance of kinetic energy in a system that initially had none and the vanishing of the rest mass in the system, but we still need to actually prove this, which will be done in the next article.

Any images that have not been given a citation are my own original work. Some of the examples that I used are based on examples covered in the textbook Modern Physics for Scientists and Engineers, 2nd edition by Taylor, Dubson, and Zafiratos.