Why Probability Theory is Hard
It’s not because you’re stupid or weren’t concentrating in school
 
In 1982, Kahneman, Slovic and Tversky published “Judgment under Uncertainty: Heuristics and Biases” and shattered humanity’s collective self-delusion that we had any functional intuition for even the most rudimentary problems in probability theory. This work has seen a renaissance in popularity since the publication of Kahneman’s rather more accessible “Thinking, Fast and Slow”.
Kahneman is broadly sympathetic to our struggles, but much of the follow-up literature and course material has a slightly disparaging, not to say patronizing, odour, as if reliable probabilistic intuition were just a question of a little hard work and application.
I have taught probability theory to people of all ages, backgrounds, levels of motivation and levels of mathematical enthusiasm: from talented school kids and struggling school kids to undergraduates who take it because they have to and postgraduates who study it out of love. A large part of my consultancy involves teaching probability to people who want to use data to make more informed decisions.
I’m here to tell you that probability is hard and why, but that a little goes a long way and it’s entirely worth the struggle.
Probability theory is not intuitive
When we learn to drive, we turn the steering wheel and the car responds. It responds differently at different speeds. When we start, this takes us by surprise. We first over-turn, then we under-turn because we just over-turned, then we begin to home in on an appropriate response and we learn to do this across a range of speeds. We can do this because, largely, our car responds in the same way for a given speed every time. We subsume and integrate those responses and we absorb cues that allow us to extend that learnt intuition to other, similar mechanical systems and from there to mechanical systems in general.

Uncertain systems, by definition, respond differently every time to the same input. The wiring we possess to subsume, integrate and automate responses simply cannot work. Best case, it just cannot engage. Worst case, we try to hypothesize response protocols, inferring patterns where there are none, and we become angry and frustrated when they don’t work.
Given the immense challenge of developing an intuition about just one stochastic system, we can hardly be surprised that it is virtually impossible to generalize to uncertain systems in general.
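To make the point about inferring patterns where there are none a little more concrete, here is a minimal sketch in Python. It simulates a fair coin (my choice of example, as are the function names, the 100-flip sequence length and the streak threshold of 6) and measures how often a purely random sequence contains a long streak of identical outcomes, which is exactly the kind of thing our pattern-hungry wiring is tempted to read as a signal.

```python
import random


def longest_run(flips):
    """Return the length of the longest streak of identical outcomes."""
    if not flips:
        return 0
    best = current = 1
    for prev, nxt in zip(flips, flips[1:]):
        # Extend the current streak if the outcome repeats, otherwise restart it.
        current = current + 1 if prev == nxt else 1
        best = max(best, current)
    return best


def simulate(n_flips=100, n_trials=10_000, seed=0):
    """Longest streak in each of n_trials sequences of n_flips fair coin flips.

    All parameter values are illustrative assumptions, not recommendations.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    results = []
    for _ in range(n_trials):
        flips = [rng.random() < 0.5 for _ in range(n_flips)]
        results.append(longest_run(flips))
    return results


runs = simulate()
share = sum(r >= 6 for r in runs) / len(runs)
print(f"Share of 100-flip sequences with a streak of 6 or more: {share:.2f}")
```

If you guess the answer before running it, you will probably guess too low. The same input (a fair coin) gives a different sequence every time, and many of those sequences look as though something is going on.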
Probability theory is all “Slow”
Readers of Kahneman’s “Thinking, Fast and Slow” will recognize the distinction between System I “Fast” (intuitive, instinctive, often emotional) and System II “Slow” (deliberate, methodical, rational) thinking. System II is slow, but it’s also hard; it demands energy and will-power and, certain mind states aside, it is a limited resource. Because probability theory is non-intuitive, it is perpetually doomed to languish in System II thought paradigms.

We might hope that notwithstanding the additional costs of developing stochastic intuitions, sufficient exposure over a long enough period of time may bring us to an instinctive apprehension of uncertain systems. That may be; however, despite a maths PhD, and two decades using probability theory professionally, I have not experienced this myself. It is, though, undoubtedly possible to develop an effective intuition for how to System-II-solve probability problems. So while we can develop an intuition to speed up our “Slow” thinking, it’s still “Slow” (and hard).
Probability is conceptually confusing
Students (in the broadest sense) who look to learn the “Slow” logic of probability are immediately faced with considerable conceptual challenges.

First, probability theorists don’t even agree on what probability is or how to think about it. While there is broad consensus about certain classes of problems involving coins, dice, coloured balls in perfectly mixed bags and lottery tickets, as soon as we move into practical probability problems with more vaguely defined outcome spaces, we are served with an ontological omelette of frequentism, Bayesianism, Kolmogorov axioms, Cox’s theorem, subjective and objective interpretations, outcome spaces and propositional credences.
Even if the probationary probability theorist is eventually indoctrinated (by choice or by accident of course instructor) into one or other school, none of these frameworks is conceptually easy to access. Small wonder that so much probabilistic pedagogy is boiled down to methodological rote learning and rules of thumb.
Conclusion
There’s more. Probability theory is often not taught very well. The notation can be confusing, and don’t get me started on measure theory.
The good news is that in terms of practical applications, very little can get you a very long way. The alternative to the basic level of understanding that allows a quantitative analysis of uncertainty is, frankly, crystal balls and tea leaves. Even simple models, based on the most rudimentary probabilistic paradigms, will clarify outcomes, furnish a framework and provide insight into the data required to make informed decisions.
And though it’s hard, it’s without a shred of doubt entirely worth the effort. Probability theory, despite its ongoing turf war, is mathematically mature enough that whatever framework you adopt actually represents pretty much the minimal conceptual machinery you need rationally to navigate uncertainty.
So go for it, but be ready and be kind to yourself. It’s hard.