Everything You’ve Ever Known is Statistics
Information is never without uncertainty
The brain is an amazingly complex machine for processing information. Whatever your philosophy of mind, you can’t deny that the mind depends in large part on the brain’s ability to process incoming signal data and convert that data into higher-level understanding.
These days, brain-inspired techniques have taken over Artificial Intelligence (AI) in a way they never did decades ago.
Yet skepticism about neural networks and their ability to mimic human intelligence remains strong, particularly among the mathematically inclined, because neural networks are, fundamentally, statistical instruments. They are neither more nor less than a way of extracting statistics from data and then mining those statistics for factors. Even decision making, which seems like a non-statistical activity, is statistical in that it represents a correlation between two spaces: that of data and that of decisions.
This article, however, is not about artificial intelligence but about the very nature of information. The overarching thesis is that all empirical knowledge is statistical.
I say “empirical” knowledge because I want to exclude logical deductions. While premises may be statistically derived from knowledge of the world, logic is itself a non-statistical enterprise by design. While you can argue that “all is statistics,” that is hardly useful. There must be some delineation between statistical and non-statistical information.
Our knowledge of the world derives largely from information arriving through our senses. While genetics contributes to how we understand the world, human brains are very general in how they adapt to the environment. In this article, I will focus on the senses and show how statistics can describe how information is processed.
A sensor, like an eye, is a measuring device that receives some state, x, at any given time, t. The state received in the case of the eye is the impact of light on the retina, so it is a state with a large number of components, one for each signal from the retina to the visual cortex. In the case of a camera, the state is a set of pixels. Thus, the state could contain millions of elements, each with billions of possible values.
All sensors have errors. Therefore, any element of a state vector is a random variable, X, drawn from the sensor’s statistical distribution, p, given the hidden state of the environment.
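To make that concrete, here is a minimal sketch in Python of a sensor turning a hidden state into a noisy measurement. It assumes a simple additive Gaussian noise model, and the state values and noise level are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hidden state of the environment (unknown to the observer): here, a
# tiny four-element "image" with true brightness values (made up).
y = np.array([0.2, 0.5, 0.9, 0.4])

# The sensor adds random error to every element it measures, so each
# reading is a draw from a distribution centered on the hidden state.
sensor_sigma = 0.05  # assumed measurement noise (standard deviation)

def measure(true_state):
    """Return one noisy measurement x of the hidden state y."""
    return true_state + rng.normal(0.0, sensor_sigma, size=true_state.shape)

x = measure(y)
print("hidden state y:  ", y)
print("measured state x:", x)
```

Every call to measure() returns a slightly different x, which is exactly the point: the measurement is a sample from a distribution, not the hidden state itself.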
The environment or scene that the camera or eye is viewing has an actual state, y, which is hidden. The goal of the brain or computer is not to determine what that state is, but to form the best estimate of that state it can, along with the errors, or statistical distribution, of that estimate.
The goal isn’t to determine the actual state because (1) it is impossible to determine an exact state without a perfect measuring device, which doesn’t exist, and (2) since we can’t measure anything perfectly, any estimate we come up with will have random errors in it. Our job is to make sure that our estimate of that probability distribution is no better and no worse than what is possible with the given measuring device.
If our probability distribution is too narrow, then our estimate is what is called optimistic, and the actual state may stray from the mean more easily than we think. If the distribution is too wide, then the estimate is called pessimistic, and the actual state is less likely to stray from the mean than we think. An ideal estimate is neither optimistic nor pessimistic.
To see why, imagine that I have a random variable X drawn from a normal distribution N(0, 1/2), that is, mean 0 and standard deviation 1/2. Now suppose I estimate that the distribution is N(0, 1). In that case, I estimate roughly a 15% probability that X is greater than 1. In reality, the probability is about 2%. Likewise, if I estimate it is N(0, 1/2) and it is actually N(0, 1), then I’ll make the reverse error. Both are bad if I’m trying to make decisions that hedge my bets against the variable being greater than 1.
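Those tail probabilities are easy to check numerically; here is a quick sketch using SciPy’s normal survival function:

```python
from scipy.stats import norm

# Tail probability P(X > 1) under the estimated distribution N(0, 1):
p_under_estimate = norm.sf(1, loc=0, scale=1)    # ~0.159, roughly 15%

# Tail probability P(X > 1) under the actual distribution N(0, 1/2),
# where the second parameter of N(., .) is the standard deviation:
p_under_actual = norm.sf(1, loc=0, scale=0.5)    # ~0.023, roughly 2%

print(f"P(X > 1) under estimate N(0, 1):  {p_under_estimate:.3f}")
print(f"P(X > 1) under actual N(0, 1/2):  {p_under_actual:.3f}")
```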
Now, in addition to understanding the uncertainty of my sensor, I also need to understand the uncertainty in my state estimate of the environment, z, which is my estimate of the hidden state, y. That uncertainty is a function of both my sensor uncertainty and my uncertainty about how the external thing I’m viewing changes with time and how it started.
That has a lot to do with how well I understand the thing I’m looking at. Consider that, with a two-dimensional passive sensor like the eye or a camera, three-dimensional objects can resemble one another, blend and merge together, and create all kinds of optical illusions. The way to sort that out is to develop an internal model of what you think you are seeing. As you make measurements, you update your model with those measurements to refine what you think you are looking at. The brain does this automatically, which is why you can sometimes look at a thing and not recognize it until you’ve moved around a little.
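One way to picture that refinement is a Bayesian update over a handful of candidate interpretations, where each new look shifts the probabilities. The hypotheses, priors, and likelihoods below are invented purely for illustration:

```python
# Candidate interpretations of what we might be looking at, with a
# uniform prior belief over each (hypotheses and numbers are invented).
belief = {"mug": 1 / 3, "vase": 1 / 3, "ball": 1 / 3}

# Likelihood of each observation given each hypothesis (also invented):
# a visible handle is very likely if it is a mug, unlikely otherwise.
likelihood = {
    "round top":   {"mug": 0.5, "vase": 0.7, "ball": 0.9},
    "handle seen": {"mug": 0.8, "vase": 0.1, "ball": 0.05},
}

def bayes_update(belief, observation):
    """Multiply the prior by the observation's likelihood, then renormalize."""
    posterior = {h: belief[h] * likelihood[observation][h] for h in belief}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

# Two successive "looks" at the object shift the belief toward "mug".
for observation in ["round top", "handle seen"]:
    belief = bayes_update(belief, observation)
    print(observation, {h: round(p, 2) for h, p in belief.items()})
```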
With more active sensors, like radar, the problem has less to do with the ill-defined 2D-to-3D problem and more to do with the need to “look” at an object periodically. What the object does between one look and the next is a statistical process that you need to guess at. Cars behave differently from people and aircraft.
In either case, the statistics of the model and the sensor combine into your state estimate: what you think you are looking at and how likely it is to be what you think it is.
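A minimal one-dimensional sketch of that combination, in the spirit of a Kalman-style predict-and-update cycle, looks like the following. The variances and measurements are illustrative assumptions, not properties of any particular eye or radar:

```python
def predict(mean, var, process_var):
    """Between looks, the model says the state may drift, so the
    uncertainty of the estimate grows by the process variance."""
    return mean, var + process_var

def update(mean, var, measurement, sensor_var):
    """Blend the prediction with a new measurement, weighting each
    by how much we trust it (inverse-variance weighting)."""
    gain = var / (var + sensor_var)
    new_mean = mean + gain * (measurement - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var

# Illustrative numbers only: initial belief, model drift, sensor noise.
mean, var = 0.0, 1.0     # initial state estimate and its variance
process_var = 0.1        # how much the object may change between looks
sensor_var = 0.25        # how noisy each individual look is

for z in [0.9, 1.1, 1.0]:            # a few successive measurements
    mean, var = predict(mean, var, process_var)
    mean, var = update(mean, var, z, sensor_var)
    print(f"estimate: {mean:.3f}, variance: {var:.3f}")
```

Each pass through the loop widens the uncertainty a little (the model says the object may have changed) and then narrows it again (the new look adds information), which is the combination the paragraph above describes.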
This goes for all the other senses as well. All the information coming into the brain is statistical and the brain’s job in processing it is to subject it to a statistical analysis.
This is why everything you’ve ever known is essentially statistical.
What about words? Are they statistical?
Words are a way of sharing digital information between people. When received correctly, words are transmitted precisely, though they may not be interpreted in exactly the same way by everyone. Words, however, are simply reflections of reality. They are expressions of what we perceive, and so they are, in essence, just the end result of a statistical analysis.
Information, what we can understand from nature, is always statistical. What we can understand from other human beings may not be, but ultimately what digital information encodes must have some source in the mish-mash that is reality.