Is Strong Artificial Intelligence Possible? No
We are analyzing the most hyped topic of our time, one that, in our view, has created an entirely unfounded picture of the way we think, with no scientific basis behind it. To write this article I read several books on AI (wincing at the sheer number of logical mistakes) and talked to many machine-learning specialists and data engineers.
Investigative Technique
To start with, I do not work in big data, whose specialists are fond of spinning a good story. I am a philosopher, so I will analyze the core scientific possibility of AI.
To analyze the topic I will use the language of philosophical formal logic, which was created specifically to investigate the possibility of any knowledge (or its logical necessity). The language of formulas, by contrast, treats its concepts as absolute entities and does not analyze whether they can exist at all; at best it can point to a "blank spot" when we talk about possibility.
I will also stay away from the computational side of the topic. In 2017 I read two excellent pieces by Azim Azhar on the problems of computational capacity:
In an autonomous car, we have to factor in cameras, radar, sonar, GPS and LIDAR — components as essential to this new way of driving as pistons, rings and engine blocks. Cameras will generate 20–60 MB/s, radar upwards of 10 kB/s, sonar 10–100 kB/s, GPS will run at 50 kB/s, and LIDAR will range between 10–70 MB/s. Run those numbers, and each autonomous vehicle will be generating approximately 4,000 GB — or 4 terabytes — of data a day. For comparison, the most data hungry mobile internet users in the world, the Finns, used on average 2.9 Gb of data per month back in 2015. (In other words, a self-driving car’s daily data demands are the equivalent of about 40,000 eager Finns surfing the internet.) Full-length version.
Here I will analyze the algorithmic basis for the possibility of AI, i.e. the aspect one should start from before theorizing about its architecture.
Algorithmic Base of AI
Any algorithm tree is built on two semantic units: YES and NO (ones and zeros). Operations over these units, in turn, come down to two logical connectives: AND and OR. Only two! Notably, binary logic has no independent NOT operation; it is redundant. If I write "1 OR 0" and keep adding 1s and 0s, I simply get a longer chain of "1 and 0 and 1…". I cannot write "not 1" as a separate value, because it would just be "0", and I cannot write "NOT 1 and 0" as something new, because on its own it means nothing (compare contrariety and contradiction in formal logic).
The semantics of an algorithm, however complicated it is, cannot go beyond these two semantic units and the operations on them. They can form a long sequence of elements (a concrete set of ones and zeros) with different combinations of yes and no. In response to a definite sequence, the algorithm gives the foregone conclusion that was put into it: the whole meaning of an algorithm is to work with input and output data. Note that there is no hierarchy and no abstraction in a binary algorithm; there is only a sequence of input and output data. Only a sequence! This is why program code must be precise and unambiguous: it allows no extra layers, nothing except yes and no.
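To make the point tangible, here is a minimal Python sketch (the particular composition of AND/OR is arbitrary and chosen only for illustration): however elaborate the logic, a finite binary algorithm can be flattened into a plain table from input sequences to pre-programmed outputs.

```python
# A toy illustration (not a formal proof): any finite binary algorithm can be
# flattened into a table that maps input bit sequences to pre-programmed outputs.
from itertools import product

def some_algorithm(bits):
    # An arbitrary composition of AND/OR over the input bits.
    a, b, c = bits
    return (a and b) or c

# Enumerate every possible input: the "algorithm" is fully described
# by this finite sequence of (input, output) pairs -- nothing more.
table = {bits: some_algorithm(bits) for bits in product([0, 1], repeat=3)}

for bits, out in table.items():
    print(bits, "->", out)
```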
Important! The Western world owes the discovery of the binary system to Leibniz, who was above all a philosopher and was fascinated by the elegance of binary logic. In his view, this logic could describe the entirety of the universe, because any logic, however intricate, reduces to 0 and 1: any event either happened or it did not. A fact either took place or it did not, and every other logic reduces to the binary precisely because of this objective principle. In logic, any value of A is absolute, not relative, and any absolute value, as we know from formal logic, resolves to either "true" or "false".
The modern confirmation of the universality of binary logic is the quantum computer, which initially carried hopes of overcoming binary logic, but alas! Even here binary logic reigns: unlike an "ordinary" computer, a quantum computer operates on all possible states between 1 and 0 at once, yet in the end we obtain a probabilistic result that is still expressed only in 1s and 0s. This is what the Bloch sphere represents in quantum computing theory.
Briefly, the Bloch sphere describes all possible states of a qubit, and when a concrete computation is read out, the result "collapses" to 0 or 1 depending on the state's position on the sphere.
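A rough simulation of this idea in Python (the measurement rule follows the standard textbook parameterisation of a qubit; the specific angles are invented for illustration): the state lives on a sphere, but reading it out still yields only 0 or 1.

```python
# A qubit state on the Bloch sphere is parameterised by angles (theta, phi),
# but measuring it still returns only 0 or 1, with probabilities fixed by
# its position on the sphere.
import math
import random

def measure_qubit(theta, phi=0.0):
    """Simulate measuring |psi> = cos(theta/2)|0> + e^{i*phi} sin(theta/2)|1>.
    (phi does not affect the 0/1 probabilities in this basis.)"""
    p_zero = math.cos(theta / 2) ** 2   # probability of reading out 0
    return 0 if random.random() < p_zero else 1

# A state near the "north pole" almost always collapses to 0,
# one near the equator gives 0 or 1 roughly half the time.
print([measure_qubit(0.1) for _ in range(10)])
print([measure_qubit(math.pi / 2) for _ in range(10)])
```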
The bottom line: however complicated the invented algorithm, it reduces to binary logic, which is not a local law of mathematics but a law of nature. In any concrete task, AI must decide on either 1 or 0; the question is how precisely it can settle that task without a full set of data. So let us continue.
AI Problematics
But the idea of an algorithm is not yet the idea of AI. AI begins when an algorithm, lacking some element in its system, "creates" that element itself in response to input data. On what grounds? On the grounds of its only available instruments, the operations "and" and "or": it compares unknown input data with known data and outputs the statistically closest variant. To teach AI, we look at what it gave us as output and feed it new data, which it then uses as a new element for future decisions; this is how we improve its precision. A misleading illusion is created by the sheer number of algorithms that can be built from these simple constructions. We can indeed build an algorithm with a million interconnected elements, but in essence the problem of AI lies in the ability to give the most precise output in response to maximally unknown input.
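A minimal sketch of this mechanism (the feature vectors and labels are invented): the "decision" for an unknown input is simply the label of the statistically closest known example, a comparison built out of elementary operations.

```python
# Nearest-known-example matching: the whole "creative" step is a comparison
# of unknown input against stored, labelled data.
def closest_label(unknown, known):
    """known: list of (feature_vector, label); returns the label whose
    feature vector is nearest to the unknown input."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(known, key=lambda item: distance(item[0], unknown))[1]

known_examples = [((1.0, 0.0), "cat"), ((0.0, 1.0), "dog")]
print(closest_label((0.9, 0.2), known_examples))  # -> "cat"
```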
This problem leads to two ways of creating AI. The first is to specialize the algorithm as far as possible, so that in a concrete domain there is less unknown data, or the unknown data is more "similar" to the known. Strictly speaking, this is not artificial intelligence but ultra-precise machine learning, and, as we will see later, the less it resembles "intellect", the better it works. That is why specialized neural networks work, and work well (e.g. image recognition), but break down the moment a new specialization appears.
The second way is to make AI maximally intelligent, i.e. maximally similar to our mind. This implies building such a subtle thing as creativity into the algorithm, so that it is not tied to a concrete specialization and can make diverse decisions; only then could it be called intellect at least to some degree. But in both cases we run into three insurmountable natural obstacles connected with algorithms in general:
- Algorithmic limit.
- Impossibility of an abstract algorithm.
- Complicated nature of objective data.
In the next article we will investigate these problems, examine the arguments of AI supporters (including Bayes' theorem) and arrive at the main conclusion: why strong AI is not possible.
The main point of the whole investigation: the less machine learning resembles strong AI, the better it works. In the first part we defined the general algorithmic basis of all algorithms (including AI and quantum algorithms) and the central problem of AI: the ability to give the most precise output in response to maximally unknown input. Let us now look at the theory of algorithms and the three main arguments for strong AI.
1. The Algorithmic Limit
Let us immerse ourselves in the theory of algorithms. There is such a thing as "algorithmically undecidable problems", and they are useful for our study. We will take three as examples.
- The halting problem, whose undecidability was proven by Turing, can be stated like this: there is no general algorithm that, given an arbitrary program and its input, can determine whether that program will halt or run forever (see the sketch after this list).
- The Entscheidungsproblem (decision problem) can be formulated like this: it is impossible to prove algorithmically the validity or invalidity of another algorithm's work. That means we cannot create an algorithm which, without our concrete input, would be able to determine the truth value of formal statements.
- We also find it interesting to consider Kolmogorov complexity, which implies that although a computer can search for regularities, it cannot establish that it has found the best one. If we make it search for regularities in heterogeneous data, it will find the first of them, and it is doubtful that this one will be the best. We can make it search further, but for the algorithm that will be new data, different from the initial data. This means that the precision of the output will depend not on the input task but on the mathematical context, which complicates decisions in real life.
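For the curious, here is a sketch of Turing's diagonal argument in Python; the `halts` oracle is, of course, hypothetical, which is exactly the point.

```python
# A sketch of Turing's diagonal argument (hypothetical `halts` oracle):
# if a general halting decider existed, this program would contradict it.
def halts(program, argument):
    """Pretend oracle: returns True if program(argument) halts."""
    raise NotImplementedError("no such general decider can exist")

def paradox(program):
    # If the oracle says paradox(paradox) halts, loop forever;
    # if it says it loops forever, halt immediately. Either answer is wrong.
    if halts(program, program):
        while True:
            pass
    return "halted"

# paradox(paradox) -- whichever answer `halts` gave, it would be refuted.
```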
These and similar problems ultimately lead to Gödel's incompleteness theorem. Let us formulate it like this: for any consistent system of axioms or formal rules (rich enough to express arithmetic) we can formulate a true statement which that system cannot prove, i.e. every such system is incomplete.
In a more applied sense, all these examples show that an algorithm cannot give a correct response if the volume of unknown elements exceeds half of its own pre-programmed volume; the very adequacy of the algorithm (not merely its precision) depends on this. Under the incompleteness theorem we can construct a true statement consisting of more than 50% material outside the given axioms. Even with 80% completeness of volume we cannot be sure of precision, to say nothing of less than 50%. To put it childishly: if an algorithm has two baskets, for blue and red eggs, it will classify a purple egg and put it into one of the baskets, since purple contains both colours, but it will be unable to handle a white egg adequately, because it differs from red and blue by more than 50%; it will not even understand that this is an egg. Adding information about white eggs will not "improve" the algorithm but will add new knowledge, i.e. expand its volume, which moves the algorithm away from its initial task.
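Here is the egg example as a toy Python sketch (the colour values are made up): the classifier is forced to place every input into one of its pre-programmed baskets.

```python
# A toy nearest-colour sorter with exactly two baskets.
def classify_egg(rgb):
    baskets = {"red": (255, 0, 0), "blue": (0, 0, 255)}
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(baskets, key=lambda name: distance(baskets[name], rgb))

print(classify_egg((160, 32, 240)))   # purple: plausibly lands in "blue"
print(classify_egg((255, 255, 255)))  # white: still forced into a basket
                                      # (here "red"); the algorithm has no way
                                      # to say "this is not an egg I know"
```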
2. The Impossibility of an Abstract Algorithm
If a visual algorithm trained to recognize a cat by four criteria (body shape, body size, fur, ear shape) is shown a hairless cat with no ears, it will answer that this is not a cat. We can specify that such cats also exist, i.e. add informational knowledge or options. We can even change the priorities. BUT! We change the priority for ourselves, whereas for the algorithm we merely change the sequence. Still, let us assume the algorithm has learned to recognize all cats regardless of their life journey. For better precision we add another algorithm that recognizes a cat by its voice. We can keep raising the precision, but note that in every case we do not improve the algorithm itself; we feed in new content to cover those 50% of unknown data. Above the two algorithms we build a third, which makes the final decision. What do we get? A super-algorithm that can recognize cats, and nothing else. This works wonderfully with stable input data, where one item does not differ drastically from another. This is how credit scoring works: it processes loan applications on the basis of previous borrowers and learns endlessly from loan-outcome statistics.
The idea of joining specialized algorithms into one decision-making super-algorithm seems appealing. But its essence will not change: it will still compute "and, or" and output "yes, no", while the growing heterogeneity of unknown data will tend to exceed 50% of the known data. The thought that AI will work better if we join many strong algorithms into one is like the thought that we could build the fastest car by welding a Bugatti to a McLaren. The amount of known data does not change because of this. In fact, at some point it becomes more useful to make AI dumber: a driverless car that recognizes every species of cactus is less useful than one that simply recognizes that there is an object ahead which must be avoided. Besides, what criterion does the super-algorithm have to operate on besides the direct data? Its data are the other algorithms, and as we showed above, an algorithm cannot establish truth by reference to other algorithms (the Entscheidungsproblem). It can only do so statistically, and people try to fill the missing 50% with statistical data, which the algorithm is then supposed to check; why this is futile will become clear when we analyze the theorists' arguments. For now, let us consider one more foundation of AI: learning.
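A hedged sketch of such a "super-algorithm" (both detectors are invented stubs): the combination is still nothing but "and, or" over yes/no answers, and it still recognizes only cats.

```python
# Two specialised detectors combined by a third that merely votes.
def looks_like_cat(image_features):
    return image_features.get("ear_shape") == "pointed"   # stub visual check

def sounds_like_cat(audio_features):
    return audio_features.get("pitch", 0) > 400            # stub audio check

def super_algorithm(image_features, audio_features):
    votes = [looks_like_cat(image_features), sounds_like_cat(audio_features)]
    return sum(votes) >= 1   # an "or" over the specialists: still cats, only cats

print(super_algorithm({"ear_shape": "pointed"}, {"pitch": 150}))  # True
print(super_algorithm({"ear_shape": "none"}, {"pitch": 150}))     # False
```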
3. Retraining
To achieve "intellectual abilities" AI must constantly learn. Learn? Sorry, retrain: not acquire new knowledge, but create new items out of old ones or file them into new categories. In fact, it cannot move beyond its input content or, more precisely, beyond its input categories (qualities). AI does not need to know anything concrete, because its nature lies not in knowledge of content but in knowledge by demonstration. We miserable humans, unlike AI, can conceive of something not within the given categories but within all others, including the most surprising ones. If we apply this method to AI training, we will keep adding surprising categories, and AI will have to retrain constantly. But constant retraining of a neural network means returning to its old errors and retraining again, because if we formalize the whole process, an algorithm always realizes a simple sequence of items. An algorithm does not consider the nature of things, because for it everything has one and the same nature. And what, exactly, would it be learning? Strictly speaking, an algorithm accepts any knowledge as absolute, yet we can see for ourselves that there is no absolute knowledge: everything depends on context. Consequently, placed in different contexts, an algorithm will keep learning contradictory data, and as a result it will be useful only in completely generalized domains, which is to say, useless.

Machine learning is good because it learns on new data, where "new" means everything that has not yet been included in the concrete task. For instance, suppose we were hunting online loan fraudsters by their abnormal behaviour and recently discovered, to our delight, that most of them visit the forum at www.uspeshnye-denigy.narod. But as soon as a new specialization appears, this knowledge may become not just useless but destructive for precision. Say we now want AI to show adverts to suitable people: it would exclude everyone named Denis, even if they fit every other parameter, simply because their name contains the combination "deni" that appeared in the web address. One could object that this problem is solved by the quantity of data (situations, context, etc.) available for learning, but it is not. Let us see.
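The "Denis" example as a toy sketch (the marker and the usernames are invented): a signal learned in one context leaks destructively into another.

```python
# A fraud signal learned from one task, applied blindly to a new one.
FRAUD_MARKERS = ["deni"]   # learned from the forum address in the fraud task

def flag_user(username):
    name = username.lower()
    return any(marker in name for marker in FRAUD_MARKERS)

print(flag_user("uspeshnye-denigy_fan"))  # True  -- the intended catch
print(flag_user("Denis1990"))             # True  -- an innocent user excluded
print(flag_user("Olga_77"))               # False
```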
4. The Nature of Data
Even if we feed all the world's knowledge into AI so that it perceives reality from every angle, this will give us nothing; on the contrary, it will make the system utterly useless, acting on what amounts to a chaotic algorithm. First, such a super-precise neural network will at some point make a great many errors, because things in the world are very often similar to one another. Precision may even become a counter-problem, when the system starts to classify genuinely new data as already known. High precision is not only a good thing; it is also a bad one.
How precise must AI become to count as intellect? The thing is that human intellect is not built for precision; its nature lies in spontaneous decisions, which can reveal a perceived object from new sides so that we understand its nature differently. A child does not grasp heat through a graded scale of warmth across the objects it touches but, as often happens, through the spontaneous decision to touch a boiling pot. If an algorithm could do something unknown to it in order to obtain the new unknown, or to link it with the known, AI would be not merely strong but divine. The problem at the very first stage, however, is that it has only the input data we give it, and it cannot differentiate their nature. Think about your own learning: any knowledge worth the name forces you to rebuild all your habitual categories of the perceived world and to create new ones. This is how we learn about the contradictions of things in the world. And we always allow for a fraction of the unknown in everything: as soon as we encounter new manifestations of a familiar thing, we begin to rebuild our mental picture and to reconcile the contradictions. We miserable humans work with what lies outside us, with the influencing factor, not with what is already inside us. This is probably how the mind is formed from earliest infancy, and apparently the process never stops. Unlike a computer, we do not store all data in memory; it is as if we create them anew each time. This question deserves a separate philosophical analysis, but we can definitely say that we can work with what is merely given to us, without consulting memory. Otherwise, what would our memory be built on when there is nothing in it yet? An algorithm cannot work like this: it always works by reference to an existing database.
Three Arguments of AI Theorists
So far I have analyzed the theoretical possibilities of AI without touching the "intellect" itself. Now let us turn to the AI theorists and question their main arguments.
One algorithm can solve different tasks
From this assumption comes the supposition that we can make an algorithm maximally universal, even at the expense of its efficiency on individual tasks. In effect, we would create a logical structure universal for most laws of nature and most systemic processes. This is the route to a single algorithm that both plays chess and catches loan fraudsters: one and the same neural network fed with different data. Continuing this logic, we may assume we could create an AI built on maximally abstract and universal laws. Such a network would solve every task "correctly" because it would contradict none of the basic laws of the Universe.
Antithesis: Several popular logical mistakes are visible here. First, strictly speaking, we do not know "the laws of nature"; we know the laws of concrete objects and the affinities between different objects, and we are not even sure that the same laws always produce the same observed result. Second, the concept of "universality" exists only in our minds: we group similar manifestations into one structure in order to work with the data. But any universality comes at the price of content. In the conclusion A ≠ ¬A we have a maximally abstract structure, important for distinguishing objects, but it says nothing about the data themselves. If we feed in data, the structure will still tell us nothing about them; we will have to point manually: "this is a hot dog" and "this is not a hot dog". Only with such data will a neural network begin to learn something; it will learn nothing from the structure alone. Consequently, for strong AI we would have to collect data from every sphere, and the data would have to be maximally concrete. If we introduce the parameter "hardness", we will have to enumerate all kinds of hardness. For specialized machine learning we only have to supply the kinds of hardness it is likely to meet, which is its advantage, yet we still face the "50% law": all the other kinds of hardness will require new training. We cannot specify an abstract law capable of recognizing these kinds by itself. "But surely we can show the algorithm what is hard and what is merely symbolic!?" Yet the concept of hardness is already a concept with content, which the algorithm must have been trained on using different kinds of hardness. And to learn to distinguish the hard from the symbolic it will need additional training on that distinction, i.e. it will have to be shown a great many hard things and a great many symbolic ones. I remind you that for an algorithm all data are of one and the same nature; it has nothing abstract.
Statistical Learning and Bayes' Theorem
The theorem says that if one event has occurred, then it is statistically probable that an event connected with it will occur too. Many believe this theorem will become the foundation of strong AI. A Bayesian algorithm works very simply: different hypotheses are fed into it, and it checks them against continually arriving statistics. So, on this view, AI can derive any data from the assumptions put into it; it can then generate hypotheses itself and test them. It sounds hopeful, and we are shown many examples where machine learning recognizes speech, images, sounds and so on from an endless stream of input data.
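For reference, here is the Bayes update itself in a minimal sketch (the probabilities are invented): a hypothesis is simply re-weighted by incoming evidence, which is powerful inside one well-defined context.

```python
# Bayes' rule for a single hypothesis and a single piece of evidence.
def bayes_update(prior, likelihood, evidence):
    """P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence

p_fraud = 0.01            # prior: 1% of applications are fraudulent
p_flag_given_fraud = 0.9  # the signal fires for 90% of fraudsters
p_flag = 0.05             # the signal fires for 5% of all applications

posterior = bayes_update(p_fraud, p_flag_given_fraud, p_flag)
print(round(posterior, 3))  # 0.18 -- useful, but only inside this one context
```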
Antithesis: Bayes' theorem works only within one context, with one set of strictly defined concepts; that is why it works so well in, for example, machine vision. On reflection, an algorithm cannot produce a hypothesis outside a strictly defined context. If the concepts start to blur, the algorithm begins to output strange theories until it falls back on distinct concepts, and then it works only with those. Even in specialized machine learning, high precision requires an enormous amount of data. If strong AI generates many assumptions and checks them against vague concepts, it will build a world of formal statistics, and at some point those statistics will start to contradict knowledge of content.

Think about how we ourselves form and check hypotheses. We obviously do not do it from statistical data, do we? Yet we are confident of some conclusions drawn from very little data. Why? Strictly speaking, we are never totally sure, but a few assumptions are enough for us to make appropriate decisions. We can prioritize tasks in any context, and we often have to take decisions that are statistically unfavourable yet most appropriate in our particular case. Such decisions make up the majority of our lives, and they are not necessarily the hard ones: the easiest decisions always require a perception of context.

Imagine we give AI the task of looking after a child at home, and a fire breaks out. What should AI do: put out the fire or carry the child out of the house? Going by the statistics of people who died in fires, it should carry the child out and sacrifice the house. But what if the fire was small and grew large precisely because AI was saving the child instead of extinguishing it? Even if AI learns to estimate the size of a fire from scientific data on fire spread, will it be any more useful than an ordinary fire alarm? The moral is: any statistics are useful only in a field-specific context; everywhere else they are excess baggage. As soon as AI faces two vaguely defined tasks, in most cases it will make the wrong decision even if it knows all the conditions. Besides, in a fire a person will try to extinguish it and, failing that, will start saving the child, which is very likely the best course. But if AI is trained with "try", the share of unknown data will at some point exceed not 50% but 99%.
AI evolves itself
Then the theorists claim that, over a long course of learning, AI will find its own evaluation criterion and make correct decisions by itself. It will be able to find hidden connections between objects and thereby learn to cure diseases, find black holes and register subatomic particles; in short, it will make scientific discoveries, because AI can analyze data continuously and might thus develop an "intuitive mind".
Antithesis: Our world does indeed contain regular patterns and a great number of hidden connections. A scientist making a discovery finds these connections or uncovers the influence of a previously unknown factor. In this way a biologist searches for genetic and chemical patterns in a disease, and a physicist searches for physical processes behind natural phenomena. In all these cases the scientist studies an object according to the nature of the data. There are different methods of research for objects of quantum scale and for large physical objects; the latter method was developed in line with the theory of relativity, which is itself maddening to comprehend. That theory was built partly on the observation of regular patterns, but the same kind of observations also underpinned the completely incorrect aether theories that once dominated physics, just as the theory of rays emanating from our eyes and reflecting off objects had dominated earlier, because the observed regularities seemed to point that way. Discoveries happen when a fundamentally different point of view emerges; only then are the regular patterns filled with new content! In essence they become different factors: yesterday the god of rain sent raindrops from the sky, today a cloud does it; these are completely different factual and logical contents for one and the same phenomenon. A person makes such discoveries in an extraordinary strain of imagination and only afterwards pins down the regularities. From regularities and anomalies one can spin an endless number of fantasies. AI may learn to detect or predict this or that disease from test results, but only with very concrete data in a very concrete context. In effect this is just the automation of a doctor's work. AI will never be able to discover a fundamentally new factor in a disease, because we cannot feed it the idea of a disease's nature. Why? Because in the absolute sense we ourselves do not know the nature of any concrete disease. If you are sure that "scientists know", answer a simple question: why, in similar conditions, does one child catch a cold and another not? After some argument you will say that each person has their own "individuality", and that is correct. But the nature of that "individuality" is itself unknown to us, so we know far from everything about the disease. This means we cannot feed in a strong criterion; the algorithm can only sort various correlates of the disease statistically, and those will still require checking (e.g. "people with long hair suffer from cancer at 40–45"). As Kolmogorov complexity suggests, an algorithm cannot determine the best regularity; it can only find hidden ones within the data it already has.
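A toy version of the "long hair" point (the data are entirely fabricated): an algorithm can rank co-occurrences in the data it has, but it cannot tell a causal factor from an accidental one.

```python
# Rank features by how often they co-occur with illness in a tiny fake dataset.
patients = [
    {"long_hair": 1, "age_40_45": 1, "smoker": 0, "ill": 1},
    {"long_hair": 1, "age_40_45": 1, "smoker": 0, "ill": 1},
    {"long_hair": 1, "age_40_45": 0, "smoker": 1, "ill": 1},
    {"long_hair": 0, "age_40_45": 0, "smoker": 0, "ill": 0},
    {"long_hair": 1, "age_40_45": 0, "smoker": 0, "ill": 0},
]

def co_occurrence(feature):
    return sum(p[feature] and p["ill"] for p in patients) / sum(p["ill"] for p in patients)

for feature in ("long_hair", "age_40_45", "smoker"):
    print(feature, round(co_occurrence(feature), 2))
# The highest-scoring "pattern" here is long hair -- a connection that still
# has to be checked by someone who understands the nature of the disease.
```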
One might say that an algorithm cannot change, only be refined, but aren't changes the very basis of "evolution"? In the narrow technical sense that suits us, evolution means greater precision in solving applied tasks, based on a more differentiated perception of those tasks. The Internet "evolved" when more applied services, with more applied functions, appeared. Any evolution, including biological evolution, requires a new perception of the task, a new approach, and the same is true of scientific discoveries. But for an algorithm there can be nothing new: it has only its existing database, and if that database already contained all the world's knowledge, AI would indeed be intellect. And what about the complicated operations AI is already performing? Do not confuse AI with automation; I have found no reliable source describing a robot performing a surgery in full. One more or less reliable article literally says: "Dupont envisions autonomous robots assisting surgeons in complex operations, reducing fatigue and freeing surgeons to focus on the most difficult maneuvers, improving outcomes". That is, the robot was assisting, and AI in this case performed one specialized function: determining where it was, and then handing control back to the surgeon.
I have set aside the arguments that are indirectly contained in the ones above; more often they are theories about how AI would work rather than about whether it can exist in principle, and as such they are fantasies. On the other hand, I have tried not to draw childish conclusions about the Chinese room and to follow the logic. In conclusion:
Main Thesis on Impossibility
It is impossible to create a learning data-processing system (i.e. one that continually improves the precision of its output in response to maximally unknown input) in which the amount of unknown input data exceeds 50% of the existing data that serves as the basis of learning, and in which the quality of that data has a nominally infinite gradation of attributes between any two items. The main obstacles are:
- Algorithmic limit.
- The impossibility of multifunctional specialization, which would contradict itself.
- The impossibility of relying on statistics and hypotheses, because the further they go beyond their context, the less objective they become.
- The impossibility of an "abstract" or "intuitive" algorithm, because abstraction and intuition rest on the unknown, while an algorithm accepts only clear commands.
- Continual training will not make AI better; it will make it go in circles and retrain again and again, because the only criterion of effective decisions is the concrete situations, which very often contradict one another.
In view of the above, we can conclude that the creation of artificial intelligence would be possible only for someone who possesses metaphysical knowledge (including knowledge of the future) and knows how to resolve any contradiction, though even this is disputed by the proponents of constantly occurring existence.