How to write artificial intelligence. Contradictions and problems of its creation. Why artificial intelligence defeats humans

Artificial intelligence created a neural network December 15th, 2017

We have lived to see the day when artificial intelligence creates its own neural networks. Many people think the two are one and the same, but in fact it is not so simple, and now we will try to figure out what these things are and who can create whom.


Engineers from the Google Brain division demonstrated AutoML this spring: an artificial intelligence able to produce its own unique AI without human intervention. As it turned out quite recently, AutoML has now created NASNet, a computer vision system, for the first time. This technology seriously surpasses all analogues created earlier by people. An AI-built system like this could be a great help in the development of, say, autonomous cars. It is also applicable in robotics, where it could take robots to a completely new level.

The development of AutoML is based on a reinforcement learning system. At its core is a controller neural network that independently develops completely new neural networks designed for specific tasks. In the case described here, AutoML aims to produce a system that recognizes objects in a video stream in real time as accurately as possible.

The artificial intelligence itself was able to train the new neural network, tracking errors and correcting its work. The training process was repeated thousands of times until the system was fit for use. Curiously, it was able to outperform all similar neural networks currently available that were designed and trained by humans.

At the same time, AutoML evaluates the performance of NASNet and uses this information to improve the child network; this process is repeated thousands of times. When engineers tested NASNet on ImageNet and COCO image sets, it outperformed all existing computer vision systems.

Google officially stated that NASNet recognizes images with an accuracy of 82.7%. The result is 1.2% higher than the previous record, which was set in the early autumn of this year by researchers from Momenta and Oxford experts. NASNet is also about 4% more efficient than its peers, with a mean average precision of 43.1%.

There is also a simplified version of NASNet, adapted for mobile platforms, which surpasses its analogues by a little more than three percent. In the near future, this system could be used in the production of autonomous cars, for which computer vision is essential. Meanwhile, AutoML continues to produce new child neural networks, striving to conquer even greater heights.

This, of course, raises ethical questions around concerns about AI: what if AutoML builds systems at such a rate that society simply cannot keep up? However, many large companies are trying to take AI safety concerns into account. For example, Amazon, Facebook, Apple and some other corporations are members of the Partnership on AI to Benefit People and Society. The Institute of Electrical and Electronics Engineers (IEEE) has also proposed ethical standards for AI, and DeepMind, for example, has announced the creation of a group that will deal with moral and ethical issues related to applications of artificial intelligence.

What is artificial intelligence?

The author of the term "artificial intelligence" is John McCarthy, the inventor of the Lisp language, the founder of functional programming and the winner of the Turing Award for his great contribution to the field of artificial intelligence research.
Artificial intelligence is a way to make a computer, a computer-controlled robot, or a program capable of thinking intelligently, in the same way a human does.

Research in the field of AI is carried out by studying human mental abilities, and the results of this study are then used as the basis for developing intelligent programs and systems.

What is a neural network?

The idea of a neural network is to assemble a complex structure from very simple elements. A single small part of the brain can hardly be considered intelligent, yet people usually do surprisingly well on an IQ test. Until recently, though, the idea of creating a mind "out of nothing" was usually ridiculed: the joke about a thousand monkeys with typewriters is already a hundred years old, and if desired, criticism of neural networks can be found even in Cicero, who sarcastically suggested tossing tokens with letters into the air for as long as you like, so that sooner or later a meaningful text would turn out. However, in the 21st century it turned out that the classics were being sarcastic in vain: it is precisely an army of monkeys with tokens that, with due perseverance, can take over the world.
In fact, a neural network can even be assembled from matchboxes: it is just a set of simple rules by which information is processed. An "artificial neuron", or perceptron, is not some special device, but just a few arithmetic operations.

The perceptron could not work more simply: it receives several input numbers, multiplies each by the "weight" of that input (more on this a little later), adds the results up and, depending on the total, outputs 1 or -1. For example, we photograph an open field and show our neuron some point in this picture - that is, we send it random coordinates as two signals. And then we ask: "Dear neuron, is this sky or earth?" - "Minus one," the dummy replies, serenely looking at a cumulus cloud. "Clearly, it is earth."

"Poking a finger at the sky" is the perceptron's main occupation. No accuracy can be expected from it: you might just as well flip a coin. The magic begins at the next stage, which is called machine learning. We know the correct answer, which means we can write it into our program. So for each incorrect guess the perceptron literally receives a fine, and for a correct guess a bonus: the "weight" of the incoming signals is increased or decreased. After that, the program is run with the new formula. Sooner or later the neuron will inevitably "understand" that earth is below in the photo and sky is above - that is, it will simply begin to ignore the signal from the channel through which the x-coordinates are transmitted to it. If such an experienced robot is shown another photo, it may not find the horizon line, but it certainly will not confuse the top with the bottom.
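
As a minimal sketch of the idea above (not the exact setup of any system mentioned in this article), here is a single perceptron in Python learning to tell "sky" from "earth" points. The data-generation rule (y > 0.5 means sky) and all constants are assumptions made only to keep the example self-contained.

```python
import random

# A single "neuron": weighted sum of inputs, output +1 or -1.
def perceptron(weights, bias, inputs):
    s = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= 0 else -1

# Toy data: a point (x, y) in a photo; assume y > 0.5 means "sky" (+1), else "earth" (-1).
def make_example():
    x, y = random.random(), random.random()
    return (x, y), (1 if y > 0.5 else -1)

weights, bias = [0.0, 0.0], 0.0
lr = 0.1  # size of the "fine"/"bonus" applied to the weights

for _ in range(10000):
    inputs, target = make_example()
    guess = perceptron(weights, bias, inputs)
    if guess != target:                     # wrong guess: adjust the "weight" of each input
        for i in range(len(weights)):
            weights[i] += lr * target * inputs[i]
        bias += lr * target

# The weight on the x-coordinate tends to remain small: the neuron has
# "learned" to ignore the horizontal position, just as described above.
print("weights:", weights, "bias:", bias)
```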

In real work, the formulas are a little more complicated, but the principle remains the same. The perceptron can only perform one task: take numbers and sort them into two piles. The most interesting thing begins when there are several such elements, because the incoming numbers can be signals from other "bricks"! Say one neuron tries to distinguish blue pixels from green ones, the second keeps fiddling with the coordinates, and the third judges which of these two results is closer to the truth. If you set several neurons on blue pixels at once and sum up their results, you get a whole layer in which the "best students" receive additional bonuses. Thus, a sufficiently branched network can shovel through a whole mountain of data and take all of its errors into account.

A neural network can also be made from matchboxes - then you will have a party trick in your arsenal with which to entertain guests. The MirF editors have already tried it, and humbly acknowledge the superiority of artificial intelligence. Let's teach mindless matter to play the game of 11 matches. The rules are simple: there are 11 matches on the table, and on each move you can take either one or two. The player who takes the last one wins. How do you play this against the "computer"?

Very simple.

We take 10 boxes or cups. On each write a number from 2 to 11.

We put two pebbles in each box - black and white. You can use any items - as long as they are different from each other. That's it - we have a network of ten neurons!

The neural network always moves first. To start, look at how many matches are left and take the box with that number. On the first move it will be box number 11. Take any pebble from that box. You can close your eyes or flip a coin; the main thing is to act at random.
If the pebble is white, the neural network decides to take two matches. If it is black, one. Put the pebble next to the box so you do not forget which "neuron" made the decision. After that, the human makes a move - and so on until the matches run out.

Now comes the fun part: learning. If the network won the game, it must be rewarded: add one extra pebble of the same color that was played during the game to each "neuron" that took part in that game. If the network lost, take the last box used and remove the unsuccessfully played pebble from it. It may turn out that the box is already empty; in that case the previous "neuron" in the chain is considered the last one. During the next game, upon hitting an empty box, the neural network automatically gives up.

That's all! Play a few games like this. At first you will not notice anything suspicious, but after each win the network will make more and more successful moves - and after about a dozen games you will realize that you have created a monster you cannot beat.
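
For readers who would rather simulate than raid the kitchen for matchboxes, here is a hedged Python sketch of the same pebble-and-box scheme. The opponent plays randomly, and the "empty box means the previous neuron was last" rule is simplified to just skipping the punishment - both are assumptions made to keep the example short.

```python
import random

# One "neuron" per possible match count (2..11); each holds pebbles for "take 1" / "take 2".
boxes = {n: {1: 1, 2: 1} for n in range(2, 12)}

def network_move(matches, used):
    if matches == 1:
        return 1                      # forced move, no box involved
    box = boxes[matches]
    total = box[1] + box[2]
    if total == 0:
        return None                   # empty box: the network gives up
    move = 1 if random.random() < box[1] / total else 2
    used.append((matches, move))
    return move

def play_one_game():
    matches, used = 11, []
    while True:
        move = network_move(matches, used)
        if move is None:
            return False, used        # resigned
        matches -= move
        if matches == 0:
            return True, used         # network took the last match and won
        opponent = random.choice([1, 2]) if matches > 1 else 1
        matches -= opponent
        if matches == 0:
            return False, used        # opponent took the last match

def learn(won, used):
    if won:
        for matches, move in used:
            boxes[matches][move] += 1     # reward every pebble that was played
    elif used:
        matches, move = used[-1]
        if boxes[matches][move] > 0:
            boxes[matches][move] -= 1     # punish only the last decision

for _ in range(2000):
    won, used = play_one_game()
    learn(won, used)

print({n: boxes[n] for n in sorted(boxes)})   # pebble counts after training
```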

In a series of articles, we will talk about new approaches in AI, personality modeling and Big Data processing that are not available to most AI specialists and the public. The value of this information is that all of it has been tested in practice and most of the theoretical developments have been implemented in applied projects.

Many of you have heard about modern technologies that are associated today with the concept of artificial intelligence, namely: expert systems, neural networks, linguistic algorithms, hybrid systems, cognitive technologies, simulation (chat bots), etc.

Yes, many companies use the above technologies to solve the problems of their clients in information processing. Some of these companies write that they create or have created artificial intelligence solutions. But is it intelligence?

The first thing we will do is define what intelligence is.

Imagine that a computer with intelligence exists, and you have the option to communicate with it by voice or via text messages.
Questions:
  • Is it necessary to build language features (semantics, grammar, morphology) into the computer's intelligence program, or could it learn languages on its own through interaction with a person?
  • If you were given the task of teaching a computer a language, what would you do?
  • If only you took part in the training, whom would it come to resemble?
And now, answer these questions again, with the only difference that you would have to teach:
  • A purebred parrot, theoretically capable of communication.
  • A newborn baby.
We have just done intellectual work, and I hope that many of you have received new knowledge. And that's why:
  • First, I asked you to imagine "what would happen if...". You acted in a changed environment. Perhaps you lacked information and knowledge, and it was difficult for you.
  • Secondly, you turned out to be capable of learning and cognition: you found a familiar analogy yourself or met it in the text, or perhaps you used the Internet or asked a friend for advice.
There are many approaches to defining intelligence. We will define its main features ...

First of all, intelligence is the ability to learn and to imagine.

In order to create an algorithm that simulates intelligence, the first thing to do is to give it the ability to learn; no knowledge needs to be built into it in advance.

Let's go back to our child example to describe the learning process in more detail.
What principles are at work when a child learns to understand and speak a language?

  1. The more often he hears a word in different contexts, the faster he will remember it. The first word he will say is likely to be “mom”.
    "Mom loves you"
    "Mommy will wash your hands"
    "Mom kisses you"
    "Where's mom?"
    Learning happens thanks to data redundancy.
  2. The more information channels are involved, the more effective the training:
    the child hears: "Mom loves you."
    the child sees the smile of the mother.
    the child feels the warmth emanating from the mother.
    the child feels the taste and smell of mother's milk.
    the child says "mom".
  3. The child will not be able to reproduce the word right away. He will try and try again: "M", "Ma", "Mom", "M" ... "Mom". Learning takes place in action; each new attempt is corrected until the result is achieved. This is the trial-and-error method. It is very important to receive feedback from reality.
  4. "Do not bother raising your children - they will turn out like you anyway." The child strives to be like the people around him. He imitates them and learns from them. This is one of the mechanisms of personality modeling, which we will talk about in more detail in future articles.

What is the role of the imagination?

Imagine that you are driving a car on an unfamiliar highway. You pass a speed limit sign of 80 km/h. You drive on and see another speed limit sign, but it is splattered with mud and almost indecipherable. You are moving at 95 km/h. What will you do? While you were making a decision, a police officer looked out from behind the bushes, and you saw a radiant smile on his face. In your head, the "image of the sign" was instantly completed, and you understood why the policeman was standing there and that you urgently needed to brake. You slow down to 55 km/h, the smile on the policeman's face instantly disappears, and you drive on.

Another interesting example of imagination at work comes from the animal world: observations of magpies. A magpie buried food in a wasteland in front of other magpies. All the magpies flew away, but our magpie later returned to the wasteland and re-hid the food. What happened? She imagined "what would happen if" another magpie flew in that had seen where she hid the food. She modeled the situation and found a solution to avoid it.

Imagination is a simulation of a situation on arbitrary conditions.

As you have already seen, intelligence is not a knowledge base, it is not a set of programmed reactions or following predetermined rules.

Intelligence is the ability to learn, cognize and adapt to changing conditions in the process of solving difficulties.

Don't you think that in defining intelligence we have lost sight of some important components, or forgotten to tell you something?

Yes, we lost sight of perception, and forgot to talk about memory.

Imagine that you look through the peephole and see part of the letter:

What is this letter?

Maybe "K"?

Of course not, it's the Japanese character for "eternity".

You have just been given a task (a problem). Most likely, you found a similar image - the letter "K" - in your head and calmed down.

Your intellect perceives everything as images and looks for a similar image in memory; if there is none, a binding (anchor) to existing images is formed, and thanks to this you memorize new information and gain skills or experience.

An image is a subjective vision of the real world, perceived with the help of the senses (channels of information).

Perception is subjective, because it depends on the sequence of learning, the sequence of appearance of images in a person's life and their influence.

Perception begins with light/dark pattern recognition. Open your eyes - light, close - dark. Further, a person learns to recognize more and more complex images - “mother”, “dad”, a ball, a table, a dog. We receive reference data, and all subsequent images are an add-on to the previous ones.

From this point of view, learning is the process of building new relationships between perceived images and images that are already in memory.

Memory serves to store images and their relationships.

And imagination is the ability to complete an unfinished image.

To summarize, here is another experiment from the animal world:

A chimpanzee was put in a cage, and inside the cage a bunch of bananas was hung quite high off the floor. At first the chimpanzee jumped, but quickly got tired, seemed to lose interest in the bananas and sat down, barely paying attention to them. But after a while the monkey picked up a stick left in the cage and knocked the bananas down. On another occasion, a chimpanzee managed to join two sticks together to get the bananas, since each stick alone was not long enough to reach them. The animal also coped with a more difficult task, unexpectedly placing a box under the bananas and using it as a step.

The chimpanzee was shown a familiar image - a "bunch of bananas". But the image turned out to be incomplete: the bananas could not simply be taken and eaten. Since this was the only available source of food, the unfinished image built up internal tension and demanded completion.

The means for solving the problem (completing the image) were always available, but arriving at a solution required transforming the existing images (learning with the help of imagination). The chimpanzee needed to imagine (mentally run through the possible options) - "what will happen if I take a stick", "what will happen if ..." - check the most likely assumptions in practice, try, get feedback, imagine again, try, get feedback, and so on until the image is completed (that is, until it learns).

If recognizing the image of the character for "eternity" were a matter of life and death for you, you would definitely find a way to do it.

From a more popular language, let's move on to a technical one and formulate the basic concepts that we will use further:

  • The intersection of redundant information from different information channels creates an image.
  • Learning is the transformation of information flows in the information field.
  • Information field (memory) - storage of images and their relationships.
  • Imagination is...
    - "Dear reader, complete the image of the imagination yourself, using redundant information from your life experience and this article."
  • Intelligence is the ability to learn and imagine.

At the beginning of the article, we listed the technologies associated today with artificial intelligence, now you can independently assess how much they correspond to the concept of intelligence.

In the next article, we will consider such a task as an intellectual search for information on the Internet. Let's define the criteria of intelligence, develop practical approaches and "feel" a real application that implements the principles described in this article.

The article does not claim to be true, it is part of our developments and research. Write comments, supplement the material with your examples or thoughts. Learn and imagine...

Not everyone knows what is hidden behind the phrase "artificial intelligence" or AI (Artificial Intelligence). Most people probably think of AI as a computer that has been programmed to "think" for itself, make intelligent decisions, and respond to stimuli. This idea is not entirely correct. No computer and no machine can really think - because it requires the presence of consciousness, which the "soulless machine" does not have. A computer can only do what a person tells it to do.

Briefly about AI programming

AI programming is not about teaching a computer how to think. Rather, the computer is programmed to learn and to solve specific problems on its own, based on its experience. But even here we are not talking about thinking of its own, only about imitating it. This also applies to the decisions the AI makes: it can weigh options and then make choices, but its choice will always be based on the parameters that were programmed beforehand.

Thus, artificial intelligence can only do what was predetermined for a computer, but better, more accurately and faster than a person. By the way, if you want to learn how to program, take a look at our tips for beginner programmers.

Use of artificial intelligence

Artificial intelligence is already being used in many areas, such as complex computer games and search engines. When programming AI, a complex of disciplines plays an important role, and not just computer science or mathematics. Philosophy, psychology, neurology and linguistics are of great importance.

Artificial intelligence is commonly divided into neural and symbolic (and also into strong and weak). The former attempts to mimic the structures and functions of the human brain. The latter focuses on the problem at hand and the result.

In everyday life, artificial intelligence is programmed and used in robotics, for example. It serves to control production processes or simply performs household tasks. It is also used for visual recognition: the most popular examples are face and fingerprint recognition.

Another step in the creation of artificial intelligence is knowledge-based systems. Here, knowledge about the problem domain is entered into the program, which allows the artificial intelligence to answer questions logically and independently. However, these "independent answers" are based only on the knowledge the artificial intelligence was originally endowed with.

This part describes, no more and no less, the algorithm that underlies intellectual activity. In parallel, we will try to answer the question of how similar phenomena could arise in natural intelligence. Of course, we will not reveal all the secrets of intelligence and we will not build an artificial brain, but we will work out the principles and the main direction in which to dig further. We will also learn more about the human intellect. And there will be practical sketches of algorithms that can be programmed on a computer right now.

But first, briefly, what we arrived at in the previous parts. I myself have already forgotten what was there, so I will have to remind myself, otherwise I will not even be able to continue. :) Those who remember can skip this section.

What happened in the past

Penrose, in his wonderful books, believes that the brain is capable of making absolutely true judgments, and argues that mental processes rest on physical processes that can perform infinite calculations in a finite time. Moreover, these processes compute not just anything, but the absolute and irrefutable truth in the truest sense of the word. And the brain can "tap into" these processes in order to think; that is why, in his view, such processes are needed for the brain to work. And although such processes are unknown to today's physics, Penrose believes that a deeper level of the universe is a different reality based on such processes.

In many ways Penrose is right about this other reality, and later we will share equally interesting and similar ideas about what lies at the foundations of the universe. But all the same, Penrose hurried, jumped a few steps ahead, so to speak. The vast majority (if not all) of intellectual activity can be explained in more prosaic and mundane terms.

The undoubted merit of Penrose is that he convincingly explained why intellectual activity can in no way be based on formal logic (or, in other words, on strict algorithms). More precisely, Penrose showed that absolutely true logic (which is also intellectual activity in Penrose's understanding) is impossible on known physical processes. But we understood it in our own way, namely, that intellectual activity does not need absolutely true logic. Or, in other words, the human intellect is plausible: it produces good approximations to the truth, but a possibility of error always remains. And this radically changes the matter - it completely changes the approach to explaining natural intelligence and to building artificial intelligence. Such intelligence can be modeled on a Turing machine and programmed on a conventional computer; however, it is better to have an architecture of greater power with inherent parallelism, for example quantum or optical.

Now let us recall what all the fuss about the truth or non-truth of logic is about. Mathematical and computer calculations, human reflection, logical constructions and inferences are associated with the concept of an algorithm or a formal system (in fact, these are the same thing). The execution of an algorithm (which is also the application of the rules of a formal system) is a model of all kinds of calculations, reflections and other physical processes (or at least a fairly good approximation). An algorithm is a set of instructions that some abstract computer (a Turing machine) can execute sequentially, step by step.

There is the concept of a strict algorithm (equivalently, a complete and consistent formal system). On the same set of input data, a strict algorithm will, in a finite number of steps, give the same answer. As applied to formal systems and logical reasoning, this means that in a finite time one can find a true (consistent and unambiguous) answer for the initial conditions. Such calculations are also called deterministic.

But there are also non-deterministic (non-strict) algorithms, in which these conditions are not met (they correspond to incomplete or contradictory formal systems). For an algorithm, failure of the finiteness condition means that it is not known whether the algorithm will ever finish its calculation, and it is not clear how to find this out in advance. A non-deterministic algorithm may finish its computation, or it may wander forever, and what exactly it will do is a mystery one can guess at forever. For formal systems this means it is unknown whether a proof of the truth or falsity of the original statement will ever be finished or will continue forever. Inconsistency means that within a formal system one can pick different chains of rules that, for the same original statement, give both a true and a false answer. For an algorithm, this means that different results can be obtained on the same data.
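
A classic, concrete illustration of the finiteness problem described above is the Collatz procedure: the loop below is trivially simple, yet nobody has proved that it terminates for every starting number. The function itself is standard; its use here as an example is our own addition.

```python
def collatz_steps(n):
    """Repeatedly apply n -> n/2 (if even) or n -> 3n+1 (if odd) until n == 1.
    Whether this loop finishes for *every* positive n is an open question,
    so as a formal algorithm its finiteness is not guaranteed."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(collatz_steps(27))  # happens to finish (111 steps), but no general proof exists
```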

Many, including Penrose, say that intellectual activity is based on strict formal logic. But there is a global ambush here. Gödel's long-established theorem says that a formal system cannot be both complete and consistent. Completeness means that the formal system knows everything about its field of knowledge, including being able to judge the truth of itself. If someone from the outside creates a formal system, then it can work, producing correct results and not caring at all about whether this someone created it correctly. If the formal system tries to verify that it is itself correct, it will fail, because such a system is consistent but not complete. If the system can judge its own correctness (is complete), then it will have internal contradictions, and the results of its activity are not necessarily correct. Why? Among other things, because the question of self-testing (self-knowledge, self-reflection) belongs to the category of eternal calculations.

What follows from this? It turns out (according to Penrose) that the human intellect is a complete and consistent system, because it can generate true statements and at the same time monitor its own correctness. But according to Gödel's theorem this is impossible. So one has to invoke unknown physical processes for the work of the intellect, processes which in a brief moment can look into eternity, find the answer and return it to the brain. But, as we have already noted, the intellect does not have to be complete and consistent, although it can very plausibly pretend to be true and infallible.

The second ambush is that physics does not know the entities that formal logic operates on. Namely, formal reasoning is often based on the concepts of natural numbers and the concepts of truth and falsity. Natural numbers are those for which 1+1 = 2, 2+1 = 3, and so on. True = 1, false = 0, the negation of true is false. All units are absolutely equal to each other, permuting the terms of a sum does not change it, and so on. But the trouble is that in our world there are no particles, things or processes that could be uniquely matched to natural numbers in such a way that the rules of arithmetic would hold for these entities over any range. In some ranges the arithmetic is approximately correct, but outside them global failures begin. Therefore formal logic, roughly speaking, operates on entities it does not really understand, entities whose essence is rather vague. Moreover, arithmetic itself is not a complete and consistent system - a fun fact. In general, it seems that such concepts as absolute truth and natural numbers cannot, in principle, exist. How and why will be covered in the following parts.

What follows from this? All processes, all calculations that take place even in the brain, even in computers, are inherently either incomplete or contradictory, although at the same time they give a good plausible approximation to complete and consistent calculations.

Why does Penrose dislike contradictory formal systems, why does he deny them the right to be the basis of intellectual activity? As we remember, in an inconsistent formal system it is possible to derive both a true and a false statement for the same data, up to the point that 1 = 2 and so on. On this basis Penrose hints that inconsistent systems will always(!) give conflicting results. From this Penrose derives a very narrow interpretation of chaotic processes: he believes they are just random processes that, on average, can be modeled by a strict formal system.

In fact, inconsistent systems can in most cases converge to the true result, it is not at all necessary that internal contradictions will immediately dominate and destroy the system. There may be systems in which contradictions are minimized. And even when run on an abstract computer, they will remain non-deterministic, they will be incomplete and inconsistent, but in most cases they will produce a plausible result. Why did Penrose decide that contradictory systems will always be destroyed by their contradictions? Penrose is silent on this...

Furthermore, as we saw in the previous parts, the processes of our world, both in computers and in brains, are all inherently vague and contradictory. But in most cases they produce a correct result. This is because these processes consist either of many repetitions of similar calculations or of a large number of similar elements, so that the combination of these repetitions or elements in most cases produces a stable and correct result. This, of course, leaves a very small chance that a small internal contradiction will grow and destroy the entire system. But in most cases the system seems to harmonize: the elements, acting on each other, minimize internal contradictions. Not all processes in our world are highly harmonized, but such processes exist, and what happens in computers and in the brain belongs to them. Where does such harmonization in our world come from? That will be covered in the following parts. There remains a tiny chance that in our worldview, in our intellectual activity, we are somehow globally mistaken, that there is a small wormhole in our judgments which could completely turn our whole idea of the universe upside down. But more on that in the following sections.

Initially, human thinking is based on just such processes. There are no long logical chains, no clear rules. Instead, there are short situation-response chains, without long processing cycles. The elements of these chains have a large number of inputs, and inside each element the input data breaks up into many parallel, mutually duplicating, fuzzy paths that give a clear solution at the output. We have called such elements short and wide plausible rules. Such rules do not deal with logical inference; they already "remember" a ready-made solution to the situations known to them. The mechanism for learning such rules is also far from clear logical inference and was described in the previous parts.

Such processes are good for interacting with the real world, but formal logic is hard for them. Nevertheless, the human intellect can work in the mode of formal logic and can emulate computer calculations - but at the cost of much "heavier" processes. In order to run the calculations of a simple logical circuit or a simple program in the brain, myriads of short fuzzy rules are involved, which in combination give a result similar to the work of strict logic. And since these rules are not intended for formal logic at all, the number of them involved in emulating formal logic is much larger than the number involved in interacting with the real world. That is why most animals are not capable of logical reasoning: it requires a sophisticated human brain. And yet the everyday tasks that animals solve in passing are still beyond the computer.

But such "heavy" processes have an advantage. It consists in the fact that the brain can produce new logical constructions and computer programs with a high degree of plausibility, whereas a simple but efficient algorithm can only blindly do its job. The complexity of the derived constructions is many orders of magnitude lower than the complexity of the processes initially involved in the brain. It is this difference in complexity that resolves the paradox of contradictory intellectual processes creating true logical constructions. If this difference in complexity is not taken into account, there is no way to understand where these true constructions come from.

Tasks that require complex logical constructions a person solves literally by trial and poke. Namely: come up with some simplest variant, run its calculation in the brain, see the wrong moments, come up with the next variant (not necessarily a correct one), emulate the calculation again, and so on. With good training, such constructions pass into the category of fast automatic actions that do not require the participation of consciousness (their complexity is nevertheless still enormous), typical situations are remembered, and it seems that the brain works like an ordinary computer (according to formal logic), although this is not at all the case.

It also happens that the brain is "wound up" for a long time, "accelerating" toward some task: the initial data is laid down, there are unsuccessful attempts, vague premonitions and a nagging sense that the truth is somewhere nearby. And then - bang - a flash of insight, everything falls into place, and a new truth is born. It may seem that this truth was born instantly and came from higher realms. But in fact the effect is the same: the flash of insight was preceded by long, hard work that involved, changed and created a myriad of short plausible rules, and tried to somehow combine and harmonize them, for the most part unsuccessfully. And then the moment comes when all these rules combine harmoniously with each other, merge into a single coherent process, and together produce a new truth.

Artificial intelligence following such principles may well be programmed on conventional computers. Naturally, such a program will from the start be oriented toward non-determinism and the presence of internal contradictions, whereas existing computer programs, although in essence also non-deterministic and inconsistent, are written with the aim of having as little indeterminacy and contradiction in them as possible. Of course, it is better for artificial intelligence to use a more efficient architecture that allows a large number of parallel and interacting processes - for example, quantum or optical. Single-processor electronic computers can also be programmed for intelligence, but they will probably lack the power.

"Heavy" processes and harmonization will be described in detail later; for now, let's start designing artificial intelligence.

Building blocks of intellect

Let's start by briefly recalling what we have already come up with in this area and what is missing. All of this is described in detail in the previous parts; we recall it here in order to understand why things are this way and not otherwise. After all, the intelligence algorithm itself is not so complicated; the main thing in it is the principles - you need to understand in which direction to move and what results to expect.

Programming languages. There are procedural and predicate languages. In procedural languages, a program is written as a strict sequence of instructions, between which there can be conditional jumps.

Predicate languages have a set of independent rules, each with its own scope. An executor in a predicate language checks all rules for compliance with the current situation and applies the appropriate rules, which change the situation (the internal state), and in this way it can build long logical chains out of the rules. Naturally, this is harder for the executor than executing a procedural program.

Procedural languages are good where the algorithm is clearly known and requires a fast and efficient implementation. Predicate languages are good where you need to store human knowledge and logical rules, and then draw conclusions from that knowledge (for example, evaluate various input situations). It is convenient to add new knowledge to them without rewriting the entire program. There are even modifications in which, after new knowledge is introduced, the entire knowledge base is brought into a consistent state. Predicate languages (like Prolog) were for some time considered the future of artificial intelligence.

But the fact is that procedural and predicate languages can be expressed in terms of each other and have the same problems inherent to algorithms (formal systems, see above and below).

First, we face the halting problem. The algorithm can wander forever in search of a solution, even though the solution may lie nearby, in a neighboring branch. Such an algorithm will correspond to a complete and consistent formal system, but it is of no use to us (for now we assume that we cannot perform an eternal calculation in a finite time). If we introduce some kind of pruning of "long" branches, the algorithm will become more practical, but it will lose its completeness and consistency: it will become not true, but plausible. And we are not talking about the probability of an incorrect decision increasing slightly - the algorithm will become capable of producing radically wrong decisions.

Secondly, the predicate rules that serve as the logical units are too "narrow". In natural intelligence, the logical units have orders of magnitude more input conditions, and these inputs are processed according to fuzzy criteria. Admittedly, with such a representation knowledge becomes "smeared" and loses its clarity and formalism.

Existing fuzzy logic (there is such a branch of science) is not suitable for use in predicate languages, and here is why. Any fuzziness, when it meets another fuzziness during logical inference, can generate many alternatives, different logical chains. Moreover, these options can easily grow like an avalanche. Existing fuzzy logic, as far as I know, deals with neither the parallelization of chains nor their merging back together. All fuzzy logic does is operate with the same logical expressions, but instead of logical zero and one it uses a real range from zero to one, and arithmetic operations to combine numbers from this range.
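
To make the last sentence concrete, here is a minimal sketch of what "classical" fuzzy logic does with truth values from the [0, 1] range. The min/max/1-x operators are the standard Zadeh choice; the example values are made up.

```python
# Truth values are real numbers in [0, 1] instead of just 0 or 1.
def fuzzy_and(a, b):
    return min(a, b)

def fuzzy_or(a, b):
    return max(a, b)

def fuzzy_not(a):
    return 1.0 - a

# "The road is wet" with confidence 0.8, "it is raining" with confidence 0.3:
wet, raining = 0.8, 0.3
print(fuzzy_and(wet, raining))            # 0.3
print(fuzzy_or(wet, fuzzy_not(raining)))  # 0.8
```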

There are variants of "complex" logic in which, when an implication is reversed, an indeterminacy arises, expressed by something like an imaginary unit, which participates in further calculations, with the possibility of parallelizing and merging chains. But this topic requires further exploration.

Thirdly, we do not yet have an algorithm that could train (create) other algorithms without a human, given only a set of training situations (a representative set of pairs of correct input and output values).

Pattern recognition systems. They are well suited to the role of a logical unit for our artificial intelligence. They can classify the input situation well and output a decision - true, only if long-term processing is not needed, since such systems have no internal memory (state) and no transformation of that state; they are more of a "stimulus-response" reflex. But recognizers cope with classification perfectly. They can even process complex images (for example, recognize a person from a face image). Methods for training pattern recognition systems are efficient and well known. Trained on a set of known examples, a recognizer can capture hidden patterns and qualitatively generalize this experience to unknown examples.

Learning principles. When the desired (reference) result and the actual result of the intelligent system's operation are known, it is possible to calculate the system's error and correct the system so that it works in the right direction.

Correction (training) methods are exact (also called local) and global. Local methods can propagate the calculated error throughout the system, and are therefore fast and efficient. Global methods cannot do this: they randomly change the parameters of the entire system, look at how successfully the change affected the system, and on this basis decide whether to keep the change.

The gradient descent method is a local one: the direction of error reduction can be calculated and propagated backwards through the entire system. Such a method, although "only" plausible, gives good results in practice, for example for training multilayer perceptrons (which are often simply called neural networks). But it is not always applicable (like other local methods), since the structure of the error and the way to correct it may not be known.
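
As a toy illustration of a local method, here is a one-parameter gradient descent sketch; the quadratic error function and the learning rate are invented purely for the example.

```python
# Minimise error(w) = (w - 3)^2 by following the (known) gradient.
def error(w):
    return (w - 3.0) ** 2

def gradient(w):
    return 2.0 * (w - 3.0)   # direction of steepest increase of the error

w, lr = 0.0, 0.1
for step in range(50):
    w -= lr * gradient(w)    # move against the gradient
print(w, error(w))           # w converges towards 3, the error towards 0
```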

We also have global learning methods - the genetic algorithm and simulated annealing. They are omnivorous but very greedy for computing resources. They can work when little is known about how to correct the error. The genetic algorithm is the more efficient of the two, especially if you know something about the structure of the problem being solved.

The principle of scale. It means that by repeating similar processes many times, or by combining a large number of similar elements, a highly stable (or highly plausible) result can be achieved. "Similar" does not mean similar on average: the elements can contradict and compete with each other, they can be unstable, yet in the end they still combine (harmonize) into a solution with a high degree of plausibility. For example, in the logical circuits of computers all elementary particles are unstable, but their half-life is either very long or the number of particles in a logical element is very large, so the decay of an individual particle practically never causes a failure of the logic. Another example: in artificial neural networks an individual neural connection has little effect on decisions, and the connections themselves can be contradictory, but in the end the network produces mostly correct decisions.
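
A minimal numerical sketch of the scale principle: each individual "voter" below is wrong 40% of the time, yet the majority vote over many such unreliable elements is almost always right. The 60% figure and the committee size are arbitrary assumptions.

```python
import random

def noisy_element(truth):
    # An unreliable element: returns the correct answer only 60% of the time.
    return truth if random.random() < 0.6 else not truth

def committee(truth, n=101):
    votes = sum(noisy_element(truth) for _ in range(n))
    return votes > n / 2          # majority vote of n unreliable elements

trials = 10000
correct = sum(committee(True) for _ in range(trials))
print(correct / trials)           # close to 1.0, far better than any single element
```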

Let's summarize. We have predicate languages that are suitable for complex reasoning and the processing of internal state. There are pattern recognition systems that can be used as logical units for predicate languages. There are omnivorous learning methods with which we hope to automatically create (train) new algorithms. And there is the principle of scale which, at the price of losing completeness and consistency, will let us keep the decisions of our artificial intelligence highly plausible.

Intelligence Algorithm

Let me briefly recall the essence of the genetic algorithm. There is such a method as random search: a random solution is generated, evaluated, then randomly modified; if the result is better, the new solution is remembered; then the cycle repeats. This method is used when it is not clear how to calculate the solution "scientifically", and it turns out to be very slow. What if we run a large number of different solutions in parallel? For those that are making progress (the quality of the solution is good, or improves over time, or compares well with the "neighbors"), we create copies and change these copies (randomly) little by little. Those solutions that look bad against the background of the rest, or whose quality does not improve over time, we subject to more and more random changes or simply delete, putting newly generated random solutions in their place. Naturally, bad solutions are propagated less often. There is one more operation (also applied randomly), when a piece is bitten off from each of two different solutions and these two pieces are glued together into a new solution. It is called crossover. The better a solution, the more likely it is to take part in crossover. As a result, one can suddenly obtain a solution with a better result than both of its parents. But the opposite can also happen: if the resulting solution is better, it will multiply further; if worse, such a solution is most likely removed. Such a search is most effective when we know the structure of the solution and apply the operations of random change (mutation) and crossover not by shredding the solution bit by bit, but taking this structure into account.
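
A compact, hedged sketch of the scheme just described: a population of candidate bit-string solutions with mutation, crossover and selection. The "fitness" function (count of ones) and all sizes are stand-ins chosen only so the example runs.

```python
import random

GENOME, POP, GENS = 40, 30, 60

def fitness(sol):
    return sum(sol)                      # toy objective: maximise the number of 1s

def mutate(sol, rate=0.02):
    return [b ^ 1 if random.random() < rate else b for b in sol]

def crossover(a, b):
    cut = random.randrange(1, GENOME)    # glue a piece of one parent to a piece of the other
    return a[:cut] + b[cut:]

population = [[random.randint(0, 1) for _ in range(GENOME)] for _ in range(POP)]

for _ in range(GENS):
    population.sort(key=fitness, reverse=True)
    survivors = population[:POP // 2]            # bad solutions are removed
    children = []
    while len(survivors) + len(children) < POP:
        a, b = random.sample(survivors, 2)       # better solutions breed more often
        children.append(mutate(crossover(a, b)))
    population = survivors + children

print(max(fitness(s) for s in population))       # approaches 40 over the generations
```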

Because the solutions are not simply carried out in parallel, but are constantly compared and exchanged with each other, such a search gives a fantastic jump in performance compared to plain random search and is able to grind through the most difficult tasks. It would seem that a typical random solution is nothing interesting on average, and its efficiency is extremely low. But as soon as a set of solutions begins to interact, an atypical result (a good solution) quickly appears and progresses. This, by the way, is a reply to Penrose, who advises studying chaotic processes by their average, typical cases, insisting that apart from typical cases they cannot generate anything - which, of course, is unfair. Such a search is an illustration of the principle of scale, one of the typical harmonizing processes.

This is what is called a genetic algorithm; it can find effective solutions in a variety of areas, even when it is not known how to search for the correct solution "scientifically", or when no "scientific" way exists at all, as in the case of automatically writing programs. The effectiveness of the genetic algorithm is evidenced first of all by the fact that life on Earth (and then the mind) arose precisely according to such principles. Why such a harmonizing process is possible at all is the subject of the following sections.

There is a direction in artificial intelligence called genetic programming. Here each solution is not a set of parameters but an entire program written in a procedural programming language, with all its loops, conditional jumps and internal state in variables. Accordingly, the result of the solution is the result of executing this program. A genetic algorithm is used to create the program: from a large number of randomly generated programs it breeds the one that solves the given problem best. In the article I saw, the task was to control the steering of a car; that is, the result of the solution is not a single response to an input, but a process extended in time. The genetic algorithm succeeded and created a program that correctly controlled the steering. The task is not all that complicated - something similar is done with neural networks (although there the internal state, and the rules for how the state interacts with the network, are still written by a human). But it is telling that a program with internal state, various loops and branches was created automatically.

Unfortunately, I did not follow the state of affairs in this area further and cannot tell anything more. Those who wish can search for the phrase "genetic programming". From here on we go beyond what has been studied and enter the realm of assumptions. It is quite possible that some of these assumptions are already known and I am reinventing the wheel. But it is still interesting. :)

Let's see what properties the programs obtained with the genetic algorithm have. Such programs may have infinite (or very long) loops, so the fitness evaluation has to discard programs that run for a very long time without returning a visible result. The move is, on the whole, correct, but unfortunately it throws out potentially interesting long logical chains (how else would they be taken into account?). Further, during crossover the branches of the program are shredded thoughtlessly, often generating meaningless code. For a simple task this is not such a problem, but for more complex tasks either a large number of unfit solutions will result, because the smallest change can completely ruin the program's operation, and in the end there will probably be little use; or the program will acquire a large number of redundant branches - "junk" - combined with each other in unimaginable ways within the working solution. In the course of evolution this "junk" learns to survive changes, so that a change does not break the program fatally. In any case, we have to say goodbye to the idea of "thin" logical chains like the clear programs a human writes; what comes out of automatic program writing will be far from such chains. Of course, data mining algorithms will appear that can compress this contradictory heap into a clear algorithm, but for further improvement in automatic mode this clear algorithm will have to be turned back into its "smeared" form (or the smearing will happen by itself in the course of further learning). And there is a suspicion that the algorithm extracted with the help of data mining will have a narrower "outlook" than its original, "smeared" version. A similar phenomenon was described in the previous parts about pattern recognition.

As we remember, predicate languages are more flexible to changes and better suited to recording human knowledge, because they consist not of the rigid framework of a program but of independent rules that fire automatically when a suitable situation (set of conditions) occurs. The genetic algorithm works more efficiently if its operations take the structure of the solution into account. A procedural notation causes the genetic algorithm to shred the program thoughtlessly, generating many inoperable variants. Therefore we write the program in predicate form and adjust the genetic algorithm so that it takes this structure into account. Namely, different solution programs will exchange not chunks of bits but whole independent rules. Random changes will work at the level of rules. Moreover, one program may contain a different number of rules, in any order, and these rules can be very similar to one another or completely different. You can multiply and cross not only the programs themselves but also the rules within one program. And all because during execution the rules themselves line up into the correct chain: the executor does not blindly follow the branches of the program (as in a procedural language) but selects rules according to the current situation (and each rule changes the situation).

But the most interesting thing would be to make the bank of rules common to all programs. A program in this case would be data about which rules from the common bank it prefers, and possibly information about the preferred sequence of applying them. In this case, performance criteria can be applied not only to programs but also to rules. After all, each rule contributes to several different programs, and one can calculate how many of these programs are successful and how many are not - and on this basis draw conclusions about the effectiveness of the rules, and accordingly evolve not only the programs but also the rules themselves (i.e., multiply, cross and randomly change the rules). The efficiency gain comes from the fact that similar rules are no longer duplicated across programs, and each program has access to a wider bank of rules. But most importantly, the rules are evaluated jointly, through cross-use in different programs, which (presumably) dramatically improves the quality of the evaluation and the evolution of the rules.
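
The shared rule bank can be sketched as a data structure: rules live in one global list, each program is just a set of indices into that list, and credit from a program's score flows back to the rules it used. The scoring and credit scheme below is our own placeholder for illustration, not a worked-out method from the article.

```python
import random

rule_bank = [{"score": 0.0, "uses": 0} for _ in range(100)]   # shared pool of rules

def make_program(n_rules=8):
    # A program is just a preference list of rule indices from the common bank.
    return random.sample(range(len(rule_bank)), n_rules)

def credit(program, program_score):
    # Every rule used by the program shares in its success or failure.
    for idx in program:
        rule_bank[idx]["score"] += program_score
        rule_bank[idx]["uses"] += 1

def rule_quality(idx):
    r = rule_bank[idx]
    return r["score"] / r["uses"] if r["uses"] else 0.0

programs = [make_program() for _ in range(20)]
for prog in programs:
    credit(prog, random.uniform(-1, 1))   # placeholder for a real evaluation of the program

best_rules = sorted(range(len(rule_bank)), key=rule_quality, reverse=True)[:5]
print(best_rules)  # rules that, across programs, tend to appear in successful ones
```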

So we have obtained the simplest version of artificial intelligence, quite applicable in various games (including computer games), in expert systems and in process control systems. It is also suitable for modeling black-box processes with internal memory, instead of Markov models (processes whose input and output can be observed but whose internal state and workings are obscure - a black box, in our terms).

Here a logical proposal may arise: let the genetic algorithm separate parts of the program into independent subroutines and take their structure into account when changing the program. For a procedural notation this can increase efficiency, but it still does not eliminate its inherent disadvantages, because a rigid sequence of instructions, conditional statements and loops is still needed, and a random change can break it. In a predicate notation there are no procedures as such at all. On the other hand, it is possible to split the global situation into a set of hierarchical sub-situations and into a sequence of situations, so that each sub-situation is handled only by its own set of rules. In the short term such a partition should seemingly increase the efficiency of the genetic algorithm. But the fact is that in real intellect the interaction of rules is more complex, and such a division is both characteristic of it and not characteristic at the same time. Therefore, by imposing such a breakdown of the situation, we may benefit in the short term, but later on it will get in the way. More about this will follow.

Artificial intelligence version 2.0

In predicate languages (such as Prolog) there is no sequence of steps in a program. There is only a set of rules, and the sequence in which these rules are executed is not initially set in any way.

It looks like this:
rule n: result if condition;
rule m: result if condition;
etc.

The condition can be quite complex, including both simple expressions and other rules, including mutually recursive application. The result of the execution of a rule is also a complex condition that can denote both the final situation and part of the condition for checking the applicability of other rules (and itself too). These results are clear and unambiguous.

There is no global situation during the execution of a predicate program. There is an initial condition, for which the interpreter searches for the first rule with a matching condition; the result of this rule is added to the initial condition. The search for a suitable rule for the new condition is then repeated. The result is a chain of inference that may lead to a rule indicating that the final result has been reached. If the interpreter exhausts all available chains, it starts to roll back, at each step looking for the next suitable rule and building new chains.

Again, note that the state (situation) is the initial condition plus the chain of applied rules. When rolling back, the chain of applied rules changes accordingly.

When linking rules into chains, the interpreter may well fall into an endless search, even though the solution may be nearby. Therefore, if the interpreter is "smart", it can either recognize places of potential looping, or run several decision chains in parallel, choosing the one that reaches the final situation faster. On the other hand, whoever writes the set of rules must take care to minimize the chance of looping.

Take, for example, the problem of missionaries and cannibals, in which a crowd of missionaries and cannibals must be ferried in a single boat to the other side of a river. The boat holds only two people. If at any point there are more cannibals than missionaries on a bank, the missionaries are eaten. When solving the problem in a predicate language, one writes down the admissible situations in which the missionaries are not eaten (including recursively) and the admissible movements of the boat (there must always be one or two people in the boat). The interpreter then builds a tree of feasible solutions by itself until it reaches a situation where the entire crowd is on the other side. The chain of rules in this case encodes the sequence of trips carrying missionaries and cannibals across the river.
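
A hedged sketch of how such a rule-driven search can be reproduced in ordinary Python: a breadth-first search over admissible states for the classic three-missionaries, three-cannibals version of the puzzle (the article does not fix the numbers, so three of each is our assumption).

```python
from collections import deque

# State: (missionaries on left, cannibals on left, boat on left?). Start: everyone on the left.
START, GOAL = (3, 3, True), (0, 0, False)

def safe(m, c):
    # Missionaries must not be outnumbered on either bank (unless absent from it).
    return (m == 0 or m >= c) and (3 - m == 0 or 3 - m >= 3 - c)

def moves(state):
    m, c, boat = state
    for dm, dc in [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]:   # one or two people in the boat
        nm, nc = (m - dm, c - dc) if boat else (m + dm, c + dc)
        if 0 <= nm <= 3 and 0 <= nc <= 3 and safe(nm, nc):
            yield (nm, nc, not boat)

def solve():
    queue, seen = deque([(START, [START])]), {START}
    while queue:
        state, path = queue.popleft()
        if state == GOAL:
            return path
        for nxt in moves(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))

for step in solve():
    print(step)   # the chain of admissible situations, analogous to the chain of applied rules
```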

Since the rules are clearly linked to each other in the process of finding a solution, in predicate languages, a hierarchy and sequence of their application is already set at the level of rules, something similar to grouping into procedures in procedural languages. But as soon as we make the chain between the rules less clear, this grouping is lost. And how exactly it will arise anew (or how to help it form) is a new question.

In predicate languages there are no loops, no conditional jumps, no hard-coded sequence of actions. The executor always "knows" what to do, because it selects the next step for the current situation. It selects only one step, because at the next step it will evaluate the changed situation and choose a new step, however unexpectedly the situation may have changed. This also helps the executor to extricate itself if part of its program has failed or made a wrong decision. Instead of falling catastrophically in the wrong direction, the executor reassesses the situation, and quite possibly the next step will improve it.

Of course, the processes involved are much more computationally hungry than the procedural writing of the program. And in its original form in predicate languages, not everything is as smooth as described in the previous paragraph.

The disadvantage of predicate languages is that the scope of each rule is very narrow, and the rules line up in overly long chains. In intelligence, on the contrary, short chains of inference prevail, in which the logical unit evaluates a very wide array of input conditions, and does so according to fuzzy and non-linear criteria.

Therefore, the next step in building artificial intelligence is to replace narrow clear rules with fuzzy and wide ones, and make the inference chain shorter.

First, let us introduce a global state for the program: an ordinary array of numbers. Part of this array is the input data, which is regularly updated from outside. Whether the program is allowed to change the input cells is not a matter of principle; the main thing is that they are regularly refreshed. Part of the array is the internal state of the program, and the rest is the output. Internal and output cells differ only in that the solution is read from the output cells. A given input or output cell is always used to write or read the same parameter: for example, input #1 is speed, input #2 is the fuel sensor, output #3 is the change in steering wheel position, output #4 is the change in speed. The numbering is assigned arbitrarily; in the process of learning the program itself must figure out where its inputs and outputs are.

As the basis for a rule, let us take, for example, a multilayer perceptron (often simply called a neural network). Note that we do not yet know a training algorithm for such a network when it sits inside the program. There will be many such networks; together they make up the program's set of rules. Each network receives the entire global state of the program as input (the number of inputs equals the number of state cells). Each network has only a few outputs, and each output also corresponds to one of the state cells. At each iteration, whatever appears at the output of each network is added to the global state (output values can be negative). All networks are polled simultaneously on the current state, and their combined contribution creates the new state.
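
A minimal Python sketch of this arrangement (all sizes and the two-layer form of the "rules" are assumptions made for illustration):

    import numpy as np

    STATE_SIZE = 16            # assumed size of the global state

    class Rule:
        """One 'rule': a tiny network that reads the whole state and
        writes small corrections into the few cells it is wired to."""
        def __init__(self, n_out, rng):
            self.cells = rng.choice(STATE_SIZE, size=n_out, replace=False)  # output wiring
            self.w1 = rng.normal(scale=0.1, size=(STATE_SIZE, 8))
            self.w2 = rng.normal(scale=0.1, size=(8, n_out))

        def deltas(self, state):
            hidden = np.tanh(state @ self.w1)
            return np.tanh(hidden @ self.w2)     # one small correction per wired cell

    def step(state, rules):
        # All rules are polled on the same current state; their outputs are summed in.
        new_state = state.copy()
        for rule in rules:
            new_state[rule.cells] += rule.deltas(state)
        return new_state

    rng = np.random.default_rng(0)
    rules = [Rule(n_out=2, rng=rng) for _ in range(10)]
    state = np.zeros(STATE_SIZE)
    state[:4] = [0.3, -0.1, 0.8, 0.0]            # input cells written from outside
    state = step(state, rules)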

The number of outputs and their binding to cells are chosen randomly for each network at the start. We construct evolutionary changes so that, in most cases, they leave the wiring between the global state cells and the inputs/outputs of the networks unchanged, and only with a small probability rewire an input or output of a network to another cell. This is necessary because, for now, we treat each cell as a specific parameter (even if an internal one); if we switch a network's input or output to a parameter it is not accustomed to, the result will usually be poor. Unfortunately, with this maneuver we again lose some interesting properties of real intelligence, but we gain efficiency right now. We will return to these properties later.

In the process of evolution, the number of global state cells can also be changed. Then all neural networks are adjusted accordingly. If the cells are cloned, then the corresponding inputs and outputs of the neural networks are cloned. If cells are removed, then the corresponding inputs/outputs are removed from all networks.

There are also evolutionary changes that can increase or decrease the number of outputs of an individual neural network.

How exactly will a program consisting of a set of such neural networks produce a solution? More precisely, how do we understand, after yet another iteration, that the program has made its decision, and read that decision from the output cells? This is an interesting question that requires experimentation. The first ideas are as follows: either the output values stabilize, or there are special outputs that signal that the answer is ready, and these outputs tune themselves in the process of evolution.

After the decision has been read off, the program should continue working, most likely starting from its internal state. How do we push it to keep working once it has stabilized on a specific solution? First, after the decision is read, the input cells are overwritten with fresh data (we assume that while the network was deciding, the input data did not change much). Second, there can be a special input cell into which a large number is written at the start of each decision cycle; the program can either learn to change this number itself, or it can be decreased from outside, letting the network know that time is running out. In general, there are enough ideas to experiment with.

Now for an explanation as to why.

First of all, we abandoned building chains of rules and forced each rule to write the result of its work into the global state. By doing so we made long inference chains and rollbacks impossible, but we gained a faster response and a broad, fuzzy assessment of the situation. Note that parallel processing of several options, where each option has its own global state, has not gone away; but we no longer have the wide branching of the original predicate interpreter. If we tried to branch inference chains on broad, fuzzy rules, the number of options would explode catastrophically even at the early stages of constructing a solution.

As a result, we got something quite different, even though it looks similar to the original predicate inference. This something is no longer capable of constructing complex, crisp deductions, but in return it can act in a complex, rapidly changing environment, and it even has some rudiments of plausible inference, which the original version lacked. The derivation of complex, crisp conclusions will return to us later, though in an unexpected way; for now the resulting intelligence will do without it.

Nevertheless, the resulting something can solve logical problems (such as playing chess) in its own way, similar to how a person does it. Such thinking can be called situational. It does not start from building long logical chains, but from assessing what the current situation is and how to change it. The situation includes both external data and the internal state: whatever the system has "thought up" by the current moment. Evaluation, and the decision about where to move next, is made at every step, as opposed to procedural algorithms and logical inference, which can get buried in lengthy reasoning that has lost touch with reality. Therefore, random changes to such a program will not be fatal to its performance, unlike procedural notation. Even if it finds itself in an incomprehensible situation, or makes a mistake, such a program will not fall into a stupor but will try to do something, because the assessment of the situation involves not some small branch of the algorithm but the entire set of rules. And even if the program initially thrashes around chaotically, sooner or later it will land in a familiar situation and be able to turn it in the right direction.

Situational thinking rests on three things. The first is generalizing the situation to a familiar case (as in pattern recognition systems). For example, from the varied arrangement of pieces on the chessboard the thinking system extracts something common and assesses the situation: is there a threat to its pieces, is there a chance to attack without losses; beyond that there may be more specific situation-combinations. The second is experience: a library of short, plausible rules that are applied to change a situation for the better without a long logical derivation. Based on the assessment of the situation, alternatives for changing it are proposed, for example approximate data on how to move the pieces. A parser translates this approximate data into legal moves on the board (if no legal move can be found, the next alternative is taken). Similar situations (and, accordingly, solutions to them) can occur at any stage of the game, and we immediately get a solution for them without a long enumeration of possible moves. Yes, these options are "only" plausible, but they embody a lot of experience from real games and are quite applicable to new ones. Moreover, such situations include some knowledge of how the game will develop many moves ahead, not at the level of moving pieces but at the level of changes in the tactical situation (which may include endless cycles, such as maintaining balance to achieve a draw). And if they nevertheless lead to a loss, the library is supplemented with new rules that will work in that situation. The third is an internal check of a probable solution several steps ahead: come up with something, then estimate how well this something will change the situation, while keeping several alternative solutions in play. Our system cannot do this yet, it produces only one option, but more on that later.

By the way, when studying neural networks, did you ever wonder how to make them work not only from input data, but also teach them to digest an internal state and execute complex programs? I wondered about this a long time ago, but never came up with anything worthwhile for training such a network. Now, however, there is a somewhat different answer.

Why did we make many neural networks instead of one big one that could update the entire state? The point is that, for a genetic algorithm to work efficiently, it is desirable to have a set of independent rules, each of which is responsible for some specific (even if hidden from our understanding) action. Moreover, such rules can be exchanged between programs, assembled into different sets for different programs, modified and cloned individually, and even collected into a library of the most successful rules. This would be hard to do with one large neural network. In addition, even for ordinary neural networks, a committee of networks tends to perform better than a single network.

For similar reasons, each neural network has only a small number of outputs; that is, each rule is competent to make its own small decision. At the same time, each network receives the entire state as input, the expectation being that it has a global view of the situation while not reacting to the majority of cases that do not fall within the scope of that particular rule. Each network must learn this in the course of evolution. Then, even though every rule affects the global state, only the rules applicable to the current situation will actually fire. It may well be that the number of inputs should also be limited; I have no thoughts on this, only experiment will tell.

As a result, after training, we should obtain a program consisting of a set of neural networks. The program starts from an initial state in which the input cells are filled in; the remaining cells can be set to zero (or to small random values). At each iteration the global state is fed to the input of all networks, the result of every network is computed, and the outputs of all networks are immediately added to the global state. Then the next iteration begins. That the solution is ready can be detected, for example, by the output values having stabilized, or by a signal arriving at a special output that indicates readiness. After that, the output values are read, new input values are loaded, and the program continues working with the updated data.
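
A sketch of this run loop (the stabilization threshold and the way inputs are hooked up are assumptions; step_fn would be the summed update of all the rule-networks, as in the earlier sketch):

    import numpy as np

    def run(state, step_fn, out_cells, in_cells, read_inputs, tol=1e-3, max_iters=200):
        """Iterate the rules on the global state until the output cells stabilize,
        then yield the decision and load fresh input data."""
        while True:
            for _ in range(max_iters):
                new_state = step_fn(state)
                stable = np.max(np.abs(new_state[out_cells] - state[out_cells])) < tol
                state = new_state
                if stable:
                    break                      # outputs stabilized: the answer is ready
            yield state[out_cells].copy()      # read the decision off the output cells
            state[in_cells] = read_inputs()    # overwrite the input cells with fresh data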

This program is created automatically by a genetic algorithm. The main thing is to have at least some criterion for evaluating the effectiveness of the resulting programs (i.e., whether one program is better than another); that is enough for the genetic algorithm to work. For real-world problems such a criterion usually exists. It can also be a set of examples of behavior considered good or bad in different situations (pattern recognition systems learn from examples too). Having learned from known examples, the program, like an image recognition system, will be able to generalize its experience to unknown examples; this generalization can be qualitative, catching hidden patterns in the set of examples and drawing unexpected (but correct) conclusions. For tasks that require exact logical inference and a crisp answer this is harder (more on that later). Another option is to make programs compete with each other, for example at chess, and count the one that plays better as the more effective; then no external evaluation is needed.

The genetic algorithm randomly generates a set of rules (neural networks) and a set of programs. All rules live in a shared repository. Each program consists of its own particular set of rules taken from that repository: the rules themselves stay in the repository, the program only refers to them. To evaluate effectiveness, all programs are run in parallel (each with its own state and its own set of inputs and outputs). The best scores go to the programs that work faster and more effectively; programs that think for too long, or never reach a decision, are penalized.

Bad programs are more likely to be changed or removed entirely. Their place is taken either by newly generated programs or by clones of existing ones. Program evaluation can be cumulative, i.e. accumulate over time, giving a program some reprieve in which to evolve. Good programs are more likely to be cloned. The result is an evolution of programs from bad to good.

After a sufficiently good solution is reached, the best program is selected as a result of the work of the genetic algorithm, and in the future this program is used for real problems.

What evolutionary changes can programs be subjected to? Adding or removing a rule reference from the repository. Crossover with another program: two programs are taken and a third is created from part of the rules of one and part of the rules of the other. The rules to be added, removed, or carried over during crossover are chosen at random. Although, on reflection, there may be ways to do this more purposefully, perhaps with an assessment of how effectively a rule participates in a particular program.
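
A minimal sketch of these program-level changes (the representation and probabilities are assumptions): programs are just lists of indices into the shared rule bank.

    import random

    def crossover(prog_a, prog_b, rng):
        # The child takes part of the rules of one parent and part of the other.
        cut_a = rng.randrange(len(prog_a) + 1)
        cut_b = rng.randrange(len(prog_b) + 1)
        return prog_a[:cut_a] + prog_b[cut_b:]

    def mutate(prog, bank_size, rng):
        prog = list(prog)
        if prog and rng.random() < 0.5:
            prog.pop(rng.randrange(len(prog)))       # remove a rule reference
        else:
            prog.append(rng.randrange(bank_size))    # add a random rule from the bank
        return prog

    rng = random.Random(0)
    parent_a, parent_b = [3, 17, 42, 8], [5, 42, 99]
    child = mutate(crossover(parent_a, parent_b, rng), bank_size=100, rng=rng)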

What evolutionary changes can rules (neural networks) be subjected to? As already mentioned, one of them is a change in the number of internal state cells, which affects all rules. The need to increase or decrease the number of state cells can be estimated, more or less, from the dynamics of the program: how often the states change, how strongly they correlate with one another, how much they affect the output values, and how effective the population of programs is overall. The next evolutionary change is rule cloning and random modification of rules (namely, "shaking" the weights of the neural network, as in annealing: the lower the efficiency, the stronger the shake). Cloning, together with the subsequent change of rules, can be coupled with the cloning of programs: for example, the original program keeps a link to the original rule and the cloned program gets a link to the cloned rule, or the original program additionally gets a link to the rule's clone. Rules can be crossed, with pieces taken from two neural networks and glued together into a third. Within a rule (neural network), the number of outputs can change at random, as described above, and the number and structure of internal connections can change as well.

For each rule we can compute its effectiveness, based on how successful the programs that include this rule are. We can also take into account that a rule may be included in a program but remain inactive, and therefore not affect the program's operation. Based on this assessment, we can evolve the rule bank in a directed way: multiply successful rules more often, and remove or change ineffective rules with higher probability. We can also assemble programs from the most effective rules, or, during changes, preferentially include the best rules in programs. Note that the bank stores rules with different scopes which nevertheless solve a common task.
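
A minimal sketch of such a rule-bank assessment (the scoring scheme is an assumption): a rule's effectiveness is taken as the average score of the programs that reference it.

    from collections import defaultdict

    def rule_effectiveness(programs, scores):
        totals, counts = defaultdict(float), defaultdict(int)
        for prog, score in zip(programs, scores):
            for rule_id in set(prog):          # count each rule once per program
                totals[rule_id] += score
                counts[rule_id] += 1
        return {r: totals[r] / counts[r] for r in totals}

    programs = [[3, 17, 42], [5, 42, 99], [3, 42]]   # index lists into the rule bank
    scores = [0.9, 0.2, 0.7]                         # how well each program performed
    print(rule_effectiveness(programs, scores))      # rule 42 appears in every program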

But here is the most interesting part: it seems that for each rule we can compute not only its effectiveness but also its error! That is, we can work out how the rule should have acted under the given input conditions. After all, we have examples of rules (neural networks) firing in good programs (we assume these were correct decisions by the rules making up the program) and examples of them firing in bad programs (we assume these were wrong decisions). Accordingly, we can try to amplify the good decisions each neural network produced and suppress the bad ones. The input and output values can be reproduced without difficulty, and from them a training sample can be built and fed to the backpropagation algorithm. The main problem is unfolding the time sequence of what appeared at the inputs and outputs into a training sample, and here ambiguities arise. We cannot assume that every decision (input-output pair) in a correct program was perfectly correct, or in a wrong one perfectly wrong; maybe the fault lies with a completely different rule, which "wiped out" the correct decision at the very finish line. Getting involved in unrolling the entire sequence of decisions is a hopeless undertaking, so we will have to think carefully about how to form a sample from these time sequences. And even if, when forming the training sample, we throw out many examples and keep only the most unambiguous ones, it will still be progress.
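
A heavily simplified sketch of forming such a sample (the thresholds and the trace format are assumptions): only input/output pairs recorded in clearly good or clearly bad programs are kept, and the ambiguous rest is thrown away.

    def build_sample(traces, good_thr=0.8, bad_thr=0.2):
        """traces: list of (program_score, [(inputs, outputs), ...]) recorded for one rule."""
        sample = []
        for program_score, io_pairs in traces:
            if program_score >= good_thr:
                sample += [(x, y, +1.0) for x, y in io_pairs]   # reinforce these outputs
            elif program_score <= bad_thr:
                sample += [(x, y, -1.0) for x, y in io_pairs]   # push away from these
        return sample                                            # ambiguous traces are dropped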

Let's see what we have now. We have a tool for automatically writing programs that can navigate real-world problems, act flexibly in a wide range of situations, recover from errors, possess some internal logic, and predict or model the situation. What they cannot do is build fine logical chains or make long deductions. Although for many tasks such an intelligence will be able to pretend that deep logical processes were at work, when in fact it only applied the templates it picked up during training. Such an intelligence still lacks autonomy: a person still has to do a lot for it. And at the hardware level, what we got is not quite like what nature came up with.

Artificial intelligence version 3.0

Now let's add a thing called an environment emulator. We will need two varieties, one for emulating the external environment, the second for predicting. There will also be a third variety, but more on that later.

An emulator in prediction mode must be able to produce the expected behavior of the environment a small number of steps ahead, given the history of previous states and the control program's current action on the environment. The program now acts not on the external environment directly, but first on the emulator, and the forecast on the emulator shows whether the environment changed in the right direction under the program's influence. We can therefore keep several instances of programs, trained in a similar way but different from one another, each with its own environment emulator running in real time, and at each step send to the external environment the action of the program that received the best rating on its emulator. Another option is to send out the decision (not necessarily the best one) that the team of programs reaches by "majority vote"; such a decision is distinguished by its reliability.
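
A minimal sketch of such a committee (the decide/rate interfaces are hypothetical): each program proposes an action, its own emulator rates the predicted outcome, and either the best-rated or the majority action is sent to the environment.

    from collections import Counter

    def committee_decision(programs, emulators, history, mode="best"):
        proposals = [p.decide(history) for p in programs]
        if mode == "best":
            # Rate each proposal on that program's own emulator, send out the best one.
            scores = [e.rate(history, a) for e, a in zip(emulators, proposals)]
            return proposals[max(range(len(scores)), key=scores.__getitem__)]
        # "majority" mode: the most common proposal is considered the most reliable.
        return Counter(proposals).most_common(1)[0][0]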

The emulator in emulation mode is similar to the prediction mode, but it is used while training programs, when there is no real external environment. In the previous version we took ready-made, pre-cut examples recorded from the external environment; instead of those examples we can build an emulator trained to recreate typical situations of the external environment. After all, there can be a great many examples, and it is more efficient to use one compact emulator instead of that whole mass.

The emulator, in training mode, can be attached to real sensors and left there for a long time. A natural question: why not immediately put the program itself on the sensors for training? There are several answers. First, we may want to train the next version of the program, and then we would again have to run the real devices. Second, on real sensors we cannot experiment freely to see whether the program has learned correctly, or such experiments may be expensive, whereas the emulator can work in prediction mode.

In addition, the emulator can be configured, firstly, to produce random deviations from the environment's behavior and, secondly, to combine different time sequences of that behavior. Since the emulator is trained on the external environment, such combinations will be "invented" plausibly, which expands the set of examples for training the programs.

Naturally, everything that happens in real time can again be recorded and used for automatic additional training of programs.

Programs for the emulator can be made using the same technology as described above.

If the external environment is very complex (as in a game of chess), then the emulator will be built using technology very close to the control program itself. Up to the point that when learning, the programs will play with each other, and the strongest program will survive. The prediction emulator can be customized to not only look for the best move, but also to adapt to the opponent's playing style. Thus, when playing with a human, in the "brain" of the machine there will be a whole battle between many programs and their opponents, emulators, before making a final decision.

Thus, by using an external environment emulator, we improve both the quality of program learning and the quality of decision making in real time.

Is there such parallelism in natural intelligence, with competitive decision making and an environment emulator? I think so, but hardly in such a direct form. What was formed in the course of natural evolution is certainly more complicated and more efficient. But here, for simplicity and the fastest achievement of the effect, we introduced competition and the emulator artificially.

After the introduction of an environment emulator and a competitive collective solution, it is possible to introduce new properties of our artificial intelligence.

Wandering background solutions. After the decision is made, it is not necessary to force all programs to reorganize to reflect on the next step. You can leave some (good by some criteria) programs to continue to think for a long time on past situations. It may well be that they still think of something, and this something will turn out to be useful either in the current (even if it has changed) situation, or it will turn out to be useful for further learning. For example, a program that lags behind can decipher an enemy's intent or find an interesting tactical solution. Then it will be possible to try to turn the current situation to this solution (artificial intelligence "changed its mind"), and if the game is already over by that time (or has gone in another direction), then the solution found can be used in training. How exactly both of these options are implemented is the subject of separate studies. At the same time, artificial intelligence will be “online” all the time, and will think, improve itself, conduct dialogues with itself, almost like a person.

Explosion of options. We can try to detect situations when an individual program (or a group of programs) is at a fork, when an ambiguous situation is found, and forcibly branch that situation into new decision branches (program + emulator). Again, how to detect such situations and how to branch them is a topic for separate research, an unplowed field. So far this is only an idea: that in the case of ambiguity the intellect should be able to branch its options. But branching is not the same as traversing a decision tree. It is more like the spreading of a wave function, or like complex arithmetic, where operations with ambiguity (the imaginary unit) produce several options that then interact with each other according to the rules of the same arithmetic. Branched solutions in artificial intelligence must likewise continue to exist together, continuing to communicate with each other (exactly how is also a question), and at the right moment the branches can converge into a single solution. Moreover, branching will happen not blindly, as in the enumeration of options, but precisely at the moments where it is most interesting.

How exactly can a potential branch point be detected? For ordinary neural networks there are algorithms that increase the capacity of the network when it is insufficient for the data, and reduce it when it is excessive for the decision. Changing capacity means adding and removing weighted connections between neurons, and the neurons themselves (read: adding or deleting rows in the weight matrix and zeroing out elements that do not affect the solution). There is a whole line of work on neural trees that grow as needed. So, within a team of programs, we can check what the different programs are "thinking" about, prune overly similar directions, and try to generate new directions of "thought". Emulators will help most of all in assessing this: we need to look at how similar their visions of the external environment are.

You can also check individual programs to see how unambiguously they give a solution. If the program wanders between several solutions, or does not converge to a solution, then you can throw additional programs on this situation, initialized by the same situation, but with random deviations, in order to stimulate the "course of alternative thoughts". Branching can also be useful in training, when it will be possible to determine how ambiguous the program is in the solution, and to divide ambiguous cases into several more unambiguous programs so that they work more successfully together in a team. But then again, all these are just beautiful ideas, ideas for experiments.

They will dream

It's good when we have examples of correct behavior, or we can emulate the reaction of the environment. For natural intelligence (and the future of artificial intelligence), this luxury is not always available. The intellect is trying to do something with the environment, something comes out of it correctly, something doesn't, something remains with incomprehensible consequences. How can you learn from this?

To do this, we introduce the third type of environment emulator. It will remember the manifestations of the external environment, what the artificial intelligence did in response to these manifestations, and what it led to. This does not exclude the possibility that, as experience is gained, such an emulator will be able to combine the two previous varieties - emulation and environment prediction, and will be built on principles similar to our artificial intelligence.

How can one learn when there is no clear information about which actions are right and which are wrong? A small digression. Hopfield networks are trained by "superimposing" all the examples, without critical evaluation and without error correction. A trained Hopfield network, given a partial or noisy image, can recreate the original image in the course of iteration (convergence to the energy minimum for that image). However, after training, the network sometimes contains spurious images. To eliminate them, the training examples are fed to the input again, and if the network converges to a spurious image, that image is erased. In a sense the network "dreams" on the basis of previously received information, and in that dream false information is replaced by correct information. Hopfield networks are used for simple images, but it is the principle that interests us.

We can go the same way here. After information from the external environment has been accumulated, the intellect is disconnected from the external environment and works only with the emulator. The emulator replays situations, and if the intellect gave a good solution, that solution is reinforced; if it gave a bad one, the solution is replaced by something else, something random, for example. The main thing is that the new solution should not resemble the bad one. At the same time, we arrange the changes so that the accumulated good solutions are not lost and no new bad solutions appear.
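
A minimal sketch of this "sleep" phase (every interface here is an assumption): remembered situations are replayed on the emulator, good decisions are reinforced, and bad ones are overwritten with alternatives that do not resemble them.

    import random

    def dream(program, memory_emulator, rng=None):
        rng = rng or random.Random(0)
        for situation, outcome_was_good in memory_emulator.replay():
            decision = program.decide(situation)
            if outcome_was_good:
                program.reinforce(situation, decision)            # keep what worked
            else:
                alternative = program.random_decision(situation, rng)
                if not program.similar(alternative, decision):    # must not look like the bad one
                    program.reinforce(situation, alternative)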

At a minimum, such a rearrangement can be carried out with the genetic algorithm. Cross-evaluation of each rule making up the program may also be possible, so that the error and correction for each rule can be computed precisely; after all, we have some information about whether the program as a whole acted well or badly. It is harder here in the sense that, if the decision was made by the team of programs, information about its correctness is known only for the winning program. On the other hand, we have a long time history of the programs' behavior, and details can already be extracted from it.

So it turns out that if artificial intelligence is placed in natural conditions, it will develop long phases of wakefulness, during which information is accumulated by trial and error, followed by phases of sleep, during which this information is thoroughly digested. This process will be long and painstaking. In natural intelligence such a mechanism, having once appeared in the course of evolution, quickly showed its usefulness and was passed on to subsequent generations. The mechanism is simple enough, one might say, to have arisen during evolution.

They will feel pain

Another training option is when information about the correctness of actions is not available. Let me remind you that when learning by the annealing method, random changes in the parameters are used throughout the entire solution to rebuild the solution. The strength of such changes (temperature) starts high, and gradually decreases as the solution converges to the best option. If the changes do not suit us, the strength of the changes (annealing temperature) increases in search of a more suitable option.
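
A generic Python sketch of this kind of annealing schedule (not the author's exact scheme; the cooling and reheating factors are assumptions):

    import random

    def anneal(params, loss, steps=1000, temp=1.0, rng=None):
        rng = rng or random.Random(0)
        best, best_loss = list(params), loss(params)
        for _ in range(steps):
            candidate = [p + rng.gauss(0, temp) for p in best]   # "shake" all parameters
            cand_loss = loss(candidate)
            if cand_loss < best_loss:
                best, best_loss = candidate, cand_loss
                temp *= 0.95                                     # converging: cool down
            else:
                temp = min(temp * 1.01, 1.0)                     # stuck: heat up again
        return best

    print(anneal([5.0, -3.0], loss=lambda p: sum(x * x for x in p)))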

Therefore, in the process of evolution, a pain mechanism has been developed. Wrong action - and we instantly feel how our neural connections are devoured by a cruel flame. Such a shock does not go unnoticed. The consequences of the wrong action are literally burned into our neural connections. So much so that we avoid by all means the repetition of these wrong actions. The mechanism is simple but effective.

In artificial intelligence, learning can be supplemented with a higher rate of random change, and a higher rate of goal-directed change, whenever the intelligence produces bad decisions. Such additions can be applied both at the level of the team of programs and at the level of individual programs or rules. "Bad" rules or programs can literally be burned out as a result of incorrect actions, while good rules and programs are preserved and multiplied, yet will "fear" wrong actions like fire.

At a higher level of intelligence, "pain" will also manifest itself in terms of the fact that "the head is splitting from ideas", "it is impossible to collect one's thoughts", etc. The state of a good decision will be accompanied by clarity of thought, harmony and "peace of mind."

Ensembles of rules

Imagine that, in the process of evolutionary learning, some subset of the rules is cloned in such a way that the clones remain interconnected with the parent instance: for example, they stay connected to the same inputs and outputs and are activated almost simultaneously. At the same time, the rules still differ from one another, and their further evolution can go different ways. Now let us make sure that this subset of rules remains largely interconnected as changes occur, i.e. that their inputs, outputs, activation, and issued decisions stay mostly the same. The rules of such an ensemble can still participate elsewhere, including in other ensembles.

It turns out that the ensemble makes decisions jointly, with all its rules. Thanks to this collective work the solution will be better (at least, that is how it turns out for ordinary neural networks). But the ensemble will also have some new, qualitatively different representation of the situation, and will be able to act far beyond the initial situation, because the rules of the ensemble also participate in other places (and ensembles) of the program (this last point is already an assumption about how ensembles participate in the program). One could say that its vision of the situation will be higher-level, more generalized, broader, and enriched with the experience of other situations. Thus, in borderline or novel situations the ensemble will be able to generate (one might even say fantasize) a new reality that continues, in meaning, the old one it saw during training.

Here such things as category-like abstractions and associative thinking slowly begin to appear. "Heavy" logical chains begin to form which, thanks to a vision of the situation that is orders of magnitude richer, can produce "thin" logical chains close to formal logic and strict algorithms.

When faced with new situations, a program that has such ensembles in stock will find it much easier to transfer and generalize its experience to them.

All of this is still just attractive conjecture, but things do seem to be moving in that direction. How exactly such ensembles are formed and maintained in the brain is still unclear (though, you must admit, evolutionarily it looks quite simple). How to support and steer ensembles in the program is also an open question. Apparently there is no rigid assignment of rules to ensembles; everything is decided dynamically, by combining rules that are compatible for a specific situation, all by the same principles of competition and interaction. Rules and ensembles can also line up dynamically into a hierarchy, and yet there will be no permanent hierarchy, no meta-levels and transitions between them, no clean formalization with system-subsystem relations singled out. More about how this can form will follow. And, from the point of view of the natural course of things, it all looks quite simple.

And again waves, frequencies and holography

In our artificial intelligence program, the division into competing programs and alternative visions of the situation is imposed artificially. In the same way we artificially introduce a global state, to which all rules have access, and a clock generator, which at each new tick activates all the rules and updates the situation. There is nothing of the kind in the brain, and yet something very similar formed there naturally.

What we do have is the property of a natural neuron to accumulate potential and, when a critical threshold is exceeded, to discharge a burst of signals through its outputs (remember the frequency neural networks from the previous parts?). These bursts, in turn, raise (or lower) the potentials of the neurons connected to the outputs of the original neuron. Threshold potentials, and the frequency and duration of the discharge, are parameters that are apparently tuned during learning.

So it turns out that neither a clock generator, nor loops and conditional transitions, nor a global state and forced parallelization of the situation are needed.

Collectives of such neurons already carry an internal state, complex iterative logic, and conditional processing.

Moreover, bundles of neurons can combine into alternative (parallel) chains, each with its own vision of the situation, and all these chains compete for whose decision gets submitted to the output. Such processing is quite amenable to modeling on conventional computers. At first it will probably be more resource-hungry than the model with forced parallelization, but in the long run the less regimented model is likely to be more efficient.

Now let's talk about ensembles and meta-ensembles. It turns out that entire wave fronts roam across the individual ensembles and, superimposed across different ensembles, produce complex wave patterns, perhaps even more intricate than holographic images. It is these wave patterns that dynamically bind individual neurons (or rule-networks, in our artificial intelligence) into ensembles and meta-ensembles.

See how naturally, and yet pragmatically, it all comes about. There is no need to separately invent frequency or holographic networks, no need to torture them into recognizing images. It is enough to follow the effective and natural course of things, and all these frequency-holographic properties appear as a side effect.

The initial situation, once in the brain, is split into many alternative chains, causing whole storms of waves of change in neuronal potentials, and as a result receives a much more complex, higher-quality representation. At the output, all this processing collapses again into the narrow limits of what has to be delivered to the external world.

Associations, categories, generalizations and other philosophies

In the section on ensembles we mentioned that it would be good for rules to participate in different places in the program, so that they learn to generalize experience across completely heterogeneous situations. This puts them on the path to high-level abstractions such as associative thinking and categorization. For example, they will be able to take something common from the concepts "white" and "fluffy" and apply it in the situation "flying". Such processing makes thinking much more powerful and allows ensembles of rules to be built dynamically for completely heterogeneous situations.

To obtain such properties, we artificially introduced ensembles and their maintenance. In what other ways can you get properties that allow the rule, being trained for one situation (concepts), to take part and retrain in completely different situations (for other concepts, as in the example about white / fluffy / flies)?

So far, there are two options.

Option one: dynamic combination of inputs and outputs. Remember, at the beginning we set up a strict correspondence between the inputs and outputs of the rules (neural networks) and the cells of the global state, and the evolutionary changes were arranged so as to alter these correspondences as little as possible? In the variant without a global state, the inputs and outputs of the different rule-networks are likewise rigidly wired to one another.

Now let's allow inputs and outputs to change position relative to each other in the course of work and in the process of learning. There are two questions. First, how to determine how compatible the elements of the resulting combination are, how effectively this compound solves the problem? Secondly, how to quickly find compatible / effective combinations of inputs and outputs, because there are a lot of combinations?

The simplest option is to attach to each input and output of a rule a compatibility tag that changes evolutionarily; perhaps there is also a way to tune this compatibility more precisely, during learning, from the results the rules produce. (Can the compatibility of outputs be computed while the rules are running? Would that be efficient?) Inputs and outputs to the external environment will also need compatibility tags, as part of the overall set. When the program runs, rules are connected only with compatible inputs and outputs. Selecting such compatibilities is computationally nontrivial, but not hopeless; perhaps the algorithms of Hopfield networks, which can do similar things, will help with this selection.

The next option is to combine the inputs and outputs of different rules in different ways during the learning process and accumulate information about the effectiveness (compatibility) of different combinations. In real work, proceed as above - combine the inputs in accordance with compatibility.

The previous options are suitable for the implementation of artificial intelligence, but this combination of inputs and outputs in natural intelligence does not seem to exist. But there are feature maps, see about convolutional networks and neocognitron in the previous parts. And such maps seem to exist in natural intelligence.

The meaning of feature maps is this. There is a set of rules, there is a set of input cells. Each rule scans input cells using a moving window, and all cells from the window fall into the input of the rule. The result of the rule operation is written to the cell of the feature map corresponding to the position of the window on the input cells. As a result, for each rule, a feature map will be obtained, in which the places of the best triggering of the rule will have the highest values. In the next layer, all feature maps make up the input for a new set of rules, which again make up their feature maps. Rules can be learned by backpropagating the error. How to teach such rules as part of the program is an open question.
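
A minimal one-dimensional sketch of a feature map (the sizes and the single rule are assumptions): the rule slides a window over the input cells and writes its response into the map cell corresponding to the window position.

    import numpy as np

    def feature_map(inputs, rule_weights, window=3):
        out = np.zeros(len(inputs) - window + 1)
        for pos in range(len(out)):
            patch = inputs[pos:pos + window]          # cells visible through the window
            out[pos] = np.tanh(patch @ rule_weights)  # the rule's response at this position
        return out

    inputs = np.array([0.1, 0.9, 0.2, 0.8, 0.4, 0.0])
    weights = np.array([0.5, -1.0, 0.5])              # one "rule" shared across positions
    print(feature_map(inputs, weights))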

Feature maps have performed well in image recognition, coping with distortions caused by changes in scale, angle and rotation, and with deformations specific to the depicted object.

Therefore, feature maps are a good candidate for experiments on dynamic combination of inputs and outputs for the rules that make up the program.

Option two, frequency combination of inputs and outputs. In this variant, it is not necessary to rearrange the inputs and outputs. In frequency neural networks (or in programs built on such networks), each neuron is both a simple frequency filter and a simple frequency generator. Moreover, this filter can be tuned to different harmonics at the same time (due to which the capacity and capabilities of frequency networks are higher than those of conventional networks). Similarly, any combination of neurons is both a complex frequency filter and a complex frequency generator. (In our artificial intelligence, such a neuron is the equivalent of a single rule represented by a small neural network).

Therefore, signals related to completely heterogeneous entities can roam through the same neurons at different frequencies. But since different frequencies affect the same neurons (combinations of neurons), different frequencies (and the entities processed by them) affect each other. Therefore, if our intelligence is built on the principles of frequency signal processing (as mentioned above in the section on frequencies), then this intelligence seems to already have the ability to generalize heterogeneous entities and to some philosophical abstraction. Although, there may be additional technical solutions that will speed up the formation of such generalizations in frequency networks.

And a little in conclusion. Such ways of recombining inputs and outputs give not only high-level properties, such as associative thinking and qualitative generalization, but also more prosaic ones. What happens if the wires in an electrical circuit get mixed up? Most likely it will be fatal for the circuit. But for the brain, not necessarily. Experiments have been carried out (on animals) in which the brain was literally cut into pieces and shuffled, then put back, and the animal was released. After some time the animal returned to normal and lived on. A useful property, isn't it?

There are no meta levels

At one time I puzzled over how a logical hierarchy is built in intelligent systems, how logical constructions arise when, above one crystal-clear level, another is built that generalizes several lower levels into itself. Works on artificial intelligence (and not only) painted a beautiful picture of how systems evolve, accumulate complexity, and move from one level to another.

In real systems it has always turned out that, no matter how well the lower levels are thought out, they accumulate inconsistencies and changes that cannot be reconciled with the upper levels; sooner or later these changes break the whole system and force a major restructuring of the entire hierarchy. The way out was not to become too attached to such hierarchies, and to leave enough freedom of action that, when necessary, one could work around the formal hierarchy. This is not a sign of a poorly designed system; these are the realities of life.

Of course, the right system minimizes the mess, that's what the system is for. But this does not mean that informal ties are completely absent in such a system. A good system, for all its correctness, must carry the element of its own destruction. An element that comes into action when the system is no longer able to cope with its tasks, an element that completely rebuilds the system for new realities.

So in artificial intelligence, there seems to be no place for any logical levels and meta-transitions (especially in view of the previous section on associations and generalizations). All rules (they are neurons in our understanding) are simultaneously involved in decision-making at any level. Rules-neurons can dynamically line up in various ensembles, in various hierarchies. But even with such a dynamic alignment, they will not have hierarchical rigor, an element from the lower level can easily affect the upper level, so much so that it completely rebuilds it.

For each specific case of applying the rules, one can build one's own hierarchy. But this hierarchy is not static, as it is in the usual formalizations. In intelligence everything depends on which side you pull from, at what angle you look; each angle has its own hierarchy, and there can be very many such "points of view". Inside the intellect there is apparently no fixed hierarchy (we are not talking here about the "hardware" level that determines which part of the brain is responsible for which organs).

Quantum genetics

A lot of interesting things are told about quantum computing and quantum properties of intelligence. Some even believe that the brain can directly "pull" quantum processes in order to think.

Briefly, the essence of quantum computing is as follows. The initial data are applied to a small number of elementary particles. In the process of solving, the data begin to be processed simultaneously in a huge number of different ways; moreover, these ways communicate with each other, finding out who has the better solution and who the worse, and bad solutions are weakened while good ones are strengthened.

This happens because, when a quantum computation starts, each particle fully "feels" the state of all other particles involved in the calculation, and feels it instantly and without significant energy cost. While the solution runs, the particles are immersed in a "smeared" (entangled) state that cannot be examined from the outside world. In the smeared state, a particle has no definite physical state; it is in several states at once and can participate in several parallel processes (and these processes "feel" each other). Moreover, the more particles are involved in the solution, the more states the same particle can occupy simultaneously.

If we try to get inside a smeared state and see what is there, then at any specific moment we will obtain specific physical states of the particles, with no hint of the multitude of simultaneous states. Moreover, after such an intervention the computation is completely destroyed; the multiple state cannot be restored. So what happens in the interval between setting the input data and reading off the result remains a mystery. It turns out that a small amount of input data generates, in the course of solving, an internal state many orders of magnitude more complex, whose evolution is unclear and cannot be studied, and which nevertheless yields the correct solution. The state obtained when one tries to probe the smeared state is probabilistic. With a correctly composed quantum algorithm, the probability of reading off the correct solution can be made much higher than the probability of reading off a wrong one (which means the solution must be read off at least several times).

It would seem that in this way you can get huge computing power almost for free. But there is a problem - the particles of the solution must be completely isolated from the outside world, otherwise the outside world will knock down the correctness of the solution (break coherence). There is an opinion that complete isolation is impossible, since (as quantum physics says) every particle, every quantum, is initially smeared throughout the universe and is closely intertwined with every other particle that makes up our universe. And, as a consequence of this opinion, it turns out that quantum computing does not immerse particles in a state spread over alternative universes, but attracts other, quite specific particles from our own universe to parallelize calculations. True, this still does not eliminate the difficulties in studying the internal state.

Several interesting conclusions follow from this. Quantum computing at high power will not be able to give us absolutely true calculations, but it is quite suitable for plausible solutions, for example in artificial intelligence as described above. Another conclusion, closely related to the previous one: quantum computations cannot provide full self-knowledge or self-reflection, since they are an integral part of the universe and therefore cannot completely know themselves or the universe as a whole. This, in fact, is where the quantum uncertainties in measuring the states of quantum particles come from, as we noted in the previous parts: it is impossible to fully know oneself using only oneself. Quantum uncertainty is, in essence, a direct consequence of Gödel's theorem, which says that a formal system cannot know itself with absolutely true accuracy.

Now back to intelligence. Many researchers have rightly noticed the similarity between the properties of quantum computing and intellectual processes in humans. The most interesting properties for us are these. The input and output of a decision are a fairly simple set of states. These simple states plunge the brain into a much more complex state that cannot be examined from outside. An attempt to probe that state, or to read off the decision, again yields a set of simple states, and these states are also probabilistic and more likely to give the correct solution than a wrong one. It is hard for a person to be consciously aware of this internal state, unlike the input and output, but this internal state is what we call "feeling". As in quantum computing, setting the initial state and reading off the decision are rather laborious procedures: it is easy for a person to think "to himself", but to convey his thoughts outside, to another person, he has to work hard.

Now it only remains to note that the above properties of quantum computing and human intelligence are practically one to one applicable to the previously described artificial intelligence algorithm, which was based on a genetic algorithm.

Indeed, it would seem: where in the brain, with its waves, frequencies and neurons, could some kind of genetic algorithm arise, with its chromosomes and alternative solutions? It turns out that, looked at from the other side, the genetic algorithm is just one manifestation of a more general class of processes.

It turns out that for the brain to exhibit interesting quantum properties it is not at all necessary to invoke quantum processes directly; there are more pragmatic explanations. And quantum computing itself need not involve mysticism about parallel universes and absolute truths, because at the micro level it may well be organized into something like a genetic algorithm that only pretends to immerse particles in a smeared state, while in fact making do with the computing resources of its own universe.

It may well be that at the junction of quantum computing, genetic algorithms and other areas of artificial intelligence, a new theory of computing will arise, which will make it possible to make more powerful quantum computers, and will allow, based on the strict apparatus of quantum physics, to more accurately explain the processes occurring in intelligence. After all, what we have come to so far in understanding intelligence is reminiscent of the anecdote “I feel in my gut that 0.5 + 0.5 will be a liter, but I can’t prove it mathematically,” when we can do it, but we don’t yet explain why.

Internal representation of things

How are external things represented in the brain? It may seem that the brain recreates a physical model of objects and phenomena, and this impression leads to many incorrect conclusions. In fact, the internal representation is not at all the same as a physical model. The internal model is plausible: it forms an analogue that captures only the properties of the object that matter most to us, the properties used in everyday experience. Such an internal model is sometimes called "naive physics". Within the limits of everyday experience such a model gives results that, while not strictly correct, are quite practical. But as soon as we step outside everyday experience, the model fails.

The set of rules forming such a model may be very far from the actual physical representation. This is why the internal representation also carries many "fantastic" properties of real objects, and begins to "live a life of its own". Take caricatures, for example. A person easily recognizes a face drawn in a caricature, and trained people can draw caricatures. But caricatures baffle face-recognition systems. And no wonder: even though the recognizers contain a plausible model, that model is much closer to the physical one than to the human one.

The internal representation is also characterized by the gradient of complexity described earlier. It consists in the fact that a simpler object or phenomenon generates a much more complex internal representation, which is responsible for modeling the essence of things. After all, the intellect cannot directly model physics. A simple idea gives rise to a "feeling" of the essence of things, when you feel why it is so, but you cannot explain. A more complex representation can stretch the representation of things to the level of understanding, fantasizing, to the level of conscious thinking (example with caricatures).

Hello, logic

What is logic and where does the highest intellectual activity come from?

Our intellect has gone from the simplest "input-output" reactions to the combination of a huge number of competing processes that decompose the input situation into a much more complex internal representation.

As a result, for some things complexes of rules of enormous complexity can form (much more complicated than the physics of the original thing), and these constitute the internal representation of the thing. What those rules can look like, and roughly how they can form, was described above. The main point is that because of this difference in complexity between the original phenomenon and its description, a qualitatively different representation of the phenomenon becomes possible, one that makes it possible to derive new knowledge about the phenomenon with a high degree of plausibility.

Let me remind you that the paradox of intelligence versus algorithm lies in the fact that an algorithm can only crudely model the physics of things: it does not grasp the essence of a phenomenon, cannot derive new knowledge about it, and cannot even guarantee the truth of its own output. The intellect, thanks to a far more complex internal representation of the essence of things, is able not only to model those same things but also to derive new knowledge about them, and even to judge the truth of statements about them, with a high degree of plausibility.

The transition in complexity that expands the representation of objects and phenomena into one many orders of magnitude more complicated than their original "physics" is a good candidate for the role of understanding. Where there is decomposition into a complex internal representation, there is understanding, high-quality handling of the essence of things, and a flexible response to unexpected situations. Where there is no such decomposition, only dumb "cramming" is possible: blindly following the algorithm, deriving no new knowledge, giving no account of the essence or truth of one's work, and going astray when new factors arise.

Dumb adherence to an algorithm does not involve higher intellectual activity, and is therefore fast and efficient where only a clear-cut reaction to typical situations is needed. Higher intellectual activity, with understanding involved, can slowly assemble different algorithms, but cannot execute them quickly. Combinations of the two approaches are also possible.

The next logical question: is it possible to understand what happens in the process of understanding? And the equally logical answer: most likely yes, but this would require a representation of the processes occurring in the intellect that is many orders of magnitude more complex than the original intellectual processes. That is, we can learn something about the intellect and do something with it, but we are not able to fully and qualitatively understand the working of the human intellect: the power of the brain is simply not enough. We can, however, study and use its patterns, just as we now use computers without full awareness of the processes taking place inside them. It is unrealistic to picture what happens in all the millions of transistors, although it is quite possible to understand how the logic units of computer circuits work and how they combine into higher levels. The same is true of the intellect.

From the foregoing it becomes clear why understanding is hard to explain yet easy to feel; why it is possible to lay out the logical constructions that accompany understanding, but very difficult to reproduce the whole basis that led to the understanding itself. (There will be a whole section on this below.) From here it also becomes clear what a feeling is, what the sensation of a state is, and why feelings and sensations are hard to express but easy to feel. In general, there are many interesting consequences here; those who are interested should look in the direction of the quantum properties of consciousness.

Another question is how exactly it turns out that we are aware of ourselves, are aware of the world around us? Will thinking machines achieve such awareness? This fundamental philosophical question falls outside the scope of artificial intelligence, but we will still try to answer it in the next part.

Continuing the thought about differences in complexity and understanding, we come to the conclusion that a super-complex internal representation will, in the end, be able to generate very fine, one might say sharp, highly harmonized facets of the internal representation of things; in other words, idealizations, or abstractions, of the original things.

These abstractions owe their birth to the repeated combination of a huge number of internal processes, conflicting and cooperating. But unlike the results of the interaction of most internal processes, the result for abstractions is not blurry (wide) in nature; it is gathered, as it were, into a point, into one or a few very sharp facets or peaks.

Naturally, abstractions are generated, among other things, by repeated observation of the manifestations of their real prototypes and by repeated reflection involving the internal representation of objects. The multiplicity of these observations and reflections is certainly higher than for other objects that do not yield abstractions, and the rules that give the internal representation of abstractions are certainly more ordered in form, better suited to peak-like harmonization.

The next step is that such peak harmonizations become able to unite into long chains, operating according to their own laws. Thus we get a new level of thinking: abstract, or logical. Naturally, this level is much more complicated than ordinary understanding, and not every creature endowed with understanding is capable of complex logical constructions.

Such abstract chains will live according to their own laws, in some respects resembling the original prototypes, in others moving away from them.

Pay attention to where the logical constructions come from. They are not at the algorithmic level, and not even at the next level, the level of understanding. They already constitute the third level of intelligence, a kind of understanding over understanding.

It remains to recall that in the process of logical constructions the brain only "pretends" to work like a dumb algorithm, like a primitive inference machine. In fact, the processes involved in logical constructions are many orders of magnitude more complex than the resulting logical constructions themselves, and it is thanks to this jump in complexity that the intellect manages to create new logical constructions and to judge their truth with a high degree of accuracy.

In the same way, the brain can emulate the operation of computers (Turing machines) by means of "heavyweight" processes, even though it appears to be following "lightweight" algorithms (especially if the brain is trained for such work).

And a little more about fine logical constructions and the writing of algorithms. To the uninitiated it may seem that when thinking about mathematical truths or composing computer programs, an enlightened sage sits down in a kind of meditation and, by means of correct reasoning, arrives at the right conclusions or comes up with the right program. In fact, what happens looks more like the following scheme.

  • Come up with "something", an initial version (or even generate it at random).
  • Check how this "something" works by emulating the reasoning, and remember the problem areas.
  • Try to improve the problem areas by trial and error (improve this "something").
  • Check the improved version, and so on.

After all, the brain is able not so much to generate correct logical chains on the fly as to check how those chains work by emulating the reasoning. Another thing is that the brain has a huge "library" of templates for different situations, plus a stack of plausible rules for combining these templates with one another. By applying these rules and templates, a small number of attempts is enough to build good logical reasoning and good programs. In particular, such rules may include diagnostics of how different combinations of templates behave, and they can and will be replenished dynamically, instead of re-running the logical constructions on the entire body of data every time.
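As an illustration of that scheme, and nothing more (the target formula, the tweak step and all names are assumptions of mine, not a model of how the brain actually works), here is a tiny "came up with something, checked it, improved the weak spots" loop in Python:

```python
# Generate-test-improve sketch: hill-climb a guessed linear formula until it
# reproduces all of the test cases. Illustrative only.
import random

# Hypothetical target behaviour the candidate must reproduce: y = 3*x + 7.
cases = [(x, 3 * x + 7) for x in range(-5, 6)]

def total_error(candidate):
    # "Emulate the reasoning": run the candidate on every case and measure misses.
    a, b = candidate
    return sum(abs(a * x + b - y) for x, y in cases)

candidate = (random.randint(-10, 10), random.randint(-10, 10))  # rough initial guess
while total_error(candidate) > 0:
    a, b = candidate
    # Try a small random tweak; keep it only if the checked version is no worse.
    tweak = (a + random.choice([-1, 0, 1]), b + random.choice([-1, 0, 1]))
    if total_error(tweak) <= total_error(candidate):
        candidate = tweak

print("found:", candidate)   # converges to (3, 7)
```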

This week you could read an extremely motivating story from a GeekBrains student who completed the profession track; in it he described one of the goals that led him into the profession: the desire to understand how game bots work and to learn to create them on his own.

And really, it was the desire to create a perfect artificial intelligence, whether a game model or a mobile program, that set many of us on the programmer's path. The problem is that behind the tons of educational material and the harsh reality of customers, that very desire has been replaced by a simple drive for self-development. For those who have not yet begun to fulfill their childhood dream, here is a short guide to creating a real artificial intelligence.

Stage 1. Disappointment

When we talk about creating even simple bots, our eyes light up and hundreds of ideas about what the bot should be able to do flash through our heads. However, when it comes to implementation, it turns out that the key to realistic behavior is mathematics. Yes, artificial intelligence is much more complicated than writing application programs: knowledge of software design alone is not enough.

Mathematics is the scientific base on which your further programming will be built. Without knowledge and understanding of this theory, all your ideas will quickly fall apart at the first interaction with a person, because artificial intelligence is, in essence, nothing more than a set of formulas.

Stage 2. Acceptance

When student textbooks have knocked the arrogance down a little, you can begin to practice. It is not worth throwing yourself at LISP or other exotic languages just yet; first get comfortable with the principles of AI design. Python is great both for quick learning and for further development: it is the language most often used for scientific purposes, and you will find many libraries for it that make the work easier.

Stage 3. Development

Now we turn directly to the theory of AI. AI systems can be conditionally divided into three categories:

  • Weak AI: the bots we see in computer games, or simple assistants like Siri. They either perform highly specialized tasks or combine a small set of them, and any unpredictable interaction puts them at a dead end.
  • Strong AI: machines whose intelligence is comparable to the human brain. To date there are no real representatives of this class, but computers like Watson are very close to achieving this goal.
  • Perfect AI: the future, a machine brain that will surpass our capabilities. It is about the danger of such developments that Stephen Hawking, Elon Musk and the Terminator movie franchise warn.

Naturally, you should start with the simplest bots. For this, recall the good old game of tic-tac-toe on a 3x3 field and try to work out the main principles of play for yourself: the probability of winning with error-free play, the most advantageous squares on the board to occupy, when to settle for a draw, and so on.

After a few dozen games and some analysis of your own moves, you will surely be able to pick out all the important aspects and translate them into machine code. If not, keep thinking; and this link is here just in case.

By the way, if you did take up Python, you can create a fairly simple bot by following this detailed manual. For other languages, such as C++ or Java, you can easily find step-by-step materials as well. Once you feel there is nothing supernatural behind creating AI, you can safely close the browser and start your own experiments.
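To give a concrete idea of the scale of that first bot, here is a compact sketch of my own (not the manual linked above): exhaustive minimax over a 3x3 board, which is enough to play tic-tac-toe without mistakes and force at least a draw.

```python
# Minimal tic-tac-toe bot: minimax search over all continuations of a position.
def winner(board):
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
             (0, 4, 8), (2, 4, 6)]                 # diagonals
    for a, b, c in lines:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    # Returns (score, move) from X's point of view: +1 win, 0 draw, -1 loss.
    win = winner(board)
    if win:
        return (1 if win == "X" else -1), None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None                              # board full: draw
    results = []
    for m in moves:
        board[m] = player                           # try the move
        score, _ = minimax(board, "O" if player == "X" else "X")
        board[m] = " "                              # undo it
        results.append((score, m))
    return max(results) if player == "X" else min(results)

board = list("X O      ")                           # example position, X to move
score, move = minimax(board, "X")
print("best move for X:", move, "expected outcome:", score)
```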

Stage 4. Excitement

Now that things have gotten off the ground, you probably want to create something more serious. The following resources will help you with this:

As you can tell from the names alone, these are APIs that will let you put together a semblance of serious AI without wasting time.

Stage 5. Work

Now that you understand quite clearly how to create AI and what to use, it is time to take your knowledge to a new level. Firstly, this will require studying the discipline called "machine learning". Secondly, you need to learn to work with the corresponding libraries of your chosen programming language; for the Python we are considering, these are Scikit-learn, NLTK, SciPy, PyBrain and NumPy. Thirdly, in development you cannot get away from . And most importantly, you can now read AI literature with a full understanding of the matter (a small Scikit-learn sketch follows the reading list below):

  • Artificial Intelligence for Games, Ian Millington;
  • Game Programming Patterns, Robert Nystrom;
  • AI Algorithms, Data Structures, and Idioms in Prolog, Lisp, and Java, George Luger, William Stubblefield;
  • Computational Cognitive Neuroscience, Randall O'Reilly, Yuko Munakata;
  • Artificial Intelligence: A Modern Approach, Stuart Russell, Peter Norvig.
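As promised above, here is a minimal Scikit-learn sketch, a generic first step of my own choosing rather than a program from any of these books: it trains a simple classifier on the Iris dataset that ships with the library and reports its accuracy.

```python
# Minimal "machine learning" example with Scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load a small built-in dataset and split it into training and test parts.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = KNeighborsClassifier(n_neighbors=3)   # simple k-nearest-neighbours classifier
model.fit(X_train, y_train)                   # training in one line

print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```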

And yes, all or almost all literature on this topic is presented in a foreign language, so if you want to create AI professionally, you need to bring your English up to a technical level. However, this is true for any area of programming, isn't it?