Wednesday, May 22, 2019

Solving the AI Control Problem: Transmit Its Mental Imagery and Inner Speech to a Television

The "AI control problem" is the issue of how to build a superintelligent artificial intelligence, while still being able to control it. It is important to be able to control it, because we want to be able to intervene before if it starts to plan a hostile takeover. I have written articles that explain how to do this, and I will lay out the general premise here. I call it the "Mental Imagery Visualization Model."

Most professionals in the field of artificial intelligence believe that the most promising form of AI is found in neural networks. The artificial neural network is a computer architecture for building learning machines. A neural network is composed of many nodes, or neurons, that communicate with each other to process inputs and create outputs. Neural networks are generally the most intelligent and best performing versions of AI today. However, the problem with neural networks is that they are a "black box." They are so complicated that, even in a very simple network, a human could never decipher the complex mathematics to determine what was on the network's mind. This is one reason why AI researchers are afraid that the AI systems of the future will be completely inscrutable, and that we will never know about its plans for world domination until it starts to act on them. However, I believe that I have a powerful precautionary safeguard to address this problem.

There are many technologies in use today that make it possible to take the outputs of a neural network and use them to formulate a picture or a video. These technologies include inverse networks, generative networks, Hopfield networks, self-organizing maps, and Kohonen networks. In this article and on this webpage I explain how to build a superintelligent AI system that implements my model of working memory. These sources also explain how to use the above technologies to create a audio/video output of the AI's consciousness... a clear view into its mind's eye.

If the contents of the AI's consciousness (its mental imagery and inner speech) are transmitted to a television, then people can watch exactly what is going on in its mind. In the article I explain that human reasoning is propelled by a constant back and forth interaction between association areas (prefrontal cortex, posterior parietal cortex) that hold working memory, and sensory brain areas (early visual and auditory cortex) that build maps of what is going on in working memory. These interactions are key to the progression of thought.

Think of something right now. Don't you see mental images? If I ask you to imagine a green hippopotamus on a unicycle, your early visual cortex will build a topographic map of exactly that. In fact, there is brain imaging technology today that can create pictures of people's mental imagery. It doesn't work so well yet, but it uses neural networks to do what it does. The technology for creating pictures of a neural network's activity is much more advanced, and neural networks are routinely used today for building topographic maps. Slap this tech on to your superintelligent AI, and it won't be able to hide anything from you.

In my architecture for AI, the generation of imagery maps is necessary for a cognitive cycle. In order to keep thinking and reasoning, the system must be building mental imagery. It is inherently obligated to create pictures and text to initiate and inform the next state of processing. It would be a simple addition to the network to capture its internally generated imagery and display it for humans to observe. In an advanced AI, this video stream may proceed very rapidly, but it could be recorded to an external memory drive and monitored by a team of people. You could have many people observing and interpreting various parts of this video feed, or you could also have another AI scanning it for contentious elements.  As they watch its inner eye and listen to its inner voice, they can determine if its intentions become malevolent and determine if its "kill switch" should be activated. With full insight into its mind's eye, it should be possible to discover and address a hidden agenda before the AI initiates a hostile takeover.

It would be important to ensure that all of the cognitive representations held coactive in the machine's working memory were included in the composite depiction built into its maps. This would make it an open book. This way the machine could not attempt to formulate thoughts that were not transduced into mental images. The sequence of maps that are made must be consistent with the aims, hopes, and motives. This is the case with the human brain. Imagine that you are in a room with someone and the only thing in the room is a knife. Complete access to the pictures they form in their brain, along with their subvocal speech would give you near certainty about everything from their plans to their impulses.

This kind of information could also help us to develop "friendly AI." Instead of rewarding and punishing an AI's behavior, we could use this video feed to reward and punish its intentions and impulses to bring its motivations in line with our own. It could also be used to alter the machine's motivations, intentions, and utility functions to bring them in line with human objectives. Just as in a human child, compassionate, prosocial, and positive behaviors and cognitions could be programmed and engineered into it after it has already been designed and implemented.

Without using this method it would be practically impossible to predict the intentions of a recursively self-improving artificial agent that was undergoing a rapid explosion in intelligence. Many researchers have come up with good reasons why sufficiently intelligent AI might veer off the friendly course. Steve Omohundro has advanced that an AI system will exhibit basic drives that will cause AI to exhibit undesired behavior, these include resource acquisition, self-preservation, and continuous self-improvement. Similarly, Alexander Wissner-Gross has said that AIs will be highly motivated to maximize future freedom of action, despite our wants and needs. Eliezer Yudkowsky has been quoted as saying, "The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else." Alexa Ryszard Michalski, a pioneer of machine learning, has emphasized that a machine mind is fundamentally unknowable and is therefore dangerous to humans. If the technology described above is properly implemented, the machine mind would not be unknowable, and would not necessarily be dangerous at all.

A LINK TO: My Article on AI and Working Memory

A LINK TO: My Webpage on My Architecture for AI

A diagram illustrating the reciprocal interactions between items held in working memory and sensory cortex in the brain. This would be recreated in an AI system.

A diagram illustrating how working memory interacting with sensory cortex that build mental imagery in the form of topographic maps creates a continuous narrative, a stream of thought, and progressive imagery modification.