Wednesday, May 22, 2019

Solving the AI Control Problem: Transmit Its Mental Imagery and Inner Speech to a Television Monitor

The "AI control problem" is the problem of how to build a superintelligent artificial intelligence while still being able to control it. Maintaining control is important because we want to be able to intervene if a superintelligent computer starts to plan a hostile takeover. I have written articles explaining how to do this, and I will lay out the general premise here. I call it the "Imagery Visualization Model." The method transduces the AI's mental imagery and inner voice into a format that humans can watch and listen to. If you can see what the network is thinking, then anything it tries to plan will be as clear as day.

In a nutshell, the idea is to link the AI's memory and processing system to a second system that creates maps depicting what is going on in the first system at each of its processing states. In fact, this is what goes on in the mammalian brain: our cortical sensory areas continually create topographic maps of whatever our mind turns to. Recent research has shown that activity patterns in visual cortex can be read out so that what a person is imagining can be displayed on a screen. Interestingly, the brain-imaging technology that reconstructs people's mental imagery today itself relies on neural networks. That research is still in its early stages, but the same is not true of computers: there are countless neural network implementations that do exactly this. They are generative systems, and they generate pictures or video to match what is going on in the rest of the network. Connecting a generative system like this to existing nodes in an AI's memory network would make the AI an open book.
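
As a rough illustration of this linkage, here is a minimal sketch in PyTorch (all names are hypothetical, and this is not an implementation of any existing system): a small generative "monitor" decoder is attached to an agent's hidden state and renders each processing state as an image.

import torch
import torch.nn as nn

class Agent(nn.Module):
    """Toy agent whose hidden state we want to visualize."""
    def __init__(self, in_dim=64, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())

    def forward(self, x):
        return self.encoder(x)          # the hidden state h

class MonitorDecoder(nn.Module):
    """Generative decoder: hidden state -> 32x32 grayscale 'mental image'."""
    def __init__(self, hidden_dim=128):
        super().__init__()
        self.decode = nn.Sequential(nn.Linear(hidden_dim, 32 * 32), nn.Sigmoid())

    def forward(self, h):
        return self.decode(h).view(-1, 1, 32, 32)

agent, monitor = Agent(), MonitorDecoder()
h = agent(torch.randn(1, 64))           # one processing state
frame = monitor(h.detach())             # detach: the tap only reads, never writes
print(frame.shape)                      # torch.Size([1, 1, 32, 32])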

Most professionals in the field of artificial intelligence believe that the most promising form of AI is the neural network. An artificial neural network is a computer architecture for building learning machines. It is composed of many nodes, or neurons, that communicate with each other to process inputs and produce outputs. Neural networks are generally the most capable and best-performing AI systems today. The problem with neural networks, however, is that they are a "black box." They are so complicated, and their representations are so distributed, that even in a very simple network no human could work through the arithmetic to determine why the system behaved the way it did. This is one reason why AI researchers fear that the AI systems of the future will be completely inscrutable, and that we will never know about their plans for world domination until they start to act on them. I believe the present method would create complete transparency, and would amount to a powerful precautionary safeguard.
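
To see why even a tiny network resists inspection, consider this toy NumPy example (purely illustrative): the weights of a two-layer network are individually meaningless, yet together they fully determine every output.

import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 2))

def forward(x):
    return np.maximum(x @ W1, 0) @ W2   # ReLU hidden layer, linear output

y = forward(np.array([1.0, 0.0, -1.0, 0.5]))
print(W1)   # 32 numbers with no human-readable meaning on their own
print(y)    # yet collectively they fix this output exactly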

[Figure: the three outputs of a two-layer neural network being transmitted to a television monitor.]
The picture above shows the three outputs of a two-layer neural network being sent to a television, and this, of course, is an oversimplification. Correctly implementing the method would require a hierarchical, biomimetic system composed of many interconnected multilayer neural networks of pattern-recognizing nodes. The multiple interfacing networks would be arranged in an architecture similar to the mammalian neocortex, with auditory and visual modules at the bottom of the hierarchy. These sensory modules would develop maps of incoming sensory input, but also maps of internally generated representations.
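
A minimal PyTorch sketch of this arrangement, with all module names hypothetical: visual and auditory modules sit at the bottom of the hierarchy and feed a shared association layer, and each sensory module can render either external input or internally generated (top-down) representations as a map.

import torch
import torch.nn as nn

class SensoryModule(nn.Module):
    def __init__(self, in_dim, map_dim=256):
        super().__init__()
        self.bottom_up = nn.Linear(in_dim, map_dim)   # external input -> map
        self.top_down = nn.Linear(512, map_dim)       # internal state -> map

    def forward(self, x=None, internal=None):
        if x is not None:
            return torch.relu(self.bottom_up(x))
        return torch.relu(self.top_down(internal))    # internally generated map

class Hierarchy(nn.Module):
    def __init__(self):
        super().__init__()
        self.visual = SensoryModule(in_dim=1024)
        self.auditory = SensoryModule(in_dim=128)
        self.association = nn.Linear(256 + 256, 512)  # integrates both maps

    def forward(self, image, sound):
        v, a = self.visual(x=image), self.auditory(x=sound)
        state = torch.relu(self.association(torch.cat([v, a], dim=-1)))
        # Top-down pass: the association state is re-rendered as sensory maps.
        return self.visual(internal=state), self.auditory(internal=state)

net = Hierarchy()
vmap, amap = net(torch.randn(1, 1024), torch.randn(1, 128))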

There are many technologies in use today that make it possible to take outputs from a neural network and use them to formulate a picture or video. These include inverse networks, generative networks, Hopfield networks, and self-organizing (Kohonen) maps. In this article and on this webpage I explain how to build a superintelligent AI system that implements my model of working memory. These sources also explain how to use the above technologies to create an audio/video output of the AI's consciousness... a clear view into its mind's eye.
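
As a concrete example of the last of these technologies, here is a minimal self-organizing (Kohonen) map in NumPy; it arranges high-dimensional hidden activations on a 2-D grid that can be viewed as a picture. This is a generic textbook SOM, not code from the article.

import numpy as np

grid, dim = (10, 10), 128                 # 10x10 map of 128-d activations
rng = np.random.default_rng(0)
weights = rng.normal(size=grid + (dim,))  # one prototype vector per grid cell

def train_step(x, lr=0.1, sigma=1.5):
    # Best-matching unit: the grid cell whose prototype is closest to x.
    d = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(np.argmin(d), grid)
    # Pull the BMU and its grid neighbors toward x.
    rows, cols = np.indices(grid)
    dist2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
    h = np.exp(-dist2 / (2 * sigma ** 2))[..., None]
    weights += lr * h * (x - weights)
    return bmu

for _ in range(1000):                     # toy stand-in for hidden activations
    train_step(rng.normal(size=dim))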

If the contents of the AI's consciousness (its mental imagery and inner speech) were transmitted to a television monitor, people could watch exactly what is going on in its mind. In the article I explain that human reasoning is propelled by a constant back-and-forth interaction between association areas (prefrontal cortex, posterior parietal cortex), which hold working memory, and sensory areas (early visual and auditory cortex), which build maps of what is going on in working memory. These interactions are key to the progression of thought, in part because each map introduces new informational content for the next iterative cycle of working memory.

Think of something right now. Don't you see mental images? If I ask you to imagine a green hippopotamus on a unicycle, your early visual cortex will automatically build a topographic map of exactly that. But it will also add new specifications and draw the hippo the way it "wants" to, filling in the blanks (it might be smiling or pedaling; it might be a silhouette or a block of clay). This new content, added by unconscious sensory cortex, will in turn affect the way you think about the hippo, and where your mind turns next.


In my architecture for AI, the generation of imagery maps is necessary for the cognitive cycle. In order to keep thinking and reasoning, the system must keep building mental imagery; it is inherently obligated to create pictures and text to initiate and inform the next state of processing. It would be a simple addition to such a network to capture its internally generated imagery and display it for humans to observe.
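
A minimal sketch of this obligatory cycle (function names hypothetical): every iteration of working memory passes through an imagery map, and those same maps are what get captured as the observable stream.

import numpy as np

def cognitive_cycle(working_memory, render_map, extract_features, steps=10):
    """working_memory: current state; render_map: state -> imagery map;
    extract_features: map -> new content folded into the next state."""
    frames = []
    state = working_memory
    for _ in range(steps):
        image = render_map(state)               # imagery is built every cycle...
        frames.append(image)                    # ...and captured for the monitor
        state = extract_features(image, state)  # the map informs the next state
    return state, frames                        # frames = the observable stream

state, frames = cognitive_cycle(
    working_memory=np.zeros(8),
    render_map=lambda s: np.tanh(s + 0.1),          # toy map builder
    extract_features=lambda img, s: 0.9 * s + img,  # toy update rule
)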


In an advanced AI this video stream might proceed very rapidly, but it could be recorded to an external memory drive and monitored by a team of people. You could have many people observing and interpreting different parts of the video feed, or you could have another AI scanning it for contentious elements. As they watch the machine's inner eye and listen to its inner voice, the monitors could determine whether its intentions are becoming malevolent and whether its "kill switch" should be activated. With full insight into its mind's eye, it should be possible to discover and address a hidden agenda. Slap this tech onto your superintelligent AI, and it won't be able to hide anything from you.
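
A minimal sketch of such a monitoring loop (names hypothetical): every frame of the imagery stream is archived, a separate screening model scores it, and the kill switch trips when a score crosses a threshold.

def monitor_stream(frames, screen, threshold=0.9, log=None):
    """frames: iterable of imagery frames; screen: frame -> risk score
    in [0, 1] from a separate screening model; returns True if tripped."""
    for i, frame in enumerate(frames):
        if log is not None:
            log.append(frame)              # archive to external storage
        if screen(frame) > threshold:      # contentious content detected
            print(f"kill switch tripped at frame {i}")
            return True
    return False

archive = []
tripped = monitor_stream(
    frames=[0.1, 0.2, 0.95, 0.3],          # toy "frames" stand in for images
    screen=lambda f: f,                    # stand-in screening model
    log=archive,
)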

The machine should not be able to consciously alter or manipulate its maps in order to deceive us. To prevent this, the connections linking the subsystems would have to be fundamental and unalterable. 
It would also be important to ensure that all of the cognitive representations held coactive in the machine's working memory are included in the composite depiction built into its maps. That way, the machine could not formulate thoughts that were not transduced into mental images; the sequence of maps it makes must be consistent with its aims, hopes, and motives. This is the case in the human brain. Imagine that you are in a room with someone, and the only thing in the room is a knife. Complete access to the mental imagery forming in that person's brain, along with all of their subvocal speech, would give you near certainty about everything from their plans to their impulses. The machine's mental imagery could likewise be streamed on websites so that any scientist, or any member of the public, could watch it and monitor it for questionable content.
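
One way to make the tap itself tamper-resistant, sketched in PyTorch under the assumption that the monitor is a separate decoder module: freeze the decoder's weights and sever the gradient path, so no training signal inside the agent can learn to distort what is displayed.

import torch
import torch.nn as nn

def tap(hidden_state, monitor):
    # Freeze the decoder so no optimizer can ever update it...
    for p in monitor.parameters():
        p.requires_grad_(False)
    # ...and sever the gradient path, so the agent's training signal
    # cannot shape its hidden states to manipulate the displayed maps.
    with torch.no_grad():
        return monitor(hidden_state.detach())

frame = tap(torch.randn(1, 128), nn.Linear(128, 32 * 32))  # stand-in decoder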

This kind of information could also help us to develop "friendly AI." Instead of rewarding and punishing an AI's behavior, we could use the video feed to reward and punish its intentions and impulses, bringing its motivations in line with our own objectives. It could also be used to alter the machine's motivations and utility function. Just as in a human child, compassionate, prosocial, and positive behaviors and cognitions could be engineered into it after it has already been designed and implemented.
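
A minimal sketch of this idea (names hypothetical): a prosocial score computed over the decoded imagery stream is blended into the ordinary task reward before the agent's update step.

def shaped_reward(task_reward, frames, prosocial_score, weight=0.5):
    """prosocial_score: frame -> value in [-1, 1], +1 for benign imagery.
    The mean intention score over the stream is blended into the reward."""
    intention = sum(prosocial_score(f) for f in frames) / max(len(frames), 1)
    return task_reward + weight * intention

r = shaped_reward(1.0, frames=[0.2, 0.8], prosocial_score=lambda f: 1 - 2 * f)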

Without this method, it would be practically impossible to predict the intentions of a recursively self-improving artificial agent undergoing a rapid explosion in intelligence. Many researchers have offered good reasons why a sufficiently intelligent AI might veer off the friendly course. Steve Omohundro has argued that AI systems will exhibit basic drives that lead to undesired behavior, including resource acquisition, self-preservation, and continuous self-improvement. Similarly, Alexander Wissner-Gross has suggested that AIs will be highly motivated to maximize their future freedom of action, regardless of our wants and needs. Eliezer Yudkowsky has been quoted as saying, "The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else." Ryszard Michalski, a pioneer of machine learning, emphasized that a machine mind is fundamentally unknowable and therefore dangerous to humans. If the technology described above is properly implemented, the machine mind would not be unknowable, and would not necessarily be dangerous at all.





A diagram illustrating the reciprocal interactions between items held in working memory and sensory cortex in the brain; this would be recreated in an AI system. The diagram involves transformations of information between lower-order sensory maps and higher-order association-area ensembles during internally generated thought. Sensory areas can create only one topographic map at a time, whereas association areas hold the salient or goal-relevant features of several sequential maps at the same time.



A diagram illustrating how a working memory system, interacting with a sensory cortex that builds mental imagery in the form of topographic maps, creates a continuous narrative, a stream of thought, and progressive imagery modification. 1) Representations B, C, D, and E, held active in association areas, all spread their activation energy to early visual cortex, where a composite image is built based on prior experience with these representations. 2) Features involved in the topographic imagery from step 1 converge on the PFC neurons responsible for F; B drops out of activation, and C, D, E, and F remain active and diverge back onto visual cortex. 3) The same process leads to G being activated and D being deactivated.


A LINK TO: My Article on AI and Working Memory


A LINK TO: My Webpage on My Architecture for AI