Modern AI is capable of some fantastic
feats, yet is still very limited compared to the human mind. The disciplines of
machine learning and deep learning have shown us that with a powerful computer
and a bunch of neurons organized into a network we can throw easy psychological
problems at these networks and expect good answers. However, when researchers
try to construct more complex networks, to tackle more cognitively complex problems,
the networks fail to deliver. This is because they have not yet used a simple
trick that animals have been using for hundreds of millions of years: gradual
and progressive myelination.
As we progress from infancy through childhood, our brains undergo
various biological changes. These changes cause our level of analysis to slowly
progress from analyzing brief sensory experiences, to analyzing complex,
abstract scenarios. We begin our lives only being able to notice and attend to
interactions occurring on short time scales. By adulthood, with the prefrontal
cortex fully developed, we find ourselves able to follow interactions occurring
on long time scales. In order to develop the ability to think about complex
things we had to spend almost two decades gradually altering our brain’s
processing strategy. It is a scaffolding process where we focus on the simplest
things first, and use basic knowledge about them to advance incrementally to
more complex things. The fact that all humans, and mammals in general, do this
strongly suggests that it plays a role in the acquisition of advanced
intelligence. In this entry I will argue that this developmental process will
be instrumental in training superintelligent AI.
This gradual process of brain development is made possible by
myelination. Myelin is a fatty substance that wraps axons, the fibers that carry
signals between neurons. Vertebrate animals use it to speed up information transmission
between cells. The myelin increases the rate at which the electrical impulses
travel. But vertebrates aren’t born with all the myelin that they will need as
adults. Instead, myelin develops slowly in specific areas, one at a time. Once a
brain area has developed valuable, reliable, and consistent knowledge, the
connections formed by learning are solidified by the introduction of myelin.
The order of brain areas affected by myelin is broadly consistent across
mammals. The early sensory areas are the first cortical areas to develop
myelin. One of these, the primary visual area, starts to myelinate shortly
after birth as the infant gains visual experiences. These early visual areas
are responsible for basic visual perception and don’t rely on trial and error
interactions with the environment. Rather they involve responding to visual
stimuli that are presented simultaneously without any time delay between
appearances. This happens when you see a picture of a house; you generally see
the roof, windows, and door all at once without experiencing much of a time
delay between these stimuli.
The last areas to myelinate are the association cortices and the
prefrontal cortex (PFC). The PFC does not generally finish myelinating until
one reaches the age of 18 or older. This means that the PFC does not “trust”
that it has been wired up correctly until almost two decades into life, whereas
the visual system “trusts” that it has been wired correctly within the first
two years. This is because sensory stimuli are generally honest and all show
up at the same time, whereas complex events are constructed from stimuli that
are separated from each other by delays in time. Understanding the relationships
between events that are not simultaneous requires careful, logical inferences
about causality. For example, the sale of a house is an abstract concept that
involves parties, contracts, and delays that can last for weeks or months. This
is why children aren’t licensed to sell houses.
It takes time to learn to make complex inferences that involve
delays in time. It is probably the case that the process of myelination during
development involves the progressive accumulation of knowledge that supports
and buttresses more complex knowledge. In other words, as simple things are
mastered in early cortical areas they provide the basis for new learning in the
late cortical areas. In the same way, many brief, simple experiences create the
knowledge base needed to start understanding long, complex experiences with more
advanced probabilistic structures. The layers at the bottom of the hierarchy
must be trained before the higher layers can find regularities and statistical
structure within them. But as you can see in the diagram below, the top of the
hierarchy falls in the middle, between sensory input and motor output. To
properly train sensory input and motor output, it is imperative that they be connected
to each other, and able to interact with each other to drive behavior, long before
the association areas interposed between them are brought online.
Many AI researchers point out that the things that AI and neural
network systems today can accomplish are things that can generally be
accomplished by an adult human brain in under a second. This means that they
can only do things that we do unconsciously, such as near instantaneous pattern
recognition. Today’s AIs can recognize houses but could not recognize,
understand, or broker the sale of a house. What AI is able to do are the kinds
of things that we are able to do with our primary sensory and motor areas. This
is because they are designed like primary cortical areas. They do not feature
reciprocal interactions between various structures organized into a brain-like
hierarchy. Very few AI architectures exist today that connect primary areas
with association areas and a PFC. Those that do don’t use anything like the
process of myelination. Rather, in existing AI all of the areas, from the simple
to the advanced, come online at the same time. I think these systems should use
something analogous to the process of myelination because it would help them in
their acquisition of knowledge. If they did, here’s how they should go about
it:
First you would need a number of neural networks of pattern
recognizing nodes. These networks must take inputs from the environment, each
corresponding to a different sensory modality. These early networks must be
linked to one another. Then these would have to be linked together in a
hierarchy where unimodal networks form inputs to multimodal networks, which
then form inputs themselves to even more densely multimodal networks above
them. This “multimodal fusing” is depicted in the figure. The nodes of the
densely multimodal networks would be the association networks and at the top of
this hierarchy would be the PFC which would also be connected directly to the
early motor networks. The nodes of the association and PFC networks would
exhibit sustained firing. Importantly this sustained firing, the activity of
the association networks, and their influence over ongoing processing elsewhere
would start out extremely meager, and increase over time. These capacities
could be increased as the system exhibits proficiency at simple tasks, such as
object recognition, scene classification, and simple motor movements. As the
association areas are added to the system, a capacity to plan and to make higher-order
inferences and classifications could be expected.
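The staged schedule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an implementation of any existing system: the Area class, the step_maturation and gated_output functions, the 0.9 proficiency threshold, and the 0.25 step size are all hypothetical names and values chosen for the example. Each area has a maturity between 0 and 1 that scales its influence, and an area is allowed to mature only after every area below it in the hierarchy is fully mature and proficient.

```python
from dataclasses import dataclass


@dataclass
class Area:
    name: str
    level: int              # 0 = unimodal sensory/motor; higher = more multimodal
    maturity: float = 0.0   # 0.0 = no influence on processing, 1.0 = full influence


def step_maturation(areas, proficiency, threshold=0.9, step=0.25):
    """Advance maturity by one developmental step (hypothetical rule).

    An area may mature only when every lower-level area is fully mature and
    the task score for that lower level (proficiency[level], in [0, 1]) has
    reached `threshold`. Level-0 areas have nothing below them, so they
    start maturing immediately.
    """
    # Decide which areas are ready from the current state before changing anything.
    ready = [
        area for area in areas
        if all(
            lower.maturity >= 1.0
            and proficiency.get(lower.level, 0.0) >= threshold
            for lower in areas
            if lower.level < area.level
        )
    ]
    for area in ready:
        area.maturity = min(1.0, area.maturity + step)


def gated_output(area, activation):
    """Scale an area's contribution to downstream processing by its maturity."""
    return area.maturity * activation


areas = [
    Area("vision", 0),
    Area("hearing", 0),
    Area("association", 1),
    Area("pfc", 2),
]

# Only the sensory level has demonstrated proficiency so far.
for _ in range(8):
    step_maturation(areas, {0: 0.95})

for area in areas:
    print(area.name, area.maturity)
```

In this run the two sensory areas mature first, the association area follows once they are done, and the PFC stays at zero because no proficiency score has been reported for the association level yet. A real system would need a much richer notion of proficiency than a single number per level.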
One important concept that I haven’t explained yet is that the
first areas to myelinate in the brain, the sensory areas, have neurons of a
single modality (e.g. either vision or hearing) that fire for short durations.
The association areas and the PFC, on the other hand, have multimodal neurons
(e.g. both vision and hearing) that fire for long durations. In an AI system, as in the mammalian brain (Huttenlocher
& Dabholkar, 1997), sensory areas should mature (myelinate) early in
development, and association areas should mature late. This will cause the capacity
for sustained firing to start low but increase over developmental time.
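One simple way to picture a capacity for sustained firing that starts low and grows is a unit whose activity carries over from one time step to the next by an adjustable factor. The sketch below is a toy model under assumptions of my own choosing: the function names, the linear ramp, the maximum persistence of 0.5, and the activity floor of 0.1 are all hypothetical and are not taken from the neuroscience literature.

```python
def persistence_at(t, total_steps, max_persistence=0.5):
    """Linear ramp from 0 to max_persistence over total_steps of development."""
    fraction = min(1.0, max(0.0, t / total_steps))
    return max_persistence * fraction


def sustained_trace(inputs, persistence):
    """Activity of one unit; `persistence` in [0, 1) is how much carries over."""
    activity = 0.0
    trace = []
    for x in inputs:
        # A new input drives the unit; otherwise the old activity decays.
        activity = max(x, persistence * activity)
        trace.append(activity)
    return trace


def steps_active(trace, floor=0.1):
    """Count the time steps on which the unit's activity is at least `floor`."""
    return sum(1 for a in trace if a >= floor)


# A single brief stimulus followed by silence.
pulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

early = sustained_trace(pulse, persistence_at(0, 100))    # start of development
late = sustained_trace(pulse, persistence_at(100, 100))   # end of development

print(steps_active(early), steps_active(late))  # prints: 1 4
```

Early in development the unit responds only while the stimulus is present. Late in development the same brief stimulus keeps the unit active for several steps afterward, which is the property that would let it overlap with a later stimulus.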
Postponing the initialization of association
networks in this way would allow the formation of low-order associations between
causally linked events that typically occur close together in time. This would
focus the system on easy-to-predict aspects of its reality (e.g. correlations
between occurrences in close temporal proximity). The consequent learning would
erect a reliable scaffolding of highly probable associations that can be used
to substantiate higher-order, time-delayed associations later in development
(Reser, 2016). In other words, the rate of iterative updating from one state to
the next (Fig. 9) would start very high. This rate would then fall over the course
of weeks to years as an increasing capacity for working memory is folded
into the system.
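The link between the updating rate and the kinds of associations a system can form can be shown with a toy working memory. This is a sketch under simplifying assumptions, and the coactive_pairs function and its retain parameter are hypothetical: at every step the memory keeps only its `retain` most recent items and adds one new item, so a small `retain` means a high rate of updating. Two items can become associated only if they are ever held at the same time.

```python
from itertools import combinations


def coactive_pairs(stream, retain):
    """Return every (earlier, later) pair of items that were held together.

    `retain` is how many items survive from one state to the next. With
    retain=0 the whole state is replaced at each step (fastest updating).
    """
    state = []
    pairs = set()
    for item in stream:
        # Keep the `retain` most recent items; everything older is dropped.
        kept = state[-retain:] if retain > 0 else []
        state = kept + [item]
        pairs.update(combinations(state, 2))
    return pairs


events = "ABCD"  # four events arriving one after another

print(sorted(coactive_pairs(events, 0)))  # no item ever overlaps with another
print(sorted(coactive_pairs(events, 1)))  # only immediate neighbors overlap
print(sorted(coactive_pairs(events, 2)))  # pairs one step apart also overlap
```

With retain=1 the system can only link events that follow each other directly. With retain=2 it can also link A with C and B with D, which are the time-delayed associations that the later-maturing networks would be responsible for.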
Nature has found that it doesn’t pay to let the multimodal neurons capable of sustained firing come online until the basics are learned. I strongly suspect that AI network engineers will find this too. For the sake of progress I just hope that this myelination/development feature is implemented and perfected sooner rather than later. Given the rapid processing in computers, and the sheer amount of data available to them, I don’t think that this process will take 18 years in an AI as it does in a human. But I strongly believe that it is necessary for any developing thinker to start with the elementary inferences first.
An article that I wrote, linked below, explains this in more detail.
https://www.sciencedirect.com/science/article/pii/S0031938416308289
Here is an excerpt from that article.
"Due to their sustained activity, neurons in the PFC can span a wide delay time or input lag between associated occurrences [35], [89] and thereby allow elements of prior events to become coactive with elements of subsequent events. Sustained activity allows neurons that would otherwise never fire together to both fire and wire together, and also allows features that never co-occur in the environment to be present together in topographic imagery. Thus, it may be reasonable to assume that SSC underlies the brain's ability to make internally derived associations between representations that never occur simultaneously in the environment. The longer sustained firing in association cortex lasts, the better the animal will be at capturing information about causally linked stimuli that present apart in time. The longer the sustained firing, the longer the delay can be. The same regularity may happen persistently in the environment, where a stimulus is followed several seconds later by another stimulus, concern, or opportunity; however, if the animal lacks sufficient sustained firing, this statistical regularity will not be captured by the neocortical system because the ensembles for them will never be exposed to each other.
Few if any mammals have evolved a human-like capacity for sustained firing in PFC neurons, and thus the mental lives of most mammals likely involve associations made between temporally proximate stimuli and concepts. This may suggest that in most ecological niches it is not helpful to create memories for relationships between stimuli that occur in delayed succession and instead it is better to focus on analyzing stimuli that present in quick succession [68], [72]. There may therefore be two strategies, on opposite ends of a continuum, for holding recent information active: immediate and delayed succession strategies. The delayed succession strategy, involving high sustained firing and a low rate of working memory updating, is optimal for environmental scenarios that are prolonged over time, where temporally distant cues may retain contextual relevance. This strategy is likely associated with certain ecological or life-history conditions such as low extrinsic mortality, intergenerational resource flows, meme transference, and the K-selection strategy in general.
How can the brain trust that an association between two concepts that are removed in time and never co-occur simultaneously in the environment is valid? Each of the contents of working memory contribute to the selection of the next addition to working memory, and this may help to ensure that the contents held in working memory at any moment are veridically concordant rather than incongruous. This is because the system is narrowly constrained to only combining ensembles that have been highly associated in the past. If this is true, it suggests that at an early age the first associations are between stimuli that are nearly simultaneous, but that these can create foundational knowledge upon which to base reliable inferences about associations between stimuli that are removed from each other by a delay in time.
Because the frontal lobes of infants are underdeveloped, their brains probably exhibit far less continuity between brain states. Very young children can trust the connections that their early sensory areas have made concerning the spatiotemporal associations between near simultaneous features because these events show high order and regularity. This may be why sensory areas myelinate so early in life. Perhaps association areas are programmed genetically not to finish myelinating until early adulthood because it is a time-intensive process to form and test higher-order hypotheses about relationships between constructs that are more distributed through time."