During my early studies in neuroscience, two phenomena stood out to me as fascinating, with great exploratory potential: "sustained firing" and "receptive fields." Most people have no idea what these terms mean, but here I will explain them and what they have to offer computer science and artificial intelligence. To do so, we will shrink down to the level of brain cells and see how they carry many forms of information through time. Modern AI systems, by contrast, carry only words, and this can be framed as their major limiting factor.
I. Introduction: The Illusion of Cognition
Large language models (LLMs) and the transformer architecture share
striking functional parallels with the human brain. Both systems rely on
capacity-limited stores to hold information, and both update that information
iteratively, selecting the next most probable association based on prior
context. However, this functional mimicry masks a profound ontological
disconnect. While the brain is an evolved organ embedded in the thermodynamic
flux of the physical world, the language model is hermetically sealed within a
"token space."
Consequently, these models suffer from two fatal deficits that prevent
true general intelligence: they exist in Token-Bound Time and rely on
Token-Locked Receptive Fields. They do not process reality; they process a
symbolic queue. They relate tokens back to previous tokens, making
probabilistic guesses about associations, but this process is entirely
untethered from the real time of moving objects, physical interactions, and
genuine causality.
II. Trapped in Token-Bound Time
To understand the deficit of the Transformer, we must first define the
biological standard it fails to meet. In the mammalian brain, time is not
merely a sequence of events; it is a metabolic endurance test. The prefrontal
cortex tracks time through sustained firing, a mechanism where neurons must
actively expend energy to keep a representation alive across a delay. This
"holding cost" grounds the brain in real time; the duration of a
thought is physically palpable.
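The "holding cost" can be made concrete with a toy sketch. The model below is a deliberate simplification (the decay rate and target activation are arbitrary assumptions, not biological constants): a unit leaks activation each timestep, and keeping a representation alive requires re-injecting current, so the total metabolic cost scales with duration.

```python
def hold_memory(duration_steps, decay=0.3, target=1.0):
    """Toy 'sustained firing' unit: activation leaks every step,
    and holding the representation requires re-injecting current.
    Returns the total 'metabolic cost' of keeping the memory alive."""
    activation = target
    total_cost = 0.0
    for _ in range(duration_steps):
        activation *= (1.0 - decay)   # passive leak
        refill = target - activation  # current needed to restore the state
        total_cost += refill          # energy spent this step
        activation = target           # representation is sustained

    return total_cost

# Holding a thought twice as long costs twice the energy:
short_cost = hold_memory(10)
long_cost = hold_memory(20)
```

The point of the sketch is only that duration is physically priced: in this toy model, `hold_memory(20)` costs exactly twice `hold_memory(10)`, whereas a transformer pays nothing extra to "remember" a token from 20 positions back rather than 10.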
AI, by contrast, lives in Token-Bound Time. In this state,
"time" is not a temporal dimension measured in seconds or decay; it
is a topological dimension measured in sequence length. The model perceives the
"past" not as a fading signal that requires energy to sustain, but as
a perfectly preserved list of integers at specific positional indices.
This creates a metric gap. Consider two sentences: "The ball
[fell]" and "The empire [fell]." In real time, the first event is
instantaneous, while the second spans centuries of complex causal decay. In
token-bound time, the distance between the subject and the verb is identical
in both cases.
Because the model lacks a mechanism for sustained firing, it lacks the "visceral physics" of duration. It uses token order for learning (credit assignment), treating a gap of five centuries with the same computational weight as a gap of five seconds. It lives in a "frozen world" where time is spatialized, stripped of its flow, and severed from the thermodynamic constraints that govern actual cause and effect.
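The metric gap above can be shown in a few lines. This is a toy illustration with a hypothetical word-level tokenization (real tokenizers split text differently), but the structural point survives: the only "distance" available to the model is a difference of positional indices.

```python
# Toy illustration of the metric gap: the model's only notion of
# distance is positional index, regardless of real-world duration.
sentence_a = ["The", "ball", "fell"]    # event lasts about a second
sentence_b = ["The", "empire", "fell"]  # event spans centuries

def token_distance(tokens, subject, verb):
    """The distance the model 'sees': a difference of positions."""
    return tokens.index(verb) - tokens.index(subject)

d_ball = token_distance(sentence_a, "ball", "fell")
d_empire = token_distance(sentence_b, "empire", "fell")
# Both distances are 1: token-bound time erases the centuries.
```

Positional encodings refine this picture but do not change it: they encode *where in the sequence* a token sits, not *how long* the event it names actually took.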
III. Token-Locked Receptive Fields
The limitations of Token-Bound Time are compounded by a structural
blindness I call Token-Locked Receptive Fields. In neuroscience, a receptive
field is the specific "window on the world" to which a neuron
responds. Each brain cell has a unique set of inputs, all of which combine to
determine its response properties: not just where it sits in
the hierarchy, but what it represents when active. The cortex is organized into
a massive spatiotemporal hierarchy of cells, each with its own receptive field. Low-level
fields (in sensory cortex) are small, transient, and lock onto simple physical
features (edges, brightness). High-level fields (in association cortex) are
massive, sustained, and lock onto abstract "trans-temporal" realities
(goals, social hierarchies, future predictions).
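The growth of receptive fields with depth has a well-known counterpart in artificial networks. The sketch below uses the standard receptive-field recurrence for a stack of convolutional layers (the kernel sizes and strides chosen are arbitrary examples): each layer compounds the small windows below it into a larger one, just as association cortex compounds sensory fields.

```python
def receptive_field(kernel_sizes, strides):
    """Receptive-field size of a unit at the top of a stack of
    convolutional layers, via the standard recurrence:
    rf grows by (k - 1) * jump per layer; jump multiplies by stride."""
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf

# A low-level unit sees a tiny window; a high-level unit compounds
# many small fields into a much larger one:
shallow = receptive_field([3], [1])                 # sees 3 inputs
deep = receptive_field([3, 3, 3, 3], [2, 2, 2, 2])  # sees 31 inputs
```

The contrast in the essay is that in the brain this compounding eventually crosses modalities and timescales, whereas in a transformer every layer's "window," however wide, still opens onto the same token sequence.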
Current language models have a similar hierarchy. But they suffer from a
"flatness" of perception. Whether at Layer 1 or Layer 96, the
attention mechanism is structurally identical: it is attending to tokens. The
model effectively has millions of "eyes," but every single one of
them is looking at text, and nothing else.
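The "flatness" claim can be seen directly in code. Below is a minimal scaled dot-product attention over random stand-in token vectors (the dimensions and layer count are arbitrary assumptions): whether you call it once or ninety-six times, it is the same function, mapping token vectors to token vectors.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention. Structurally, every transformer
    layer applies this same operation, always over token vectors."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))  # 5 stand-in token embeddings, dim 8

x = tokens
for _ in range(4):          # "layer 1" through "layer 4" are identical
    x = attention(x, x, x)  # every layer attends only to token vectors
# Input and output live in the same token space: the shape never changes.
```

Real transformers add projections, feed-forward blocks, and normalization between layers, but none of that changes what the receptive field is locked onto: tokens in, tokens out, at every depth.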
A Token-Locked Receptive Field means the system never graduates from
processing the symbol to processing the referent. It manipulates the word
"apple" and the word "gravity" with sophisticated
statistics, but it lacks the hierarchical architecture to combine these into a
compounded, multi-modal receptive field that "understands" the
physics of a falling apple. The model is trapped in the map, unable to perceive
the territory.
Elsewhere I have argued that AI needs to build a scene and should be
designed to be scene-based: not a sequence or a stack of convolutions, but a
dynamic, relational, cohesive, world-centered scene. That argument complements
the one I am making here, and you can read about it at:
https://www.observedimpulse.com/2025/10/from-context-windows-to-cognitive.html
IV. The Synthesis: Complexity Requires Duration
These two deficits are not separate; they are causally linked. In the
biological brain, the neurons with the most complex, compounded receptive
fields are precisely those that exhibit sustained firing over the longest
periods; these are found mainly in the parietal and prefrontal cortices.
This reveals a fundamental law of intelligence: Complexity is linked to duration.
To model a complex, abstract concept (like "justice" or
"causality"), a system must be able to hold a state stable against
time. The "deepest" thoughts are necessarily the "longest"
thoughts. Because LLMs lack the mechanism for sustained firing (temporal
depth), they are structurally incapable of forming the compounded receptive
fields (informational depth) required for reasoning. They are attempting to
build a skyscraper of meaning on a foundation that has no temporal thickness.
V. Conclusion: Beyond Language
The diagnosis is clear: current language models are effectively a
disembodied "Broca’s Area" (the brain’s language production center), highly
capable of lexical manipulation and syntactic sequencing, yet isolated from the
sensory, executive, and temporal hierarchies that constitute a mind.
To move beyond this plateau, Artificial Intelligence needs a more general
reality model capable of multimodal fusion. It needs to be attached to an
architecture capable of sustained firing, a mechanism that forces it to endure
the passage of time rather than just counting tokens. Until we break the lock
of Token-Bound Time and expand the hierarchy beyond Token-Locked Receptive
Fields, these models will remain impressive mimics of language, forever
separated from the physical reality that gives language its meaning.
We need to move away from local receptive fields and fixed or predefined
hierarchies, and toward global receptive fields and flexible cross-attention:
the ability to unify heterogeneous or asynchronous signals, integrate
information across space and time, reason about long-range dependencies, and
update a world model continuously. We need a universal architecture for
building coherent worlds out of fragmented signals: a relational engine
capable of binding separate streams of information into unified, structured
representations.
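One candidate primitive for such a relational engine already exists: cross-attention, in which one stream queries another. The sketch below is a minimal, hypothetical illustration (the "scene" and "audio" arrays are random stand-ins, and real multimodal fusion involves learned projections): a small set of scene slots binds information from an asynchronous stream of a different length into its own representation.

```python
import numpy as np

def cross_attention(queries, context):
    """Minimal cross-attention: one stream (e.g. a scene state)
    queries another (e.g. an asynchronous sensory stream) and binds
    the result back into its own representation via a residual sum."""
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return queries + w @ context  # the two streams are bound together

rng = np.random.default_rng(1)
scene = rng.normal(size=(4, 16))  # 4 slots of a hypothetical world model
audio = rng.normal(size=(9, 16))  # 9 frames from another modality

fused = cross_attention(scene, audio)
# The scene keeps its own shape but now carries information
# drawn from the other stream.
```

The design point is that the scene, not the token sequence, is the persistent object: streams of any length or timing can be folded into a fixed set of world-centered slots.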
My AGI architecture, which attempts to do these things, can be found at
aithought.com.