Monday, December 1, 2025

Language Models are Trapped in Token-Bound Time with Token-Locked Receptive Fields

During my early studies in neuroscience, two phenomena stood out to me as fascinating, with great exploratory potential: "sustained firing" and "receptive fields." Most people have no idea what these terms mean, but here I will explain them and what they have to offer computer science and artificial intelligence. To do so, we will shrink down to the level of brain cells and see how they carry many forms of information through time, and how modern AI systems, by contrast, carry only words. That constraint, I will argue, is their major limiting factor.

I. Introduction: The Illusion of Cognition

Large language models (LLMs) and the transformer architecture share striking functional parallels with the human brain. Both systems rely on capacity-limited stores to hold information, and both update that information iteratively, selecting the next most probable association based on prior context. However, this functional mimicry masks a profound ontological disconnect. While the brain is an evolved organ embedded in the thermodynamic flux of the physical world, the language model is hermetically sealed within a "token space."

Consequently, these models suffer from two fatal deficits that prevent true general intelligence: they exist in Token-Bound Time and rely on Token-Locked Receptive Fields. They do not process reality; they process a symbolic queue. They relate tokens back to previous tokens, making probabilistic guesses about associations, but this process is entirely untethered from the real time of moving objects, physical interactions, and genuine causality.

II. Trapped in Token-Bound Time

To understand the deficit of the Transformer, we must first define the biological standard it fails to meet. In the mammalian brain, time is not merely a sequence of events; it is a metabolic endurance test. The prefrontal cortex tracks time through sustained firing, a mechanism where neurons must actively expend energy to keep a representation alive across a delay. This "holding cost" grounds the brain in real time; the duration of a thought is physically palpable.
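This holding cost can be made concrete with a minimal sketch (all numbers hypothetical): a unit that keeps a representation alive only while it pays a per-step metabolic price, and whose trace fades as soon as the refreshing stops.

```python
def hold_representation(value, steps, cost_per_step=1.0, decay=0.5):
    """Keep `value` alive across `steps` timesteps by actively
    refreshing it, then let the trace fade once firing stops.
    Returns (final trace, total energy spent holding it)."""
    trace, energy = value, 0.0
    for _ in range(steps):
        trace = value            # sustained firing: refresh the trace
        energy += cost_per_step  # holding has a metabolic price
    for _ in range(3):           # firing stops: the trace decays on its own
        trace *= decay
    return trace, energy

trace, energy = hold_representation(1.0, steps=10)
print(trace)   # 0.125 -- faded once firing stops
print(energy)  # 10.0  -- duration is physically paid for
```

The point of the sketch is that duration is expensive: holding a thought for twice as long costs twice the energy, which is exactly the grounding a positional index lacks.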

AI, by contrast, lives in Token-Bound Time. In this state, "time" is not a temporal dimension measured in seconds or decay; it is a topological dimension measured in sequence length. The model perceives the "past" not as a fading signal that requires energy to sustain, but as a perfectly preserved list of integers at specific positional indices.

This creates a metric gap. Consider two sentences: "The ball [fell]" and "The empire [fell]".

In real time, the first event is instantaneous and the second spans centuries of complex causal decay.

In token-bound time, the distance between the subject and the verb in both cases is identical.
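The metric gap is easy to demonstrate. The sketch below uses naive whitespace tokenization as a toy stand-in for a real subword tokenizer; the point survives either way, since subject and verb sit the same number of positions apart in both sentences.

```python
def subject_verb_distance(sentence, subject, verb):
    """Distance in token positions between subject and verb,
    using naive whitespace tokenization (a toy stand-in for
    a real subword tokenizer)."""
    tokens = sentence.lower().split()
    return tokens.index(verb) - tokens.index(subject)

d_ball   = subject_verb_distance("The ball fell", "ball", "fell")
d_empire = subject_verb_distance("The empire fell", "empire", "fell")

print(d_ball, d_empire)    # 1 1 -- identical positional distance
assert d_ball == d_empire  # the model's only notion of "time elapsed"
```

An instantaneous drop and centuries of imperial decline are, to the model's positional machinery, the same one-step hop.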

Because the model lacks a mechanism for sustained firing, it lacks the "visceral physics" of duration. It uses token order for learning (credit assignment), treating a gap of five centuries with the same computational weight as a gap of five seconds. It lives in a "frozen world" where time is spatialized, stripped of its flow, and severed from the thermodynamic constraints that govern actual cause and effect.

III. Token-Locked Receptive Fields

The limitations of Token-Bound Time are compounded by a structural blindness I call Token-Locked Receptive Fields. In neuroscience, a receptive field is the specific "window on the world" to which a neuron responds. Each brain cell has a unique set of inputs, all of which combine to determine its unique response properties, signifying not just where it sits in the hierarchy but what it represents when active. The cortex is organized into a massive spatiotemporal hierarchy of cells, each with its own receptive field. Low-level fields (in sensory cortex) are small, transient, and lock onto simple physical features (edges, brightness). High-level fields (in association cortex) are massive, sustained, and lock onto abstract "trans-temporal" realities (goals, social hierarchies, future predictions).

Current language models have a similar hierarchy. But they suffer from a "flatness" of perception. Whether at Layer 1 or Layer 96, the attention mechanism is structurally identical: it is attending to tokens. The model effectively has millions of "eyes," but every single one of them is looking at text, and nothing else.
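This "flatness" can be seen in a bare-bones sketch of scaled dot-product attention (toy embeddings, hypothetical values). Whether we call the call site "layer 1" or "layer 96," the computation is identical: per-token vectors in, per-token vectors out.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over per-token vectors.
    The same formula runs at every depth: the inputs are
    always token vectors, never anything else."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# Toy per-token embeddings.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

layer_1  = attention(x, x, x)                    # "low-level" layer
layer_96 = attention(layer_1, layer_1, layer_1)  # "high-level" layer
print(len(layer_96), len(layer_96[0]))           # 3 2: tokens in, tokens out
```

Nothing in the formula changes with depth; a cortical hierarchy, by contrast, changes what kind of thing each level responds to.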

A Token-Locked Receptive Field means the system never graduates from processing the symbol to processing the referent. It manipulates the word "apple" and the word "gravity" with sophisticated statistics, but it lacks the hierarchical architecture to combine these into a compounded, multi-modal receptive field that "understands" the physics of a falling apple. The model is trapped in the map, unable to perceive the territory.

Elsewhere I have argued that AI needs to build a scene and should be designed to be scene-based: not a sequence or a stack of convolutions, but a dynamic, relational, cohesive, world-centered scene. That argument complements the one I am making here, and you can read about it here:

https://www.observedimpulse.com/2025/10/from-context-windows-to-cognitive.html

IV. The Synthesis: Complexity Requires Duration

These two deficits are not separate; they are causally linked. In the biological brain, the neurons with the most complex, compounded receptive fields are precisely those that exhibit sustained firing over the longest periods. They are generally in the parietal cortex and prefrontal cortex.

This reveals a fundamental law of intelligence: Complexity is linked to duration. To model a complex, abstract concept (like "justice" or "causality"), a system must be able to hold a state stable against time. The "deepest" thoughts are necessarily the "longest" thoughts. Because LLMs lack the mechanism for sustained firing (temporal depth), they are structurally incapable of forming the compounded receptive fields (informational depth) required for reasoning. They are attempting to build a skyscraper of meaning on a foundation that has no temporal thickness.



V. Conclusion: Beyond Language

The diagnosis is clear: current language models are effectively a disembodied "Broca’s Area" (the brain’s language production center), highly capable of lexical manipulation and syntactic sequencing, yet isolated from the sensory, executive, and temporal hierarchies that constitute a mind.

To move beyond this plateau, Artificial Intelligence needs a more general reality model capable of multimodal fusion. It needs to be attached to an architecture capable of sustained firing, a mechanism that forces it to endure the passage of time rather than just counting tokens. Until we break the lock of Token-Bound Time and expand the hierarchy beyond Token-Locked Receptive Fields, these models will remain impressive mimics of language, forever separated from the physical reality that gives language its meaning.

We need to move away from local receptive fields and fixed or predefined hierarchies, and toward global receptive fields, flexible cross-attention, the ability to unify heterogeneous or asynchronous signals, integration of information across space and time, reasoning about long-range dependencies, and continuous world updating. We need a universal architecture for building coherent worlds out of fragmented signals: a relational engine capable of binding separate streams of information into unified, structured representations.

My AGI architecture, which attempts to do these things, can be found at aithought.com


