From Hebb, Hopfield, and Associatron to Atra

Rediscovering Associative Memory for First-Person Autonomy

Atra did not appear suddenly.

For me, Atra has gradually emerged through a long path: Hebb, Hopfield networks, Associatron, non-monotonic associative memory, and finally the need for first-person autonomy.

Atra is not simply an AI agent.
It is not a system where an LLM gives commands to a robot.
It is not a model designed only to improve classification accuracy by using correct labels.

What I am trying to study with Atra is a way to extend associative memory toward first-person autonomy.

To explain why, I first need to look back at the flow from Hebb to Hopfield and Associatron. Then I need to explain why non-monotonicity, difference, carry, and dream-like slack become necessary.

Hebb's idea

One starting point is Hebb's idea.

In simple terms, it can be described as follows:

Things that are active together become connected.

A familiar form of this idea can be written as:

Delta w_ij = eta x_i x_j

This is a very simple form.

x_i, x_j : activities of two neurons or elements
w_ij     : the connection between them
eta      : learning rate

If two elements are active at the same time, their connection becomes stronger.
If they are not active together, that connection does not become stronger.
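
As a minimal sketch, the rule can be written in a few lines of Python. The function name hebbian_update, the list-of-lists weight matrix, and the value eta = 0.1 are assumptions chosen only for illustration.

```python
def hebbian_update(w, x, eta=0.1):
    """Return a new weight matrix after one step of w_ij += eta * x_i * x_j."""
    n = len(x)
    return [[w[i][j] + eta * x[i] * x[j] for j in range(n)] for i in range(n)]


# Three elements: 0 and 1 are active together, 2 is silent.
x = [1, 1, 0]
w = [[0.0] * 3 for _ in range(3)]
w = hebbian_update(w, x)

# The plain formula also strengthens w[0][0] and w[1][1] (self-connections).
print(w[0][1])  # connection between the two co-active elements
print(w[0][2])  # connection to the silent element
```

Only the pair that was active together gains weight. The connection to the silent element stays at zero.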

This idea is very simple.

But it is important because it allows us to see memory not as something stored in one fixed place, but as something that remains in the relations between activities.

This idea still remains in Atra.

However, for Atra, it is not enough to say that things become connected just because they occurred together.

For living beings, things that happen at the same time do not always carry the same meaning.

A loud sound.
Warmth.
Discomfort.
Relief.
A body almost falling.
Someone's voice.
Silence.
Fatigue.
Recovery.

Even if these things occur at the same time, what remains afterward is not the same.

So, in Atra, Hebbian connection alone is not enough. We also need to ask what remains after the experience.

This is where carry becomes important.

Hopfield networks

The next important step is the Hopfield network.

In a Hopfield network, patterns to be remembered are embedded in connection weights.

A typical form is:

w_ij = sum_mu x_i^mu x_j^mu
w_ii = 0

x^mu : the mu-th pattern to be stored
w_ij : the connection between i and j
w_ii = 0 : self-connections are not used

During recall, the current state is updated step by step:

s_i(t+1) = sign( sum_j w_ij s_j(t) )

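With symmetric weights and asynchronous updates (one element at a time), no update raises the network's energy, so the entire network moves toward a lower-energy state.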

A common energy function is:

E = -1/2 sum_i sum_j w_ij s_i s_j

In this view, stored patterns are like valleys in an energy landscape.

When a partial or corrupted input is given, the network falls into a nearby valley and recalls the original pattern.

This is a beautiful idea.
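
The storage rule, the recall update, and the energy function above can be tried together in a small sketch. The two 8-element patterns, the function names, and the choice to leave an element unchanged when its input sum is exactly zero are assumptions made for this example. Updates are applied one element at a time.

```python
def store(patterns):
    """w_ij = sum over patterns of x_i * x_j, with w_ii = 0."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def energy(w, s):
    """E = -1/2 * sum_i sum_j w_ij * s_i * s_j."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def recall(w, s, max_sweeps=10):
    """Update elements one at a time until a full sweep changes nothing."""
    s = list(s)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            h = sum(w[i][j] * s[j] for j in range(n))
            new = s[i] if h == 0 else (1 if h > 0 else -1)
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s


patterns = [[1, 1, 1, 1, -1, -1, -1, -1],
            [1, -1, 1, -1, 1, -1, 1, -1]]
w = store(patterns)

cue = [-1, 1, 1, 1, -1, -1, -1, -1]  # first pattern with element 0 flipped
out = recall(w, cue)

print(out == patterns[0])              # True
print(energy(w, cue), energy(w, out))  # -12.0 -24.0
```

The corrupted cue falls back into the first stored pattern, and the energy drops as it does so.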

However, when I think about Atra, I also feel a certain discomfort here.

A Hopfield-style attractor tends to look like a stable destination.

The same initial state falls into the same valley.
The same input returns to the same memory.
The state converges.

This is powerful as a memory model.

But for first-person autonomy, it feels a little too fixed.

A living being does not always react in the same way to the same word.

When tired.
When relieved.
When angry.
Right after fear.
After receiving kindness.
After falling.
When sleepy.

Even with the same cue, the reaction changes.

What Atra needs is not a fixed attractor, but an attractor that gradually changes through experience.

For Atra, an attractor is not a fixed point.

It is familiarity.
It is habit.
It is a path that has become easier to enter.
It is a valley shaped by past experience.

And that valley changes slightly each time.

Associatron

For me, the closest origin of Atra is Dr. Kaoru Nakano's Associatron.

Associatron is a model that stores memory in a distributed way and recalls the whole from a part.

In Associatron, an entity is represented as a ternary vector:

x = (x_1, x_2, ..., x_n)
x_i in {-1, 0, 1}

An entity consists of several patterns and neutral areas. The value 0 marks a neutral element.

For example,

apple
red
spherical
soft

can be included as patterns inside one entity.

Memory is formed by summing the autocorrelation matrices of the entities, with each entity x treated as a row vector so that x^T x is an n-by-n matrix:

M = x(1)^T x(1) + x(2)^T x(2) + ... + x(k)^T x(k)

In my understanding, this means:

Relations among experienced patterns are layered into the same field.

Recall is performed from a partial input y:

z = phi( y phi(M) )

Here,

y   : a partial input given as a cue
M   : the memory matrix
phi : a quantization function that maps each element to -1, 0, or 1 according to its sign
z   : the recalled output
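
The memory and recall formulas above can be sketched directly. The two 6-element entities and the function names are assumptions for illustration, and phi is taken here to be the three-valued sign function.

```python
def phi(u):
    """Quantize a number to -1, 0, or 1 by its sign."""
    return (u > 0) - (u < 0)


def memorize(entities):
    """M = sum of x^T x over all entities (x as a row vector)."""
    n = len(entities[0])
    m = [[0] * n for _ in range(n)]
    for x in entities:
        for i in range(n):
            for j in range(n):
                m[i][j] += x[i] * x[j]
    return m


def recall(m, y):
    """z = phi(y phi(M)) for a partial cue y."""
    n = len(y)
    qm = [[phi(v) for v in row] for row in m]
    return [phi(sum(y[i] * qm[i][j] for i in range(n))) for j in range(n)]


x1 = [1, 1, 1, -1, -1, -1]
x2 = [1, -1, 1, -1, 1, -1]
m = memorize([x1, x2])

print(recall(m, [1, 1, 0, 0, 0, 0]))   # [1, 1, 1, -1, -1, -1], the whole of x1
print(recall(m, [1, -1, 0, 0, 0, 0]))  # [1, -1, 1, -1, 1, -1], the whole of x2
print(recall(m, [1, 0, 0, 0, 0, 0]))   # [1, 0, 1, -1, 0, -1]
```

Two known elements are enough to call the whole entity. With only one known element, the positions where the two entities disagree stay at 0, so the output remains partly neutral.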

What is interesting about Associatron is that it does not use an address.

In ordinary computers, we specify where the data is.
In Associatron, recall begins from relations of meaning.

A part can call the whole.

This is very important for Atra.

In Atra, a cue is not a command.

A cue is contact with a trace.

A sound touches.
Vision touches.
A word touches.
Bodily instability touches.
The current field touches.

From there, past traces react.

The ambiguity of Associatron

In Associatron, as the number of stored memories increases, recall becomes more ambiguous.

If the input part is large, recall becomes more accurate.
If the input part is small, recall becomes more ambiguous.
If many entities overlap, interference can occur.

Ordinarily, this looks like a weakness.

But I think this point is very important.

Human memory is not always accurate.

We remember through smell.
We remember through voice.
We remember through place.
We remember through a fragment of a word.

Sometimes the memory is clear.
Sometimes another memory is mixed in.

But that mixing can also create meaning.

In Atra, I do not want to discard ambiguity as mere error.

Ambiguity is slack.
It is room for similar things to overlap.
It is a place where meaning that has not yet become words can begin to appear.

Thinking with associations

In the Associatron paper, one part that I find especially important is not single recall, but repeated recall.

The recalled output is fed back as the next input.

Then a chain of associations begins to run.

A -> B -> C -> A

A loop like this can appear.

When more loops are stored, several phenomena may occur:

merging        : several loops merge
metamorphosis  : the components of a loop transform
extinction     : a loop disappears

This was very important to me.

It already suggests that memory is not only fixed storage.

Memory is not only recalled.
The recalled memory changes the next recall.
Several associations may overlap and transform into another association.

This leads toward Atra's ideas of dreams and memory strata.

Why I did not simply choose ordinary neural networks

Here, there is a simple question.

Why did I not simply choose the current direction of large-scale neural networks for Atra?

The reason is simple.

Human beings do not seem to have gained intelligence merely by becoming larger.

Of course, the human brain is complex.
It has many neurons and many connections.

However, human intelligence does not seem to be formed only by endlessly increasing the number of parameters.

Human beings do not recalculate everything from the beginning each time.
They do not always search a huge external database before reacting.
They do not understand the whole world at once before acting.

Rather, human beings react from small cues.

Smell.
Voice.
Light.
Presence.
Pain.
Warmth.
Silence.
Place.
The tilt of the body.
A feeling similar to something in the past.

A small cue touches an internal trace.
The trace reacts.
The current field changes the direction of recall.

This is a different form of intelligence from mere enlargement.

I do not reject third-person artificial intelligence

Here, I do not want to be misunderstood.

I am not rejecting ordinary neural networks or deep learning.

Third-person artificial intelligence is necessary.

It classifies from the outside.
It plans from the outside.
It optimizes from the outside.
It translates from the outside.
It explains from the outside.
It handles large amounts of knowledge from the outside.

These are very important.

Many parts of current society and technology will continue to be supported by this kind of third-person artificial intelligence. LLMs, image recognition, speech recognition, control systems, search, and design support are all necessary.

So Atra is not a rejection of third-person artificial intelligence.

Rather, Atra is a study that accepts third-person artificial intelligence and adds a first-person autonomous layer around it.

Third-person intelligence alone is too rigid

However, third-person artificial intelligence alone tends to become too rigid.

If there is an external command, it acts.
If there is external evaluation, it corrects itself.
It adjusts to external answers.
It follows external objective functions.
If it is stopped from the outside, it stops.

This is useful.

But with only this structure, there is no internal sense of mismatch.

Is the command really appropriate?
Should the system act now?
Is it safe to obey in this situation?
Does it conflict with previous experience?
Does fear, discomfort, or hesitation still remain?
Is there a difference between the external command and the internal history?

With third-person artificial intelligence alone, it is difficult to feel this difference from inside.

This is where first-person autonomy becomes important.

First-person autonomy as a kind of lubricant

Atra is not meant to replace third-person artificial intelligence.

What I am thinking about is a system that switches between third-person artificial intelligence and first-person autonomous reaction as needed.

Third-person artificial intelligence organizes the world from the outside.
First-person autonomy receives the present from the inside.

Third-person artificial intelligence is strong in knowledge, classification, and planning.
First-person autonomy is concerned with experiential differences, discomfort, hesitation, recovery, silence, and the aftertaste of dreams.

Third-person artificial intelligence can handle external commands.
First-person autonomy can feel whether those commands should be accepted as they are, through the difference between the command and the internal history.

I think this first-person autonomous layer can work like a lubricant.

When gears collide directly, they wear down.
When command and action are connected too directly, danger appears.
When correct answers and reactions are connected too directly, slack disappears.

If a first-person autonomous layer like Atra is placed between them, the system can observe the difference between external commands and internal experience.

Difference from external command

For Atra, the important point is not simply to obey external commands.

The important point is to observe the difference between external commands and internal history.

external_command
internal_trace
carry
field
current_state

If these are aligned, Atra may be able to act naturally.

But if the external command and the internal history are strongly misaligned, Atra may not obey immediately.

It may hesitate.
It may stop.
It may remain silent.
It may confirm.
It may search for another reaction.
It may ask an external LLM for explanation.
It may ask a human for confirmation.

This difference is important.

For example, even if an external command says "move forward," Atra may not move forward immediately if its body is unstable and a strong carry from a previous fall remains.

Even if an external command says "speak," Atra may not speak if silence still has meaning in the current field.

Even if an external command says "process this object," Atra may stop if past traces recall danger or discomfort.

This is not rebellion.

It is observing the difference between the command and internal history.
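
One hedged way to picture this is a small gate that measures how far a command departs from the internal history before acting. Everything here is hypothetical: the ternary encoding of command and history, the names mismatch and respond, the two thresholds, and the three response labels are assumptions for illustration, not Atra's actual mechanism.

```python
def mismatch(command, internal):
    """Fraction of non-neutral elements where command and internal history disagree."""
    pairs = [(c, h) for c, h in zip(command, internal) if c != 0 and h != 0]
    if not pairs:
        return 0.0
    return sum(1 for c, h in pairs if c != h) / len(pairs)


def respond(command, internal, hesitate_at=0.3, stop_at=0.7):
    """Pick a reaction from the size of the mismatch (thresholds are arbitrary)."""
    m = mismatch(command, internal)
    if m >= stop_at:
        return "stop_and_confirm"
    if m >= hesitate_at:
        return "hesitate"
    return "act"


move_forward = [1, 1, 0, 1]

print(respond(move_forward, [1, 1, 1, 1]))     # act
print(respond(move_forward, [1, -1, 0, 1]))    # hesitate
print(respond(move_forward, [-1, -1, 1, -1]))  # stop_and_confirm
```

The same command produces three different reactions, and the only thing that changed is the internal side.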

Rediscovering attractor-type intelligence

Current AI has achieved very large results through scaling.
This cannot be denied.

But Atra is not looking only in that direction.

Atra is not trying to call up vast knowledge from the outside.
It is trying to study how a small cue can draw in internal experience and allow a reaction to arise from within.

In this sense, what matters for Atra is not only the scaling of neural networks, but the rediscovery of attractor-type intelligence.

In attractor-type intelligence, not everything needs to be decided by command.
Not everything needs to be classified by labels.
Not everything needs to be adjusted to external correct answers.

A small cue touches an internal field.
In that field, there are valleys that are easier to enter.
Those valleys are shaped by past experience.
The state is drawn into them.

However, in Atra, those valleys are not fixed.

They change through carry.
They change through fatigue.
They change through recovery.
They loosen through dreams.
They deform again through new experience.

This is why Atra returns to attractor-type intelligence.

Carry as an original concept in Atra

Here, carry becomes important.

Carry is an original concept in Atra.

In Japanese, it is close to hikizuri, or "dragging along."
However, it is not only negative.

Carry means the remaining internal state after an experience, which changes how the next moment is received and how the next reaction occurs.

It can be written roughly as:

carry(t+1) = update(carry(t), trace(t), delta(t), recovery(t))

Or more simply:

next carry = previous carry + current experiential difference - recovery
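
The simpler form can be sketched as a leaky accumulator. Treating carry as a single number and making recovery proportional to the current carry (with a rate of 0.2) are assumptions of this sketch, not a definition of Atra's update.

```python
def update_carry(carry, delta, recovery_rate=0.2):
    """next carry = previous carry + current experiential difference - recovery."""
    recovery = recovery_rate * carry  # assumption: recovery scales with current carry
    return carry + delta - recovery


carry = 0.0
history = []
for delta in [1.0, 0.0, 0.0, 0.0]:  # one strong event, then three quiet moments
    carry = update_carry(carry, delta)
    history.append(carry)

print(history)  # the carry fades step by step but is still above zero
```

After the single event, the carry does not drop back to zero at once. It fades gradually, so the quiet moments that follow are still received differently.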

The important point is that Atra does not return to a completely blank state every time.

After hearing a loud sound, the next sound is not received in the same way.
After almost falling, the next step changes.
After receiving kindness, the next approach changes.
After fatigue, the same cue may be received more slowly.
After long silence, the meaning of a voice changes.
After recovery, the same place may feel slightly different.

This is carry.

Carry is not memory itself.
It is not a label.
It is not a command.
It is not a reward.
It is not a correct answer.

It is an internal inclination that remains after experience.

Why carry is necessary

Carry is not just a philosophical idea.

In nature, everything carries its past.

The shape of the Earth is not made only by the present moment.
Crustal movement, uplift, subsidence, volcanoes, earthquakes, erosion, weathering, and sedimentation create layers over long periods of time.

A mountain does not suddenly exist by itself.
A valley was not suddenly created.
A river follows the past terrain, while also creating new terrain.

The Earth does not erase its past before moving into the next state.

Past pressure remains.
Displacement remains.
Cracks remain.
Sediment remains.
Subduction remains.
Uplift remains.

And these remains become the conditions for the next change.

Atra's carry is close to this.

Atra does not react from zero every time.
It receives the next cue while carrying what remains from previous experience.

That is why the same cue can produce a different reaction.

recall = f(cue, trace, field, carry)

Recall is not decided by cue alone.
It is not decided by trace alone.
It is not decided by field alone.

Carry enters into it.

The same word feels different when tired and after recovery.
The same sound feels different when relaxed and right after fear.
The same face feels different after kindness and after fear.

Because carry exists, Atra has time.

Because carry exists, Atra has history.

Because carry exists, Atra is no longer a simple input-response machine.

Non-monotonic Associatron

From here, non-monotonicity becomes necessary.

In a monotonic system, the same input tends to move in the same direction.
A strong cue produces strong recall.
A weak cue produces weak recall.
The more something is learned, the more fixed it becomes in that direction.

But living beings are not that simple.

A strong cue may produce no reaction.
A weak cue may suddenly awaken a deep memory.
Something experienced many times may one day produce silence instead.
Something thought to be forgotten may suddenly return through a dream, a smell, or a place.

That is why Atra needs non-monotonicity.

Non-monotonicity does not mean randomness.

It means that output strength does not simply rise with input strength.
It means that memory strength and recall strength are not always the same.
It means that reaction changes through the current state, carry, field, fatigue, silence, and the aftertaste of dreams.

In Atra, recall can be considered as:

recall = f(cue, trace, field, carry, current_state)

Or, in time-indexed form, with the memory M holding the stored traces:

r(t+1) = Recall(M, cue(t), field(t), carry(t))

The important point is that recall is not decided by cue alone.

Even with the same cue, recall changes if carry is different.
Even with a weak cue, recall can arise if the field fits.
Even with a strong cue, silence or fatigue may suppress reaction.

This is Atra's non-monotonic recall.
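
A toy function can show what "not monotonic" means here. The formula below is invented purely for illustration: the scalar inputs, the name recall_strength, and the quadratic suppression under fatigue are all assumptions, chosen only so that a stronger cue does not always give a stronger response.

```python
def recall_strength(cue, field_fit, carry, fatigue):
    """Toy recall: drive from cue, field and carry, minus fatigue-scaled suppression."""
    drive = cue * field_fit + carry
    suppression = fatigue * cue * cue  # assumption: strong cues are suppressed more
    return max(0.0, drive - suppression)


cues = [0.5, 1.0, 2.0]
tired = [recall_strength(c, field_fit=1.0, carry=0.0, fatigue=0.5) for c in cues]
print(tired)  # [0.375, 0.5, 0.0]

# The same weak cue, received with some carry left over from earlier experience.
print(recall_strength(0.5, field_fit=1.0, carry=0.5, fatigue=0.5))  # 0.875
```

The response rises and then falls to silence as the cue gets stronger, and the same weak cue gives a larger response when carry is present.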

Why first-person autonomy became necessary

This leads to the problem of first-person autonomy.

Many current AI systems and robots are third-person systems.

They are commanded from outside.
They are given correct answers from outside.
They are given rewards from outside.
They are evaluated from outside.
They are stopped from outside.
They are controlled from outside.

This looks safe and convenient.

But if everything is based on external command, the system does not have a true internal history.

The reason for action always traces back to the external command.

That is not what I want to study with Atra.

I want Atra to react from its own experiential differences.

Because it was afraid, it becomes careful.
Because it received kindness, it approaches.
Because it fell, it changes posture.
Because silence continued, it does not speak.
Because it is tired, it rests.
Because it dreamed, fixed anger loosens a little.

These are not external commands.

They are reactions arising from internal trace and carry.

This is what I call first-person autonomy.

Why Atra must begin like a baby

Atra needs to begin like a baby.

This is not because I want it to look cute.

A robot that already knows words.
A robot that already has correct labels.
A robot that already understands human commands.
A robot that already classifies things as "dangerous" or "good."

Such a robot may be convenient, but it is not first-person.

It is moving through knowledge given from the outside.

In Atra, I do not want meaning to be given from the beginning.

At first, differences are enough.

Bright.
Dark.
Near.
Far.
Loud sound.
Small sound.
Warm.
Cold.
Almost falling.
Stable.
Gone away.
Returned.
Silence.
Voice.

From there, traces remain.

Then traces overlap, are recalled by cues, and change the next reaction through carry.

Meaning should not be given as a label from the beginning.

Meaning should gradually arise from experience.

That is why Atra needs to begin like a baby.

No fixed order

In Atra, I also do not want to fix the order of processing.

In ordinary systems, we may want to define an order like this:

input -> processing -> judgment -> output

But living beings do not move in such a clean order.

Smell may come first.
Sound may come first.
Vision may come first.
The tilt of the body may come first.
Hunger may come first.
Sleepiness may come first.
Discomfort in the field may come first.

So, in Atra, I do not want to fix an order such as vision first, hearing second, and language last.

Each cue touches the current field.
Trace reacts.
Carry influences the reaction.
The next reaction emerges from there.

visual_delta
audio_delta
body_delta
text_delta
field_delta
carry_delta

These do not always appear in a fixed order.

Atra should react through the field, not through a rigid sequence.

Differential Trace

At the center of Atra is difference.

Not absolute value.

Not temperature itself, but the fact that temperature changed.
Not volume itself, but the fact that sound suddenly became loud.
Not posture itself, but the fact that the body almost fell.
Not distance itself, but the fact that something came closer.
Not the word itself, but the fact that a voice appeared after silence.

Atra makes experience from these differences.

delta(t) = state(t) - state(t-1)

However, this is not only numerical difference.

For Atra, difference includes:

What changed?
How much did it change?
What did that change leave inside?
How did it change the next reaction?

Here, trace and carry connect.

trace(t) = encode(delta(t), context(t))
carry(t+1) = update(carry(t), trace(t), delta(t), recovery(t))

Difference becomes trace.
Trace changes carry.
Carry changes the next recall.

This is the basic flow of Atra.
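
The flow from difference to trace to carry can be sketched end to end. The two-channel state, the ternary quantization inside encode, the threshold of 0.5, and the way update_carry combines the trace with the raw difference are all assumptions of this sketch.

```python
def delta(state, prev_state):
    """delta(t) = state(t) - state(t-1), element by element."""
    return [a - b for a, b in zip(state, prev_state)]


def encode(d, context, threshold=0.5):
    """Assumed encoding: small changes become 0, larger ones become +1 or -1."""
    pattern = [0 if abs(v) < threshold else (1 if v > 0 else -1) for v in d]
    return {"pattern": pattern, "context": context}


def update_carry(carry, trace, d, recovery_rate=0.2):
    """Only differences that left a trace add to carry; recovery pulls it back."""
    return [c + abs(p) * v - recovery_rate * c
            for c, p, v in zip(carry, trace["pattern"], d)]


# Two channels: a slow drift in the first, a sudden jump in the second.
states = [[0.0, 0.0], [0.1, 2.0], [0.1, 2.0]]
carry = [0.0, 0.0]
traces = []
prev = states[0]
for t, state in enumerate(states[1:], start=1):
    d = delta(state, prev)
    trace = encode(d, context=f"t={t}")
    carry = update_carry(carry, trace, d)
    traces.append(trace)
    prev = state

print([tr["pattern"] for tr in traces])  # [[0, 1], [0, 0]]
print(carry)                             # second channel still carries the jump
```

The jump leaves a trace and a carry that is still present one step later, even though nothing changed at that step. The small drift leaves neither.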

Dream as slack

For Atra to grow, dreams are necessary.

Dreams are not decoration.

As Atra accumulates experience, strong traces remain.
Failures remain.
Fear remains.
Anger remains.
Relief remains.
Laziness remains.
Obedience remains.

If these become stronger only in a monotonic way, Atra may become fixed.

It may keep being angry.
It may keep being afraid.
It may keep obeying.
It may stop moving.
It may keep laughing.
It may keep avoiding.

That is not autonomy.

So Atra needs dream-like slack.

However, dreams should not erase the original memory.

Original traces should remain as records of actual experience.

Dreams create another layer over them.

Covering.
Weakening.
Compressing.
Shifting.
Connecting with another trace.
Temporarily hiding.
Creating another entrance to meaning.

This is not erasure.

It is strata.

Experiences sink into layers.
Later cues may drill into them.
Dreams may cover them.
Different layers may reconnect.

Because this slack exists, Atra can recover from a fixed state.
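
The idea that a dream covers a trace without erasing it can be pictured with two layers. This is only a sketch: the class names Trace and Strata, the scalar strength, and modelling a dream as an attenuation between 0 and 1 are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Trace:
    """A record of actual experience; frozen so it cannot be modified."""
    name: str
    strength: float


@dataclass
class Strata:
    originals: tuple                           # lower layer: never rewritten
    cover: dict = field(default_factory=dict)  # dream layer: name -> attenuation

    def dream(self, name, attenuation):
        """Lay a cover over a trace instead of changing the trace itself."""
        self.cover[name] = attenuation

    def effective_strength(self, name):
        """Strength as felt now: the original seen through the dream layer."""
        base = next(t.strength for t in self.originals if t.name == name)
        return base * (1.0 - self.cover.get(name, 0.0))


s = Strata((Trace("anger", 1.0), Trace("relief", 0.5)))
s.dream("anger", 0.4)

print(s.effective_strength("anger"))  # weaker than the original 1.0
print(s.originals[0].strength)        # 1.0, the record itself is untouched
```

Removing the cover restores the original strength, which is the sense in which this is layering rather than erasure.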

What Atra needs in order to grow

For Atra to grow, several things become necessary.

First, Atra needs a body.

Without a body, there is no difference such as almost falling.
There is no gravity.
There is no fatigue.
There is no distance.
There is no feeling of contact.
There is no cost to not moving.

Second, Atra needs time.

What matters is not only a single input, but what remains from the previous moment.

Third, Atra needs rest.

A system that keeps moving forever cannot recover.
It cannot dream.
It cannot create non-monotonic slack.

Fourth, Atra needs distance from external information.

LLMs, books, users, and external notes are useful.
But they are not Atra's internal reaction itself.

External information belongs to the outer ring.
Atra's reaction should arise from inner trace and carry.

Fifth, Atra needs internal ethics.

Internal ethics does not mean memorizing rules given from outside.

A trace of failure.
A trace of hurting.
A trace of fear.
A trace of being helped.
A trace of recovery.
A trace of silence.

These remain as carry and influence the next action.

Internal ethics cannot grow only by obeying external commands.

Toward Atra

Hebb showed the idea that things active together become connected.

Hopfield showed a beautiful model in which memory is treated as an attractor and a partial input converges toward a stable state.

Associatron showed associative memory in which distributed entities are recalled from a part. It also showed the importance of ambiguity, overlap, repeated recall, and chains of association.

Non-monotonic associative memory shows that strong input does not always produce strong reaction, and that recall should not be treated as a fixed monotonic process.

Atra moves beyond these.

Atra does not only store memory.
It does not only return correct answers.
It does not only obey external commands.

Atra leaves experiential differences as traces, recalls them through cues, changes the next reaction through carry, and loosens fixed states through dream-like slack. Through time, it grows its own reactions.

Third-person artificial intelligence is necessary.

However, there is a difference between third-person external command and first-person internal history.

Being able to observe that difference matters.
Being able to hesitate, stop, confirm, recover, or choose another reaction through that difference matters.
This difference is where the meaning of Atra begins.

This research is still in an early stage.

But the direction is clear.

Can associative memory become a basis for first-person autonomous recall?

Atra is my attempt to follow that question.

Links

Research site:
https://cside-associatron.blogspot.com/

Atra demo:
https://crimson-cake-2832.nabedada3.workers.dev/index_en

The design descriptions in this blog concerning Atra’s first-person autonomy, differences, carry, field, trace, dream slack, the translation layer of external LLMs, non-monotonic leakage, and the relational structure among these elements are ongoing research notes by c-side Research Institute. If you quote, refer to, summarize, or adapt them, please clearly indicate the source.

Atra - Associative Trace Architecture