The problem that comes before AGI is memory
Why the thing that matters before model intelligence is a memory layer that reconnects sessions and working state.
In today’s AI systems, the more urgent challenge is not the arrival of AGI but the design of memory structures that preserve judgment and state between interrupted sessions.
- Even a larger context window does not automatically create long-term task continuity.
- Good memory is a selective structure that reduces the cost of the next action, not a larger archive.
- Summaries and compression are useful tools, but they do not replace memory itself.
The problem is MEMORY
Today’s work already has to endure broken sessions and disappearing judgment.
- Now: sessions break first. Working state disappears before any stronger model can help.
- Cost: explanation cost accumulates. Every repeated scan and repeated inference leaks more time and tokens.
- Design: memory needs layers. Good memory is a selective structure that lowers the cost of the next action.
Why memory?
Recent memory research suggests that today’s LLMs are not limited only by how “smart” they are. Even with a larger context window, long-term task continuity and stable state do not follow automatically.[1]
No matter how capable the model becomes, the moment the session changes, plans, decisions, and working state break easily. How a structure was approached, what was tried, where it failed, how far the work had already progressed. If that context does not persist, the user has to explain again, and the model has to rescan the structure and reason again. The result usually looks the same: inspect the structure → debug → error → retry → debug again. The point is that this is not only annoying. It is a cost and time problem.[2]
I ran into this very directly while working. If I close a conversation and start again, the earlier plan and context break, and even the structure I had just finished becomes “new code” in the next session. Then I explain again, the model scans again, and the same family of mistakes and token waste repeats.
A better model reduces reasoning cost. A better memory layer reduces restart cost.
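To make “restart cost” concrete, here is a toy calculation. Every token count below is invented; only the shape of the comparison matters.

```python
# Toy model of restart cost across interrupted sessions.
# All token counts are made-up illustrative numbers.

REEXPLAIN_TOKENS = 1_500   # user restates goals, constraints, history
RESCAN_TOKENS = 6_000      # model re-reads the project structure
SESSIONS = 20              # interrupted sessions over one project

MEMORY_WRITE_TOKENS = 2_000  # one-time cost of writing a memory file
MEMORY_READ_TOKENS = 800     # cheap read at the start of each session

without_memory = SESSIONS * (REEXPLAIN_TOKENS + RESCAN_TOKENS)
with_memory = MEMORY_WRITE_TOKENS + SESSIONS * MEMORY_READ_TOKENS

print(f"without memory: {without_memory:,} tokens")  # 150,000
print(f"with memory:    {with_memory:,} tokens")     # 18,000
```

The absolute numbers are meaningless; the point is that the first line grows linearly with every restart while the second barely moves.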
We have long moved memory outside ourselves
This is not an AI-only problem. Humans, too, have long moved their memory outside themselves. Cave paintings, papyrus, parchment, notes, journals, documents. The more complex things get, the less people work only inside their heads, and the more they build memory devices outside themselves.[3][4]
In that sense, current AI development strangely returns to an older human workflow. Memory files, project rules, long-term knowledge, session summaries, retrieval. OpenAI treats memory as a separate product surface, and Anthropic documents project-level memory files in Claude Code as an official workflow.[5][6]
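As a minimal sketch of that workflow, not any vendor’s actual implementation, a session can start by loading a project memory file and prepending it to the prompt. The file name CLAUDE.md follows Anthropic’s documented Claude Code convention; the helper functions are my own assumption.

```python
from pathlib import Path

# Project-level memory file. CLAUDE.md follows Anthropic's Claude Code
# convention; any agreed-on path would work the same way.
MEMORY_FILE = Path("CLAUDE.md")

def load_project_memory() -> str:
    """Return persisted project memory, or an empty string on a first run."""
    if MEMORY_FILE.exists():
        return MEMORY_FILE.read_text(encoding="utf-8")
    return ""

def build_prompt(user_message: str) -> str:
    """Prepend durable project context so the model does not start from zero."""
    memory = load_project_memory()
    header = f"Project memory:\n{memory}\n---\n" if memory else ""
    return header + user_message
```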
OpenAI’s Pulse makes that direction even clearer. According to the official help center, Pulse uses past chats, saved memory, and feedback to perform asynchronous daily research and then turns the result into a visual summary the next day. It also depends on both saved memories and chat history references being enabled.[7][8]
That distinction matters. Pulse is not a replacement for memory. It is closer to a higher-level experience built on top of memory. So even if the final briefing or recommendation looks polished, the quality stays shallow if the memory layer underneath is weak.[9]
Memory is closer to reconstruction than accumulation
Good memory looks less like a simple storage box and more like a layered structure. Our memory does not keep every experience in one flat layer. Some memories are captured quickly, some are reorganized over time, and some are merged into more general structures.[10]
That perspective also partially overlaps with Ray Kurzweil’s metaphor of hierarchical pattern recognition. He described human thought not as simple storage but as the connection and reconstruction of patterns across layers.[11] Of course, that description is hard to accept as settled modern neuroscience. But the intuition of memory as a hierarchical reconstruction layer, not a flat warehouse, is still useful for designing memory today.[12]
As a natural metaphor, good memory is closer to a mycelial network than to a giant central warehouse. Instead of stacking everything in one place, it lets only the necessary connections become active at the necessary moment. As Merlin Sheldrake’s work on fungal networks suggests, what matters is not sheer size but the structure of exchange and connection.[13] Moved into a memory context, that means good memory is not the largest amount of stored data but the structure that lets the next action continue at the lowest possible cost.
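Read as code, that metaphor might look like the sketch below: layers with different lifetimes, and a recall step that activates only a few relevant connections. The tier names, tag matching, and promotion threshold are all invented for illustration, not a published design.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    text: str
    tags: frozenset[str]
    uses: int = 0  # how often this item has been reused

@dataclass
class LayeredMemory:
    working: list[MemoryItem] = field(default_factory=list)   # this session
    episodic: list[MemoryItem] = field(default_factory=list)  # past sessions
    semantic: list[MemoryItem] = field(default_factory=list)  # generalized rules

    def recall(self, query_tags: set[str], limit: int = 3) -> list[MemoryItem]:
        """Activate only the connections relevant to the next action."""
        pool = self.working + self.episodic + self.semantic
        hits = [m for m in pool if m.tags & query_tags]
        hits.sort(key=lambda m: len(m.tags & query_tags), reverse=True)
        for m in hits[:limit]:
            m.uses += 1
        return hits[:limit]

    def consolidate(self, min_uses: int = 3) -> None:
        """Promote repeatedly reused episodes into general, long-lived rules."""
        keep, promote = [], []
        for m in self.episodic:
            (promote if m.uses >= min_uses else keep).append(m)
        self.episodic, self.semantic = keep, self.semantic + promote
```

The design choice worth noticing is consolidate: episodes earn their way into the long-lived layer by being reused, not simply by being stored.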
Putting everything in does not make good memory
The important point is that more memory is not automatically better memory. Unfiltered memory creates another cost. If every diary entry, every conversation, and every log is fed in at once, the model spends more tokens separating what matters from what does not. You give the system memory, and then pay again for the cost of searching through it.[14]
This point also appears in agent research. Generative Agents are not built on the idea of “store everything and search later.” They record experience, then use reflection to compress it into higher-level meaning and patterns. Only when needed do they retrieve the relevant memory and turn it into an action plan.[15] MemoryBank goes one step further by designing memory to fade over time like human recall, while allowing important items to persist longer.[16] The core of good memory is not total volume, but selection.
Reduced into a simple operating flow, it looks like this.
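A minimal sketch of that flow, loosely patterned on the record → reflect → retrieve loop in the agent papers above. The keyword scoring, importance weights, and decay constant are invented; a real system would summarize with a model and retrieve with embeddings.

```python
import math
import time

memories: list[dict] = []

def record(text: str, importance: float = 1.0) -> None:
    """Step 1: write raw experience down as it happens."""
    memories.append({"text": text, "importance": importance, "t": time.time()})

def reflect() -> None:
    """Step 2: periodically compress recent records into a higher-level note.
    A placeholder join stands in for a real summarization call."""
    recent = [m["text"] for m in memories[-5:]]
    record("reflection: " + " | ".join(recent), importance=2.0)

def retrieve(query: str, now: float, half_life: float = 86_400.0) -> list[dict]:
    """Step 3: pull back only what is relevant and still 'alive'.
    Recency decays exponentially, in the spirit of MemoryBank's forgetting."""
    words = set(query.lower().split())

    def score(m: dict) -> float:
        relevance = len(words & set(m["text"].lower().split()))
        recency = math.exp(-(now - m["t"]) / half_life)
        return relevance * m["importance"] * recency

    return sorted(memories, key=score, reverse=True)[:3]

# Step 4: feed the retrieved items into the prompt for the next action.
```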
For a while I wanted to feed years of diaries and notes into a model to force deeper conversations. The outcome was simple. Costs rose, context got blurrier, and the local machine got heavier. That experience taught one lesson clearly: good memory is not a warehouse that stores everything, but a structure that lowers the cost of the next action.
Summaries and compression help, but they are not substitutes for memory
Session summaries and context compression are clearly useful. But their implementation differs across vendors, and the publicly documented surface is partial. Some systems separate saved memory and chat history, while others start from project files and rule files.[17]
The biggest problem with summarization is distortion. Overall context, conversational flow, coherent themes, and useful details are all liable to be cut away in the compression step. Compression creates efficiency by accepting loss. Recent work on prompt compression argues that downstream task performance is not enough by itself; information preservation and reconstruction have to be measured more directly.[18]
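One way to act on that argument, sketched under heavy assumptions: keep a set of probe questions whose answers are known from the full context, and count how many survive the compression step. The naive ask helper below is a stand-in for a real model call.

```python
def ask(context: str, question: str) -> str:
    """Naive stand-in for a model call: return the context sentence that
    shares the most words with the question."""
    words = set(question.lower().split())
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    return max(sentences,
               key=lambda s: len(words & set(s.lower().split())),
               default="")

def preservation_rate(summary: str, probes: list[tuple[str, str]]) -> float:
    """Fraction of known facts still answerable from the summary alone.
    probes: (question, expected_answer) pairs verified against the full
    context before compression."""
    if not probes:
        return 0.0
    kept = sum(1 for q, expected in probes
               if expected.lower() in ask(summary, q).lower())
    return kept / len(probes)
```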
Simplicity is not taste, it is cost reduction
That is why I ended up thinking of memory not as a storage-size problem, but as a design problem. What should be kept, how should it be summarized, what deserves promotion into long-term memory, and what should be discarded. At the center of all of that is simplicity.[19]
Simplicity is not just a preference. It is a cost-reduction mechanism. It lowers comprehension cost for the user, token waste for the model, and computational load for the local machine. Good memory systems are not the ones that store the most complexity, but the ones that preserve only the complexity that can be reused later.[20]
Why I want to keep writing about memory on this blog
For individuals, teams, and AI companies alike, the question “what kind of memory should we build?” will only become more important. Models will get stronger, but continuity does not appear by itself. Continuity has to be designed.[21]
That is why I want to treat memory as a central topic on this blog. Document structures agents can read, logs that preserve working state, summaries that do not waste tokens, and external memory forms that both humans and models can reuse. My interest is less in displaying finished outputs than in leaving behind the reasons and change history that can be reused later.
The problem that comes before AGI is memory. Good memory is not a giant storage layer. It is a structure that leaves order between broken sessions and lowers the cost of the next action.
The urgent question is not how smart the model may become
- AGI is a future-facing noun. The promise of stronger intelligence has not arrived yet, so it cannot solve today’s operating problem for us.
- A longer context still ends. Even if the context window expands, plans, failure history, and working state scatter again the moment the session closes.
- Designed continuity. Good memory is not a larger warehouse. It is a structure that lowers the cost of the next action.
Notes
1. Charles Packer et al., “MemGPT: Towards LLMs as Operating Systems,” arXiv, 2023. The paper frames the limit of LLMs not only as a reasoning issue, but as a lack of memory hierarchy under a constrained context window. (arXiv)
2. MemGPT’s core analogy is virtual memory in operating systems. Instead of stuffing everything into a fixed context window, the memory hierarchy itself has to be designed. (arXiv)
3. Andy Clark and David Chalmers, “The Extended Mind,” Analysis 58(1), 1998. The paper argues that cognition is not confined to the skull and that external tools and records can become part of the cognitive process. (OUP Academic)
4. Atul Gawande, The Checklist Manifesto: How to Get Things Right, Metropolitan Books, 2009. Gawande explains checklists not as a fix for incompetence but as a response to overwhelming complexity. (Macmillan)
5. OpenAI, “Memory and new controls for ChatGPT.” OpenAI separates saved memory and chat history into distinct controls. (OpenAI)
6. Anthropic, “Manage Claude’s memory.” Claude Code reads and writes memory files during sessions and treats them as project-level memory. (Anthropic Docs)
7. OpenAI Help, “ChatGPT Pulse.” (OpenAI Help Center)
8. OpenAI Help, “Memory FAQ.” The FAQ explains that Pulse uses both saved memories and chat history. (OpenAI Help Center)
9. Here “higher-level experience” means the user-facing layer of briefings, recommendations, and summaries. The official documentation describes Pulse as a feature that uses memory, not as a replacement memory engine. (OpenAI Help Center)
10. Wenbo Sun et al., “Organizing memories for generalization in complementary learning systems,” Nature Neuroscience, 2023. The study discusses complementary learning systems and how some memories are reorganized for generalization. (Nature)
11. Ray Kurzweil, How to Create a Mind: The Secret of Human Thought Revealed, Viking, 2012.
12. This section borrows Kurzweil’s philosophical and engineering metaphor rather than adopting a single neuroscience theory as settled fact.
13. Merlin Sheldrake, Entangled Life: How Fungi Make Our Worlds, Change Our Minds and Shape Our Futures, 2020. (Penguin)
14. MemGPT addresses this through layered memory rather than “put everything in.” The key idea is placing the right information in the right layer. (arXiv)
15. Joon Sung Park et al., “Generative Agents: Interactive Simulacra of Human Behavior,” arXiv, 2023. The paper proposes a system where observation, planning, reflection, and retrieval work together. (arXiv)
16. Wanjun Zhong et al., “MemoryBank: Enhancing Large Language Models with Long-Term Memory,” arXiv 2023; AAAI 2024. The system updates memory using importance and time-sensitive forgetting. (arXiv)
17. OpenAI treats saved memories and chat history as separate settings, while Anthropic documents project memory files and rule files as official structures. (OpenAI / Anthropic Docs)
18. W. Łajewska et al., “Understanding and Improving Information Preservation in Prompt Compression,” 2025. The paper argues that prompt compression should be evaluated not only by downstream performance but by preservation and reconstructability of information. (ACL Anthology)
19. This paragraph synthesizes a shared design problem across memory systems rather than attributing the point to a single product. The common signal across papers and docs is that selection and information-loss control matter.
20. MemGPT, Generative Agents, and MemoryBank all emphasize retrieval, reflection, selection, and tiering rather than total accumulation. (MemGPT, Generative Agents, MemoryBank)
21. OpenAI and Anthropic both treat memory as a separate user-controlled feature or project layer. That alone shows that continuity does not arise automatically. (OpenAI / Anthropic Docs)
FAQ
Would a longer context window solve this on its own?
A longer context window increases how much a model can hold in one session, but it does not decide what should be preserved as long-term memory once the session ends.
If summaries exist, do we still need memory?
Summaries are useful compression devices, but without a rule for what to preserve and what to discard they cannot recover lost context. They are tools for handling memory, not memory itself.
What is the first external memory layer worth building for individual work?
Start with the judgments you do not want to explain again: project rules, working-state logs, and failure notes that lower the explanation cost of the next session.
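As one hedged starting shape (the field names are my own invention), a working-state log can be a one-line JSON record per session:

```python
import datetime
import json

def log_session_state(path: str, decided: str, tried: str,
                      failed: str, next_step: str) -> None:
    """Append one session's working state so the next session can resume
    without re-explaining. One JSON object per line keeps it greppable."""
    entry = {
        "date": datetime.date.today().isoformat(),
        "decided": decided,    # judgments you do not want to re-litigate
        "tried": tried,        # approaches already attempted
        "failed": failed,      # dead ends, so they are not repeated
        "next": next_step,     # the cheapest possible next action
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```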
What to read next
TEREO, a proof layer that shows the truth of change
When AI changes code, TEREO checks whether the change is really better and keeps only the gain that beats the noise.
A vector database is a memory device that searches by semantic distance
A vector database is a memory device born to search the world not by exact values but by semantic distance.