Memanto and the memory trade-offs you actually care about in production agents

Paper: Memanto: Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents (arXiv)
I've been thinking about memory for long-running agents for a long time, and the Memanto paper caught my eye because it makes a set of bold operational claims. The authors propose a single "universal memory layer" with a typed semantic schema, automated conflict resolution, temporal versioning, and what they call an information-theoretic search engine (Moorcheh) that requires no indexing. They report state-of-the-art results on LongMemEval and LoCoMo, a single-query retrieval path with no ingestion cost, and sub-90-millisecond deterministic retrieval. Those are the sort of claims that matter when you are shipping systems, so I read the paper with production questions in mind.
Technical summary
Memanto rests on three pillars. First, a strongly typed semantic memory: the system stores memories classified into 13 predefined categories, including facts, preferences, goals, plans, and events. Second, a set of operational features: automated conflict resolution when memories disagree, and temporal versioning so the system can reason about how facts change over time. Third, a retrieval engine the paper calls Moorcheh's Information Theoretic Search. That engine is described as a "no indexing semantic database" that returns deterministic top results in under 90 milliseconds and requires no separate ingestion step.
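The paper does not publish its schema in detail, but the shape of the idea is easy to sketch. Here is a minimal, hypothetical version of a typed, versioned memory record; the category names, field names, and versioning fields are my own illustration, not Memanto's API.

```python
# A minimal sketch of a typed, versioned memory record. The category names,
# field names, and versioning scheme are illustrative assumptions, not
# Memanto's published schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

class MemoryType(Enum):
    # the paper describes 13 categories; only a few are shown here
    FACT = "fact"
    PREFERENCE = "preference"
    GOAL = "goal"
    PLAN = "plan"
    EVENT = "event"

@dataclass
class MemoryRecord:
    memory_id: str
    type: MemoryType
    content: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    valid_from: Optional[datetime] = None    # temporal versioning: when the statement became true
    superseded_by: Optional[str] = None      # id of the record that replaced this one
    source: str = "unknown"                  # provenance for audits and conflict review
```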
They evaluate on LongMemEval and LoCoMo and report accuracy numbers around 89.8 percent and 87.1 percent, which they say exceed hybrid graph and vector baselines. There is also a five-stage ablation study attributing performance gains to the typed schema, conflict resolution, versioning, and the retrieval method.
What I find interesting
I like that Memanto focuses on concrete operational properties. Persistence, versioning, deterministic retrieval, and fast latencies are the things that break in production. The typed schema is a pragmatic move. In many agent workloads a modest, well-chosen set of categories lets you write predictable update and retrieval rules. Temporal versioning is also a must if your agent needs to reason about changing state or offer audit trails. The ablation study is useful; it is good to see the paper try to quantify how much each piece contributes.
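To make the "predictable update rules" point concrete, here is a toy example of per-type write policies. The types, keys, and policies below are assumptions for illustration, not rules taken from the paper.

```python
# Toy per-type write policies: preferences and facts about the same subject
# supersede older versions; events and plans simply accumulate. The types,
# keys, and policies are assumptions, not Memanto's documented behavior.
def apply_update(store: list, new: dict) -> None:
    overwrite_types = {"preference", "fact"}
    if new["type"] in overwrite_types:
        for rec in store:
            if (rec["type"] == new["type"]
                    and rec["subject"] == new["subject"]
                    and rec.get("superseded_by") is None):
                rec["superseded_by"] = new["id"]   # keep the old version for temporal queries
    store.append(new)

store = []
apply_update(store, {"id": "m1", "type": "preference", "subject": "airline_seat", "value": "aisle"})
apply_update(store, {"id": "m2", "type": "preference", "subject": "airline_seat", "value": "window"})
# m1 is now marked superseded_by "m2"; both records remain for audits and point-in-time queries.
```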
The claims about ingestion-free writes and single-step deterministic retrieval, if true at nontrivial scale, would be a meaningful simplification. Indexing pipelines and complex graph maintenance are operational headaches. Fewer moving parts reduce brittleness and operational cost, and that matters.
What I am skeptical about
There are several places where the paper glosses over hard engineering questions. The phrase "no indexing semantic database" raises immediate scale and cost questions. If retrieval avoids precomputed indices, that often means compute at query time that grows with memory size. The paper gives sub-90-millisecond numbers, but does not provide enough detail about memory size, hardware, concurrency, or tail latency. Sub-90-millisecond latency on a small dataset or a beefy GPU is not the same as sub-90-millisecond latency for millions of sessions on commodity CPU nodes.
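As a back-of-the-envelope illustration of why "no indexing" makes me nervous: if nothing is precomputed, retrieval tends to look like a scan whose cost grows linearly with the store. The sketch below is not Memanto's implementation; it just shows where the query-time compute goes.

```python
# Back-of-the-envelope: retrieval with no precomputed index is an O(N) scan
# over the memory store at query time. Purely illustrative; not Memanto's code.
import numpy as np

def brute_force_top_k(query_vec: np.ndarray, memory_vecs: np.ndarray, k: int = 5) -> np.ndarray:
    # cosine similarity against every stored memory: cost scales linearly with N
    q = query_vec / np.linalg.norm(query_vec)
    m = memory_vecs / np.linalg.norm(memory_vecs, axis=1, keepdims=True)
    return np.argsort(-(m @ q))[:k]

# At 1M memories x 768-dim float32 that is roughly 3 GB touched per query,
# so "sub-90 ms" depends heavily on hardware, caching, and concurrency.
```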
The 13-category typed schema is a double-edged sword. Fixed types give structure and predictable behavior, but they also force modeling trade-offs. Real user data is messy. Categories will leak, edge cases will appear, and schema drift will happen. The paper mentions conflict resolution but does not fully specify the policies or characterize their failure modes. In practice you want explicit audit trails, human-in-the-loop correction, and clear rules for when automated resolution is insufficient. Temporal versioning helps here, but versioning by itself is not an operational contract.
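What I mean by an operational contract is something like the following: every automated resolution produces a logged decision, and low-confidence cases escalate to a human instead of resolving silently. The policy, threshold, and field names here are hypothetical, not from the paper.

```python
# Hypothetical conflict-resolution wrapper: resolve automatically only above a
# confidence threshold, escalate otherwise, and always write an audit entry.
# The policy, threshold, and field names are assumptions, not Memanto's behavior.
import json
import time

def resolve_conflict(existing: dict, incoming: dict, confidence: float,
                     audit_log: list, threshold: float = 0.8) -> dict:
    decision = {
        "timestamp": time.time(),
        "existing_id": existing["id"],
        "incoming_id": incoming["id"],
        "confidence": confidence,
    }
    if confidence >= threshold:
        decision["action"] = "supersede"       # automated: the newer record wins
        existing["superseded_by"] = incoming["id"]
        winner = incoming
    else:
        decision["action"] = "escalate"        # ambiguous: keep both, flag for human review
        incoming["needs_review"] = True
        winner = existing
    audit_log.append(json.dumps(decision))     # the audit trail is first class, not optional
    return winner
```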
The information-theoretic retrieval idea is intriguing, but the paper needs to be clearer about what that actually costs. Does it compute exact information measures relative to every memory entry? Is there a compression-based index under the hood? How is determinism achieved in a probabilistic system that ultimately relies on embeddings and model outputs? Determinism in retrieval is useful for debugging, but it can also lock you into brittle retrieval behavior if not paired with monitoring and fallbacks.
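The paper leaves the measure underspecified, so purely as an illustration of what an information-theoretic score can look like (this is my guess, not Memanto's published method), here is a ranking by KL divergence between smoothed unigram distributions of the query and each memory. Even this toy version is deterministic given fixed inputs, yet it still evaluates every memory at query time unless something index-like sits underneath.

```python
# One *possible* reading of information-theoretic scoring, purely as an
# illustration (NOT Memanto's published method): rank memories by the KL
# divergence between smoothed unigram distributions of query and memory text.
import math
from collections import Counter

def unigram_dist(text: str, vocab: set, alpha: float = 0.1) -> dict:
    counts = Counter(text.lower().split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocab}

def kl_score(query: str, memory: str) -> float:
    vocab = set(query.lower().split()) | set(memory.lower().split())
    p, q = unigram_dist(query, vocab), unigram_dist(memory, vocab)
    return sum(p[w] * math.log(p[w] / q[w]) for w in vocab)   # lower = closer to the query

# Deterministic for fixed text, but still computed against every candidate at
# query time unless some index or pruning structure exists underneath.
```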
Evaluation concerns
Beating hybrid graph and vector baselines on LongMemEval and LoCoMo is a solid start. But I want to see more downstream task evaluations: how do these memory decisions affect agent planning, hallucination rates, user satisfaction, or safety-critical behaviors? Accuracy numbers are necessary but not sufficient. Also, I want to see the sensitivity of results to prompt design, label noise in memories, adversarial updates, and memory growth. The ablation study is a good step, but I want to see reproducible code and hardware specs before I believe the latency and cost claims.
Practical implications for building agents
If you are building long-horizon agents, Memanto highlights a useful principle: practical memory systems benefit from strong operational constraints. Typing memory and versioning items are low-regret moves. They make audits, rollbacks, and conflict handling tractable. If Memanto's retrieval method truly reduces the need for heavy indexing without sacrificing latency at scale, that would simplify infrastructure.
But do not take the paper's operational claims at face value. When you evaluate Memanto or a Memanto-like approach in production, test these things explicitly: ingestion throughput under realistic loads, memory growth and its effect on latency and cost, tail latency under concurrent queries, conflict resolution edge cases, and retrieval fidelity for downstream decision making. Make provenance and explainability first-class. Automated resolution is helpful until it makes a bad decision silently.
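For the latency and memory-growth items in particular, I would run something like the harness below before trusting any headline number. The sizes, dimensions, and synthetic data are my own choices; pass in your system's real retrieval function.

```python
# Rough harness for latency-versus-memory-size curves: measure p50/p95/p99 of
# the retrieval call as the store grows. The sizes, dimensions, and synthetic
# data are illustrative choices; `retrieve` is whatever your system exposes.
import time
import numpy as np

def latency_curve(retrieve, dims: int = 768,
                  sizes=(10_000, 100_000, 1_000_000), queries: int = 200) -> None:
    rng = np.random.default_rng(0)
    for n in sizes:
        store = rng.standard_normal((n, dims), dtype=np.float32)
        times_ms = []
        for _ in range(queries):
            q = rng.standard_normal(dims, dtype=np.float32)
            t0 = time.perf_counter()
            retrieve(q, store)
            times_ms.append((time.perf_counter() - t0) * 1000.0)
        p50, p95, p99 = np.percentile(times_ms, [50, 95, 99])
        print(f"N={n:>9,}  p50={p50:6.1f} ms  p95={p95:6.1f} ms  p99={p99:6.1f} ms")
```

Paired with the brute-force scorer from the earlier sketch, this is enough to see whether a headline latency survives memory growth, concurrency, and your own hardware.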
Where I would take this next
I would like to see an open implementation and benchmarks with clear scaling curves: latency as a function of memory size, hardware, and concurrency. I would also like to see the conflict resolution rules exposed, including failure modes and human recovery workflows. Finally, test the memory layer end-to-end in agent workflows that matter to production systems: customer support agents, multi-session assistants, and safety-critical monitoring agents.
Memanto is a useful step toward memory systems that care about operations as much as raw retrieval accuracy. The paper makes some strong claims that could simplify agent infrastructure if they hold up at scale. My advice to teams is to borrow the disciplined elements from Memanto—typed memory, versioning, provenance—but validate the retrieval and cost claims under your own workload before ripping out indices or graphs. Memory is easy to get wrong in the long tail. Design for observability and recovery first, and then optimize for simplicity.