Open-source · arXiv-backed · Production-ready

Your agent forgets
everything. Fix the memory,
not the model.

AgeMem is a self-managing memory layer for LLM agents. Runs locally on an 8GB GPU. Needs zero external infrastructure — no Pinecone, no Redis, no cloud. Just SQLite.


terminal

# Install and run — it's one command
$ uv pip install -e ".[ingest]" && agemem

✓ STM loaded (0/9000 tokens)
✓ LTM store: 0 entries (sqlite-vec ready)
✓ Model: Qwen3.5-9B via Ollama (36 tok/s)
✓ 17 tools loaded · 3-layer memory active

AgeMem ready. Start chatting.

The Problem

Every agent hits the memory wall

You can buy a 1M-token context window. You can throw a 70B model at it. Your agent still forgets — because the memory wall is a systems problem, not a model problem.

Context overflow = silent data loss

When the context fills up, facts get evicted silently. Your agent forgets a user preference from 3 turns ago — and you don't even notice until it hallucinates.

Cost: $38k/year per agent in wasted tokens on unmanaged context

Cloud memory = vendor lock-in + data exposure

Mem0, Zep, Pinecone: in their hosted form, each routes your users' private conversations through a third-party cloud. That's a compliance nightmare and a dependency trap.

Risk: GDPR exposure for EU deployments that route memory through a third-party cloud

RAG bloat ≠ intelligence

Dumping 10,000 chunks into a vector DB and hoping similarity search finds the right one isn't memory — it's brute force with extra steps. More data ≠ better recall.

Waste: 10x more context needed for equivalent accuracy with raw RAG

Architecture

Three layers. Zero LLM calls for the rules.

AgeMem uses a hybrid control architecture: deterministic system rules (free), an LLM-driven memory agent (targeted), and continuous self-assessment. Memory curates itself.

LAYER 1 — SYSTEM RULES
Deterministic guardrails
Five pure-function rules (R1–R5) that fire without any LLM calls. STM overflow → auto-summarize. Utilization ≥ 90% → force-filter. High learning score → immediate LTM promotion.
Cost: $0.00 per trigger · Latency: ~0ms
LAYER 2 — MEMORY AGENT
LLM-driven qualitative decisions
A focused LLM call every N turns that decides what merits long-term storage, scores context relevance, and generates summaries. It runs only when needed, not on every turn.
Cost: 1 LLM call every N turns · Configurable cadence
LAYER 3 — LEARNING FEEDBACK
Self-assessed memory quality
After each turn, the agent rates how much it "learned" (0–1). Scores above 0.8 trigger immediate LTM promotion. Retrieval hit-rates recalibrate future scoring.
Output: Autonomous LTM curation · No manual tuning
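
To make Layer 1 concrete, here is a minimal sketch of what zero-LLM rules could look like. It is not AgeMem's actual code: the names `MemoryState`, `rule_overflow`, `rule_utilization`, `rule_promote`, and `evaluate` are hypothetical, and only the three rules described above are covered, not all five. The 9000-token budget is taken from the startup output, and the 90% and 0.8 thresholds come from the layer descriptions.

```python
from dataclasses import dataclass
from typing import List

# Thresholds quoted on this page; everything else here is an assumption.
STM_BUDGET_TOKENS = 9000   # from the "0/9000 tokens" startup line
UTILIZATION_LIMIT = 0.90   # "Utilization >= 90% -> force-filter"
PROMOTE_THRESHOLD = 0.80   # "Scores above 0.8 trigger immediate LTM promotion"


@dataclass(frozen=True)
class MemoryState:
    stm_tokens: int        # tokens currently held in short-term memory
    learning_score: float  # self-assessed score for the last turn, 0..1


def rule_overflow(state: MemoryState) -> List[str]:
    """STM over budget -> request a summary pass."""
    return ["summarize_stm"] if state.stm_tokens > STM_BUDGET_TOKENS else []


def rule_utilization(state: MemoryState) -> List[str]:
    """STM at or above 90% of budget -> force a relevance filter."""
    used = state.stm_tokens / STM_BUDGET_TOKENS
    return ["force_filter"] if used >= UTILIZATION_LIMIT else []


def rule_promote(state: MemoryState) -> List[str]:
    """High learning score -> promote to long-term memory right away."""
    return ["promote_to_ltm"] if state.learning_score > PROMOTE_THRESHOLD else []


RULES = (rule_overflow, rule_utilization, rule_promote)


def evaluate(state: MemoryState) -> List[str]:
    """Run every rule in order; pure functions, no model call involved."""
    actions: List[str] = []
    for rule in RULES:
        actions.extend(rule(state))
    return actions


print(evaluate(MemoryState(stm_tokens=8500, learning_score=0.85)))
# -> ['force_filter', 'promote_to_ltm']
```

Because each rule is a pure function of the current state, the whole layer can be unit-tested offline and costs nothing per trigger, which is the property the architecture relies on.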

"500 perfectly curated memories on a 9B model will consistently outperform 10,000 uncurated RAG chunks on a 70B model."

Proven on an 8GB RTX 4060 at 36 tokens/second.
Not on a leased datacenter cluster — on a laptop.

arXiv:2601.01885 · Qwen3.5-9B · llama.cpp · sqlite-vec

Comparison

How AgeMem differs from the alternatives

Not better at everything. Better at local-first, self-managing, zero-infra memory. Pick the right tool for your stack.

Feature	AgeMem	Mem0	Letta (MemGPT)	Zep / Graphiti
Runs 100% local	✓ Everything	✗ Cloud-first	~ Self-hostable	✗ Needs Neo4j
Zero infra dependencies	✓ SQLite only	✗ Qdrant/PgVector	✗ PostgreSQL	✗ Neo4j + Postgres
Self-managing memory	✓ 3-layer hybrid	~ CRUD layer	~ Agent-managed	~ Temporal graph
Works with any OpenAI-compat endpoint	✓ Any LLM	✓	~ Letta runtime	✓
Multi-tenant isolation	✓ Built-in	✓	~ Limited	✓
Optimized for ≤8GB GPU	✓ 9B models	✗ Cloud compute	✗ Heavy runtime	✗ Server-side
Open source	✓ MIT	✓ Apache 2.0	✓	~ Source-available
Deterministic memory rules	✓ Zero-cost R1-R5	✗	✗	✗

Built For

Four profiles. One common problem.

They all hit the memory wall. They all need it solved without cloud dependencies, vendor lock-in, or PhD-level infrastructure.

🛠️

AI Agent Builders

Building autonomous agents that need persistent memory across sessions

You've hit the context wall. RAG gives you quantity, not quality. AgeMem gives your agent the ability to decide what to remember, what to summarize, and what to forget — autonomously.

🏠

Self-Hosters & Local-First Devs

Running Ollama/llama.cpp · No cloud · Full data sovereignty

You chose local LLMs for a reason. Don't send your agent's memory to a cloud API now. AgeMem runs on a single file (SQLite), needs zero external services, and keeps everything on your machine.
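
For a feel of what "a single SQLite file" means in practice, here is a rough sketch using only Python's standard library. It is an illustration, not AgeMem's schema: the `ltm` table, the helper names `open_store`, `remember`, and `recall`, and the hand-written three-dimensional embeddings are all assumptions. Similarity is computed by brute force in Python as a stand-in for the indexed vector search that sqlite-vec provides.

```python
import json
import math
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a one-file memory store. Hypothetical schema."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS ltm ("
        " id INTEGER PRIMARY KEY,"
        " text TEXT NOT NULL,"
        " embedding TEXT NOT NULL)"  # JSON-encoded list of floats
    )
    return db


def remember(db, text, embedding):
    """Persist one memory with its embedding."""
    db.execute(
        "INSERT INTO ltm (text, embedding) VALUES (?, ?)",
        (text, json.dumps(embedding)),
    )
    db.commit()


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall(db, query_embedding, k=3):
    """Return the k stored texts most similar to the query (brute force)."""
    rows = db.execute("SELECT text, embedding FROM ltm").fetchall()
    scored = [(cosine(query_embedding, json.loads(e)), t) for t, e in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


db = open_store()  # pass a file path instead of ":memory:" to persist
remember(db, "User prefers dark mode", [1.0, 0.0, 0.0])
remember(db, "User lives in Lisbon", [0.0, 1.0, 0.0])
print(recall(db, [0.9, 0.1, 0.0], k=1))
# -> ['User prefers dark mode']
```

Nothing here leaves the process: no network call, no server to run, and backing up the memory is copying one file.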

🚀

Indie AI SaaS Teams

Multi-tenant AI products · Budget-constrained · Need memory per customer

Built-in tenant + org isolation. No Pinecone bill, no vector DB ops. One process handles 50 concurrent tenants via LRU registry. Ship memory features without shipping infrastructure.
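
An LRU registry of per-tenant stores can be sketched in a few lines. This is an assumption about the general pattern, not AgeMem's implementation: the `TenantRegistry` class and its `factory` parameter are hypothetical, and plain dicts stand in for real per-tenant memory stores. The default capacity of 50 mirrors the figure quoted above.

```python
from collections import OrderedDict


class TenantRegistry:
    """Keep at most `capacity` tenant stores open; evict the least recently used."""

    def __init__(self, capacity=50, factory=dict):
        self.capacity = capacity
        self._factory = factory          # builds a fresh store for a new tenant
        self._stores = OrderedDict()     # tenant_id -> store, oldest first

    def get(self, tenant_id):
        """Return the tenant's store, creating it and evicting if needed."""
        if tenant_id in self._stores:
            self._stores.move_to_end(tenant_id)  # mark as most recently used
            return self._stores[tenant_id]
        store = self._factory()
        self._stores[tenant_id] = store
        if len(self._stores) > self.capacity:
            self._stores.popitem(last=False)     # drop least recently used
        return store

    def __contains__(self, tenant_id):
        return tenant_id in self._stores


registry = TenantRegistry(capacity=2)
registry.get("acme")["plan"] = "pro"
registry.get("globex")
registry.get("acme")      # touch: acme becomes most recently used
registry.get("initech")   # over capacity: globex is evicted
print("globex" in registry, "acme" in registry)
# -> False True
```

In a real deployment the evicted store would be backed by disk, so eviction only closes a handle and the tenant's memory is reloaded on its next request. Each tenant getting its own store object is also what keeps one customer's memories out of another's retrieval results.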

🔒

Enterprise AI Teams

GDPR/SOC2 compliance · On-premise · $38k+/year saved per agent

Zero external data transfer. Full audit trail. Context optimization that cuts token costs by 70%. Deploy behind your firewall with a FastAPI REST API and 24 test suites guarding every release.

Benchmarks

Not marketing numbers. Measured performance.

36 tokens / second

On an 8GB RTX 4060
with Qwen3.5-9B

+8.7pp

over baselines

vs Mem0, LangMem, A-Mem
on HotpotQA (arXiv paper)

24 test suites

Critical regression coverage
Zero-LLM offline tests

$0 external infra cost

SQLite-vec for vectors
No Pinecone, no Redis

Your agent forgets everything. Fix the memory, not the model.