Open-source · arXiv-backed · Production-ready

Your agent forgets everything. Fix the memory, not the model.

AgeMem is a self-managing memory layer for LLM agents. Runs locally on an 8GB GPU. Needs zero external infrastructure — no Pinecone, no Redis, no cloud. Just SQLite.

# From the repo root: install and run in one command
$ uv pip install -e ".[ingest]" && agemem

✓ STM loaded (0/9000 tokens)
✓ LTM store: 0 entries (sqlite-vec ready)
✓ Model: Qwen3.5-9B via Ollama (36 tok/s)
✓ 17 tools loaded · 3-layer memory active

AgeMem ready. Start chatting.

Every agent hits the memory wall

You can buy a 1M-token context window. You can throw a 70B model at it. Your agent still forgets — because the memory wall is a systems problem, not a model problem.

Context overflow = silent data loss

When the context fills up, facts get evicted silently. Your agent forgets a user preference from 3 turns ago — and you don't even notice until it hallucinates.

Cost: $38k/year per agent in wasted tokens on unmanaged context

Cloud memory = vendor lock-in + data exposure

Mem0, Zep, Pinecone: their hosted offerings route your users' private conversations through a third-party cloud. That's a compliance nightmare and a dependency trap.

Risk: GDPR exposure for any EU deployment that routes memory through a third-party cloud

RAG bloat ≠ intelligence

Dumping 10,000 chunks into a vector DB and hoping similarity search finds the right one isn't memory — it's brute force with extra steps. More data ≠ better recall.

Waste: 10x more context needed for equivalent accuracy with raw RAG

Three layers. Zero LLM calls for the rules.

AgeMem uses a hybrid control architecture: deterministic system rules (free), an LLM-driven memory agent (targeted), and continuous self-assessment. Memory curates itself.

LAYER 1 — SYSTEM RULES

Deterministic guardrails

Five pure-function rules (R1–R5) fire without any LLM calls. Examples: STM (short-term memory) overflow → auto-summarize. Utilization ≥ 90% → force-filter. High learning score → immediate promotion to LTM (long-term memory).

Cost: $0.00 per trigger · Latency: ~0ms
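A minimal Python sketch of how rules like these can be pure functions of the memory state, which is what makes them free and instant to evaluate. The names (`MemoryState`, `fire_rules`, the action strings) are hypothetical, not AgeMem's API. The 9000-token budget is borrowed from the startup log and the 0.8 threshold from the Layer 3 description; only three rules are shown because only three are described here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class MemoryState:
    stm_tokens: int               # tokens currently held in short-term memory
    stm_budget: int = 9000        # assumed budget, taken from the startup log
    learning_score: float = 0.0   # self-assessed score for the last turn (0-1)


# A rule maps a state to an action name, or None if it does not fire.
Rule = Callable[[MemoryState], Optional[str]]


def rule_overflow(s: MemoryState) -> Optional[str]:
    # STM overflow -> auto-summarize
    return "auto_summarize" if s.stm_tokens > s.stm_budget else None


def rule_utilization(s: MemoryState) -> Optional[str]:
    # Utilization >= 90% -> force-filter
    return "force_filter" if s.stm_tokens >= 0.9 * s.stm_budget else None


def rule_learning(s: MemoryState) -> Optional[str]:
    # High learning score -> immediate LTM promotion (hypothetical 0.8 cutoff)
    return "promote_to_ltm" if s.learning_score > 0.8 else None


RULES: list[Rule] = [rule_overflow, rule_utilization, rule_learning]


def fire_rules(state: MemoryState, rules: list[Rule] = RULES) -> list[str]:
    """Evaluate every rule: no I/O, no LLM calls, same input -> same output."""
    return [action for rule in rules if (action := rule(state)) is not None]


print(fire_rules(MemoryState(stm_tokens=9500, learning_score=0.85)))
# → ['auto_summarize', 'force_filter', 'promote_to_ltm']
```

Because each rule only reads a frozen snapshot, the rules can be unit-tested offline and reordered or extended without touching the LLM path.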
LAYER 2 — MEMORY AGENT

LLM-driven qualitative decisions

A focused LLM call every N turns that decides what merits long-term storage, scores context relevance, and generates summaries. Only runs when needed — not on every turn.

Cost: 1 LLM call every N turns · Configurable cadence
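The "every N turns" cadence can be sketched as a small scheduler that buffers turns and hands the batch to the memory agent once the buffer is full. `MemoryAgentScheduler` and the default `every_n=5` are assumptions; the `review` callback stands in for the focused LLM call and is hypothetical.

```python
from typing import Callable, Sequence


class MemoryAgentScheduler:
    """Run an expensive review callback once every `every_n` turns."""

    def __init__(self, review: Callable[[Sequence[str]], None], every_n: int = 5):
        if every_n < 1:
            raise ValueError("every_n must be >= 1")
        self.review = review      # hypothetical: wraps the memory-agent LLM call
        self.every_n = every_n    # the configurable cadence
        self._pending: list[str] = []

    def on_turn(self, turn: str) -> bool:
        """Record a turn; return True if the memory agent ran on this turn."""
        self._pending.append(turn)
        if len(self._pending) < self.every_n:
            return False
        # Hand off the accumulated turns and start a fresh buffer.
        batch, self._pending = self._pending, []
        self.review(batch)
        return True


calls = []
sched = MemoryAgentScheduler(calls.append, every_n=3)
print([sched.on_turn(f"t{i}") for i in range(7)])
# → [False, False, True, False, False, True, False]
```

Batching keeps the LLM cost proportional to 1/N of the turns while still giving the agent enough context per call to judge what deserves long-term storage.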
LAYER 3 — LEARNING FEEDBACK

Self-assessed memory quality

After each turn, the agent rates how much it "learned" (0–1). Scores above 0.8 trigger immediate LTM promotion. Retrieval hit-rates recalibrate future scoring.

Output: Autonomous LTM curation · No manual tuning
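The promotion check can be sketched as below. The 0.8 threshold comes from the description above; how AgeMem actually folds retrieval hit-rates back into scoring is not specified here, so the exponential moving average and the multiplicative calibration are assumptions, and `LearningFeedback` is a hypothetical name.

```python
class LearningFeedback:
    """Promote memories whose calibrated self-score exceeds a threshold."""

    def __init__(self, threshold: float = 0.8, alpha: float = 0.2):
        self.threshold = threshold
        self.alpha = alpha      # assumed EMA smoothing factor
        self.hit_rate = 1.0     # optimistic prior: trust self-scores at first

    def record_retrieval(self, was_useful: bool) -> None:
        # Moving average of how often promoted memories helped when retrieved.
        observed = 1.0 if was_useful else 0.0
        self.hit_rate = (1 - self.alpha) * self.hit_rate + self.alpha * observed

    def calibrated(self, raw_score: float) -> float:
        # Shrink the agent's self-assessment when past promotions rarely paid off.
        return max(0.0, min(1.0, raw_score * self.hit_rate))

    def should_promote(self, raw_score: float) -> bool:
        return self.calibrated(raw_score) > self.threshold


fb = LearningFeedback()
print(fb.should_promote(0.9))   # → True
fb.record_retrieval(False)      # a promoted memory turned out to be useless
print(fb.should_promote(0.9))   # → False
```

Because the hit rate never exceeds 1.0, this particular calibration can only make promotion stricter, never looser; a real system might use a different update rule.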

"500 perfectly curated memories on a 9B model will consistently outperform 10,000 uncurated RAG chunks on a 70B model."

Proven on an 8GB RTX 4060 at 36 tokens/second.
Not on a leased datacenter cluster — on a laptop.

arXiv:2601.01885 · Qwen3.5-9B · llama.cpp · sqlite-vec

How AgeMem differs from the alternatives

Not better at everything. Better at local-first, self-managing, zero-infra memory. Pick the right tool for your stack.

Feature | AgeMem | Mem0 | Letta (MemGPT) | Zep / Graphiti
Runs 100% local | ✓ Everything | Cloud-first | ~ Self-hostable | Needs Neo4j
Zero infra dependencies | ✓ SQLite only | Qdrant/PgVector | PostgreSQL | Neo4j + Postgres
Self-managing memory | ✓ 3-layer hybrid | ~ CRUD layer | ~ Agent-managed | ~ Temporal graph
Works with any OpenAI-compatible endpoint | ✓ Any LLM | | ~ Letta runtime |
Multi-tenant isolation | ✓ Built-in | ~ Limited | |
Optimized for ≤8GB GPU | ✓ 9B models | Cloud compute | Heavy runtime | Server-side
Open source | ✓ MIT | Apache 2.0 | | ~ Source-available
Deterministic memory rules | ✓ Zero-cost R1-R5 | | |

Four profiles. One common problem.

They all hit the memory wall. They all need it solved without cloud dependencies, vendor lock-in, or PhD-level infrastructure.

🛠️

AI Agent Builders

Building autonomous agents that need persistent memory across sessions

You've hit the context wall. RAG gives you quantity, not quality. AgeMem gives your agent the ability to decide what to remember, what to summarize, and what to forget — autonomously.

🏠

Self-Hosters & Local-First Devs

Running Ollama/llama.cpp · No cloud · Full data sovereignty

You chose local LLMs for a reason. Don't send your agent's memory to a cloud API now. AgeMem stores everything in a single SQLite file, needs zero external services, and keeps all data on your machine.
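To show what "one file, zero services" means in practice, here is a minimal sketch using only Python's built-in `sqlite3`. AgeMem uses sqlite-vec for vector search; to stay dependency-free, this stand-in stores embeddings as JSON and ranks them with brute-force cosine similarity in Python. The table layout, function names, and default file name are hypothetical, and embeddings are assumed to come from whatever local model you already run.

```python
import json
import math
import sqlite3
from typing import List, Sequence


def open_store(path: str = "agemem_demo.db") -> sqlite3.Connection:
    """Open (or create) a single-file long-term memory store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ltm ("
        " id INTEGER PRIMARY KEY,"
        " tenant TEXT NOT NULL,"
        " content TEXT NOT NULL,"
        " embedding TEXT NOT NULL)"  # JSON-encoded list of floats
    )
    return conn


def remember(conn: sqlite3.Connection, tenant: str, content: str,
             embedding: Sequence[float]) -> None:
    conn.execute(
        "INSERT INTO ltm (tenant, content, embedding) VALUES (?, ?, ?)",
        (tenant, content, json.dumps(list(embedding))),
    )
    conn.commit()


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall(conn: sqlite3.Connection, tenant: str, query: Sequence[float],
           k: int = 3) -> List[str]:
    # Only this tenant's rows are scored; brute force stands in for sqlite-vec.
    rows = conn.execute(
        "SELECT content, embedding FROM ltm WHERE tenant = ?", (tenant,)
    ).fetchall()
    scored = sorted(
        ((_cosine(query, json.loads(emb)), content) for content, emb in rows),
        reverse=True,
    )
    return [content for _, content in scored[:k]]
```

Passing `":memory:"` to `open_store` gives a throwaway store for tests; any other path is a single file you can back up or delete like any other.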

🚀

Indie AI SaaS Teams

Multi-tenant AI products · Budget-constrained · Need memory per customer

Built-in tenant + org isolation. No Pinecone bill, no vector DB ops. One process handles 50 concurrent tenants via an LRU registry. Ship memory features without shipping infrastructure.
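An LRU tenant registry of this kind can be sketched with `collections.OrderedDict`. The class name, the `(org, tenant)` key, the factory callback, and the eviction hook are assumptions for illustration; the default capacity of 50 mirrors the figure quoted above.

```python
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class TenantRegistry(Generic[T]):
    """Keep at most `capacity` per-tenant handles; evict the least recently used."""

    def __init__(self, factory: Callable[[str, str], T], capacity: int = 50,
                 on_evict: Callable[[tuple[str, str], T], None] = lambda key, handle: None):
        self._factory = factory      # hypothetical: builds a tenant's memory handle
        self._capacity = capacity
        self._on_evict = on_evict    # e.g. close a DB connection
        self._items: "OrderedDict[tuple[str, str], T]" = OrderedDict()

    def get(self, org: str, tenant: str) -> T:
        key = (org, tenant)  # org + tenant together form the isolation key
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        handle = self._factory(org, tenant)
        self._items[key] = handle
        if len(self._items) > self._capacity:
            old_key, old_handle = self._items.popitem(last=False)  # LRU entry
            self._on_evict(old_key, old_handle)
        return handle

    def __len__(self) -> int:
        return len(self._items)
```

In this sketch, eviction only drops the in-process handle. If the handle wraps something like a SQLite connection, `on_evict` is the place to close it; the tenant's data itself stays on disk and is reopened by the factory on the next request.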

🔒

Enterprise AI Teams

GDPR/SOC 2 compliance · On-premises · $38k+/year saved per agent

Zero external data transfer. Full audit trail. Context optimization that cuts token costs by 70%. Deploy behind your firewall with a FastAPI REST API and 24 test suites guarding every release.

Not marketing numbers. Measured performance.

36 tokens/second · on an 8GB RTX 4060 with Qwen3.5-9B
+8.7 pp over baselines · vs Mem0, LangMem, A-Mem on HotpotQA (arXiv paper)
24 test suites · critical regression coverage, zero-LLM offline tests
$0 external infra cost · sqlite-vec for vectors, no Pinecone, no Redis

Start building agents that remember.

One install. Any OpenAI-compatible endpoint. Zero infrastructure. Open-source, community-driven, arXiv-backed.


uv pip install -e ".[ingest]" && agemem