Building Lifelong Memory for AI Agents with @sheriax/simplemem

INTRODUCTION
AI agents are getting smarter — but they still have the memory of a goldfish. Most LLM agents process each conversation from scratch, losing valuable context, decisions, and learnings from previous sessions. The result? Repetitive interactions, wasted tokens, and agents that never truly learn.
That's the problem SimpleMem solves. It's an efficient lifelong memory framework for LLM agents, built on semantic lossless compression. And I've published a TypeScript implementation on npm as @sheriax/simplemem — bringing this powerful memory system to the JavaScript/Node.js ecosystem.
In this post, I'll break down what SimpleMem does, how it works under the hood, and how you can integrate it into your own AI-powered applications.
The Problem: LLM Agents Have No Long-Term Memory
Large Language Models are incredible at reasoning within a single conversation. But the moment that conversation ends, everything is lost. Existing approaches to "memory" for LLM agents fall into two camps — and both have serious problems:
- Full-context stuffing — Dump the entire conversation history into the prompt. This works for short histories, but quickly becomes unusable as token costs explode and context windows overflow.
- Passive accumulation — Store every interaction and retrieve relevant pieces later. This leads to massive redundancy, irrelevant retrievals, and slow lookup times.
What's needed is a system that:
- Compresses interactions into compact, meaningful memory units
- Synthesizes related information to eliminate redundancy
- Retrieves precisely what's needed based on query intent
That's exactly what SimpleMem does.
How SimpleMem Works: The Three-Stage Pipeline
SimpleMem uses a three-stage pipeline based on semantic lossless compression. Each stage is designed to maximize information density while minimizing token usage.

Stage 1: Semantic Structured Compression
Raw dialogue is messy — full of filler, ambiguity, and relative references. SimpleMem's first stage filters this noise and transforms useful information into compact, self-contained memory units.
Before:

```
"He'll meet Bob tomorrow at 2pm"
// ❌ Who is "he"? When is "tomorrow"? Where?
```

After:

```
"Alice will meet Bob at Starbucks on 2025-11-16T14:00:00"
// ✅ Absolute, atomic, self-contained
```
Each memory unit is then indexed through three complementary representations — semantic, lexical, and symbolic — enabling flexible, multi-view retrieval later.
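To make the compression step concrete, here is a toy TypeScript sketch of turning a raw utterance into a self-contained memory unit. This is my own illustration, not the library's actual types or API: the `MemoryUnit` shape, `toMemoryUnit`, and the regex-based reference resolution are all simplified stand-ins for what a real LLM-driven pipeline would do.

```typescript
// Hypothetical memory-unit shape: one self-contained fact plus index terms.
interface MemoryUnit {
  content: string;    // absolute, atomic statement
  entities: string[]; // symbolic index terms
  keywords: string[]; // lexical index terms
  // A semantic index (embedding vector) would also be attached in practice.
}

// Resolve relative references ("he", "tomorrow") against known context
// before storing. A real system would use the LLM for this resolution.
function toMemoryUnit(
  raw: string,
  ctx: { speaker: string; now: Date; place?: string },
): MemoryUnit {
  const content = raw
    .replace(/\bhe'll\b/i, `${ctx.speaker} will`)
    .replace(/\btomorrow\b/i, nextDayISO(ctx.now));
  const resolved = ctx.place ? `${content} at ${ctx.place}` : content;
  return {
    content: resolved,
    entities: [ctx.speaker],
    keywords: resolved.toLowerCase().split(/\W+/).filter(Boolean),
  };
}

function nextDayISO(now: Date): string {
  const d = new Date(now);
  d.setDate(d.getDate() + 1);
  return d.toISOString().slice(0, 10); // e.g. "2025-11-16"
}

const unit = toMemoryUnit("He'll meet Bob tomorrow at 2pm", {
  speaker: 'Alice',
  now: new Date('2025-11-15T09:00:00Z'),
  place: 'Starbucks',
});
// unit.content → "Alice will meet Bob 2025-11-16 at 2pm at Starbucks"
```

The point is the shape of the transformation: every pronoun, relative date, and implicit location is made explicit at write time, so the unit stays meaningful no matter when it is retrieved.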
Stage 2: Online Semantic Synthesis
Unlike traditional systems that run background maintenance jobs, SimpleMem synthesizes related memory units on-the-fly during the write phase. This means redundant or overlapping information is merged immediately, keeping the memory store compact.
Example:

```
Fragment 1: "User wants coffee"
Fragment 2: "User prefers oat milk"
Fragment 3: "User likes it hot"
→ Consolidated: "User prefers hot coffee with oat milk"
```
This proactive synthesis prevents memory bloat and ensures the topology stays clean without any external maintenance processes.
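The write-time merge can be sketched in a few lines. This is an illustrative reduction (my own, not the package's internals): overlap detection here is a simple subject match, where the real system uses semantic similarity and LLM-driven consolidation.

```typescript
// A unit groups related facts under one subject instead of storing fragments.
interface Unit {
  subject: string;
  facts: Set<string>;
}

// On write, merge into an existing overlapping unit rather than appending
// a new fragment. This keeps the store compact with no background jobs.
function writeWithSynthesis(store: Unit[], subject: string, fact: string): Unit[] {
  const existing = store.find((u) => u.subject === subject);
  if (existing) {
    existing.facts.add(fact); // consolidate immediately, at write time
    return store;
  }
  return [...store, { subject, facts: new Set([fact]) }];
}

let store: Unit[] = [];
store = writeWithSynthesis(store, 'user.coffee', 'wants coffee');
store = writeWithSynthesis(store, 'user.coffee', 'prefers oat milk');
store = writeWithSynthesis(store, 'user.coffee', 'likes it hot');
// store now holds ONE consolidated unit with three facts, not three fragments.
```

The design choice to synthesize on write, not on a periodic cleanup pass, is what keeps retrieval fast later: the store never accumulates redundant fragments in the first place.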
Stage 3: Intent-Aware Retrieval Planning
This is where SimpleMem really shines. Instead of fixed-depth retrieval (which either misses information or returns too much), SimpleMem leverages the LLM's reasoning capabilities to generate a retrieval plan based on query intent.
For simple queries (e.g., "What's Alice's phone number?"), it does a direct fact lookup with minimal retrieval depth. For complex queries (e.g., "Summarize all meetings Alice had this month"), it expands retrieval depth and aggregates across multiple memory units.
The system executes parallel multi-view retrieval across semantic, lexical, and symbolic indexes, then deduplicates results by ID — giving you comprehensive coverage without redundant tokens.
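The plan-then-retrieve flow looks roughly like this. Again a hedged sketch under my own names (`planRetrieval`, `retrieve`, `Hit` are not the library's API), and the keyword heuristic stands in for the LLM reasoning that actually produces the plan:

```typescript
interface Hit {
  id: string;
  score: number;
}

type Plan = { depth: number; aggregate: boolean };

// A real system asks the LLM to classify query intent; a keyword
// heuristic stands in here to keep the sketch self-contained.
function planRetrieval(query: string): Plan {
  const complex = /\b(summarize|all|compare|history)\b/i.test(query);
  return complex
    ? { depth: 20, aggregate: true }   // broad sweep for aggregation queries
    : { depth: 3, aggregate: false };  // direct fact lookup
}

// Run every view (semantic, lexical, symbolic) in parallel, then
// deduplicate by id, keeping the best score per memory unit.
async function retrieve(
  query: string,
  views: Array<(q: string, k: number) => Promise<Hit[]>>,
): Promise<Hit[]> {
  const { depth } = planRetrieval(query);
  const results = await Promise.all(views.map((view) => view(query, depth)));
  const best = new Map<string, Hit>();
  for (const hit of results.flat()) {
    const prev = best.get(hit.id);
    if (!prev || hit.score > prev.score) best.set(hit.id, hit);
  }
  return [...best.values()].sort((a, b) => b.score - a.score);
}
```

Because each view can surface the same memory unit, the dedupe-by-id step is what keeps the merged result comprehensive without paying for the same unit's tokens twice.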
Performance: The Numbers Speak
SimpleMem doesn't just have an elegant architecture — it delivers real results.

On the LoCoMo benchmark (using GPT-4.1-mini):
| Metric | SimpleMem | Mem0 | LightMem | A-Mem |
|---|---|---|---|---|
| F1 Score | 43.24% | 34.2% | 24.6% | ~33% |
| Retrieval Time | 388.3s | 797.4s | 577.2s | — |
| Total Processing | 480.9s | — | — | 6,000s+ |
| Token Usage | ~550 | ~16,500 | ~855 | ~16,500 |
Key takeaways:
- 26.4% higher F1 score than Mem0
- 51.3% faster retrieval than Mem0
- 30× fewer tokens than full-context methods
- 12.5× faster end-to-end than A-Mem
SimpleMem achieves the ideal position: highest performance with lowest token cost.
Cross-Session Memory
One of SimpleMem's most powerful features is cross-session memory — the ability to maintain context, decisions, and learnings across multiple conversations. This outperforms Claude-Mem by 64% on the LoCoMo benchmark.
```typescript
import { SimpleMem } from '@sheriax/simplemem';

const mem = new SimpleMem({
  provider: 'openai',
  model: 'gpt-4.1-mini',
  embeddingModel: 'text-embedding-3-small',
});

// Store a memory
await mem.store({
  content: 'User prefers TypeScript over JavaScript for all new projects',
  sessionId: 'session-001',
  userId: 'user-123',
});

// Retrieve relevant memories in a future session
const memories = await mem.retrieve({
  query: 'What programming language does the user prefer?',
  userId: 'user-123',
  topK: 5,
});

console.log(memories);
// → [{ content: "User prefers TypeScript over JavaScript...", score: 0.94 }]
```

Your agents can now recall context from previous sessions automatically — no manual state management required.
Integration: MCP Server
SimpleMem also ships with an MCP (Model Context Protocol) server, making it compatible with AI platforms like Claude Desktop, Cursor, LM Studio, and Cherry Studio. This means you can add long-term memory to your coding AI or chat assistant without writing any integration code.
The cloud-hosted MCP server at mcp.simplemem.cloud handles all the heavy lifting — just configure your token and start using memory-aware AI interactions.
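For clients that only speak stdio, a remote MCP server is typically bridged with the `mcp-remote` proxy. The snippet below is an illustrative config fragment, not official SimpleMem documentation: the exact endpoint path and the `SIMPLEMEM_TOKEN` variable name are my assumptions, so check the project's docs for the real values.

```json
{
  "mcpServers": {
    "simplemem": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://mcp.simplemem.cloud"],
      "env": { "SIMPLEMEM_TOKEN": "<your-token>" }
    }
  }
}
```

With an entry like this in your client's MCP configuration (for example Claude Desktop's `claude_desktop_config.json`), the assistant gains memory tools without any application code.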
Why I Built @sheriax/simplemem
As someone who builds AI-powered products (like Kizu and Drawink), I kept running into the same problem: my agents couldn't remember anything between sessions. Every conversation started from zero.
I found the SimpleMem research paper and was impressed by the approach — semantic compression, online synthesis, and intent-aware retrieval is exactly the right way to solve long-term memory. But there wasn't a production-ready TypeScript implementation I could drop into my Node.js backends.
So I built one. @sheriax/simplemem brings the SimpleMem framework to the npm ecosystem, with full TypeScript types, vector search support, and embeddings integration out of the box. It's designed to be a drop-in addition to any Node.js or TypeScript AI application.
Getting Started
Install the package:
```bash
npm install @sheriax/simplemem
```

Check out the full documentation and source code:
- GitHub: github.com/youhanasheriff/SimpleMemJs
- npm: npmjs.com/package/@sheriax/simplemem
- SimpleMem Paper: arxiv.org/abs/2601.02553
- Original Repo: github.com/aiming-lab/SimpleMem
CONCLUSION
Long-term memory is the missing piece in most LLM agent architectures. Without it, agents are stuck relearning the same information every session, wasting tokens, and providing inconsistent experiences.
SimpleMem solves this with an elegant three-stage pipeline — compress, synthesize, retrieve — that achieves the highest F1 score with 30× fewer tokens than full-context methods. And with @sheriax/simplemem, you can bring this capability to any TypeScript or Node.js project.
If you're building AI agents, chatbots, or any LLM-powered application that needs to remember, give SimpleMem a try. Your agents — and your users — will thank you.
```bash
npm install @sheriax/simplemem
```