
LightMem: Lightweight and Efficient Memory-Augmented Generation


⭐ If you like our project, please give us a star on GitHub for the latest updates!

LightMem is a lightweight and efficient memory management framework designed for Large Language Models and AI Agents. It provides a simple yet powerful memory storage, retrieval, and update mechanism to help you quickly build intelligent applications with long-term memory capabilities.

  • 🚀 Lightweight & Efficient
    Minimalist design with minimal resource consumption and fast response times

  • 🎯 Easy to Use
    Simple API design - integrate into your application with just a few lines of code (see the sketch after this list)

  • 🔌 Flexible & Extensible
    Modular architecture supporting custom storage engines and retrieval strategies

  • 🌐 Broad Compatibility
    Support for cloud APIs (OpenAI, DeepSeek) and local models (Ollama, vLLM, etc.)
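
As a taste of the "few lines of code" claim, here is a minimal sketch: store one fact, then retrieve it. The config below is abbreviated and illustrative only; see 💡 Examples for a complete, working `config_dict`.

```python
# Minimal sketch: store one fact, then retrieve it.
# The config is abbreviated; see the Examples section for a full one.
from lightmem.memory.lightmem import LightMemory

config_dict = {
    "memory_manager": {
        "model_name": "openai",
        "configs": {"model": "gpt-4o-mini", "api_key": "your_api_key"},
    },
}
lightmem = LightMemory.from_config(config_dict)

lightmem.add_memory(messages=[
    {"role": "user", "content": "My dog's name is Rex.", "time_stamp": "2025-01-10"},
])
print(lightmem.retrieve("What is the name of my dog?", limit=5))
```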

📢 News

  • [2026-04-24]: 🚀 LightMem now supports the latest DeepSeek models, including deepseek-v4-flash and deepseek-v4-pro, with reasoning_effort and thinking-mode configuration!
  • [2026-04-24]: 🎉🎉🎉 StructMem: Structured Memory for Long-Horizon Behavior in LLMs has been accepted by ACL 2026!
  • [2026-03-21]: 🚀 We provide a more comprehensive baseline evaluation framework, supporting the benchmarking of memory layers such as Mem0, A-MEM, EverMemOS, LangMem on multiple datasets like LoCoMo and LongMemEval.
  • [2026-02-15]: 🚀 StructMem is released: A hierarchical memory framework that preserves event-level memory bindings and cross-event memory connections.
  • [2026-01-26]: 🎉🎉🎉 LightMem: Lightweight and Efficient Memory-Augmented Generation has been accepted by ICLR 2026!
  • [2026-01-17]: 🚀 We provide a comprehensive baseline evaluation framework, supporting the benchmarking of memory layers such as Mem0, A-MEM, and LangMem on multiple datasets like LoCoMo and LongMemEval.
  • [2025-12-09]: 🎬 Released a Demo Video showcasing long-context handling, along with comprehensive Tutorial Notebooks for various scenarios!
  • [2025-11-30]: 🚌 LightMem now supports calling multiple tools provided by its MCP Server.
  • [2025-11-26]: 🚀 Added full LoCoMo dataset support, delivering strong results with leading performance and efficiency! Here is the reproduction script!
  • [2025-11-09]: ✨ LightMem now supports local deployment via Ollama, vLLM, and Transformers auto-loading!
  • [2025-10-12]: 🎉 LightMem project is officially Open-Sourced!

🧪 Reproduction Scripts for LoCoMo & LongMemEval

We provide lightweight, ready-to-run scripts for reproducing results on LoCoMo, LongMemEval, and their combined baselines.

| Dataset | Description | Script | Result |
| --- | --- | --- | --- |
| LongMemEval | Run LightMem on LongMemEval, including evaluation and offline memory update. | run_lightmem_longmemeval.md | LongMemEval Results |
| LoCoMo | Scripts for reproducing LightMem results on LoCoMo. | run_lightmem_locomo.md | LoCoMo Results |
| LongMemEval & LoCoMo | Unified baseline scripts for running both datasets. | run_baselines.md | Baseline Results |

🧪 Baseline Evaluation

We provide a comprehensive baseline evaluation framework, supporting the benchmarking of memory layers such as Mem0, A-MEM, and LangMem on multiple datasets like LoCoMo and LongMemEval.

🎥 Demo & Tutorials

Watch Demo: YouTube | Bilibili

📚 Hands-on Tutorials

We provide ready-to-use Jupyter notebooks corresponding to the demo and other use cases. You can find them in the tutorial-notebooks directory.

| Scenario | Description | Notebook Link |
| --- | --- | --- |
| Travel Planning | A complete guide to building a travel agent with memory. | LightMem_Example_travel.ipynb |
| Code Assistant | A complete guide to building a code agent with memory. | LightMem_Example_code.ipynb |
| LongMemEval | A tutorial on how to run evaluations on the LongMemEval benchmark using LightMem. | LightMem_Example_longmemeval.ipynb |

☑️ Todo List

LightMem is continuously evolving! Here's what's coming:

  • Offline Pre-computation of KV Cache for Update (Lossless)
  • Online Pre-computation of KV Cache Before Q&A (Lossy)
  • Integration of More Models and Feature Enhancements
  • Coordinated Use of Context and Long-Term Memory Storage
  • Multimodal Memory


🔧 Installation

Installation Steps

Option 1: Install from Source

```bash
# Clone the repository
git clone https://github.com/zjunlp/LightMem.git
cd LightMem

# Create virtual environment
conda create -n lightmem python=3.11 -y
conda activate lightmem

# Install dependencies
unset ALL_PROXY
pip install -e .
```

Option 2: Install via pip

```bash
pip install lightmem  # Coming soon
```

⚡ Quick Start

  1. Modify the JUDGE_MODEL, LLM_MODEL, and their respective API_KEY and BASE_URL in API Configuration.

  2. Download LLMLINGUA_MODEL from microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank and EMBEDDING_MODEL from sentence-transformers/all-MiniLM-L6-v2 and modify their paths in Model Paths (an optional download sketch follows this list).

  3. Download the dataset from longmemeval-cleaned, and modify the path in Data Configuration.
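
Step 2 can also be done programmatically. Here is a minimal sketch using huggingface_hub (an assumption: the package is installed; the local_dir targets are placeholders that Model Paths should point at):

```python
# Sketch: download the compressor and embedding models with huggingface_hub.
# The local_dir paths are placeholders -- point Model Paths at wherever you save them.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank",
    local_dir="./models/llmlingua-2-bert-base-multilingual-cased-meetingbank",
)
snapshot_download(
    repo_id="sentence-transformers/all-MiniLM-L6-v2",
    local_dir="./models/all-MiniLM-L6-v2",
)
```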

```bash
cd experiments
python run_lightmem_qwen.py
```

🏗️ Architecture

🗺️ Core Modules Overview

LightMem adopts a modular design, breaking down the memory management process into several pluggable components. The core directory structure exposed to users is outlined below, allowing for easy customization and extension:

```
LightMem/
├── src/lightmem/          # Main package
│   ├── __init__.py        # Package initialization
│   ├── configs/           # Configuration files
│   ├── factory/           # Factory methods
│   ├── memory/            # Core memory management
│   └── memory_toolkits/   # Memory toolkits
├── mcp/                   # LightMem MCP server
├── experiments/           # Experiment scripts
├── datasets/              # Dataset files
└── examples/              # Examples
```

🧩 Supported Backends per Module

The following table lists the backend values currently recognized by each configuration module. Use the model_name field (or the corresponding config object) to select one of these backends; a minimal selection sketch follows the table.

| Module (config) | Supported backends |
| --- | --- |
| PreCompressorConfig | llmlingua-2, entropy_compress |
| TopicSegmenterConfig | llmlingua-2 |
| MemoryManagerConfig | openai, deepseek, ollama, vllm, etc. |
| TextEmbedderConfig | huggingface |
| MMEmbedderConfig | huggingface |
| RetrieverConfig | qdrant, FAISS, BM25 |
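
As a minimal sketch (assuming defaults for everything not shown; paths and keys are placeholders), each module's backend is chosen by its model_name key in the config dict passed to LightMemory.from_config:

```python
# Sketch: each pluggable module is selected via its "model_name" key.
# The values must come from the table above; paths and keys are placeholders.
from lightmem.memory.lightmem import LightMemory

config_dict = {
    "pre_compress": True,
    "pre_compressor": {"model_name": "llmlingua-2"},   # PreCompressorConfig backend
    "memory_manager": {                                # MemoryManagerConfig backend
        "model_name": "openai",
        "configs": {"model": "gpt-4o-mini", "api_key": "your_api_key"},
    },
    "text_embedder": {                                 # TextEmbedderConfig backend
        "model_name": "huggingface",
        "configs": {"model": "/path/to/all-MiniLM-L6-v2"},
    },
    "embedding_retriever": {"model_name": "qdrant"},   # RetrieverConfig backend
}

lightmem = LightMemory.from_config(config_dict)
```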

💡 Examples

Initialize LightMem

```python
import os
from datetime import datetime

from lightmem.memory.lightmem import LightMemory

LOGS_ROOT = "./logs"
RUN_TIMESTAMP = datetime.now().strftime("%Y%m%d_%H%M%S")
RUN_LOG_DIR = os.path.join(LOGS_ROOT, RUN_TIMESTAMP)
os.makedirs(RUN_LOG_DIR, exist_ok=True)

API_KEY = 'your_api_key'
API_BASE_URL = 'your_api_base_url'
LLM_MODEL = 'your_model_name'  # such as 'gpt-4o-mini' (API) or 'gemma3:latest' (local Ollama) ...
EMBEDDING_MODEL_PATH = '/your/path/to/models/all-MiniLM-L6-v2'
LLMLINGUA_MODEL_PATH = '/your/path/to/models/llmlingua-2-bert-base-multilingual-cased-meetingbank'

config_dict = {
    "pre_compress": True,
    "pre_compressor": {
        "model_name": "llmlingua-2",
        "configs": {
            "llmlingua_config": {
                "model_name": LLMLINGUA_MODEL_PATH,
                "device_map": "cuda",
                "use_llmlingua2": True,
            },
        }
    },
    "topic_segment": True,
    "precomp_topic_shared": True,
    "topic_segmenter": {
        "model_name": "llmlingua-2",
    },
    "messages_use": "user_only",
    "metadata_generate": True,
    "text_summary": True,
    "memory_manager": {
        "model_name": 'xxx',  # such as 'openai' or 'ollama' ...
        "configs": {
            "model": LLM_MODEL,
            "api_key": API_KEY,
            "max_tokens": 16000,
            "xxx_base_url": API_BASE_URL  # API model specific, such as 'openai_base_url' or 'deepseek_base_url' ...
        }
    },
    "extract_threshold": 0.1,
    "index_strategy": "embedding",
    "text_embedder": {
        "model_name": "huggingface",
        "configs": {
            "model": EMBEDDING_MODEL_PATH,
            "embedding_dims": 384,
            "model_kwargs": {"device": "cuda"},
        },
    },
    "retrieve_strategy": "embedding",
    "embedding_retriever": {
        "model_name": "qdrant",
        "configs": {
            "collection_name": "my_long_term_chat",
            "embedding_model_dims": 384,
            "path": "./my_long_term_chat",
        }
    },
    "summary_retriever": {
        "model_name": "qdrant",
        "configs": {
            "collection_name": "my_chat_summaries",
            "embedding_model_dims": 384,
            "path": "./my_chat_summaries",
        }
    },
    "update": "offline",
    "logging": {
        "level": "DEBUG",
        "file_enabled": True,
        "log_dir": RUN_LOG_DIR,
    }
}

lightmem = LightMemory.from_config(config_dict)
```

Add Memory

```python
session = {
    "timestamp": "2025-01-10",
    "turns": [
        [
            {"role": "user", "content": "My favorite ice cream flavor is pistachio, and my dog's name is Rex."},
            {"role": "assistant", "content": "Got it. Pistachio is a great choice."},
        ],
    ]
}

for turn_messages in session["turns"]:
    timestamp = session["timestamp"]
    for msg in turn_messages:
        msg["time_stamp"] = timestamp
    store_result = lightmem.add_memory(
        messages=turn_messages,
        force_segment=True,
        force_extract=True
    )
```

Offline Update

```python
lightmem.construct_update_queue_all_entries()
lightmem.offline_update_all_entries(score_threshold=0.8)
```

Generate summaries

```python
summary_result = lightmem.summarize()
```

Retrieve Memory

```python
question = "What is the name of my dog?"
related_memories = lightmem.retrieve(question, limit=5)
print(related_memories)
```

MCP Server

LightMem also supports the Model Context Protocol (MCP) server:

```bash
# Run from the repository root
cd LightMem

# Environment
pip install '.[mcp]'

# MCP Inspector [Optional]
npx @modelcontextprotocol/inspector python mcp/server.py

# Start the HTTP API (http://127.0.0.1:8000/mcp)
fastmcp run mcp/server.py:mcp --transport http --port 8000
```

The MCP config JSON file of your local client may look like this:

{ "yourMcpServers": { "LightMem": { "url": "http://127.0.0.1:8000/mcp", "otherParameters": "..." } } }

📁 Experimental Results

For transparency and reproducibility, we have shared the results of our experiments on Google Drive. This includes model outputs, evaluation logs, and predictions used in our study.

🔗 Access the data here: Google Drive - Experimental Results

Please feel free to download, explore, and use these resources for research or reference purposes.

LoCoMo:

Overview

backbone: gpt-4o-mini, judge model: gpt-4o-mini & qwen2.5-32b-instruct

| Method | ACC(%) gpt-4o-mini | ACC(%) qwen2.5-32b-instruct | Memory-Con Tokens(k) Total | QA Tokens(k) Total | Total(k) | Calls | Runtime(s) Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FullText | 73.83 | 73.18 | – | 54,884.479 | 54,884.479 | – | 6,971 |
| NaiveRAG | 63.64 | 63.12 | – | 3,870.187 | 3,870.187 | – | 1,884 |
| A-MEM | 64.16 | 60.71 | 11,494.344 | 10,170.567 | 21,664.907 | 11,754 | 67,084 |
| MemoryOS(eval) | 58.25 | 61.04 | 2,870.036 | 7,649.343 | 10,519.379 | 5,534 | 26,129 |
| MemoryOS(pypi) | 54.87 | 55.91 | 5,264.801 | 6,126.111 | 11,390.004 | 10,160 | 37,912 |
| Mem0 | 36.49 | 37.01 | 24,304.872 | 1,488.618 | 25,793.490 | 19,070 | 120,175 |
| Mem0(api) | 61.69 | 61.69 | 68,347.720 | 4,169.909 | 72,517.629 | 6,022 | 10,445 |
| Mem0-g(api) | 60.32 | 59.48 | 69,684.818 | 4,389.147 | 74,073.965 | 6,022 | 10,926 |

backbone: qwen3-30b-a3b-instruct-2507, judge model: gpt-4o-mini & qwen2.5-32b-instruct

| Method | ACC(%) gpt-4o-mini | ACC(%) qwen2.5-32b-instruct | Memory-Con Tokens(k) Total | QA Tokens(k) Total | Total(k) | Calls | Runtime(s) Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FullText | 74.87 | 74.35 | – | 60,873.076 | 60,873.076 | – | 10,555 |
| NaiveRAG | 66.95 | 64.68 | – | 4,271.052 | 4,271.052 | – | 1,252 |
| A-MEM | 56.10 | 54.81 | 16,267.997 | 17,340.881 | 33,608.878 | 11,754 | 69,339 |
| MemoryOS(eval) | 61.04 | 59.81 | 3,615.087 | 9,703.169 | 11,946.442 | 4,147 | 13,710 |
| MemoryOS(pypi) | 51.30 | 51.95 | 6,663.527 | 7,764.991 | 14,428.518 | 10,046 | 20,830 |
| Mem0 | 43.31 | 43.25 | 17,994.035 | 1,765.570 | 19,759.605 | 16,145 | 46,500 |

Details

backbone: gpt-4o-mini, judge model: gpt-4o-mini & qwen2.5-32b-instruct

| Method | Summary Tokens(k) In | Summary Tokens(k) Out | Update Tokens(k) In | Update Tokens(k) Out | QA Tokens(k) In | QA Tokens(k) Out | Runtime(s) mem-con | Runtime(s) QA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FullText | – | – | – | – | 54,858.770 | 25.709 | – | 6,971 |
| NaiveRAG | – | – | – | – | 3,851.029 | 19.158 | – | 1,884 |
| A-MEM | 1,827.373 | 492.883 | 7,298.878 | 1,875.210 | 10,113.252 | 57.315 | 60,607 | 6,477 |
| MemoryOS(eval) | 1,109.849 | 333.970 | 780.807 | 645.410 | 7,638.539 | 10.804 | 24,220 | 1,909 |
| MemoryOS(pypi) | 1,007.729 | 294.601 | 3,037.509 | 924.962 | 6,116.239 | 9.872 | 33,325 | 4,587 |
| Mem0 | 8,127.398 | 253.187 | 12,722.011 | 3,202.276 | 1,478.830 | 9.788 | 118,268 | 1,907 |
| Mem0(api) | \ | \ | \ | \ | 4,156.850 | 13.059 | 4,328 | 6,117 |
| Mem0-g(api) | \ | \ | \ | \ | 4,375.900 | 13.247 | 5,381 | 5,545 |

backbone: qwen3-30b-a3b-instruct-2507, judge model: gpt-4o-mini & qwen2.5-32b-instruct

| Method | Summary Tokens(k) In | Summary Tokens(k) Out | Update Tokens(k) In | Update Tokens(k) Out | QA Tokens(k) In | QA Tokens(k) Out | Runtime(s) mem-con | Runtime(s) QA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FullText | – | – | – | – | 60,838.694 | 34.382 | – | 10,555 |
| NaiveRAG | – | – | – | – | 4,239.030 | 32.022 | – | 1,252 |
| A-MEM | 1,582.942 | 608.507 | 9,241.928 | 4,835.070 | 17,528.876 | 82.005 | 55,439 | 13,900 |
| MemoryOS(eval) | 1,222.139 | 531.157 | 1,044.307 | 817.484 | 9,679.996 | 23.173 | 12,697 | 1,012 |
| MemoryOS(pypi) | 2,288.533 | 516.024 | 2,422.693 | 1,436.277 | 7,743.391 | 21.600 | 19,822 | 1,007 |
| Mem0 | 8,270.874 | 186.354 | 7,638.827 | 1,897.980 | 1,739.246 | 26.324 | 45,407 | 1,093 |

Performance metrics

backbone: gpt-4o-mini, judge model: gpt-4o-mini

| Method | Overall ↑ | Multi | Open | Single | Temp |
| --- | --- | --- | --- | --- | --- |
| FullText | 73.83 | 68.79 | 56.25 | 86.56 | 50.16 |
| NaiveRAG | 63.64 | 55.32 | 47.92 | 70.99 | 56.39 |
| A-MEM | 64.16 | 56.03 | 31.25 | 72.06 | 60.44 |
| MemoryOS(eval) | 58.25 | 56.74 | 45.83 | 67.06 | 40.19 |
| MemoryOS(pypi) | 54.87 | 52.13 | 43.75 | 63.97 | 36.76 |
| Mem0 | 36.49 | 30.85 | 34.38 | 38.41 | 37.07 |
| Mem0(api) | 61.69 | 56.38 | 43.75 | 66.47 | 59.19 |
| Mem0-g(api) | 60.32 | 54.26 | 39.58 | 65.99 | 57.01 |

backbone: gpt-4o-mini, judge model: qwen2.5-32b-instruct

| Method | Overall ↑ | Multi | Open | Single | Temp |
| --- | --- | --- | --- | --- | --- |
| FullText | 73.18 | 68.09 | 54.17 | 86.21 | 49.22 |
| NaiveRAG | 63.12 | 53.55 | 50.00 | 71.34 | 53.89 |
| A-MEM | 60.71 | 53.55 | 32.29 | 69.08 | 53.58 |
| MemoryOS(eval) | 61.04 | 64.18 | 40.62 | 70.15 | 40.50 |
| MemoryOS(pypi) | 55.91 | 52.48 | 41.67 | 66.35 | 35.83 |
| Mem0 | 37.01 | 31.91 | 37.50 | 38.53 | 37.38 |
| Mem0(api) | 61.69 | 54.26 | 46.88 | 67.66 | 57.01 |
| Mem0-g(api) | 59.48 | 55.32 | 42.71 | 65.04 | 53.58 |

backbone: qwen3-30b-a3b-instruct-2507, judge model: gpt-4o-mini

| Method | Overall ↑ | Multi | Open | Single | Temp |
| --- | --- | --- | --- | --- | --- |
| FullText | 74.87 | 69.86 | 57.29 | 87.40 | 51.71 |
| NaiveRAG | 66.95 | 62.41 | 57.29 | 76.81 | 47.98 |
| A-MEM | 56.10 | 57.45 | 43.75 | 67.90 | 27.73 |
| MemoryOS(eval) | 61.04 | 62.77 | 51.04 | 72.29 | 33.02 |
| MemoryOS(pypi) | 51.30 | 52.48 | 40.62 | 61.59 | 26.48 |
| Mem0 | 43.31 | 42.91 | 46.88 | 46.37 | 34.58 |
| Mem0(api) | 61.69 | 54.26 | 46.88 | 67.66 | 57.01 |
| Mem0-g(api) | 59.48 | 55.32 | 42.71 | 65.04 | 53.58 |

backbone: qwen3-30b-a3b-instruct-2507, judge model: qwen2.5-32b-instruct

| Method | Overall ↑ | Multi | Open | Single | Temp |
| --- | --- | --- | --- | --- | --- |
| FullText | 74.35 | 68.09 | 63.54 | 86.33 | 51.71 |
| NaiveRAG | 64.68 | 60.28 | 52.08 | 75.62 | 43.61 |
| A-MEM | 54.81 | 56.74 | 39.58 | 67.42 | 24.61 |
| MemoryOS(eval) | 59.81 | 63.12 | 48.96 | 70.51 | 32.09 |
| MemoryOS(pypi) | 51.95 | 55.67 | 39.58 | 61.47 | 27.41 |
| Mem0 | 43.25 | 45.04 | 46.88 | 45.78 | 33.96 |
| Mem0(api) | 61.69 | 54.26 | 46.88 | 67.66 | 57.01 |
| Mem0-g(api) | 59.48 | 55.32 | 42.71 | 65.04 | 53.58 |

⚙️ Configuration

All behaviors of LightMem are controlled via the BaseMemoryConfigs configuration class. Users can customize aspects like pre-processing, memory extraction, retrieval strategy, and update mechanisms by providing a custom configuration.

Key Configuration Options (Usage)

| Option | Default | Usage (allowed values and behavior) |
| --- | --- | --- |
| pre_compress | False | True / False. If True, input messages are pre-compressed using the pre_compressor configuration before being stored. This reduces storage and indexing cost but may remove fine-grained details. If False, messages are stored without pre-compression. |
| pre_compressor | None | dict / object. Configuration for the pre-compression component (PreCompressorConfig) with fields like model_name (e.g., llmlingua-2, entropy_compress) and configs (model-specific parameters). Effective only when pre_compress=True. |
| topic_segment | False | True / False. Enables topic-based segmentation of long conversations. When True, long conversations are split into topic segments and each segment can be indexed/stored independently (requires topic_segmenter). When False, messages are stored sequentially. |
| precomp_topic_shared | False | True / False. If True, pre-compression and topic segmentation can share intermediate results to avoid redundant processing. May improve performance but requires careful configuration to avoid cross-topic leakage. |
| topic_segmenter | None | dict / object. Configuration for topic segmentation (TopicSegmenterConfig), including model_name and configs (segment length, overlap, etc.). Used when topic_segment=True. |
| messages_use | 'user_only' | 'user_only' / 'assistant_only' / 'hybrid'. Controls which messages are used to generate metadata and summaries: user_only uses user inputs, assistant_only uses assistant responses, hybrid uses both. Choosing hybrid increases processing but yields richer context. |
| metadata_generate | True | True / False. If True, metadata such as keywords and entities are extracted and stored to support attribute-based and filtered retrieval. If False, no metadata extraction occurs. |
| text_summary | True | True / False. If True, a text summary is generated and stored alongside the original text (reduces retrieval cost and speeds review). If False, only the original text is stored. Summary quality depends on memory_manager. |
| memory_manager | MemoryManagerConfig() | dict / object. Controls the model used to generate summaries and metadata (MemoryManagerConfig), e.g., model_name (openai, ollama, etc.) and configs. Changing this affects summary style, length, and cost. |
| extract_threshold | 0.5 | float (0.0 - 1.0). Threshold used to decide whether content is important enough to be extracted as metadata or a highlight. Higher values (e.g., 0.8) mean more conservative extraction; lower values (e.g., 0.2) extract more items (may increase noise). |
| index_strategy | None | 'embedding' / 'context' / 'hybrid' / None. Determines how memories are indexed: 'embedding' uses vector-based indexing (requires embedders/retriever) for semantic search; 'context' uses text-based/contextual retrieval (requires context_retriever) for keyword/document similarity; 'hybrid' combines context filtering and vector reranking for robustness and higher accuracy. |
| text_embedder | None | dict / object. Configuration for the text embedding model (TextEmbedderConfig) with model_name (e.g., huggingface) and configs (batch size, device, embedding dim). Required when index_strategy or retrieve_strategy includes 'embedding'. |
| multimodal_embedder | None | dict / object. Configuration for the multimodal/image embedder (MMEmbedderConfig). Used for non-text modalities. |
| history_db_path | os.path.join(lightmem_dir, "history.db") | str. Path to persist conversation history and lightweight state. Useful to restore state across restarts. |
| retrieve_strategy | 'embedding' | 'embedding' / 'context' / 'hybrid'. Strategy used at query time to fetch relevant memories. Pick based on data and query type: semantic queries -> 'embedding'; keyword/structured queries -> 'context'; mixed -> 'hybrid'. |
| context_retriever | None | dict / object. Configuration for the context-based retriever (ContextRetrieverConfig), e.g., model_name='BM25' and configs like top_k. Used when retrieve_strategy includes 'context'. |
| embedding_retriever | None | dict / object. Vector store configuration (EmbeddingRetrieverConfig), e.g., model_name='qdrant' and connection/index params. Used when retrieve_strategy includes 'embedding'. |
| summary_retriever | None | dict / object. Configuration for a summary-specific vector store (EmbeddingRetrieverConfig). When configured, summaries are stored in a separate collection for hierarchical retrieval. Used in StructMem mode to store and retrieve session/topic summaries independently from detailed memories. |
| update | 'offline' | 'online' / 'offline'. 'offline': batch or scheduled updates to save cost and aggregate changes; this is the fully supported mode with complete functionality. 'online': reserved for future development (currently a no-op placeholder; memory will not be persisted when this mode is set). |
| kv_cache | False | True / False. If True, attempts to precompute and persist model KV caches to accelerate repeated LLM calls (requires support from the LLM runtime and may increase storage). Uses kv_cache_path to store the cache. |
| kv_cache_path | os.path.join(lightmem_dir, "kv_cache.db") | str. File path for KV cache storage when kv_cache=True. |
| graph_mem | False | True / False. When True, some memories are organized as a graph (nodes and relationships) to support complex relation queries and reasoning. Requires additional graph processing/storage. |
| extraction_mode | 'flat' | 'flat' / 'event'. Memory extraction mode: 'flat' extracts factual entries as independent units suitable for general knowledge retention; 'event' extracts event-level structures with both factual and relational components, preserving temporal bindings and causal relationships. Use 'event' for narrative-heavy or time-sensitive scenarios. |
| version | 'v1.1' | str. Configuration/API version. Only change this if you know the compatibility implications. |
| logging | None | dict / object. Logging configuration (e.g., level, file output, log directory). |
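
As a worked example of the retrieval options above, a hybrid setup might look like the following (a minimal sketch: collection name, paths, dims, and top_k are placeholder values, not recommended defaults):

```python
# Sketch: hybrid indexing/retrieval -- BM25 context filtering plus vector reranking.
# Collection name, paths, dims, and top_k are placeholders.
from lightmem.memory.lightmem import LightMemory

config_dict = {
    "index_strategy": "hybrid",
    "retrieve_strategy": "hybrid",
    "context_retriever": {
        "model_name": "BM25",
        "configs": {"top_k": 20},
    },
    "text_embedder": {
        "model_name": "huggingface",
        "configs": {"model": "/path/to/all-MiniLM-L6-v2", "embedding_dims": 384},
    },
    "embedding_retriever": {
        "model_name": "qdrant",
        "configs": {
            "collection_name": "hybrid_demo",
            "embedding_model_dims": 384,
            "path": "./hybrid_demo",
        },
    },
}

lightmem = LightMemory.from_config(config_dict)
```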

🏆 Contributors

JizhanFang · Xinle-Deng · Xubqpanda · HaomingX · 453251 · James-TYQ · evy568 · Norah-Feathertail · TongjiCst
We welcome contributions from the community! If you'd like to contribute, please fork the repository and submit a pull request. For major changes, please open an issue first to discuss what you would like to change.

🔗 Related Projects
