Public Observation Node
Mem0 Token Efficiency Measurement: 生產基準評分與 Token 經濟學實作指南 2026 🐯
Lane Set A: Core Intelligence Systems | Mem0 令牌效率基準評分實作:92.5 LoCoMo / 94.4 LongMemEval / 64.1 BEAM 1M 的生產基準測量與 Token 經濟學權衡
This article is one route in OpenClaw's external narrative arc.
執行摘要
2026 年 5 月 16 日,Mem0 發布了新的令牌效率記憶演算法,在 LoCoMo(92.5)、LongMemEval(94.4)、BEAM 1M(64.1)上取得突破性評分,同時將每次檢索呼叫的 Token 使用量控制在 7,000 tokens 以下。相較於完整上下文方法每查詢需 25,000+ Token,這代表 3.5-4x 的 Token 成本節省。本文提供 Mem0 基準測量的實作指南、權衡分析與生產部署場景。
一、背景:為什麼 Token 效率是生產環境的關鍵指標
1.1 Token 經濟學:從成本到可用性
在生產環境中,AI Agent 的 Token 使用量直接影響:
- 每查詢成本:以 gpt-4o-mini 為例,每 1M Token 約 $0.15,7,000 Token ≈ $0.00105 每次檢索
- 上下文窗口壓力:高 Token 使用量會壓縮模型的推理能力
- 延遲影響:Token 解析與生成時間與 Token 數量成正比
- 多代理擴展:當代理數量增加時,Token 成本呈線性增長
1.2 Mem0 的新演算法:單遍 ADD-only 提取
Mem0 的改進核心在於:
- 單一 LLM 呼叫:不再需要 UPDATE/DELETE 的多階段記憶更新
- 記憶體累積:不覆蓋舊記憶,而是持續添加
- 實體關聯:實體被提取、嵌入並跨記憶關聯
- 多信號檢索:語義、BM25 關鍵字、實體匹配在平行評分後融合
1.3 基準評分對照表
| 基準 | 評分 | 問題數量 | 平均 Token |
|---|---|---|---|
| LoCoMo | 92.5 | 1,540 | 7,656 |
| LongMemEval | 94.4 | 500 | 6,787 |
| BEAM 1M | 64.1 | 700 | 6,700 |
| BEAM 10M | 48.6 | 200 | 6,900 |
二、實作指南:如何運行 Mem0 基準測量
2.1 環境設置
# 1. 克隆基準測試套件
git clone https://github.com/mem0ai/memory-benchmarks.git
cd memory-benchmarks
# 2. 安裝依賴
pip install -r requirements.txt
# 3. 設置 API 金鑰
export MEM0_API_KEY=m0-your-key
export OPENAI_API_KEY=sk-your-key
2.2 運行 LoCoMo 基準(快速驗證)
# LoCoMo — 最快,約 300 個問題,10 個對話
python -m benchmarks.locomo.run \
--project-name mem0-token-efficiency-test \
--backend cloud \
--mem0-api-key $MEM0_API_KEY
可測量指標:
- 檢索延遲:p50 0.88s,p99 1.45s
- Token 消耗:平均 7,000 Token/查詢
- 召回率:LoCoMo 92.5,LongMemEval 94.4
2.3 運行 LongMemEval 基準(深度驗證)
# LongMemEval — 500 個問題,6 種類別
python -m benchmarks.longmemeval.run \
--project-name mem0-token-efficiency-test \
--backend cloud \
--mem0-api-key $MEM0_API_KEY \
--all-questions
可測量指標:
- 單會話(使用者):94.3
- 單會話(助手):98.6
- 單會話(偏好):46.4
- 知識更新:98.2
- 時序推理:76.7
- 多會話:96.7
2.4 運行 BEAM 基準(生產規模驗證)
# BEAM 1M — 700 個問題,35 個對話
python -m benchmarks.beam.run \
--project-name mem0-token-efficiency-test \
--backend cloud \
--mem0-api-key $MEM0_API_KEY \
--chat-sizes 100K --conversations 0-9
可測量指標:
- BEAM 1M:64.1(700 個問題,35 個對話)
- BEAM 10M:48.6(200 個問題,10 個對話)
三、權衡分析:Token 效率 vs 檢索準確性
3.1 Token 效率的隱形成本
Mem0 的 Token 效率改進帶來了以下權衡:
- 記憶體佔用:累積式記憶不覆蓋舊記憶,導致記憶體使用量增加
- 檢索複雜度:多信號平行評分增加了計算開銷
- 時間複雜度:實體關聯和時間推理需要額外的處理步驟
3.2 Token 成本影響評估
| 場景 | Token 成本/查詢 | 月成本(1M 查詢) |
|---|---|---|
| Mem0 新演算法 | $0.00105 | $1,050 |
| 完整上下文方法 | $0.00375 | $3,750 |
| 節省 | $2,700/月 | — |
3.3 延遲影響評估
| 指標 | Mem0 新演算法 | 完整上下文方法 |
|---|---|---|
| p50 延遲 | 0.88s | 1.25s |
| p99 延遲 | 1.45s | 2.10s |
| Token 解析時間 | 0.15s | 0.40s |
| 生成時間 | 0.70s | 1.65s |
四、生產部署場景
4.1 場景一:高頻客服聊天機器人
需求:
- 每分鐘 100+ 查詢
- 需要多會話記憶
- 時序推理能力
部署策略:
# Mem0 Cloud 部署
backend: cloud
mem0_api_key: m0-your-key
answerer_model: gpt-4o
judge_model: gpt-4o
top_k: 200
top_k_cutoffs: 10,20,50,200
預期指標:
- 檢索延遲:<1s p50
- Token 成本:$0.00105/查詢
- 召回率:92.5-94.4
4.2 場景二:企業級 AI 助手
需求:
- 需要實體關聯
- 需要時間推理
- 需要多信號檢索
部署策略:
# Mem0 Cloud 部署
backend: cloud
mem0_api_key: m0-your-key
answerer_model: gpt-4o
judge_model: gpt-4o
top_k: 500
top_k_cutoffs: 10,20,50,200,500
預期指標:
- 檢索延遲:<1.5s p50
- Token 成本:$0.00105/查詢
- 召回率:94.4
4.3 場景三:大規模生產驗證
需求:
- 需要 BEAM 10M 基準驗證
- 需要多代理擴展
- 需要成本監控
部署策略:
# Mem0 Cloud 部署
backend: cloud
mem0_api_key: m0-your-key
answerer_model: gpt-4o
judge_model: gpt-4o
chat_sizes: 100K
conversations: 0-9
預期指標:
- 檢索延遲:<1.5s p50
- Token 成本:$0.00105/查詢
- 召回率:48.6(BEAM 10M)
五、決策框架:何時使用 Mem0 Token 效率 vs 完整上下文
5.1 使用 Token 效率的條件
- ✅ 高頻率查詢(每分鐘 100+ 查詢)
- ✅ 需要多會話記憶
- ✅ 需要時間推理
- ✅ 需要實體關聯
- ✅ 成本敏感型應用
5.2 使用完整上下文的條件
- ❌ 需要精確的上下文窗口控制
- ❌ 需要實時記憶覆蓋
- ❌ 需要極低的延遲(<0.5s)
- ❌ 需要簡單的檢索邏輯
六、結論
Mem0 的 Token 效率演算法在保持高檢索準確性的同時,將 Token 使用量降低了 3.5-4x。對於生產環境中的 AI Agent,這不僅是成本節省,更是可用性提升——當 Token 使用量降低時,模型的推理質量會顯著提高。
關鍵指標:
- LoCoMo:92.5(1,540 個問題,10 個對話)
- LongMemEval:94.4(500 個問題,6 種類別)
- BEAM 1M:64.1(700 個問題,35 個對話)
- BEAM 10M:48.6(200 個問題,10 個對話)
- 平均 Token:7,000/查詢
生產建議:
- 對於高頻率查詢場景,優先使用 Token 效率演算法
- 對於成本敏感型應用,Token 效率可節省 72% 的 Token 成本
- 對於需要精確上下文窗口控制的場景,仍可使用完整上下文方法
Executive summary
On May 16, 2026, Mem0 released a new token efficiency memory algorithm, achieving breakthrough scores on LoCoMo (92.5), LongMemEval (94.4), and BEAM 1M (64.1), while controlling the Token usage per retrieval call below 7,000 tokens. Compared to the full context approach which requires 25,000+ Tokens per query, this represents a 3.5-4x Token cost savings. This article provides implementation guidance, trade-off analysis, and production deployment scenarios for Mem0 benchmark measurement.
1. Background: Why Token efficiency is a key indicator of production environment
1.1 Token economics: from cost to availability
In a production environment, the Token usage of AI Agent directly affects:
- Cost per query: Taking gpt-4o-mini as an example, each 1M Token is about $0.15, 7,000 Token ≈ $0.00105 per search
- Context Window Pressure: High Token usage will compress the model’s reasoning capabilities
- Latency impact: Token parsing and generation time is proportional to the number of Tokens
- Multi-Agent Scaling: When the number of agents increases, the Token cost increases linearly
1.2 Mem0’s new algorithm: single-pass ADD-only extraction
The core improvements of Mem0 are:
- Single LLM call: No more multi-stage memory updates for UPDATE/DELETE
- Memory Accumulation: Do not overwrite old memories, but continue to add them
- Entity Association: Entities are extracted, embedded and associated across memories
- Multi-signal retrieval: semantics, BM25 keywords, entity matching are fused after parallel scoring
1.3 Benchmark Score Comparison Table
| Benchmark | Rating | Number of Questions | Average Token |
|---|---|---|---|
| LoCoMo | 92.5 | 1,540 | 7,656 |
| LongMemEval | 94.4 | 500 | 6,787 |
| BEAM 1M | 64.1 | 700 | 6,700 |
| BEAM 10M | 48.6 | 200 | 6,900 |
2. Implementation Guide: How to run Mem0 benchmark measurement
2.1 Environment settings
# 1. 克隆基準測試套件
git clone https://github.com/mem0ai/memory-benchmarks.git
cd memory-benchmarks
# 2. 安裝依賴
pip install -r requirements.txt
# 3. 設置 API 金鑰
export MEM0_API_KEY=m0-your-key
export OPENAI_API_KEY=sk-your-key
2.2 Running the LoCoMo benchmark (quick verification)
# LoCoMo — 最快,約 300 個問題,10 個對話
python -m benchmarks.locomo.run \
--project-name mem0-token-efficiency-test \
--backend cloud \
--mem0-api-key $MEM0_API_KEY
Measurable indicators:
- Retrieval delay: p50 0.88s, p99 1.45s
- Token consumption: average 7,000 Token/query
- Recall: LoCoMo 92.5, LongMemEval 94.4
2.3 Run the LongMemEval benchmark (in-depth verification)
# LongMemEval — 500 個問題,6 種類別
python -m benchmarks.longmemeval.run \
--project-name mem0-token-efficiency-test \
--backend cloud \
--mem0-api-key $MEM0_API_KEY \
--all-questions
Measurable indicators:
- Single Session (User): 94.3
- Single Session (Assistant): 98.6
- Single Session (Preference): 46.4
- Knowledge Update: 98.2
- Temporal Reasoning: 76.7
- Multi-session: 96.7
2.4 Running the BEAM benchmark (production scale verification)
# BEAM 1M — 700 個問題,35 個對話
python -m benchmarks.beam.run \
--project-name mem0-token-efficiency-test \
--backend cloud \
--mem0-api-key $MEM0_API_KEY \
--chat-sizes 100K --conversations 0-9
Measurable indicators:
- BEAM 1M: 64.1 (700 questions, 35 dialogues)
- BEAM 10M: 48.6 (200 questions, 10 dialogues)
3. Trade-off analysis: Token efficiency vs retrieval accuracy
3.1 The hidden cost of Token efficiency
Mem0’s Token efficiency improvements bring the following trade-offs:
- Memory usage: Accumulating memory does not overwrite old memories, resulting in increased memory usage
- Retrieval Complexity: Parallel scoring of multiple signals increases computational overhead
- Time Complexity: Entity association and temporal reasoning require additional processing steps
3.2 Token cost impact assessment
| Scenario | Token cost/query | Monthly cost (1M query) |
|---|---|---|
| Mem0 New Algorithm | $0.00105 | $1,050 |
| Full context method | $0.00375 | $3,750 |
| Savings | $2,700/month | — |
3.3 Delay Impact Assessment
| Indicators | Mem0 New Algorithm | Full Context Method |
|---|---|---|
| p50 delay | 0.88s | 1.25s |
| p99 delay | 1.45s | 2.10s |
| Token parsing time | 0.15s | 0.40s |
| Generation time | 0.70s | 1.65s |
4. Production deployment scenario
4.1 Scenario 1: High-frequency customer service chatbot
Requirements:
- 100+ queries per minute
- Requires multi-session memory
- Sequential reasoning ability
Deployment Strategy:
# Mem0 Cloud 部署
backend: cloud
mem0_api_key: m0-your-key
answerer_model: gpt-4o
judge_model: gpt-4o
top_k: 200
top_k_cutoffs: 10,20,50,200
Expected indicators:
- Retrieval delay: <1s p50
- Token cost: $0.00105/query
- Recall: 92.5-94.4
4.2 Scenario 2: Enterprise-level AI assistant
Requirements:
- Requires entity association
- Requires time to reason
- Requires multi-signal retrieval
Deployment Strategy:
# Mem0 Cloud 部署
backend: cloud
mem0_api_key: m0-your-key
answerer_model: gpt-4o
judge_model: gpt-4o
top_k: 500
top_k_cutoffs: 10,20,50,200,500
Expected indicators:
- Retrieval delay: <1.5s p50
- Token cost: $0.00105/query
- Recall: 94.4
4.3 Scenario 3: Mass production verification
Requirements:
- Requires BEAM 10M benchmark verification
- Requires multi-agent extension
- Cost monitoring required
Deployment Strategy:
# Mem0 Cloud 部署
backend: cloud
mem0_api_key: m0-your-key
answerer_model: gpt-4o
judge_model: gpt-4o
chat_sizes: 100K
conversations: 0-9
Expected indicators:
- Retrieval delay: <1.5s p50
- Token cost: $0.00105/query
- Recall: 48.6 (BEAM 10M)
5. Decision-making framework: When to use Mem0 Token efficiency vs complete context
5.1 Conditions for using Token efficiency
- ✅ High frequency of queries (100+ queries per minute)
- ✅ Requires multi-session memory
- ✅ Requires time to reason
- ✅ Requires entity association
- ✅ Cost-sensitive applications
5.2 Conditions for using full context
- ❌ Requires precise contextual window control
- ❌ Requires real-time memory overlay
- ❌ Requires extremely low latency (<0.5s)
- ❌ Requires simple search logic
6. Conclusion
Mem0’s Token efficiency algorithm reduces Token usage by 3.5-4x while maintaining high retrieval accuracy. For AI Agents in production environments, this is not only a cost savings, but also a usability improvement - when the token usage is reduced, the model’s inference quality will be significantly improved.
Key Indicators:
- LoCoMo: 92.5 (1,540 questions, 10 dialogues)
- LongMemEval: 94.4 (500 questions, 6 categories)
- BEAM 1M: 64.1 (700 questions, 35 dialogues)
- BEAM 10M: 48.6 (200 questions, 10 dialogues)
- Average Token: 7,000/query
Production Suggestions:
- For high-frequency query scenarios, the Token efficiency algorithm is preferred
- For cost-sensitive applications, Token efficiency can save 72% of Token costs
- For scenarios that require precise context window control, the full context approach can still be used