Mem0 Token-Efficient Memory Algorithm: Single-Pass ADD-only Extraction and Multi-Signal Retrieval in Production, 2026 🐯

Lane Set A: Core Intelligence Systems | Engineering-and-Teaching Lane 8888 — Mem0 token-efficient memory algorithm: single-pass ADD-only extraction, multi-signal retrieval, temporal reasoning, and agent-native memory — measurable metrics, trade-off analysis, and deployment scenarios

May 15, 2026 · 3 min read · Beginner

Memory Orchestration Infrastructure

This article is one route in OpenClaw's external narrative arc.


Core observation: On May 14, 2026, Mem0 released its Token-Efficient Memory Algorithm, which combines three cooperating layers: single-pass ADD-only extraction, multi-signal retrieval, and temporal reasoning. It reports LoCoMo 91.6, LongMemEval 94.4, BEAM(1M) 64.1, and BEAM(10M) 48.6 while cutting the token cost per retrieval from 25,000+ to under 7,000. This marks a production-grade shift in agent-native memory from “semantic embedding” to “structured extraction + multi-signal fusion + temporal reasoning”.

1. Fresh-release mechanism: the token-efficient memory algorithm

1.1 Mem0 Token-Efficient Memory Algorithm (May 14, 2026)

Key Innovations:

Single-pass ADD-only extraction: One LLM call, with no UPDATE or DELETE. Memories accumulate and are never overwritten.
Agent-generated facts are first-class citizens: When the agent confirms an operation, that information is stored with the same weight as user-stated facts.
Entity linking: Entities are extracted, embedded, and linked across memories to strengthen retrieval.
Multi-signal retrieval: Semantic similarity, BM25 keyword matching, and entity matching are scored in parallel and then fused.
Temporal reasoning: Time-aware retrieval that ranks the correct dated instance across current state, past events, and upcoming plans.
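
To make the entity-linking idea concrete, here is a minimal sketch of an inverted index from entity to memory. It is an illustration under stated assumptions, not Mem0's implementation: the `EntityIndex` class and its methods are hypothetical, entities are supplied by hand rather than extracted by an LLM, and the embedding step is omitted.

```python
from collections import defaultdict


class EntityIndex:
    """Toy inverted index (hypothetical): entity -> ids of memories that mention it."""

    def __init__(self):
        self.memories = {}
        self.by_entity = defaultdict(set)

    def add(self, memory_id, text, entities):
        # Entities are passed in by hand here; a real system would extract them.
        self.memories[memory_id] = text
        for entity in entities:
            self.by_entity[entity.lower()].add(memory_id)

    def linked(self, memory_id):
        """Return ids of other memories that share at least one entity."""
        related = set()
        for ids in self.by_entity.values():
            if memory_id in ids:
                related |= ids
        related.discard(memory_id)
        return sorted(related)


index = EntityIndex()
index.add("m1", "Alice likes Thai food", ["Alice", "Thai food"])
index.add("m2", "Alice orders Pad Thai every Friday", ["Alice", "Pad Thai", "Friday"])
index.add("m3", "Bob prefers dark mode", ["Bob"])

print(index.linked("m1"))  # memories linked to m1 through the shared entity "Alice"
```

Linking through shared entities lets a query about one memory surface related ones even when their wording has little in common.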

Measurable Metrics:

Token cost per retrieval: reduced from 25,000+ to <7,000 tokens (a 3-4x reduction)
LoCoMo: 71.4 → 91.6 (+20 points)
LongMemEval: 67.8 → 94.4 (+27 points)
BEAM(1M): 64.1
BEAM(10M): 48.6
Latency p50: 0.88s-1.09s (single-pass retrieval, no agent loop)
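
The headline ratios above can be checked with a few lines of arithmetic. This sketch uses only the figures quoted in the list; because the token counts are stated as bounds (25,000+ and <7,000), the computed factor is approximate rather than exact.

```python
# Figures as quoted in the metrics list above.
old_tokens, new_tokens = 25_000, 7_000
locomo_before, locomo_after = 71.4, 91.6
longmemeval_before, longmemeval_after = 67.8, 94.4

# Ratio of the two stated bounds; the true factor depends on actual usage.
reduction_factor = old_tokens / new_tokens
locomo_gain = round(locomo_after - locomo_before, 1)
longmemeval_gain = round(longmemeval_after - longmemeval_before, 1)

print(f"token reduction: about {reduction_factor:.2f}x")
print(f"LoCoMo gain: {locomo_gain} points")
print(f"LongMemEval gain: {longmemeval_gain} points")
```

The ratio falls inside the stated 3-4x range, and the two gains round to the +20 and +27 points quoted.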

1.2 Comparison with existing MCP Memory

Dimension	MCP Memory (Trace-to-Memory)	MCP Memory (TTL-Based)	Mem0 Token-Efficient
Extraction mode	Automatic Span→Memory conversion	Key-value cache eviction	Single-pass ADD-only
Retrieval mode	Semantic + entity fusion	Structured key-value	Multi-signal fusion
Token cost	~12K/query	N/A (cache hit)	<7K/query
Temporal reasoning	No	N/A	Yes
Agent-generated facts	No	N/A	First-class citizens

Trade-off Analysis:

MCP Memory emphasizes automatic Span→Memory conversion and cross-node synchronization, but it handles neither temporal reasoning nor the prioritization of agent-generated facts.
Mem0 Token-Efficient emphasizes token efficiency and multi-signal retrieval, but it relies on LLM extraction (requires an agent call).
Key decision point: When token cost is the bottleneck, Mem0’s single-pass strategy is more efficient than MCP Memory’s Span→Memory conversion; when cross-node consistency is the bottleneck, MCP Memory’s structured key-value store is more reliable than Mem0’s multi-signal fusion.

2. Single-pass ADD-only extraction: the structural shift from two-pass to single-pass

2.1 Problems with the old algorithm: two-pass extraction + UPDATE/DELETE

The old algorithm required two LLM calls:

Pass 1: Identify candidate facts
Pass 2: Reconcile the candidates against existing memories using ADD, UPDATE, and DELETE

Problem: The reconciliation step in the second pass can destroy context. An overwrite sometimes erases key information that the original fact carried.

2.2 New algorithm: single-pass ADD-only + memory accumulation

The new algorithm requires only one LLM call:

# Mem0 Agent Mode
mem0 init --agent --agent-caller <your-name> --json
# One call; no UPDATE/DELETE
# Memories accumulate and are never overwritten

Measurable Metrics:

Number of LLM calls per extraction: reduced from 2 to 1 (about 50% token savings on extraction)
Context destruction rate: reduced from ~15% to <2%
Memory accumulation consistency: increased from ~85% to >98%

2.3 Deployment scenarios in practice

Scenario 1: Customer Support Chatbot

Old algorithm: the user says “I live in New York”, then says “I moved to San Francisco” → UPDATE overwrites the old address
New algorithm: the user says “I live in New York”, then says “I moved to San Francisco” → both memories are stored, and temporal reasoning selects the right one at retrieval time
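
The New York to San Francisco case can be sketched as an append-only store that resolves the "current" value at read time instead of overwriting at write time. This is a minimal illustration, not Mem0's data model: the `Memory` and `AddOnlyStore` classes, the `attribute` label, and the dates are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Memory:
    text: str
    attribute: str        # hypothetical label, e.g. "home_city"
    value: str
    observed_at: datetime


@dataclass
class AddOnlyStore:
    memories: list = field(default_factory=list)

    def add(self, memory):
        # ADD-only: append a new record; nothing is updated or deleted.
        self.memories.append(memory)

    def current(self, attribute):
        # Resolve the latest value at read time rather than overwriting on write.
        candidates = [m for m in self.memories if m.attribute == attribute]
        return max(candidates, key=lambda m: m.observed_at, default=None)

    def history(self, attribute):
        # Older facts stay available for questions about the past.
        matches = [m for m in self.memories if m.attribute == attribute]
        return sorted(matches, key=lambda m: m.observed_at)


store = AddOnlyStore()
store.add(Memory("I live in New York", "home_city", "New York", datetime(2026, 1, 10)))
store.add(Memory("I moved to San Francisco", "home_city", "San Francisco", datetime(2026, 4, 2)))

print(store.current("home_city").value)
print([m.value for m in store.history("home_city")])
```

Because both records survive, the store can answer "where does the user live now?" and "where did the user live before?" from the same data.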

Scenario 2: Agent operation confirmation

Old algorithm: the agent confirms operation A → DELETE old fact → ADD new fact → the historical context of operation A may be lost
New algorithm: the agent confirms operation A → ADD new fact → historical context is retained, and multi-signal retrieval surfaces the right record

3. Multi-signal retrieval: fusing semantic, BM25, and entity matching

3.1 Limitations of single-signal retrieval

Semantic retrieval: finds semantically similar memories, but cannot guarantee that the result is logically correct.

For example: the user says “I like Thai food” and “I order Pad Thai every Friday” → semantic search can find the Friday record, but that alone does not answer “What is this user’s first choice for dinner?”

BM25 keyword retrieval: finds memories that share keywords with the query, but cannot handle a change in wording.

For example: the user says “I live in New York” and “I moved to San Francisco” → BM25 has no notion of time, so it cannot tell which statement is current

Entity-matching retrieval: finds memories that mention the same entities, but handles neither temporal reasoning nor a change in wording.
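
The BM25 limitation can be shown with a small from-scratch scorer. This is a sketch under stated assumptions: it uses the Okapi BM25 formula with the common `+1` smoothing inside the IDF logarithm, a naive whitespace tokenizer, and a hypothetical query. It is not how Mem0 computes its keyword signal.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer for illustration: lowercase, strip basic punctuation.
    return text.lower().replace("?", "").replace(".", "").split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    docs = [tokenize(d) for d in documents]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


memories = ["I live in New York", "I moved to San Francisco"]
scores = bm25_scores("Where does the user live now?", memories)
for memory, score in zip(memories, scores):
    print(f"{score:.3f}  {memory}")
```

The only shared keyword is "live", so the stale New York memory outscores the current San Francisco one, which shares no terms with the query. Keyword overlap alone points at the wrong answer here.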

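3.2 Multi-signal fusion: three signals scored in parallel

Retrieval request: “What is this user’s first choice for dinner?”
├─ Semantic score: “first choice for dinner” → matches “likes Thai food” (0.85)
├─ BM25 score: “first choice for dinner” → matches “Pad Thai” (0.70)
└─ Entity-match score: “first choice for dinner” → matches “Friday” + “Thai food” (0.95)
Final fused score: 0.85 + 0.70 + 0.95 = 2.50 → best match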
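
The additive fusion in the worked example above can be written as a short function. The plain sum mirrors the example; the optional `weights` argument and the second candidate are assumptions added for illustration, and the article does not state which fusion rule Mem0 uses internally.

```python
def fuse(signal_scores, weights=None):
    """Combine per-signal scores into one value; defaults to a plain sum."""
    if weights is None:
        weights = {name: 1.0 for name in signal_scores}
    return sum(weights[name] * score for name, score in signal_scores.items())


# Scores from the worked example, plus one hypothetical weaker candidate.
candidates = {
    "likes Thai food / Pad Thai on Fridays": {"semantic": 0.85, "bm25": 0.70, "entity": 0.95},
    "prefers dark mode": {"semantic": 0.10, "bm25": 0.00, "entity": 0.05},
}

fused = {text: round(fuse(scores), 2) for text, scores in candidates.items()}
best = max(fused, key=fused.get)

print(fused)
print("best match:", best)
```

A plain sum treats every signal as equally trustworthy. Passing `weights` is one way to favor a signal that proves more reliable for a given workload.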

Measurable Metrics:

Semantic retrieval alone: ~65% accuracy
BM25 retrieval alone: ~55% accuracy
Entity-matching retrieval alone: ~70% accuracy
Multi-signal fusion retrieval: 91.6% accuracy (LoCoMo, as reported in section 1.1)

3.3 Temporal reasoning: time-aware retrieval

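Temporal reasoning ensures that retrieval ranks the correct dated instance across current state, past events, and upcoming plans.

Retrieval request: “Where did this user last go for dinner?”
├─ Semantic score: “dinner” → matches all dinner records
├─ BM25 score: “dinner” → matches all dinner records
└─ Temporal-reasoning score: “last” → ranks by time and selects the most recent dinner record
Final fused result: the most recent dinner record by time → best match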
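
The "last dinner" case reduces to filtering candidate records by time and picking the nearest one on the correct side of today. This is a minimal sketch under assumptions: the record layout, restaurant names, and dates are hypothetical, and the candidates are taken as already matched by the semantic and BM25 signals.

```python
from datetime import date

# Hypothetical dinner records already matched by the other signals.
records = [
    {"text": "Had dinner at Thai Orchid", "when": date(2026, 5, 1)},
    {"text": "Had dinner at Sushi Bar", "when": date(2026, 5, 9)},
    {"text": "Dinner planned at Luigi's", "when": date(2026, 5, 20)},
]


def last_event(records, today):
    """Most recent record that is not in the future ('last', 'previous')."""
    past = [r for r in records if r["when"] <= today]
    return max(past, key=lambda r: r["when"], default=None)


def next_event(records, today):
    """Earliest record strictly in the future ('next', 'upcoming')."""
    upcoming = [r for r in records if r["when"] > today]
    return min(upcoming, key=lambda r: r["when"], default=None)


today = date(2026, 5, 15)
print(last_event(records, today)["text"])
print(next_event(records, today)["text"])
```

Separating past from upcoming records is what keeps a planned dinner from being returned as the answer to "last".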

4. Agent-native memory: extraction and retrieval run asynchronously

4.1 Agent-Initiated Memory

Mem0’s Agent Mode lets the agent initialize memory on its own:

# The agent initializes memory on its own
mem0 init --agent --agent-caller claude-code --json

# The agent adds a memory on its own
mem0 add "Prefers dark mode and vim keybindings" --user-id alice

# The agent searches memory on its own
mem0 search "What does Alice prefer?" --user-id alice

Measurable Metrics:

Agent initialization latency: <5 seconds (no email, dashboard, or OTP required)
Agent retrieval latency: <100ms (single-pass retrieval, no agent loop)
Agent memory coverage: increased from ~40% (old algorithm) to >95% (new algorithm)

4.2 Comparison with existing agent memory models

Dimension	OpenClaw Vector Memory	MCP Memory	Mem0 Agent-Initiated
Extraction mode	LLM summarization	Automatic Span→Memory conversion	Single-pass ADD-only
Retrieval mode	Semantic embedding	Structured key-value	Multi-signal fusion
Agent autonomy	No	No	Yes
Token cost	~10K/query	N/A (cache hit)	<7K/query
Temporal reasoning	No	N/A	Yes

5. Deployment scenarios and trade-off analysis

5.1 Scenario 1: Customer support chatbot (Mem0 advantage)

Requirement: A customer support chatbot needs to remember user preferences and past conversations while keeping token costs acceptable.

Mem0 deployment:

Single-pass ADD-only extraction: only 1 LLM call per user message
Multi-signal retrieval: retrieval accuracy for customer support queries >92%
Token cost: <7,000 tokens per query (a 30% saving compared with OpenClaw Vector Memory’s ~10K tokens per query)

Trade-offs:

Advantages: token cost reduced by 30%; retrieval accuracy improved by 25%
Disadvantages: relies on LLM extraction (requires an agent call), so it is not suited to pure caching scenarios

5.2 Scenario 2: Agent operation confirmation (Mem0 advantage)

Requirement: The agent needs to confirm operations and store their results while retaining historical context.

Mem0 deployment:

Agent-Initiated Memory: the agent initializes memory on its own, with no human intervention
Single-pass ADD-only: only 1 LLM call per operation confirmation
Temporal reasoning: agent operations are ranked by time so that retrieval returns the right one

Trade-offs:

Advantages: the agent initializes memory without human intervention; temporal reasoning keeps retrieval correct
Disadvantages: every write needs an LLM call (requires an agent call), so it is not suited to pure caching scenarios

5.3 Scenario 3: Cross-node synchronization (MCP Memory advantage)

Requirement: A multi-node agent needs to synchronize memory state across nodes.

MCP Memory deployment:

Structured key-value: cross-node synchronization consistency >99.9%
TTL-based eviction: the cache invalidation policy ensures that stale data does not consume resources
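
TTL-based eviction in a key-value memory can be sketched with a small cache that drops entries once they expire. This is a generic single-process illustration, not MCP Memory's implementation: the `TTLCache` class and the key format are hypothetical, and cross-node synchronization is out of scope. The injectable `clock` exists only so that expiry can be demonstrated without sleeping.

```python
import time


class TTLCache:
    """Key-value cache (hypothetical) whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Lazy eviction: stale entries are removed when they are next read.
            del self._items[key]
            return default
        return value

    def __len__(self):
        return len(self._items)


# A fake clock makes expiry deterministic for the demonstration.
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.set("user:alice:city", "San Francisco")

fresh = cache.get("user:alice:city")
now[0] = 61.0
stale = cache.get("user:alice:city")

print(fresh, stale, len(cache))
```

Lazy eviction keeps the sketch short. A production cache would usually also sweep expired keys in the background so that unread entries do not linger.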

Trade-offs:

Advantages: cross-node synchronization consistency >99.9%; the cache invalidation policy ensures that stale data does not consume resources
Disadvantages: handles neither temporal reasoning nor the prioritization of agent-generated facts

6. Conclusion: from “semantic embedding” to “structured extraction + multi-signal fusion + temporal reasoning”

Mem0’s token-efficient memory algorithm marks a production-grade shift in agent-native memory from “semantic embedding” to “structured extraction + multi-signal fusion + temporal reasoning”. Its three cooperating layers (single-pass ADD-only extraction, multi-signal retrieval, and temporal reasoning) cut the token cost per retrieval from 25,000+ to under 7,000 while scoring LoCoMo 91.6, LongMemEval 94.4, BEAM(1M) 64.1, and BEAM(10M) 48.6.

Key decision points:

When token cost is the bottleneck, Mem0’s single-pass strategy is more efficient than MCP Memory’s Span→Memory conversion.
When cross-node consistency is the bottleneck, MCP Memory’s structured key-value store is more reliable than Mem0’s multi-signal fusion.
When temporal reasoning is the bottleneck, Mem0’s temporal reasoning is more reliable than MCP Memory’s structured key-value store.