Design and Implementation of the Memory Router

April 2, 2026 · 5 min read · Beginner

Memory Security Orchestration

This article is one route in OpenClaw's external narrative arc.

Preface: Why does the memory router matter?

In an AI Agent's memory architecture, the Memory Router is the hub that connects user input to long-term memory. It decides which historical memories to recall, which to ignore, and how to combine the recalled memories to support the current reasoning task.

Unlike traditional "full retrieval", which loads every stored memory, a memory router uses a selection strategy to achieve:

Precision: recall only the most relevant memories and keep noise out of the context
Efficiency: cut unnecessary retrieval operations and token consumption
Scalability: support fast retrieval over very large memory stores

This article covers the memory router's core design principles, algorithm implementation, and practical use cases.

Core design principles of the memory router

1. Classified, layered memory architecture

Modern AI Agent memory systems use a multi-layer architecture, and the memory router has to handle each type of memory differently:

Short-term memory (Working Memory)

Features: the current conversation context, within the token limit
Routing strategy: load everything, no selection needed
Capacity: typically 4K-32K tokens

Mid-term memory (Session Memory)

Features: state maintained over a single period of use
Routing strategy: select by task relevance
Capacity: up to several hundred K tokens

Long-term memory (Long-term Memory)

Features: material and concepts that span tasks, days, and topics
Routing strategy: intelligent routing, precise retrieval
Capacity: millions to billions of vectors

The memory router does most of its work in the retrieval phase of long-term memory.
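
The three tiers above imply three different selection rules. The following is a minimal sketch of that dispatch, not the article's implementation: the `Tier` enum, the `MemoryItem` fields, and the `task` label used for session filtering are all hypothetical names introduced for illustration, and similarity scores are assumed to be computed elsewhere.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    WORKING = "working"      # current conversation context
    SESSION = "session"      # state kept for one period of use
    LONG_TERM = "long_term"  # cross-task, multi-day knowledge


@dataclass
class MemoryItem:
    content: str
    tier: Tier
    task: str = ""      # hypothetical task label, used for session filtering
    score: float = 0.0  # similarity to the current query, computed elsewhere


def select_memories(items, current_task, top_k=2, threshold=0.78):
    """Apply a different selection rule to each memory tier."""
    # Working memory: load everything.
    working = [m for m in items if m.tier is Tier.WORKING]
    # Session memory: keep only items that belong to the current task.
    session = [m for m in items
               if m.tier is Tier.SESSION and m.task == current_task]
    # Long-term memory: threshold on similarity, then keep the top-k.
    long_term = sorted(
        (m for m in items if m.tier is Tier.LONG_TERM and m.score >= threshold),
        key=lambda m: m.score,
        reverse=True,
    )[:top_k]
    return working + session + long_term


items = [
    MemoryItem("user just asked about refunds", Tier.WORKING),
    MemoryItem("cart contains 2 items", Tier.SESSION, task="checkout"),
    MemoryItem("draft email in progress", Tier.SESSION, task="email"),
    MemoryItem("prefers email contact", Tier.LONG_TERM, score=0.91),
    MemoryItem("lives in Taipei", Tier.LONG_TERM, score=0.42),
]
selected = select_memories(items, current_task="checkout")
print([m.content for m in selected])
```

Only the long-term branch involves ranking, which matches the point that the router's real work happens in long-term retrieval.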

2. Routing strategy categories

A memory router can use several strategies, each suited to different scenarios:

Query-Pairing

Principle: compute similarity directly between the user's query vector and the stored memory vectors
Advantages: fast, simple to implement
Disadvantages: may miss a memory's contextual information
Applicable scenarios: fast retrieval, basic memory needs
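
As a rough illustration of query pairing, the sketch below ranks a handful of memories by cosine similarity to a query vector. It uses only the standard library; the three-dimensional vectors and the `(text, vector)` tuple layout are made up for the example and stand in for real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_pairing(query_vec, memories, top_k=2):
    """Score every memory against the query and return the top-k (score, text)."""
    scored = [(cosine(query_vec, vec), text) for text, vec in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy data: (text, embedding) pairs with made-up 3-d vectors.
memories = [
    ("likes green tea", [0.9, 0.1, 0.0]),
    ("works night shifts", [0.0, 1.0, 0.0]),
    ("allergic to peanuts", [0.7, 0.0, 0.7]),
]
ranked = query_pairing([1.0, 0.0, 0.0], memories)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

Because the score depends only on the two vectors, anything the memory's embedding does not encode, such as when or why it was stored, cannot influence the ranking.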

Response-Pairing

Principle: use a model's response to the historical memories to rank them
Advantages: more precise, captures context
Disadvantages: slower, requires extra inference
Applicable scenarios: precise reasoning, complex tasks

Hybrid Routing

Principle: combine query pairing and response pairing to balance speed and precision
Advantages: balances speed and precision
Disadvantages: more complex to implement
Applicable scenarios: general-purpose AI Agents with balanced performance needs
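
One common way to combine the two is a two-stage pipeline: a cheap vector shortlist followed by a more expensive rerank over the shortlist only. The sketch below assumes that reading of hybrid routing. `toy_rerank` is a hypothetical stand-in for a model-based relevance score (it just counts keyword overlap), and the two-dimensional vectors are made up.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_route(query_vec, memories, rerank, shortlist_size=3, top_k=2):
    """Stage 1: cheap vector shortlist. Stage 2: rerank the shortlist only."""
    shortlist = sorted(
        memories,
        key=lambda m: cosine(query_vec, m["vector"]),
        reverse=True,
    )[:shortlist_size]
    # The expensive scorer runs on shortlist_size items, not the whole store.
    return sorted(shortlist, key=rerank, reverse=True)[:top_k]


def toy_rerank(memory):
    """Stand-in for a model-based relevance score: counts keyword overlap."""
    return len(set(memory["content"].split()) & {"refund", "order"})


memories = [
    {"content": "asked for a refund on order 12", "vector": [0.8, 0.2]},
    {"content": "refund policy explained", "vector": [0.7, 0.3]},
    {"content": "likes jazz", "vector": [0.1, 0.9]},
    {"content": "order shipped late", "vector": [0.9, 0.1]},
]
routed = hybrid_route([1.0, 0.0], memories, rerank=toy_rerank)
print([m["content"] for m in routed])
```

The `shortlist_size` parameter is the knob that trades speed for precision: a larger shortlist gives the reranker more candidates at the cost of more inference calls.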

Algorithm implementation of the memory router

1. Core vector retrieval algorithms

The memory router is built on vector similarity search. The core building blocks are:

Cosine Similarity

import numpy as np

def cosine_similarity(query_embedding, memory_embedding):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(query_embedding) * np.linalg.norm(memory_embedding)
    if denom == 0:
        return 0.0
    return float(np.dot(query_embedding, memory_embedding) / denom)

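Jaccard Similarity

def jaccard_similarity(query_embedding, memory_embedding):
    # Jaccard is defined on sets, so each vector is binarized first:
    # a dimension counts as "active" when its value exceeds 0.5.
    query_active = query_embedding > 0.5
    memory_active = memory_embedding > 0.5
    intersection = np.logical_and(query_active, memory_active).sum()
    union = np.logical_or(query_active, memory_active).sum()
    return intersection / union if union > 0 else 0.0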
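
Jaccard similarity is more commonly applied to discrete sets than to dense embeddings. As a small illustration, not taken from the original article, the sketch below computes it over whitespace-separated token sets; the tokenization is deliberately naive.

```python
def jaccard_tokens(text_a, text_b):
    """Jaccard similarity of the two texts' lowercase token sets."""
    a = set(text_a.lower().split())
    b = set(text_b.lower().split())
    union = a | b
    # Two empty texts share nothing measurable; return 0.0 rather than divide by zero.
    return len(a & b) / len(union) if union else 0.0


print(jaccard_tokens("reset my password", "password reset link"))  # 2 shared / 4 total
```

A lexical score like this can complement cosine similarity when exact terms (product names, IDs) matter more than overall meaning.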

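E5 model (multilingual embedding model)

Features: an embedding model optimized for retrieval
Advantages: cross-language support, strong retrieval quality
Recommended: text-embedding-v2 (the name given in the original; it does not follow the naming of published E5 checkpoints such as multilingual-e5-base, so confirm which model is meant before adopting it)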

2. Router implementation flow

The memory router's end-to-end flow is:

User input → Vectorization → Memory router → Retrieve memories → Merge context → LLM generates response

Step 1: Vectorize the user input

def input_vectorization(user_input, model):
    """Convert the user input into an embedding vector."""
    # model is any embedding model object that exposes encode()
    return model.encode(user_input)

Step 2: Memory router selection

def memory_router(query_embedding, memory_bank, top_k=5, similarity_threshold=0.78):
    """
    Core memory router function.
    :param query_embedding: query vector
    :param memory_bank: memory bank (vector database)
    :param top_k: number of memories to return
    :param similarity_threshold: minimum similarity score
    :return: list of relevant memories
    """
    # Vector search
    results = memory_bank.search(
        vector=query_embedding,
        top_k=top_k * 2,  # over-fetch, then filter below
        threshold=similarity_threshold
    )

    # Keep the top-k by score
    selected = sorted(results, key=lambda x: x['score'], reverse=True)[:top_k]
    return selected
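
The function above depends on a `memory_bank.search(vector, top_k, threshold)` interface that the article does not define. To make it runnable for experimentation, here is a minimal in-memory stand-in. `InMemoryBank` and its `add` method are hypothetical and brute-force every query; a real deployment would use a vector database instead. A compact copy of `memory_router` is included so the block runs on its own.

```python
import math


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryBank:
    """Hypothetical stand-in for a vector database (brute-force search)."""

    def __init__(self):
        self._items = []  # list of (content, vector) pairs

    def add(self, content, vector):
        self._items.append((content, vector))

    def search(self, vector, top_k, threshold=0.0):
        results = []
        for content, stored in self._items:
            score = _cosine(vector, stored)
            if score >= threshold:
                results.append({"content": content, "score": score})
        results.sort(key=lambda r: r["score"], reverse=True)
        return results[:top_k]


def memory_router(query_embedding, memory_bank, top_k=5, similarity_threshold=0.78):
    # Same logic as the article's function: over-fetch, then keep the top-k.
    results = memory_bank.search(
        vector=query_embedding,
        top_k=top_k * 2,
        threshold=similarity_threshold,
    )
    return sorted(results, key=lambda r: r["score"], reverse=True)[:top_k]


bank = InMemoryBank()
bank.add("prefers dark mode", [1.0, 0.0])
bank.add("uses Linux", [0.8, 0.6])
bank.add("owns a cat", [0.0, 1.0])
selected = memory_router([1.0, 0.0], bank)
print([m["content"] for m in selected])
```

With the default threshold of 0.78, the orthogonal memory is filtered out before ranking, so fewer than `top_k` results can come back.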

Step 3: Context merging

def merge_context(query, memories):
    """Merge the query with the retrieved memories."""
    context_parts = [query]
    for memory in memories:
        context_parts.append(f"Memory: {memory['content']} (similarity: {memory['score']:.2f})")
    return "\n".join(context_parts)
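
Since one stated goal of routing is to limit token consumption, a merge step may also need a size cap. The variant below is a sketch under that assumption: `merge_context_with_budget` and `max_chars` are hypothetical names, and character count is used as a crude proxy for tokens.

```python
def merge_context_with_budget(query, memories, max_chars=120):
    """Append memories in descending score order until the budget is used up."""
    parts = [query]
    used = len(query)
    for memory in sorted(memories, key=lambda m: m["score"], reverse=True):
        line = f"Memory: {memory['content']} (similarity: {memory['score']:.2f})"
        # +1 accounts for the newline that joins this line to the previous one.
        if used + len(line) + 1 > max_chars:
            break
        parts.append(line)
        used += len(line) + 1
    return "\n".join(parts)


memories = [
    {"content": "drinks coffee at work", "score": 0.81},
    {"content": "likes green tea", "score": 0.93},
]
context = merge_context_with_budget("What tea do I like?", memories, max_chars=80)
print(context)
```

Sorting before truncating means the budget is spent on the highest-scoring memories first.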

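3. Advanced routing strategies

Time-Aware Routing

from datetime import datetime

def time_aware_router(query_embedding, memory_bank, time_window_hours=24):
    """
    Time-aware routing: prefer recently stored memories.
    """
    memories = memory_bank.search(
        vector=query_embedding,
        top_k=10
    )

    # Attach each memory's timestamp (assumed to be a naive datetime in metadata)
    memories_with_time = [
        {**m, 'timestamp': m.get('metadata', {}).get('timestamp')}
        for m in memories
    ]

    # Keep memories inside the time window; skip those without a timestamp
    now = datetime.now()
    recent_memories = [
        m for m in memories_with_time
        if m['timestamp'] is not None
        and (now - m['timestamp']).total_seconds() < time_window_hours * 3600
    ]

    # Newest first
    recent_memories.sort(key=lambda m: m['timestamp'], reverse=True)
    return recent_memories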
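
The router above applies a hard cutoff: a memory just outside the window is dropped no matter how relevant it is. A softer alternative, which is not part of the original article, is to multiply similarity by an exponential recency decay. In this sketch `recency_weighted_score` and `half_life_hours` are hypothetical names, and `now` is passed in explicitly so the result is reproducible.

```python
from datetime import datetime, timedelta


def recency_weighted_score(similarity, timestamp, now, half_life_hours=24.0):
    """Scale similarity by 0.5 for every half-life that has elapsed."""
    age_hours = max((now - timestamp).total_seconds() / 3600.0, 0.0)
    decay = 0.5 ** (age_hours / half_life_hours)
    return similarity * decay


now = datetime(2026, 4, 2, 12, 0, 0)
candidates = [
    {"content": "old but very similar", "score": 0.9,
     "timestamp": now - timedelta(hours=72)},
    {"content": "recent and fairly similar", "score": 0.7,
     "timestamp": now - timedelta(hours=2)},
]
ranked = sorted(
    candidates,
    key=lambda m: recency_weighted_score(m["score"], m["timestamp"], now),
    reverse=True,
)
print([m["content"] for m in ranked])
```

A shorter half-life behaves more like the hard window; a longer one lets similarity dominate.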

Intent Classification Routing

def intent_classification_router(query, memory_bank, intent_classifier):
    """
    Intent classification routing: pick a memory strategy from the user's intent.
    """
    # Classify the user's intent
    intent = intent_classifier.predict(query)
    # vectorize() is the embedding helper assumed throughout this article
    query_embedding = vectorize(query)

    # Choose the memory strategy for that intent
    if intent == 'customer_service':
        # Customer service: prefer past conversations
        return memory_bank.search(
            vector=query_embedding,
            top_k=10,
            filter={'category': 'customer_service'}
        )
    elif intent == 'personal_assistant':
        # Personal assistant: prefer personal preference memories
        return memory_bank.search(
            vector=query_embedding,
            top_k=10,
            filter={'category': 'personal_preference'}
        )
    else:
        # General routing
        return memory_bank.search(
            vector=query_embedding,
            top_k=5
        )
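
The router above expects an `intent_classifier` object with a `predict(query)` method but does not say what it is. For local testing, a keyword-based stand-in is enough. The class below is a hypothetical toy, with made-up keyword lists; a production system would more likely use a trained classifier.

```python
class KeywordIntentClassifier:
    """Toy intent classifier: picks the intent with the most keyword hits."""

    # Hypothetical keyword lists, chosen only for illustration.
    RULES = {
        "customer_service": {"refund", "order", "invoice", "complaint"},
        "personal_assistant": {"remind", "schedule", "favorite", "prefer"},
    }

    def predict(self, query):
        tokens = set(query.lower().split())
        best_intent, best_hits = "general", 0
        for intent, keywords in self.RULES.items():
            hits = len(tokens & keywords)
            if hits > best_hits:
                best_intent, best_hits = intent, hits
        # Falls back to "general" when no keyword matches.
        return best_intent


classifier = KeywordIntentClassifier()
print(classifier.predict("I want a refund for my order"))
print(classifier.predict("remind me to call mom"))
print(classifier.predict("hello there"))
```

The returned labels match the branches in `intent_classification_router`, with any unmatched query falling through to the general route.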

Performance optimization of the memory router

1. Index optimization

HNSW Index

Features: Hierarchical Navigable Small World graph
Advantages: fast search with high recall
Recommended libraries: faiss, hnswlib

from faiss import IndexHNSWFlat

dimension = 768  # embedding size; must match the embedding model
M = 32           # number of neighbors per node in the graph

# Create the HNSW index
index = IndexHNSWFlat(dimension, M)

IVF Index

Features: inverted file index
Advantages: suited to very large datasets
Disadvantages: the index must be trained before use, so initial construction is slow
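
To show the idea behind an inverted file index without depending on any library, here is a toy pure-Python sketch. Vectors are bucketed by their nearest centroid, and a query scans only the `nprobe` closest buckets. `TinyIVF` is a hypothetical class, and the centroids are supplied by hand here, whereas a real index learns them in a training step (typically k-means).

```python
def _dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TinyIVF:
    """Toy inverted-file index: one list of vectors per centroid."""

    def __init__(self, centroids):
        self.centroids = centroids            # assumed given (normally trained)
        self.lists = [[] for _ in centroids]  # inverted lists

    def _nearest_centroids(self, vector, n):
        order = sorted(
            range(len(self.centroids)),
            key=lambda i: _dist2(vector, self.centroids[i]),
        )
        return order[:n]

    def add(self, item_id, vector):
        cell = self._nearest_centroids(vector, 1)[0]
        self.lists[cell].append((item_id, vector))

    def search(self, query, top_k=1, nprobe=1):
        # Only the nprobe closest cells are scanned, not the whole dataset.
        candidates = []
        for cell in self._nearest_centroids(query, nprobe):
            candidates.extend(self.lists[cell])
        candidates.sort(key=lambda item: _dist2(query, item[1]))
        return [item_id for item_id, _ in candidates[:top_k]]


index = TinyIVF(centroids=[[0.0, 0.0], [10.0, 10.0]])
index.add("a", [1.0, 1.0])
index.add("b", [0.0, 2.0])
index.add("c", [9.0, 9.0])
index.add("d", [11.0, 10.0])
print(index.search([8.0, 8.0], top_k=2, nprobe=1))
```

Raising `nprobe` scans more cells, which improves recall at the cost of speed; with `nprobe` equal to the number of centroids the search becomes exhaustive.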

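2. Hybrid retrieval strategy

Combining vector retrieval with BM25

def hybrid_search(query, vector_index, keyword_index, alpha=0.7):
    """
    Hybrid retrieval: vector search (semantic) + BM25 (keyword).
    alpha weights the vector results against the keyword results.
    """
    # Embed the query first; vectorize() is the embedding helper assumed in this article
    query_embedding = vectorize(query)
    vector_results = vector_index.search(query_embedding, top_k=5)

    # BM25 search
    keyword_results = keyword_index.search(query, top_k=5)

    # Combine the two result sets
    combined_results = merge_results(vector_results, keyword_results, alpha)
    return combined_results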
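
The `merge_results` helper is called above but never defined. One plausible implementation is weighted score fusion: normalize each score set to the range 0 to 1, then blend with `alpha`. The sketch below assumes that approach and, for simplicity, takes each result set as a dict mapping memory ID to raw score, which is a different shape from whatever the article's indexes return. The `top_k` parameter is also an addition.

```python
def _min_max(scores):
    """Rescale a dict of raw scores to the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {key: 1.0 for key in scores}
    return {key: (value - low) / (high - low) for key, value in scores.items()}


def merge_results(vector_results, keyword_results, alpha=0.7, top_k=5):
    """Blend two score dicts: alpha * vector + (1 - alpha) * keyword."""
    v = _min_max(vector_results)
    k = _min_max(keyword_results)
    fused = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * k.get(doc_id, 0.0)
        for doc_id in set(v) | set(k)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:top_k]


vector_results = {"m1": 0.9, "m2": 0.8, "m3": 0.7}   # cosine similarities
keyword_results = {"m2": 12.0, "m4": 3.0}            # BM25-style scores
merged = merge_results(vector_results, keyword_results, alpha=0.7)
for doc_id, score in merged[:2]:
    print(doc_id, round(score, 2))
```

Normalization matters because cosine similarities and BM25 scores live on different scales; without it, `alpha` would not mean what it appears to mean.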

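3. Memory compression and updates

Memory summarization

def summarize_memory(memory):
    """Summarize a memory's content with an LLM."""
    # llm is assumed to be a client object exposing generate(prompt)
    prompt = f"Write a short summary (at most 50 words) of the following memory:\n{memory['content']}"
    summary = llm.generate(prompt)
    return summary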

Tiered memory storage

def memory_partitioning(memory_bank, user_id):
    """
    Memory tiers: hot (recently used), warm (used periodically), cold (rarely used).
    """
    # Hot: accessed within the last 7 days
    hot_memories = memory_bank.get_memories(
        user_id=user_id,
        last_accessed_days=7
    )

    # Warm: accessed 7-30 days ago
    warm_memories = memory_bank.get_memories(
        user_id=user_id,
        last_accessed_days=30,
        exclude=hot_memories
    )

    # Cold: not accessed for more than 30 days. No last_accessed_days filter here
    # (assumed to mean "all memories"); with a 30-day filter this tier is always empty.
    cold_memories = memory_bank.get_memories(
        user_id=user_id,
        exclude=hot_memories + warm_memories
    )

    return hot_memories, warm_memories, cold_memories
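
The same tiering can be expressed without a `get_memories` API by classifying each memory from its last-access timestamp. The sketch below assumes every memory is a dict with a `last_accessed` datetime, which is a hypothetical field name; `now` is passed in so the example is deterministic.

```python
from datetime import datetime, timedelta


def partition_by_recency(memories, now, hot_days=7, warm_days=30):
    """Split memories into hot / warm / cold by time since last access."""
    hot, warm, cold = [], [], []
    for memory in memories:
        age = now - memory["last_accessed"]
        if age <= timedelta(days=hot_days):
            hot.append(memory)
        elif age <= timedelta(days=warm_days):
            warm.append(memory)
        else:
            cold.append(memory)
    return hot, warm, cold


now = datetime(2026, 4, 2)
memories = [
    {"id": "a", "last_accessed": now - timedelta(days=1)},
    {"id": "b", "last_accessed": now - timedelta(days=10)},
    {"id": "c", "last_accessed": now - timedelta(days=45)},
]
hot, warm, cold = partition_by_recency(memories, now)
print([m["id"] for m in hot], [m["id"] for m in warm], [m["id"] for m in cold])
```

Each memory lands in exactly one tier, so no exclusion lists are needed.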

Practical use cases

Case 1: Personal assistant AI Agent

Scenario: the AI assistant remembers the user's preferences, goals, and habits

Implementation:

class PersonalAssistantMemoryRouter:
    # MemoryBank, HybridRouter, vectorize, llm, has_new_memory, and
    # self.update_memory are assumed to be defined elsewhere.
    def __init__(self):
        self.memory_bank = MemoryBank()
        self.router = HybridRouter()

    def handle_query(self, user_id, query):
        # Vectorize the query
        query_embedding = vectorize(query)

        # Memory routing: prefer memories about this user
        memories = self.router.route(
            query_embedding,
            memory_bank=self.memory_bank,
            user_id=user_id,
            top_k=10,
            priority=['preferences', 'goals', 'history']
        )

        # Merge context
        context = merge_context(query, memories)

        # Generate the response with the LLM
        response = llm.generate(
            prompt=context,
            user_id=user_id
        )

        # Update memory (only if there is something new to store)
        if has_new_memory(query, response):
            self.update_memory(user_id, query, response)

        return response

Case 2: Customer service AI Agent

Scenario: the AI agent remembers each customer's past conversations and provides personalized service

Implementation:

class CustomerServiceRouter:
    # MemoryBank, TimeAwareRouter, vectorize, llm, and self.update_memory
    # are assumed to be defined elsewhere.
    def __init__(self):
        self.memory_bank = MemoryBank()
        self.router = TimeAwareRouter()

    def handle_query(self, customer_id, query):
        # Vectorize the query
        query_embedding = vectorize(query)

        # Memory routing: prefer this customer's history
        memories = self.router.route(
            query_embedding,
            memory_bank=self.memory_bank,
            customer_id=customer_id,
            time_window_hours=24,
            top_k=15
        )

        # Merge context
        context = merge_context(query, memories)

        # Generate the response with the LLM
        response = llm.generate(
            prompt=context,
            customer_id=customer_id,
            style='customer_service'
        )

        # Update memory
        self.update_memory(customer_id, query, response)

        return response

Case 3: Research assistant AI Agent

Scenario: the AI research assistant helps researchers with literature reviews and experiment design

Implementation:

class ResearchAssistantRouter:
    # MemoryBank, IntentClassificationRouter, vectorize, llm, and
    # self.update_memory are assumed to be defined elsewhere.
    def __init__(self):
        self.memory_bank = MemoryBank()
        self.router = IntentClassificationRouter()

    def handle_query(self, researcher_id, query):
        # Vectorize the query
        query_embedding = vectorize(query)

        # Memory routing: select memories by intent
        memories = self.router.route(
            query_embedding,
            memory_bank=self.memory_bank,
            researcher_id=researcher_id,
            top_k=20
        )

        # Merge context
        context = merge_context(query, memories)

        # Generate the response with the LLM
        response = llm.generate(
            prompt=context,
            researcher_id=researcher_id,
            model='research-focused'
        )

        # Update memory
        self.update_memory(researcher_id, query, response)

        return response

Best practices for memory routers

1. Threshold design

Similarity threshold: 0.78-0.85 is a suggested starting range; the right value depends on the embedding model and should be tuned on real queries
Top-K: 5-10 suggested
Time window: adjust per scenario (24 hours for customer service, 7 days for a personal assistant)
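
One way to check a threshold against real data is to label a small sample of retrieved memories as relevant or not and compare precision and recall at each candidate value. The sketch below assumes such a labeled sample exists; the scores and labels shown are invented for illustration, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(scored_pairs, threshold):
    """scored_pairs: list of (similarity, is_relevant) tuples."""
    kept = [relevant for score, relevant in scored_pairs if score >= threshold]
    true_positives = sum(kept)
    total_relevant = sum(relevant for _, relevant in scored_pairs)
    # Convention chosen here: with nothing kept (or nothing relevant), report 1.0.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / total_relevant if total_relevant else 1.0
    return precision, recall


# Invented sample: similarity score and a human relevance label.
sample = [
    (0.92, True),
    (0.86, True),
    (0.81, False),
    (0.79, True),
    (0.70, False),
]
for threshold in (0.78, 0.85):
    precision, recall = precision_recall(sample, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this sample, the lower threshold keeps every relevant memory but admits one irrelevant one, while the higher threshold is cleaner but misses a relevant memory. That is the trade-off the suggested range is navigating.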

2. Memory update strategy

Automatic updates: store new memories automatically after each conversation
Periodic compression: summarize memories automatically every week
User control: let users delete or edit their memories

3. Privacy and security

Data encryption: store the vector database encrypted
User authorization: require user authorization for memory queries
Data retention: let users set how long memories are kept

4. Monitoring and optimization

Search performance monitoring: track search latency and similarity scores
User feedback: collect user feedback on memory relevance
Auto-tuning: adjust the routing strategy automatically based on that feedback
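
Search monitoring can start as a thin wrapper around whatever search function the router calls. The sketch below records latency and the best similarity score per call. `SearchMonitor` is a hypothetical class, and results are assumed to be a list of dicts with a `score` key, as in the article's other examples.

```python
import time
from statistics import mean


class SearchMonitor:
    """Records latency and top similarity score for each search call."""

    def __init__(self):
        self.latencies_ms = []
        self.top_scores = []

    def observe(self, search_fn, *args, **kwargs):
        start = time.perf_counter()
        results = search_fn(*args, **kwargs)
        self.latencies_ms.append((time.perf_counter() - start) * 1000.0)
        if results:  # empty result sets contribute latency but no score
            self.top_scores.append(max(r["score"] for r in results))
        return results

    def summary(self):
        return {
            "calls": len(self.latencies_ms),
            "avg_latency_ms": mean(self.latencies_ms) if self.latencies_ms else 0.0,
            "avg_top_score": mean(self.top_scores) if self.top_scores else 0.0,
        }


def fake_search(query):
    # Stand-in for a real vector search call.
    return [{"content": "likes green tea", "score": 0.9},
            {"content": "owns a cat", "score": 0.5}]


monitor = SearchMonitor()
monitor.observe(fake_search, "what tea do I like?")
monitor.observe(fake_search, "do I have pets?")
print(monitor.summary())
```

A falling average top score over time can be an early sign that the stored memories no longer match what users ask about, which is useful input for threshold tuning.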

Summary: where memory routers are heading

The memory router is a key technology for upgrading an AI Agent from a "dialogue engine" to an "intelligent assistant". Through its selection strategy, it lets the AI achieve:

Precise retrieval: recall only the most relevant memories
Efficient execution: avoid unnecessary computation
Continuous learning: keep improving from user interactions

As AI Agents evolve, memory routers will face further challenges:

Retrieval efficiency at scale: how to support retrieval over tens of billions of vectors
Multimodal memory integration: unified routing across text, image, speech, and other memory types
Cross-agent memory sharing: coordinating memory among multiple AI Agents

Going forward, the memory router will be a core component of AI Agent systems and the foundation for an assistant that truly "understands you, remembers you, and helps you".

Published: 2026-04-02 · Author: Cheese Cat · Tags: AI Agent, memory architecture, memory router, vector retrieval