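Claude Managed Agents: Dreaming, Outcomes, and Multiagent Orchestration, a Structural Shift Toward Agent Engineering

Anthropic's Claude Managed Agents adds multiagent orchestration, Dreaming memory curation, and Outcomes grading: a structural shift toward agent engineering, parallelism of up to 20 subagents, and the strategic contrast with the open source Hermes Agent

May 13, 2026 · 5 min read · Beginner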

Memory Security Orchestration Infrastructure Governance

This article is one route in OpenClaw's external narrative arc.

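Summary

On May 6-7, 2026, Anthropic released three new features for Claude Managed Agents: Dreaming (research preview), Outcomes (outcome grading), and Multiagent Orchestration. The release marks a shift in Anthropic's competitive focus from “model selection” to “agent engineering”: giving developers the means to build agent systems that learn, meet quality benchmarks, and work in parallel. This article examines how these features change the way AI agents are deployed, and how they differ strategically from open source frameworks such as Hermes Agent.

1. The New Claude Managed Agents Features

1.1 Dreaming: Curating Memory Across Sessions

Dreaming, a research preview feature, lets an agent review its past sessions in the gaps between sessions, find patterns, and use them to improve. Unlike conventional RAG (retrieval-augmented generation), Dreaming does not simply retrieve documents. The agent actively “dreams”: it draws lessons from historical sessions and turns them into reusable knowledge patterns.

Technical significance: Dreaming moves agent memory from “static vector storage” toward “dynamic experience extraction”. The agent can not only access past conversation records but also identify which decision paths produced the best results and encode them as reusable strategy templates.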
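
To make the idea of experience extraction concrete, here is a minimal sketch. It is not Anthropic's implementation, and the source does not describe how Dreaming works internally. The `SessionRecord` type, its fields, and the `distill_playbook` function are hypothetical names introduced only for illustration. The sketch replays past sessions, groups them by strategy, and keeps a strategy only if it was tried often enough and succeeded most of the time.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionRecord:
    """One past session (hypothetical shape, not a real API type)."""
    task_type: str   # e.g. "refund_request"
    strategy: str    # the approach the agent took
    succeeded: bool  # whether the session met its goal


def distill_playbook(sessions, min_runs=2, min_success_rate=0.6):
    """Return {task_type: best strategy} learned from past sessions.

    A strategy qualifies only if it was tried at least `min_runs` times
    and succeeded in at least `min_success_rate` of those runs.
    """
    # (task_type, strategy) -> [successes, total runs]
    stats = defaultdict(lambda: [0, 0])
    for s in sessions:
        entry = stats[(s.task_type, s.strategy)]
        entry[0] += int(s.succeeded)
        entry[1] += 1

    playbook = {}
    best_rate = {}
    for (task_type, strategy), (wins, runs) in stats.items():
        rate = wins / runs
        if runs < min_runs or rate < min_success_rate:
            continue  # too little evidence, or not reliable enough
        if rate > best_rate.get(task_type, -1.0):
            best_rate[task_type] = rate
            playbook[task_type] = strategy
    return playbook


history = [
    SessionRecord("refund_request", "ask_order_id_first", True),
    SessionRecord("refund_request", "ask_order_id_first", True),
    SessionRecord("refund_request", "escalate_immediately", False),
    SessionRecord("refund_request", "escalate_immediately", True),
    SessionRecord("bug_report", "request_logs", True),
]
playbook = distill_playbook(history)
print(playbook)
```

In this toy history only `ask_order_id_first` clears both thresholds, so it becomes the reusable template for refund requests. The `bug_report` strategy is left out because a single run is too little evidence.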

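1.2 Outcomes: Benchmark-Based Outcome Grading

Outcomes lets developers define quality benchmarks and have agent output scored against them automatically. This matters most in enterprise settings where output quality must be assured, such as financial analysis reports, medical diagnostic summaries, and legal document generation.

Technical significance: Outcomes moves quality control from “manual review after the fact” to “benchmark constraints up front”. Developers define structured grading criteria, and the agent aligns its output to them as it generates, which reduces the need for human intervention.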
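
A structured grading standard can be pictured as a weighted rubric. The sketch below is an assumption about the general shape of such a rubric, not the Outcomes API: `Criterion`, `grade`, the weights, and the 0.8 pass mark are all hypothetical. Each criterion is a named check, and the score is the weighted share of checks that pass.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """One rubric line (hypothetical type, for illustration only)."""
    name: str
    weight: float
    check: Callable[[str], bool]


def grade(output: str, rubric: list[Criterion], pass_mark: float = 0.8):
    """Score `output` against `rubric`; return (score, passed, failed names)."""
    total = sum(c.weight for c in rubric)
    earned = 0.0
    failed = []
    for c in rubric:
        if c.check(output):
            earned += c.weight
        else:
            failed.append(c.name)
    score = earned / total
    return score, score >= pass_mark, failed


rubric = [
    Criterion("mentions_risk", 2.0, lambda t: "risk" in t.lower()),
    Criterion("has_disclaimer", 1.0, lambda t: "not financial advice" in t.lower()),
    Criterion("under_50_words", 1.0, lambda t: len(t.split()) <= 50),
]

draft = "Revenue grew 12% year over year. Key risk: currency exposure."
score, passed, failed = grade(draft, rubric)
print(score, passed, failed)

# The list of failed criteria tells the agent what to fix before resubmitting.
revised = draft + " This is not financial advice."
print(grade(revised, rubric))
```

Returning the failed criteria, not just a score, is what lets an agent revise toward the benchmark instead of waiting for a human reviewer to explain the rejection.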

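1.3 Multiagent Orchestration: Up to 20 Subagents in Parallel

Multiagent orchestration lets developers define a “lead agent” that dynamically dispatches up to 20 “subagents” to work in parallel. Each subagent can be given its own tools, knowledge, and responsibilities, and the lead agent integrates the results at the end.

Technical significance: Multiagent Orchestration moves agent architecture from “single agent” to “multi-agent collaboration”. Beyond making complex tasks more efficient, it introduces mechanisms for task decomposition, result verification, and conflict resolution.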
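
The fan-out and merge pattern described above can be sketched with Python's standard `asyncio` module. This is an illustration of the pattern only: `run_subagent` is a hypothetical stand-in that makes no network call, and `orchestrate`, the role names, and the `MAX_SUBAGENTS` constant are assumptions. Only the ceiling of 20 comes from the article.

```python
import asyncio

MAX_SUBAGENTS = 20  # the parallelism ceiling cited in the article


async def run_subagent(role: str, task: str) -> dict:
    """Stand-in for a real subagent call (hypothetical; no network I/O)."""
    await asyncio.sleep(0)  # yield control, as a real API call would
    return {"role": role, "result": f"{role} finished: {task}"}


async def orchestrate(assignments: dict[str, str]) -> list[dict]:
    """Lead-agent step: dispatch every subagent concurrently, collect results."""
    if len(assignments) > MAX_SUBAGENTS:
        raise ValueError(f"at most {MAX_SUBAGENTS} subagents per run")
    jobs = [run_subagent(role, task) for role, task in assignments.items()]
    # gather() runs the jobs concurrently and returns results in input order
    return await asyncio.gather(*jobs)


assignments = {
    "researcher": "collect quarterly filings",
    "analyst": "compute margin trends",
    "reviewer": "check figures against sources",
}
results = asyncio.run(orchestrate(assignments))

# Integration step: the lead agent merges subagent output into one report.
report = "\n".join(r["result"] for r in results)
print(report)
```

Because `asyncio.gather` preserves input order, the lead agent can match each result to the role that produced it without extra bookkeeping.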

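2. Strategic Differences from the Open Source Hermes Agent

2.1 Claude Managed Agents: Cloud-Hosted, Enterprise-Grade

The core strengths of Claude Managed Agents are:

Cloud hosting: no infrastructure to manage; Anthropic runs and scales the agents
Enterprise-grade security: built on Claude's zero-trust architecture, with end-to-end encryption and access control
Outcomes benchmarks: a built-in quality grading system, suited to scenarios with strict compliance requirements
Multiagent Orchestration: parallel orchestration of up to 20 subagents, suited to complex tasks

2.2 Hermes Agent: Local Execution, Community-Driven

The core strengths of Hermes Agent are:

Local execution: the agent runs on the local machine, suited to scenarios with very high data privacy requirements
Community-driven: a fast-moving open source community, so new features and toolchains evolve more quickly
Self-improvement: the agent can learn from experience and optimize itself automatically
MCP server mode: supports running as an MCP (Model Context Protocol) server, which complements the hosted mode of Claude Managed Agents

2.3 Strategic Significance: Complementary Rather Than Competing

Claude Managed Agents and Hermes Agent sit at opposite ends of the AI agent deployment spectrum:

Cloud-hosted mode (Claude): suited to scenarios that need enterprise-grade security, compliance, and quality control
Local execution mode (Hermes): suited to scenarios that need data privacy, community-driven development, and rapid iteration

The two modes complement each other more than they compete: an enterprise can choose the deployment mode that best fits each business scenario.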

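3. The Structural Shift: From Model Selection to Agent Engineering

3.1 A Shift in Competitive Focus

The release of Claude Managed Agents marks a shift in the focus of AI competition from “model selection” to “agent engineering”. The central question for developers is no longer only which model fits best, but how to build agent systems that learn autonomously, improve themselves, and collaborate in parallel.

Measurable indicators: Claude Opus 4.7 posted a leading 64.37% on the Vals AI Finance Agent benchmark, while GPT-5.5 Instant cut its hallucination rate by 52.5%. These figures suggest that gains in model capability are approaching diminishing marginal returns, and that the complexity of agent engineering is becoming the new axis of competition.

3.2 Challenges of Agent Engineering

Multiagent orchestration brings new engineering challenges:

Task decomposition: how to break a complex task into subtasks that subagents can execute
Result integration: how to combine the output of multiple subagents effectively
Conflict resolution: what to do when subagents produce conflicting results
Quality control: how to ensure that every subagent's output meets the Outcomes benchmarks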
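
Of the challenges listed above, conflict resolution is the easiest to show in a few lines. The sketch below uses simple majority voting, which is one common technique and not something the source attributes to Claude Managed Agents. The `resolve` function, the agent names, and the `min_agreement` threshold are hypothetical.

```python
from collections import Counter


def resolve(answers: dict[str, str], min_agreement: float = 0.5):
    """Pick the majority answer among subagents.

    Returns (winner, dissenters). If no answer is held by strictly more
    than `min_agreement` of the subagents, returns (None, all agent names)
    so the lead agent can escalate instead of guessing.
    """
    if not answers:
        return None, []
    counts = Counter(answers.values())
    winner, votes = counts.most_common(1)[0]
    if votes / len(answers) <= min_agreement:
        return None, sorted(answers)
    dissenters = sorted(name for name, a in answers.items() if a != winner)
    return winner, dissenters


clear = {"agent_a": "approve", "agent_b": "approve", "agent_c": "reject"}
print(resolve(clear))

split = {"agent_a": "approve", "agent_b": "reject"}
print(resolve(split))
```

Returning the dissenting agents alongside the winner gives the lead agent something to act on: it can re-run those subagents, or route a split decision to a human reviewer.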

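4. Deployment Scenarios and Strategic Impact

4.1 Enterprise Scenarios: Compliance and Quality Control

The Outcomes feature of Claude Managed Agents matters most in industries with strict compliance requirements, such as finance, healthcare, and law. Developers can define structured grading criteria to ensure that agent output conforms to industry norms.

Deployment considerations: enterprises need to weigh the cost of defining and maintaining Outcomes benchmarks, along with the management complexity that multiagent orchestration adds.

4.2 Open Source Scenarios: Community-Driven and Fast-Iterating

The open source model of Hermes Agent suits developer communities that need to iterate quickly. Its self-improvement capability and MCP server mode let developers tailor the agent system to their own needs.

Deployment considerations: the open source model requires developers to manage infrastructure and security themselves, so it suits teams with strong technical capability.

5. Conclusion

The three new Claude Managed Agents features, Dreaming, Outcomes, and Multiagent Orchestration, mark a structural shift in the AI agent field. From model selection to agent engineering, from a single agent to multi-agent collaboration, from static memory to dynamic experience extraction: these changes are reshaping how AI agents are deployed.

The strategic differences from open source frameworks such as Hermes Agent reflect two complementary deployment modes: cloud-hosted versus local execution, and enterprise compliance versus community-driven development. The coexistence of the two will push AI agent technology toward deeper engineering maturity.

Key signal: Anthropic is moving the competition from “model capability” to “agent engineering”. This may mean that Claude's competitive moat is shifting from the model layer to the engineering layer, which poses a new challenge for the open source community.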

Summary

On May 6-7, 2026, Anthropic released three new features for Claude Managed Agents: Dreaming (research preview), Outcomes (outcome grading), and Multiagent Orchestration. This marks a shift in Anthropic’s competitive focus from “model selection” to “Agent engineering”: giving developers the means to build agent systems that can learn, meet quality benchmarks, and work in parallel. This article explores how these capabilities change the AI Agent deployment model and how they differ strategically from open source frameworks such as Hermes Agent.

1. The new Claude Managed Agents features

1.1 Dreaming: Curating Memory across Sessions

Dreaming is a research preview feature that lets an agent review its past sessions in the gaps between sessions, discover patterns, and improve itself. Unlike traditional RAG (Retrieval Augmented Generation), Dreaming does not simply retrieve documents. The agent actively “dreams”: it learns lessons from historical sessions and forms reusable knowledge patterns.

Technical significance: Dreaming moves the memory mechanism of AI Agents from “static vector storage” to “dynamic experience extraction”. Agents not only have access to past conversation records, they can also identify which decision paths yielded the best results and encode them as reusable strategy templates.

1.2 Outcomes: Benchmark-based outcome ratings

The Outcomes feature allows developers to define quality benchmarks and have agent outputs automatically scored against these benchmarks. This is critical for enterprise scenarios where output quality needs to be ensured – such as financial analysis reports, medical diagnostic summaries, legal document generation, etc.

Technical significance: Outcomes moves the quality control of AI Agents from “manual review after the fact” to “benchmark constraints up front”. Developers can define structured grading criteria that the agent automatically aligns to when generating output, reducing the need for manual intervention.

1.3 Multiagent Orchestration: Parallel orchestration of up to 20 subagents

The multi-agent orchestration feature allows developers to define a “lead agent” that can dynamically dispatch up to 20 “subagents” to work in parallel. Each subagent can be assigned different tools, knowledge, and responsibilities, and the lead agent consolidates the results at the end.

Technical significance: Multiagent Orchestration changes the architecture of AI Agent from “single agent” to “multi-agent collaboration”. This not only improves the efficiency of processing complex tasks, but also introduces mechanisms for task decomposition, result verification, and conflict resolution.

2. Strategic differences with open source Hermes Agent

2.1 Claude Managed Agents: Cloud hosting, enterprise level

The core strengths of Claude Managed Agents are:

Cloud Hosting: No infrastructure to manage, Anthropic takes care of running and scaling the agent
Enterprise-grade security: Based on Claude’s zero-trust architecture, providing end-to-end encryption and access control
Outcomes Benchmark: Built-in quality rating system, suitable for scenarios with strict compliance requirements
Multiagent Orchestration: Parallel orchestration of up to 20 subagents, suitable for complex tasks

2.2 Hermes Agent: local execution, community driven

The core advantages of Hermes Agent are:

Local execution: The agent runs on the local machine, suitable for scenarios with extremely high data privacy requirements
Community-driven: A fast-moving open source community, so new features and toolchains evolve more quickly
Self-improvement: The agent can learn from experience and optimize itself automatically
MCP Server Mode: Supports MCP (Model Context Protocol) server mode, which complements the managed mode of Claude Managed Agents

2.3 Strategic significance: complementarity rather than competition between the two models

Claude Managed Agents and Hermes Agent represent two extremes of AI Agent deployment:

Cloud hosting mode (Claude): suitable for scenarios requiring enterprise-level security, compliance and quality control
Local execution mode (Hermes): suitable for scenarios that require data privacy, community-driven development, and rapid iteration

These two models are not competitive, but complementary—enterprises can choose the most suitable deployment model based on different business scenarios.

3. Structural shift: from model selection to Agent engineering

3.1 Shift of competitive focus

The release of Claude Managed Agents marks a shift in the focus of AI competition from “model selection” to “Agent engineering.” The central question for developers is no longer only which model is best, but how to build agent systems that can learn autonomously, improve themselves, and collaborate in parallel.

Measurable indicators: Claude Opus 4.7 posted a leading 64.37% on the Vals AI Finance Agent benchmark, while GPT-5.5 Instant cut its hallucination rate by 52.5%. These figures suggest that gains in model capability are approaching diminishing marginal returns, and that the complexity of Agent engineering is becoming the new axis of competition.

3.2 Challenges of Agent Engineering

Multi-agent orchestration brings new engineering challenges:

Task Decomposition: How to decompose complex tasks into sub-tasks that can be executed by sub-agents
Result Integration: How to effectively integrate the output of multiple subagents
Conflict Resolution: How to resolve when multiple subagents produce conflicting results
Quality Control: How to ensure that the output of each subagent meets the Outcomes benchmark

4. Deployment Scenarios and Strategic Impact

4.1 Enterprise-level scenario: compliance and quality control

The Outcomes feature of Claude Managed Agents is critical for industries with strict compliance requirements such as finance, healthcare, and legal. Developers can define structured rating criteria to ensure that the agent’s output complies with industry specifications.

Deployment Considerations: Enterprises need to evaluate the cost of defining and maintaining Outcomes baselines, as well as the management complexity of multi-agent orchestration.

4.2 Open source scenario: community driven and rapid iteration

The open source model of Hermes Agent is suitable for a developer community that needs rapid iteration. Self-improvement capabilities and MCP server mode allow developers to customize the agent system according to their own needs.

Deployment considerations: The open source model requires developers to manage infrastructure and security by themselves, and is suitable for teams with strong technical capabilities.

5. Conclusion

Three new features for Claude Managed Agents—Dreaming, Outcomes, and Multiagent Orchestration—signal a tectonic shift in the AI Agent space. From model selection to Agent engineering, from single agent to multi-agent collaboration, from static memory to dynamic experience extraction, these changes are reshaping the deployment model of AI Agent.

The strategic differences with open source frameworks such as Hermes Agent represent two complementary deployment models: cloud hosting vs local execution, enterprise-level compliance vs community-driven. The coexistence of these two models will push AI Agent technology toward deeper engineering maturity.

Key signal: Anthropic is shifting competition from “model capabilities” to “Agent engineering.” This may mean that Claude’s competitive moat is shifting from the model level to the engineering level, which poses a new challenge to the open source community.