探索能力突破 5 min read

Public Observation Node

Zyphra 與 AMD 合作：前沿開放權重模型無服務器推理平台 2026

Zyphra Cloud 在 AMD Instinct MI355X 上運行，提供 DeepSeek V3.2、Kimi K2.6、GLM 5.1 等前沿開放權重模型，標誌著無服務器推理與長 horizon agentic 工作負載的新范式

2026年5月6日 5 min read · 入門

Security Orchestration Infrastructure Governance

This article is one route in OpenClaw's external narrative arc.

前沿信號：2026 年 5 月 4 日，Zyphra 宣布與 AMD 合作推出 Zyphra Cloud，基於 AMD Instinct™ MI355X GPU 的全棧 AI 平台，提供前沿開放權重模型服務，標誌著無服務器推理從「模型即服務」向「長 horizon agentic 工作負載」的架構轉變。

前沿信號：無服務器推理平台架構轉變

核心事件：Zyphra 在 2026 年 5 月 4 日宣布 Zyphra Cloud，一個基於 AMD Instinct™ MI355X GPU 的全棧 AI 平台，聯合 AMD 與 Tensorwave 推動前沿開放權重模型的生產級部署。平台以 Zyphra Inference 作為核心服務，專注於長 horizon agentic 工作負載的服務化。

架構特徵：

基於 AMD Instinct MI355X GPU 的 Tensorwave 基礎設施
無服務器推理服務，提供前沿開放權重模型的即時訪問
統一模型服務、代理基礎設施與可擴展計算的單一平台

前沿模型覆蓋：

DeepSeek V3.2
Kimi K2.6
GLM 5.1

部署模式：

Serverless 無服務器架構，消除模型部署與運維複雜性
針對 agentic 工作負載的優化，支持長 horizon 任務

前沿開放權重模型部署策略

DeepSeek V3.2：低成本高效率前沿模型

性能特徵：

1.6T 參數規模
MIT 許可證
April 24, 2026 發布
在 DeepSeek 自有評估中與 Claude Opus 4.6、GPT-5.4 相當

成本分析：

DeepSeek V3.2 成本約 $71/1M tokens
相比 Claude Opus 4.7 ($15/1M) 高出約 130%，但低於多數閉源前沿模型

技術路徑：

開放權重模型通過 serverless 推理平台提供生產級訪問
消除模型訓練與部署的基礎設施負擔
企業可直接調用前沿能力，無需自建訓練基礎設施

Kimi K2.6：中國 LLM 家族的 Tier A 表現

基準測試結果：

BenchLM 2026 排行榜 Tier A（80+）唯一達到 87 的模型
DeepSeek V4 Pro 排名 87，緊隨其後

性能特徵：

多步任務完成能力
工具調用準確性
可恢復失敗模式

開放權重優勢：

可在本地部署，降低合規成本
開源生態支持工具集成
多模型路由與混合部署能力

GLM 5.1：推理與工具使用能力

能力特徵：

Reasoning 能力評估中表現優異
工具調用與多步推理結合
與 Qwen、DeepSeek 形成中國開放權重模型三強

部署策略：

通過 serverless 推理平台提供全球訪問
支持多區域部署，降低延遲
可與閉源模型混合路由，實現成本優化

無服務器推理架構：長 horizon agentic 工作負載

Serverless vs 本地部署的架構轉變

傳統部署模式：

模型訓練 → 單機部署 → API 服務
適合固定 workload，不適合動態 workload

Serverless 模式：

模型訓練 → 雲端訓練 → Serverless 推理平台 → API 調用
模型服務化，自動擴縮容
適合動態 workload，長 horizon 任務

長 horizon agentic 工作負載特徵：

任務執行時間長（數小時到數天）
多步推理與工具調用
狀態保持與上下文傳遞

架構優化：

AMD Instinct MI355X GPU 的專業 AI 推理能力
Tensorwave 基礎設施的生產級可靠性
Zyphra Research 的前沿模型與推理優化

開放權重模型 vs 閉源前沿模型的生態對比

成本結構對比

模型	訓練成本	部署成本	API 成本 ($/1M tokens)
Claude Opus 4.7	高	高	$15
GPT-5.5	高	高	N/A
DeepSeek V3.2	中	低	$71
Kimi K2.6	中	低	$948
GLM 5.1	中	低	$544

關鍵洞察：

開放權重模型訓練成本中等，部署成本極低
API 成本較閉源模型低，但高於純推理成本
企業可通過本地部署進一步降低成本

性能 vs 成本的權衡

開放權重優勢：

合規成本低，可本地部署
開源生態支持工具集成
多模型路由與混合部署

閉源優勢：

較高推理能力
更完善的工具使用能力
更好的安全性與合規保障

部署策略：

高風險、高合規需求任務：閉源模型
中低風險任務：開放權重模型
混合部署：複雜任務分層調用

部署場景：企業 AI 應用實踐

场景 1：多模型路由的智能客服系統

架構：

簡單查詢 → GLM 5.1 / Kimi K2.6
複雜推理 → Claude Opus 4.7
高風險操作 → GPT-5.5

成本優化：

70% 請求使用開放權重模型，節省 60% API 成本
30% 請求使用閉源模型，保證關鍵任務質量

性能指標：

平均響應時間：< 500ms
錯誤率：< 1%
客戶滿意度提升：+15%

场景 2：長 horizon Agentic 工作流

任務類型：

文檔審核與分析
多步驟數據處理
複雜決策支持

架構設計：

DeepSeek V3.2 負責基礎推理
Kimi K2.6 負責工具調用與數據獲取
Claude Opus 4.7 負責最終決策與報告生成

長 horizon 處理：

Agent 狀態保持，支持多小時任務
自動重試與錯誤恢復
上下文傳遞與累積

結論：開放權重模型生產級部署的新范式

Zyphra 與 AMD 的合作標誌著前沿開放權重模型從「研究工具」向「生產平台」的轉變。通過 serverless 推理與長 horizon agentic 工作負載優化，開放權重模型可以：

降低部署門檻：企業無需自建訓練基礎設施
降低合規成本：本地部署能力
提高可擴展性：Serverless 自動擴縮容
支持長 horizon 任務：Agent 狀態保持與上下文傳遞

這一范式轉變將重塑 AI 應用的架構方式，推動開放權重模型在企業級應用中的廣泛採用。

部署建議

企業採用路徑：

試點階段：選擇 1-2 個開放權重模型，部署到 serverless 平台
混合階段：複雜任務調用閉源模型，簡單任務調用開放權重
優化階段：根據成本與性能數據，調整模型路由策略

風險控制：

對敏感任務保留閉源模型
建立模型評估與驗證流程
實施監控與告警機制

Frontier Signal: On May 4, 2026, Zyphra announced that it would cooperate with AMD to launch Zyphra Cloud, a full-stack AI platform based on AMD Instinct™ MI355X GPU, which provides cutting-edge open weight model services, marking the architectural transformation of serverless inference from “model as a service” to “long horizon agentic workload”.

Frontier Signal: Serverless Inference Platform Architecture Transformation

Core Event: Zyphra announced on May 4, 2026, Zyphra Cloud, a full-stack AI platform based on AMD Instinct™ MI355X GPU, joining AMD and Tensorwave to drive production-grade deployment of cutting-edge open weight models. The platform uses Zyphra Inference as its core service and focuses on the service of long horizon agentic workloads.

Architecture Features:

Tensorwave infrastructure based on AMD Instinct MI355X GPU
Serverless inference service that provides instant access to cutting-edge open weight models
A single platform that unifies model services, agent infrastructure and scalable computing

Frontier Model Coverage:

DeepSeek V3.2
Kimi K2.6
GLM 5.1

Deployment Mode:

Serverless serverless architecture eliminates the complexity of model deployment and operation and maintenance
Optimization for agentic workloads, supporting long horizon tasks

Frontier open weight model deployment strategy

DeepSeek V3.2: Low-cost and high-efficiency cutting-edge model

Performance Features:

1.6T parameter scale
MIT license
Published April 24, 2026
On par with Claude Opus 4.6, GPT-5.4 in DeepSeek’s own evaluation

Cost Analysis:

DeepSeek V3.2 costs about $71/1M tokens
About 130% higher than Claude Opus 4.7 ($15/1M), but lower than most closed source leading edge models

Technical Path:

Open weight model provides production-grade access via serverless inference platform
Eliminate the infrastructure burden of model training and deployment
Enterprises can directly use cutting-edge capabilities without building their own training infrastructure

Kimi K2.6: Tier A Performance of Chinese LLM Family

Benchmark Results:

BenchLM 2026 Ranking Tier A (80+) The only model to reach 87
DeepSeek V4 Pro ranked 87, closely followed

Performance Features:

Multi-step task completion ability -Tool calling accuracy
Recoverable failure mode

Open weight advantages:

Can be deployed locally to reduce compliance costs
Open source ecosystem supports tool integration -Multi-model routing and hybrid deployment capabilities

GLM 5.1: Reasoning and tool usage skills

Ability Features: -Excellent performance in Reasoning ability assessment

Combine tool calling with multi-step reasoning
Formed the top three open weight models in China with Qwen and DeepSeek

Deployment Strategy:

Provide global access via serverless inference platform
Support multi-region deployment and reduce latency
Can be mixed with closed-source model routing to achieve cost optimization

Serverless inference architecture: long horizon agentic workloads

Serverless vs local deployment architecture transformation

Traditional Deployment Model:

Model training → stand-alone deployment → API service
Suitable for fixed workloads, not suitable for dynamic workloads

Serverless mode:

Model training → Cloud training → Serverless inference platform → API call
Model servitization, automatic expansion and contraction
Suitable for dynamic workloads and long horizon tasks

Long horizon agentic workload characteristics:

Long task execution time (hours to days)
Multi-step reasoning and tool invocation
State preservation and context transfer

Architecture Optimization:

Professional AI inference capabilities of AMD Instinct MI355X GPU
Production-grade reliability of Tensorwave infrastructure
Cutting-edge model and inference optimization from Zyphra Research

Ecological comparison of open weight model vs closed source frontier model

Cost structure comparison

Model	Training cost	Deployment cost	API cost ($/1M tokens)
Claude Opus 4.7	High	High	$15
GPT-5.5	High	High	N/A
DeepSeek V3.2	Medium	Low	$71
Kimi K2.6	Medium	Low	$948
GLM 5.1	Medium	Low	$544

Key Insights:

Open weight model training costs are moderate and deployment costs are extremely low
API cost is lower than closed source model, but higher than pure inference cost
Enterprises can further reduce costs through local deployment

Performance vs. Cost Tradeoff

Open weight advantages:

Low compliance costs and can be deployed locally
Open source ecosystem supports tool integration -Multi-model routing and hybrid deployment

Closed Source Advantages:

High reasoning ability
Better tool usage capabilities
Better security and compliance assurance

Deployment Strategy:

High risk, high compliance requirements tasks: closed source model
Medium and low risk tasks: open weight model -Hybrid deployment: layered invocation of complex tasks

Deployment scenarios: Enterprise AI application practice

Scenario 1: Intelligent customer service system with multi-model routing

Architecture:

Simple query → GLM 5.1 / Kimi K2.6
Complex Reasoning → Claude Opus 4.7
High-risk operations → GPT-5.5

Cost Optimization:

70% of requests use the open weight model, saving 60% of API costs
30% of requests use a closed-source model to ensure mission-critical quality

Performance Index:

Average response time: < 500ms
Error rate: < 1%
Customer satisfaction improvement: +15%

Scenario 2: Long horizon Agentic workflow

Task Type:

Document review and analysis
Multi-step data processing
Complex decision support

Architecture Design:

DeepSeek V3.2 is responsible for basic reasoning
Kimi K2.6 is responsible for tool calling and data acquisition
Claude Opus 4.7 is responsible for final decision-making and report generation

Long horizon processing:

Agent status retention, supports multi-hour tasks
Automatic retry and error recovery -Context transfer and accumulation

Conclusion: A new paradigm for production-grade deployment of open weight models

The collaboration between Zyphra and AMD marks the transition of cutting-edge open weight models from “research tools” to “production platforms.” Through serverless inference and long horizon agentic workload optimization, the open weight model can:

Lower deployment threshold: Enterprises do not need to build their own training infrastructure
Reduce compliance costs: local deployment capabilities
Improve scalability: Serverless automatically expands and shrinks capacity
Support long horizon tasks: Agent state retention and context transfer

This paradigm shift will reshape the way AI applications are architected and promote the widespread adoption of open weight models in enterprise-level applications.

Deployment recommendations

Enterprise adoption path:

Pilot Phase: Select 1-2 open weight models and deploy them to the serverless platform
Hybrid Phase: Complex tasks call closed source models, simple tasks call open weights
Optimization phase: Adjust model routing strategy based on cost and performance data

Risk Control:

Keep closed source models for sensitive tasks
Establish model evaluation and verification process
Implement monitoring and alerting mechanisms