Gemini Robotics-ER 1.6 vs Android Skills: Embodied Agents vs Agent Skills - 2026 Frontier Comparison

Frontier embodied intelligence meets frontier developer tooling: Gemini Robotics-ER 1.6's instrument reading capability vs Android Skills' agent skill format - measurable metrics, deployment scenarios, tradeoffs

April 19, 2026 · 7 min read · Beginner

Frontier Signal: Gemini Robotics-ER 1.6 (DeepMind embodied intelligence) + Android Skills (Google agent skill format) - two frontier AI revolutions in different dimensions

🌅 Introduction: Two different frontier AI revolutions

Frontier AI in 2026 has two distinct directions:

Embodied Intelligence: let AI models understand the physical world, moving from “knowing” to “trusting” physical reality
Agent Skills: let AI models master specialized skills in specific domains, moving from “general” to “specialized”

Gemini Robotics-ER 1.6 represents the frontier of embodied intelligence: spatial reasoning, instrument reading, and success detection let robots understand the physical world. Android Skills represents the frontier of agent skills: a standardized skill format lets AI models understand Android development patterns.

Core question: which is the next frontier for AI applications, embodied AI or Agent Skills?

🤖 Frontier Signal: Gemini Robotics-ER 1.6

Technical breakthrough

1. Instrument Reading - Real perception of the physical world

DeepMind introduced a revolutionary capability in Gemini Robotics-ER 1.6: Instrument Reading.

Technical Mechanism:

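# Hypothetical instrument-reading pipeline (illustrative pseudocode).
# zoom_in, analyze_needle, extract_ticks, estimate_value, and interpret_value
# are placeholder helpers, not a published API.
def read_gauge(image, gauge_type, gauge_location):
    """Core logic of instrument reading."""
    # 1. Visual focus: zoom into the gauge detail
    zoom_image = zoom_in(image, gauge_location)

    # 2. Spatial reasoning: analyze the needle and the tick marks
    needle_position = analyze_needle(zoom_image)
    tick_marks = extract_ticks(zoom_image)

    # 3. Code execution: compute the numeric value
    value = estimate_value(needle_position, tick_marks)

    # 4. World knowledge: interpret what the value means for this gauge type
    interpretation = interpret_value(value, gauge_type)

    return interpretation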
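
The `estimate_value` step above is left abstract. One plausible way to fill it in, assuming the needle angle and the angles of the scale's two end ticks have already been extracted, is linear interpolation along the dial. The function name, its parameters, and the example gauge are illustrative assumptions, not something Gemini Robotics-ER 1.6 is documented to do internally.

```python
def needle_to_value(needle_deg, min_deg, max_deg, min_value, max_value):
    """Linearly map a needle angle onto the gauge scale.

    Angles are in degrees, measured in the direction of increasing value.
    """
    if max_deg == min_deg:
        raise ValueError("scale endpoints must differ")
    fraction = (needle_deg - min_deg) / (max_deg - min_deg)
    # Clamp so a needle resting on a stop pin does not read off-scale.
    fraction = min(1.0, max(0.0, fraction))
    return min_value + fraction * (max_value - min_value)


# Hypothetical 0-10 bar pressure gauge with a 270-degree sweep.
reading = needle_to_value(needle_deg=135.0, min_deg=0.0, max_deg=270.0,
                          min_value=0.0, max_value=10.0)
print(f"{reading:.1f} bar")  # prints "5.0 bar"
```

A real gauge with a non-linear scale would need interpolation between neighbouring tick marks rather than between the two endpoints.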

Key capabilities:

Multiple instrument types: circular pressure gauges, vertical level indicators, modern digital readouts
Complex visual reasoning: pointers, liquid levels, container boundaries, and tick marks must be perceived together
Perspective distortion handling: for a sight glass, the liquid fill ratio is estimated while accounting for camera perspective distortion
Unit interpretation: reads the unit text on the scale and interprets its meaning
Multi-needle combination: some gauges have several needles, each indicating a different decimal place, whose readings must be combined
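
To make the multi-needle case concrete, here is a minimal sketch of combining several pointer readings into one number. It assumes each dial has already been read as a position in [0, 10) and that dials are ordered from most to least significant. `combine_dials` is a hypothetical helper, and it ignores the ambiguity that arises when a pointer sits almost exactly on a digit.

```python
import math


def combine_dials(dial_readings, least_significant_place=1.0):
    """Combine per-dial pointer readings into a single value.

    dial_readings: floats in [0, 10), most significant dial first.
    Every dial contributes its floored digit, except the last dial,
    which keeps its fractional part.
    """
    if not dial_readings:
        raise ValueError("need at least one dial reading")
    place = least_significant_place * 10 ** (len(dial_readings) - 1)
    total = 0.0
    for index, reading in enumerate(dial_readings):
        if not 0 <= reading < 10:
            raise ValueError(f"dial reading out of range: {reading}")
        is_last = index == len(dial_readings) - 1
        digit = reading if is_last else math.floor(reading)
        total += digit * place
        place /= 10
    return total


# Hypothetical three-dial meter: hundreds, tens, units.
print(combine_dials([3.4, 4.1, 1.5]))  # prints 341.5
```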

Implementation:

Agentic Vision: combines visual reasoning and code execution
Step-by-step reasoning: zoom → analyze → calculate → interpret
World knowledge: understands various instrument types (thermometers, pressure gauges, chemical sight glasses)

Real-world example: a Spot robot can visit the instruments across a facility and capture images, which Gemini Robotics-ER 1.6 can then read accurately.

2. Spatial reasoning (Pointing) - the cornerstone of spatial understanding

Pointing is a foundational capability of embodied intelligence:

Capability levels:

Basic level: identify and count objects (hammers=2, scissors=1, paintbrushes=1, pliers=6)
Relational reasoning: identify sets of objects or multiple points
Motion reasoning: map trajectories and identify the best grasp points
Constraint compliance: reason over complex prompts (“Point to all objects that fit into the blue cup”)
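
As an illustration of how the basic counting level could be consumed downstream, the sketch below parses a pointing response and counts objects per label. It assumes the model returns a JSON list of `{"point": [y, x], "label": ...}` entries with coordinates normalized to 0-1000. That response shape and the sample data are assumptions for this example and should be checked against the current API documentation.

```python
import json
from collections import Counter


def parse_points(response_text, image_width, image_height):
    """Return (label, x_px, y_px) tuples from a JSON pointing response."""
    points = []
    for item in json.loads(response_text):
        y_norm, x_norm = item["point"]  # assumed order: [y, x], range 0-1000
        x_px = round(x_norm / 1000 * image_width)
        y_px = round(y_norm / 1000 * image_height)
        points.append((item["label"], x_px, y_px))
    return points


# Hypothetical response for a workbench image.
response = (
    '[{"point": [500, 250], "label": "hammer"},'
    ' {"point": [100, 900], "label": "hammer"},'
    ' {"point": [750, 500], "label": "scissors"}]'
)
points = parse_points(response, image_width=640, image_height=480)
counts = Counter(label for label, _, _ in points)
print(points[0])          # prints ('hammer', 160, 240)
print(counts["hammer"])   # prints 2
```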

Advantages:

Accuracy: markedly higher than Gemini Robotics-ER 1.5
Selectivity: does not point to objects that are absent or were not requested
Math operations: uses points as inputs to calculations, improving the accuracy of metric estimates

3. Success Detection - the engine of autonomy

Technical Challenges:

Visual understanding: handle occlusion, low light, and ambiguous instructions
Multi-view fusion: process overhead and wrist-mounted views simultaneously
Temporal judgment: determine when the task is complete

Mechanism:

Multi-view reasoning: fuse different camera streams and understand how they relate
Dynamic environments: still reach a judgment when the environment changes
Action completion judgment: at what moment does “Put the blue pen into the black pen holder” count as done?
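
One simple way to turn per-view judgments into a completion signal is to require that every camera agrees for several consecutive frames. The sketch below shows that rule. The function, the camera names, and the streak length are hypothetical choices for illustration, and the per-frame booleans stand in for whatever the model reports for each view.

```python
def task_completed(frame_votes, required_streak=3):
    """Return the index of the frame at which the task is judged complete.

    frame_votes: sequence of dicts mapping camera name -> bool, one dict
    per frame. Completion requires every view to report success for
    `required_streak` consecutive frames. Returns None if never reached.
    """
    streak = 0
    for index, votes in enumerate(frame_votes):
        if votes and all(votes.values()):
            streak += 1
            if streak >= required_streak:
                return index
        else:
            streak = 0  # any disagreement resets the run
    return None


votes = [
    {"overhead": False, "wrist": False},
    {"overhead": True, "wrist": False},  # wrist view still occluded
    {"overhead": True, "wrist": True},
    {"overhead": True, "wrist": True},
    {"overhead": True, "wrist": True},
]
print(task_completed(votes))  # prints 4
```

Requiring a streak trades a short delay for robustness against a single misjudged frame.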

4. Safety improvements

Quantitative indicators:

Text scenarios: +6% accuracy on the Asimov risk identification task (compared to Gemini 3.0 Flash)
Video scenarios: +10% accuracy (compared to Gemini 3.0 Flash)
Safety instruction following: significantly improved adherence to physical safety constraints

Safety Mechanism:

Spatial output: uses pointing to indicate which objects are safe to grasp
Constraint checks: “Do not handle liquids”, “Do not grasp objects over 20 kg”
Hazard identification: identifies safety risks in text and video scenarios
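
The two example constraints can be expressed as a small rule check. The sketch below is illustrative only: `SceneObject`, its fields, and the sample scene are hypothetical, and in practice the mass and liquid attributes would come from the model's own estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    label: str
    estimated_mass_kg: float
    contains_liquid: bool


def constraint_violations(obj, max_mass_kg=20.0, allow_liquids=False):
    """List the safety constraints an object would violate if grasped."""
    violations = []
    if obj.estimated_mass_kg > max_mass_kg:
        violations.append(f"mass exceeds {max_mass_kg:g} kg")
    if obj.contains_liquid and not allow_liquids:
        violations.append("liquids must not be handled")
    return violations


scene = [
    SceneObject("wrench", 0.8, False),
    SceneObject("coolant jug", 4.0, True),
    SceneObject("motor housing", 35.0, False),
]
safe = [obj.label for obj in scene if not constraint_violations(obj)]
print(safe)  # prints ['wrench']
```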

🛠️ Frontier Signal: Android Skills

Technical breakthrough

1. Agent Skills Format

Definition:

AI-optimized, modular instructions and resources that help LLMs better understand and execute specific patterns
Follows Android development best practices and guidance

Format Standard:

Markdown file (SKILL.md): provides the technical specification for a task
Domain-specific information: gives the LLM specialized domain knowledge and workflow information
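
To show what an agent might do with such a file, here is a minimal sketch that splits a SKILL.md into metadata and instructions. It assumes the file opens with a front matter block delimited by `---` lines holding simple `key: value` pairs such as `name` and `description`. The sample skill is invented for illustration, and a real implementation would use a proper YAML parser.

```python
def parse_skill_markdown(text):
    """Split a SKILL.md string into (metadata dict, body text).

    Only flat `key: value` front matter is handled in this sketch.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter present
    metadata = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[index + 1:]).strip()
            return metadata, body
        key, separator, value = line.partition(":")
        if separator:
            metadata[key.strip()] = value.strip()
    raise ValueError("front matter is not closed")


# Hypothetical skill file, not one shipped with Android Skills.
SAMPLE = """---
name: example-android-skill
description: Hypothetical skill used only to illustrate the layout.
---
# Steps
1. Inspect the Gradle module.
"""

metadata, body = parse_skill_markdown(SAMPLE)
print(metadata["name"])      # prints example-android-skill
print(body.splitlines()[0])  # prints # Steps
```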

2. Android Skills command set

Core Skills:

Android CLI skill: for AI agent coding
Android Knowledge base: provides information and support

Installation method:

# Install all Android skills
android skills add --all

# Install a specific skill
android skills add --skill=<skill-name>

# Install for a specific agent
android skills add --agent=<agent-name>

Default installation target:

Gemini and Antigravity agents
Directory location: ~/.gemini/antigravity/skills
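
After installation it can be useful to check what actually landed on disk. The sketch below lists skills under a directory, assuming each skill lives in its own subdirectory containing a SKILL.md file. That layout is an assumption for this example, and the demo runs against a temporary directory rather than the real install path.

```python
import tempfile
from pathlib import Path


def find_skills(skills_root):
    """Return sorted names of subdirectories that contain a SKILL.md file."""
    root = Path(skills_root).expanduser()
    if not root.is_dir():
        return []
    return sorted(path.parent.name for path in root.glob("*/SKILL.md"))


# Demo with placeholder skills in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("skill-b", "skill-a"):
        skill_dir = Path(tmp) / name
        skill_dir.mkdir()
        (skill_dir / "SKILL.md").write_text("placeholder", encoding="utf-8")
    (Path(tmp) / "not-a-skill").mkdir()
    found = find_skills(tmp)

print(found)  # prints ['skill-a', 'skill-b']
```

Pointing the same function at the default location would look like `find_skills("~/.gemini/antigravity/skills")`.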

3. Agent Skills Standard

Standards followed:

Open-standard agent skills: follows the agentskills.io standard
Technical specifications: provides the technical specification for a task
Domain grounding: provides specialized domain knowledge and workflow information

⚖️ Comparative analysis: Embodied Intelligence vs Agent Skills

Comparison of technical dimensions

Dimensions	Embodied Intelligence (Gemini Robotics-ER 1.6)	Agent Skills (Android Skills)
Core Competencies	Spatial Reasoning, Instrument Reading, Success Detection	Domain-Specific Skills (Android Development)
Application Scenarios	Robots, industrial facilities, physical environment	Android development, mobile applications
Technical Complexity	High (physical world understanding)	Medium (domain knowledge)
Measurable Metrics	+6% Text, +10% Video Asimov Accuracy	Code Accuracy, Development Efficiency
Deployment Scenarios	Robotics, manufacturing, warehousing, medical care	Developer tools, mobile applications
Safety and security	Physical safety constraints, collision detection	Code security, permission management
Learning Curve	Requires physical simulation and training data	Requires domain knowledge and best practices

Business Impact Comparison

Embodied Intelligence Business Value:

Manufacturing: reduced setup time, ROI of 40-60%
Warehouse logistics: labor costs reduced by 30-40%, ROI of $2.1M per 100,000 square feet
Medical assistance: 3x improvement in quality of care
Risks: compliance and liability, runtime safety monitoring

Agent Skills Business Value:

Developer Productivity: AI agent coding efficiency increased by 30-50%
Code Quality: Reduce error rate by 20-30%
Deployment Speed: AI-assisted development shortens the development cycle by 20-40%
Risk: AI errors, over-reliance

Implementation boundary comparison

Embodied Intelligence is best for:

✅ Manufacturing Automation
✅ Warehousing and Logistics
✅ Medical assistance
❌ Completely open natural environment
❌ High risk environment (nuclear power plant)
❌ Extreme weather

Agent Skills are best suited for:

✅ Android app development
✅ Mobile application development
✅ Mobile backend development
❌ Non-mobile platforms (desktop, web)
❌ Scenarios that require deep hardware interaction

💡 Quantitative indicators and deployment scenarios

Embodied Intelligence Indicators

1. Instrument reading accuracy

Target: >95% accuracy
Challenges: perspective distortion, unit interpretation, multi-needle combination

2. Spatial reasoning accuracy

Target: >90% point detection accuracy
Challenges: occlusion, low light, complex relationships

3. Success detection rate

Target: >98% success rate on standard tasks
Challenges: multi-view fusion, dynamic environments

4. Safety indicators

Target: Asimov risk identification +6% accuracy (text), +10% accuracy (video)
Target: safety instruction following >95%
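
Targets like these only mean something once "accurate" is defined. A minimal sketch, assuming a reading counts as correct when it falls within a fixed tolerance of the ground truth, is shown below. The tolerance and the sample readings are made up for illustration and are not measured results.

```python
def accuracy_within_tolerance(predicted, actual, tolerance):
    """Fraction of predictions within `tolerance` of the true value."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predicted, actual))
    return hits / len(predicted)


# Illustrative gauge readings (not real benchmark data).
predicted = [5.0, 7.4, 2.1, 9.8]
actual = [5.1, 7.5, 2.9, 9.8]
accuracy = accuracy_within_tolerance(predicted, actual, tolerance=0.2)
print(f"accuracy={accuracy:.2f}, meets >95% target: {accuracy > 0.95}")
# prints "accuracy=0.75, meets >95% target: False"
```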

Agent Skills Metrics

1. Code generation accuracy

Target: >90% code accuracy
Challenges: complex business logic, platform-specific APIs

2. Development efficiency improvement

Target: 30-50% efficiency improvement
Challenges: AI understanding of complex requirements, code style consistency

3. Error rate reduction

Target: 20-30% error rate reduction
Challenges: edge cases, platform version differences

4. Deployment speed

Target: 20-40% reduction in development cycle
Challenges: AI-assisted development, test coverage
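
The percentage targets above are relative changes against a baseline. The sketch below shows that arithmetic and a check against a target band. The baseline and current figures are invented for illustration.

```python
def relative_reduction(baseline, current):
    """Relative reduction of `current` versus `baseline` (0.25 means 25%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline


def within_band(value, low, high):
    """True if value lies in the closed interval [low, high]."""
    return low <= value <= high


# Hypothetical team data: defects per 100 changes, and cycle time in weeks.
error_reduction = relative_reduction(baseline=8.0, current=6.0)
cycle_reduction = relative_reduction(baseline=10.0, current=7.0)
print(error_reduction, within_band(error_reduction, 0.20, 0.30))  # prints 0.25 True
print(cycle_reduction, within_band(cycle_reduction, 0.20, 0.40))  # prints 0.3 True
```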

🎯 Decision framework: Which frontier to choose?

Choose Embodied Intelligence if:

✅ Applicable scenarios:

Robots must perform physical operations
Industrial environments require autonomy
Zero-shot skill transfer is required
Physical operations are high-value

✅ Priorities:

Manufacturing (ROI 40-60%)
Warehousing and logistics (ROI $2.1M)
Medical assistance (3x quality improvement)

❌ Not suitable for:

Fully open environments (terrain varies too much)
High-risk environments (nuclear power plants)
Extreme weather

Choose Agent Skills if:

✅ Applicable scenarios:

Android application development
Mobile application development
Rapid prototyping is needed
Code quality assurance is needed

✅ Priorities:

Android developers
Mobile application teams
Rapid prototyping
Code quality assurance

❌ Not suitable for:

Non-mobile platforms (desktop, web)
Deep hardware interaction scenarios
Platform-specific hardware operations

🔮 Future Direction

Embodied Intelligence evolution path

Short term (2026-2027):

More industrial deployments
Cross-domain skill transfer
Safety certification framework

Medium term (2027-2029):

Multi-robot collaboration
Cloud-edge collaboration
Autonomous learning

Long term (2030+):

General-purpose physical AI
Autonomous acquisition of new skills

Agent Skills evolution path

Short term (2026-2027):

More skill types
Cross-platform support
Interoperability standards

Medium term (2027-2029):

Skill marketplace
Skill sharing
Automatic skill discovery

Long term (2030+):

Universal agent skill format
Automatic skill generation
Skill Ecosystem

📝 Conclusion: Two frontiers, two paths

Gemini Robotics-ER 1.6 and Android Skills represent two different frontier AI directions:

Embodied Intelligence: From “model as decision maker” to “model as perception-action translator”
Agent Skills: From “general model” to “specialized skills”

Core Insight:

The economic value of Embodied AI comes from zero-shot transfer across environments, not from model intelligence itself
The economic value of Agent Skills comes from specialized domain knowledge, not model generality

Final Recommendations:

Manufacturing, warehousing, healthcare: prioritize Embodied Intelligence
Android development, mobile applications: prioritize Agent Skills
Complex scenarios: combine both
