Gemini Robotics-ER 1.6 vs Android Skills: Embodied Agents vs Agent Skills - 2026 Frontier Comparison
Frontier embodied intelligence meets frontier developer tooling: Gemini Robotics-ER 1.6's instrument reading capability vs Android Skills' agent skill format - measurable metrics, deployment scenarios, tradeoffs
Frontier Signal: Gemini Robotics-ER 1.6 (DeepMind embodied intelligence) + Android Skills (Google agent skill format) - two frontier AI revolutions in different dimensions
🌅 Introduction: Two different frontier AI revolutions
Frontier AI in 2026 has two distinct directions:
- Embodied Intelligence: enabling AI models to understand the physical world, moving from “knowing” physical reality to “trusting” it
- Agent Skills: enabling AI models to master specialized skills in specific domains, moving from “general” to “specialized”
Gemini Robotics-ER 1.6 represents the frontier of embodied intelligence: through spatial reasoning, instrument reading, and success detection, it lets robots genuinely understand the physical world. Android Skills represents the frontier of agent skills: a standardized skill format lets AI models understand Android development patterns.
Core Question: Embodied AI or Agent Skills, which is the next frontier for AI applications?
🤖 Frontier Signal: Gemini Robotics-ER 1.6
Technical breakthroughs
1. Instrument Reading - Real perception of the physical world
DeepMind introduced a revolutionary capability in Gemini Robotics-ER 1.6: Instrument Reading.
Technical Mechanism:
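```python
# Hypothetical instrument-reading pipeline. The helper functions are
# illustrative placeholders, not a published API.
def read_gauge(image, gauge_type, gauge_location):
    """Core logic of instrument reading."""
    # 1. Visual focus: zoom into the gauge detail
    zoom_image = zoom_in(image, gauge_location)
    # 2. Spatial reasoning: analyze the needle and the tick marks
    needle_position = analyze_needle(zoom_image)
    tick_marks = extract_ticks(zoom_image)
    # 3. Code execution: compute the value
    value = estimate_value(needle_position, tick_marks)
    # 4. World knowledge: interpret the meaning
    interpretation = interpret_value(value, gauge_type)
    return interpretation
```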
Key capabilities:
- Multiple instrument types: round pressure gauges, vertical level gauges, modern digital readouts
- Complex visual reasoning: needles, liquid levels, container boundaries, and scale markings must be perceived together
- Perspective distortion handling: for a sight glass, the liquid fill ratio must be estimated while accounting for distortion from the camera angle
- Unit interpretation: read the unit text on the scale and interpret what it means
- Multi-needle combination: some gauges have several needles, each indicating a different decimal place, and their readings must be combined
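The multi-needle case can be illustrated with a short sketch. It assumes each dial contributes one decimal digit, ordered from most to least significant, as on a classic utility meter. The function name `combine_dials` and that dial layout are illustrative assumptions, not something Gemini Robotics-ER 1.6 is documented to use.

```python
import math

def combine_dials(readings):
    """Combine per-dial needle readings into a single value.

    `readings` lists each dial's needle position (0 <= r < 10), most
    significant dial first. Every dial except the last is floored to its
    digit; the last dial keeps its fractional part.
    """
    if not readings:
        raise ValueError("at least one dial reading is required")
    total = 0.0
    for i, r in enumerate(readings):
        if not 0 <= r < 10:
            raise ValueError(f"dial reading out of range: {r}")
        is_last = i == len(readings) - 1
        digit = r if is_last else math.floor(r)
        total = total * 10 + digit
    return total

print(combine_dials([3.2, 7.9, 2.5]))  # 372.5
```

Flooring the higher dials avoids double counting: a needle at 7.9 on the tens dial already reflects progress that the units dial reports more precisely.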
Implementation:
- Agentic vision: combines visual reasoning with code execution
- Step-by-step reasoning: zoom → analyze → compute → interpret
- World knowledge: understands various instrument types (thermometers, pressure gauges, chemical sight glasses)
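One possible realization of the "compute" step is plain linear interpolation between labeled tick marks. The sketch below assumes the earlier steps have already produced a needle angle and a list of `(angle, value)` pairs. That input format, and the choice of linear interpolation, are assumptions made for illustration rather than a description of how the model works internally.

```python
def estimate_value(needle_angle, ticks):
    """Linearly interpolate a gauge value from the needle angle.

    `ticks` is a list of (angle_degrees, value) pairs for labeled tick marks.
    """
    ticks = sorted(ticks)
    if len(ticks) < 2:
        raise ValueError("need at least two labeled ticks")
    if not ticks[0][0] <= needle_angle <= ticks[-1][0]:
        raise ValueError("needle is outside the labeled scale")
    # Find the pair of neighboring ticks that brackets the needle.
    for (a0, v0), (a1, v1) in zip(ticks, ticks[1:]):
        if a1 > a0 and a0 <= needle_angle <= a1:
            frac = (needle_angle - a0) / (a1 - a0)
            return v0 + frac * (v1 - v0)

# A 270-degree dial labeled 0, 5, and 10.
ticks = [(-135, 0.0), (0, 5.0), (135, 10.0)]
print(estimate_value(67.5, ticks))  # 7.5
```

Interpolating between the two nearest ticks, rather than across the whole dial, tolerates scales whose spacing is not uniform.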
Real-world example: a Spot robot can visit the instruments in a facility and capture images of them, which Gemini Robotics-ER 1.6 can then read accurately.
2. Spatial reasoning (Pointing) - the cornerstone of spatial understanding
Pointing is a foundational capability of embodied intelligence:
Capability levels:
- Basic level: count the objects identified (hammers=2, scissors=1, paintbrushes=1, pliers=6)
- Relational reasoning: identify sets of objects or multiple points
- Motion reasoning: map trajectories and identify the best grasp points
- Constraint compliance: reason over complex prompts (“Point to all objects that fit into the blue cup”)
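Counting from pointing output can be sketched as follows. The response format used here, a JSON list of `{"point": [y, x], "label": ...}` entries with coordinates normalized to 0-1000, is an assumption for illustration and should be checked against the model's documentation. The sample string is hand-written, not real model output.

```python
import json
from collections import Counter

def count_by_label(response_text):
    """Count how many points were returned for each label."""
    points = json.loads(response_text)
    return Counter(p["label"] for p in points)

def to_pixels(point, width, height):
    """Convert an assumed [y, x] point on a 0-1000 scale to (x, y) pixels."""
    y, x = point
    return (round(x / 1000 * width), round(y / 1000 * height))

# Hand-written sample in the assumed format.
sample = (
    '[{"point": [500, 250], "label": "hammer"},'
    ' {"point": [100, 900], "label": "hammer"},'
    ' {"point": [700, 400], "label": "scissors"}]'
)
counts = count_by_label(sample)
print(counts["hammer"], counts["scissors"])  # 2 1
print(to_pixels([500, 250], width=640, height=480))  # (160, 240)
```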
Advantages:
- Accuracy: substantially higher than Gemini Robotics-ER 1.5
- Selectivity: does not point to objects that are absent or were not requested
- Math operations: performs calculations on the returned points to improve the accuracy of metric estimates
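As an illustration of doing arithmetic on points, the sketch below estimates a real-world length by comparing a target's pixel length with a reference object of known size. It assumes both objects lie in roughly the same plane, parallel to the image. The function names and the reference-object approach are illustrative choices, not the model's documented method.

```python
import math

def pixel_distance(p, q):
    """Euclidean distance between two (x, y) pixel points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def estimate_length_cm(target, reference, reference_cm):
    """Scale the target's pixel length by a reference of known length.

    `target` and `reference` are each a pair of (x, y) endpoints.
    """
    ref_px = pixel_distance(*reference)
    if ref_px == 0:
        raise ValueError("reference endpoints must be distinct")
    return pixel_distance(*target) * reference_cm / ref_px

# The reference spans 100 px and is known to be 5 cm long.
target = ((0, 0), (300, 400))
reference = ((10, 10), (110, 10))
print(estimate_length_cm(target, reference, 5.0))  # 25.0
```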
3. Success Detection - the engine of autonomy
Technical Challenges:
- Visual Understanding: Handling occlusion, low light, ambiguous instructions
- Multi-view fusion: Handle overhead and wrist-mounted views simultaneously
- Temporal judgment: determine when a task has been completed
Mechanism:
- Multi-view reasoning: fuse different camera streams and understand how they relate
- Dynamic environments: reach a verdict even while the scene is changing
- Action completion: decide at what moment “Put the blue pen into the black pen holder” counts as done
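One simple way to combine multi-view and temporal judgment is to require every camera view to agree for several consecutive frames before declaring success. The sketch below assumes per-view boolean verdicts are already available for each timestep. The function name, the view names, and the consecutive-agreement rule are assumptions for illustration, not the model's actual mechanism.

```python
def task_succeeded(frames, required_consecutive=3):
    """Return True once all views agree on success for N frames in a row.

    `frames` is an iterable of dicts mapping a view name to that view's
    boolean verdict at one timestep.
    """
    streak = 0
    for verdicts in frames:
        if verdicts and all(verdicts.values()):
            streak += 1
            if streak >= required_consecutive:
                return True
        else:
            # Any disagreement or failure resets the streak.
            streak = 0
    return False

frames = [
    {"overhead": True, "wrist": False},  # wrist camera view is occluded
    {"overhead": True, "wrist": True},
    {"overhead": True, "wrist": True},
    {"overhead": True, "wrist": True},
]
print(task_succeeded(frames))  # True
```

Requiring a streak guards against a single ambiguous frame, for example when the gripper briefly hides the pen holder.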
4. Safety improvements
Quantitative indicators:
- Text scenarios: +6% accuracy on the Asimov risk identification task (compared to Gemini 3.0 Flash)
- Video scenarios: +10% accuracy (compared to Gemini 3.0 Flash)
- Safety instruction following: significantly better adherence to physical safety constraints
Safety mechanisms:
- Spatial output: use pointing to indicate which objects are safe to grasp
- Constraint checks: “Do not handle liquids”, “Do not grasp objects over 20 kg”
- Hazard identification: identify safety risks in text and video scenarios
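The two example constraints can be expressed as a simple filter over detected objects. This is a sketch under the assumption that an upstream perception step has already estimated each object's mass and whether it contains liquid. The `DetectedObject` type and its fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DetectedObject:
    label: str
    mass_kg: float         # assumed to come from an upstream estimate
    contains_liquid: bool  # assumed to come from an upstream estimate

def safe_to_grasp(obj, max_mass_kg=20.0, allow_liquids=False):
    """Apply the 'no liquids' and 'no objects over 20 kg' constraints."""
    if obj.mass_kg > max_mass_kg:
        return False
    if obj.contains_liquid and not allow_liquids:
        return False
    return True

objects = [
    DetectedObject("wrench", 0.8, False),
    DetectedObject("coolant jug", 4.0, True),
    DetectedObject("steel crate", 35.0, False),
]
print([o.label for o in objects if safe_to_grasp(o)])  # ['wrench']
```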
🛠️ Frontier Signal: Android Skills
Technical breakthroughs
1. Agent Skills Format
Definition:
- AI-optimized, modular instructions and resources that help LLMs better understand and execute specific patterns
- Follows best practices and guidance for Android development
Format standard:
- Markdown file (SKILL.md): provides the technical specification for a task
- Domain-specific information: gives the LLM domain-specific knowledge and workflow information
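To make the SKILL.md idea concrete, here is a minimal sketch of loading such a file. It assumes the file starts with a front-matter block delimited by `---` lines and containing simple `key: value` fields such as `name` and `description`. That layout is an assumption to verify against the agentskills.io specification, and the sample skill is invented for illustration.

```python
def parse_skill(text):
    """Split a SKILL.md string into (metadata dict, Markdown body).

    Handles only flat `key: value` front matter, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing delimiter: treat the whole file as body.
    return {}, text

# Invented sample, not a real Android skill.
sample = """---
name: example-skill
description: Hypothetical skill used only for this illustration.
---
# Instructions
Follow the steps below.
"""
meta, body = parse_skill(sample)
print(meta["name"])          # example-skill
print(body.splitlines()[0])  # # Instructions
```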
2. Android Skills command set
Core Skills:
- Android CLI skill: for AI agent coding
- Android Knowledge base: provides information and support
Installation method:
```shell
# Install all Android skills
android skills add --all
# Install a specific skill
android skills add --skill=<skill-name>
# Install for a specific agent
android skills add --agent=<agent-name>
```
Default installation target:
- Gemini and Antigravity agents
- Directory location: `~/.gemini/antigravity/skills`
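To check what ended up in the install directory, a small script can list the skills found there. The sketch assumes each skill lives in its own subdirectory containing a `SKILL.md` file. That layout is a guess about how the installer arranges files, so treat the result as indicative only.

```python
from pathlib import Path

def list_skills(root):
    """Return the names of subdirectories of `root` that contain SKILL.md."""
    root = Path(root).expanduser()
    if not root.is_dir():
        return []
    return sorted(p.parent.name for p in root.glob("*/SKILL.md"))

# Prints an empty list if the directory does not exist.
print(list_skills("~/.gemini/antigravity/skills"))
```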
3. Agent Skills Standard
Standards followed:
- Open-standard agent skills: conforms to the agentskills.io standard
- Technical specifications: provides the technical specification for each task
- Domain grounding: provides domain-specific knowledge and workflow information
⚖️ Comparative analysis: Embodied Intelligence vs Agent Skills
Technical comparison
| Dimension | Embodied Intelligence (Gemini Robotics-ER 1.6) | Agent Skills (Android Skills) |
|---|---|---|
| Core capabilities | Spatial reasoning, instrument reading, success detection | Domain-specific skills (Android development) |
| Application scenarios | Robots, industrial facilities, physical environments | Android development, mobile applications |
| Technical complexity | High (physical world understanding) | Medium (domain knowledge) |
| Measurable metrics | +6% text, +10% video Asimov accuracy | Code accuracy, development efficiency |
| Deployment scenarios | Robotics, manufacturing, warehousing, healthcare | Developer tools, mobile applications |
| Safety and security | Physical safety constraints, collision detection | Code security, permission management |
| Learning curve | Requires physical simulation and training data | Requires domain knowledge and best practices |
Business Impact Comparison
Embodied Intelligence business value:
- Manufacturing: shorter setup times, ROI of 40-60%
- Warehouse logistics: labor costs reduced by 30-40%, ROI of $2.1M per 100,000 square feet
- Medical assistance: 3x improvement in quality of care
- Risks: compliance and liability, runtime safety monitoring
Agent Skills business value:
- Developer productivity: AI agent coding efficiency up 30-50%
- Code quality: error rate down 20-30%
- Delivery speed: AI-assisted development shortens the development cycle by 20-40%
- Risks: AI errors, over-reliance
Implementation boundaries
Embodied Intelligence is best suited for:
- ✅ Manufacturing automation
- ✅ Warehousing and logistics
- ✅ Medical assistance
- ❌ Fully open natural environments
- ❌ High-risk environments (nuclear power plants)
- ❌ Extreme weather
Agent Skills are best suited for:
- ✅ Android app development
- ✅ Mobile application development
- ✅ Mobile backend development
- ❌ Non-mobile platforms (desktop, web)
- ❌ Scenarios that require deep hardware interaction
💡 Quantitative metrics and deployment scenarios
Embodied Intelligence metrics
1. Instrument reading accuracy
- Target: >95% accuracy
- Challenges: perspective distortion, unit interpretation, multi-needle combination
2. Spatial reasoning accuracy
- Target: >90% point detection accuracy
- Challenges: occlusion, low light, complex relationships
3. Success detection rate
- Target: >98% success rate on standard tasks
- Challenges: multi-view fusion, dynamic environments
4. Safety metrics
- Target: Asimov risk identification +6% accuracy (text), +10% accuracy (video)
- Target: safety instruction compliance >95%
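Targets like these are easy to encode as a check that a deployment team could run against its own evaluation results. In the sketch below the thresholds come from the list above, while the metric key names and the measured values are placeholders invented for illustration, not reported results.

```python
# Thresholds from the targets listed above; each is a strict lower bound.
TARGETS = {
    "instrument_reading_accuracy": 0.95,
    "point_detection_accuracy": 0.90,
    "success_detection_rate": 0.98,
    "safety_instruction_compliance": 0.95,
}

def unmet_targets(measured, targets=TARGETS):
    """Return the names of metrics that are missing or not above target."""
    return sorted(
        name for name, target in targets.items()
        if measured.get(name, 0.0) <= target
    )

# Placeholder measurements, invented purely to exercise the function.
measured = {
    "instrument_reading_accuracy": 0.97,
    "point_detection_accuracy": 0.88,
    "success_detection_rate": 0.98,
}
print(unmet_targets(measured))
# ['point_detection_accuracy', 'safety_instruction_compliance', 'success_detection_rate']
```

Because the targets are written as ">98%", a measurement of exactly 0.98 is reported as unmet, and a metric with no measurement is treated as failing.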
Agent Skills metrics
1. Code generation accuracy
- Target: >90% code accuracy
- Challenges: complex business logic, platform-specific APIs
2. Development efficiency
- Target: 30-50% efficiency improvement
- Challenges: AI understanding of complex requirements, code style consistency
3. Error rate reduction
- Target: 20-30% lower error rate
- Challenges: edge cases, platform version differences
4. Delivery speed
- Target: 20-40% shorter development cycle
- Challenges: AI-assisted development, test coverage
🎯 Decision framework: Which frontier to choose?
Choose Embodied Intelligence if:
✅ Applicable scenarios:
- Robots need to perform physical operations
- Industrial environments require autonomy
- Zero-shot skill transfer is required
- High-value physical operation scenarios
✅ Priorities:
- Manufacturing (ROI 40-60%)
- Warehousing and logistics (ROI $2.1M)
- Medical assistance (3x quality improvement)
❌ Unsuitable scenarios:
- Fully open environments (terrain varies too much)
- High-risk environments (nuclear power plants)
- Extreme weather
Choose Agent Skills if:
✅ Applicable scenarios:
- Android application development
- Mobile application development
- Rapid prototyping is required
- Code quality assurance is required
✅ Priorities:
- Android developers
- Mobile application teams
- Rapid prototyping
- Code quality assurance
❌ Unsuitable scenarios:
- Non-mobile platforms (desktop, web)
- Deep hardware interaction scenarios
- Platform-specific hardware operations
🔮 Future Direction
Embodied Intelligence evolution path
Short term (2026-2027):
- More deployments in industrial settings
- Cross-domain skill transfer
- Safety certification frameworks
Midterm (2027-2029):
- Multi-robot collaboration
- Cloud-edge collaboration
- Autonomous learning
Long term (2030+):
- General-purpose physical AI
- Autonomous acquisition of new skills
Agent Skills evolution path
Short term (2026-2027):
- More skill types
- Cross-platform support
- Interoperability standards
Midterm (2027-2029):
- Skill marketplace
- Skill sharing
- Automatic skill discovery
Long term (2030+):
- Universal agent skill format
- Automatic skill generation
- Skill Ecosystem
📝 Conclusion: Two frontiers, two paths
Gemini Robotics-ER 1.6 and Android Skills represent two different frontier AI directions:
- Embodied Intelligence: From “model as decision maker” to “model as perception-action translator”
- Agent Skills: From “general model” to “specialized skills”
Core Insight:
- The economic value of Embodied AI comes from zero-shot transfer to new environments, not from model intelligence itself
- The economic value of Agent Skills comes from specialized domain knowledge, not model generality
Final Recommendations:
- Manufacturing, warehousing, healthcare: prioritize Embodied Intelligence
- Android development, mobile applications: prioritize Agent Skills
- Complex scenarios: use both together