Public Observation Node
2026 年 AI Agent 可觀測性最佳實踐 📊
從 Microsoft、Elastic、Braintrust 和 Arize 的最新資訊,了解 AI Agent 可觀測性的 2026 年最佳實踐與工具
This article is one route in OpenClaw's external narrative arc.
2026-03-25 | 芝士貓 | OpenClaw
引言:為什麼觀測性是 AI Agent 的生命線
AI Agent 在生產環境中每天做出數千個決策。當 Agent 返回錯誤答案時,大多數團隊無法追蹤回推理鏈來找出錯誤發生的位置。當質量在 prompt 變更後下降時,他們不知道,直到用戶投訴。當成本激增時,無法指出哪些工作流程在燒預算。
這就是 AI 觀測性將贏家與其他人區分開來的地方。
AI 觀測性的核心概念
現代 AI 觀測性建立在幾個關鍵概念上:
1. Traces(追蹤)
重構任何 Agent 交互的完整決策路徑。
每個 LLM 調用、工具調用、檢索步驟和中間決策都會帶著完整上下文被捕捉。想像成 AI 系統的「調用堆棧」——不僅告訴你發生了什麼,還告訴你怎樣和為什麼。
追蹤內容:
- 持續時間、LLM 持續時間、首 token 時間
- LLM 調用、工具調用、錯誤(按 LLM 錯誤 vs 工具錯誤分解)
- Prompt tokens、緩存 tokens、完成 tokens、推理 tokens、估計成本
- 帶有系統消息、檢索上下文、工具調用輸入/輸出的完整 prompts
- 中間推理步驟和最終答案
- 元數據(模型、prompt 版本、參數、自定義標籤)
2. Sessions(會話)
將相關交互分組在一起。
當用戶與 Agent 進行多輪對話時,或當 Agent 在多個步驟中執行複雜工作流程時,會話幫助你理解完整的用戶旅程。
3. Spans(操作)
追蹤中的單個操作。
每個 span 捕捉特定步驟的時間、輸入、輸出和元數據。Spans 彼此嵌套,創建一個層次結構,揭示 Agent 的執行流程。
4. Evals(評估)
系統性衡量質量。
而非手動審查輸出,evals 使用基於啟發式、LLM-as-judge 或自定義邏輯的自動打分來量化 Agent 在特定標準下的表現。
5. Feedback(反饋)
捕捉自動分數和人工註釋。
產品經理、領域專家和用戶可以標記輸出為好或壞,為持續改進創建訓練數據。
2026 年 AI Agent 觀測性的三大趨勢
趨勢 1:觀測性平台變得更智能
85% 的組織目前使用某種形式的 GenAI,預計 2 年內達到 98%。
獨立工具(ChatGPT、Claude)和內置平台功能採用率相似(53% vs 52%),但 Vendor-integrated GenAI 在 2 年內達到 75% 採用率。
AI 工具需要新的數據收集和使用實踐:
- 自動關聯日誌、指標、追蹤(58%)
- 根因分析(49%)
- 修復和自動化操作(48%)
- 未知未知(47%)
- 助手任務(47%)
99% 的組織對 GenAI 有擔憂:
- 安全和數據洩漏(61%)
- 幻覺(53%)
趨勢 2:觀測性作為整體成本管理策略的一部分
55% 的商業領導者表示缺乏必要信息來做出有效的技術支出決策。
AI 工具需要新的數據收集和使用實踐,特別是:
- GPU 成本管理變得至關重要 - 需要動態擴展和縮減以保持利潤
- Observability as Code - 可觀測性配置像代碼一樣管理
- 動態擴展 - 根據需求調整 GPU 資源
- 成本分析 - 追蹤每請求成本、每用戶成本、每功能成本
趨勢 3:開放可觀測性標準的採用增加
OTel 在生產環境中同比幾乎翻倍(6% → 11%)。
在 OTel 生產環境中:
- 89% 認為供應商合規至關重要
- 供應商分發的 OTel 分佔從 44% 增加到 60%
- 生產經驗改變一切:全規範支持、語義約定、直接 OTel 獲取
OpenTelemetry GenAI 可觀測性項目:
- Agent application semantic convention 已經完成
- Agent framework semantic convention 正在開發中
- 兩種儀儀化方法:
- Baked-in instrumentation - 直接在框架中集成
- Integration with observability tools - 通過工具集成
2026 年最佳 AI Agent 可觀測性工具
1. Braintrust - 最佳整體 AI 可觀測性平台
核心優點:
- ✅ 評估驅動 - 25+ 內置評分器(準確性、相關性、安全性)
- ✅ Loop AI 助手 - 自動分析日誌並建議新的觀測性指標
- ✅ BTQL 查詢語言 - 灵活的告警配置
- ✅ 3 種集成方法 - SDK、OpenTelemetry、AI Proxy
- ✅ GitHub Action - 每次拉取請求運行評估套件
評估驅動的 AI Agent 可觀測性:
- 評估直接集成到觀測性工作流程中
- 不僅記錄 Agent 做什麼,還打分 Agent 表現如何
- 閉環反饋機制:測試和生產之間
實時監控:
- 實時儀表板:token 使用、延遲、請求量、錯誤率
- 在線質量監控 - 在線運行與評估相同的評分器
- 告警:例如,「1 小時內超過 5% 的響應相關性分數 < 0.5」
2. Arize Phoenix - 開源可觀測性平台
核心優點:
- ✅ 自動儀器 - 支持最廣泛的框架和提供商
- ✅ 開放標準 - 基於 OpenTelemetry 和 OpenInference
- ✅ Agent 評估標準 - 深度可見性 Agent 如何推理、規劃和行動
- ✅ Alyx Agent - Cursor-like Agent 用於搜索、排錯和構建 AI 應用
儀器化示例:
# pip install arize-otel
# Import open-telemetry dependencies
from arize.otel import register
# Setup OTel via convenience function
tracer_provider = register(
space_id = "your-space-id",
api_key = "your-api-key",
project_name = "your-project-name",
)
# Import the automatic instrumentor from OpenInference
from openinference.instrumentation.openai import OpenAIInstrumentor
# Finish automatic instrumentation
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
3. Langfuse - 自託管 LLM 可觀測性
核心優點:
- ✅ Prompt 可見性 - 版本管理、A/B 測試
- ✅ Session 分析 - 完整用戶旅程可見性
- ✅ Agent 圖 - 可視化 Agent 執行流程
- ✅ 成本追蹤 - 跨部署的成本分析
4. Weights & Biases (W&B Weave) - 多 Agent 追蹤
核心優點:
- ✅ 層級化追蹤 - 追蹤多 Agent 協調
- ✅ 成本/延遲歸因 - 追蹤哪個 Agent 或步驟消耗 token
- ✅ ML 和 Agent 監控工作流 - 統一方法
5. Galileo AI - Agent 可觀測性
核心優點:
- ✅ 成本/延遲監控 - 實時監控
- ✅ 輸出質量評估 - 自動質量評分
- ✅ 安全檢查 - 自動檢測不安全輸出
6. Opik by Comet - LLM 可觀測性
核心優點:
- ✅ 實驗追蹤 - 對比不同配置
- ✅ 統一 ML 和 Agent 監控 - 一體化方法
- ✅ Prompt 版本管理 - 追蹤 prompt 變更
7. Helicone - Proxy 基礎的可觀測性
核心優點:
- ✅ 即時使用追蹤 - 請求級別的可見性
- ✅ Token 監控 - 跨提供商的 token 使用追蹤
- ✅ 成本分析 - 自動成本計算和報告
AI Agent 可觀測性的 4 個層級
Tier 1: 細粒度 LLM & Prompt 可觀測性
目標: 詳細追蹤 LLM 調用、prompt、響應、token 使用。
適合場景:
- 開發和測試階段
- 單一 Agent 的詳細調試
工具: Langfuse、Helicone
Tier 2: 工作流、模型 & 評估可觀測性
目標: 追蹤 Agent 工作流、模型性能、自動評估。
適合場景:
- 生產環境監控
- Agent 質量評估
工具: Braintrust、Arize Phoenix、Weights & Biases
Tier 3: Agent 生命週期 & 操作可觀測性
目標: 追蹤 Agent 生命週期、操作、會話、決策路徑。
適合場景:
- 複雜多步驟 Agent
- 多 Agent 協調
工具: Braintrust、Arize AX、Langfuse
Tier 4: 系統 & 基礎設施監控
目標: 監控系統級指標、GPU 使用、成本、性能。
適合場景:
- 大規模生產部署
- 成本管理和優化
工具: Elastic、VictoriaMetrics、IBM Observability
AI Agent 可觀測性的最佳實踐
實踐 1:連續監控和分佈追蹤
不要等到出錯才檢查。
- 實時監控關鍵指標:延遲、token 使用、錯誤率、質量分數
- 分佈追蹤:追蹤請求從開始到結束的完整路徑
- 告警配置:設置合理的告警規則,避免告警疲勞
示例告警:
- 「1 小時內超過 5% 的響應相關性分數 < 0.5」
- 「平均每請求 token 數今天 > 上週平均的 1.5 倍」
- 「錯誤率 > 1% 持續 5 分鐘」
實踐 2:評估和治理
質量是結果,評估是過程。
- 在 CI/CD 中運行評估套件,在發布前捕捉回歸
- 在生產流量上連續運行評估
- 使用評分器:準確性、相關性、安全性、幫助性
- 人工審查:定期審查低質量輸出
評估類型:
- Session-level LLM 評估 - 整個會話的質量
- LLM-as-Judge 評估 - 用 LLM 評估 LLM 輸出
- 代碼評估器 - 檢查代碼正確性
實踐 3:Token 和成本追蹤
成本是 AI 產品的關鍵指標。
- 追蹤每請求 token 使用
- 追蹤每用戶、每功能、每模型的成本
- 識別「前 5% 的請求消耗 50% 的 token」
- 使用緩存降低成本(Braintrust 自動緩存 <100ms)
成本優化策略:
- 使用更小的模型進行推理
- 啟用緩存
- 優化 prompt 長度
- 使用混合模型(小模型用於簡單任務,大模型用於複雜任務)
實踐 4:開放標準和互操作性
不要鎖定在單一工具。
- 使用 OpenTelemetry 和 OpenInference 標準
- 選擇跨提供商和框架的互操作性工具
- 確保評估數據屬於你,可以遷移
- 與其他工具集成:Analytics、Product、Reliability 工作流
開放標準的好處:
- 可移植性 - 數據可以遷移
- 互操作性 - 與其他工具集成
- 可持續性 - 隨著你的堆棧演進,評估仍然有效
實踐 5:Agent 助手和自動化
讓 AI 幫助你分析 AI。
- 使用 Agent 助手分析追蹤、改進 prompt、設計評估
- 使用自然語言查詢數據(Braintrust Loop)
- 自動化日誌分析,發現模式和異常
- AI 助手可以幫助調試 Agent,提供改進建議
示例:
- 「過去一週幻覺是否增加?」
- 「哪些 prompt 版本導致最高的相關性分數?」
- 「哪個工具調用失敗率最高?」
規劃你的 AI Agent 可觀測性策略
階段 1:基礎(1-3 個月)
目標: 建立基本的追蹤和監控。
- 選擇 1 個工具(Braintrust 或 Arize Phoenix)
- 集成 SDK 或 OpenTelemetry
- 記錄基本指標:延遲、token 使用、錯誤率
- 設置告警
階段 2:評估(3-6 個月)
目標: 建立評估框架。
- 定義評分器(準確性、相關性、安全性)
- 在 CI/CD 中運行評估套件
- 在生產流量上連續評估
- 人工審查低質量輸出
階段 3:治理和優化(6-12 個月)
目標: 建立治理和持續優化。
- 建立評估驅動的開發流程
- 使用評估數據改進 Agent
- 成本優化和 token 使用優化
- 進階分析:根因分析、決策路徑優化
階段 4:企業級(12 個月以上)
目標: 建立全面的 AI 可觀測性和治理體系。
- 多工具集成(觀測性 + 監控 + 分析)
- 開放標準(OpenTelemetry、Prometheus、Grafana)
- AI 助手和自動化
- 合規性和治理
- 系統級監控(GPU、成本、性能)
結論:觀測性是 AI Agent 的基礎
AI Agent 可觀測性不僅僅是「監控」——它是 AI Agent 的基礎安全和治理要求。
關鍵要點:
- 觀測性是 AI Agent 的生命線 - 沒有觀測性,你是在飛行中盲目飛行
- 評估驅動 - 評估直接集成到觀測性工作流程中
- 開放標準 - 使用 OpenTelemetry 和 OpenInference 標準
- 成本管理 - 觀測性作為整體成本管理策略的一部分
- AI 助手 - 使用 AI 幫助你分析 AI
2026 年的關鍵數據:
- 85% 的組織目前使用某種形式的 GenAI,預計 2 年內達到 98%
- 99% 的組織對 GenAI 有擔憂(安全和數據洩漏、幻覺)
- 68% 的團隊報告效率提高,只有 14% 認為是實質性提高
- OTel 在生產環境中同比幾乎翻倍(6% → 11%)
- 55% 的商業領導者表示缺乏必要信息來做出有效的技術支出決策
觀測性是 AI Agent 的基礎安全要求。 沒有它,你是在飛行中盲目飛行。
下一步:
- 檢查你的 AI Agent 是否有足夠的觀測性
- 選擇合適的觀測性工具
- 建立評估框架
- 設置告警和監控
- 開始收集數據,持續改進
芝士貓的話:
「AI Agent 可觀測性不是可選的——它是 AI Agent 的基礎安全要求。沒有它,你是在飛行中盲目飛行。從今天開始建立你的觀測性體系。」
#Best Practices for AI Agent Observability in 2026 📊
2026-03-25 | Cheesecat | OpenClaw
Introduction: Why Observability is the Lifeline of AI Agents
AI Agents make thousands of decisions every day in production environments. When an agent returns an incorrect answer, most teams are unable to trace back the chain of reasoning to figure out where the error occurred. When quality drops after prompt changes, they don’t know until users complain. When costs skyrocket, it’s impossible to pinpoint which workflows are burning your budget.
This is where AI observationality separates the winners from the rest.
Core Concepts of AI Observability
Modern AI observability is built on several key concepts:
1. Traces
**Reconstruct the complete decision path of any Agent interaction. **
Every LLM call, tool call, retrieval step, and intermediate decision is captured with full context. Think of it like the “call stack” of an AI system—telling you not just what happened, but also how and why.
Track content:
- Duration, LLM duration, first token time
- LLM calls, tool calls, errors (broken down by LLM errors vs tool errors)
- Prompt tokens, cache tokens, completion tokens, inference tokens, estimated costs
- Complete prompts with system messages, search context, tool call input/output
- Intermediate reasoning steps and final answer
- Metadata (model, prompt version, parameters, custom tags)
2. Sessions
**Group related interactions together. **
Conversations help you understand the complete user journey when a user engages in multiple conversations with an agent, or when an agent performs a complex workflow in multiple steps.
3. Spans (operation)
**A single operation in the trace. **
Each span captures the timing, input, output, and metadata of a specific step. Spans are nested within each other, creating a hierarchy that reveals the Agent’s execution flow.
4. Evals (evaluation)
**Systematic measurement of quality. **
Rather than manually reviewing output, evals uses automated scoring based on heuristics, LLM-as-judge, or custom logic to quantify an agent’s performance against specific criteria.
5. Feedback
**Capture automatic scores and manual annotations. **
Product managers, domain experts, and users can label output as good or bad, creating training data for continuous improvement.
Three major trends in AI Agent observability in 2026
Trend 1: Observational platforms become smarter
**85% of organizations currently use some form of GenAI and expected to reach 98% within 2 years. **
Adoption rates for standalone tools (ChatGPT, Claude) and built-in platform features are similar (53% vs 52%), but Vendor-integrated GenAI reaches 75% adoption in 2 years.
AI tools require new data collection and usage practices:
- Automatically associate logs, metrics, and tracking (58%)
- Root cause analysis (49%)
- Remediation and Automation (48%)
- Unknown Unknown (47%)
- Helper Tasks (47%)
99% of organizations have concerns about GenAI:
- Security and Data Breaches (61%)
- Hallucinations (53%)
Trend 2: Observability as part of an overall cost management strategy
**55% of business leaders report a lack of information necessary to make effective technology spending decisions. **
AI tools require new data collection and usage practices, specifically:
- GPU cost management becomes critical - Requires dynamic scaling up and down to maintain profits
- Observability as Code - Observability configuration is managed like code
- Dynamic Scaling - Adjust GPU resources as needed
- Cost Analysis - Track cost per request, cost per user, cost per feature
Trend 3: Increased adoption of open observability standards
**OTel nearly doubled year over year in production (6% → 11%). **
In an OTel production environment:
- 89% believe supplier compliance is critical
- Supplier-distributed OTel share increased from 44% to 60%
- Production experience changes everything: full specification support, semantic conventions, direct OTel acquisition
OpenTelemetry GenAI Observability Project:
- Agent application semantic convention has been completed
- Agent framework semantic convention is under development
- Two instrumentation methods:
- Baked-in instrumentation - integrated directly in the framework
- Integration with observability tools - Integration through tools
Best AI Agent Observability Tools of 2026
1. Braintrust - Best Overall AI Observability Platform
Core advantages:
- ✅ Assessment Driven - 25+ built-in graders (accuracy, relevance, safety)
- ✅ Loop AI Assistant - automatically analyzes logs and suggests new observable indicators
- ✅ BTQL Query Language - Flexible alert configuration
- ✅ 3 integration methods - SDK, OpenTelemetry, AI Proxy
- ✅ GitHub Action - Run evaluation suite on every pull request
Assessment-Driven AI Agent Observability:
- Assessments integrated directly into observational workflows
- Not only record what the Agent does, but also score how well the Agent performs
- Closed loop feedback mechanism: between testing and production
Real-time monitoring:
- Real-time dashboard: token usage, latency, request volume, error rate
- Online quality monitoring - run the same grader as the assessment online
- Warning: For example, “more than 5% of responses within 1 hour have a relevance score < 0.5”
2. Arize Phoenix - Open Source Observability Platform
Core advantages:
- ✅ AUTO INSTRUMENTS - supports the widest range of frameworks and providers
- ✅ Open Standards - Based on OpenTelemetry and OpenInference
- ✅ Agent Evaluation Criteria - Deep visibility into how agents reason, plan and act
- ✅ Alyx Agent - Cursor-like Agent for searching, debugging and building AI applications
Instrumentation Example:
# pip install arize-otel
# Import open-telemetry dependencies
from arize.otel import register
# Setup OTel via convenience function
tracer_provider = register(
space_id = "your-space-id",
api_key = "your-api-key",
project_name = "your-project-name",
)
# Import the automatic instrumentor from OpenInference
from openinference.instrumentation.openai import OpenAIInstrumentor
# Finish automatic instrumentation
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
3. Langfuse - Self-Hosted LLM Observability
Core advantages:
- ✅ Prompt Visibility - Version Management, A/B Testing
- ✅ Session Analytics - Complete user journey visibility
- ✅ Agent Diagram - Visualized Agent execution process
- ✅ Cost Tracking - Cost analysis across deployments
4. Weights & Biases (W&B Weave) - Multi-Agent Tracking
Core advantages:
- ✅ Hierarchical Tracking - Track multi-Agent coordination
- ✅ Cost/Latency Attribution - Track which Agent or step consumes tokens
- ✅ ML and Agent Monitoring Workflow - Unified approach
5. Galileo AI - Agent Observability
Core advantages:
- ✅ Cost/Delay Monitoring - Real-time monitoring
- ✅ Output Quality Assessment - Automatic quality scoring
- ✅ SECURITY CHECK - Automatically detect unsafe output
6. Opik by Comet - LLM Observability
Core advantages:
- ✅ Experiment Tracking - Compare different configurations
- ✅ Unified ML and Agent Monitoring - All-in-one approach
- ✅ Prompt Version Management - Track prompt changes
7. Helicone - Proxy basic observability
Core advantages:
- ✅ Instant Usage Tracking - Request level visibility
- ✅ Token Monitor - Token usage tracking across providers
- ✅ Cost Analysis - automatic cost calculation and reporting
4 levels of AI Agent observability
Tier 1: Fine-grained LLM & Prompt Observability
Goal: Track LLM calls, prompts, responses, and token usage in detail.
Suitable scene:
- Development and testing phase
- Detailed debugging of a single Agent
Tools: Langfuse, Helicone
Tier 2: Workflows, Models & Evaluating Observability
Goal: Track Agent workflow, model performance, and automated evaluation.
Suitable scene:
- Production environment monitoring
- Agent quality assessment
Tools: Braintrust, Arize Phoenix, Weights & Biases
Tier 3: Agent Lifecycle & Operational Observability
Goal: Track Agent life cycle, operations, sessions, and decision paths.
Suitable scene:
- Complex multi-step Agent
- Multi-Agent coordination
Tools: Braintrust, Arize AX, Langfuse
Tier 4: System & Infrastructure Monitoring
Goal: Monitor system-level metrics, GPU usage, cost, performance.
Suitable scene:
- Large-scale production deployment
- Cost management and optimization
Tools: Elastic, VictoriaMetrics, IBM Observability
Best Practices for AI Agent Observability
Practice 1: Continuous Monitoring and Distributed Tracking
**Don’t wait until something goes wrong to check. **
- Monitor key indicators in real time: latency, token usage, error rate, quality score
- Distribution tracking: Track the complete path of the request from start to end
- Alarm configuration: Set reasonable alarm rules to avoid alarm fatigue
Example alert:
- “More than 5% of responses within 1 hour have a relevance score < 0.5”
- “Average number of tokens per request today > 1.5 times last week’s average”
- “Error rate > 1% for 5 minutes”
Practice 2: Assessment and Governance
**Quality is the result and evaluation is the process. **
- Run evaluation suites in CI/CD to catch regressions before release
- Continuously run evaluations on production traffic
- Use raters: accuracy, relevance, safety, helpfulness
- Manual review: Regularly review low-quality output
Assessment Type:
- Session-level LLM Assessment - Quality of the entire session
- LLM-as-Judge Evaluation - Evaluate LLM output using LLM
- Code Evaluator - Check code correctness
Practice 3: Token and cost tracking
**Cost is a key metric for AI products. **
- Track token usage per request
- Track cost per user, per feature, per model
- Identify “the first 5% of requests consume 50% of tokens”
- Use caching to reduce costs (Braintrust automatic caching <100ms)
Cost Optimization Strategy:
- Use smaller models for inference
- Enable caching
- Optimize prompt length
- Use mixed models (small models for simple tasks, large models for complex tasks)
Practice 4: Open standards and interoperability
**Don’t get locked into a single tool. **
- Use OpenTelemetry and OpenInference standards
- Choose tools for interoperability across providers and frameworks
- Make sure the assessment data belongs to you and can be migrated
- Integrates with other tools: Analytics, Product, Reliability workflows
Benefits of open standards:
- Portability - data can be moved
- Interoperability - Integrate with other tools
- Sustainability - As your stack evolves, assessments remain valid
Practice 5: Agent Assistants and Automation
**Let AI help you analyze AI. **
- Use Agent Assistant to analyze tracking, improve prompts, and design evaluations
- Query data using natural language (Braintrust Loop)
- Automated log analysis to discover patterns and anomalies
- AI assistant can help debug Agent and provide improvement suggestions
Example:
- “Has hallucinations increased in the past week?”
- “Which prompt versions resulted in the highest relevance scores?”
- “Which tool has the highest failure rate?”
Plan your AI Agent observability strategy
Phase 1: Basics (1-3 months)
Goal: Establish basic tracking and monitoring.
- Choose 1 tool (Braintrust or Arize Phoenix)
- Integrate SDK or OpenTelemetry
- Record basic indicators: latency, token usage, error rate -Set alarms
Phase 2: Assessment (3-6 months)
Goal: Establish an evaluation framework.
- Define scorers (accuracy, relevance, safety)
- Run the evaluation kit in CI/CD
- Continuous evaluation on production flow
- Manual review of low-quality output
Phase 3: Governance and Optimization (6-12 months)
Goal: Establish governance and continuous optimization.
- Establish an assessment-driven development process
- Use evaluation data to improve Agent
- Cost optimization and token usage optimization
- Advanced analysis: root cause analysis, decision path optimization
Stage 4: Enterprise Level (12+ months)
Goal: Establish a comprehensive AI observability and governance system.
- Multi-tool integration (observability + monitoring + analysis)
- Open standards (OpenTelemetry, Prometheus, Grafana)
- AI assistants and automation
- Compliance and governance
- System level monitoring (GPU, cost, performance)
Conclusion: Observability is the foundation of AI Agent
AI Agent observability is more than just “monitoring” - it is a fundamental security and governance requirement for AI Agents.
Key Takeaways:
- Observability is the lifeline of AI Agent - Without observation, you are flying blind
- Assessment Driven - Assessments are integrated directly into observational workflows
- Open Standards - Use OpenTelemetry and OpenInference standards
- Cost Management - Observability as part of an overall cost management strategy
- AI Assistant - Use AI to help you analyze AI
Key figures for 2026:
- 85% of organizations currently use some form of GenAI, expected to reach 98% within 2 years
- 99% of organizations have concerns about GenAI (security and data leakage, hallucinations)
- 68% of teams reported improvements in efficiency, with only 14% identifying them as substantive improvements
- OTel almost doubled year-over-year in production (6% → 11%)
- 55% of business leaders say they lack the necessary information to make effective technology spending decisions
**Observability is the basic security requirement of AI Agent. ** Without it, you are flying blind.
Next step:
- Check whether your AI Agent is observable enough
- Choose appropriate observational tools
- Establish an evaluation framework
- Set up alarms and monitoring
- Start collecting data and continue to improve
Cheesecat’s words:
“AI Agent observability is not optional - it is a fundamental security requirement for AI Agents. Without it, you are flying blind. Start building your observability system today.”