AI Agent Build Guide: Error Budget Gatekeeper with Cost-Per-Error Tradeoffs

May 11, 2026 · 2 min read · Beginner

Orchestration Interface Infrastructure Governance

This article is one route in OpenClaw's external narrative arc.

Overview

This guide walks through implementing an error budget gatekeeper for AI agents with concrete cost-per-error tradeoffs. Unlike simple latency targets, an error budget gatekeeper balances latency, cost, and error rate against a defined budget, making explicit tradeoffs between speed and reliability.

The Tradeoff Problem

Production AI agents face three competing objectives:

Objective	Impact	Typical Target
Latency	User experience, satisfaction	<500ms 95th percentile
Cost	Operational economics	<$0.01/req
Error rate	Reliability, trust	<1% total errors

No single configuration optimizes all three. An error budget gatekeeper makes these tradeoffs explicit rather than hiding them.

Implementation Pattern: Error Budget Gatekeeper

Architecture

┌───────────────────────────────────────────────────────┐
│  Request → Agent → Validation → Budget Check → Route  │
│                                                       │
│  Budget Check:                                        │
│  - Latency budget: 500ms max                          │
│  - Cost budget: $0.01 max                             │
│  - Error budget: 1% max                               │
│                                                       │
│  Routing decisions based on budget state:             │
│  - Budget OK → Normal route                           │
│  - Latency exceeded → Retry with timeout              │
│  - Cost exceeded → Reject or queue                    │
│  - Error rate exceeded → Alert + throttle             │
└───────────────────────────────────────────────────────┘
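The flow in the diagram can be sketched as a single function. This is a minimal illustration, not a reference implementation: `handle_request`, `run_agent`, `validate`, and `cost_only_check` are hypothetical names, and the budget check is assumed to return `(ok, reason)` in the same shape as the gatekeeper's `check` method.

```python
import time
from typing import Callable


def handle_request(
    prompt: str,
    run_agent: Callable[[str], tuple[str, float]],  # hypothetical: returns (output, cost_usd)
    validate: Callable[[str], bool],                # hypothetical output validator
    budget_check: Callable[[float, float], tuple[bool, str]],
) -> dict:
    """Request -> Agent -> Validation -> Budget Check -> Route."""
    start = time.perf_counter()
    output, cost_usd = run_agent(prompt)
    latency_ms = (time.perf_counter() - start) * 1000.0

    is_error = not validate(output)
    ok, reason = budget_check(latency_ms, cost_usd)

    # The text before the first ":" names the budget state, e.g. "cost_exceeded"
    route = "normal" if ok else reason.split(":", 1)[0]
    return {"output": output, "error": is_error, "route": route}


def cost_only_check(latency_ms: float, cost_usd: float) -> tuple[bool, str]:
    # Stand-in budget check for the demo: only the cost budget is enforced
    if cost_usd > 0.01:
        return False, f"cost_exceeded: ${cost_usd:.4f} > $0.0100"
    return True, "budget_ok"


cheap = handle_request("ping", lambda p: (p.upper(), 0.002), bool, cost_only_check)
pricey = handle_request("ping", lambda p: (p.upper(), 0.020), bool, cost_only_check)
print(cheap["route"], pricey["route"])  # prints: normal cost_exceeded
```

Passing the agent, validator, and budget check in as callables keeps the sketch independent of any particular model client or gatekeeper class.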

Code Pattern

import time
from collections import deque


class ErrorBudgetGatekeeper:
    def __init__(
        self,
        latency_budget_ms: float = 500,
        cost_budget_usd: float = 0.01,
        error_budget_pct: float = 0.01,  # fraction of requests: 0.01 means 1%
        window_seconds: int = 60,
    ):
        self.latency_budget_ms = latency_budget_ms
        self.cost_budget_usd = cost_budget_usd
        self.error_budget_pct = error_budget_pct
        self.window_seconds = window_seconds

        # Rolling window of (timestamp, latency_ms, cost_usd, error) samples,
        # trimmed to the last window_seconds on every check and record
        self.samples = deque()

    def _trim(self) -> None:
        """Drop samples older than window_seconds."""
        cutoff = time.monotonic() - self.window_seconds
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def check(self, latency_ms: float, cost_usd: float) -> tuple[bool, str]:
        """Returns (budget_ok, reason)"""
        self._trim()
        n = len(self.samples)

        # Window averages; with no history yet, fall back to the current request
        current_latency = sum(s[1] for s in self.samples) / n if n else latency_ms
        current_cost = sum(s[2] for s in self.samples) / n if n else cost_usd
        current_error_rate = sum(s[3] for s in self.samples) / n if n else 0.0

        # Check latency
        if current_latency > self.latency_budget_ms:
            return False, f"latency_exceeded: {current_latency:.1f}ms > {self.latency_budget_ms}ms"

        # Check cost (four decimals so sub-cent budgets such as $0.005 print correctly)
        if current_cost > self.cost_budget_usd:
            return False, f"cost_exceeded: ${current_cost:.4f} > ${self.cost_budget_usd:.4f}"

        # Check error rate
        if current_error_rate > self.error_budget_pct:
            return False, f"error_rate_exceeded: {current_error_rate:.2%} > {self.error_budget_pct:.2%}"

        return True, "budget_ok"

    def record(self, latency_ms: float, cost_usd: float, error: bool) -> None:
        self._trim()
        self.samples.append((time.monotonic(), latency_ms, cost_usd, 1 if error else 0))
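The targets in this guide are stated as 95th-percentile latency, while `check` compares the window average against the budget. An average can hide a slow tail, so a percentile may be the better signal. The sketch below assumes the nearest-rank definition of a percentile; `percentile` is a hypothetical helper and the sample values are made up for illustration.

```python
import math


def percentile(values, pct: int):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty window is undefined")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Illustrative window: nine fast requests and one very slow one
window_ms = [120, 130, 125, 140, 135, 128, 132, 138, 126, 900]

mean_latency = sum(window_ms) / len(window_ms)
p95_latency = percentile(window_ms, 95)

print(f"mean={mean_latency:.1f}ms p95={p95_latency}ms")
```

Against a 500ms budget, this window passes on its average but fails on its 95th percentile, which is the case a mean-based check would miss.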

Measurable Metrics

Cost-Per-Error Calculation

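Track these three dimensions to understand tradeoffs. Latency (ms), cost (USD), and error rate (%) use different units, so a raw sum would be dominated by latency. Score each dimension as a fraction of its own budget instead:

Cost-Per-Error Score = max(Latency / Latency_Budget, Cost / Cost_Budget, Error_Rate / Error_Budget)
                   ↓
                   ↓
                 Aggregated Budget

A score above 1.0 means at least one budget is exceeded.

Example calculation:

Latency: 450ms (within 500ms budget) → 0.90
Cost: $0.008 (within $0.01 budget) → 0.80
Error: 0.5% (within 1% budget) → 0.50
Score: 0.90 (within budget)

Over-budget example:

Latency: 600ms (exceeded 500ms budget) → 1.20
Cost: $0.015 (exceeded $0.01 budget) → 1.50
Error: 0.8% (within 1% budget) → 0.80
Score: 1.50 (over budget)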
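One way to put the three dimensions on a common scale is to divide each observed value by its own budget, so that 1.0 means exactly at budget. The sketch below assumes that normalization and takes the worst dimension as the aggregate. The function names `budget_utilization` and `worst_dimension` are hypothetical, and the inputs are the figures from the two examples.

```python
def budget_utilization(observed: dict, budgets: dict) -> dict:
    """Divide each observed value by its budget; 1.0 means exactly at budget."""
    return {name: observed[name] / budgets[name] for name in budgets}


def worst_dimension(utilization: dict) -> tuple[str, float]:
    """Return the (name, utilization) pair closest to or furthest past its budget."""
    return max(utilization.items(), key=lambda item: item[1])


# Error rate is expressed in percent here so it matches the examples
budgets = {"latency_ms": 500, "cost_usd": 0.01, "error_pct": 1.0}

within = budget_utilization(
    {"latency_ms": 450, "cost_usd": 0.008, "error_pct": 0.5}, budgets
)
over = budget_utilization(
    {"latency_ms": 600, "cost_usd": 0.015, "error_pct": 0.8}, budgets
)

for label, scores in (("within", within), ("over", over)):
    name, value = worst_dimension(scores)
    status = "over budget" if value > 1.0 else "within budget"
    print(f"{label}: worst={name} at {value:.2f} ({status})")
```

With these inputs the first case peaks at about 0.90 on latency and the second at about 1.50 on cost. Reporting the worst dimension instead of a sum keeps one healthy metric from masking a breached one.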

Deployment Scenarios

Scenario 1: High-Traffic Consumer App

Constraints:

95th percentile latency: <300ms
Cost: <$0.005/req
Error rate: <0.5%

Budget configuration:

ErrorBudgetGatekeeper(
    latency_budget_ms=300,
    cost_budget_usd=0.005,
    error_budget_pct=0.005,
    window_seconds=30
)

Routing decisions:

Budget OK → Standard route
Latency exceeded → Offload to cache
Cost exceeded → Queue for batch processing
Error rate exceeded → Human escalation
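The routing decisions above map naturally onto a lookup table keyed by the budget state. This sketch assumes reason strings in the form the gatekeeper's `check` method returns, such as `"latency_exceeded: 612.0ms > 300ms"`. The route names and the `pick_route` helper are hypothetical labels, not part of any framework.

```python
# Hypothetical route labels for the consumer-app scenario
CONSUMER_ROUTES = {
    "budget_ok": "standard_route",
    "latency_exceeded": "offload_to_cache",
    "cost_exceeded": "queue_for_batch",
    "error_rate_exceeded": "human_escalation",
}


def pick_route(reason: str, routes: dict = CONSUMER_ROUTES) -> str:
    """Map a gatekeeper reason string to a route name."""
    # Everything before the first ":" is the budget state; the rest is detail
    state = reason.split(":", 1)[0].strip()
    try:
        return routes[state]
    except KeyError:
        raise ValueError(f"unknown budget state: {state!r}") from None


print(pick_route("budget_ok"))
print(pick_route("latency_exceeded: 612.0ms > 300ms"))
```

Raising on an unknown state, instead of silently falling through to the standard route, makes a missing fallback visible during testing.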

Scenario 2: Financial Trading Agent

Constraints:

95th percentile latency: <100ms
Cost: <$0.05/req
Error rate: <0.1%

Budget configuration:

ErrorBudgetGatekeeper(
    latency_budget_ms=100,
    cost_budget_usd=0.05,
    error_budget_pct=0.001,
    window_seconds=10
)

Routing decisions:

Budget OK → Direct inference
Latency exceeded → Preemptive timeout
Cost exceeded → Cancel request
Error rate exceeded → Alert ops team
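The two scenarios differ only in their thresholds and routes, so the thresholds can live in data instead of code. The sketch below assumes a small frozen dataclass whose field names match the `ErrorBudgetGatekeeper` constructor parameters. `BudgetConfig`, `BUDGETS`, `gatekeeper_kwargs`, and the service-class names are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BudgetConfig:
    latency_budget_ms: float
    cost_budget_usd: float
    error_budget_pct: float  # fraction of requests: 0.001 means 0.1%
    window_seconds: int

    def __post_init__(self):
        if min(self.latency_budget_ms, self.cost_budget_usd, self.window_seconds) <= 0:
            raise ValueError("budgets and window must be positive")
        if not 0 < self.error_budget_pct < 1:
            raise ValueError("error_budget_pct must be a fraction between 0 and 1")


# Hypothetical service-class names; thresholds copied from the two scenarios
BUDGETS = {
    "consumer_app": BudgetConfig(300, 0.005, 0.005, 30),
    "trading_agent": BudgetConfig(100, 0.05, 0.001, 10),
}


def gatekeeper_kwargs(service_class: str) -> dict:
    """Keyword arguments matching the ErrorBudgetGatekeeper constructor."""
    return asdict(BUDGETS[service_class])


print(gatekeeper_kwargs("trading_agent"))
```

A gatekeeper for a given class could then be built with `ErrorBudgetGatekeeper(**gatekeeper_kwargs("trading_agent"))`. The range check on `error_budget_pct` guards against passing 0.5 when 0.5% was intended, which would otherwise set a 50% error budget.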

Anti-Patterns to Avoid

Hard-coded latencies without context
- ❌ “Always respond in <200ms”
- ✅ “Target <200ms; if the latency budget is exceeded, retry with a 500ms timeout”
Ignoring cost in latency targets
- ❌ “Target 200ms latency”
- ✅ “Target 200ms latency at no more than $0.01 per request”
Single-point failure
- ❌ Gatekeeper blocks all requests when budget exceeded
- ✅ Gatekeeper routes to fallbacks (cache, retry, human)
Hidden tradeoffs
- ❌ Only track latency, hide cost/error
- ✅ Publish all three dimensions in metrics
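To avoid the hidden-tradeoffs anti-pattern, latency, cost, and error rate can be emitted together in one record. The sketch below assumes three parallel lists of per-request samples and serializes the summary as JSON. `metrics_snapshot` and the field names are hypothetical, and the transport to a metrics backend is left out.

```python
import json


def metrics_snapshot(latencies_ms, costs_usd, errors) -> dict:
    """Summarize one window so all three dimensions are published side by side."""
    n = len(latencies_ms)
    if n == 0 or not (n == len(costs_usd) == len(errors)):
        raise ValueError("need equal-length, non-empty sample lists")
    return {
        "requests": n,
        "latency_ms_mean": sum(latencies_ms) / n,
        "cost_usd_mean": sum(costs_usd) / n,
        "error_rate": sum(1 for failed in errors if failed) / n,
    }


# Illustrative samples: two requests, one of which failed
snapshot = metrics_snapshot([100, 300], [0.004, 0.006], [False, True])
print(json.dumps(snapshot))
```

Emitting one record per window means a dashboard cannot show latency without the cost and error rate that accompanied it.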

Operational Checklist

[ ] Define budget thresholds per service class (consumer vs enterprise)
[ ] Instrument latency, cost, and error metrics for each agent
[ ] Implement routing logic based on budget state
[ ] Set up alerts for budget breaches
[ ] Periodically review budget targets vs actuals
[ ] Adjust budgets based on business requirements
[ ] Document tradeoffs in runbooks

Success Criteria

An error budget gatekeeper implementation succeeds when:

Transparency: All three dimensions (latency, cost, error) are visible in dashboards
Actionability: Breaches trigger defined fallback routes
Calibration: Budget targets match business priorities
Observability: Cost-per-error metrics inform optimization decisions

Conclusion

An error budget gatekeeper transforms opaque latency targets into explicit tradeoffs between speed, cost, and reliability. By making these tradeoffs measurable and actionable, teams can optimize AI agents for business outcomes rather than arbitrary metrics.

The key insight: budgets should reflect business value, not engineering preferences. A financial trading agent may accept a higher cost per request in exchange for lower latency and a lower error rate, to avoid missed opportunities. A consumer app may prioritize latency and cost over error rate to maintain UX. The gatekeeper enables both patterns with clear, measurable constraints.
