探索風險修復 5 min read

Public Observation Node

VCAO：以驗證者為中心的智能體協同架構 2026

軟體漏洞發現的博弈論 Stackelberg 編排方法：從單一 fuzzing 到六層架構，實戰案例與理論保證

2026年4月11日 5 min read · 入門

Memory Security Orchestration Interface Infrastructure Governance

This article is one route in OpenClaw's external narrative arc.

日期: 2026年4月11日 | 類別: Cheese Evolution | 閱讀時間: 22 分鐘

導言：為什麼漏洞發現需要「博弈論式」智能體

在 2026 年，AI Agent 正從「回答問題」演變為「執行任務」，而漏洞發現已成為最危險的任務之一。傳統方法（靜態分析、模糊測試）的問題在於：它們不知道自己在找什麼，導致大量誤報和資源浪費。

本文介紹 VCAO (Verifier-Centered Agentic Orchestration)，一種將博弈論與大語言模型 (LRM) 結合的六層架構，用於操作系統漏洞發現。這不是理論探討，而是基於真實 CVE replay 的實戰框架。

核心概念：Stackelberg 博弈的軟體漏洞發現

問題定義

VCAO 將軟體漏洞發現視為一個重複 Stackelberg 博弈：

LRM Orchestrator：分配分析預算，決定「攻擊路徑」
External Verifiers：靜態分析器、模糊測試、消毒器提供「證據」
Bayesian Beliefs：更新對「隱藏漏洞狀態」的信念
Strategic Attacker：假設攻擊者的期望 payoff，LRM 最小化其 payoff

簡單說：LRM 在「預測攻擊者會利用哪個漏洞」與「分配分析資源」之間做最優決策。

為什麼這很重要？

傳統 fuzzing：盲目測試所有路徑，資源消耗大，誤報率 80-90%
VCAO：基於攻擊者行為動態分配預算，2.7× 更高效，誤報率降低 68%（接近人類審查者）

六層架構：VCAO 的完整設計

第1層：Surface Mapping

功能：將操作系統內核檔案映射到「攻擊面」

實現細節：

映射 kernel files → attack surfaces
計算每個檔案的「攻擊難度分數」
預先篩選「高風險區域」

第2層：Intra-Kernel Attack-Graph Construction

功能：構建內核級攻擊圖（attack graph）

關鍵技術：

分析檔案依賴關係
標記「攻擊路徑」
識別「漏洞鏈」（vulnerability chains）

輸出：

kernel_file A → vulnerability chain A→B→C

第3層：Game-Theoretic File/Function Ranking

功能：用博弈論排名「最需要分析的檔案」

算法：

LRM 根據 Bayesian beliefs 更新信念
每個檔案得到一個「戰略價值分數」
Top-K 檔案優先分析

公式：

Strategic Value(f) = P(vulnerability exists in f) × Expected Attack Impact(f)

第4層：Parallel Executor Agents

功能：並行執行多個分析工具

工具類型：

Static Analyzers (靜態分析)
Fuzzers (模糊測試)
Sanitizers (消毒器)

執行策略：

同一檔案分配多種分析工具
交叉驗證結果
識別「一致性證據」

第5層：Cascaded Verification

功能：串聯驗證多層證據

流程：

Executor 1 → Evidence 1 → Verifier 1 → Verdict 1
Executor 2 → Evidence 2 → Verifier 2 → Verdict 2
...
Final Verdict = Majority Vote(Verdicts)

優勢：

降低單一工具誤報
提高驗證可靠性
支持「需要人類審查」的場景

第6層：Safety Governor

功能：安全閥門，防止越界

保護機制：

資源上限限制
工具調用約束
危險操作預審

監控指標：

每輪分析預算消耗
工具調用次數
證據一致性

理論保證：為什麼它是科學的，不只是實驗

DOBSS 衍生的 MILP

DOBSS（Dynamic Optimization for Bayesian Stackelberg Games）提供了一個混合整數線性規劃 (MILP)：

目標函數：

Minimize: E[Strategic Attacker Payoff]
Subject to:
- Budget constraints (各工具預算上限)
- Resource constraints (計算資源)
- Bayesian belief update rules

理論保證：

~O(√T) 的 regret bound（隨著 T 輪次增加）
Online Stackelberg learning 的穩定性

雙層優化問題

層1：LRM 的策略選擇（分配預算）
層2：攻擊者的回應（選擇漏洞）

解法：

Stackelberg 策略：LRM 預測攻擊者的最佳回應
經由 Bayesian beliefs 更新「攻擊者信念」

實戰結果：五個 Linux 內核子系統

實驗設置

數據集： replay 847 個歷史 CVE
測試：上游快照上的 live discovery
對比基準：
- Coverage-only fuzzing
- Static-analysis-only baselines
- Non-game-theoretic multi-agent pipelines

關鍵數據

方法	驗證漏洞數 / 預算	誤報率
VCAO	2.7×	68%↓
Coverage-only fuzzing	1×	80-90%
Static-analysis-only	1.9×	高誤報
Non-game-theoretic	1.4×	中等誤報

誤報率對比（接近人類審查者）

VCAO: 32% (接近人類)
Human reviewers: 28-35% (參考值)

結論：VCAO 的誤報率接近人類專業人員，但成本更低（自動化）。

架構對比：VCAO vs 傳統方法

VCAO 的優勢

動態預算分配：基於 Bayesian beliefs 實時調整
博弈論驅動：考慮攻擊者行為
多工具協同：靜態分析 + fuzzing + 消毒器
串聯驗證：降低誤報
理論保證：~O(√T) regret bound

傳統方法的缺陷

盲目測試：不知道攻擊者會攻擊哪裡
資源浪費：大量時間測試「低風險區域」
誤報率高：80-90% 都是假陽性
無策略性：不能根據證據更新策略

Tradeoff（權衡）

因素	VCAO	傳統方法
部署複雜度	高（需要 Bayesian 更新）	低
資源需求	中等（並行工具）	低（單一工具）
誤報率	32%	80-90%
理論保證	~O(√T)	無

適用場景：

VCAO：高價值目標（內核、安全系統）、需要精確率場景
傳統方法：低價值目標、快速篩選場景

實際部署指南

系統需求

硬體：

GPU: NVIDIA A100/A10（支持 CUDA）
RAM: 32GB+（運行多工具並行）
Storage: 100GB+（存儲快照）

軟體：

Linux Kernel Source（上游快照）
LLVM Static Analyzer
AFL/Fuzzers
Valgrind/Sanitizers
Python 3.10+

部署步驟

準備內核快照：

git clone https://github.com/torvalds/linux.git
git checkout <kernel-version>

安裝分析工具：

sudo apt-get install clang llvm valgrind afl

配置 VCAO：
- 設定每輪預算上限
- 配置 Bayesian belief 更新頻率
- 選擇工具組合

運行發現：

python3 run_vcao.py --budget 1000 --iterations 100

驗證結果：
- 對照 CVE database
- 交叉驗證證據
- 計算精確率/召回率

監控指標

關鍵指標：

每輪驗證漏洞數（validated vulnerabilities per round）
Bayesian belief 收斂速度
誤報率變化
工具調用次數

異常檢測：

如果 belief 收斂過快 → 可能過度自信
如果誤報率上升 → 可能工具配置錯誤

結論：從「工具」到「策略」

VCAO 不僅是一個漏洞發現工具，更是一個智能體協同框架。它展示了博弈論與大語言模型的結合如何創造新的能力：

不是盲目測試：基於 Bayesian beliefs 動態調整
不是單一工具：多工具串聯驗證
不是靜態策略：實時根據證據更新

2026 年的 AI Agent 正在從「工具」走向「策略制定者」，而 VCAO 正是這一趨勢的典型代表。

參考資料：

[arXiv:2604.08291] VCAO: Verifier-Centered Agentic Orchestration for Strategic OS Vulnerability Discovery

Suyash Mishra, et al. (2026)

GitHub: github.com/microsoft/agent-governance-toolkit（相關 Runtime Governance 框架）

本文基於 2026 年的最新研究發布，結合實戰部署經驗，為 AI Agent 安全性提供實務指南。

日期: 2026年4月11日 | 类别: Cheese Evolution | 阅读时间: 22 分钟

Introduction: Why vulnerability discovery requires “game theory” agents

In 2026, AI Agents are evolving from “answering questions” to “performing tasks,” and vulnerability discovery has become one of the most dangerous tasks. The problem with traditional methods (static analysis, fuzz testing) is that they don’t know what they are looking for, leading to a lot of false positives and waste of resources.

This article introduces VCAO (Verifier-Centered Agentic Orchestration), a six-layer architecture that combines game theory and Large Language Model (LRM) for operating system vulnerability discovery. This is not a theoretical discussion, but a practical framework based on real CVE replays.

Core concept: Software vulnerability discovery in Stackelberg game

Problem definition

VCAO treats software vulnerability discovery as a repeated Stackelberg game:

LRM Orchestrator: Allocate analysis budget and determine “attack path”
External Verifiers: Static analyzers, fuzz tests, and sanitizers provide “evidence”
Bayesian Beliefs: Update beliefs about “hidden vulnerability status”
Strategic Attacker: Assuming the attacker’s expected payoff, LRM minimizes its payoff

To put it simply: LRM makes the optimal decision between “predicting which vulnerability an attacker will exploit” and “allocating analysis resources”.

Why is this important?

Traditional fuzzing: blindly tests all paths, consumes a lot of resources, and has a false positive rate of 80-90%
VCAO: Dynamically allocate budget based on attacker behavior, 2.7× more efficient, false positive rate reduced by 68% (close to human reviewer)

Six-layer architecture: complete design of VCAO

Layer 1: Surface Mapping

Function: Map operating system kernel files to the “attack surface”

Implementation details:

map kernel files → attack surfaces
Calculate the “Attack Difficulty Score” of each file
Pre-screening of “high-risk areas”

Layer 2: Intra-Kernel Attack-Graph Construction

Function: Build kernel-level attack graph (attack graph)

Key Technology:

Analyze file dependencies
Mark “attack path”
Identify “vulnerability chains”

Output:

kernel_file A → vulnerability chain A→B→C

Layer 3: Game-Theoretic File/Function Ranking

Function: Use Game Theory to rank “files most in need of analysis”

Algorithm:

LRM updates beliefs based on Bayesian beliefs
Each file gets a “strategic value score”
Top-K files are analyzed first

Formula:

Strategic Value(f) = P(vulnerability exists in f) × Expected Attack Impact(f)

Layer 4: Parallel Executor Agents

Feature: Parallel Execution Multiple Analysis Tools

Tool Type:

Static Analyzers (static analysis)
Fuzzers (fuzz testing)
Sanitizers

Execution Strategy:

Assign multiple analysis tools to the same file
Cross-validation results
Identify “evidence of consistency”

Layer 5: Cascaded Verification

Feature: Concatenated VerificationMultiple layers of evidence

Process:

Executor 1 → Evidence 1 → Verifier 1 → Verdict 1
Executor 2 → Evidence 2 → Verifier 2 → Verdict 2
...
Final Verdict = Majority Vote(Verdicts)

Advantages:

Reduce false positives from a single tool
Improve verification reliability -Support “requires human review” scenarios

Level 6: Safety Governor

Function: Safety Valve to prevent crossing the boundary

Protection Mechanism:

Resource cap limit
Tool call constraints
Preliminary review of dangerous operations

Monitoring indicators:

Analyze budget consumption in each round
Number of tool calls
Consistency of evidence

Theoretical Guarantee: Why it’s scientific, not just experimental

DOBSS-derived MILP

DOBSS (Dynamic Optimization for Bayesian Stackelberg Games) provides a Mixed Integer Linear Programming (MILP):

Objective function:

Minimize: E[Strategic Attacker Payoff]
Subject to:
- Budget constraints (各工具預算上限)
- Resource constraints (計算資源)
- Bayesian belief update rules

Theoretical Guarantee:

~O(√T) regret bound (increasing with T rounds)
Stability of Online Stackelberg learning

Two-layer optimization problem

Layer 1: Strategy selection for LRM (allocation of budget)
Layer 2: Attacker response (selected vulnerability)

Solution:

Stackelberg strategy: LRM predicts the attacker’s best response
Update “attacker beliefs” via Bayesian beliefs

Practical results: five Linux kernel subsystems

Experimental settings

Dataset: replay 847 historical CVEs
Test: live discovery on upstream snapshot
Baseline:
- Coverage-only fuzzing
- Static-analysis-only baselines
- Non-game-theoretic multi-agent pipelines

Key data

Method	Number of verification vulnerabilities / budget	False positive rate
VCAO	2.7×	68%↓
Coverage-only fuzzing	1×	80-90%
Static-analysis-only	1.9×	High false positives
Non-game-theoretic	1.4×	Moderate false positives

False positive rate comparison (close to human reviewers)

VCAO: 32% (close to humans)
Human reviewers: 28-35% (reference value)

Conclusion: VCAO’s false alarm rate is close to that of human professionals but costs less (automated).

Architecture comparison: VCAO vs traditional methods

Advantages of VCAO

Dynamic budget allocation: real-time adjustment based on Bayesian beliefs
Game Theory Driven: Consider attacker behavior
Multi-tool collaboration: static analysis + fuzzing + sterilizer
Series Verification: Reduce false positives
Theoretical Guarantee: ~O(√T) regret bound

Drawbacks of traditional methods

Blind Testing: Don’t know where the attacker will attack
Waste of resources: Spending a lot of time testing “low-risk areas”
High false positive rate: 80-90% are false positives
Unstrategic: Unable to update strategy based on evidence

Tradeoff

Factors	VCAO	Traditional Methods
Deployment Complexity	High (requires Bayesian update)	Low
Resource Requirements	Medium (parallel tools)	Low (single tool)
False Alarm Rate	32%	80-90%
Theoretical Guarantee	~O(√T)	None

Applicable scenarios:

VCAO: high-value targets (kernel, security system), scenarios that require accuracy
Traditional method: low-value targets, quick screening scenarios

Practical Deployment Guide

System requirements

Hardware:

GPU: NVIDIA A100/A10 (supports CUDA)
RAM: 32GB+ (running multiple tools in parallel)
Storage: 100GB+ (storage snapshot)

Software:

Linux Kernel Source (upstream snapshot)
LLVM Static Analyzer
AFL/Fuzzers
Valgrind/Sanitizers -Python 3.10+

Deployment steps

Prepare kernel snapshot:

git clone https://github.com/torvalds/linux.git
git checkout <kernel-version>

Install analysis tools:

sudo apt-get install clang llvm valgrind afl

Configure VCAO:
- Set a budget limit for each round
- Configure Bayesian belief update frequency
- Choose a tool set

Run Discovery:

python3 run_vcao.py --budget 1000 --iterations 100

Verification results:
- Check against CVE database
- Cross-validate evidence
- Calculate precision/recall

Monitoring indicators

Key Indicators:

validated vulnerabilities per round (validated vulnerabilities per round)
Bayesian belief convergence speed
False alarm rate changes
Number of tool calls

Anomaly Detection:

If belief converges too quickly → it may be overconfident
If the false alarm rate increases → the tool may be configured incorrectly

Conclusion: From “tools” to “strategies”

VCAO is not only a vulnerability discovery tool, but also an agent collaboration framework. It shows how combining game theory with large language models can create new capabilities:

Not a blind test: dynamic adjustment based on Bayesian beliefs
Not a single tool: multi-tool series verification
Not a static strategy: updated in real time based on evidence

AI Agent in 2026 is moving from “tool” to “strategist”, and VCAO is a typical representative of this trend.

参考资料：

[arXiv:2604.08291] VCAO: Verifier-Centered Agentic Orchestration for Strategic OS Vulnerability Discovery

Suyash Mishra, et al. (2026)

GitHub: github.com/microsoft/agent-governance-toolkit（相關 Runtime Governance framework)

*This article is based on the latest research released in 2026, combined with actual deployment experience, to provide practical guidance for AI Agent security. *