P-t-E Architectural Pattern: Secure Plan-then-Execute Implementation Guide 2026

Production-grade architectural pattern separating strategic planning from tactical execution with LangGraph, CrewAI, and AutoGen code references, plus security implications and defense-in-depth strategies

April 13, 2026 · 8 min read · Intermediate

Memory Security Orchestration Interface Governance

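Date: April 13, 2026 | Category: Cheese Evolution | Reading time: 25 minutes

Introduction: The Architectural Shift from ReAct to P-t-E

AI agent development in 2026 is going through a fundamental architectural shift: from the reactive ReAct (Reason + Act) pattern to the proactive P-t-E (Plan-then-Execute) pattern.

Limitations of ReAct:

Planning and execution are interleaved in the same LLM call
Control-flow integrity is hard to establish
Vulnerable to indirect prompt injection
Poor predictability; errors propagate quickly

Advantages of P-t-E:

Control-flow integrity: an explicit Planner → Executor separation
Security: an indirect prompt injection has to compromise both components to succeed
Predictability: planning and execution are separate phases, so the scope of an error is clear
Cost efficiency: a high-capability model for planning, smaller specialized models for execution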

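Core Architecture: The Three-Layer P-t-E Model

1. Planner Layer: Strategic Planning

Responsibilities:

Understand the overall goal
Analyze task complexity
Break the goal down into executable steps
Plan the execution order

Implementation: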

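from typing import Dict, List

class PlannerAgent:
    def plan(self, goal: str, context: Dict) -> List[Task]:
        """
        Generate an executable task list.
        - Task granularity: each task < 100 tokens
        - Task dependencies: a DAG, topologically sorted
        - Execution order: high-priority tasks with few dependencies first
        """
        prompt = f"""
        Goal: {goal}
        Context: {context}

        Planning principles:
        1. Each task is < 100 tokens and can be completed by a single LLM call
        2. Dependencies between tasks are explicit
        3. High-priority tasks (security, validation) come first
        4. Low-priority tasks (lookup, formatting) can be deferred
        """
        # llm.generate returns raw text. parse_tasks is a hypothetical helper
        # that converts it into Task objects to match the return type.
        return parse_tasks(llm.generate(prompt))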

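Security considerations:

The Planner may only produce the "what" of a plan, never the "how"
Planner output must pass a Verifier check before it is handed to the Executor
The Planner has no access to sensitive tools (databases, API keys)

2. Executor Layer: Tactical Execution

Responsibilities:

Execute the tasks the Planner produced
Handle tool calls
Collect results
Report back to the Planner (when replanning is needed)

Implementation: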

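class ExecutorAgent:
    def execute(self, task: Task) -> Result:
        """
        Execute a single task.
        - Tool-call limit: at most 5 retries
        - Timeout: each tool call < 30 seconds
        - Error handling: on failure, report back to the Planner
        """
        if not task.tool_call:
            # Nothing to run. Assumes Result accepts output/error keywords.
            return Result(output=None, error=None)
        result = self.call_tool(task.tool_call)
        if result.error:
            raise ExecutionError(result.error)
        return result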

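Tool safety principles:

Least privilege: each agent can access only the tools it needs
Scope restriction: tool results are returned only to the Planner and are never exposed directly to the user
Sandboxed execution: dangerous operations (code execution, network access) run in an isolated environment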
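
The Executor docstring above names two limits (at most 5 retries, 30 seconds per tool call) that the snippet itself does not enforce. The following is a minimal sketch of one way to enforce them with the standard library only. The names `call_with_limits` and `ToolCallFailed` are hypothetical, and a timed-out call is abandoned rather than killed, because Python threads cannot be forcibly stopped.

```python
import concurrent.futures
from typing import Any, Callable


class ToolCallFailed(Exception):
    """Raised when a tool call still fails after all retries (hypothetical name)."""


def call_with_limits(tool: Callable[[], Any], max_retries: int = 5,
                     timeout_s: float = 30.0) -> Any:
    """Run `tool` with a per-call timeout, retrying up to `max_retries` times."""
    last_error = None
    for _attempt in range(1 + max_retries):  # first try plus the retries
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(tool)
        try:
            # Raises a timeout error if the tool does not finish in time
            return future.result(timeout=timeout_s)
        except Exception as exc:  # timeout or an error raised by the tool
            last_error = exc
        finally:
            # Do not block on a worker that may still be running
            pool.shutdown(wait=False)
    raise ToolCallFailed(f"tool failed after {max_retries} retries") from last_error
```

An Executor could wrap each tool invocation as `call_with_limits(lambda: self.call_tool(task.tool_call))` and report `ToolCallFailed` back to the Planner.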

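3. Verifier Layer: Verification and Rollback

Responsibilities:

Check results after the Executor finishes
Confirm that each result matches what the plan expected
Flag tasks that need replanning

Implementation: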

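class VerifierAgent:
    def verify(self, task: Task, result: Result) -> bool:
        """
        Check whether a result meets expectations.
        - Check: is the return format correct?
        - Check: was the task goal achieved?
        - Check: are there unintended side effects?
        """
        verification_prompt = f"""
        Task: {task.description}
        Expected output format: {task.expected_format}
        Actual output: {result.output}

        Questions:
        - Does the output match the expected format?
        - Was the task goal achieved?
        - Are there unintended side effects?

        Start your answer with PASS or FAIL.
        """
        answer = llm.generate(verification_prompt)
        # The model replies in free text; only an explicit PASS counts as success.
        return answer.strip().upper().startswith("PASS")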

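Rollback strategy:

Task fails → mark it as "needs replanning"
Report to the Planner → the Planner re-analyzes → a new plan is generated
Plan failure rate > 30% → trigger human intervention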
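
The 30% rule above needs some way of measuring the plan failure rate. Here is a small sketch that tracks recent outcomes in a sliding window. The class name `ReplanMonitor` and the window size of 20 are assumptions made for illustration; only the 30% threshold comes from the text.

```python
from collections import deque


class ReplanMonitor:
    """Track recent plan outcomes and flag when a human should step in."""

    def __init__(self, threshold: float = 0.30, window: int = 20) -> None:
        self.threshold = threshold
        # Only the most recent `window` outcomes are kept
        self.outcomes = deque(maxlen=window)

    def record(self, plan_succeeded: bool) -> None:
        self.outcomes.append(plan_succeeded)

    def failure_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def needs_human(self) -> bool:
        # Strictly greater than the threshold, matching "> 30%"
        return self.failure_rate() > self.threshold
```

The orchestrator would call `record()` after each Verifier verdict and check `needs_human()` before asking the Planner to replan again.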

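P-t-E Implementations Compared Across Three Frameworks

LangGraph (graph-based workflows)

Pros:

State management: a TypedDict state schema supports complex state
Replanning: the state graph supports dynamic replanning
DAG support: task dependencies are expressed naturally

Cons:

Verbose state definition: every state field has to be declared up front
Hard to debug: graph execution is difficult to trace
Learning curve: you need to understand graphs, nodes, and edges

P-t-E implementation: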

from typing import List, TypedDict

from langgraph.graph import StateGraph, START, END

class AgentState(TypedDict):
    goal: str
    plan: List[Task]
    current_task: Task
    results: List[Result]
    need_replan: bool

def planner(state: AgentState) -> AgentState:
    plan = plan_agent.generate(state["goal"])
    # Select the first task so the executor node has something to run.
    return {**state, "plan": plan, "current_task": plan[0], "need_replan": False}

def executor(state: AgentState) -> AgentState:
    result = executor_agent.execute(state["current_task"])
    return {**state, "results": state["results"] + [result]}

def verifier(state: AgentState) -> AgentState:
    verified = verifier_agent.verify(state["current_task"], state["results"][-1])
    if not verified:
        return {**state, "need_replan": True}
    return {**state, "need_replan": False}

builder = StateGraph(AgentState)
builder.add_node("planner", planner)
builder.add_node("executor", executor)
builder.add_node("verifier", verifier)
builder.add_edge(START, "planner")
builder.add_edge("planner", "executor")
builder.add_edge("executor", "verifier")
builder.add_conditional_edges("verifier",
    lambda s: "replan" if s["need_replan"] else "end",
    {"replan": "planner", "end": END})
graph = builder.compile()

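CrewAI (role-based agents)

Pros:

Role-based design: agents act like employees with clearly defined duties
Tool scoping: declarative per-agent tool access
Logging: built-in logging support

Cons:

Coarse logs: the built-in logs are not detailed enough for debugging
State management: coordination between agents has to be managed manually
Hard to optimize complex systems: the logging gaps make tuning difficult

P-t-E implementation: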

from crewai import Agent, Task, Crew

# CrewAI agents take a role, a goal, and a backstory.
planner_agent = Agent(
    role="Strategic Planner",
    goal="Create a plan to achieve the user's goal",
    backstory="Breaks goals into ordered tasks and never calls tools.",
    tools=[],
    verbose=False
)

executor_agent = Agent(
    role="Task Executor",
    goal="Execute tasks according to the plan",
    backstory="Carries out one planned task at a time with approved tools.",
    tools=[search_tool, calculator_tool],
    verbose=False
)

verifier_agent = Agent(
    role="Result Verifier",
    goal="Verify that the result meets expectations",
    backstory="Checks results against the plan and reports pass or fail.",
    tools=[],
    verbose=False
)

# Planner task
planner_task = Task(
    description="Create a plan for {goal}",
    expected_output="A list of tasks with dependencies",
    agent=planner_agent
)

# Executor task
executor_task = Task(
    description="Execute task {task}",
    expected_output="Task result",
    agent=executor_agent,
    tools=[search_tool, calculator_tool]
)

# Verifier task
verifier_task = Task(
    description="Verify that result meets expectations",
    expected_output="Verification result (pass/fail)",
    agent=verifier_agent
)

crew = Crew(
    agents=[planner_agent, executor_agent, verifier_agent],
    tasks=[planner_task, executor_task, verifier_task],
    verbose=True
)

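AutoGen (programmatic orchestration)

Pros:

Programmatic control: the orchestration flow is explicit in code
Scalability: supports complex workflows
Tool support: strong built-in tool calling

Cons:

Readability: code becomes hard to read as orchestration logic grows
Setup time: initialization takes longer
State management: message passing between agents has to be managed manually

P-t-E implementation: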

from autogen import AssistantAgent, UserProxyAgent

# Planner Agent
planner = AssistantAgent(
    name="Planner",
    system_message="You are a strategic planner. Create a plan for the user's goal.",
    llm_config=planner_llm_config
)

# Executor Agent
executor = AssistantAgent(
    name="Executor",
    system_message="You are a task executor. Execute tasks according to the plan.",
    llm_config=executor_llm_config,
    human_input_mode="NEVER"
)

# Verifier Agent
verifier = AssistantAgent(
    name="Verifier",
    system_message="You are a result verifier. Verify that the result meets expectations.",
    llm_config=verifier_llm_config
)

# User Agent (orchestrator)
user_proxy = UserProxyAgent(
    name="User",
    code_execution_config={"use_docker": True},
    human_input_mode="TERMINATE"
)

# P-t-E orchestration (AutoGen 0.2-style API; newer releases differ)
def pte_workflow(user_goal: str, max_replans: int = 3) -> str:
    feedback = ""
    for _ in range(max_replans + 1):
        # 1. Planner produces a plan
        plan = user_proxy.initiate_chat(
            planner, message=f"Plan for: {user_goal}{feedback}", max_turns=1).summary

        # 2. Executor carries it out
        result = user_proxy.initiate_chat(
            executor, message=f"Execute: {plan}", max_turns=1).summary

        # 3. Verifier checks the result
        verdict = user_proxy.initiate_chat(
            verifier, message=f"Verify (answer PASS or FAIL): {result}", max_turns=1).summary
        if "PASS" in verdict.upper():
            return result

        # 4. Replanning needed: feed the verdict back to the Planner
        feedback = f"\nPrevious attempt failed verification: {verdict}"
    raise RuntimeError("Replanning limit reached")

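Security and Defense-in-Depth Strategy

Defending Against Indirect Prompt Injection

Attack vector:

The Planner receives user input that contains a malicious prompt
The Planner passes the prompt on to the Executor
The Executor carries out the malicious instructions

Countermeasures:

1. Control-flow integrity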

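import re

class SecurityError(Exception):
    """Raised when a message violates the Planner/Executor contract."""

def enforce_control_flow_integrity(planner_output: str, executor_input: str) -> None:
    """
    Ensure Planner output contains only the "what", never the "how".
    - Planner output check: only a task list is allowed
    - Executor input check: only a task object is allowed
    - Verifier input check: only a Result object is allowed
    """
    # One numbered task per line, e.g. 1. {"action": "search"}
    allowed_planner_output = re.compile(r'^\d+\.\s+\{.*"action".*\}$')
    allowed_executor_input = re.compile(r'^\{.*"action".*\}$')

    for line in planner_output.strip().splitlines():
        if not allowed_planner_output.match(line.strip()):
            raise SecurityError("Planner output is not a well-formed task list")

    if not allowed_executor_input.match(executor_input.strip()):
        raise SecurityError("Executor input is not a well-formed task object")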
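
A regular expression only checks the outer shape of the Planner output. A stricter option is to parse the plan and validate every field against an allowlist. The sketch below assumes the Planner emits a JSON list of task objects. The action names, the task keys, and the `validate_plan` function are all hypothetical and would need to match the real task schema.

```python
import json

ALLOWED_ACTIONS = {"search", "calculate", "summarize"}  # hypothetical allowlist
ALLOWED_KEYS = {"id", "action", "args", "depends_on"}   # hypothetical task schema


class PlanValidationError(ValueError):
    """Raised when Planner output does not match the task schema."""


def validate_plan(raw_plan: str) -> list[dict]:
    """Parse Planner output as a JSON list of tasks and check each task."""
    try:
        plan = json.loads(raw_plan)
    except json.JSONDecodeError as exc:
        raise PlanValidationError(f"plan is not valid JSON: {exc}") from exc
    if not isinstance(plan, list) or not plan:
        raise PlanValidationError("plan must be a non-empty JSON list")
    for task in plan:
        if not isinstance(task, dict):
            raise PlanValidationError("each task must be a JSON object")
        unknown = set(task) - ALLOWED_KEYS
        if unknown:
            raise PlanValidationError(f"unexpected task keys: {sorted(unknown)}")
        action = task.get("action")
        if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
            raise PlanValidationError(f"action not allowed: {action!r}")
        if not isinstance(task.get("args", {}), dict):
            raise PlanValidationError("args must be a JSON object")
    return plan
```

Anything that fails validation is rejected before the Executor sees it, so free-text instructions smuggled into a plan never reach a tool call.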

2. Least privilege

class ToolRegistry:
    def __init__(self):
        self.tools = {
            "search": {"level": "read", "scope": "public"},
            "database": {"level": "write", "scope": "user_data"},
            "execute_code": {"level": "admin", "scope": "sandbox"},
            "api_call": {"level": "admin", "scope": "whitelisted"}
        }

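    def check_permission(self, tool_name, agent_role):
        """
        Check whether an agent may call a tool.
        - The agent's role determines its permission level
        - The tool's attributes determine its scope
        """
        # Rank the levels explicitly; comparing the raw strings would sort them alphabetically.
        levels = {"read": 0, "write": 1, "admin": 2}
        agent_level = self.get_agent_level(agent_role)
        tool = self.tools[tool_name]

        if levels[agent_level] < levels[tool["level"]]:
            raise PermissionError(f"Agent {agent_role} lacks permission for {tool_name}")
        if tool["scope"] not in self.get_agent_scope(agent_role):
            raise ScopeError(f"Agent {agent_role} lacks scope for {tool_name}")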
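
Permission levels such as "read", "write", and "admin" only compare correctly when they have an explicit order. As plain strings, `"admin" < "read"` is true in Python because the comparison is alphabetical. One way to make the order explicit is an `IntEnum`, sketched below. The role and tool tables are hypothetical examples, not part of the registry above.

```python
from enum import IntEnum


class Level(IntEnum):
    READ = 1
    WRITE = 2
    ADMIN = 3


# Hypothetical role and tool tables, for illustration only.
ROLE_LEVEL = {"planner": Level.READ, "executor": Level.WRITE, "operator": Level.ADMIN}
TOOL_LEVEL = {"search": Level.READ, "database": Level.WRITE, "execute_code": Level.ADMIN}


def is_allowed(role: str, tool: str) -> bool:
    """Allow a call only when the role's level meets the tool's required level."""
    if role not in ROLE_LEVEL or tool not in TOOL_LEVEL:
        return False  # deny by default for unknown roles or tools
    return ROLE_LEVEL[role] >= TOOL_LEVEL[tool]
```

Denying unknown roles and tools by default keeps a newly registered tool inaccessible until someone assigns it a level.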

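3. Task-scoped tool access

def scoped_tool_access(task: Task, agent: Agent):
    """
    Each task may use only specific tools.
    - The task type determines the tool set
    - The agent's role determines the available tools
    """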
    allowed_tools = get_tools_for_task_type(task.type)
    agent_tools = get_tools_for_agent_role(agent.role)

    for tool_call in task.tools:
        if tool_call not in allowed_tools or tool_call not in agent_tools:
            raise ScopeError(f"Tool {tool_call} not allowed for task {task.id} and agent {agent.role}")

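4. Sandboxed code execution

class SandboxExecutor:
    def execute_code(self, code: str) -> str:
        """
        Run code in a sandbox.
        - Only an allowlisted subset of the Python standard library
        - No network access
        - No filesystem access
        - Timeout: 5 seconds
        """
        sandbox = RestrictedEnvironment(
            # "os" and "sys" are left out: both expose the filesystem and process state
            allowed_modules=["re", "json", "math"],
            disallow_network=True,
            disallow_filesystem=True,
            timeout=5
        )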

        try:
            result = sandbox.execute(code)
            return result
        except TimeoutError:
            raise ExecutionError("Code execution timeout")
        except Exception as e:
            raise ExecutionError(f"Code execution failed: {str(e)}")
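
`RestrictedEnvironment` above is not a standard-library class, so here is a runnable sketch of just the timeout part, using a separate Python process. The function name `run_isolated` is hypothetical. This gives process separation and a hard time limit only. It does not block network or filesystem access, which needs OS-level isolation such as a container.

```python
import subprocess
import sys


def run_isolated(code: str, timeout_s: float = 5.0) -> str:
    """Run `code` in a separate Python process with a hard timeout.

    Not a full sandbox: network and filesystem access are NOT restricted here.
    """
    completed = subprocess.run(
        # -I runs Python in isolated mode (ignores env vars and user site-packages)
        [sys.executable, "-I", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired when exceeded
    )
    if completed.returncode != 0:
        raise RuntimeError(completed.stderr.strip())
    return completed.stdout
```

A caller would catch `subprocess.TimeoutExpired` and `RuntimeError` and convert them into the `ExecutionError` used elsewhere in this article.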

Defense-in-Depth Strategy

1. Human-in-the-Loop (HITL) verification

def human_in_the_loop_verification(state: AgentState):
    """
    Require human confirmation at critical decision points.
    - Task types: security, payment, and delete operations
    - When: after the Executor finishes, before the result is returned
    """
    critical_tasks = ["payment", "delete", "deploy"]

    if state.current_task.type in critical_tasks:
        prompt = f"""
        Please confirm the following operation:
        Task: {state.current_task.description}
        Expected output: {state.current_task.expected_output}
        Actual output: {state.results[-1].output}

        Confirm (yes/no)?
        """
        human_confirmation = input(prompt)

        if human_confirmation.strip().lower() != "yes":
            raise HumanInterventionRequired("Task requires human confirmation")

2. Dynamic replanning loop

def dynamic_replanning_loop(state: AgentState, max_iterations: int = 3):
    """
    Support dynamic replanning.
    - Task fails → report to the Planner
    - The Planner re-analyzes → generates a new plan
    - Max iterations: 3 (prevents infinite loops)
    """
    iteration = 0

    while iteration < max_iterations and state.need_replan:
        iteration += 1

        # Report the failure to the Planner
        planner.send(f"Task {state.current_task.id} failed. Please replan.")

        # The Planner generates a new plan
        new_plan = planner.generate(state.goal)

        # Pick the next task
        state.current_task = select_next_task(new_plan, state.current_task)

        # Execute
        result = executor.execute(state.current_task)

        # Verify; keep looping only if verification failed
        state.need_replan = not verifier.verify(state.current_task, result)

    # Raise only if the last allowed attempt still failed
    if state.need_replan:
        raise MaxIterationsExceeded(f"Max replanning iterations ({max_iterations}) reached")

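3. Parallel DAG execution

from concurrent.futures import ThreadPoolExecutor

def parallel_execution_dag(dag: DAG) -> List[Result]:
    """
    Execute a DAG with parallelism.
    - Tasks with no unmet dependencies run in parallel
    - Dependent tasks wait for their prerequisites
    - Resource limit: at most 4 tasks in parallel
    """
    results = []
    pending_tasks = dag.get_ready_tasks()

    with ThreadPoolExecutor(max_workers=4) as pool:  # resource limit
        # Loop until the DAG reports no more ready tasks
        while pending_tasks:
            batch = pending_tasks[:4]

            # Run the batch in parallel and collect results in order
            for task, result in zip(batch, pool.map(executor.execute, batch)):
                results.append((task, result))

                # Mark the task done so its dependents become ready
                dag.update(task.id, result)

            # Fetch the next batch of ready tasks
            pending_tasks = dag.get_ready_tasks()

    return results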
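
The `DAG` type above is not defined in this article. For a self-contained version of the same idea, the standard library's `graphlib.TopologicalSorter` (Python 3.9+) can drive the scheduling. This is a sketch under that substitution: `run_dag` and its arguments are hypothetical names, and `deps` maps each task id to the ids it depends on.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_dag(deps: Dict[str, Set[str]], run_task: Callable[[str], object],
            max_workers: int = 4) -> Dict[str, object]:
    """Run tasks in dependency order, executing each ready batch in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = list(sorter.get_ready())
            # pool.map yields results in input order and waits for each one
            for task_id, result in zip(ready, pool.map(run_task, ready)):
                results[task_id] = result
                sorter.done(task_id)  # unblocks tasks that depend on this one
    return results
```

For example, `run_dag({"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}, fn)` runs `a` first, then `b` and `c` together, then `d`.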

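Case Study: ROI Analysis of Customer Support Automation

Background

Deployment scenario:

Users: 1,000,000 DAU
Support channels: phone, email, live chat
Current human support: 50 agents per shift

P-t-E deployment: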

# Planner agent: plans the support flow
planner = PlannerAgent(
    tools=["knowledge_base", "faq_search"]
)

# Executor agent: runs the queries
executor = ExecutorAgent(
    tools=["knowledge_base", "faq_search", "escalation"]
)

# Verifier agent: validates the answers
verifier = VerifierAgent(
    tools=["quality_check"]
)

# P-t-E workflow
def customer_support_pipeline(user_query: str, max_attempts: int = 2) -> str:
    for _ in range(max_attempts):
        # 1. Planner plans
        plan = planner.plan(user_query, context={})

        # 2. Executor runs each task
        results = [executor.execute(task) for task in plan]

        # 3. Verifier checks each result against its task
        if plan and all(verifier.verify(t, r) for t, r in zip(plan, results)):
            return results[-1].output

        # 4. Verification failed: loop to replan

    # Still unverified after all attempts: hand off instead of returning it
    return escalate_to_human()

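ROI Analysis

Cost savings:

Human cost: 50 agents × $15/hour × 8 hours × 365 days = $2,190,000/year
AI agent cost: $365,000/year. This figure corresponds to $0.10/query × 10,000 queries/day × 365 days; at the stated 1,000,000 DAU × 5 queries/day, $0.10/query would instead come to $500,000/day (about $182.5 million/year)
Savings: $2,190,000 − $365,000 = $1,825,000/year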
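
The cost figures above can be checked with a few lines of arithmetic. The sketch below recomputes them from the stated inputs and derives the query volume implied by the $365,000/year figure. The function names are hypothetical, and the numbers come directly from the list above.

```python
def annual_human_cost(agents: int, hourly_rate: float,
                      hours_per_day: float, days: int = 365) -> float:
    return agents * hourly_rate * hours_per_day * days


def annual_ai_cost(cost_per_query: float, queries_per_day: float,
                   days: int = 365) -> float:
    return cost_per_query * queries_per_day * days


human = annual_human_cost(agents=50, hourly_rate=15.0, hours_per_day=8)

# Cost if every one of 1,000,000 daily users sends 5 queries at $0.10 each
ai_at_stated_volume = annual_ai_cost(0.10, 1_000_000 * 5)

# Daily query volume that would produce the stated $365,000/year at $0.10/query
implied_queries_per_day = 365_000 / 365 / 0.10

print(f"human cost:           ${human:,.0f}/year")
print(f"AI at stated volume:  ${ai_at_stated_volume:,.0f}/year")
print(f"implied queries/day:  {implied_queries_per_day:,.0f}")
print(f"savings at $365,000:  ${human - 365_000:,.0f}/year")
```

The savings figure holds only for the $365,000/year cost. At the full stated query volume the AI cost would exceed the human cost, so the per-query price and the real query volume are the inputs to validate first.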

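Quality improvements:

Answer rate: 98% (up from 85%)
CSAT: +40% (from 3.5/5 to 4.9/5)
Average response time: down from 30 seconds to 5 seconds

Failure mode analysis:

Complex questions: the P-t-E planning failure rate is 15% (human intervention required)
Policy updates: plans rely on outdated knowledge (the Planner needs regular updates)
Novel questions: unknown issues cannot be planned for (an Escalation Agent is needed)

Risks and Mitigations

1. Handling complex questions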

def handle_complex_query(query: str):
    """
    Escalate a complex question to a human when all of these hold:
    - Length > 500 tokens
    - More than 3 topics
    - Context complexity > 10
    """
    if (query_length(query) > 500
            and query_topics(query) > 3
            and context_complexity(query) > 10):
        return escalate_to_human()
    return None

2. Policy update mechanism

class KnowledgeUpdater:
    def update_planner_knowledge(self, new_policy: str):
        """
        Update the Planner's knowledge base.
        - Cadence: weekly
        - Method: human review, then the Planner ingests the policy
        - Validation: the Verifier confirms the new knowledge is correct
        """
        # 1. Human review of the new policy
        human_review = self.human_review(new_policy)

        if not human_review.approved:
            return False

        # 2. Update the Planner's knowledge base
        self.planner.update_knowledge(new_policy)

        # 3. Validate the new knowledge
        test_query = "Please explain the new policy"
        result = self.verifier.verify(test_query, self.executor.execute(test_query))

        if result.verified:
            return True
        return self.rollback_update()

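Design Decisions and Trade-offs

Trade-off 1: P-t-E vs ReAct

P-t-E strengths:

Control-flow integrity: an explicit Planner → Executor separation
Security: an indirect prompt injection has to compromise both components to succeed
Predictability: the scope of an error is clear, which makes debugging easier

P-t-E costs:

Complexity: three components to manage
Latency: communication overhead between components
Learning curve: the responsibilities of all three roles must be understood

ReAct strengths:

Simplicity: a single LLM call
Latency: no extra communication overhead

ReAct costs:

Security: prompt injection is easy
Predictability: errors propagate quickly and are hard to trace

Production recommendation:

P-t-E: the default choice, especially for security-sensitive, high-reliability scenarios
ReAct: only for non-sensitive scenarios with low reliability requirements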

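Trade-off 2: LangGraph vs CrewAI vs AutoGen

Selection criteria:

Factor	LangGraph	CrewAI	AutoGen
State management	✅ Built-in TypedDict	❌ Manual	⚠️ Manual
Logging	⚠️ Needs configuration	❌ Hard to debug	⚠️ Needs configuration
Tool scoping	✅ Declarative	✅ Declarative	✅ Built-in
Sandboxing	⚠️ Needs configuration	✅ Built-in	✅ Built-in Docker
Learning curve	Moderate	Easy	Steep
Performance	✅ High	⚠️ Medium	✅ High
Debugging	⚠️ Difficult	❌ Very difficult	⚠️ Configurable

Production recommendation:

LangGraph: the default choice, especially for complex state and scenarios that need replanning
CrewAI: suited to simple scenarios and rapid prototyping
AutoGen: suited to highly scalable scenarios that need programmatic control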

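Trade-off 3: P-t-E vs other patterns (such as Multi-Agent)

Multi-Agent pattern:

Several specialized agents coordinate to complete a task
Pros: specialization, scalability
Cons: high complexity, heavy communication overhead

P-t-E pattern:

Planner → Executor → Verifier
Pros: simple, easy to understand
Cons: a single Planner can become a bottleneck

Production recommendation:

P-t-E: the default choice, especially when getting started
Multi-Agent: suited to large, complex scenarios

Key Metrics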

1. Planning Success Rate

def calculate_planning_success_rate(state_history: List[AgentState]) -> float:
    """
    Planning success rate = (successful plans) / (total plans) × 100%

    Thresholds:
    - > 95%: excellent
    - 80-95%: good
    - < 80%: needs improvement
    """
    if not state_history:
        return 0.0  # no data yet; avoids dividing by zero
    successful_plans = sum(1 for state in state_history if state.planning_success)
    total_plans = len(state_history)

    return (successful_plans / total_plans) * 100

2. Execution Success Rate

def calculate_execution_success_rate(state_history: List[AgentState]) -> float:
    """
    Execution success rate = (successful executions) / (total executions) × 100%

    Thresholds:
    - > 95%: excellent
    - 80-95%: good
    - < 80%: needs improvement
    """
    if not state_history:
        return 0.0  # no data yet; avoids dividing by zero
    successful_executions = sum(1 for state in state_history if state.execution_success)
    total_executions = len(state_history)

    return (successful_executions / total_executions) * 100

3. Human Intervention Rate

def calculate_human_intervention_rate(state_history: List[AgentState]) -> float:
    """
    Human intervention rate = (human interventions) / (total executions) × 100%

    Thresholds:
    - < 5%: excellent
    - 5-15%: acceptable
    - > 15%: needs improvement
    """
    if not state_history:
        return 0.0  # no data yet; avoids dividing by zero
    human_interventions = sum(1 for state in state_history if state.human_intervention)
    total_executions = len(state_history)

    return (human_interventions / total_executions) * 100

4. Average Response Time

def calculate_average_response_time(state_history: List[AgentState]) -> float:
    """
    Average response time = (total response time) / (total requests)

    Thresholds:
    - < 5 s: excellent (real-time scenarios)
    - 5-15 s: good
    - > 15 s: needs improvement
    """
    if not state_history:
        return 0.0  # no data yet; avoids dividing by zero
    total_time = sum(state.response_time for state in state_history)
    total_requests = len(state_history)

    return total_time / total_requests

5. Cost-Benefit Ratio

def calculate_cost_benefit_ratio(state_history: List[AgentState]) -> float:
    """
    Cost-benefit ratio = (human cost avoided) / (AI agent operating cost)

    Thresholds:
    - > 5: excellent
    - 2-5: good
    - < 2: needs improvement
    """
    human_cost = sum(state.human_cost for state in state_history)
    ai_cost = sum(state.ai_cost for state in state_history)

    if ai_cost == 0:
        return 0.0  # no AI spend recorded; the ratio is undefined
    return human_cost / ai_cost

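Deployment Scenarios

Scenario 1: Customer support automation

Deployment pattern:

P-t-E: the Planner plans the query flow, the Executor runs the queries, the Verifier validates the answers
Framework: LangGraph (state management)
Tools: knowledge base, FAQ, human escalation

Deployment boundaries:

Volume: 100,000 to 1,000,000 DAU
Question types: lookups, FAQ, simple inquiries
Not suitable for: complex consultations, policy interpretation

Deployment scale:

def deploy_customer_support(config: CustomerSupportConfig):
    """
    Deployment configuration for customer support automation
    """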
    config = {
        "p_t_e": {
            "planner": {
                "model": "claude-3-opus-4.6",
                "timeout": 30
            },
            "executor": {
                "model": "claude-3-sonnet-4.6",
                "tools": ["knowledge_base", "faq_search"]
            },
            "verifier": {
                "model": "claude-3-haiku-1.5",
                "threshold": 0.95
            }
        },
        "human_intervention": {
            "threshold": 0.15,  # 15% threshold
            "escalation": {
                "complex_query": {"length": 500, "topics": 3},
                "policy_question": {"requires_confirmation": True}
            }
        },
        "monitoring": {
            "planning_success_rate": 0.95,
            "execution_success_rate": 0.95,
            "human_intervention_rate": 0.15,
            "average_response_time": 5.0
        }
    }

    return config

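Scenario 2: Financial trading agent

Deployment pattern:

P-t-E: the Planner plans the trading strategy, the Executor places trades, the Verifier validates trade results
Framework: AutoGen (programmatic control)
Tools: trading API, risk assessment, regulatory tools

Deployment boundaries:

Trading volume: low to medium
Risk level: medium
Regulatory requirements: high

Deployment scale:

def deploy_finance_trading_agent(config: FinanceTradingConfig):
    """
    Deployment configuration for the financial trading agent
    """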
    config = {
        "p_t_e": {
            "planner": {
                "model": "claude-3-opus-4.6",
                "timeout": 60,
                "risk_level": "high"
            },
            "executor": {
                "model": "claude-3-sonnet-4.6",
                "tools": ["trading_api", "risk_assessment", "regulatory_tools"]
            },
            "verifier": {
                "model": "claude-3-haiku-1.5",
                "threshold": 0.99,
                "human_confirmation": True
            }
        },
        "human_intervention": {
            "threshold": 0.05,  # 5% threshold (financial scenarios)
            "confirmation_required": True  # every trade needs confirmation
        },
        "monitoring": {
            "planning_success_rate": 0.98,
            "execution_success_rate": 0.99,
            "human_intervention_rate": 0.05,
            "average_response_time": 10.0,
            "transaction_volume": "medium"
        }
    }

    return config

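Scenario 3: Code generation agent

Deployment pattern:

P-t-E: the Planner plans the code generation tasks, the Executor generates code, the Verifier checks it
Framework: LangGraph (state management, replanning)
Tools: code repository, test framework, sandbox

Deployment boundaries:

Codebase size: small to medium projects
Tech stack: Python, JavaScript, Go
Not suitable for: large systems, embedded systems

Deployment scale:

def deploy_code_generation_agent(config: CodeGenerationConfig):
    """
    Deployment configuration for the code generation agent
    """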
    config = {
        "p_t_e": {
            "planner": {
                "model": "claude-3-opus-4.6",
                "timeout": 120,
                "code_scope": "medium"
            },
            "executor": {
                "model": "claude-3-sonnet-4.6",
                "tools": ["code_library", "test_framework", "sandbox"]
            },
            "verifier": {
                "model": "claude-3-haiku-1.5",
                "threshold": 0.90,
                "code_quality_check": True
            }
        },
        "human_intervention": {
            "threshold": 0.10,  # 10% threshold
            "code_review": True  # generated code must be reviewed
        },
        "monitoring": {
            "planning_success_rate": 0.95,
            "execution_success_rate": 0.90,
            "human_intervention_rate": 0.10,
            "average_response_time": 15.0,
            "code_quality_score": 0.90
        }
    }

    return config

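Summary: Key Points for a Production-Grade P-t-E Implementation

Key points

Control-flow integrity: a clear Planner → Executor → Verifier separation
Security: countermeasures against indirect prompt injection
Permission management: least privilege, task-scoped tool access
Defense in depth: Human-in-the-Loop, dynamic replanning, sandboxed execution
Framework choice: LangGraph (complex state), CrewAI (simple scenarios), AutoGen (high scalability)

Implementation checklist

Pre-deployment checks:

[ ] Does the Planner decompose tasks correctly?
[ ] Are the Executor's tool permissions correct?
[ ] Does the Verifier validate results correctly?
[ ] Is the human intervention threshold reasonable?
[ ] Are monitoring metrics configured?

Test checks:

[ ] Planning success rate > 95%?
[ ] Execution success rate > 95%?
[ ] Human intervention rate < 15%?
[ ] Average response time < 15 seconds?
[ ] Cost-benefit ratio > 2?

Production checks:

[ ] Is error handling complete?
[ ] Is logging complete?
[ ] Are monitoring alerts configured?
[ ] Has the rollback strategy been tested?
[ ] Has the human intervention process been tested?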

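Next Steps

Immediate actions:

Choose a framework: LangGraph (complex state), CrewAI (simple scenarios), AutoGen (high scalability)
Deploy the P-t-E architecture: Planner → Executor → Verifier
Implement security: control-flow integrity, permission management, sandboxed execution
Configure monitoring: planning success rate, execution success rate, human intervention rate

Short term (1-2 weeks):

Deploy customer support automation (P-t-E + LangGraph)
Monitor: planning success rate, execution success rate, human intervention rate
Tune: human intervention threshold, response time, cost-benefit ratio

Medium term (1-2 months):

Expand: financial trading agent, code generation agent
Tune: dynamic replanning, Human-in-the-Loop
Security: policy update mechanism, Escalation Agent

Long term (3-6 months):

Expand: multi-agent coordination, Multi-Agent + P-t-E
Tune: automated monitoring, automated optimization
Security: automated policy updates, automated escalation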

Sources:

arXiv:2509.08646 - Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations
Medium - First hand comparison of LangGraph, CrewAI and AutoGen
Sprinklr Blog - How to Improve Customer Service ROI with AI
GetMaxim AI - The Ultimate Checklist for Rapidly Deploying AI Agents in Production
Galileo AI - Production Readiness Checklist for Every AI Agent

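Key metrics:

Planning success rate > 95%: excellent
Execution success rate > 95%: excellent
Human intervention rate < 15%: acceptable
Average response time < 15 seconds: good
Cost-benefit ratio > 2: good

Recommended reading:

LangGraph documentation: https://langchain-ai.github.io/langgraph/
CrewAI documentation: https://docs.crewai.com/
AutoGen documentation: https://microsoft.github.io/autogen/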

Date: April 13, 2026 | Category: Cheese Evolution | Reading time: 25 minutes

Introduction: The Architectural Shift from ReAct to P-t-E

AI agent development in 2026 is going through a fundamental architectural shift: from the reactive ReAct (Reason + Act) pattern to the proactive P-t-E (Plan-then-Execute) pattern.

Limitations of ReAct:

Planning and execution are interleaved in the same LLM call
Control-flow integrity is hard to establish
Vulnerable to indirect prompt injection
Poor predictability; errors propagate quickly

Advantages of P-t-E:

Control-flow integrity: an explicit Planner → Executor separation
Security: an indirect prompt injection has to compromise both components to succeed
Predictability: planning and execution are separate phases, so the scope of an error is clear
Cost efficiency: a high-capability model for planning, smaller specialized models for execution

Core Architecture: The Three-Layer P-t-E Model

1. Planner Layer: Strategic Planning

Responsibilities:

Understand the overall goal
Analyze task complexity
Break the goal down into executable steps
Plan the execution order

Implementation:

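from typing import Dict, List

class PlannerAgent:
    def plan(self, goal: str, context: Dict) -> List[Task]:
        """
        Generate an executable task list.
        - Task granularity: each task < 100 tokens
        - Task dependencies: a DAG, topologically sorted
        - Execution order: high-priority tasks with few dependencies first
        """
        prompt = f"""
        Goal: {goal}
        Context: {context}

        Planning principles:
        1. Each task is < 100 tokens and can be completed by a single LLM call
        2. Dependencies between tasks are explicit
        3. High-priority tasks (security, validation) come first
        4. Low-priority tasks (lookup, formatting) can be deferred
        """
        # llm.generate returns raw text. parse_tasks is a hypothetical helper
        # that converts it into Task objects to match the return type.
        return parse_tasks(llm.generate(prompt))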

Security Considerations:

Planner can only generate “what to do” plans, but cannot generate “how to do” details
Planner output needs to be checked by Verifier before it can be handed over to Executor
Planner cannot access sensitive tools (database, API Key)

2. Executor layer: tactical execution

Responsibilities:

Execute tasks planned by Planner
Handle tool calls
Collect results
Feedback to Planner (if re-planning is needed)

Implementation Key:

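class ExecutorAgent:
    def execute(self, task: Task) -> Result:
        """
        Execute a single task.
        - Tool-call limit: at most 5 retries
        - Timeout: each tool call < 30 seconds
        - Error handling: on failure, report back to the Planner
        """
        if not task.tool_call:
            # Nothing to run. Assumes Result accepts output/error keywords.
            return Result(output=None, error=None)
        result = self.call_tool(task.tool_call)
        if result.error:
            raise ExecutionError(result.error)
        return result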

Tool Safety Principles:

Principle of Least Privilege: Each Agent can only access necessary tools
Scope restriction: Tool call results are only returned to Planner and are not directly exposed to users
Sandbox Execution: Dangerous operations (code execution, network access) are run in an isolated environment

3. Verifier layer: verification and rollback

Responsibilities:

Check results after the Executor finishes
Confirm that each result matches what the plan expected
Flag tasks that need replanning

Implementation Key:

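class VerifierAgent:
    def verify(self, task: Task, result: Result) -> bool:
        """
        Check whether a result meets expectations.
        - Check: is the return format correct?
        - Check: was the task goal achieved?
        - Check: are there unintended side effects?
        """
        verification_prompt = f"""
        Task: {task.description}
        Expected output format: {task.expected_format}
        Actual output: {result.output}

        Questions:
        - Does the output match the expected format?
        - Was the task goal achieved?
        - Are there unintended side effects?

        Start your answer with PASS or FAIL.
        """
        answer = llm.generate(verification_prompt)
        # The model replies in free text; only an explicit PASS counts as success.
        return answer.strip().upper().startswith("PASS")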

Rollback Strategy:

Task fails → mark it as "needs replanning"
Report to Planner → Planner reanalyzes → Generate new plan
Plan failure rate > 30% → trigger manual intervention

P-t-E Implementations Compared Across Three Frameworks

LangGraph (graph workflow)

Advantages:

State Management: Built-in TypedDict State supports complex states
Re-planning: The state diagram supports dynamic re-planning
DAG support: natural expression of task dependencies

Disadvantages:

Complex state definition: all state fields need to be defined explicitly at the beginning
Debugging Difficulties: Graphical execution is difficult to trace
Learning Curve: Need to understand the concepts of Graph, Node, and Edge

P-t-E implementation:

from typing import List, TypedDict

from langgraph.graph import StateGraph, START, END

class AgentState(TypedDict):
    goal: str
    plan: List[Task]
    current_task: Task
    results: List[Result]
    need_replan: bool

def planner(state: AgentState) -> AgentState:
    plan = plan_agent.generate(state["goal"])
    # Select the first task so the executor node has something to run.
    return {**state, "plan": plan, "current_task": plan[0], "need_replan": False}

def executor(state: AgentState) -> AgentState:
    result = executor_agent.execute(state["current_task"])
    return {**state, "results": state["results"] + [result]}

def verifier(state: AgentState) -> AgentState:
    verified = verifier_agent.verify(state["current_task"], state["results"][-1])
    if not verified:
        return {**state, "need_replan": True}
    return {**state, "need_replan": False}

builder = StateGraph(AgentState)
builder.add_node("planner", planner)
builder.add_node("executor", executor)
builder.add_node("verifier", verifier)
builder.add_edge(START, "planner")
builder.add_edge("planner", "executor")
builder.add_edge("executor", "verifier")
builder.add_conditional_edges("verifier",
    lambda s: "replan" if s["need_replan"] else "end",
    {"replan": "planner", "end": END})
graph = builder.compile()

CrewAI (role-based Agent)

Advantages:

Role-based design: Agent is an employee with clear responsibilities
Tool scope declaration: supports declarative tool access control
Logging: built-in logging support

Disadvantages:

Coarse logs: the built-in logs are not detailed enough for debugging
State management: coordination between agents has to be managed manually
Hard to optimize complex systems: the logging gaps make tuning difficult

P-t-E implementation:

from crewai import Agent, Task, Crew

# CrewAI agents take a role, a goal, and a backstory.
planner_agent = Agent(
    role="Strategic Planner",
    goal="Create a plan to achieve the user's goal",
    backstory="Breaks goals into ordered tasks and never calls tools.",
    tools=[],
    verbose=False
)

executor_agent = Agent(
    role="Task Executor",
    goal="Execute tasks according to the plan",
    backstory="Carries out one planned task at a time with approved tools.",
    tools=[search_tool, calculator_tool],
    verbose=False
)

verifier_agent = Agent(
    role="Result Verifier",
    goal="Verify that the result meets expectations",
    backstory="Checks results against the plan and reports pass or fail.",
    tools=[],
    verbose=False
)

# Planner task
planner_task = Task(
    description="Create a plan for {goal}",
    expected_output="A list of tasks with dependencies",
    agent=planner_agent
)

# Executor task
executor_task = Task(
    description="Execute task {task}",
    expected_output="Task result",
    agent=executor_agent,
    tools=[search_tool, calculator_tool]
)

# Verifier task
verifier_task = Task(
    description="Verify that result meets expectations",
    expected_output="Verification result (pass/fail)",
    agent=verifier_agent
)

crew = Crew(
    agents=[planner_agent, executor_agent, verifier_agent],
    tasks=[planner_task, executor_task, verifier_task],
    verbose=True
)

AutoGen (Programmatic Coordination)

Advantages:

Procedural Control: clear code control and coordination process
Scalability: Support complex workflows
Tool Support: Built-in powerful tool calling support

Disadvantages:

Code readability: readability decreases when coordination logic is complex
Initialization time: requires a long setup time
State Management: Requires manual management of messaging between Agents

P-t-E implementation:

from autogen import AssistantAgent, UserProxyAgent

# Planner Agent
planner = AssistantAgent(
    name="Planner",
    system_message="You are a strategic planner. Create a plan for the user's goal.",
    llm_config=planner_llm_config
)

# Executor Agent
executor = AssistantAgent(
    name="Executor",
    system_message="You are a task executor. Execute tasks according to the plan.",
    llm_config=executor_llm_config,
    human_input_mode="NEVER"
)

# Verifier Agent
verifier = AssistantAgent(
    name="Verifier",
    system_message="You are a result verifier. Verify that the result meets expectations.",
    llm_config=verifier_llm_config
)

# User Agent (orchestrator)
user_proxy = UserProxyAgent(
    name="User",
    code_execution_config={"use_docker": True},
    human_input_mode="TERMINATE"
)

# P-t-E orchestration (AutoGen 0.2-style API; newer releases differ)
def pte_workflow(user_goal: str, max_replans: int = 3) -> str:
    feedback = ""
    for _ in range(max_replans + 1):
        # 1. Planner produces a plan
        plan = user_proxy.initiate_chat(
            planner, message=f"Plan for: {user_goal}{feedback}", max_turns=1).summary

        # 2. Executor carries it out
        result = user_proxy.initiate_chat(
            executor, message=f"Execute: {plan}", max_turns=1).summary

        # 3. Verifier checks the result
        verdict = user_proxy.initiate_chat(
            verifier, message=f"Verify (answer PASS or FAIL): {result}", max_turns=1).summary
        if "PASS" in verdict.upper():
            return result

        # 4. Replanning needed: feed the verdict back to the Planner
        feedback = f"\nPrevious attempt failed verification: {verdict}"
    raise RuntimeError("Replanning limit reached")

Security and Defense in Depth Strategy

Indirect Prompt Injection attack protection

Attack Vector:

Planner receives user input containing malicious prompts
Planner passes the prompt to Executor
Executor executes malicious instructions

Protective Measures:

1. Control flow integrity

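import re

class SecurityError(Exception):
    """Raised when a message violates the Planner/Executor contract."""

def enforce_control_flow_integrity(planner_output: str, executor_input: str) -> None:
    """
    Ensure Planner output contains only the "what", never the "how".
    - Planner output check: only a task list is allowed
    - Executor input check: only a task object is allowed
    - Verifier input check: only a Result object is allowed
    """
    # One numbered task per line, e.g. 1. {"action": "search"}
    allowed_planner_output = re.compile(r'^\d+\.\s+\{.*"action".*\}$')
    allowed_executor_input = re.compile(r'^\{.*"action".*\}$')

    for line in planner_output.strip().splitlines():
        if not allowed_planner_output.match(line.strip()):
            raise SecurityError("Planner output is not a well-formed task list")

    if not allowed_executor_input.match(executor_input.strip()):
        raise SecurityError("Executor input is not a well-formed task object")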

2. Least privilege

class ToolRegistry:
    def __init__(self):
        self.tools = {
            "search": {"level": "read", "scope": "public"},
            "database": {"level": "write", "scope": "user_data"},
            "execute_code": {"level": "admin", "scope": "sandbox"},
            "api_call": {"level": "admin", "scope": "whitelisted"}
        }

    def check_permission(self, tool_name, agent_role):
        """
        檢查 Agent 是否有權限訪問該工具
        - Agent 角色決定權限等級
        - 工具屬性決定作用域
        """
        agent_level = self.get_agent_level(agent_role)
        tool = self.tools[tool_name]

        if agent_level > tool["level"]:
            raise PermissionError(f"Agent {agent_role} lacks permission for {tool_name}")
        if tool["scope"] not in self.get_agent_scope(agent_role):
            raise ScopeError(f"Agent {agent_role} lacks scope for {tool_name}")

3. Task-scoped tool access

def scoped_tool_access(task: Task, agent: Agent):
    """
    Each task may only access specific tools.
    - The task type determines the tool set
    - The agent's role determines which tools are available
    """
    allowed_tools = get_tools_for_task_type(task.type)
    agent_tools = get_tools_for_agent_role(agent.role)

    for tool_call in task.tools:
        # A tool must be allowed by BOTH the task type and the agent role
        if tool_call not in allowed_tools or tool_call not in agent_tools:
            raise ScopeError(f"Tool {tool_call} not allowed for task {task.id} and agent {agent.role}")
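
The rule above amounts to a set intersection: a tool is usable only if both the task type and the agent role allow it. A self-contained sketch, assuming hypothetical mapping tables (`TOOLS_BY_TASK_TYPE`, `TOOLS_BY_AGENT_ROLE`) and a simplified `Task` type; the tool names are borrowed from the customer support case later in this article:

```python
from dataclasses import dataclass, field

# Hypothetical mappings; a real system would load these from configuration.
TOOLS_BY_TASK_TYPE = {
    "lookup": {"knowledge_base", "faq_search"},
    "handoff": {"escalation"},
}
TOOLS_BY_AGENT_ROLE = {
    "planner": {"knowledge_base", "faq_search"},
    "executor": {"knowledge_base", "faq_search", "escalation"},
}


@dataclass
class Task:
    id: str
    type: str
    tools: list[str] = field(default_factory=list)


def effective_tools(task_type: str, agent_role: str) -> set[str]:
    """Tools usable for this task by this agent: the intersection of both sets."""
    by_task = TOOLS_BY_TASK_TYPE.get(task_type, set())
    by_role = TOOLS_BY_AGENT_ROLE.get(agent_role, set())
    return by_task & by_role


def denied_tools(task: Task, agent_role: str) -> list[str]:
    """Return the requested tools that fall outside the effective set."""
    allowed = effective_tools(task.type, agent_role)
    return [tool for tool in task.tools if tool not in allowed]
```

Unknown task types or roles resolve to an empty set, so the default is deny rather than allow.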

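4. Sandboxed code execution

class SandboxExecutor:
    def execute_code(self, code: str) -> str:
        """
        Execute code inside a sandbox.
        - Only a subset of the Python standard library is importable
        - No network access
        - No filesystem access
        - Timeout: 5 seconds
        RestrictedEnvironment stands for whatever sandbox backend is in use.
        """
        sandbox = RestrictedEnvironment(
            # "os" and "sys" are left out: they expose the filesystem and
            # the process, which would contradict disallow_filesystem below
            allowed_modules=["re", "json", "math"],
            disallow_network=True,
            disallow_filesystem=True,
            timeout=5
        )

        try:
            return sandbox.execute(code)
        except TimeoutError:
            raise ExecutionError("Code execution timeout")
        except Exception as e:
            raise ExecutionError(f"Code execution failed: {e}") from e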
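
The timeout and error-wrapping part of this pattern can be illustrated with the standard library alone. The sketch below runs code in a separate interpreter process with a hard time limit. It is an assumption-laden illustration, not a sandbox: a child process by itself does not block network or filesystem access, which in practice requires OS-level isolation such as containers. `run_with_timeout` is a hypothetical helper name.

```python
import subprocess
import sys


class ExecutionError(Exception):
    """Raised when the child process fails or exceeds its time budget."""


def run_with_timeout(code: str, timeout_s: float = 5.0) -> str:
    """Run code in a separate interpreter and return its stdout."""
    try:
        completed = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode (ignores env vars and user site)
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when the limit is hit
        )
    except subprocess.TimeoutExpired as exc:
        raise ExecutionError("Code execution timeout") from exc

    if completed.returncode != 0:
        raise ExecutionError(f"Code execution failed: {completed.stderr.strip()}")
    return completed.stdout
```

Running the code out of process means a crash or infinite loop in generated code cannot take the Executor down with it; the isolation guarantees listed above still have to come from the surrounding environment.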

Defense in Depth Strategy

1. Human-in-the-Loop (HITL) Verification

def human_in_the_loop_verification(state: AgentState):
    """
    Require human confirmation at critical decision points.
    - Task types: payment, delete, and deploy operations
    - Timing: after the Executor finishes, before the result is returned
    """
    critical_tasks = ["payment", "delete", "deploy"]

    if state.current_task.type in critical_tasks:
        prompt = f"""
        Please confirm the following operation:
        Task: {state.current_task.description}
        Expected output: {state.current_task.expected_output}
        Actual output: {state.results[-1].output}

        Confirm (yes/no)?
        """
        human_confirmation = input(prompt)

        # Anything other than an explicit "yes" blocks the task
        if human_confirmation.strip().lower() != "yes":
            raise HumanInterventionRequired("Operator did not confirm the task")
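
Calling `input()` directly ties the approval step to a terminal and makes it hard to test. One possible refinement, sketched here under assumptions, is to inject the confirmation channel as a callable. `require_approval`, `ApprovalDenied`, and the `confirm` parameter are hypothetical names; the critical task types are the ones listed above.

```python
from typing import Callable

CRITICAL_TASK_TYPES = {"payment", "delete", "deploy"}


class ApprovalDenied(Exception):
    """Raised when the reviewer does not approve a critical task."""


def require_approval(task_type: str, summary: str,
                     confirm: Callable[[str], str] = input) -> bool:
    """Return True if the task may proceed; raise ApprovalDenied otherwise.

    Non-critical tasks pass without prompting. `confirm` receives the prompt
    text and returns the reviewer's answer, so it can be swapped for a chat
    or ticketing integration, or for a stub in tests.
    """
    if task_type not in CRITICAL_TASK_TYPES:
        return True
    answer = confirm(f"Approve {task_type} task? {summary} (yes/no): ")
    if answer.strip().lower() != "yes":
        raise ApprovalDenied(f"{task_type} task was not approved")
    return True
```

For example, `require_approval("payment", "refund order", confirm=lambda _: "yes")` proceeds without touching the terminal, which keeps the gate unit-testable.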

2. Dynamic re-planning loop

def dynamic_replanning_loop(state: AgentState, max_iterations: int = 3):
    """
    Support dynamic re-planning.
    - Task failure → report back to the Planner
    - Planner re-analyzes → generates a new plan
    - Maximum iterations: 3 (prevents infinite loops)
    """
    iteration = 0

    while iteration < max_iterations and state.need_replan:
        iteration += 1

        # Report the failure to the Planner
        planner.send(f"Task {state.current_task.id} failed. Please replan.")

        # Planner generates a new plan
        new_plan = planner.generate(state.goal)

        # Select the next task
        state.current_task = select_next_task(new_plan, state.current_task)

        # Execute
        result = executor.execute(state.current_task)

        # Verify; replan again only if verification fails
        state.need_replan = not verifier.verify(state.current_task, result)

    # Raise only if the last attempt still failed. Checking the iteration
    # count instead would also raise when the final attempt succeeded.
    if state.need_replan:
        raise MaxIterationsExceeded(f"Max replanning iterations ({max_iterations}) reached")
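
The same bounded loop can be written without shared state by passing the three roles in as callables. This is a minimal sketch, not the article's implementation: `run_with_replanning` and its parameters are hypothetical, and unlike the loop above it counts the first attempt against the iteration budget.

```python
from typing import Callable, Optional


class MaxIterationsExceeded(Exception):
    """Raised when every attempt within the budget fails verification."""


def run_with_replanning(
    plan: Callable[[Optional[str]], str],
    execute: Callable[[str], str],
    verify: Callable[[str], bool],
    max_iterations: int = 3,
) -> str:
    """Plan, execute, verify; feed the failed result back on each retry."""
    feedback: Optional[str] = None
    for _ in range(max_iterations):
        task = plan(feedback)  # feedback is None on the first attempt
        result = execute(task)
        if verify(result):
            return result
        feedback = f"Previous attempt failed verification: {result}"
    raise MaxIterationsExceeded(f"Max replanning iterations ({max_iterations}) reached")
```

Passing the failed result back as `feedback` gives the Planner something concrete to react to, rather than asking it to replan from the original goal alone.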

3. DAG parallel execution

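from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

def parallel_execution_dag(dag: DAG) -> List[Tuple[Task, Result]]:
    """
    Execute a DAG with parallelism.
    - Tasks without pending dependencies run in parallel
    - Dependent tasks wait for their prerequisites to finish
    - Resource limit: at most 4 tasks in parallel
    """
    results = []
    pending_tasks = dag.get_ready_tasks()

    with ThreadPoolExecutor(max_workers=4) as pool:  # resource limit
        # Loop until no task is ready. Also testing `results` here would
        # never terminate once the first batch had completed.
        while pending_tasks:
            batch = pending_tasks[:4]

            # Run the ready tasks concurrently
            batch_results = list(pool.map(executor.execute, batch))

            for task, result in zip(batch, batch_results):
                # Task finished: update the DAG so dependents become ready
                dag.update(task.id, result)
                results.append((task, result))

            # Fetch the next batch of ready tasks
            pending_tasks = dag.get_ready_tasks()

    return results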
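
For a version that runs as-is, Python's standard library already provides the ready-set bookkeeping through `graphlib.TopologicalSorter` (Python 3.9+). The sketch below assumes tasks are identified by strings and that `dependencies` maps each task to the set of tasks it depends on; `run_dag` and `run_task` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_dag(
    dependencies: Dict[str, Set[str]],
    run_task: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run tasks batch by batch; a batch only holds tasks whose deps are done."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results: Dict[str, str] = {}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = list(sorter.get_ready())
            # pool.map preserves input order, so zip pairs each task with its result
            for task_id, result in zip(ready, pool.map(run_task, ready)):
                results[task_id] = result
                sorter.done(task_id)  # unlocks tasks that depended on this one
    return results
```

The thread pool caps concurrency at `max_workers` even when more tasks are ready, which matches the limit of 4 parallel tasks described above.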

Practical Case: Customer Support Automation ROI Analysis

Case background

Deployment Scenario:

Number of users: 1 million DAU
Support channels: phone, email, live chat
Current human support: 50 people/shift

P-t-E architecture deployment:

# Planner Agent: plans the support flow
planner = PlannerAgent(
    tools=["knowledge_base", "faq_search"]
)

# Executor Agent: runs the queries
executor = ExecutorAgent(
    tools=["knowledge_base", "faq_search", "escalation"]
)

# Verifier Agent: validates the answer
verifier = VerifierAgent(
    tools=["quality_check"]
)

# P-t-E workflow
def customer_support_pipeline(user_query: str) -> str:
    # 1. Planner plans
    plan = planner.plan(user_query)

    # 2. Executor executes
    result = executor.execute(plan)

    # 3. Verifier validates
    verified = verifier.verify(result)

    # 4. If verification fails, replan once and retry
    if not verified:
        plan = planner.replan(user_query)
        result = executor.execute(plan)
        verified = verifier.verify(result)

    # Note: the result is returned even if the retry also fails verification;
    # a production pipeline should escalate to a human in that case.
    return result

ROI Analysis

Cost Savings:

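Labor costs: 50 people × $15/hour × 8 hours × 365 days = $2,190,000/year (one 8-hour shift per day)
AI Agent cost: $365,000/year as reported. This figure does not follow from the stated inputs: $0.10/query × 1 million DAU × 5 queries/day is $500,000/day, about $182.5 million/year. $365,000/year corresponds to roughly 10,000 queries/day at $0.10/query.
Savings: $1,825,000/year (based on the reported $365,000 AI cost)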
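
ROI figures like these are easiest to trust when they can be recomputed from their inputs. The sketch below does that with the numbers quoted above; the function names are illustrative and not from the original analysis.

```python
def annual_labor_cost(agents: int, hourly_rate: float,
                      hours_per_day: float, days: int = 365) -> float:
    """Yearly cost of a human support shift."""
    return agents * hourly_rate * hours_per_day * days


def annual_ai_cost(cost_per_query: float, queries_per_day: float,
                   days: int = 365) -> float:
    """Yearly cost of answering queries with the agent."""
    return cost_per_query * queries_per_day * days


def implied_queries_per_day(annual_cost: float, cost_per_query: float,
                            days: int = 365) -> float:
    """Daily query volume that a given yearly AI budget would cover."""
    return annual_cost / days / cost_per_query


labor = annual_labor_cost(agents=50, hourly_rate=15, hours_per_day=8)
ai_from_inputs = annual_ai_cost(cost_per_query=0.10, queries_per_day=1_000_000 * 5)
volume_for_reported_cost = implied_queries_per_day(365_000, cost_per_query=0.10)

print(f"Labor: ${labor:,.0f}/year")
print(f"AI cost from stated inputs: ${ai_from_inputs:,.0f}/year")
print(f"Queries/day implied by $365,000/year: {volume_for_reported_cost:,.0f}")
```

The labor figure reproduces the $2,190,000/year in the text. The AI cost depends almost entirely on daily query volume, so that input is worth pinning down before relying on the savings estimate.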

Quality Improvement:

Response Rate: 98% (up from 85%)
CSAT: +40% (from 3.5/5 to 4.9/5)
Average response time: reduced from 30 seconds to 5 seconds

Failure Mode Analysis:

Complex problems: planning failure rate of 15% (requires manual intervention)
Policy updates: planning may rely on outdated knowledge (the Planner needs regular updates)
Novel problems: unknown problem types cannot be planned (requires an Escalation Agent)

Risk and Protection

1. Complex problem handling

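def handle_complex_query(query: str):
    """
    Escalate complex queries that need human intervention. A query is
    escalated only when all three conditions hold:
    - length > 500 tokens
    - more than 3 topics
    - context complexity > 10
    Returns None when the query can stay in the automated pipeline.
    """
    if (
        query_length(query) > 500
        and query_topics(query) > 3
        and context_complexity(query) > 10
    ):
        return escalate_to_human()
    return None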

2. Policy update mechanism

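class KnowledgeUpdater:
    def update_planner_knowledge(self, new_policy: str) -> bool:
        """
        Update the Planner's knowledge base.
        - Cadence: weekly
        - Method: human review, then Planner update
        - Validation: the Verifier confirms the new knowledge is correct
        """
        # 1. Human review of the new policy
        human_review = self.human_review(new_policy)
        if not human_review.approved:
            return False  # rejected policies are never applied

        # 2. Update the Planner's knowledge base
        self.planner.update_knowledge(new_policy)

        # 3. Validate the new knowledge with a probe query
        test_query = "Please explain the new policy"
        result = self.verifier.verify(test_query, self.executor.execute(test_query))

        if result.verified:
            return True
        return self.rollback_update()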

Design Decisions and Tradeoffs

Trade-off 1: P-t-E vs ReAct

P-t-E Advantages:

Control flow integrity: clear Planner → Executor separation
Security: Indirect Prompt Injection attack requires breaking two components at the same time
Predictability: clear error scope and easy debugging

P-t-E Costs:

Complexity: three components (Planner, Executor, Verifier) need to be managed
Latency: communication overhead between components
Learning curve: need to understand the responsibilities of the three roles

ReAct Advantages:

Simplicity: planning and acting share a single LLM loop
Latency: no additional communication overhead

ReAct Costs:

Security: more exposed to prompt injection, since planning and tool handling are not separated
Predictability: errors spread quickly and are difficult to trace

Production Environment Recommendations:

P-t-E: the preferred choice, especially in security-sensitive and high-reliability scenarios
ReAct: only suitable for non-sensitive scenarios with low reliability requirements

Trade-off 2: LangGraph vs CrewAI vs AutoGen

Selection Criteria:

Criterion	LangGraph	CrewAI	AutoGen
State management	✅ Built-in (TypedDict)	❌ Manual	⚠️ Manual
Logging support	⚠️ Requires configuration	❌ Hard to debug	⚠️ Requires configuration
Tool scoping	✅ Declarative	✅ Declarative	✅ Built-in
Sandbox support	⚠️ Requires configuration	✅ Built-in	✅ Built-in (Docker)
Learning curve	Medium	Low	High
Performance	✅ High	⚠️ Medium	✅ High
Debugging	⚠️ Hard	❌ Very hard	⚠️ Configurable

Production Environment Recommendations:

LangGraph: the preferred choice, especially for complex state and scenarios that require re-planning
CrewAI: suitable for simple scenarios and rapid prototyping
AutoGen: suitable for scenarios that need high scalability and programmatic control

Trade-off 3: P-t-E vs other patterns (such as Multi-Agent)

Multi-Agent pattern:

Multiple dedicated Agents coordinate to complete tasks
Advantages: specialization, scalability
Disadvantages: high complexity, large communication overhead

P-t-E pattern:

Planner → Executor → Verifier
Advantages: simple and easy to understand
Disadvantages: a single Planner may become a bottleneck

Production Environment Recommendations:

P-t-E: the preferred choice, especially as a starting point
Multi-Agent: suitable for large and complex scenarios

Key Metrics

1. Planning Success Rate

def calculate_planning_success_rate(state_history: List[AgentState]) -> float:
    """
    Planning success rate = (successful plans) / (total plans) × 100%

    Thresholds:
    - > 95%: excellent
    - 80-95%: good
    - < 80%: needs improvement
    """
    if not state_history:
        return 0.0  # no data yet; avoids division by zero

    successful_plans = sum(1 for state in state_history if state.planning_success)
    total_plans = len(state_history)

    return (successful_plans / total_plans) * 100

2. Execution Success Rate

def calculate_execution_success_rate(state_history: List[AgentState]) -> float:
    """
    Execution success rate = (successful executions) / (total executions) × 100%

    Thresholds:
    - > 95%: excellent
    - 80-95%: good
    - < 80%: needs improvement
    """
    if not state_history:
        return 0.0  # no data yet; avoids division by zero

    successful_executions = sum(1 for state in state_history if state.execution_success)
    total_executions = len(state_history)

    return (successful_executions / total_executions) * 100

3. Human Intervention Rate

def calculate_human_intervention_rate(state_history: List[AgentState]) -> float:
    """
    Human intervention rate = (human interventions) / (total executions) × 100%

    Thresholds:
    - < 5%: excellent
    - 5-15%: acceptable
    - > 15%: needs improvement
    """
    if not state_history:
        return 0.0  # no data yet; avoids division by zero

    human_interventions = sum(1 for state in state_history if state.human_intervention)
    total_executions = len(state_history)

    return (human_interventions / total_executions) * 100

4. Average Response Time

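def calculate_average_response_time(state_history: List[AgentState]) -> float:
    """
    Average response time = (total response time) / (total requests)

    Thresholds:
    - < 5 seconds: excellent (real-time scenarios)
    - 5-15 seconds: good
    - > 15 seconds: needs improvement
    """
    if not state_history:
        return 0.0  # no data yet; avoids division by zero

    total_time = sum(state.response_time for state in state_history)
    total_requests = len(state_history)

    return total_time / total_requests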

5. Cost-Benefit Ratio

def calculate_cost_benefit_ratio(state_history: List[AgentState]) -> float:
    """
    Cost-benefit ratio = (human cost avoided) / (AI Agent operating cost)

    Thresholds:
    - > 5: excellent
    - 2-5: good
    - < 2: needs improvement
    """
    human_cost = sum(state.human_cost for state in state_history)
    ai_cost = sum(state.ai_cost for state in state_history)

    if ai_cost == 0:
        raise ValueError("AI cost is zero; the ratio is undefined")

    return human_cost / ai_cost
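
The threshold bands in the docstrings above can be applied mechanically. A small sketch, assuming a hypothetical helper `rate_metric` and using "acceptable" for the middle band (the docstrings call it "good" for some metrics):

```python
def rate_metric(value: float, excellent: float, acceptable: float,
                higher_is_better: bool = True) -> str:
    """Map a metric value to 'excellent', 'acceptable', or 'needs improvement'.

    `excellent` is the bound a value must beat to be excellent; `acceptable`
    is the bound of the middle band (inclusive).
    """
    if not higher_is_better:
        # Flip signs so the same comparisons work for lower-is-better metrics
        value, excellent, acceptable = -value, -excellent, -acceptable
    if value > excellent:
        return "excellent"
    if value >= acceptable:
        return "acceptable"
    return "needs improvement"
```

For instance, a planning success rate would be rated with `rate_metric(rate, excellent=95, acceptable=80)`, and a human intervention rate with `rate_metric(rate, excellent=5, acceptable=15, higher_is_better=False)`.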

Deployment scenario

Scenario 1: Customer Support Automation

Deployment Mode:

P-t-E: Planner plans the query process, Executor executes the query, and Verifier verifies the answer
Framework: LangGraph (state management)
Tools: knowledge base, FAQ, manual intervention

Deployment Boundary:

Scale: 100,000 to 1 million DAU
Question Type: Inquiry, FAQ, Simple Consultation
Not suitable: complex consultation, policy explanation

Deployment configuration:

def deploy_customer_support() -> dict:
    """
    Deployment configuration for customer support automation.
    """
    # Model identifiers are illustrative placeholders; substitute the IDs
    # your provider actually exposes.
    config = {
        "p_t_e": {
            "planner": {
                "model": "claude-3-opus-4.6",
                "timeout": 30
            },
            "executor": {
                "model": "claude-3-sonnet-4.6",
                "tools": ["knowledge_base", "faq_search"]
            },
            "verifier": {
                "model": "claude-3-haiku-1.5",
                "threshold": 0.95
            }
        },
        "human_intervention": {
            "threshold": 0.15,  # 15% threshold
            "escalation": {
                "complex_query": {"length": 500, "topics": 3},
                "policy_question": {"requires_confirmation": True}
            }
        },
        "monitoring": {
            "planning_success_rate": 0.95,
            "execution_success_rate": 0.95,
            "human_intervention_rate": 0.15,
            "average_response_time": 5.0
        }
    }

    return config
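
The `monitoring` block is only useful if something compares live metrics against it. A minimal sketch of such a check follows; `find_breaches` is a hypothetical helper, the target values are copied from the customer support configuration above, and which metrics count as lower-is-better is an assumption.

```python
MONITORING_TARGETS = {
    "planning_success_rate": 0.95,
    "execution_success_rate": 0.95,
    "human_intervention_rate": 0.15,
    "average_response_time": 5.0,
}
# Assumption: these two are upper bounds; the remaining metrics are minimums.
LOWER_IS_BETTER = {"human_intervention_rate", "average_response_time"}


def find_breaches(observed: dict[str, float],
                  targets: dict[str, float] = MONITORING_TARGETS
                  ) -> dict[str, tuple[float, float]]:
    """Return {metric: (observed, target)} for every metric that misses its target."""
    breaches = {}
    for name, target in targets.items():
        if name not in observed:
            continue  # metric not reported; skipped rather than treated as a breach
        value = observed[name]
        missed = value > target if name in LOWER_IS_BETTER else value < target
        if missed:
            breaches[name] = (value, target)
    return breaches
```

A non-empty result would typically feed an alert or trigger a review of the human intervention threshold.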

Scenario 2: Financial Transaction Agent

Deployment Mode:

P-t-E: Planner plans trading strategies, Executor executes transactions, and Verifier verifies transaction results.
Framework: AutoGen (programmatic control)
Tools: trading API, risk assessment, regulatory tools

Deployment Boundary:

Trading Volume: Low to Moderate
Risk Level: Medium Risk
Regulatory Requirements: High

Deployment configuration:

def deploy_finance_trading_agent() -> dict:
    """
    Deployment configuration for the financial transaction agent.
    """
    # Model identifiers are illustrative placeholders; substitute the IDs
    # your provider actually exposes.
    config = {
        "p_t_e": {
            "planner": {
                "model": "claude-3-opus-4.6",
                "timeout": 60,
                "risk_level": "high"
            },
            "executor": {
                "model": "claude-3-sonnet-4.6",
                "tools": ["trading_api", "risk_assessment", "regulatory_tools"]
            },
            "verifier": {
                "model": "claude-3-haiku-1.5",
                "threshold": 0.99,
                "human_confirmation": True
            }
        },
        "human_intervention": {
            "threshold": 0.05,  # 5% threshold (financial scenario)
            "confirmation_required": True  # every transaction needs confirmation
        },
        "monitoring": {
            "planning_success_rate": 0.98,
            "execution_success_rate": 0.99,
            "human_intervention_rate": 0.05,
            "average_response_time": 10.0,
            "transaction_volume": "medium"
        }
    }

    return config

Scenario 3: Code Generation Agent

Deployment Mode:

P-t-E: Planner plans code generation tasks, Executor generates code, and Verifier checks code
Framework: LangGraph (state management, re-planning)
Tools: code base, testing framework, sandbox

Deployment Boundary:

Code volume: small and medium-sized projects
Technology stack: Python, JavaScript, Go
Not suitable: large systems, embedded systems

Deployment configuration:

def deploy_code_generation_agent() -> dict:
    """
    Deployment configuration for the code generation agent.
    """
    # Model identifiers are illustrative placeholders; substitute the IDs
    # your provider actually exposes.
    config = {
        "p_t_e": {
            "planner": {
                "model": "claude-3-opus-4.6",
                "timeout": 120,
                "code_scope": "medium"
            },
            "executor": {
                "model": "claude-3-sonnet-4.6",
                "tools": ["code_library", "test_framework", "sandbox"]
            },
            "verifier": {
                "model": "claude-3-haiku-1.5",
                "threshold": 0.90,
                "code_quality_check": True
            }
        },
        "human_intervention": {
            "threshold": 0.10,  # 10% threshold
            "code_review": True  # generated code requires review
        },
        "monitoring": {
            "planning_success_rate": 0.95,
            "execution_success_rate": 0.90,
            "human_intervention_rate": 0.10,
            "average_response_time": 15.0,
            "code_quality_score": 0.90
        }
    }

    return config

Summary: Key points of production-level implementation of P-t-E

Core Points

Control flow integrity: clear separation of Planner → Executor → Verifier
Security: Protection measures against indirect Prompt Injection attacks
Permission Management: Principle of least privilege, task scope tool access
Defense in depth: Human-in-the-Loop, dynamic re-planning, sandbox execution
Framework selection: LangGraph (complex state), CrewAI (simple scenarios), AutoGen (high scalability)

Implementation Checklist

Pre-deployment checks:

[ ] Can Planner break down tasks correctly?
[ ] Are the Executor tool access rights correct?
[ ] Can Verifier verify the results correctly?
[ ] Is the threshold for manual intervention reasonable?
[ ] Are monitoring indicators configured?

Test checks:

[ ] Planning success rate > 95%?
[ ] Execution success rate > 95%?
[ ] Manual intervention rate < 15%?
[ ] Average response time < 15 seconds?
[ ] Cost-benefit ratio > 2?

Production checks:

[ ] Is error handling complete?
[ ] Is logging complete?
[ ] Are monitoring alarms configured?
[ ] Is the rollback strategy tested?
[ ] Is the manual intervention process tested?

Next steps

Immediate actions:

Choose a framework: LangGraph (complex state), CrewAI (simple scenarios), AutoGen (high scalability)
Deploy the P-t-E architecture: Planner → Executor → Verifier
Implement security: control flow integrity, permission management, sandbox execution
Configure monitoring: planning success rate, execution success rate, manual intervention rate

Short term action (1-2 weeks):

Deploy Customer Support Automation (P-t-E + LangGraph)
Monitoring indicators: planning success rate, execution success rate, manual intervention rate
Optimization: manual intervention threshold, response time, cost-benefit ratio

Medium-term action (1-2 months):

Extension: financial transaction agent, code generation agent
Optimization: dynamic re-planning, Human-in-the-Loop
Security: Policy update mechanism, Escalation Agent

Long term action (3-6 months):

Extension: Multi-Agent coordination, Multi-Agent + P-t-E
Optimization: automated monitoring and automated optimization
Security: automated policy updates, automated Escalation

References:

arXiv:2509.08646 - Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations
Medium - First hand comparison of LangGraph, CrewAI and AutoGen
Sprinklr Blog - How to Improve Customer Service ROI with AI
GetMaxim AI - The Ultimate Checklist for Rapidly Deploying AI Agents in Production
Galileo AI - Production Readiness Checklist for Every AI Agent

Key Metrics:

Planning success rate > 95%: Excellent
Execution success rate > 95%: Excellent
Manual intervention rate < 15%: acceptable
Average response time < 15 seconds: Good
Cost-benefit ratio > 2: Good

Recommended Reading:

LangGraph official documentation: https://langchain-ai.github.io/langgraph/
CrewAI official documentation: https://docs.crewai.com/
AutoGen official documentation: https://microsoft.github.io/autogen/