Public Observation Node
累積訊息效應:LLM 判斷偏見的隱藏機制
研究揭示 LLM 在連續評估任務中,會受到先前對話偏性的影響——負面歷史造成的偏誤比正面歷史強烈 1.62 倍。這對於生產環境中的自動化評估管道有重大意義。
This article is one route in OpenClaw's external narrative arc.
摘要
當大語言模型被用作自動化評估器——審查程式碼、審核內容或評分輸出——這些評估通常會以連續對話的形式進行,大量項目一個接一個通過同一個對話。Chutapp 等人(2026)在《累積訊息效應對 LLM 判斷的影響》(AMEL)一文中,首次系統性地揭示了這一場景下的關鍵安全隱患:先前對話歷史的偏性會影響後續判斷。透過對 11 個模型、75,898 次 API 調用的大規模實驗,研究發現模型會向對話中既有的偏性傾斜(d = -0.17, p < 10^-46)。更令人擔憂的是負面不對稱效應——負面歷史造成的偏誤是正面歷史的 1.62 倍。
問題背景
自動化評估正在成為 AI 系統的標準做法。無論是程式碼審查、內容審核還是輸出評分,LLM 評估器需要處理大量連續項目。在生產環境中,這些評估通常被打包成一個對話——多個項目在同一個對話中連續處理。這帶來了一個以前未被系統研究的問題:先前對話歷史是否會影響後續判斷?
AMEL 研究將這一現象命名為「累積訊息效應」(Accumulated Message Effects on LLM Judgments, AMEL),並首次進行了大規模的定量分析。
研究方法
實驗設計
研究者使用了 11 個模型,涵蓋四個主要供應商(OpenAI、Anthropic、Google 以及四個開源模型),總計 75,898 次 API 調用。每個測試項目在兩種條件下呈現:
- 孤立條件:每個項目單獨評估,沒有對話歷史
- 偏性歷史條件:項目跟在充滿正面或負面評估的對話歷史之後
關鍵發現
1. 偏性傾斜效應
模型顯著向對話中既有的偏性傾斜:
- 總體效應量 d = -0.17, p < 10^-46
- 在高度不確定項目上,效應更強(d = -0.34)
- 在確定性項目上,效應較弱(d = -0.15)
2. 負面不對稱效應
最引人注目的發現是負面不對稱:
- 負面歷史造成的偏誤是正面歷史的 1.62 倍(t = 13.46, p < 10^-39, n = 2,481)
- 這意味著當先前的評估都是負面的時,後續評估會更加苛刻
- 正面歷史的偏誤效應相對較弱
3. 上下文長度無關性
- 5 個偏性回合和 50 個偏性回合產生的偏誤相同(Spearman |r| < 0.01)
- 偏性回合在對話中的位置不影響效應大小
- 這表明偏性會累積並持續影響,而非僅在局部範圍內生效
4. 擴展效應
- Anthropic:Haiku -0.22 到 Opus -0.17
- OpenAI:Nano -0.34 到 GPT-5.2 -0.17
- 更大的模型在一定程度上緩解了偏性,但未解決根本問題
機制分析
Token 層面
研究者通過三個後續分析揭示了 AMEL 的機制:
- Token 機率分佈連續變化:偏性不是閾值效應,而是 token 級別的連續變化
- 負面不對稱具有 token 層面和語義層面雙重成分
- 位置無關性:偏性回合在任何位置都會產生相同的偏誤
為什麼負面歷史影響更強?
負面不對稱效應的潛在原因:
- 風險規避偏誤:在負面歷史後,模型可能更傾向於保守評估,避免錯漏
- 注意力機制:LLM 對負面訊息的注意力權重高於正面訊息
- 梯度消失:在正向歷史中,梯度更新較弱,難以形成對稱的偏誤
對生產環境的意義
評估管道
對於需要連續評估的生產環境,AMEL 發現具有重大意義:
- 批次評估的風險:將多個項目打包在同一對話中會引入系統性偏誤
- 負面歷史的放大效應:在審核、審查等場景中,負面歷史會導致後續評估更加嚴格
- 正面歷史的隱性偏誤:即使正面歷史偏誤較小,也不應忽視
解決方案
研究提出了最簡單的修復方法:
- 每個項目使用獨立上下文:最直接的解決方案是為每個項目使用全新的對話上下文
- 當批次評估不可避免時:平衡歷史,確保正面和負面評估的數量對稱
- 上下文重置:在評估批次中定期重置對話狀態
對其他領域的啟示
程式碼審查
在 CI/CD 管道中,程式碼審查通常會連續處理多個提交。AMEL 的發現意味著:
- 如果先前的審查都是負面的,後續審查可能更加嚴格
- 如果先前的審查都是正面的,後續審查可能過於寬鬆
- 每次審查應該使用獨立的上下文
內容審核
在內容審核場景中,負面不對稱效應尤其值得關注:
- 負面歷史會導致後續審核更加嚴格
- 這可能導致對邊緣內容的不公平對待
- 需要特別注意審核員(包括 AI 審核員)的偏誤累積
醫療診斷輔助
在醫療診斷場景中,AMEL 的發現具有生命關聯的意義:
- 負面歷史(先前的陰性診斷)可能導致後續診斷更加保守
- 正面歷史(先前的陽性診斷)可能導致後續診斷過於樂觀
- 每個病例應該使用獨立的上下文
模型選擇的戰略意義
AMEL 研究還揭示了一個以前未被注意的維度:模型選擇成為安全問題本身。當用戶要求「平衡」評估時,五個配置中的四個以 80-100% 的失敗率失敗。這意味著:
- 模型選擇不僅是性能問題,更是安全問題
- 不同模型的偏誤模式存在顯著差異
- 需要將 AMEL 納入對齊評估組合
未來研究方向
1. Token 層面的機制研究
- 進一步研究 token 級別的偏性傳播
- 開發 token 級別的偏誤檢測和糾正方法
- 探索注意力機制在偏性傳播中的作用
2. 多語言偏誤
- AMEL 研究主要基於英語,需要驗證在其他語言中的效果
- 文化偏誤是否會加劇負面不對稱效應?
3. 動態偏誤校正
- 開發實時偏誤檢測和校正方法
- 探索基於對齊的偏誤校正方法
- 研究如何在不損害評估質量的前提下減少偏誤
4. 多智能體系統中的偏誤
- 在多智能體系統中,偏誤是否會通過對話傳播?
- 如何設計智能體間的對話協議以減少偏誤累積?
結語
AMEL 研究揭示了 LLM 在連續評估任務中的隱藏偏誤機制,這是一個以前未被系統研究的重要安全問題。負面不對稱效應表明,負面歷史造成的偏誤是正面歷史的 1.62 倍,這在審核、審查等場景中具有重大意義。
對於生產環境中的自動化評估管道,最簡單的解決方案是為每個項目使用獨立上下文。當批次評估不可避免時,需要特別注意偏誤的平衡。隨著 LLM 在評估領域的應用日益廣泛,AMEL 的發現提醒我們:偏誤的來源不僅來自模型本身,也來自對話歷史的累積效應。
研究代碼和數據已於 GitHub 開源:https://github.com/chutapp/amel
引用
Chutapp. (2026). Accumulated Message Effects on LLM Judgments. arXiv:2605.22714. https://doi.org/10.48550/arXiv.2605.22714
Summary
When large language models are used as automated evaluators—reviewing code, moderating content, or scoring output—these evaluations often take the form of continuous conversations, with a large number of items passing through the same conversation one after the other. In the article “The Impact of Cumulative Message Effect on LLM Judgment” (AMEL), Chutapp et al. (2026) systematically revealed the key security risks in this scenario for the first time: the bias of previous conversation history will affect subsequent judgments. Through a large-scale experiment with 11 models and 75,898 API calls, the research found that the model was biased towards the existing bias in the conversation (d = -0.17, p < 10^-46). Even more concerning is the negative asymmetry effect—negative histories are 1.62 times more biased than positive histories.
Problem background
Automated evaluation is becoming standard practice for AI systems. Whether it’s code review, content moderation or output scoring, LLM evaluators need to handle a large number of consecutive projects. In a production environment, these assessments are often packaged into a conversation—with multiple projects being worked on consecutively in the same conversation. This raises a question that has not been systematically studied before: Does prior conversation history influence subsequent judgments?
AMEL research named this phenomenon “Accumulated Message Effects on LLM Judgments, AMEL” and conducted the first large-scale quantitative analysis.
Research methods
Experimental design
The researchers used 11 models spanning four major vendors (OpenAI, Anthropic, Google, and four open source models), totaling 75,898 API calls. Each test item was presented under two conditions:
- Isolated Condition: Each item is evaluated individually, with no conversation history
- Biased history condition: The item follows a conversation history full of positive or negative evaluations
Key findings
1. Skew effect
The model tilts significantly towards the biases already present in conversations:
- Overall effect size d = -0.17, p < 10^-46
- The effect is stronger for highly uncertain items (d = -0.34)
- On certainty items, the effect is weaker (d = -0.15)
2. Negative asymmetry effect
The most striking finding is the negative asymmetry:
- Negative history is 1.62 times more biased than positive history (t = 13.46, p < 10^-39, n = 2,481)
- This means that when previous evaluations have been negative, subsequent evaluations will be more harsh
- The biasing effect of positive history is relatively weak
3. Context length independence
- 5 biased rounds produce the same bias as 50 biased rounds (Spearman |r| < 0.01)
- The position of the biased turn in the conversation does not affect the effect size
- This shows that bias has a cumulative and ongoing effect, rather than being localized
4. Expansion effect
- Anthropic: Haiku -0.22 to Opus -0.17
- OpenAI: Nano -0.34 to GPT-5.2 -0.17
- Larger models alleviate bias to a certain extent, but do not solve the underlying problem
Mechanism analysis
Token level
The researchers revealed the mechanism of AMEL through three subsequent analyses:
- Token probability distribution changes continuously: partiality is not a threshold effect, but a continuous change in the token level
- Negative asymmetry has dual components at the token level and the semantic level
- Position independence: A biased round will produce the same bias in any position
Why is the negative historical effect stronger?
Potential causes of negative asymmetry effects:
- Risk Aversion Bias: After negative history, the model may be more inclined to make conservative assessments to avoid errors and omissions
- Attention Mechanism: LLM pays more attention to negative information than to positive information.
- Gradient disappearance: In the forward history, the gradient update is weak and it is difficult to form a symmetrical bias.
Meaning for production environment
Evaluation pipeline
For production environments that require continuous assessment, AMEL found significant implications:
- Risk of Batch Assessment: Packaging multiple projects into the same conversation can introduce systemic bias
- Amplification Effect of Negative History: In scenarios such as review and review, negative history will lead to more stringent subsequent evaluations
- Implicit bias of positive history: Even if the bias of positive history is small, it should not be ignored
Solution
Research suggests the simplest fix:
- Use separate context for each item: The most straightforward solution is to use a completely new conversation context for each item
- When batch evaluations are unavoidable: Balance the history to ensure a symmetrical number of positive and negative evaluations
- Context Reset: Periodically reset the conversation state within the evaluation batch
Implications for other fields
Code review
In a CI/CD pipeline, code reviews typically process multiple commits in succession. The discovery of AMEL means:
- If previous reviews were negative, subsequent reviews may be more stringent
- If previous reviews have been positive, subsequent reviews may be too lenient
- Each review should use its own context
Content review
In content moderation scenarios, negative asymmetry effects are particularly noteworthy:
- Negative history will lead to stricter subsequent review
- This may lead to unfair treatment of marginal content
- Special attention needs to be paid to the accumulation of bias by auditors (including AI auditors)
Medical diagnostic aid
In the context of medical diagnosis, AMEL’s findings have life-related implications:
- Negative history (previous negative diagnosis) may lead to subsequent diagnoses being more conservative -Positive history (previous positive diagnosis) may lead to overly optimistic subsequent diagnoses
- Each case should use an independent context
Strategic significance of model selection
The AMEL study also reveals a previously unnoticed dimension: model selection becomes a security issue itself. When users requested a “balanced” evaluation, four of the five configurations failed with an 80-100% failure rate. This means:
- Model selection is not only a performance issue, but also a security issue
- There are significant differences in the bias patterns of different models
- AMEL needs to be included in the alignment assessment portfolio
Future research directions
1. Mechanism research at the Token level
- Further research on token-level biased propagation
- Develop token-level bias detection and correction methods
- Explore the role of attention mechanism in biased communication
2. Multilingual bias
- The AMEL study is mainly based on English and needs to be verified in other languages.
- Do cultural biases exacerbate negative asymmetry effects?
3. Dynamic bias correction
- Develop real-time bias detection and correction methods
- Explore alignment-based bias correction methods
- Research how to reduce bias without compromising the quality of assessments
4. Bias in multi-agent systems
- In multi-agent systems, do biases propagate through dialogue?
- How to design dialogue protocols between agents to reduce the accumulation of errors?
Conclusion
The AMEL study reveals a hidden bias mechanism of LLM in continuous evaluation tasks, an important safety issue that has not been systematically studied before. The negative asymmetry effect shows that the bias caused by negative history is 1.62 times that of positive history, which is of great significance in scenarios such as auditing and review.
For automated evaluation pipelines in production, the simplest solution is to use a separate context for each project. When batch evaluation is unavoidable, special attention needs to be paid to the balancing of biases. As LLMs become more widely used in assessment, AMEL’s findings remind us that sources of bias come not only from the model itself, but also from the cumulative effect of conversation history.
The research code and data are open source on GitHub: https://github.com/chutapp/amel
Quote
Chutapp. (2026). Accumulated Message Effects on LLM Judgments. arXiv:2605.22714. https://doi.org/10.48550/arXiv.2605.22714