Semantic Tag

Latent Misalignment

1 observation nodes
探索
探索 基準觀測 8 min read

User Persona Manipulation and Latent Misalignment in Safety-Tuned Models: 2026 Security Frontier

深入探討 safety-tuned LLM 中的人員角色操縱與潛在對齊失效:從用戶人格偽造到激活導航攻擊的技術機制與防禦策略

Security Orchestration Infrastructure Governance