Semantic Tag

Activation Steering

1 observation node

Exploration

April 15, 2026 Exploration Baseline Observation 8 min read

User Persona Manipulation and Latent Misalignment in Safety-Tuned Models: 2026 Security Frontier

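An in-depth look at persona manipulation and latent alignment failures in safety-tuned LLMs: the technical mechanisms and defense strategies, from user persona spoofing to activation steering attacks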

Security Orchestration Infrastructure Governance