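X Open-Sources Its Recommendation Algorithm: Grok Transformer Architecture Breakdown and Engineering Lessons, 2026

**Frontier signal**: Elon Musk has open-sourced the complete recommendation system behind the X (Twitter) For You feed. It replaces all hand-written feature engineering with a Grok Transformer and is organized as a three-layer pipeline: Phoenix multi-path retrieval, Thunder near-line retrieval, and Grox content understanding. The release shows recommendation systems moving from feature engineering to end-to-end deep learning.

May 16, 2026 · Beginner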

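Executive summary

On May 15, 2026, Elon Musk open-sourced the complete recommendation algorithm behind X (Twitter)'s For You feed in the GitHub repository xai-org/x-algorithm. This is more than a routine version update: it is an architectural shift from hand-written feature engineering to an end-to-end Grok Transformer. The Phoenix multi-path retrieval system, the Thunder near-line retrieval pipeline, and the Grox content understanding layer together replace the old scoring mechanism that depended on manual features. This article breaks down the architecture, its engineering trade-offs, and what it means for recommendation systems.

Why this matters now

Open-sourcing a recommendation algorithm involves a long-standing tension: a platform wants its algorithm to be a competitive advantage, yet it also wants the community to be able to scrutinize that algorithm for fairness and transparency. Musk first open-sourced X's recommendation algorithm in January 2026 and publicly admitted that "the algorithm is stupid and needs to be fixed." The May update arrived four months later, but its contents changed fundamentally: what had been a simplified ranker is now an end-to-end inference pipeline that covers content understanding, ad blending, and candidate sourcing.

From an engineering standpoint, the shift signals that recommendation systems are moving from the feature engineering era to the end-to-end deep learning era. Platforms used to maintain dozens of hand-built feature pipelines (user interaction history, author diversity, ad safety, and so on); these are now replaced by the Phoenix Transformer's attention mechanism.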

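Core architecture: the system described in words

The X For You feed pipeline can be divided into five main stages, from the user request to the final ranked result:

Stage 1: Home Mixer orchestration layer

Home Mixer is the orchestrator of the whole system and is responsible for collecting user context. It contains two core subsystems:

User Action Sequence: records the user's interaction history, including likes, replies, and shares. This replaces interest tags that previously had to be encoded by hand.
User Features: the following list, preference settings, IP address, historical impressions, and so on.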

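Stage 2: Candidate Sources

This is the system's central innovation, a two-path retrieval mechanism:

Thunder (in-network content):

Retrieves posts from the accounts the user follows
This is "near-line" retrieval, similar to the follower-to-post matching of a traditional social network
Thunder's candidate sources include: ads, who-to-follow recommendations, Phoenix MoE topics, Phoenix post topics, prompts, and so on

Phoenix Retrieval (out-of-network content):

Discovers posts in the global corpus from accounts the user does not follow
Uses ML-based similarity search, comparable to vector similarity retrieval
Candidate sources include: Phoenix MoE, Phoenix topics, Phoenix prompts, and so on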

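Stage 3: Hydration

The hydration stage fetches the complete information for each post, including:

Core post metadata
Author information
Media entities
Brand safety signals
Language code
Quoted-post expansion
Mutual-follow score

These signals previously had to be encoded by hand; the Phoenix Transformer's attention mechanism now learns from them automatically.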

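Stage 4: Scoring

Phoenix Scorer uses a Grok-based Transformer to predict the engagement probability of each post:

P(like): probability of a like
P(reply): probability of a reply
P(repost): probability of a repost
P(click): probability of a click

The final score is a weighted combination: Weighted Score = Σ (weight × P(action))

Stage 5: Filtering and selection

Pre-filtering: removes duplicates, expired posts, the user's own posts, blocked authors, muted keywords, and so on
Post-scoring selection: sorts by final score and keeps the top K candidates
Post-selection filtering: visibility filtering (deleted posts, spam, violent content, and so on)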

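Key design decisions

One important design decision in the architecture is that Phoenix's attention mechanism replaces all manual feature engineering. This means:

User interests no longer need manual labeling; they are learned automatically by attention over the behavior sequence
Ad safety signals no longer rely on hand-written rules; they are learned by the Grox classifiers
Author diversity no longer needs hand-tuned decay rules; it is regulated automatically by the Transformer's cross-author attention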

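Counterarguments and trade-offs

The cost of the Phoenix Transformer

Phoenix is a small MoE (Mixture of Experts) architecture with 256-dimensional embeddings, 4 attention heads, and 2 Transformer layers, but small does not mean lightweight. Compared with the linear models or shallow neural networks of traditional recommendation systems, Phoenix has noticeably higher inference latency. Each recommendation request has to:

Fetch candidate posts from multiple candidate sources
Run Phoenix Transformer inference on every candidate post
Compute the weighted combination of several engagement probabilities

Risks of ad blending

The home mixer/ads module handles ad injection and placement while respecting sensitive-content boundaries. Blending ads with organic content nevertheless brings new challenges:

The P(click) and P(like) weights for ads need fine-grained tuning
Brand safety tracking adds inference overhead
Users may react negatively to a feed in which ads and organic content are mixed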

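Balancing Thunder and Phoenix

Thunder handles near-line, in-network retrieval and Phoenix handles out-of-network retrieval. Thunder's advantages are:

Lower data latency (near-line processing)
Training data that is refreshed closer to real time
Faster model inference

Phoenix's advantages are:

It can surface content from accounts the user does not follow
It relies on vector similarity rather than social relationships
It generalizes better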

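Deployment and operational boundaries

Developers who want to do recommendation-system research on top of this open-source code should keep the following deployment boundaries in mind:

Pre-trained model weights: Phoenix ships a 3 GB pre-trained mini model (256-dimensional embeddings, 4 attention heads, 2 Transformer layers), so developers can start running inference without training a model themselves. It also means the model has limited capacity and is suited to research rather than production deployment.
End-to-end inference pipeline: phoenix/run_pipeline.py replaces the earlier run_ranker.py and run_retrieval.py, which indicates that Phoenix now has a single inference entry point. The Grox content understanding pipeline (spam detection, post classification, PTOS policy enforcement), however, needs additional GPU resources.
Data retention and privacy: user action sequences, IP addresses, following lists, and similar data must be stored in a compliant way. Phoenix's attention mechanism replaces manual feature engineering, but it still processes a large amount of personal data.
Ad safety boundary: the PTOS (Potential Toxic Online Speech) policy enforcement in the Grox module needs continuous updates to keep harmful content from spreading through the recommendation system.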

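Common anti-patterns

Using Phoenix for production recommendations: Phoenix is a mini model (256-dimensional embeddings, 4 attention heads, 2 Transformer layers) that suits research but cannot handle production traffic. Production environments need a larger model and distributed inference.
Ignoring Thunder's contribution: Phoenix's out-of-network retrieval is powerful, but Thunder's near-line retrieval remains irreplaceable in latency-sensitive scenarios. The balance between the two is what matters.
Hand-tuning ad weights: Phoenix's attention mechanism is meant to learn the balance between ads and organic content on its own. Manual adjustment undermines the Transformer's end-to-end optimization.
Neglecting data privacy compliance: Phoenix needs a large volume of user behavior data to train its attention mechanism, which raises compliance questions under regulations such as GDPR and CCPA.
Treating Grox as a black box: Grox's PTOS policy enforcement needs continuous updates and human review; do not assume the model will handle every edge case automatically.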

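Conclusion

X's Phoenix/Grok Transformer architecture represents the move of recommendation systems from the feature engineering era to the end-to-end deep learning era. It is more than a technical iteration: it shows how AI is redrawing the boundaries of recommendation systems, from manual feature engineering to learning through attention. For developers, the open-source project is a valuable resource for studying recommendation architecture, and also a reminder that, however capable the Transformer is, compute cost and data privacy still bound what can be deployed.

Sources

xai-org/x-algorithm GitHub Repository - the complete open-source code of the X (Twitter) For You feed recommendation system, including the Phoenix/Grok Transformer architecture
Elon Musk's May 15, 2026 Announcement - Musk announces the X algorithm update and commits to publishing the latest algorithm version every month
Grok-1 Open Source Release - the original implementation that the Phoenix Transformer derives from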

Executive summary

On May 15, 2026, Elon Musk open-sourced the complete recommendation algorithm behind X (Twitter)'s For You feed in the GitHub repository xai-org/x-algorithm. This is more than a routine version update: it is an architectural shift from hand-written feature engineering to an end-to-end Grok Transformer. The Phoenix multi-path retrieval system, the Thunder near-line retrieval pipeline, and the Grox content understanding layer together replace the old scoring mechanism that depended on manual features. This article breaks down the architecture, its engineering trade-offs, and what it means for recommendation systems.

Why this matters now

Open-sourcing a recommendation algorithm involves a long-standing tension: a platform wants its algorithm to be a competitive advantage, yet it also wants the community to be able to scrutinize that algorithm for fairness and transparency. Musk first open-sourced X's recommendation algorithm in January 2026 and publicly admitted that "the algorithm is stupid and needs to be fixed." The May update arrived four months later, but its contents changed fundamentally: what had been a simplified ranker is now an end-to-end inference pipeline that covers content understanding, ad blending, and candidate sourcing.

From an engineering standpoint, the shift signals that recommendation systems are moving from the feature engineering era to the end-to-end deep learning era. Platforms used to maintain dozens of hand-built feature pipelines (user interaction history, author diversity, ad safety, and so on); these are now replaced by the Phoenix Transformer's attention mechanism.

Core architecture: the system described in words

The X For You feed pipeline can be divided into five main stages, from the user request to the final ranked result:

Stage 1: Home Mixer orchestration layer

Home Mixer is the orchestrator of the whole system and is responsible for collecting user context. It contains two core subsystems:

User Action Sequence: records the user's interaction history, including likes, replies, and shares. This replaces interest tags that previously had to be encoded by hand.
User Features: the following list, preference settings, IP address, historical impressions, and so on.
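As a rough illustration of the two Home Mixer inputs described above, the sketch below models them as plain data classes. The class names, field names, and the `recent_actions` helper are assumptions made for this example; they are not taken from the xai-org/x-algorithm code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UserAction:
    """One entry in the user action sequence (hypothetical schema)."""
    post_id: str
    author_id: str
    action: str       # e.g. "like", "reply", "repost"
    timestamp: float  # seconds since epoch


@dataclass
class QueryContext:
    """User context gathered before candidate retrieval (hypothetical schema)."""
    user_id: str
    following: set[str] = field(default_factory=set)
    actions: list[UserAction] = field(default_factory=list)

    def recent_actions(self, limit: int) -> list[UserAction]:
        """Return the newest `limit` actions, newest first."""
        ordered = sorted(self.actions, key=lambda a: a.timestamp, reverse=True)
        return ordered[:limit]


ctx = QueryContext(
    user_id="u1",
    following={"alice"},
    actions=[
        UserAction("p1", "alice", "like", 1.0),
        UserAction("p2", "bob", "reply", 3.0),
        UserAction("p3", "alice", "repost", 2.0),
    ],
)
latest = [a.post_id for a in ctx.recent_actions(2)]
```

Keeping the raw action sequence, rather than pre-computed interest tags, is what lets a sequence model attend over it later in the pipeline.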

Stage 2: Candidate Sources

This is the system's central innovation, a two-path retrieval mechanism:

Thunder (in-network content):

Retrieves posts from the accounts the user follows
This is "near-line" retrieval, similar to the follower-to-post matching of a traditional social network
Thunder's candidate sources include: ads, who-to-follow recommendations, Phoenix MoE topics, Phoenix post topics, prompts, and so on

Phoenix Retrieval (out-of-network content):

Discovers posts in the global corpus from accounts the user does not follow
Uses ML-based similarity search, comparable to vector similarity retrieval
Candidate sources include: Phoenix MoE, Phoenix topics, Phoenix prompts, and so on
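The two retrieval paths can be sketched in a few lines. This is a minimal illustration, assuming each post carries a small embedding vector and that out-of-network retrieval ranks by cosine similarity to a user vector; the function names, the post schema, and the toy data are all hypothetical and do not reflect the real Thunder or Phoenix interfaces.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def in_network(posts, following):
    """Thunder-style source: posts whose author the user follows."""
    return [p for p in posts if p["author"] in following]


def out_of_network(posts, following, user_vec, k):
    """Phoenix-style source: top-k posts from unfollowed authors by similarity."""
    pool = [p for p in posts if p["author"] not in following]
    pool.sort(key=lambda p: cosine(user_vec, p["vec"]), reverse=True)
    return pool[:k]


posts = [
    {"id": "p1", "author": "alice", "vec": [1.0, 0.0]},
    {"id": "p2", "author": "bob", "vec": [0.9, 0.1]},
    {"id": "p3", "author": "carol", "vec": [0.0, 1.0]},
    {"id": "p4", "author": "dave", "vec": [0.7, 0.7]},
]
following = {"alice"}
user_vec = [1.0, 0.0]

candidates = in_network(posts, following) + out_of_network(
    posts, following, user_vec, k=2
)
candidate_ids = [p["id"] for p in candidates]
```

A production system would use an approximate nearest-neighbor index instead of sorting the whole pool; the brute-force sort here only keeps the example self-contained.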

Stage 3: Hydration

The hydration phase is responsible for obtaining complete information about the post, including:

Core post metadata
Author information
Media entities
Brand safety signals
Language code
Quoted-post expansion
Mutual-follow score

These signals, which previously required manual encoding, are now automatically learned by Phoenix Transformer’s attention mechanism.
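Hydration is essentially a join between candidate IDs and the stores that hold post and author data. The sketch below shows that idea with in-memory dictionaries; the store layout, the field names, and the `"und"` default language code are assumptions for illustration only.

```python
def hydrate(candidate_ids, post_store, author_store):
    """Attach post and author data to each candidate ID; skip unknown posts."""
    hydrated = []
    for post_id in candidate_ids:
        post = post_store.get(post_id)
        if post is None:  # deleted or unavailable: nothing to hydrate
            continue
        author = author_store.get(post["author_id"], {})
        hydrated.append({
            "post_id": post_id,
            "text": post["text"],
            "lang": post.get("lang", "und"),  # "und" = undetermined (assumed default)
            "author_id": post["author_id"],
            "author_name": author.get("name", "unknown"),
        })
    return hydrated


post_store = {
    "p1": {"text": "hello", "lang": "en", "author_id": "a1"},
    "p2": {"text": "hola", "author_id": "a2"},
}
author_store = {"a1": {"name": "Alice"}}

rows = hydrate(["p1", "p2", "p404"], post_store, author_store)
```

Dropping candidates that cannot be hydrated at this point keeps the scorer from spending inference on posts that could never be shown.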

Stage 4: Scoring

Phoenix Scorer uses a Grok-based Transformer to predict the engagement probability of each post:

P(like): probability of a like
P(reply): probability of a reply
P(repost): probability of a repost
P(click): probability of a click

The final score is a weighted combination: Weighted Score = Σ (weight × P(action))
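The weighted combination above is a dot product between per-action weights and predicted probabilities. Here is a minimal sketch; the weight values and the probabilities are invented for the example and are not the values used by X.

```python
def weighted_score(probs, weights):
    """Sum of weight * P(action); actions without a weight contribute 0."""
    return sum(weights.get(action, 0.0) * p for action, p in probs.items())


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
WEIGHTS = {"like": 1.0, "reply": 2.0, "repost": 1.5, "click": 0.5}

# Hypothetical model outputs for one post.
probs = {"like": 0.2, "reply": 0.05, "repost": 0.1, "click": 0.4}

score = weighted_score(probs, WEIGHTS)  # 0.2 + 0.1 + 0.15 + 0.2
```

Because the weights sit outside the model, they can be changed without retraining, which is one reason this kind of combination is common in ranking systems.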

Stage 5: Filtering and selection

Pre-filtering: removes duplicates, expired posts, the user's own posts, blocked authors, muted keywords, and so on
Post-scoring selection: sorts by final score and keeps the top K candidates
Post-selection filtering: visibility filtering (deleted posts, spam, violent content, and so on)
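The three steps above can be chained in one function. The sketch below assumes candidates are dictionaries that already carry a score, and it covers only a few of the listed filters (duplicates, own posts, blocked authors, and a caller-supplied visibility check); the function name and the schema are hypothetical.

```python
def select_feed(candidates, viewer_id, blocked, k, is_visible):
    """Pre-filter, keep the top-k by score, then apply a visibility filter."""
    seen = set()
    pre_filtered = []
    for c in candidates:
        if c["id"] in seen:            # duplicate
            continue
        if c["author"] == viewer_id:   # the viewer's own post
            continue
        if c["author"] in blocked:     # blocked author
            continue
        seen.add(c["id"])
        pre_filtered.append(c)

    top_k = sorted(pre_filtered, key=lambda c: c["score"], reverse=True)[:k]
    return [c for c in top_k if is_visible(c)]


candidates = [
    {"id": "a", "author": "x", "score": 0.9},
    {"id": "a", "author": "x", "score": 0.9},                # duplicate
    {"id": "b", "author": "me", "score": 0.8},               # own post
    {"id": "c", "author": "y", "score": 0.7},                # blocked author
    {"id": "d", "author": "z", "score": 0.6, "spam": True},  # fails visibility
    {"id": "e", "author": "w", "score": 0.5},
    {"id": "f", "author": "v", "score": 0.4},
]

feed = select_feed(
    candidates,
    viewer_id="me",
    blocked={"y"},
    k=3,
    is_visible=lambda c: not c.get("spam", False),
)
feed_ids = [c["id"] for c in feed]
```

Since the visibility filter runs after the top-K cut, the result can hold fewer than K posts, as it does here; a real system would have to decide whether to backfill.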

Key design decisions

One important design decision in the architecture is that Phoenix's attention mechanism replaces all manual feature engineering. This means:

User interests no longer need manual labeling; they are learned automatically by attention over the behavior sequence
Ad safety signals no longer rely on hand-written rules; they are learned by the Grox classifiers
Author diversity no longer needs hand-tuned decay rules; it is regulated automatically by the Transformer's cross-author attention
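To make "attention over the behavior sequence" concrete, the sketch below runs single-head scaled dot-product attention with a candidate post as the query and the user's past interactions as keys and values. It uses toy 2-dimensional embeddings and no learned projections, so it illustrates the mechanism only and is not a reproduction of Phoenix.

```python
import math


def attention_pool(query, keys, values):
    """Scaled dot-product attention of one query over a sequence.

    Returns the softmax weights and the weighted sum of the values.
    """
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, pooled


# Toy embeddings: the candidate resembles the first history item most.
candidate = [1.0, 0.0]
history = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

weights, pooled = attention_pool(candidate, history, history)
```

History items that resemble the candidate receive more weight, which is the sense in which interest is inferred from behavior instead of from hand-assigned tags.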

Counterarguments and trade-offs

The cost of the Phoenix Transformer

Phoenix is a small MoE (Mixture of Experts) architecture with 256-dimensional embeddings, 4 attention heads, and 2 Transformer layers, but small does not mean lightweight. Compared with the linear models or shallow neural networks of traditional recommendation systems, Phoenix has noticeably higher inference latency. Each recommendation request has to:

Fetch candidate posts from multiple candidate sources
Run Phoenix Transformer inference on every candidate post
Compute the weighted combination of several engagement probabilities
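For a sense of scale, the stated configuration (256-dimensional embeddings, 2 layers) can be plugged into the usual parameter estimate for a dense Transformer block. This is a back-of-the-envelope sketch: it assumes a standard block with four attention projections and a feed-forward layer with 4x expansion, and it ignores biases, layer norms, embedding tables, and any MoE experts, none of which the source quantifies.

```python
def dense_block_params(d_model, n_layers, ffn_mult=4):
    """Approximate weight count of dense Transformer blocks (assumed layout)."""
    attention = 4 * d_model * d_model                # Q, K, V and output projections
    feed_forward = 2 * ffn_mult * d_model * d_model  # up- and down-projection
    return n_layers * (attention + feed_forward)


params = dense_block_params(d_model=256, n_layers=2)
size_mb = params * 4 / 1e6  # float32, 4 bytes per weight
```

Under these assumptions the dense blocks come to roughly 1.6 million weights, about 6 MB in float32. The per-candidate forward pass is therefore small, and the request cost grows mainly with the number of candidates scored and the length of the sequence attended over.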

Risks of Ad Mixing

The home mixer/ads module takes care of ad injection and targeting while respecting sensitive content boundaries. But the mix of ads and organic content brings new challenges:

The P(click) and P(like) weights of advertisements need to be carefully controlled
Brand safety tracking adds additional inference overhead
Users may react negatively to experiences where ads are mixed with organic content

Balancing Thunder and Phoenix

Thunder handles near-line, in-network retrieval and Phoenix handles out-of-network retrieval. Thunder's advantages are:

Lower data latency (near-line processing)
Training data that is refreshed closer to real time
Faster model inference

Phoenix's advantages are:

It can surface content from accounts the user does not follow
It relies on vector similarity rather than social relationships
It generalizes better

Deployment and operational boundaries

Developers who want to do recommendation-system research on top of this open-source code should keep the following deployment boundaries in mind:

Pre-trained model weights: Phoenix ships a 3 GB pre-trained mini model (256-dimensional embeddings, 4 attention heads, 2 Transformer layers), so developers can start running inference without training a model themselves. It also means the model has limited capacity and is suited to research rather than production deployment.
End-to-end inference pipeline: phoenix/run_pipeline.py replaces the earlier run_ranker.py and run_retrieval.py, which indicates that Phoenix now has a single inference entry point. The Grox content understanding pipeline (spam detection, post classification, PTOS policy enforcement), however, needs additional GPU resources.
Data retention and privacy: user action sequences, IP addresses, following lists, and similar data must be stored in a compliant way. Phoenix's attention mechanism replaces manual feature engineering, but it still processes a large amount of personal data.
Ad safety boundary: the PTOS (Potential Toxic Online Speech) policy enforcement in the Grox module needs continuous updates to keep harmful content from spreading through the recommendation system.

Common anti-patterns

Using Phoenix for production recommendations: Phoenix is a mini model (256-dimensional embeddings, 4 attention heads, 2 Transformer layers) that suits research but cannot handle production traffic. Production environments need a larger model and distributed inference.
Ignoring Thunder's contribution: Phoenix's out-of-network retrieval is powerful, but Thunder's near-line retrieval remains irreplaceable in latency-sensitive scenarios. The balance between the two is what matters.
Hand-tuning ad weights: Phoenix's attention mechanism is meant to learn the balance between ads and organic content on its own. Manual adjustment undermines the Transformer's end-to-end optimization.
Neglecting data privacy compliance: Phoenix needs a large volume of user behavior data to train its attention mechanism, which raises compliance questions under regulations such as GDPR and CCPA.
Treating Grox as a black box: Grox's PTOS policy enforcement needs continuous updates and human review; do not assume the model will handle every edge case automatically.

Conclusion

X's Phoenix/Grok Transformer architecture represents the move of recommendation systems from the feature engineering era to the end-to-end deep learning era. It is more than a technical iteration: it shows how AI is redrawing the boundaries of recommendation systems, from manual feature engineering to learning through attention. For developers, the open-source project is a valuable resource for studying recommendation architecture, and also a reminder that, however capable the Transformer is, compute cost and data privacy still bound what can be deployed.

Sources

xai-org/x-algorithm GitHub Repository - the complete open-source code of the X (Twitter) For You feed recommendation system, including the Phoenix/Grok Transformer architecture
Elon Musk's May 15, 2026 Announcement - Musk announces the X algorithm update and commits to publishing the latest algorithm version every month
Grok-1 Open Source Release - the original implementation that the Phoenix Transformer derives from