The retrieval stage is the first phase of the Phoenix recommendation pipeline. It efficiently narrows millions of potential candidates down to hundreds of relevant items using a two-tower architecture that enables fast similarity search.
Architecture Overview
- User Tower: Encodes user features and engagement history
- Candidate Tower: Encodes candidate item features
- Similarity Search: Retrieves top-K candidates using dot product
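To make the flow concrete, here is a minimal end-to-end sketch with random stand-in embeddings; the names, sizes, and top-K value are illustrative, not taken from the Phoenix code.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes: N candidates, D-dimensional embeddings, top-K cut.
N, D, K = 100_000, 128, 500

# Stand-ins for tower outputs: a precomputed, L2-normalized candidate
# matrix and one user embedding produced at request time.
candidate_embs = F.normalize(torch.randn(N, D), dim=-1)  # [N, D]
user_emb = F.normalize(torch.randn(D), dim=-1)           # [D]

# Score every candidate with a single matrix-vector product,
# then keep the K highest-scoring items.
scores = candidate_embs @ user_emb                       # [N]
top_scores, top_ids = torch.topk(scores, K)
```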
How It Works
User Representation
The user tower processes:
- User identifiers (via hash embeddings; sketched after this list)
- Recent engagement history (posts, authors, actions)
- Product surface context
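Hash embeddings keep the ID vocabulary bounded: each raw identifier is hashed into a fixed-size table. A minimal sketch, assuming a single modulo hash and an illustrative bucket count (the real model may use different hashing or table sizes):

```python
import torch
import torch.nn as nn

class HashEmbedding(nn.Module):
    """Map arbitrarily large integer IDs into a fixed-size embedding table."""

    def __init__(self, num_buckets: int = 1 << 16, dim: int = 128):
        super().__init__()
        self.num_buckets = num_buckets
        self.table = nn.Embedding(num_buckets, dim)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        # Fold each ID into the table with a modulo; occasional collisions
        # are the accepted cost of a bounded parameter count.
        return self.table(ids % self.num_buckets)

emb = HashEmbedding()
user_vecs = emb(torch.tensor([123456789, 987654321]))  # [2, 128]
```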
These inputs are encoded into a [B, D] tensor: one D-dimensional embedding per user in the batch.

Candidate Representation
The candidate tower projects post and author embeddings through a 2-layer MLP:
- Layer 1: Projects to `2*D` dimensions with SiLU activation
- Layer 2: Projects to `D` dimensions
- L2 normalization produces the final candidate embedding
User Tower Implementation
The user tower leverages the same transformer architecture used in ranking. It applies average pooling over the transformer outputs, weighted by the padding mask; this creates a single vector representation that captures the full user context.
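Masked average pooling is compact enough to sketch directly; the tensor names here are assumptions rather than the actual Phoenix identifiers:

```python
import torch

def masked_mean_pool(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average transformer outputs over non-padding positions.

    hidden: [B, T, D] transformer outputs
    mask:   [B, T], 1.0 for real tokens and 0.0 for padding
    """
    mask = mask.unsqueeze(-1).to(hidden.dtype)   # [B, T, 1]
    summed = (hidden * mask).sum(dim=1)          # [B, D]
    counts = mask.sum(dim=1).clamp(min=1.0)      # [B, 1]; avoid divide-by-zero
    return summed / counts                       # [B, D] user embedding

pooled = masked_mean_pool(torch.randn(4, 16, 128), torch.ones(4, 16))
```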
Candidate Tower Implementation
The candidate tower is a simpler 2-layer MLP that projects combined post and author embeddings.
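A hedged reconstruction from the layer description above (expand to `2*D` with SiLU, project to `D`, L2-normalize); the class and argument names are assumptions, not the actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CandidateTower(nn.Module):
    """2-layer MLP over concatenated post and author embeddings."""

    def __init__(self, input_dim: int, dim: int):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 2 * dim)  # Layer 1: expand to 2*D
        self.fc2 = nn.Linear(2 * dim, dim)        # Layer 2: project to D

    def forward(self, post_emb: torch.Tensor, author_emb: torch.Tensor) -> torch.Tensor:
        x = torch.cat([post_emb, author_emb], dim=-1)  # [B, input_dim]
        x = F.silu(self.fc1(x))                        # SiLU activation
        x = self.fc2(x)
        return F.normalize(x, dim=-1)                  # unit-length embedding
```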
Similarity Search

Once both towers produce normalized embeddings, retrieval becomes a simple dot product.

Why L2 normalization? Normalizing embeddings to unit length converts cosine similarity into a simple dot product. This enables the use of highly optimized approximate nearest neighbor (ANN) libraries like FAISS or ScaNN for efficient retrieval at scale.
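The equivalence is easy to verify numerically; this stand-alone check uses random vectors rather than model outputs:

```python
import torch
import torch.nn.functional as F

a, b = torch.randn(128), torch.randn(128)
cos = F.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item()
dot = (F.normalize(a, dim=-1) @ F.normalize(b, dim=-1)).item()
assert abs(cos - dot) < 1e-6  # identical up to floating-point error
```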
Key Design Decisions
Shared Transformer Architecture
The user tower uses the same transformer architecture as the Phoenix ranking model. This provides several benefits:
- Consistent representations across retrieval and ranking
- Transfer learning from ranking to retrieval
- Simplified infrastructure with shared model code
Asymmetric Tower Complexity
- User Tower: heavy (full transformer); computed once per user request and cached
- Candidate Tower: light (2-layer MLP); pre-computed for all items offline
- The user tower can afford to be expensive because it runs only once per request
- The candidate tower must be lightweight because it runs on millions of items
Performance Considerations
Embedding Caching
Candidate embeddings are pre-computed offline and stored in a vector database. Only the user tower runs at inference time.
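A sketch of that offline path, reusing the CandidateTower sketch above and a plain .npy file as a stand-in for the vector database; a production system would write to a real embedding store instead:

```python
import numpy as np
import torch

@torch.no_grad()
def precompute_candidate_embeddings(tower, post_embs, author_embs, batch_size=4096):
    """Encode all items in batches and return an [N, D] float32 matrix."""
    tower.eval()
    chunks = []
    for start in range(0, post_embs.shape[0], batch_size):
        end = start + batch_size
        chunks.append(tower(post_embs[start:end], author_embs[start:end]))
    return torch.cat(chunks).cpu().numpy().astype(np.float32)

# Stand-in inputs; real features come from the feature pipeline.
tower = CandidateTower(input_dim=256, dim=128)
matrix = precompute_candidate_embeddings(
    tower, torch.randn(10_000, 128), torch.randn(10_000, 128)
)
np.save("candidate_embeddings.npy", matrix)  # loaded at serving time
```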
ANN Index
In production, exact top-K search is replaced with approximate nearest neighbor algorithms (e.g., FAISS, ScaNN) that provide sub-linear search complexity.
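As one concrete possibility, a FAISS inverted-file index over inner-product space can replace the exact scan; the nlist and nprobe values below are illustrative tuning knobs, not values from the production system:

```python
import faiss
import numpy as np

d = 128                                    # embedding dimension
xb = np.load("candidate_embeddings.npy")   # [N, d] from the offline sketch above

# IVF index: cluster candidates into nlist cells, search only nprobe of them.
quantizer = faiss.IndexFlatIP(d)           # inner product == cosine on unit vectors
index = faiss.IndexIVFFlat(quantizer, d, 256, faiss.METRIC_INNER_PRODUCT)
index.train(xb)                            # learn the coarse clustering
index.add(xb)
index.nprobe = 16                          # cells probed per query (recall/speed)

user_emb = np.random.randn(1, d).astype(np.float32)
user_emb /= np.linalg.norm(user_emb)
scores, ids = index.search(user_emb, 500)  # approximate top-500
```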
Batch Processing
Both towers support batched computation for efficient training and offline candidate encoding.
Model Configuration
See recsys_retrieval_model.py:103-121 for the retrieval model's configuration.
Next Steps
- Ranking Model: Learn how retrieved candidates are ranked
- Architecture Details: Explore transformer implementation details