This article introduces Meta's Adaptive Ranking Model for serving LLM-scale ad recommendations at sub-second latency.
- Addresses the inference trilemma: balancing model complexity, latency, and cost at billion-user scale
- Request-Oriented Optimization computes user signals once per request, making scaling costs sub-linear
- Wukong Turbo architecture mitigates numeric instability via a No-Bias approach and small parameter delegation
- Selective FP8 post-training quantization improves hardware throughput with negligible quality loss
- Multi-card GPU infrastructure enables O(1T) parameter scaling; launched on Instagram in Q4 2025 with +3% ad conversions and +5% CTR
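The Request-Oriented Optimization idea can be illustrated with a toy sketch (not Meta's implementation; `encode_user`, `score`, and `rank_request` are hypothetical names): the expensive user-side computation runs once per request, so per-candidate cost is only the cheap scoring step.

```python
# Illustrative sketch of request-level user-signal reuse. The user
# "tower" is a stand-in for an expensive model; it runs ONCE per request,
# while each candidate ad only pays for a cheap dot-product score.
def encode_user(user_features):
    # Stand-in for an expensive user-side model producing an embedding.
    return [sum(user_features), max(user_features)]

def score(user_emb, ad_emb):
    # Cheap per-candidate scoring: dot product of the two embeddings.
    return sum(u * a for u, a in zip(user_emb, ad_emb))

def rank_request(user_features, candidate_ads):
    user_emb = encode_user(user_features)  # computed once for the request
    return sorted(candidate_ads,
                  key=lambda ad: score(user_emb, ad),
                  reverse=True)
```

With N candidate ads per request, the expensive user computation is amortized across all N, which is the sense in which scaling cost becomes sub-linear in candidates.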
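Selective FP8 post-training quantization can likewise be sketched in miniature (again hypothetical, not the article's code): weights in chosen layers are rounded to an fp8 e4m3-style grid with a 3-bit mantissa, while the remaining layers stay in full precision.

```python
import math

def quantize_e4m3(x, scale=1.0):
    # Toy fp8 e4m3-style rounding: keep 4 significant bits (1 implicit
    # + 3 mantissa bits), ignoring exponent range and special values.
    v = x / scale
    if v == 0.0:
        return 0.0
    m, e = math.frexp(v)       # v = m * 2**e with 0.5 <= |m| < 1
    m_q = round(m * 16) / 16   # snap the significand to the 4-bit grid
    return m_q * (2 ** e) * scale

def quantize_weights(layers, fp8_layers):
    # The "selective" part: only layers named in fp8_layers are
    # quantized; all others keep their original full-precision weights.
    out = {}
    for name, weights in layers.items():
        if name in fp8_layers:
            out[name] = [quantize_e4m3(w) for w in weights]
        else:
            out[name] = list(weights)
    return out
```

For example, `quantize_e4m3(0.3)` lands on 0.3125, the nearest point on the 4-bit-significand grid; the small rounding error per weight is the "negligible quality loss" traded for higher FP8 hardware throughput.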
This summary was automatically generated by AI based on the original article and may not be fully accurate.