Machine Learning

Beyond Two Towers: Re-architecting the Serving Stack for Next-Gen Ads Lightweight Ranking Models…

2026-02-02

10 min read

by Pinterest Engineering

Tags: engineering, monetization

Endigest AI Core Summary

Pinterest's ads team describes how they re-architected their serving stack to replace the Two-Tower model with more expressive neural networks capable of deep feature interactions.
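The core limitation of a Two-Tower model is that user and item representations meet only in a final dot product. The minimal plain-Python sketch below contrasts that with target attention, where the candidate item acts as the query over the user's engagement history. It is not the production model. It assumes toy 2-d embeddings, mean pooling as the user tower, and unprojected scaled dot-product attention, all of which are hypothetical simplifications. Real rankers add learned projections and an MLP on top. This sketch only shows where the user-item coupling happens.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(history: Sequence[Sequence[float]]) -> List[float]:
    """Average the user's engaged-item embeddings dimension by dimension."""
    dim = len(history[0])
    return [sum(h[d] for h in history) / len(history) for d in range(dim)]


def two_tower_score(history, item_emb) -> float:
    # User tower (hypothetical: plain mean pooling) never sees the candidate,
    # so this vector can be computed once per request and reused for every item.
    user_vec = mean_pool(history)
    # Late fusion: the only user-item interaction is this final dot product.
    return dot(user_vec, item_emb)


def target_attention_score(history, item_emb) -> float:
    # Target attention: the candidate is the query over the user's history,
    # so the pooled user vector differs for every candidate.
    scale = math.sqrt(len(item_emb))
    weights = softmax([dot(h, item_emb) / scale for h in history])
    dim = len(item_emb)
    user_vec = [sum(w * h[d] for w, h in zip(weights, history)) for d in range(dim)]
    return dot(user_vec, item_emb)


if __name__ == "__main__":
    # Toy history: one engagement along item_a's direction, two along item_b's.
    history = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    item_a, item_b = [1.0, 0.0], [0.0, 1.0]
    for name, item in (("item_a", item_a), ("item_b", item_b)):
        print(name,
              round(two_tower_score(history, item), 4),
              round(target_attention_score(history, item), 4))
```

With this toy history, the attention-based score for `item_a` is higher than the mean-pooled one. When `item_a` is the query, its single matching engagement receives more weight. The price of that coupling is that the user summary can no longer be precomputed once and reused across all candidates, which is what pushes such models off the cheap Two-Tower serving path.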

• Two-Tower models are efficient but cannot use interaction features, target attention, or early feature crossing between user and item representations
• For the O(1M) high-value candidates, features are embedded directly in the model file as PyTorch registered buffers, eliminating network I/O and host-to-GPU transfer overhead
• Business logic (utility calculation, diversity rules, top-k selection) was moved inside the PyTorch model, cutting GPU-to-CPU data transfer from O(100K) to O(1K) documents
• p90 GPU inference latency dropped from 4,000 ms to 20 ms through multi-stream CUDA, worker-to-core alignment, Triton kernel fusion, and BF16 precision
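The idea of running ranking-adjacent business logic where the scores already live, so that only the final slate crosses the device boundary, can be sketched in framework-free Python. This is an illustrative assumption, not the production implementation. The utility formula (pCTR * bid), the per-advertiser cap used as the diversity rule, and the names `RankAndSelect`, `AdFeatures`, and `forward` are all hypothetical. The in-process dict only mimics the role of features stored as registered buffers. In a real PyTorch model these steps would be tensor operations (for example `torch.topk`) inside the forward pass.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AdFeatures:
    advertiser_id: int
    bid: float  # hypothetical per-ad bid used in the utility formula


class RankAndSelect:
    """Toy stand-in for a ranker that returns only its final slate.

    `feature_table` plays the role of features baked into the model file
    (registered buffers in PyTorch): lookups happen in-process, with no
    per-request feature fetch over the network.
    """

    def __init__(self, feature_table: Dict[int, AdFeatures], max_per_advertiser: int):
        self.feature_table = feature_table
        self.max_per_advertiser = max_per_advertiser

    def forward(self, pctr: Dict[int, float], k: int) -> List[Tuple[int, float]]:
        if k <= 0:
            return []
        # 1) Utility: hypothetical formula pCTR * bid for every scored candidate.
        utilities = [(ad_id, p * self.feature_table[ad_id].bid) for ad_id, p in pctr.items()]
        # 2) Rank by utility, descending; ties broken by ad_id for determinism.
        utilities.sort(key=lambda pair: (-pair[1], pair[0]))
        # 3) Diversity rule + top-k: cap ads per advertiser and stop at k winners.
        per_advertiser: Dict[int, int] = defaultdict(int)
        slate: List[Tuple[int, float]] = []
        for ad_id, utility in utilities:
            advertiser = self.feature_table[ad_id].advertiser_id
            if per_advertiser[advertiser] >= self.max_per_advertiser:
                continue  # advertiser already has its maximum number of slots
            per_advertiser[advertiser] += 1
            slate.append((ad_id, utility))
            if len(slate) == k:
                break
        # Only these (at most) k rows would need to be copied back to the host.
        return slate


if __name__ == "__main__":
    table = {
        1: AdFeatures(advertiser_id=100, bid=2.0),
        2: AdFeatures(advertiser_id=100, bid=1.5),
        3: AdFeatures(advertiser_id=200, bid=1.0),
        4: AdFeatures(advertiser_id=300, bid=0.5),
    }
    model = RankAndSelect(table, max_per_advertiser=1)
    print(model.forward({1: 0.5, 2: 0.6, 3: 0.4, 4: 0.9}, k=2))
```

In the usage example, ad 2 has the second-highest utility but is skipped because advertiser 100 already holds a slot, so the slate is ads 1 and 4. The point of the design is the shape of the output. However many candidates are scored, the caller receives at most `k` rows. That is the mechanism behind shrinking device-to-host transfer from O(100K) to O(1K) documents.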

Related Articles

Slack AI: The Path to Multi-Cloud

Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL

Specialization Beats Scale: A Strategic Variable Most AI Procurement Decisions Overlook

Making User-Sequence Data More Cost-Efficient, Faster, and Easier to Use