Pinterest Engineering Blog - Medium
|Machine Learning

Beyond Two Towers: Re-architecting the Serving Stack for Next-Gen Ads Lightweight Ranking Models…

2026-02-02
10 min read
by Pinterest Engineering

Endigest AI Core Summary

Pinterest's ads team describes how they re-architected their serving stack to replace the Two-Tower model with more expressive neural networks capable of deep feature interactions.

  • Two-Tower models are efficient but cannot leverage interaction features, target attention, or early feature crossing between user and item representations
  • High-value O(1M) candidates have features embedded directly as PyTorch registered buffers in the model file, eliminating network I/O and host-to-GPU transfer overhead
  • Business logic (utility calculation, diversity rules, top-k selection) was moved inside the PyTorch model to reduce GPU-to-CPU data transfer from O(100K) to O(1K) documents
  • GPU inference latency was reduced from 4000ms p90 to 20ms via multi-stream CUDA, worker-to-core alignment, Triton kernel fusion, and BF16 precision
  • Retrieval data flow was restructured to return only IDs and Bids in a column-wise format first, deferring the heavy metadata fetch until after ranking has reduced the candidate set
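
The last point, returning only IDs and bids column-wise and fetching metadata after ranking, can be illustrated with a minimal pure-Python sketch. Everything here is hypothetical and not taken from the original post: the names `CandidateColumns`, `retrieve_light`, `rank_top_k`, and `fetch_metadata`, the in-memory dictionaries standing in for the retrieval index and metadata store, and the utility formula (bid times model score) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CandidateColumns:
    """Column-wise retrieval result: parallel lists rather than one object per ad."""
    ids: List[int]
    bids: List[float]


def retrieve_light(index: Dict[int, float]) -> CandidateColumns:
    """Phase 1: return only IDs and bids. No metadata is touched here."""
    ids = list(index.keys())
    return CandidateColumns(ids=ids, bids=[index[i] for i in ids])


def rank_top_k(cands: CandidateColumns, scores: Dict[int, float], k: int) -> List[int]:
    """Phase 2: rank on the light columns and keep the k best IDs.

    Utility is assumed to be bid * score purely for illustration.
    """
    utility = [bid * scores[cid] for cid, bid in zip(cands.ids, cands.bids)]
    order = sorted(range(len(cands.ids)), key=lambda j: utility[j], reverse=True)
    return [cands.ids[j] for j in order[:k]]


def fetch_metadata(ids: List[int], store: Dict[int, dict]) -> List[dict]:
    """Phase 3: fetch heavy metadata only for the survivors of ranking."""
    return [store[i] for i in ids]


# Hypothetical toy data: four candidates, of which two survive ranking.
index = {1: 0.5, 2: 2.0, 3: 1.0, 4: 0.1}           # id -> bid
scores = {1: 0.9, 2: 0.1, 3: 0.5, 4: 0.8}          # id -> model score
store = {i: {"id": i, "title": f"ad-{i}"} for i in index}

winners = rank_top_k(retrieve_light(index), scores, k=2)
winner_metadata = fetch_metadata(winners, store)
```

In this sketch the metadata store is read twice instead of four times; the same shape of saving is what the summary describes when the candidate set shrinks before the heavy fetch.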
Tags:
#engineering
#pinterest
#monetization