GPU-Serving Two-Tower Models for Lightweight Ads Engagement Prediction
2026-02-13
by Pinterest Engineering
Endigest AI Core Summary
Pinterest introduced a GPU-served two-tower model built on an MMOE-DCN architecture for lightweight ads engagement prediction.
- The two-tower design computes Pin (ad) embeddings with offline batch updates and user embeddings with real-time inference; each ad is scored as the sigmoid of the dot product of the two embeddings.
- The architecture shifted from Multi-Task Multi-Domain (MTMD) to Multi-gate Mixture-of-Experts (MMOE), with MLP gating and full-rank/low-rank Deep & Cross Network (DCN) layers in each expert.
- GPU serving made the more complex model practical while keeping latency comparable to the CPU baseline.
- Training optimizations included GPU prefetching, fused kernels, BF16 precision, larger batch sizes, and tuned worker-thread counts.
- The model achieved a 5–10% offline loss reduction for CTR prediction; training standard and shopping ads separately doubled iteration speed and reduced loss by a further 5–10%.
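The two-tower scoring path can be sketched in a few lines. This is a minimal illustration, assuming Pin embeddings are held in a precomputed lookup (here a hypothetical in-memory dict named `pin_embedding_cache`) and that the user tower has already produced a user embedding at request time. The embedding values, dimensions, and function names are made up for illustration and are not taken from Pinterest's system.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(u: List[float], v: List[float]) -> float:
    # Both towers must emit embeddings of the same dimension.
    if len(u) != len(v):
        raise ValueError("embedding dimensions must match")
    return sum(a * b for a, b in zip(u, v))


# Hypothetical cache of Pin (ad) embeddings, refreshed by an offline batch job.
pin_embedding_cache: Dict[str, List[float]] = {
    "ad_1": [0.5, -0.2, 0.1],
    "ad_2": [-0.3, 0.4, 0.8],
}


def score_candidates(
    user_embedding: List[float],
    candidate_ids: List[str],
    cache: Dict[str, List[float]],
) -> Dict[str, float]:
    """Return an engagement score sigmoid(user . pin) for each candidate Pin."""
    return {pid: sigmoid(dot(user_embedding, cache[pid])) for pid in candidate_ids}


# In production this vector would come from the real-time user tower.
user_embedding = [1.0, 0.0, 0.5]
scores = score_candidates(user_embedding, ["ad_1", "ad_2"], pin_embedding_cache)
```

Because the Pin side is precomputed, the per-candidate online cost is one dot product plus a sigmoid; only the user tower has to run at request time, which is what keeps this design lightweight.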
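The MMOE-DCN structure can be illustrated with a toy, pure-Python forward pass. This is a sketch under stated assumptions, not Pinterest's implementation. It assumes DCN-v2-style cross layers (x_{l+1} = x_0 ⊙ (W x_l + b) + x_l), low-rank layers that factor W as U·Vᵀ, and a one-hidden-layer MLP gate that produces a softmax over experts for each task. The class names (`CrossLayer`, `MLPGate`, `MMoEDCN`), sizes, and random initialization are all hypothetical.

```python
import math
import random
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class CrossLayer:
    """Assumed DCN-v2 cross layer: x_next = x0 * (W x + b) + x.
    rank=None uses a full d x d W; otherwise W = U (d x r) @ V^T (r x d)."""

    def __init__(self, dim: int, rng: random.Random, rank: Optional[int] = None):
        self.rank = rank
        if rank is None:
            self.w = rand_matrix(dim, dim, rng)
        else:
            self.u = rand_matrix(dim, rank, rng)
            self.v_t = rand_matrix(rank, dim, rng)
        self.b = [0.0] * dim

    def __call__(self, x0: Vector, x: Vector) -> Vector:
        wx = matvec(self.w, x) if self.rank is None else matvec(self.u, matvec(self.v_t, x))
        return [a * (h + b) + xi for a, h, b, xi in zip(x0, wx, self.b, x)]


class MLPGate:
    """Per-task gate: ReLU hidden layer, then softmax over experts."""

    def __init__(self, dim: int, hidden: int, n_experts: int, rng: random.Random):
        self.w1 = rand_matrix(hidden, dim, rng)
        self.w2 = rand_matrix(n_experts, hidden, rng)

    def __call__(self, x: Vector) -> Vector:
        h = [max(0.0, v) for v in matvec(self.w1, x)]
        return softmax(matvec(self.w2, h))


class MMoEDCN:
    """Each expert is a stack of cross layers; each task mixes experts via its gate."""

    def __init__(self, dim: int, n_experts: int, n_tasks: int,
                 ranks: List[Optional[int]], seed: int = 0):
        rng = random.Random(seed)
        self.experts = [[CrossLayer(dim, rng, r) for r in ranks] for _ in range(n_experts)]
        self.gates = [MLPGate(dim, 8, n_experts, rng) for _ in range(n_tasks)]

    def _run_expert(self, layers: List[CrossLayer], x0: Vector) -> Vector:
        x = x0
        for layer in layers:
            x = layer(x0, x)
        return x

    def __call__(self, x: Vector) -> List[Vector]:
        expert_out = [self._run_expert(layers, x) for layers in self.experts]
        outputs = []
        for gate in self.gates:
            weights = gate(x)
            outputs.append([sum(w * out[j] for w, out in zip(weights, expert_out))
                            for j in range(len(x))])
        return outputs  # one gated representation per task


# Each expert: one full-rank cross layer followed by one rank-2 cross layer.
model = MMoEDCN(dim=4, n_experts=3, n_tasks=2, ranks=[None, 2])
task_outputs = model([0.2, -0.1, 0.4, 0.3])
```

A rank-r cross layer costs about 2·d·r multiply-adds per example instead of d² for a full-rank one, so mixing the two inside an expert trades some expressiveness for compute.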
