Machine Learning

GPU-Serving Two-Tower Models for Lightweight Ads Engagement Prediction

2026-02-13

5 min read

by Pinterest Engineering

Tags:

engineering

monetization


Pinterest introduced a GPU-served two-tower model using MMOE-DCN architecture for lightweight ads engagement prediction.

• The two-tower design computes Pin (ad) embeddings offline in batch updates and user embeddings through real-time inference, then scores each user-ad pair with a sigmoid over the dot product of the two embeddings
• The architecture shifted from Multi-Task Multi-Domain (MTMD) to Multi-gate Mixture-of-Experts (MMOE) with MLP gating and full-rank/low-rank DCN layers in each expert
• GPU serving made the more complex model feasible while keeping latency comparable to the CPU baseline
• Training optimizations included GPU prefetch, fused kernels, BF16 precision, larger batch sizes, and tuned worker threads
• The model achieved a 5–10% offline loss reduction for CTR prediction; training standard and shopping ads separately doubled iteration speed and reduced loss by a further 5–10%
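
The scoring step in the first bullet can be sketched in a few lines. This is a minimal illustration, not Pinterest's implementation: the embedding values, the ad IDs, the in-memory `AD_EMBEDDINGS` table, and the `score_ads` helper are all hypothetical, and a real system would produce both embeddings from the trained towers rather than hard-coding them.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def score_ads(user_embedding, ad_embeddings):
    """Score every ad against one user: sigmoid(user . ad)."""
    return {
        ad_id: sigmoid(dot(user_embedding, emb))
        for ad_id, emb in ad_embeddings.items()
    }

# Hypothetical ad embeddings, standing in for the output of the
# offline batch job that refreshes the Pin (ad) tower.
AD_EMBEDDINGS = {
    "ad_a": [0.5, -0.2, 0.1],
    "ad_b": [-0.3, 0.4, 0.0],
    "ad_c": [0.0, 0.0, 0.0],
}

# Hypothetical user embedding, standing in for the output of the
# user tower computed at request time.
user_embedding = [1.0, 0.5, -1.0]

scores = score_ads(user_embedding, AD_EMBEDDINGS)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Because the ad side is precomputed, the per-request cost is one user-tower forward pass plus one dot product per candidate ad, which is what keeps this kind of model lightweight at serving time.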
