Pinterest Engineering Blog - Medium | Machine Learning

GPU-Serving Two-Tower Models for Lightweight Ads Engagement Prediction

2026-02-13
5 min read
by Pinterest Engineering

Endigest AI Core Summary

Pinterest introduced a GPU-served two-tower model for lightweight ads engagement prediction, built on an MMOE-DCN architecture (Multi-gate Mixture-of-Experts combined with Deep & Cross Network layers).

  • The two-tower design computes Pin (ad) embeddings in offline batch updates and user embeddings through real-time inference, then scores each user-Pin pair by applying a sigmoid to the dot product of the two embeddings
  • Architecture shifted from Multi-Task Multi-Domain (MTMD) to MMOE with MLP gating and full-rank/low-rank DCN layers per expert
  • GPU serving enabled the more complex model while maintaining latency comparable to the CPU baseline
  • Training optimizations included GPU prefetch, fused kernels, BF16 precision, larger batch sizes, and tuned worker threads
  • Achieved a 5–10% offline loss reduction for CTR prediction; training standard and shopping ads separately doubled iteration speed and reduced loss by a further 5–10%
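
The scoring step in the first bullet can be illustrated with a minimal sketch. It is not Pinterest's implementation: the `PIN_EMBEDDINGS` table, the `user_tower` stand-in, and the function names are all hypothetical, and the real towers are MMOE-DCN networks rather than a pass-through. The sketch only shows the serving-time split, where Pin embeddings are looked up from an offline-refreshed store and the user embedding is computed once per request before the dot product and sigmoid.

```python
import math

# Hypothetical offline store: Pin (ad) id -> embedding refreshed by a batch job.
PIN_EMBEDDINGS = {
    "pin_a": [0.5, -0.25, 1.0],
    "pin_b": [-1.0, 0.5, 0.0],
}


def user_tower(features):
    """Stand-in for the real-time user tower (assumption: a pass-through here)."""
    return list(features)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score(user_embedding, pin_embedding):
    """Predicted engagement probability: sigmoid of the embedding dot product."""
    dot = sum(u * p for u, p in zip(user_embedding, pin_embedding))
    return sigmoid(dot)


def rank_pins(user_features):
    """Score every stored Pin for one request, highest probability first."""
    user_embedding = user_tower(user_features)  # computed once per request
    scores = {
        pin_id: score(user_embedding, pin_embedding)
        for pin_id, pin_embedding in PIN_EMBEDDINGS.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because the Pin side is precomputed, the per-request cost in this sketch is one user-tower evaluation plus one dot product per candidate, which is what keeps this kind of model lightweight at serving time.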
Tags:
#engineering
#pinterest
#monetization