15 articles
This post describes how Lyft built a Bayesian hierarchical tree model to predict rider conversion in real time under sparse-data conditions.
Airbnb explains how COVID broke their booking-to-trip forecasting models and the architectural changes they built to handle structural data shifts.
This post details Netflix's multi-step CPU optimization journey for video serendipity scoring in the Ranker recommendation service using the JDK's Vector API.
Pinterest investigates the online–offline discrepancy in L1 CVR models in their ads funnel.
Airbnb recaps its 2025 academic research at KDD, CIKM, and EMNLP covering ML, NLP, and recommendation systems.
Netflix introduces MediaFM, an in-house tri-modal (audio, video, text) foundation model for deep media content understanding at scale.
Lyft rebuilt its translation pipeline by integrating LLMs to reduce translation latency from days to minutes while maintaining linguist oversight.
This article describes the architecture, optimization, and evolution of Lyft's Feature Store, a core ML infrastructure platform serving 60+ use cases across the rideshare stack.
Pinterest Search presents a methodology for scaling search relevance assessment using fine-tuned LLMs to replace costly human annotation.
Pinterest describes how Pinner (user) surveys are used to train a machine learning model that improves content quality recommendations across Homefeed, Related Pins, and Search.
This post describes how Lyft evolved LyftLearn, their end-to-end ML platform, from a fully Kubernetes-based system to a hybrid architecture combining AWS SageMaker and Kubernetes.
This post describes a Lyft data scientist's starter project using the Rider Experience Score (RES) tool to estimate long-term causal effects of rider experiences on retention without relying on A/B tests.
Two Lyft data scientists share their intern-to-full-time journeys, highlighting impactful data science projects in EV adoption and driver loyalty.
PayPal shares how they cut cloud costs for Apache Spark jobs by up to 70% by migrating from CPU-based Spark 2 to GPU-accelerated Spark 3 using NVIDIA's Spark RAPIDS.
PayPal describes their declarative (config-based) feature engineering paradigm used for real-time fraud detection ML across 400 million users.