Receive daily AI-curated summaries of engineering articles from top tech companies worldwide.
Endigest AI Core Summary
This post explains how Estée Lauder Companies used Cloud Run worker pools to build a scalable, pull-based infrastructure for their consumer-facing AI applications.
• Estée Lauder migrated its Rostrum LLM platform to a producer-consumer model built on Cloud Run worker pools and Cloud Pub/Sub to handle holiday-scale traffic for Jo Malone London's AI Scent Advisor.
• The web tier (FastAPI on a Cloud Run service) publishes user messages to Pub/Sub instantly, while worker pool instances act as always-on consumers handling LLM inference.
• The decoupled architecture delivered 100% message durability, strong UI latency SLAs, and minimal operational overhead with no server management.
• Cloud Run worker pools support non-HTTP protocols via L4 ingress, offer NVIDIA L4 and RTX PRO 6000 GPUs, and are approximately 40% cheaper than request-driven services for long-running tasks.
• CREMA (Cloud Run External Metrics Autoscaler), an open-source tool built on KEDA scalers, enables automatic scaling of worker pools.
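The producer-consumer decoupling described above can be sketched in miniature. This is an illustrative stand-in, not code from the article: a `queue.Queue` plays the role of the Pub/Sub topic, the `publish` function stands in for the FastAPI web tier, and a background thread mimics an always-on worker pool instance running LLM inference. All names here are hypothetical.

```python
import queue
import threading

# Stand-in for the Pub/Sub topic: the web tier enqueues user messages
# instantly, while always-on workers pull and process them.
message_bus = queue.Queue()
results = {}  # worker output keyed by message id (illustrative only)

def publish(msg_id, text):
    """Web tier: acknowledge the user instantly, defer the heavy work."""
    message_bus.put((msg_id, text))

def worker():
    """Worker-pool instance: pull messages and run (mock) inference."""
    while True:
        msg_id, text = message_bus.get()
        if msg_id is None:  # shutdown sentinel
            break
        results[msg_id] = f"response to: {text}"  # mock LLM inference
        message_bus.task_done()

t = threading.Thread(target=worker)
t.start()
publish(1, "recommend a scent")
publish(2, "citrus notes")
message_bus.join()            # wait for the workers to drain the queue
message_bus.put((None, None))  # signal shutdown
t.join()
```

In the real architecture the queue is durable (Pub/Sub retains unacknowledged messages), which is what yields the 100% message durability noted above even when workers restart.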
This summary was automatically generated by AI based on the original article and may not be fully accurate.