Receive daily AI-curated summaries of engineering articles from top tech companies worldwide.
Endigest AI Core Summary
This post explains how Estée Lauder Companies used Cloud Run worker pools to build a scalable, pull-based infrastructure for their consumer-facing AI applications.
• Estée Lauder migrated its Rostrum LLM platform to a producer-consumer model built on Cloud Run worker pools and Cloud Pub/Sub to handle holiday-scale traffic for Jo Malone London's AI Scent Advisor.
• The web tier (FastAPI on a Cloud Run service) publishes user messages to Pub/Sub instantly, while worker pool instances act as always-on consumers handling LLM inference.
• The decoupled architecture delivered 100% message durability, strong UI latency SLAs, and minimal operational overhead with no server management.
• Cloud Run worker pools support non-HTTP protocols via L4 ingress, offer NVIDIA L4 and RTX PRO 6000 GPUs, and are approximately 40% cheaper than request-driven services for long-running tasks.
• CREMA (Cloud Run External Metrics Autoscaler), an open-source tool built on KEDA scalers, enables automatic scaling of worker pools.
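The producer-consumer decoupling described above can be sketched in miniature. This is an illustrative stand-in, not code from the article: a `queue.Queue` plays the role of the Pub/Sub topic, the `publish` function stands in for the FastAPI web tier, and a background thread mimics an always-on worker pool instance running LLM inference. All names here are hypothetical.

```python
import queue
import threading

# Stand-in for the Pub/Sub topic: the web tier enqueues user messages
# instantly, while always-on workers pull and process them.
message_bus = queue.Queue()
results = {}  # worker output keyed by message id (illustrative only)

def publish(msg_id, text):
    """Web tier: acknowledge the user instantly, defer the heavy work."""
    message_bus.put((msg_id, text))

def worker():
    """Worker-pool instance: pull messages and run (mock) inference."""
    while True:
        msg_id, text = message_bus.get()
        if msg_id is None:  # shutdown sentinel
            break
        results[msg_id] = f"response to: {text}"  # mock LLM inference
        message_bus.task_done()

t = threading.Thread(target=worker)
t.start()
publish(1, "recommend a scent")
publish(2, "citrus notes")
message_bus.join()            # wait for the workers to drain the queue
message_bus.put((None, None))  # signal shutdown
t.join()
```

In the real architecture the queue is durable (Pub/Sub retains unacknowledged messages), which is what yields the 100% message durability noted above even when workers restart.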
This summary was automatically generated by AI based on the original article and may not be fully accurate.