Trustpilot built a real-time pipeline processing millions of reviews using fine-tuned Gemma-2-9b models for data enrichment.
- •Fine-tuned google/gemma-2-9b with teacher consensus from Gemini family for NER, sentiment analysis, and topic classification tasks
- •Decoupled FastAPI classifier and vLLM-based LLM endpoints on Dataflow enabling independent scaling and performance optimization
- •Optimized vLLM with prefix caching and A100 GPUs to handle millions of reviews daily in near real-time
- •Achieved Gemini-like accuracy at lower cost with full model ownership and predictable infrastructure expenses
- •Overcame challenges with private networking, deployment observability, and GPU scarcity in EU
This summary was automatically generated by AI based on the original article and may not be fully accurate.