Receive daily AI-curated summaries of engineering articles from top tech companies worldwide.
Endigest AI Core Summary
BMW Group and Google Cloud built an automated workflow for fine-tuning, optimizing, and evaluating small language models (SLMs) for in-vehicle deployment.
• Cloud-based LLMs are impractical for in-vehicle use due to network dependency, making on-device SLMs a better fit for automotive edge deployment
• Compression techniques explored include quantization (32-bit down to 8- or 4-bit), pruning, and knowledge distillation to reduce model size
• Post-compression quality is recovered via LoRA fine-tuning and reinforcement learning methods such as PPO, DPO, and GRPO
• Model quality is evaluated using point-wise metrics (ROUGE, BLEU) and pair-wise methods (LLM-as-a-judge or human feedback)
• The automated pipeline is built on Vertex AI Pipelines, enabling reproducible experimentation across the full configuration space with versioned artifacts
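To make the quantization bullet concrete, here is a minimal sketch of symmetric 8-bit post-training quantization, the kind of size reduction described above. This is an illustrative toy, not BMW's or Google Cloud's implementation; the function names and values are made up.

```python
def quantize_int8(weights):
    """Map float weights to int8 codes plus a per-tensor scale (symmetric)."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid scale 0 for all-zero tensors
    codes = [round(w / scale) for w in weights]        # each code fits in [-127, 127]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from the int8 codes."""
    return [c * scale for c in codes]

weights = [0.42, -1.27, 0.03, 0.88]
codes, scale = quantize_int8(weights)
approx = dequantize(codes, scale)
# Each recovered weight is within one quantization step (scale) of the original,
# while storage per weight drops from 32 bits to 8.
```

The same idea extends to 4-bit by clamping codes to [-7, 7]; real toolchains also use per-channel scales and calibration data.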
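The LoRA recovery step can be pictured as training a low-rank correction instead of the full weight matrix: keep the pretrained W frozen and learn small matrices B (d×r) and A (r×k), applying W + B·A. A toy sketch with made-up dimensions and values, assuming nothing about the article's actual model:

```python
def matmul(X, Y):
    """Plain-Python matrix multiply for the toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

d, k, r = 3, 3, 1          # full shape d x k; rank r << min(d, k)
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]      # frozen pretrained weights
B = [[0.5], [0.0], [0.0]]  # d x r, trainable adapter factor
A = [[0.0, 1.0, 0.0]]      # r x k, trainable adapter factor

delta = matmul(B, A)        # rank-1 update: only d*r + r*k trainable parameters
W_eff = [[w + dw for w, dw in zip(wr, dr)] for wr, dr in zip(W, delta)]
# The adapter shifts a single entry of the frozen matrix (W_eff[0][1] becomes 0.5)
# without touching the d*k original weights.
```

The parameter saving is the point: for a 4096×4096 layer at r=8, the adapter trains ~65K values instead of ~16.8M.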
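For the point-wise evaluation bullet, a stripped-down metric in the spirit of ROUGE-1 recall (unigram overlap between a candidate and a reference) can be written in a few lines. This simplified version skips stemming, stopwords, and the F-measure that full ROUGE implementations compute:

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """Fraction of reference unigrams that also appear in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)  # clipped counts
    return overlap / max(sum(ref.values()), 1)

score = rouge1_recall("the model runs on device",
                      "the model runs on the device")
# 5 of the 6 reference tokens are matched by the candidate.
```

Pair-wise methods like LLM-as-a-judge complement this by comparing two model outputs directly instead of scoring each against a fixed reference.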
This summary was automatically generated by AI based on the original article and may not be fully accurate.