How we built a custom vision LLM to improve document processing at Grab

2025-11-04

10 min read

Tags: engineering, performance, data


Grab built a custom ~1B-parameter vision LLM to improve eKYC document processing for Southeast Asian (SEA) languages and document formats.

• Traditional OCR struggled with the diversity of SEA languages, while proprietary LLMs hallucinated and had high latency
• Qwen2-VL 2B was chosen as the base model for its size, dynamic resolution support, and tokenizer coverage of SEA languages
• LoRA fine-tuning worked for Latin-script documents but fell short on Thai and Vietnamese documents
• Two-stage full-parameter fine-tuning improved accuracy over the baseline by 70 percentage points (pp) for Thai and 40 pp for Vietnamese
• A custom ~1B model (Qwen2-VL 2B encoder + Qwen2.5 0.5B decoder) matched the 2B model's accuracy, with lower and more consistent latency than external APIs
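
To make the contrast between LoRA and full-parameter fine-tuning in the points above concrete, here is a minimal pure-Python sketch of the standard LoRA formulation, in which a frozen weight matrix W is adapted as W + (alpha / r) · B · A and only the small matrices B and A are trained. The layer size, rank, and scaling factor below are hypothetical values chosen for illustration; they are not taken from Grab's article and do not describe Grab's actual training setup.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable weights for a LoRA adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


def matmul(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, alpha: float):
    """Return W + (alpha / rank) * B @ A, the weight the adapted layer applies."""
    rank = len(a)  # A has one row per rank dimension
    scale = alpha / rank
    delta = matmul(b, a)  # low-rank update with the same shape as W
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical layer size and rank, for illustration only.
D_OUT, D_IN, RANK = 1536, 1536, 16
full = full_finetune_params(D_OUT, D_IN)
lora = lora_params(D_OUT, D_IN, RANK)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")

# Tiny rank-1 example showing how the adapter shifts a frozen weight matrix.
W = [[1, 0], [0, 1]]  # frozen base weight
B = [[1], [2]]        # d_out x rank
A = [[3, 4]]          # rank x d_in
print(lora_effective_weight(W, B, A, alpha=2.0))
```

Because the update is confined to a low-rank subspace, LoRA trains only a small fraction of the weights. That makes it cheap, but it is also one plausible reason an adapter can fall short when a model has to learn scripts it handles poorly to begin with, which is where the summary reports full-parameter fine-tuning helping.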

This summary was automatically generated by AI based on the original article and may not be fully accurate.
