Grab built a custom ~1B Vision LLM to improve eKYC document processing for Southeast Asian languages and documents.
- Traditional OCR failed on the diversity of Southeast Asian (SEA) languages; proprietary LLMs hallucinated and had high latency
- Qwen2-VL 2B was chosen as the base model for its size, dynamic resolution, and SEA language tokenizer support
- LoRA fine-tuning worked for Latin-script documents but failed on Thai (non-Latin script) and Vietnamese (Latin-based with dense diacritics)
- Two-stage full-parameter fine-tuning improved Thai accuracy by 70 percentage points (pp) and Vietnamese accuracy by 40 pp over the baseline
- A custom ~1B model (Qwen2-VL 2B encoder + Qwen2.5 0.5B decoder) matched the 2B model's accuracy, with lower and more consistent latency than external APIs
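
To make the LoRA versus full-parameter distinction concrete, here is a minimal sketch that compares trainable parameter counts for a single weight matrix. The hidden size (1536) and rank (16) are illustrative assumptions, not values from Grab's setup, and the helper names are hypothetical. The count alone does not explain why LoRA fell short on some scripts; it only shows how much smaller the trainable subspace is.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors: A (rank x d_in) and B (d_out x rank).

    LoRA keeps the base weight W frozen and learns an update B @ A.
    """
    return rank * (d_in + d_out)


def full_trainable_params(d_in: int, d_out: int) -> int:
    """Full-parameter fine-tuning updates every entry of W (d_out x d_in)."""
    return d_in * d_out


# Assumed, illustrative dimensions (not taken from the source).
d_in = d_out = 1536
rank = 16

lora = lora_trainable_params(d_in, d_out, rank)
full = full_trainable_params(d_in, d_out)
ratio = lora / full

print(f"LoRA trainable params per matrix: {lora}")   # 49152
print(f"Full trainable params per matrix: {full}")   # 2359296
print(f"LoRA / full: {ratio:.2%}")                   # 2.08%
```

Under these assumed dimensions, LoRA trains about 2% of the parameters in each adapted matrix, which is one way to see why full-parameter fine-tuning has more capacity to adapt to unfamiliar scripts, at a higher training cost.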