QIMMA قِمّة ⛰: A Quality-First Arabic LLM Leaderboard | Endigest
Hugging Face
QIMMA is a quality-validated leaderboard for evaluating Arabic language models, addressing systematic quality issues in existing benchmarks.
- Consolidates 109 subsets from 14 benchmarks into 52,000+ samples across 7 domains (Cultural, STEM, Legal, Medical, Safety, Poetry, Coding)
- Uses two-stage validation: automated assessment by Qwen3-235B and DeepSeek-V3, followed by human review by native Arabic speakers
- Discovered systematic quality issues, including false answers, text corruption, cultural sensitivity problems, and gold answer misalignment
- Contains 99% native Arabic content and includes the first Arabic code evaluation, using adapted versions of HumanEval+ and MBPP+
- Evaluated 46 models, with Jais-2-70B-Chat achieving the top score of 65.81
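The two-stage validation can be pictured as a triage loop. This is a minimal sketch under stated assumptions, not QIMMA's actual pipeline: the summary does not say how the two model verdicts are combined, so the sketch assumes a sample passes stage 1 only if every automated judge accepts it, and anything flagged goes to human review. The judges are hypothetical rule-based stubs standing in for calls to Qwen3-235B and DeepSeek-V3, and `Sample`, `triage`, `human_review`, and the toy samples are illustrative names and data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Sample:
    sample_id: str
    question: str
    gold_answer: str


# A judge returns True when it finds no quality issue in a sample.
# Hypothetical: in the real pipeline the automated judges would be LLM calls.
Judge = Callable[[Sample], bool]


def triage(samples: Iterable[Sample], judges: List[Judge]) -> Tuple[List[Sample], List[Sample]]:
    """Stage 1 (assumed rule): accept only if every automated judge accepts.
    Samples rejected by any judge are queued for human review."""
    accepted, flagged = [], []
    for s in samples:
        # all() stops evaluating at the first judge that rejects the sample
        if all(judge(s) for judge in judges):
            accepted.append(s)
        else:
            flagged.append(s)
    return accepted, flagged


def human_review(flagged: Iterable[Sample], reviewer: Judge) -> Tuple[List[Sample], List[Sample]]:
    """Stage 2: a human reviewer makes the final keep/drop call on flagged samples."""
    kept, dropped = [], []
    for s in flagged:
        if reviewer(s):
            kept.append(s)
        else:
            dropped.append(s)
    return kept, dropped


# Hypothetical stub judges standing in for the two LLM assessors.
def has_gold_answer(s: Sample) -> bool:
    return s.gold_answer.strip() != ""


def looks_clean(s: Sample) -> bool:
    # Crude, over-strict proxy: flag U+FFFD (text corruption) or any ASCII Latin letter.
    text = s.question + s.gold_answer
    return "\ufffd" not in text and not any(c.isascii() and c.isalpha() for c in text)


# Hypothetical human reviewer: accepts Latin symbols, rejects corruption or a missing gold answer.
def reviewer(s: Sample) -> bool:
    return has_gold_answer(s) and "\ufffd" not in s.question + s.gold_answer


samples = [
    Sample("s1", "ما عاصمة مصر؟", "القاهرة"),
    Sample("s2", "سؤال \ufffd تالف", "ب"),
    Sample("s3", "ما وحدة قياس التيار الكهربائي (A)؟", "الأمبير"),
]

accepted, flagged = triage(samples, [has_gold_answer, looks_clean])
kept, dropped = human_review(flagged, reviewer)
validated = accepted + kept
print([s.sample_id for s in validated])  # ['s1', 's3']
```

In this toy run the stub judges flag both a genuinely corrupted item (`s2`) and a false positive (`s3`, which only contains a unit symbol); the human stage drops the first and restores the second, which is the kind of correction a human pass is meant to provide.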
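The summary does not say how a single leaderboard figure such as 65.81 is derived from results across the 7 domains. The sketch below assumes, purely for illustration, a macro average (mean of per-domain means) and contrasts it with a pooled mean in which every subset counts equally. The subset names and scores are made-up toy values, not QIMMA results, and `macro_score` and `pooled_score` are hypothetical helpers.

```python
from statistics import mean
from typing import Dict

# Toy per-subset scores (percent) grouped by domain. Subset names and values are
# invented for illustration; they are NOT QIMMA results.
toy_results: Dict[str, Dict[str, float]] = {
    "Cultural": {"subset_a": 70.0, "subset_b": 60.0},
    "STEM": {"subset_c": 50.0},
    "Legal": {"subset_d": 80.0},
    "Medical": {"subset_e": 55.0, "subset_f": 65.0},
    "Safety": {"subset_g": 90.0},
    "Poetry": {"subset_h": 40.0},
    "Coding": {"subset_i": 35.0},
}


def macro_score(results: Dict[str, Dict[str, float]]) -> float:
    """Assumed convention: average subsets within each domain, then average the domains."""
    return round(mean(mean(subsets.values()) for subsets in results.values()), 2)


def pooled_score(results: Dict[str, Dict[str, float]]) -> float:
    """Alternative: every subset counts equally, regardless of its domain."""
    return round(mean(s for subsets in results.values() for s in subsets.values()), 2)


print(macro_score(toy_results))   # 60.0
print(pooled_score(toy_results))  # 60.56
```

With identical toy inputs the two conventions give 60.0 and 60.56, because domains with more subsets carry more weight in the pooled mean. Weighting by sample count would be a third option; which convention the leaderboard uses should be checked against the original article.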
This summary was automatically generated by AI based on the original article and may not be fully accurate.