Endigest AI Core Summary
Nemotron OCR v2 is a multilingual OCR model trained on 12.2 million synthetic images, generated by combining the mOSCAR text corpus with a modified SynthDoG renderer. This approach overcomes Nemotron OCR v1's limitations with non-English text.
• Synthetic data generation provides pixel-precise annotations at the word, line, and paragraph levels, along with reading-order graphs, avoiding expensive manual annotation.
• The pipeline generates diverse layouts, including multi-column text, tables, vertical text for CJK, and slides, using 165 to 1,258 open-source fonts per language.
• Accuracy improved significantly on non-English languages: NED (normalized edit distance, lower is better) dropped from 0.56-0.92 to 0.035-0.069. Inference runs at 34.7 pages/second on an A100 GPU.
• The approach is language-agnostic: adding a new language requires only source text and fonts, with no architecture modifications.
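To make the NED figures concrete, here is a minimal sketch of how a normalized edit distance can be computed. It assumes NED is the Levenshtein distance divided by the length of the longer string; the summary does not state the exact normalization used in the original article, and the function names `levenshtein` and `ned` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    # Keep the shorter string in the inner loop so the row buffer stays small.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def ned(prediction: str, reference: str) -> float:
    """Normalized edit distance in [0, 1]; 0 means an exact match.

    Assumption: normalize by the longer string's length.
    """
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(prediction, reference) / longest
```

Under this definition, a score of 0.035 corresponds to roughly 3 to 4 character errors per 100 characters of reference text, while 0.56 means more than half of the characters need editing.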
This summary was automatically generated by AI based on the original article and may not be fully accurate.