Receive daily AI-curated summaries of engineering articles from top tech companies worldwide.
Endigest AI Core Summary
H Company releases Holotron-12B, a multimodal computer-use agent model post-trained from NVIDIA's Nemotron-Nano-2 VL, optimized for high-throughput production inference.
• Built on a hybrid State-Space Model (SSM) and attention architecture, avoiding the quadratic cost of full attention and reducing the memory footprint to a constant state per layer
• Achieved over 2x higher throughput than Holo2-8B on a single H100 GPU using vLLM v0.14.1, reaching 8.9k tokens/s at a concurrency of 100
• Post-trained in two stages, beginning with supervised fine-tuning on proprietary localization and navigation data (~14B tokens) focused on screen understanding, grounding, and UI interactions
• WebVoyager benchmark score improved from 35.1% (base Nemotron) to 80.5%, surpassing Holo2-8B
• Also shows improvements on localization benchmarks including OS-World-G, GroundUI, and WebClick
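For context, the aggregate throughput figure can be translated into a rough per-request rate. This is a back-of-the-envelope sketch; only the 8.9k tokens/s and concurrency-100 numbers come from the article, and real per-request rates vary with batching and sequence lengths:

```python
# Rough per-stream decode rate implied by the reported numbers:
# 8.9k aggregate tokens/s shared across 100 concurrent requests.
aggregate_tokens_per_s = 8_900
concurrency = 100

per_stream = aggregate_tokens_per_s / concurrency
print(per_stream)  # approximate tokens/s seen by each concurrent request
```

At these numbers each request still decodes at roughly 89 tokens/s, which is why constant-memory SSM layers matter: the KV-cache growth of full attention is what normally limits concurrency on a single GPU.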
This summary was automatically generated by AI based on the original article and may not be fully accurate.