This post describes a 24-hour speedrun for training a text-to-image diffusion model using 32 H200 GPUs and a ~$1500 compute budget.
This post explains MoE architecture and how transformers v5 added first-class MoE support.
Hugging Face announces that GGML, the team behind llama.cpp, is joining Hugging Face to support the long-term growth of local AI inference.
This post explains how to fine-tune small LLMs for free using Unsloth and Hugging Face Jobs, with support for coding agents like Claude Code and Codex.
IBM Research and UC Berkeley applied the MAST (Multi-Agent System Failure Taxonomy) framework to diagnose why LLM agents fail in enterprise IT automation, analyzing 310 ITBench SRE traces across three models.
Gradio 6's gr.HTML now supports custom templates, scoped CSS, and JavaScript interactivity, enabling full-stack web app development in a single Python file.
This post introduces an agent skill that enables coding agents (Claude Code and Codex) to write production-ready CUDA kernels for Hugging Face's diffusers and transformers libraries.
This post introduces OpenEnv, an open-source framework for evaluating AI agents in real-world environments, using a calendar management benchmark called the Calendar Gym.
Transformers.js v4 preview is now available on NPM, bringing a new WebGPU runtime, a build-system overhaul, and expanded model support.