$500 GPU outperforms Claude Sonnet on coding benchmarks

2026-03-26

by yogthos


Endigest AI Core Summary

A.T.L.A.S (Adaptive Test-time Learning and Autonomous Specialization) reaches 74.6% pass@1 on LiveCodeBench using a frozen 14B model on a single $500 consumer GPU. That is higher than Claude Sonnet 4.5 (71.4%), at roughly $0.004 per task versus roughly $0.066 per task.
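To put the headline figures side by side, the short calculation below derives the accuracy gap and the cost ratio from the four numbers quoted above. The variable names are illustrative only, and the result is just arithmetic on the summary's figures, not an independent measurement.

```python
# Figures as quoted in the summary (pass@1 in percent, cost in USD per task).
atlas_pass1 = 74.6
sonnet_pass1 = 71.4
atlas_cost_per_task = 0.004
sonnet_cost_per_task = 0.066

# Accuracy gap in percentage points, rounded to absorb float noise.
accuracy_gap_pp = round(atlas_pass1 - sonnet_pass1, 1)

# How many times more the API route costs per task than the local setup.
cost_ratio = round(sonnet_cost_per_task / atlas_cost_per_task, 1)

print(f"accuracy gap: {accuracy_gap_pp} pp, cost ratio: {cost_ratio}x")
```

Both cost figures are stated as approximate in the summary, so the ratio should be read as an order-of-magnitude indication rather than a precise multiple.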

• Uses a three-phase pipeline: PlanSearch + BudgetForcing for diverse candidate generation, Geometric Lens energy scoring for candidate selection, and PR-CoT self-verified iterative repair
• Runs a frozen Qwen3-14B-Q4_K_M model on an RTX 5060 Ti 16GB via a patched llama-server on K3s with speculative decoding (~100 tok/s) and 5120-dim self-embeddings
• Phase 3 PR-CoT repair rescued 36/42 failed tasks (85.7%) using model-generated test cases, contributing +7.3pp to the final score
• Cost is local electricity only (~$0.004/task at $0.12/kWh), with no API calls, no data leaving the machine, and no fine-tuning required
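The three-phase flow described in the bullets (generate candidates, select one, repair it against tests) can be sketched roughly as follows. This is a minimal illustration of the control flow only, not the project's code: `solve`, its parameters, and the toy stand-ins below are all hypothetical, and the sketch assumes that a lower energy score means a better candidate.

```python
from typing import Callable, List, Optional


def solve(
    task: str,
    generate: Callable[[str, int], str],   # hypothetical: produce the i-th candidate
    energy: Callable[[str], float],        # hypothetical: lower is assumed better
    passes_tests: Callable[[str], bool],   # hypothetical: runs model-generated tests
    repair: Callable[[str, str], str],     # hypothetical: revise a failing candidate
    n_candidates: int = 4,
    max_repairs: int = 3,
) -> Optional[str]:
    # Phase 1: generate several diverse candidates.
    candidates: List[str] = [generate(task, i) for i in range(n_candidates)]
    # Phase 2: select the candidate with the lowest energy score.
    best = min(candidates, key=energy)
    # Phase 3: iteratively repair until the tests pass or the budget runs out.
    for _ in range(max_repairs):
        if passes_tests(best):
            return best
        best = repair(task, best)
    return best if passes_tests(best) else None


# Toy stand-ins so the sketch runs end to end (no model involved).
def toy_generate(task: str, i: int) -> str:
    return f"def f(x): return x + {i + 2}"


def toy_offset(code: str) -> int:
    return int(code.rsplit(" ", 1)[1])


def toy_passes(code: str) -> bool:
    namespace: dict = {}
    exec(code, namespace)  # acceptable here only because the input is a toy string
    return namespace["f"](1) == 2 and namespace["f"](5) == 6


def toy_repair(task: str, code: str) -> str:
    return f"def f(x): return x + {toy_offset(code) - 1}"


result = solve("add one", toy_generate, toy_offset, toy_passes, toy_repair)
print(result)
```

In the toy run none of the initial candidates pass, so the selected one is repaired once before it satisfies the tests, mirroring the role the summary attributes to the repair phase.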
