Protecting against token theft

2026-05-29

6 min read


This article discusses the threat of inference theft on AI endpoints and presents a real attack case from Vercel, along with defensive strategies.

• Inference theft exploits the large cost difference between HTTP requests and AI inference: attackers use residential proxies and OpenAI/Anthropic-compatible adapters to resell stolen tokens
• Traditional IP rate limits and per-session authentication fail because attackers can amortize the cost of bypassing them across thousands of stolen calls, using fleet-wide proxies
• AI inference costs orders of magnitude more than an HTTP request, so a 5-10% resale margin is highly profitable even after factoring in adapter development costs
• Defense requires per-request verification rather than per-session gates, using solutions like Vercel's BotID with invisible CAPTCHA and client-side machine learning
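
The difference between a per-session gate and per-request verification can be sketched in a few lines. The following is a minimal illustration, not BotID's actual API or mechanism: the names `issue_attestation` and `PerRequestVerifier` are hypothetical, and the attestation step simply stands in for whatever bot check the client must pass. The assumption is that each attestation is bound to one request body and is single-use, so a bypass paid for once cannot be replayed across thousands of inference calls.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side key shared by the challenge step and the verifier.
SECRET = secrets.token_bytes(32)


def _sign(nonce: str, body: bytes) -> str:
    """HMAC over the nonce and a hash of the request body."""
    msg = nonce.encode() + b"." + hashlib.sha256(body).digest()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def issue_attestation(body: bytes) -> tuple[str, str]:
    """Stand-in for a passed bot check (assumption, not a real API).

    Returns a fresh nonce and a signature bound to this exact request body.
    """
    nonce = secrets.token_hex(16)
    return nonce, _sign(nonce, body)


class PerRequestVerifier:
    """Accepts each attestation once, and only for the body it was issued for."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def verify(self, nonce: str, body: bytes, signature: str) -> bool:
        expected = _sign(nonce, body)
        if not hmac.compare_digest(expected, signature):
            return False  # forged, or reused for a different request body
        if nonce in self._seen:
            return False  # replay of an already-spent attestation
        self._seen.add(nonce)
        return True
```

Under this design an attacker has to pass the check again for every inference call, which removes the amortization that makes a per-session bypass cheap. A production version would also need nonce expiry and shared storage for spent nonces, both omitted here.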
