Endigest AI Core Summary
This article discusses the threat of inference theft on AI endpoints and presents a real attack case from Vercel, along with defensive strategies.
• Inference theft exploits the large cost gap between an HTTP request and the AI inference it triggers; attackers use residential proxies and OpenAI/Anthropic-compatible adapters to resell the stolen tokens
• Traditional IP rate limits and per-session authentication fail because attackers can amortize the cost of bypassing them across thousands of stolen calls routed through fleet-wide proxies
• AI inference costs orders of magnitude more than an HTTP request, so even a 5-10% resale margin is highly profitable after factoring in adapter development costs
• Defense requires verifying every request rather than gating once per session, using solutions like Vercel's BotID with invisible CAPTCHA and client-side machine learning
• An April 2026 attack on Vercel's docs chat spiked to 1,300 requests per minute, but BotID blocked over 10,000 bot requests within minutes
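The difference between a per-session gate and per-request verification can be illustrated with a minimal Python sketch. Everything in it is hypothetical and for illustration only: the `Request` shape, the `verify` stand-in (a real deployment would call a bot-detection service such as BotID here, whose actual API is not shown), and the `count_served` helper. It simulates an attacker who passes the check once and then replays the same session for resold calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Request:
    session_id: str
    proof: str   # hypothetical per-request proof; "valid" stands in for a passed check
    prompt: str


def verify(req: Request) -> bool:
    """Stand-in for a real bot check (assumption: returns True for humans)."""
    return req.proof == "valid"


def per_session_handler(run_inference: Callable[[str], None]):
    """Gate once per session: later requests in that session skip the check."""
    verified: Set[str] = set()

    def handle(req: Request) -> int:
        if req.session_id not in verified:
            if not verify(req):
                return 403
            verified.add(req.session_id)
        run_inference(req.prompt)  # expensive call, reached without re-checking
        return 200

    return handle


def per_request_handler(run_inference: Callable[[str], None]):
    """Verify every request before any inference is run."""

    def handle(req: Request) -> int:
        if not verify(req):
            return 403
        run_inference(req.prompt)
        return 200

    return handle


def count_served(make_handler, stolen_calls: int) -> int:
    """Count inference calls served to an attacker who passes the check once."""
    served: List[str] = []
    handle = make_handler(served.append)
    handle(Request("s1", "valid", "hello"))  # one-time bypass cost
    for i in range(stolen_calls):
        handle(Request("s1", "", f"resold prompt {i}"))  # replayed session, no proof
    return len(served)


if __name__ == "__main__":
    print(count_served(per_session_handler, 1000))  # 1001
    print(count_served(per_request_handler, 1000))  # 1
```

Under these assumptions, the per-session gate lets one bypass pay for 1,000 further inference calls, while the per-request check forces the attacker to pay the bypass cost on every call, which is the amortization argument made in the summary.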
This summary was automatically generated by AI based on the original article and may not be fully accurate.