Spotify Engineering | AI

Background Coding Agents: Predictable Results Through Strong Feedback Loops (Honk, Part 3)

2025-12-09
by Spotify Engineering

Endigest AI Core Summary

Spotify describes how it built reliable background coding agents ("Honk"), using strong verification loops to keep incorrect or broken pull requests to a minimum at scale.

  • Three failure modes are identified: agent fails to produce a PR, produces a PR that fails CI, or produces a PR that passes CI but is functionally incorrect
  • Verification loops rely on independent verifiers, each activated by the repository's contents (e.g., a Maven verifier activated when a pom.xml is present) and exposed to the agent as MCP tools, which abstracts away build-system complexity
  • Verifiers run before any PR is opened, using a Claude Code stop hook to block submission if any check fails
  • An LLM-as-a-judge layer evaluates the diff against the original prompt to catch overly ambitious changes; it vetoes ~25% of sessions, and the agent self-corrects after about half of those vetoes
  • The agent runs in a sandboxed container with minimal permissions; complex tasks such as pushing code or interacting with Slack are handled by the surrounding infrastructure
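
The verifier selection and the pre-PR gate described above can be illustrated with a minimal Python sketch. It is not Spotify's implementation: every name here (CheckResult, VERIFIERS, maven_verifier, select_verifiers, run_verification, may_open_pr) is hypothetical, the Maven check is a stand-in that only inspects pom.xml rather than running a build, and the wiring into MCP tools or a Claude Code stop hook is left out.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    verifier: str   # name of the verifier that produced this result
    passed: bool    # True if the check succeeded
    message: str    # short feedback that could be shown to the agent


Verifier = Callable[[Path], CheckResult]


def maven_verifier(repo: Path) -> CheckResult:
    # Stand-in check (assumption): a real verifier would run the build and tests.
    # Here we only confirm that pom.xml contains a <project> element.
    ok = "<project" in (repo / "pom.xml").read_text()
    return CheckResult("maven", ok, "ok" if ok else "pom.xml has no <project> element")


# Hypothetical registry: marker file in the repository -> verifier it activates.
VERIFIERS: Dict[str, Verifier] = {"pom.xml": maven_verifier}


def select_verifiers(repo: Path, registry: Dict[str, Verifier] = VERIFIERS) -> List[Verifier]:
    # A verifier is activated only if its marker file exists in the repository.
    return [v for marker, v in registry.items() if (repo / marker).exists()]


def run_verification(repo: Path, registry: Dict[str, Verifier] = VERIFIERS) -> List[CheckResult]:
    # Run every activated verifier and collect the results.
    return [verify(repo) for verify in select_verifiers(repo, registry)]


def may_open_pr(results: List[CheckResult]) -> bool:
    # The gate: any failed check blocks the PR. With no activated verifiers
    # the list is empty and nothing blocks the PR in this sketch.
    return all(r.passed for r in results)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "pom.xml").write_text("<project></project>")
        results = run_verification(repo)
        for r in results:
            print(f"{r.verifier}: {'pass' if r.passed else 'fail'} ({r.message})")
        print("open PR" if may_open_pr(results) else "blocked")
```

Keeping the gate as a pure function over a list of results means new verifiers can be added to the registry without touching the blocking logic, which is one plausible way to keep build-system details away from the agent.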
Tags:
#AI
#Developer Experience
#Platform
#Developer Tools