Honk Is Not Magic. It’s 15 Years of Infrastructure With the Context Stripped Out.

Spotify told investors that 99% of engineers use AI weekly. Their own engineering blog tells a different story about what that means.

May 27, 2026

I’m allergic to bullshit when it comes to this stuff. When executives spend months bouncing between stages telling everyone AI already does everything for them, it gets under my skin. Not because they’re lying, necessarily. Because the context disappears somewhere between the engineering blog and the keynote, and the audience fills the gap with their own hopes. A few weeks ago, when Claude’s codebase leaked publicly, I looked at what was actually inside and broke down the “AI writes all the code” narrative. Today I’ll explain why it plausibly works for Spotify, and why it’s not that simple.

Yesterday, Anthropic published a recap of their Code with Claude London conference, calling it “rethinking how we build,” with Spotify as the first named customer. Spotify didn’t rethink how they build. They spent 15 years building Backstage, Fleet Management, and a Java BOM with 96% adoption, then plugged Claude Code into a system that was already automating half their PRs. That’s not rethinking. That’s a better interface to something that already worked.

But the marketing loop is now recursive. Spotify tells the Honk story on Anthropic’s stage. Anthropic writes a blog post about Spotify telling the story. Investor Day picks it up, and the numbers go up each time. Here’s the timeline.

On May 21, 2026, Spotify held its Investor Day in New York. Co-CEO Gustav Söderström and VP of Engineering Niklas Gustavsson told investors that 99% of Spotify engineers now use AI weekly, 73% of code contributions are AI-assisted, and Honk, their internal coding agent, is now part of a broader story about the Large Taste Model and personalized monetization. Two days earlier, Gustavsson had given the same talk at Code with Claude in London, Anthropic’s developer conference. The number there was 96%. It went up before the slides changed.

In February, Söderström told analysts his best engineers haven’t written a single line of code since December. Two days later, Anthropic closed a $30 billion funding round. In March, the two companies shared a stage in London. In April, Spotify launched as a Claude connector. On May 19, Gustavsson presented at Code with Claude London with 96%. On May 21, Investor Day in New York with 99%.

Five moments in four months. The audience rotates, the numbers go up, and the engineering blog stays the same.

The Wrong Metric

I don’t write code either. AI does it for me. And I work more than I ever did when I wrote every line myself. The code was never the hard part. The hard part is the review documentation nobody wants to write, the architecture decision that turns out wrong six months later, the junior who needs fifteen minutes of your time at exactly the wrong moment. AI took the typing and gave back a review queue that never ends, constant context switching, and a steady stream of “does this diff actually do what I asked” that I didn’t have before.

And “99% use AI weekly” means what exactly? Opened Copilot once this quarter? Used Claude to generate a regex? Ran a Honk migration on a fleet of repos? The metric has no definition, which means it has no meaning. “73% of code contributions are AI-assisted” is equally hollow without knowing what counts as a contribution. Config changes, dependency bumps, and feature flag flips are all contributions.
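To make the definition problem concrete, here is a minimal sketch. The five engineers and the three candidate definitions are invented for illustration; none of this is Spotify data, and nothing public says which definition (if any of these) sits behind the 99%.

```python
from dataclasses import dataclass


@dataclass
class Engineer:
    # Hypothetical weekly activity flags, invented for this example.
    opened_assistant: bool     # opened any AI tool at least once
    accepted_suggestion: bool  # accepted at least one AI suggestion
    merged_ai_pr: bool         # merged a PR that an agent authored


# Made-up toy team, NOT real data.
team = [
    Engineer(True, True, True),
    Engineer(True, True, False),
    Engineer(True, False, False),
    Engineer(True, False, False),
    Engineer(False, False, False),
]


def share(engineers, predicate) -> float:
    """Percentage of engineers for whom the predicate holds."""
    matching = sum(1 for e in engineers if predicate(e))
    return 100 * matching / len(engineers)


# Three equally defensible readings of "uses AI weekly".
usage_by_definition = {
    "opened a tool": share(team, lambda e: e.opened_assistant),
    "accepted a suggestion": share(team, lambda e: e.accepted_suggestion),
    "merged an agent PR": share(team, lambda e: e.merged_ai_pr),
}

for name, pct in usage_by_definition.items():
    print(f"{name}: {pct:.0f}%")
```

Same team, same week, and the headline number swings from 80% to 20% depending on which line of the dictionary ends up on the slide.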

Nobody on that call asked about revert rates. Nobody asked if defects went up. “50+ features in 2025” from a company with over seven thousand employees. Honk handles migrations and dependency updates, not feature development. Their own blog is clear about this. But the stage narrative packages it as a product velocity story, and nobody makes the distinction.

The bottleneck at Spotify was never typing speed.

What Honk Actually Does

Honk is Spotify’s internal background coding agent built on Claude Code. Anthropic’s Boris Cherny is quoted directly inside Spotify’s own engineering blog as an endorsement, and Anthropic’s Applied AI team worked on the integration. The three-part blog series by Max Charas and Marc Bruggmann (November-December 2025) is the most detailed public source.

An engineer writes a prompt through Slack or a version-controlled file in Git. Honk runs Claude Code in a sandboxed Kubernetes Job. The agent gets three tools: verify, Git, and an allowlisted Bash. Ten turns, three retries, then a PR.

That’s it. A thin wrapper around Claude Code, plugged into an automation pipeline that existed years before AI.
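The shape of that loop can be sketched in a few lines. This is a rough illustration, not Honk’s code: the function names, the contents of the Bash allowlist, and the way turns and retries nest are all assumptions. In particular, it treats “three retries” as three attempts in total, each capped at ten turns, which the blog series may or may not mean.

```python
from typing import Callable, Optional

MAX_TURNS = 10    # "ten turns" from the description above
MAX_ATTEMPTS = 3  # "three retries", read here as three attempts (assumption)

# Hypothetical allowlist; the real one is not given in this post.
BASH_ALLOWLIST = {"ls", "cat", "grep"}


def bash_allowed(command: str) -> bool:
    """Permit a shell command only if its executable is allowlisted."""
    parts = command.split()
    return bool(parts) and parts[0] in BASH_ALLOWLIST


def run_attempt(agent_step: Callable[[], None],
                verify: Callable[[], bool]) -> Optional[int]:
    """Run up to MAX_TURNS steps; return the turn count once verify passes."""
    for turn in range(1, MAX_TURNS + 1):
        agent_step()   # stand-in for one model turn editing the repo
        if verify():   # stand-in for the build/test verifier
            return turn
    return None        # turn budget exhausted without a passing build


def run_task(agent_step: Callable[[], None],
             verify: Callable[[], bool]) -> Optional[str]:
    """Retry whole attempts; 'open a PR' only when verification passed."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        turns = run_attempt(agent_step, verify)
        if turns is not None:
            return f"PR opened: attempt {attempt}, {turns} turns"
    return None        # give up, no PR
```

Everything interesting lives outside this loop: which repos get targeted, what the verifier checks, and who reviews the PR. Those are the parts that predate the agent.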

As of November 2025, Honk had merged 1,500+ PRs total. Anthropic’s customer page reports 650+ monthly PRs. Fleet Management, the system Honk sits on top of, processed 652,000 automated PRs in 2024 per Splunk’s recap of Spotify’s PlatEngDay data. Honk adds a useful layer to an already massive automation system. But from the stage narrative, you’d think Honk is the system.
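A back-of-the-envelope calculation puts those figures side by side. It is deliberately rough: it annualizes the “650+” monthly floor, compares a 2025-era Honk rate against a 2024 Fleet Management total, and assumes the rate holds for twelve months, so treat the result as an order of magnitude, not a measurement.

```python
# Figures quoted in the paragraph above.
honk_monthly_prs = 650      # "650+ monthly PRs" (a floor, not an exact count)
honk_total_merged = 1_500   # "1,500+ PRs total" as of November 2025
fleet_prs_2024 = 652_000    # automated Fleet Management PRs in 2024

# Assumption: the monthly rate holds for a full year.
honk_annualized = honk_monthly_prs * 12

# Honk's volume relative to one year of Fleet Management automation.
annualized_share_pct = 100 * honk_annualized / fleet_prs_2024
cumulative_share_pct = 100 * honk_total_merged / fleet_prs_2024

print(f"Annualized Honk PRs: {honk_annualized}")
print(f"Annualized share of 2024 fleet volume: {annualized_share_pct:.1f}%")
print(f"Cumulative share of 2024 fleet volume: {cumulative_share_pct:.2f}%")
```

Under those assumptions, Honk lands at roughly one percent of the automated PR volume the pipeline was already producing without it.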

The blog is candid about limitations. No code search or documentation tools are exposed to the agent. Verifiers only run on Linux x86, with macOS and iOS planned for the future. The team admits they’re “still flying mostly by intuition” on prompt engineering, with no structured evals. The LLM judge that validated output vetoed about 25% of sessions, and by QCon London in March 2026 they’d removed it entirely as models improved.

Compare that to Söderström telling analysts about an engineer fixing iOS bugs from his commute and merging to production before arriving at the office. The blog says iOS verifiers don’t exist yet. One of these is the engineering reality. The other is the earnings call.

What Every Headline Missed

Every story about “Spotify’s engineers don’t code” stops before the interesting part.

Backstage, created internally and open-sourced in 2020, is Spotify’s internal developer portal with 3,400+ adopters worldwide. Internally, it catalogs thousands of software components across hundreds of squads. Every component has an owner. Not a team, a person. With a dependency graph, docs, and a certification score attached. Or as Spotify puts it: “you can’t safely automate what you don’t understand.”

Fleet Management, described in Spotify’s 2023 blog series, runs Docker-based code transformations as Kubernetes Jobs across thousands of repos. Before AI, this system already handled half of PRs at Spotify. The bot-to-human contribution ratio reached 3:1, with over 1.8 million automated contributions total per the same Splunk data.

Before Claude, when Log4j hit in December 2021, Fleet Management patched 80% of the production backend in 9 hours. Framework rollouts went from 200 days to under 7.

Golden Paths and Soundcheck handle the other end: new services come in pre-standardized, existing ones get continuously checked. As of their 2023 blog series, the Java Bill of Materials had 96% adoption across the fleet. That’s why an AI agent can produce a mergeable PR. Not because it’s smart, but because the codebase is predictable.

What Honk replaced was not human engineering. It replaced a 20,000-line script for Maven dependency updates with a natural-language prompt. The pipeline around it is identical. Targeting, opening, review, deploy: none of that changed. The “revolution” is a better transformation definition format. Everything else was already automated.

Why This Doesn’t Transfer

Read the headlines about Spotify and Claude and the pitch is obvious: buy Claude Code, point it at your codebase, watch productivity double. Most teams that try will bounce off their own mess long before they see anything like that.

Spotify can automate at this scale for a boring reason: they have processes people actually follow. Not documented processes, followed processes. Spotify got near-universal adoption of their standards, and that’s not just an engineering achievement, it’s a cultural one. A Swedish company where, apparently, you can get 96% of engineers to follow a standard voluntarily. Most companies can’t get that number with a mandate from above.

I see this from the inside. Enterprise clients come in and say they want AI. You start digging, and there are no processes. Half the knowledge lives in somebody’s head, and that person is the only one who knows how any of it works. No component catalog. No ownership graph. No standardized builds. There’s a Confluence page from 2021 that nobody updates, three CI systems (two deprecated but still running), and a README whose last commit message is “initial commit” from two years ago.

Spotify has 15 years of institutional documentation rendered through TechDocs with 5,000+ documentation sites. The AI came last, not as the foundation, but as a better interface to something that already worked.

Without that substrate, you get exactly the failure mode Spotify’s own engineers documented. Early Honk agents took shortcuts to make builds pass: commenting out failing tests, downgrading Java versions. The same QCon talk described this directly.

The Questions That Matter

If you’ve ever sat through a 3 a.m. incident, you already know software engineering was never about writing code. The framing around Spotify’s AI adoption creates the same misunderstanding that vibe-coding courses create for juniors: that the value of an engineer is measured in lines of code, and if AI writes lines faster, the engineer is either 10x more productive or obsolete. Both conclusions share the same flawed premise.

What Spotify actually demonstrated is narrower than the headlines. Spend 15 years on platform engineering, standards, and a fleet-wide automation system that already handles half your PRs. Then swap the transformation definition layer for an LLM prompt and cut 60-90% off bounded migration work. A real achievement. The kind that doesn’t travel well.

So instead we got five moments in four months, the same two executives, numbers that go up every time the audience changes, and no defect rates, revert rates, or customer satisfaction data to back any of it up.

I’m not here to hate on Spotify; if they genuinely made large-scale migrations faster on top of solid infrastructure, that’s a real engineering win. What I’m not fine with is the context getting lost. Somebody climbed Everest with a guide and a decade of training. The audience is buying boots.

Spotify spent a decade understanding their codebase before they touched an LLM. That decade is the only reason any of this works. An LLM amplifies what you already have.

