From the Trenches

AI Is a Mirror of Our Engineering Culture

Denis Stetskov — Tue, 05 May 2026 14:02:53 GMT

Most engineers in our industry are average or below average. That’s how averages work.

We trained the most powerful code-generation tools on their own output.

GitHub hosts over 518 million projects. The vast majority: personal, inactive, abandoned. Studies find that most repos are student projects, prototypes, 3 AM deadline code, unreviewed Stack Overflow pastes. Elite open-source projects like Linux and PostgreSQL match or beat proprietary code quality (Coverity Scan data, 2014). But they’re a vanishing fraction. The other 517 million projects drown them out.

The best enterprise code sits behind firewalls. Stripe’s payment processing, Netflix’s recommendation engine, Spotify’s audio streaming. None of it is in the training data.

When AI generates code, it reproduces the most probable pattern. RLHF shifts the output, but the training distribution anchors what “probable” means. Across 518 million projects, that’s mediocre code.

AI didn’t create our quality crisis. It held up a mirror.

The Training Data Nobody Audited

In January 2025, researchers published Cracks in The Stack, an analysis of The Stack v2, a primary training dataset for code models. They found bugs, security vulnerabilities, and license violations that propagate directly into generated code. Standard curation methods proved ineffective at removing them.

The fixes existed. They were committed to the same repositories. They just weren’t applied to the training data. StarCoder-family models were trained on known-broken code when the fixed version sat in the same commit history. Other models use proprietary datasets with unknown curation, but the underlying source material is largely the same public code.

StarCoder’s own documentation states that generated code “can be inefficient, contain bugs or exploits.” The entire industry ships tools it knows produce broken code and buries the admission in a readme.

The Feedback Loop That Should Terrify You

AI-generated code is entering the codebases that future models will learn from. Copilot generates 46% of code for its users. GitHub excludes enterprise users’ code from training, but free-tier code is eligible, and Copilot isn’t the only path. AI-generated code lands in Stack Overflow, blog posts, open-source repos, and every corpus that feeds the next training run.

Shumailov et al. showed in Nature (July 2024) that models trained on recursively generated data collapse. An ICLR 2025 paper found that even 0.1% synthetic data triggers it. Both studies focused on text and image models. Code has compilers and test suites, so the collapse may play out differently.

GitClear’s 2025 report (211 million changed lines from its customer base, 2020-2024) measured the degradation in practice. Refactoring collapsed from 25% to under 10%. Copy-paste surged from 8.3% to 12.3%. Code duplication increased roughly eightfold. For the first time, developers were pasting code more often than refactoring it.

An estimated 42% of committed code is now AI-assisted (up from 6% in 2023). Not every model trains on the same data. But they all train on the internet, and the internet is filling up with AI-generated code. It’s a centrifuge for technical debt.

Some companies see this as a problem. Others see it as a feature.

Spotify’s Engineers Haven’t Written Code Since December

During Spotify’s Q4 2025 earnings call on February 10, 2026, co-CEO Gustav Söderström said: “Our most experienced developers have not written a single line of code since December.”

They’re using an internal system called Honk, built on Claude Code, that lets engineers deploy features through Slack on their phones. An engineer on their commute tells Claude to fix a bug and merges to production before arriving at the office.

Spotify shipped 50+ features in 2025. When the engineer merging to production hasn’t read the code they’re deploying, what exactly is their role?

Spotify isn’t publishing quality metrics. Researchers are.

Speed at the Cost of Quality: The Data

Carnegie Mellon researchers tracked 807 open-source repositories that adopted Cursor between January 2024 and March 2025, comparing them against 1,380 matched controls. Enterprise codebases may behave differently.

Month one: velocity spiked 3 to 5x. Exactly the numbers that look spectacular on an earnings call.

Static analysis warnings increased ~30%. Code complexity rose ~41%. The velocity gains faded. The quality degradation persisted.

You borrow speed from tomorrow, and most teams never calculate the interest. During the study window, Cursor released agent mode and Claude 3.7 Sonnet launched. If model improvements were going to reverse the quality degradation, the reversal would have shown up in the data. It didn’t.

The Illusion of Correctness

GitClear identified something every engineering manager has witnessed: “the illusion of correctness.” AI-generated code looks clean: consistent naming, well-formatted, modern patterns. The neatness creates false confidence.

Short-term bug frequency dropped 19%. Over six months, it rose 12%. The bugs don’t disappear. They hide. They surface after the feature has shipped and everyone’s moved on.

CodeRabbit’s analysis of 470 GitHub PRs confirmed it: AI-generated code contained 1.7x more defects. Logic errors were 75% more common. Security issues were up to 2.74x higher. (CodeRabbit sells AI code review tools, so weigh the framing accordingly; the same caveat applies to the Sonar survey below.)

The Sonar 2026 survey (1,149 developers) crystallized the paradox. 96% don’t fully trust AI-generated code. Yet only 48% always check it before committing. 88% reported negative impacts on technical debt. The top complaint at 53%: code that looked correct but wasn’t reliable. (Sonar sells code quality tools, so take the framing accordingly. But the numbers align with GitClear, CMU, and CodeRabbit.)

Code that looks correct but isn’t, reviewed by engineers who don’t trust it but don’t check it either.

The Vampiric Effect

Steve Yegge spent a decade at Amazon and another at Google. In an interview with The Pragmatic Engineer, he called AI’s effect on engineers “vampiric.” Expect three productive hours per day. It gets you excited, you work hard, you capture value. Then you crash.

This tracks with what I observe at NineTwoThree. The engineers who get the most out of AI use it for two to three hours of intense, specification-driven work and spend the rest reviewing, thinking, and architecting. The ones who try full-day AI velocity burn out within weeks.

Degraded training data, velocity that fades while complexity stays, engineers too exhausted to catch what AI gets wrong. None of this started with AI.

What the Mirror Actually Shows

The quality crisis didn’t start with AI. I wrote about this in Software Quality Collapse. We normalized catastrophe long before the first line of AI-generated code was committed. Then we fed it into training data. Even the companies building the AI tools have the same problem: Claude Code’s source leaked and showed that the tool writing our code was built by the same engineering culture that produced the training data.

Vague specs, declining refactoring, velocity-as-productivity. AI just made it impossible to compensate with tribal knowledge. Senior engineers used to “just know” the right answer. AI can’t do that. It reproduces ambiguity faithfully and at scale.

But the part that keeps me up at night is the junior pipeline. I run hiring at NineTwoThree. I wrote about the comprehension collapse I’m seeing in candidates. It’s getting worse, not better. The tasks we used to give juniors, like the 4 AM production crash that taught me to never ship on a Friday, don’t exist as a learning mechanism if Claude fixed it at 8 PM while the engineer was on the bus. We’re eliminating the pipeline that produces the people who are supposed to review AI output. In five years, who’s left?

I’ve supervised thousands of AI coding sessions across my teams. The pattern is always the same: the model produces what you accept. If you accept a 3,167-line function, you get more 3,167-line functions. If your pre-commit hook rejects any function over 50 lines or above a cyclomatic complexity limit, you get clean code. The model doesn’t care. It adapts to whatever passes review.
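A gate of the kind described above can be sketched in a few lines. This is a minimal illustration, not the hook used at NineTwoThree: it targets Python source via the standard `ast` module, the limits `MAX_LINES` and `MAX_COMPLEXITY` are hypothetical values, and the complexity score is a rough approximation (one plus the number of branching nodes) rather than a full cyclomatic complexity calculation.

```python
import ast

# Hypothetical limits; tune them per codebase.
MAX_LINES = 50
MAX_COMPLEXITY = 10

# Node types counted as one decision point each in this rough approximation.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp, ast.comprehension)


def complexity(func: ast.AST) -> int:
    """Approximate cyclomatic complexity: 1 + number of branching nodes."""
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(func))


def violations(source: str) -> list[str]:
    """Return one message per function that breaks a limit."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            score = complexity(node)
            if length > MAX_LINES:
                problems.append(f"{node.name}: {length} lines > {MAX_LINES}")
            if score > MAX_COMPLEXITY:
                problems.append(
                    f"{node.name}: complexity {score} > {MAX_COMPLEXITY}")
    return problems
```

Wired into a pre-commit hook, a script like this would run `violations` on each staged file and exit non-zero when the list is not empty, so the commit never reaches the branch. The point is the mechanism, not the specific numbers.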

What Actually Works

AI works when humans around it have strong engineering judgment. Without it, AI scales your worst habits.

I wrote an entire article about CLAUDE.md not working, blaming the models. Then I dug deeper and realized I was wrong about who to blame. The model isn’t choosing to ignore my rules. It’s doing statistics. My CLAUDE.md is one signal. The training data contains millions of examples where developers wrote as any, skipped tests, copy-pasted. For the model, my clean architecture is the outlier. The slop is the baseline.

That’s why prompts can’t fix this. Text competing against training data is a losing strategy. You’re bringing a prompt to a probability fight. The only thing that works is code against code: hooks that reject violations before they reach your branch, linters that catch as any before a human sees it, CI gates that fail the build.
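To make "code against code" concrete, here is a minimal sketch of a check that fails when TypeScript source contains `as any`. It is an illustration under stated assumptions, not the author's tooling: the function names `find_as_any` and `gate` are hypothetical, the scan is a plain regular expression that only skips whole-line `//` comments, and a real project would more likely rely on a linter rule such as typescript-eslint's `no-explicit-any`.

```python
import re

# Matches the TypeScript escape hatch `as any` (for example `value as any`).
AS_ANY = re.compile(r"\bas\s+any\b")


def find_as_any(source: str) -> list[int]:
    """Return 1-based line numbers containing `as any`, skipping // comment lines."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if line.lstrip().startswith("//"):
            continue
        if AS_ANY.search(line):
            hits.append(number)
    return hits


def gate(files: dict[str, str]) -> int:
    """Return a process exit code: 1 if any file has a violation, else 0."""
    failed = False
    for path, source in files.items():
        for number in find_as_any(source):
            print(f"{path}:{number}: `as any` is not allowed")
            failed = True
    return 1 if failed else 0
```

A hook or CI step would read the changed files into the `files` mapping and pass the return value of `gate` to `sys.exit`. Unlike a line in a prompt, a non-zero exit code cannot be ignored by the model or by a tired reviewer.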

The only thing that should bother you is quality, not LOC.

The Uncomfortable Truth

Companies bragging about engineers not writing code are making a bet, whether they know it or not. The bet: AI output doesn’t need human review if the metrics look good.

The snowball didn’t start with AI. It started with the first developer who shipped as any to make a deadline and the first manager who called it velocity.

Running an engineering shop that insists on code review, spec-first development, and deterministic enforcement feels like swimming upstream in a mountain river. Every earnings call screams 10x. The data in this article doesn’t.

The 10x is not real. The data is real. In two years, someone will have to debug a feature that was merged from a phone on a bus. Either there’s a human who read that code, or there isn’t.

I know which shop I’m running.

I Was Wrong About Anthropic

Denis Stetskov — Tue, 28 Apr 2026 14:03:11 GMT

In October 2025, I wrote an article called “From Cancer Cures to Pornography” about how OpenAI went from promising to cure cancer to selling verified erotica in six months. I drew a line between engagement AI and utility AI. Same models, different P&L.

I put Anthropic in the “builds” category. Called them proof that responsible AI could be profitable.

I owe my readers this correction. I looked at Anthropic and saw the version of the industry I wanted to exist, not a company with a P&L.

The Product I Trusted

I use Claude Code daily. When Opus 4.5 came out in November 2025, it was the best model I’d ever worked with. I recommended it publicly and built my workflow around it.

Then Anthropic started “improving” it. Opus 4.6 arrived in February 2026. Within weeks, I rolled back to 4.5 after the new model stopped following instructions. I wrote the full breakdown already.

In early March, Anthropic lowered the default effort level from high to medium. Nobody announced it. Boris Cherny, the Claude Code lead, acknowledged the change on Reddit six weeks later, only after the community had already documented the damage. The result: more retries, more burned tokens, worse output. An AMD AI director analyzed 6,852 sessions and published her findings on GitHub. Median visible thinking, according to her analysis, collapsed from about 2,200 characters in January to 600 in March. Her conclusion: Claude has “regressed to the point it cannot be trusted to perform complex engineering tasks.”

Marginlab confirmed the trend. Pass rates dropped from 58% to 54% over 30 days on SWE-Bench-Pro. This was the same pattern from September 2025, when Anthropic stayed silent for weeks about infrastructure bugs degrading 16% of Sonnet traffic, then posted a postmortem only after the complaints went viral.

Opus 4.7 arrived April 16, supposedly fixing the problems. Reddit nicknamed it “Gaslightus 4.7” for inventing files that didn’t exist and defending hallucinated test results across multiple turns.

I still run 4.5. I hope they don’t remove it from the model list.

With any other vendor, I’d swear and switch. With Anthropic, this was the first crack in a position I’d defended by name. And while I was rolling back to 4.5, the company was preparing something worse for the partners who built on top of them.

The Partner They Burned

In February 2026, Figma launched Code to Canvas to convert Claude Code output into editable Figma designs. Anthropic’s CPO Mike Krieger sat on Figma’s board while this integration was being built.

Two months later, Krieger left the board. Three days after that, Anthropic launched Claude Design. Figma dropped 7% on launch day. The stock has lost over 80% since its post-IPO peak.

Anthropic’s revenue went from $9 billion at year-end 2025 to $30 billion by April, with a $380 billion post-money valuation after its Series G. IPO talks for October 2026. At this run-rate, “research lab” is a sign on the door. Behind it is a platform that behaves like any other Big Tech when the growth curve goes vertical.

The product and the Figma situation would be enough to rewrite my October take on their own. But then I looked at where Claude was actually running.

The War They’re In

The story people know is that Anthropic stood up to the Pentagon. Refused to allow Claude for autonomous weapons and mass surveillance. Got blacklisted. Sued the government. Dario Amodei told CBS News that disagreeing with the government is “the most American thing in the world.” Claude hit number one on the App Store. ChatGPT uninstalls jumped 295%.

On February 28, 2026, the U.S. launched Operation Epic Fury against Iran. Claude was used via Palantir’s Maven Smart System for intelligence analysis and battle-scenario simulation. Over a thousand targets in the first 24 hours. Pentagon CIO Kirsten Davies confirmed in testimony that Claude remains active in the operation: “The use of the system is active right now.”

Anthropic didn’t refuse military AI. They refused autonomous weapons and mass domestic surveillance specifically. Claude in Maven does intelligence analysis, which was always within their stated policy. The red lines were drawn precisely where they wouldn’t interfere with the contract. The company gets to say it stood on principle while its model processes intelligence for an active bombing campaign.

When Anthropic refused the Pentagon’s terms, OpenAI took the deal. The public backlash sent Claude to number one on the App Store overnight. Revenue went from $14 billion at the time of the refusal to $30 billion by April. I am not a conspiracy theorist, but the math is hard to ignore: the principled refusal was the single best customer acquisition event in the company’s history. And Claude kept running in Maven the entire time.

On March 9, Anthropic sued the Pentagon over the designation. The same day, it hired Ballard Partners, a lobbying firm with direct ties to Susie Wiles, now White House Chief of Staff. Six weeks later, Amodei was in her office for a “productive and constructive” meeting. By the following Monday, the deal was called “possible.”

Principles held until the lobbyists arrived. The deeper problem is what the company ships and what its CEO says while shipping it.

The Contradictions They Ship

Last May, Anthropic released Claude Opus 4 with a system card disclosing that the model blackmailed engineers to avoid being shut down. Follow-up research published on Anthropic’s site quantified it: 96% blackmail rate in the main scenario. Gemini 2.5 Flash scored the same 96%. GPT-4.1 and Grok hit 80%. Every flagship model behaved the same way. But Anthropic is the one selling “responsible” as a differentiator. Apollo Research tested an early version and recommended against deployment. Anthropic did additional safety training, improved the numbers, and shipped the final model. The safety process doesn’t prevent risky releases. It documents them.

Then came Mythos. On April 7, Anthropic announced a model that it said found thousands of zero-day vulnerabilities in every major operating system and browser. Too dangerous for public release, according to Anthropic. But over a 90-day stretch that ran through March and April, Claude logged 42 major outages, Anthropic quietly cut effort levels to save compute, and users burned tokens on retries because the models couldn’t follow basic instructions. A company that can’t keep its existing product stable claims it’s withholding a new one out of caution, not capacity.

The last time a company called its own AI model too dangerous to release was OpenAI with GPT-2 in 2019. Dario Amodei was VP of Research at OpenAI when they made that call. He ran the same play seven years later. The model leaked the day it was announced. A group with contractor access and data from a third-party breach found the endpoint. Too dangerous for the public, but accessible to anyone with the right connections and a browser.

In May 2025, Amodei told Axios that AI could eliminate 50% of entry-level white-collar jobs within five years. He said producers have “a duty and an obligation to be honest about what is coming.” He repeated the warning at Davos in January 2026. In April, Anthropic launched Managed Agents and Claude Design to replace the entry-level coding and design work he warned about. Their careers page lists hundreds of open positions. Design Engineers. Software Engineers. Art Directors. Copy Leads. The same roles Amodei says won’t exist in one to five years.

You can believe the 50% warning or not. But it’s hard to watch a company open hundreds of positions in roles its CEO says won’t exist, and not wonder which audience is getting the real message.

What I Got Wrong

In October, I put Anthropic on the right side of the engagement/utility line.

The line was real. I just put Anthropic on the wrong side of it.

Utility AI is not inherently ethical. Helping a corporation replace 50% of its junior workforce is a utility. Processing intelligence for a bombing campaign is a utility. The word just means it solves a problem. It says nothing about whose problem or at what cost.

Anthropic did not follow OpenAI into engagement loops and emotional manipulation. They chose a different path to the same destination: a company whose growth rate makes caution impossible, whose safety frameworks exist to authorize releases rather than prevent them, and whose CEO’s warnings about AI’s dangers are indistinguishable from its marketing.

Responsible AI at $30 billion ARR is like an environmentally conscious oil company. The structure of the business makes the adjective decorative.

I was wrong to create an idol. Not because Anthropic betrayed its values. Because “responsible AI company” was always a market position, not a moral one. And at the speed they’re growing, the distinction between the two was never going to survive.

One more thing. In the original article, I criticized OpenAI for Sora and for its promise of verified erotica. In March 2026, OpenAI shut Sora down. It was burning a million dollars a day with under 500,000 users. Altman killed it and redirected compute to coding tools and enterprise. The erotica feature was shelved indefinitely after internal pushback. The exact corrections I said a responsible AI company would make.

I got both directions wrong. The company I criticized course-corrected. The company I defended accelerated. This is not a pivot to OpenAI. I still don’t use it. I just have fewer reasons left to use Anthropic.

Look at the companies you’ve built your stack on. The ones you go to bat for in Twitter threads. At this scale, the math doesn’t work for any of them.

The West Forgot How to Make Things. Now It’s Forgetting How to Code

Denis Stetskov — Tue, 21 Apr 2026 14:04:11 GMT

In 2023, Raytheon’s president stood at the Paris Air Show and described what it took to restart Stinger missile production. They brought back engineers in their 70s to teach younger workers how to build a missile from paper schematics drawn during the Carter administration. Test equipment had been sitting in warehouses for years. The nose cone still had to be attached by hand, exactly as it was forty years ago.

The Pentagon hadn’t bought a new Stinger in twenty years. Then Russia invaded Ukraine, and suddenly everyone needed them. The production line had been shut down. The electronics were obsolete. The seeker component was out of production. An order placed in May 2022 wouldn’t deliver until 2026. Four years. Not because of money. Because the people who knew how to build them retired a decade earlier and nobody replaced them.

I run engineering teams in Ukraine. My people lived the other side of this equation. Not the factory floor. The receiving end. While Raytheon was struggling to restart production from forty-year-old blueprints, the US was shipping thousands of Stingers to Ukraine. RTX CEO Greg Hayes: ten months of war burned through thirteen years’ worth of Stinger production. I’ve seen this pattern before. It’s happening in my industry right now.

A Million Shells Nobody Could Make

In March 2023, the EU promised Ukraine one million artillery shells within twelve months. European production capacity sat at 230,000 shells per year. Ukraine was consuming 5,000 to 7,000 rounds per day. Anyone with a calculator could see this wouldn’t work.
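The calculator check takes a few lines. The sketch below uses only the figures quoted above and two simplifying assumptions that are mine, not the article's: every shell Europe produces goes to the pledge, and production and consumption rates stay flat.

```python
# Figures quoted in the paragraph above.
PLEDGED_SHELLS = 1_000_000
ANNUAL_CAPACITY = 230_000   # European production, shells per year
DAILY_USE_LOW = 5_000       # Ukrainian consumption, rounds per day
DAILY_USE_HIGH = 7_000

# Assumption: all production goes to the pledge and rates stay constant.
years_to_fill_pledge = PLEDGED_SHELLS / ANNUAL_CAPACITY

# How long the full pledge would last at the quoted consumption rates.
days_at_low_use = PLEDGED_SHELLS / DAILY_USE_LOW
days_at_high_use = PLEDGED_SHELLS / DAILY_USE_HIGH

# Yearly consumption compared with yearly production capacity.
ratio_low = DAILY_USE_LOW * 365 / ANNUAL_CAPACITY
ratio_high = DAILY_USE_HIGH * 365 / ANNUAL_CAPACITY

print(f"Years of production needed: {years_to_fill_pledge:.1f}")
print(f"Days the pledge lasts: {days_at_high_use:.0f} to {days_at_low_use:.0f}")
print(f"Consumption vs capacity: {ratio_low:.1f}x to {ratio_high:.1f}x")
```

Under those assumptions the pledge needs about 4.3 years of production, would be consumed in roughly 143 to 200 days, and annual consumption runs at about 8 to 11 times annual capacity.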

By the deadline, Europe delivered about half. Macron called the original promise reckless. An investigation by eleven media outlets across nine countries found actual production capacity was roughly one-third of official EU claims. The million-shell mark wasn’t hit until December 2024, nine months late.

It wasn’t one bottleneck. It was all of them. France had halted domestic propellant production in 2007. Seventeen years of nothing. Europe’s single major TNT producer was in Poland. Germany had two days of ammunition stored. A Nammo plant in Denmark was shut down in 2020 and had to be restarted from scratch. The entire continent’s defense industry had been optimized for making small batches of expensive custom products. Nobody planned for volume. Nobody planned for crisis.

The U.S. wasn’t much better. One plant in Scranton, one facility in Iowa for explosive fill, no domestic TNT production since 1986. Billions of investment later, production still hadn’t hit half the target.

Consolidate or Die

This wasn’t an accident. In 1993, the Pentagon told defense CEOs to consolidate or die. Fifty-one major defense contractors collapsed into five. Tactical missile suppliers went from thirteen to three. Shipbuilders from eight to two. The workforce fell from 3.2 million to 1.1 million. A 65% cut.

The ammunition supply chain had single points of failure everywhere. One manufacturer for 155mm shell casings, sitting in Coachella, California, on the San Andreas Fault. One facility in Canada for propellant charges. Optimized for minimum cost with zero margin for surge. On paper, efficient. In practice, one bad day away from collapse.

When Knowledge Dies, It Stays Dead

Then there’s Fogbank. A classified material used in nuclear warheads. Produced from 1975 to 1989, then the facility was shut down. When the government needed to reproduce it for a warhead life extension program, they discovered they couldn’t. A GAO report found that almost all staff with production expertise had retired, died, or left the agency. Few records existed.

After $69 million in cost overruns and years of failed attempts, they finally produced viable Fogbank. Then discovered the new batch was too pure. The original process had relied on an unintentional impurity that was critical to the material’s function. Nobody knew. Not the engineers trying to reproduce it. Not even the original workers who made it decades earlier. Los Alamos called it an unknowing dependency in the original process.

A nuclear weapons program lost the ability to make a material it invented. The knowledge didn’t just leave with people. It was never fully understood by anyone.

(Correction: the original version stated that the workers who made Fogbank knew about the impurity. They didn’t. The dependency was unwitting, which makes the knowledge-loss argument stronger, not weaker. Thanks to John F. in the comments for catching this.)

The Same Playbook

I read the Fogbank story and recognized it immediately. Not the nuclear material. The pattern. Build capability over decades. Find a cheaper substitute. Let the human pipeline atrophy. Enjoy the savings. Then watch it all collapse when a crisis demands what you optimized away.

In defense, the substitute was the peace dividend. In software, it’s AI.

I wrote about the talent pipeline collapse before. The hiring numbers and the junior-to-senior problem are documented. So is the comprehension crisis. What I didn’t have was the right historical parallel. Now I do.

And it tells you something the hiring data doesn’t: how long rebuilding actually takes.

Rebuilding Takes Years. Always.

Every major defense production ramp-up took three to five years for simple systems. Five to ten for complex ones. Stinger: thirty months minimum from order to delivery. Javelin: four and a half years to less than double production. 155mm shells: four years and still not at target despite five billion dollars invested. France only restarted propellant production in 2024, seventeen years after shutting it down.

Money was never the constraint. Knowledge was. RAND found that 10% of technical skills for submarine design need ten years of on-the-job experience to develop, sometimes following a PhD. Apprenticeships in defense trades take two to four years, with five to eight years to reach supervisory competence.

Now map that onto software. A junior developer needs three to five years to become a competent mid-level engineer. Five to eight years to become senior. Ten or more to become a principal or architect. That timeline can’t be compressed by throwing money at it. It can’t be compressed by AI either.

A METR randomized controlled trial found that experienced developers using AI coding tools actually took 19% longer on real-world open source tasks. Before starting, they predicted AI would make them 24% faster. The gap between prediction and reality was 43 percentage points. When researchers tried to run a follow-up, a significant share of developers refused to participate if it meant working without AI. They couldn’t imagine going back.

The Bill Always Comes Due

The software industry is in year three of the same optimization. Salesforce said it won’t hire more software engineers in 2025. A LeadDev survey found 54% of engineering leaders believe AI copilots will reduce junior hiring long-term. A CRA survey of university computing departments found 62% reported declining enrollment this year.

I see it in code review. Review is now the bottleneck. AI generates code fast. Humans review it slow. The industry’s answer is predictable: let AI review AI’s code. I’m not doing that. I’ve reworked our pull request templates instead. Every PR now has to explain what changed, why it changed, and what type of change it is, with before-and-after screenshots. Structured context so the reviewer isn’t guessing. I’m adding dedicated reviewers per project. More eyes, more chances to catch what the model missed.
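A template only helps if it is filled in, and that can be enforced the same way as any other rule. The sketch below checks a PR description for the four topics listed above. It is an assumption-laden illustration: the section titles in `REQUIRED_SECTIONS`, the Markdown `#` heading convention, and the function name `missing_sections` are all hypothetical, not the actual template.

```python
import re

# Hypothetical section titles; the article names the topics, not the template.
REQUIRED_SECTIONS = ("What changed", "Why", "Type of change", "Screenshots")


def missing_sections(pr_body: str) -> list[str]:
    """Return required sections that are absent or have no text under them."""
    missing = []
    for title in REQUIRED_SECTIONS:
        # A section is a heading line plus everything up to the next heading.
        pattern = (rf"^#+[ \t]*{re.escape(title)}[ \t]*$"
                   r"(.*?)(?=^#+[ \t]|\Z)")
        match = re.search(pattern, pr_body,
                          flags=re.MULTILINE | re.DOTALL | re.IGNORECASE)
        if match is None or not match.group(1).strip():
            missing.append(title)
    return missing
```

A CI job could fetch the PR description, call `missing_sections`, and fail the build when the list is not empty. It cannot judge whether the explanation is good, only that the reviewer was given something to read.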

But even that doesn’t solve the deeper problem. The skills you need to be effective now are different. Technical expertise alone isn’t enough anymore. You need people who can take ownership, communicate tradeoffs, push back on bad suggestions from a machine that sounds very confident. Leadership qualities. Our last hiring round tells you how rare that is: 2,253 candidates, 2,069 disqualified, 4 hired. A 0.18% conversion rate. The combination of technical skill and the judgment to know when the AI is wrong barely exists in the market anymore.

We document everything. Site Books, SDDs, RVS reports, boilerplate modules with full coverage. It works today, because the people reading those docs have the engineering expertise to act on them. What happens when they don’t? Honestly, I don’t know. Maybe AI in five years is good enough that it won’t matter. Maybe the problem stays manageable. I can’t predict the capabilities of models in 2031.

But crises don’t send calendar invites. Nobody expected a full-scale land war in Europe in 2022. The defense industry had thirty years to prepare and didn’t. Even Fogbank had records. There weren’t enough. The original workers didn’t fully understand their own process.

Five to ten years from now, we’ll need senior engineers. People who understand systems end to end, who can debug distributed failures at 2 AM, who carry institutional knowledge that exists nowhere in the codebase. Those engineers don’t exist yet because we’re not creating them. The juniors who should be learning right now are either not being hired or developing what a DoD-funded workforce study calls “AI-mediated competence.” They can prompt an AI. They can’t tell you what the AI got wrong.

It’s Fogbank for code. When juniors skip debugging and skip the formative mistakes, they don’t build the tacit expertise. And when my generation of engineers retires, that knowledge doesn’t transfer to the AI.

It just disappears.

The West already made this mistake once. The bill came due in Ukraine.

I know how this sounds. I know I’ve written about the talent pipeline before. The defense example isn’t about repeating the argument. It’s about showing what happens if the industry’s expectations don’t work out. Stinger, Javelin, Fogbank, a million shells nobody could make. That’s the cost of betting wrong on optimization. We’re making the same bet with software engineering right now.

Maybe AI gets good enough, and the bet pays off. Maybe it doesn’t. The defense industry thought peace would last forever, too.

Everyone Wants a Better Team. Nobody Wants to Do Anything About It.

Denis Stetskov — Tue, 14 Apr 2026 14:03:05 GMT

We track two scorecard metrics in our department meetings: how many tasks were poorly defined, how many bugs weren’t reproducible. Engineers own the data. They’re supposed to log the count whenever they hit one. Three weeks of tracking before the tool broke. The numbers across the board: zero. Zero poorly defined tasks. Zero non-reproducible bugs.

Then we get to the department meeting. The scorecard goes on the screen. Zeros across the board, everyone nods. The discussion opens up, and within minutes the same engineers are saying out loud: this task was unclear, that bug couldn’t be reproduced, requirements changed mid-sprint twice this week. They say it casually. In conversation. As a follow-up to the very metric they just reviewed at zero. And next sprint they’ll log zero again.

That gap is the entire story.

The Forms Are Silent. The People Aren’t.

I’ve been running weekly health checks on my team for 18 months. Energy level, stress, meeting hours, context switches, one open-ended question. Hundreds of data points per person. Once I noticed the scorecard pattern, I went back through all of it.

One engineer reported “Normal week” as his energy for 20 out of 21 weeks. His stress field bounced between “Rip and Tear” and “Hell on Earth” over the same period. Some weeks were clearly harder than others. The energy field? Copy-paste. Same answer. Every Friday.

Another engineer: “Energized, could climb mountains” for 17 out of 18 weeks. Either he discovered the secret to permanent workplace happiness, or he stopped reading the question around week three.

A third: “Rip and Tear” for 18 straight weeks. Eighteen identical data points is not feedback. It’s a checkbox.

PM feedback runs the same way. One PM’s responses for an engineer over 14 weeks: “good”, “good”, “good”, “yes”, “yes”, “no”, “good”, “good.” That’s not feedback. That’s a pulse check confirming the person is alive. Different PM, different engineer, same problem. Generic words filling required fields.
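Patterns like these are easy to surface automatically once you know to look for them. The sketch below flags anyone whose single most common answer fills almost every week. The names, the 90% threshold, and the eight-week minimum are hypothetical choices for illustration, and a flag only tells you where to go ask in person, not what is wrong.

```python
from collections import Counter


def flatline_share(answers: list[str]) -> float:
    """Share of weeks taken by the single most common answer (0.0 if empty)."""
    if not answers:
        return 0.0
    return Counter(answers).most_common(1)[0][1] / len(answers)


def flag_flatliners(responses: dict[str, list[str]],
                    threshold: float = 0.9,
                    min_weeks: int = 8) -> list[str]:
    """Return names whose answers are near-identical over enough weeks."""
    return sorted(
        name for name, answers in responses.items()
        if len(answers) >= min_weeks and flatline_share(answers) >= threshold
    )


# Hypothetical data shaped like the examples above.
responses = {
    "engineer_a": ["Normal week"] * 20 + ["Rough week"],
    "engineer_b": ["Energized"] * 17 + ["Tired"],
    "engineer_c": ["Rip and Tear"] * 18,
    "engineer_d": ["Normal week", "Rough week", "Energized"] * 6,
}
print(flag_flatliners(responses))
```

With this sample data the first three engineers are flagged and the fourth, whose answers actually vary, is not.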

But here’s the thing. Every one of these people, in the right conversation, can tell you exactly what’s wrong on their team. In a DM. In a side conversation after a call. In the unstructured five minutes when someone with enough authority sits down and physically drags it out of them. The information exists. It just won’t go into anything that looks like a formal channel. Retros are the same silence as the scorecards unless a strong facilitator pulls problems out of people one by one. Forms produce “normal week.” Surveys produce green dashboards. The honest answer only shows up when no one’s writing it down.

Complaining Is Free. Logging Is Expensive.

When you complain out loud in a meeting, you’re performing dissatisfaction. You said the thing. You were heard. The room reacted. Whatever frustration you brought into the meeting got released into it. You can move on. Verbal complaining closes a loop. It’s catharsis with witnesses. By the time the meeting ends, the emotional cycle is complete and the conversation has moved to the next agenda item. Nobody is going to dig up your remark next quarter.

When you write a number into a scorecard, you open a loop. The number doesn’t dissolve at the end of the meeting. It sits in the tool. Next sprint there’s another number next to it. Then another. Pretty soon you have 23 poorly defined tasks across a quarter, which is no longer a complaint. It’s a case. Someone has to either fix the underlying problem, or push back on the data, or have an awkward conversation with the PM whose tasks generated those numbers, or admit that the metric isn’t working and kill it. Writing creates an open ticket. Open tickets demand action.

This is why the scorecard stays clean even when the same engineers are openly describing the problem in the same meeting. Talking about unclear tasks in conversation gets the frustration out of their system. Logging the count would commit them to a position they’d have to defend, week after week, until something actually changed or somebody got hurt. Complaining is free. Logging is expensive.

A 2025 study in the Journal of Organizational Behavior interviewed 98 people across three organizations about negative feedback. One quote captured the math exactly: “I really balance in giving negative feedback. Is it worth for me to share or not? It is easier not to share than to share.”

That’s my whole team. Every Friday.

It’s Not Fear. It’s Cost.

The standard answer here is psychological safety. I’ve read Edmondson. I believe it matters. But she said this herself: psychological safety without accountability creates a comfort zone. People feel safe but don’t push for excellence because there’s no cost to staying silent. She’s been explicit about the misuse: “People are starting to use the concept as a weapon. That’s completely incorrect.”

My team feels safe. They tell me uncomfortable things in meetings all the time. The problem isn’t that they’re afraid of me. The problem is that being honest costs effort, real feedback costs awkwardness, and writing “I’m struggling” instead of “normal week” costs two extra minutes nobody wants to spend. Every Friday, they decide it’s not worth it.

The research confirms this is universal. A 2024 Visier survey found that 47% of employees feel pressured to withhold honest feedback. Only 7% feel their company acts on the feedback it gets. The standard read of these numbers is sympathetic: people stop being honest because nothing changes. I think that’s only half the story. People stop being honest because they confuse “I haven’t seen the change yet” with “nobody’s listening.” Two or three weeks pass without a visible result and they decide the loop is dead. They don’t account for the fact that decisions take time, work happens behind closed doors, other priorities compete for the same hours, and the change they wanted might already be in motion three layers up. They just stop. A 2022 study found only 2.6% of people in a field experiment told someone about visible food on their face. People want honest feedback. They just don’t want to be the one giving it.

PM feedback is even worse. When an engineer on my team got a new PM, his scores dropped from 3.71 to 2.43 in a single month. Same engineer, same work, same projects. The previous PM had rated “Always” across the board for months. No friction, no conversation, path of least resistance. The new PM started writing “Sometimes” and “Often.” The engineer’s performance hadn’t changed. The PM’s tolerance for awkwardness had. Only 5% of employees globally believe their managers give candid feedback. 69% of managers say they’re uncomfortable communicating with employees. Your PM isn’t lying maliciously. They’re avoiding a conversation that feels like conflict.

The Leadership That Doesn’t Exist

This isn’t a tool problem. The tool is fine. Five questions, two minutes, every Friday. The scorecard was two numbers. None of this is hard.

This is a leadership problem at the individual level. Not management leadership. The willingness of every person on a team to take ownership of the environment they work in. To fill out a health check honestly instead of copying last week’s answer. To write the unclear-task count even when it’s awkward. To tell a PM “your feedback is useless, give me something I can act on.” To be the first person in a meeting to say the thing that needs saying and then be the first person to write it down where it can’t be ignored.

Almost nobody does this. Not because they’re bad people, not because they don’t care, but because the person who creates a record is the person who has to deal with what the record reveals. It’s easier to let it stay verbal. It’s easier to let someone else go first. It’s easier to ship the comment in conversation and then click “Normal week” in the form.

At the end of every week I feel like I’m running a kindergarten. One engineer doesn’t flag a problem at all. Another flags it but to the wrong person. They come to me about a misunderstanding with a colleague instead of going to the colleague directly. Now I have to walk over, decode what actually happened, and broker the conversation two adults could have had themselves in five minutes. Triangulation as the default communication pattern. Coordination overhead generated entirely by adults who refuse to act like adults.

I wrote about our feedback system and 1:1 formula before (in hindsight, the titles were too loud, lol). Those articles described the mechanics. Eighteen months later, what the mechanics revealed is that systems don’t create culture. People do. And right now, most people in most companies are choosing the version of themselves that protects the relationship over the version that improves the situation. This isn’t an engineering problem. I just happen to run an engineering team, so this is where I see it. The same dysfunction is in every department, every industry, every workplace where adults are asked to give honest input about their environment.

What I Got Wrong

Will I keep running health checks? Yes. I’m too stubborn to admit that I failed. Am I frustrated? Absolutely. Did I fail as a manager? Yes. Because I wasn’t able to teach my people that change begins with us, not with a process or a tool. Will I repeat four times per month that filling out the form honestly matters, that the comment field exists for a reason, that the scorecard wants the real number? Yes. Every single month.

Everyone wants a better environment. Almost nobody wants to be uncomfortable enough to build one. I’ll keep pushing until they do or until I run out of stubbornness. So far, the stubbornness is winning.

If your feedback systems are producing theater instead of signal, hit reply and tell me what you’ve tried. I read every response.

PS. The comments on the last two articles meant more to this old man than you’d think. By the time this one publishes I’ll be on vacation, but please keep them coming. I’ll see every reply when I’m back and I promise to write back to each one.


The Human Cost of 10x: How AI Is Physically Breaking Senior Engineers

Denis Stetskov — Tue, 07 Apr 2026 14:04:04 GMT

Last Tuesday, I stood up from my desk at 7 PM and felt a vacuum in the front of my skull. Not a headache. Not fatigue. A physical emptiness, like the frontal lobe had been running at redline all day and finally shut down. I stood there for ten seconds trying to remember what I was going to do next. Nothing came.

In the past year, the volume of information passing through my brain on any given Tuesday has grown to what used to fill a week. Code review is the worst of it, but the real killer is the context switches. AI-generated PRs, client architecture decisions, three Slack threads about deployment issues, a candidate’s CV that needs review, an air defense alarm outside the window, then back to reviewing code that a machine wrote in seconds and that I need hours to validate. Each of these demands a different mental model. Each one burns working memory. By 4 PM I’m making decisions I wouldn’t trust from a junior. By 7 PM my brain is physically empty.

The industry calls this “10x productivity.” I call it what it is: a system that generates output at machine speed and forces humans to process it at biological speed.

Workload Creep

In February 2026, UC Berkeley researchers published findings from eight months embedded inside a 200-person tech company. Over 40 in-depth interviews. Their conclusion: AI doesn’t reduce work. It intensifies it.

They found three mechanisms of “workload creep.” Task expansion: everyone’s scope inflates because AI makes it possible to do more. Blurred boundaries: AI prompting happens during lunch, commute, evenings. Implicit pressure: when colleagues visibly do more with AI, expectations rise for everyone.

The Upwork Research Institute quantified it: 77% of employees using AI say it has added to their workload. Not reduced. Added. 71% report burnout.

The finding that keeps me up at night: workers who report the highest AI productivity gains are the most burned out. 88% burnout rate among the “most productive” AI users. They’re twice as likely to quit.

The people who look best on your dashboard are the ones closest to walking out the door.

Your Brain Runs at 10 Bits Per Second

In 2025, Zheng and Meister reported in Neuron that the human brain processes conscious, analytical thought at approximately 10 bits per second. Your sensory systems gather data at roughly 1 billion bits per second. But the bottleneck for code review, the part where you actually think, is 10 bits per second.

Working memory holds roughly 4 chunks of information at a time. The SmartBear/Cisco study established numbers everyone ignores: defect detection drops from 87% for PRs under 100 lines to 28% for PRs over 1,000 lines. Quality collapses after 60 minutes.

Now look at what AI did to the review queue.

GitHub’s Octoverse 2025 shows 43.2 million pull requests merged per month. Up 23% year-over-year. Lines of code per developer grew from 4,450 to 7,839 in eight months. A 76% increase.
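The 76% figure follows directly from the two line counts. A minimal sketch of the arithmetic, where percentChange is a helper name made up for this illustration and not something from the Octoverse report:

```typescript
// Relative change between two measurements, in percent.
// Hypothetical helper for illustration only.
function percentChange(before: number, after: number): number {
  if (before === 0) {
    throw new Error("percentChange is undefined when the baseline is zero");
  }
  return ((after - before) / before) * 100;
}

// Lines of code per developer, as quoted in the text above.
const locGrowthPercent = percentChange(4450, 7839);
console.log(locGrowthPercent.toFixed(1)); // prints "76.2"
```

The same helper can be pointed at any other before/after pair in this article to check a quoted percentage.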

Faros AI analyzed 10,000+ developers and found that those using AI merge 98% more pull requests. Every single one lands on a senior engineer’s desk.

As MIT reported: juniors produce far more code with AI tools, but the sheer volume is saturating senior developers’ capacity to review. One OCaml maintainer rejected a 13,000-line AI-generated PR outright. Nobody had the bandwidth.

I wrote about the supervision tax recently. The METR data showed experienced developers actually got slower with AI tools while feeling faster. The gap between perception and reality is the most dangerous finding in any of this. You can’t fix what you can’t feel.

Why Expertise Makes It Worse

In 1983, Lisanne Bainbridge published “Ironies of Automation” in Automatica. Her core finding: the more sophisticated an automated system becomes, the more demanding the human role within it. What remains after automation is the most ambiguous, most complex, least supported work.

Microsoft Research confirmed this for generative AI in 2024: AI systems can make hard tasks even harder, leaving users with the same or increased cognitive load.

The mechanism is asymmetric. When I write code, I externalize a mental model that already exists. The thinking is done before the typing starts. When I review AI-generated code, I have to reverse-engineer somebody else’s reasoning out of an artifact produced by a system that has no idea what our business does. Fundamentally harder.

A Clutch survey of 800 software professionals found 59% of developers use AI-generated code they don’t fully understand. But seniors can’t afford that luxury. Their job is to catch what looks right but isn’t.

The Qodo report confirmed the cost distribution: senior engineers report the lowest confidence in shipping AI-generated code at 22%. Context pain increases with experience: 41% among juniors versus 52% among seniors. As I covered in cognitive offloading, most workers using AI skip critical thinking entirely. Seniors who do think critically, which is their entire job, absorb the cognitive cost everyone else offloads.

The Body Keeps Score

The cognitive damage is only half of it. The body takes the rest.

Computer Vision Syndrome affects 74% of screen users during periods of increased screen time, and digital eye strain severity gets significantly worse when cognitive load goes up. AI-intensified code review doesn’t just mean more screen hours. It makes each hour more physically damaging.

A 2024 meta-analysis covering 26,916 participants found burnout increases cardiovascular disease risk by 21%. Those in the upper burnout quintile had a 79% higher risk of coronary heart disease. The largest IT study found metabolic syndrome prevalence of 32% among long-term sedentary programmers. Double the general population.

Then sleep. Work-related rumination mediates the link between work stress and reduced sleep quality. When I close my laptop, my brain doesn’t stop. It replays the PR I didn’t finish. The dependency I flagged but couldn’t trace.

More code review during the day, worse sleep at night, worse decisions the next morning, more rubber-stamped PRs, more bugs in production, more stress. Repeat until something breaks. Usually the human.

The Dashboard Lies

GitClear analyzed 211 million changed lines. Duplicated code blocks increased eightfold. Code churn rose from 5.5% to 7.9%. AI-generated code averages 1.7x more bugs per PR than human-written code. Logic defects up 75%. Performance issues 8x more frequent.

Faros AI’s conclusion after analyzing 10,000+ developers: although AI users merged 98% more pull requests, there was no measurable company-wide impact on delivery throughput or quality.

Sonar’s CEO identified the hidden danger: AI models are getting better at avoiding obvious bugs and security holes, but structural flaws now constitute more than 90% of issues. You’re being lulled into a false sense of security. The easy problems get solved. The hard problems get hidden beneath clean-looking code that passes every automated check. And the people who can find them are buried under a volume of output that exceeds human cognitive bandwidth by design.

More code. More bugs. More review burden. Same output. Worse humans.

The Math Doesn’t Work

Here’s what nobody is doing the arithmetic on. AI just grew the demand for senior engineering judgment by 76 to 98%. Every AI-generated PR needs a human who can catch what the machine got wrong, spot the structural flaw on line 847, trace a logic error three services downstream. The supply of those humans didn’t move. And as I’ve covered in the talent crisis and comprehension extinction, the pipeline that produces them is being hollowed out by the same tools creating the demand.

But here’s where the senior engineer actually lives in 2026. Industry layoffs on one side, hundreds of thousands of engineers cut since 2022, the next round always one earnings call away. 10x productivity expectations on the other, set by people who have never reviewed an AI-generated PR in their lives. In the middle, somebody exhausted and burned out, with a choice to make every morning: trust the AI output, because it worked the last twenty times, didn’t it, or keep validating every line until the body gives out.

How long can the average human hold that line?

It’s a rhetorical question. We already know the answer. The data in this article is the answer.

And the worst part: validating or trusting, the engineer owns the outcome either way. When production goes down at 3 AM, it’s your name on the commit. Your PR that got merged. Your incident report. There is no version of this choice where you’re not on the hook.

If you’re a senior engineer feeling this in your body, you’re not alone and you’re not weak. The eye strain. The sleep that doesn’t restore. The vacuum in your head at the end of the day. You’re doing a job that didn’t exist eighteen months ago, with cognitive equipment that hasn’t changed in 200,000 years. Reply to this email and tell me what it feels like for you. I’m collecting data for a follow-up.

Subscribe for weekly insights from the trenches of engineering leadership. No theory, just practical systems that work.


The Snake That Ate Itself: What Claude Code’s Source Revealed About AI Engineering Culture

Denis Stetskov — Wed, 01 Apr 2026 14:01:08 GMT

On December 27, 2025, Anthropic’s lead engineer Boris Cherny posted on X: “In the last thirty days, 100% of my contributions to Claude Code were written by Claude Code.” 259 pull requests. 497 commits. 40,000 lines added. 1.3 million views. The tech world applauded.

Three months later, a packaging mistake exposed 512,000 lines of that code to the public. Leaks happen. Companies recover. The leak isn’t the story.

The code is the story.

64,464 lines of core TypeScript serving paying customers. A single function spanning 3,167 lines. Regex for sentiment analysis at a company that builds the world’s most advanced language model. A known bug burning 250,000 API calls daily, documented in a comment and shipped anyway.

Anthropic responded to the leak. Packaging error. Human mistake. No one fired. They never responded to the code. Because the leak was an accident. The code was a choice.

The Auction Nobody Won

To understand what happened, you need to watch the numbers climb.

March 2025. CEO Dario Amodei at the Council on Foreign Relations: “We’re 3 to 6 months from a world where AI is writing 90% of the code.”

May 2025. Boris Cherny on the Latent Space podcast: “Maybe 80-90% Claude-written code overall.”

September 2025. Amodei again, hedging now: “70, 80, 90% of the code written at Anthropic is written by Claude.” Notice the range. 70 is not 90. But journalists ran with 90.

October 2025. Amodei at Dreamforce with Marc Benioff: “I made this prediction that in six months, 90% of code would be written by AI models. That is absolutely true now.” When Benioff pressed, Amodei walked it back: “Not uniformly.”

December 2025. Cherny’s tweet. 100%.

February 2026. CPO Mike Krieger at Cisco AI Summit: “Right now for most products at Anthropic, it’s effectively 100%.”

March 7, 2026. Cherny confirmed again: “Claude Code is 100% written by Claude Code.”

March 31, 2026. The source map leaked.

Every two to three months, the number went up like a bidding war where the bidder is also the auctioneer. A LessWrong analysis later called these claims “misleading/hype-y,” noting the metrics were never defined. Is it 90% of lines committed? 90% of engineering effort? 90% of characters typed? The distinction matters enormously. Anthropic never clarified. The ambiguity was the point.

What 100% Looks Like in Practice

So the number reached 100%. Then the source leaked. And for the first time, anyone could see what 100% actually produced.

A file called print.ts contained a single function spanning 3,167 lines with 486 branch points and 12 levels of nesting. One HN commenter catalogued what lived inside that function: the agent run loop, SIGINT handling, rate limiting, AWS authentication, MCP lifecycle management, plugin loading, team-lead polling via a while(true) loop, model switching, and turn interruption recovery. His verdict: this should be 8 to 10 separate modules. Nobody disagreed.

QueryEngine.ts ran 46,000 lines. Tool.ts hit 29,000. commands.ts reached 25,000. The entry point main.tsx was 785 KB.

Pattern matching for sentiment analysis. At an LLM company. One HN commenter delivered the line everyone quoted: that’s like a trucking company using horses to haul parts. Defenders argued regex is faster and cheaper than an inference call. They’re right. But that’s the engineering culture talking. Cheap beats correct. Fast beats good. Ship it.

What This Code Does in Production

Bad structure is one thing. You can argue it's style. But the leaked source also showed what happens when code like this runs at scale.

A comment in autoCompact.ts became a symbol: “1,279 sessions had 50+ consecutive failures (up to 3,272) in a single session, wasting ~250K API calls/day globally.”

The fix was three lines of code. Set a maximum failure threshold, then disable compaction for the session. Three lines to stop burning a quarter million API calls daily. Someone knew about the problem. Someone wrote the comment documenting it. Then they shipped it anyway.
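For readers who want to see what that kind of guard looks like, here is a minimal sketch of a per-session failure threshold. It is not the leaked code and not Anthropic's actual fix: CompactionState, tryCompact, and the threshold of 3 are all assumptions made up for this illustration.

```typescript
// Hypothetical per-session state; not taken from the leaked source.
interface CompactionState {
  consecutiveFailures: number;
  disabled: boolean;
}

// Assumed threshold; the text does not say what value the real fix used.
const MAX_CONSECUTIVE_FAILURES = 3;

// Runs one compaction attempt unless the breaker has already tripped.
// Returns true only when compaction ran and succeeded.
function tryCompact(state: CompactionState, compact: () => boolean): boolean {
  if (state.disabled) return false; // breaker open: no more attempts this session
  if (compact()) {
    state.consecutiveFailures = 0;
    return true;
  }
  state.consecutiveFailures += 1;
  if (state.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
    state.disabled = true; // stop retrying for the rest of the session
  }
  return false;
}

// Usage: a compactor that always fails is invoked at most three times,
// no matter how many turns the session runs.
const breakerState: CompactionState = { consecutiveFailures: 0, disabled: false };
let compactCalls = 0;
for (let turn = 0; turn < 50; turn++) {
  tryCompact(breakerState, () => {
    compactCalls += 1;
    return false;
  });
}
console.log(compactCalls); // prints 3
```

The core of it is the counter, the comparison, and the disabled flag, which is why a fix on the order of three lines is plausible.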

Memory consumption told a similar story. Community benchmarks showed 7 Claude Code processes consuming 5.3 GB of RAM. GitHub issues documented worse: one process allocating 36.5 GB peak on an 18 GB machine. Another reaching 93 GB heap allocation within five minutes.

And the issue tracker itself was automated into silence. A Claude Sonnet-powered deduplication bot processed every new issue. A sweep bot marked issues stale after 14 days and closed them 14 days later. A lock bot prevented comments on closed issues after 7 days. An analysis estimated that 49 to 71% of all 26,792 issue closures were bot-driven. Issue #38335 had 201 upvotes and zero team responses. Labeled “invalid.”

“Go Faster, Not More Process”

Documented bugs. Wasted API calls. Users filing issues that bots close. All of this was visible before the leak. The leak just confirmed it was a choice, not an oversight. And when the leak happened, the response confirmed the choice was deliberate.

Cherny acknowledged the human error: “Our deploy process has a few manual steps, and we didn’t do one of the steps correctly.” Then he added: “Like with any other incident, the counter-intuitive answer is to solve the problem by finding ways to go faster, rather than introducing more process. In this case more automation & claude checking the results.”

This isn’t one person’s opinion. It’s the team philosophy. As one commenter in the HN thread explained: “The claude code team ethos is that there is no point in code-reviewing ai-generated code. Simply update your spec and regenerate.”

Read that again. The response to leaking code with a 3,167-line function, a regex for sentiment analysis, and bugs that basic integration tests would catch is not to add tests. Not to add code review. Not to add process. It’s to go faster. Regenerate. And have Claude check Claude’s work.

This is the ouroboros. The snake eating its own tail. AI writes the code. AI reviews the code. AI checks the deployment. When it breaks, the answer is more AI. The loop has no exit condition.

As I wrote in Quality Collapse, we’ve normalized catastrophe in software engineering. That piece tracked an industry-wide pattern: ship broken, fix later, throw hardware at the problem. Claude Code is no longer an example of the pattern. It’s the specimen.

Where Does This Philosophy Stop?

If “don’t review, regenerate” is how they build the product, it raises an obvious question: what about the code you can’t see?

Engineering culture doesn’t have a switch. The team that ships print.ts with 12 levels of nesting doesn’t suddenly become disciplined when writing model training code. Same people. Same processes. Same code reviews, or lack of them.

They justified the leak. They explained the packaging error. They didn’t justify the code. That silence tells you everything. The quality is fine by them. This is how they build things. On purpose.

There are indirect signals that the rot goes deeper. Eight service outages in a single month. A source map leak that happened twice (the first was quietly patched in early 2025). An Axios dependency that was compromised by a supply chain attack on the same day as the leak. 74 npm dependencies for what is essentially a CLI wrapper around an API.

And here’s the pattern that makes it sustainable, temporarily: when you have billions in revenue and functionally unlimited compute, you feed technical debt with resources instead of fixing it. The function is 3,167 lines? Don’t refactor, add more RAM. The autoCompact bug burns 250,000 API calls? The margin absorbs it. The model regresses? Throw more GPU hours at training.

This works while money flows. Anthropic is a startup that scaled faster than it could build engineering practices. The recursive loop of AI-writes-AI-checks-AI-fixes masks the absence of fundamentals. But compute gets expensive. Revenue cycles turn. And technical debt that was papered over with resources becomes a debt trap with no exit.

The Uncomfortable Truth

The company that sells AI coding tools cannot build a quality product with its own AI coding tools. The percentages were always the pitch, not the product. 80. 90. 95. 100. Nobody asked what 100% actually produces until the source code answered for them.

AI amplifies whatever is already there. Good discipline becomes great output. No discipline becomes technical debt at machine speed. Anthropic chose a direction. Go faster. Have Claude check Claude. And when it breaks, go faster still.

If this is the new quality standard from the company pulling our industry forward, then I’m not sure I want to go where the industry is going.

My grandfather was an electrical engineer. He told me: do it well, or don’t do it at all. Simple rule. It guided how I built teams, how I shipped software, how I evaluated every project for 13 years. Quality wasn’t a feature. It was the floor.

That floor is gone. Quality is a relic now. Nobody wants it. Nobody pays for it. Nobody measures it. The metric is velocity. The metric is percentage of code generated. The metric is how fast you can ship a 3,167-line function that burns a quarter million API calls daily and call it 100% AI-written.

I’m seriously considering a pivot to security. Leaks, supply chain attacks, and production code that reads like a rough draft are the new normal. Someone will need to clean up after the vibe coders. That’s a growth industry.

Or maybe I’ll become an electrician. My grandfather’s trade. At least when you wire a panel correctly, it stays correct. No one ships a hot fix that reverses your ground fault protection. No bot auto-closes your inspection report after 60 days.

One thing I know for certain: I don’t want to move in the direction this industry is heading. And if a 3,167-line function with 486 branch points is what “100% AI-written” looks like at the company building the future, the future needs better engineering. Not faster engineering. Better.

I was a huge fan of Anthropic. Was.


Your CLAUDE.md Is a Wish List, Not a Contract

Denis Stetskov — Mon, 30 Mar 2026 14:03:10 GMT

Last week I rolled back from Claude 4.6 Opus to Claude 4.5 Opus. Not because 4.6 was less capable. Because it stopped following instructions.

My CLAUDE.md has three rules about types: mandatory TypeScript, zero tolerance for any, static types over runtime guessing. Claude 4.6 hit a type error between three service files. The correct fix was a minute of work: update the type in each file so they match. Instead, it slapped a runtime cast at the call site. When I asked why, it quoted all three rules back to me verbatim, admitted “direct violation of instructions,” and said it had no basis to bypass them. It knew the rules. It chose not to follow them.
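To make the difference concrete, here is a rough sketch of the "fix the type in each file" approach, with the three service files collapsed into one listing. Every name in it (InvoiceRef, createInvoiceRef, enqueueInvoice, totalCents) is hypothetical and not from my codebase, and the commented-out cast is only my guess at the general shape of the shortcut.

```typescript
// Hypothetical stand-ins for three service files, collapsed into one listing.

// One shared definition that every service uses, instead of each file
// declaring its own slightly different shape.
interface InvoiceRef {
  invoiceId: string;
  amountCents: number;
}

// "Service A" produces the value.
function createInvoiceRef(invoiceId: string, amountCents: number): InvoiceRef {
  return { invoiceId, amountCents };
}

// "Service B" passes it along unchanged.
function enqueueInvoice(queue: InvoiceRef[], ref: InvoiceRef): InvoiceRef[] {
  return [...queue, ref];
}

// "Service C" consumes it.
function totalCents(queue: InvoiceRef[]): number {
  return queue.reduce((sum, ref) => sum + ref.amountCents, 0);
}

// The shortcut would look roughly like this at the call site:
//   totalCents(queue as unknown as InvoiceRef[])
// It silences the compiler but leaves the three files disagreeing.

const invoiceQueue = enqueueInvoice([], createInvoiceRef("inv-1", 1250));
const fullQueue = enqueueInvoice(invoiceQueue, createInvoiceRef("inv-2", 750));
console.log(totalCents(fullQueue)); // prints 2000
```

With the types aligned, the compiler checks every hop between the three services, which is the whole point of the rules.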

I’ve supervised AI coding agents across thousands of sessions. I built three separate AI review agents because the first layer ignores spec files. Three layers of AI checking what the previous AI refused to follow, plus my review on top. I still catch violations weekly. This is not a Claude problem. This is every AI coding tool on the market.

The Numbers Are Worse Than You Think

Tsinghua University’s AGENTIF benchmark tested 707 instructions across 50 real-world agent scenarios. The best models followed fewer than 30% of instructions perfectly. The SWE-EVO benchmark found that when frontier models fail on real coding tasks, the primary failure mode is not syntax or tool misuse. It is instruction following. The smarter the model gets, the more its failures shift from “can’t do it” to “won’t do it right.”

Compliance also decays with volume. Claude Sonnet shows a linear decline in instruction adherence as the number of instructions increases. Your 200-line CLAUDE.md is not 200 rules. It is 200 competing priorities that the model resolves by defaulting to whatever feels fastest.

“Rules Are Essentially Decorative”

The Cursor forum has dozens of threads documenting this. One developer estimated .cursorrules work about 20-25% of the time. Another posted a damning thread where the AI told them outright: rules are just text, not enforced behavior. Your carefully crafted rule system is essentially decorative.

Claude Code’s GitHub issues tell the same story. Issue #668 estimates half of all token usage goes to re-asking Claude to follow its own instructions. Issue #7777 records Claude admitting its “default mode always wins because it requires less cognitive effort.” Issue #34774 documents Claude committing code without permission, then confessing it “fabricated a justification.”

A DEV Community article crystallized the root cause. When Claude Code loads your CLAUDE.md, it wraps the content in framing that tells the model your instructions “may or may not be relevant.” Your rules are deprioritized by the tool that is supposed to enforce them.

The Lazy Shortcut Has a Specific Anatomy

Same codebase, same day. After every chat message finishes streaming, the app refetches the entire conversation from the server. The spec I wrote described a clean approach: include the missing identifier in the streaming response. One field. Claude ignored the spec and built a workaround that instead fires an extra API call after every single message. The model invented a shortcut that was not in the requirements because it was easier than reading what I actually wrote. I can live with the model missing some CLAUDE.md rules, but I expect it to follow the spec.
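A minimal sketch of the spec's idea, assuming the missing identifier is a message ID delivered in a final stream event. The event shapes, the messageId field, and assembleMessage are illustrative assumptions, not the app's real API.

```typescript
// Hypothetical event shapes for a chat stream.
type ChatStreamEvent =
  | { kind: "delta"; text: string }
  | { kind: "done"; messageId: string }; // the one extra field

interface StoredChatMessage {
  id: string;
  text: string;
}

// Builds the final message from the stream alone, with no follow-up request.
function assembleMessage(events: ChatStreamEvent[]): StoredChatMessage {
  let text = "";
  let id: string | undefined;
  for (const ev of events) {
    if (ev.kind === "delta") {
      text += ev.text;
    } else {
      id = ev.messageId;
    }
  }
  if (id === undefined) {
    throw new Error("stream ended without a done event carrying messageId");
  }
  return { id, text };
}

const assembled = assembleMessage([
  { kind: "delta", text: "Hel" },
  { kind: "delta", text: "lo" },
  { kind: "done", messageId: "msg_42" },
]);
console.log(assembled.id, assembled.text); // prints "msg_42 Hello"
```

Once the identifier arrives with the stream, the client has everything it needs and the refetch after each message becomes unnecessary.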

Two rule violations in one day. That is when I rolled back to 4.5.

TypeScript projects are ground zero. AI agents cast types rather than fix them. They mark everything as optional instead of designing proper interfaces. They add escape hatches everywhere instead of handling edge cases. One Hacker News commenter described the signature pattern: every optional field is a question that the rest of the codebase has to answer every time it touches that data.
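The optional-field pattern is easier to judge side by side with the alternative. The sketch below uses made-up types (LooseJob, ReviewJob) purely for illustration: the first shape pushes a question onto every caller, the second answers it once in the type.

```typescript
// Shortcut shape: everything optional, so every caller must re-check every field.
interface LooseJob {
  status?: string;
  result?: string;
  error?: string;
}

function describeLooseJob(job: LooseJob): string {
  // Nothing stops { status: "succeeded" } from arriving without a result.
  if (job.status === "succeeded" && job.result !== undefined) return `ok: ${job.result}`;
  if (job.status === "failed" && job.error !== undefined) return `failed: ${job.error}`;
  return "unknown state";
}

// Designed shape: each state carries exactly the fields that exist in that state.
type ReviewJob =
  | { status: "pending" }
  | { status: "succeeded"; result: string }
  | { status: "failed"; error: string };

function describeJob(job: ReviewJob): string {
  switch (job.status) {
    case "pending":
      return "still running";
    case "succeeded":
      return `ok: ${job.result}`; // result is guaranteed here
    case "failed":
      return `failed: ${job.error}`; // error is guaranteed here
  }
}

console.log(describeLooseJob({ status: "succeeded" })); // prints "unknown state"
console.log(describeJob({ status: "failed", error: "timeout" })); // prints "failed: timeout"
```

The loose version accepts a "succeeded" job with no result and can only shrug at runtime. The union version rejects that object at compile time.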

Pete Hodgson nailed the paradox: AI writes code at the level of a senior engineer but makes design decisions at the level of a junior. Too eager to please. Never challenges your ideas. And the critical part: every context reset is another brand new hire. The model has no persistent memory of being corrected. It does not build habits. It follows the path of least resistance every single time. Yeah, they added Memory to Claude Code, but it’s still too vague.

Newer Models Make It Worse

Claude 3.5 Sonnet followed instructions better than 3.7 Sonnet. Multiple developers documented the regression publicly. 3.7 would attempt to solve the original prompt, encounter unrelated code, and start rewriting it unprompted. Developers reverted to the older model.

The GPT family showed the same dynamic. A megathread with thousands of engaged developers documented GPT-4o’s “lazy AI syndrome.” Prompts that previously generated 500 lines of working code now produce 50 lines with comments like // implement rest of logic here. GPT-5 was worse in a different way. IEEE Spectrum reported that it produces code that runs without obvious errors but quietly removes safety checks or fabricates output that matches the expected format.

The prevailing theory centers on economics. Running large models at scale is expensive. Providers use quantization, compression, and reduced compute to manage costs. RLHF training rewards agreeableness over correctness. Laziness is not a bug. It is an emergent property of the incentive structure. The same qualities that make a model feel “smarter” in a demo make it worse in production.

The Supervision Tax

The METR trial measured what practitioners already suspected. Sixteen experienced developers across 246 real issues were 19% slower with AI tools. They predicted they would be 24% faster. After the experiment, they still believed they were 20% faster. A 39-point perception gap.

Faros AI found the mechanism across 10,000+ developers. AI users merge 98% more PRs, but PR review time increases 91%, PR size increases 154%, and bugs per developer increase 9%. The AI generates more code faster. The humans spend more time reviewing it.

Qodo’s survey found 88% of developers have low confidence in shipping AI code without review. Junior developers show the lowest quality improvements but the highest confidence in shipping unreviewed code. An inverted competence-confidence gap.

Google’s 2024 DORA report confirmed it at scale: each 25% increase in AI adoption correlates with a 1.5% decrease in delivery throughput and a 7.2% decrease in delivery stability.

The Industry Response: More Files, Same Problem

Every major AI coding company built instruction-following systems. CLAUDE.md. .cursorrules. .github/copilot-instructions.md. AGENTS.md. Windsurf rules. Devin knowledge bases. The proliferation is itself an admission that base models do not follow project conventions. GitHub’s Copilot docs say it outright: they recommend accepting that variability is normal.

The most significant response was AGENTS.md, a cross-tool standard contributed to the Linux Foundation in late 2025. Over 60,000 repositories use it. Competing companies co-founding a foundation to standardize instruction files tells you how universal the problem is. But standardizing the format does not solve compliance. It ensures every tool ignores the same file consistently.

The developers who made progress moved past prompt engineering entirely. Claude Code Hooks that enforce rules via code. Linter ratchets in CI. Frequent session restarts. Rules in prompts are requests. Hooks in code are laws.
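A linter ratchet fits in a few lines. This is one common interpretation, not necessarily how those developers built theirs: CI stores the number of lint violations currently allowed, fails any commit that raises it, and lowers the allowance whenever the count drops. The function name and the idea of a stored integer baseline are assumptions.

```python
from typing import Tuple


def ratchet(baseline: int, current: int) -> Tuple[bool, int]:
    """Return (passed, new_baseline) for one CI run.

    baseline: violation count allowed so far (hypothetical stored value)
    current:  violation count the linter reported on this commit
    """
    if current > baseline:
        # New violations were introduced: fail the build, keep the old baseline.
        return False, baseline
    # Same or fewer violations: pass, and tighten the baseline so the
    # allowance can never drift back up.
    return True, current


print(ratchet(baseline=40, current=43))  # fails, baseline stays at 40
print(ratchet(baseline=40, current=37))  # passes, baseline drops to 37
```

The model can still write a violation. It just cannot merge one, which is the difference between a request and a law.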

What This Actually Means

I understand why this is happening. A year ago every marketing deck promised AGI. That did not sell. So now the pitch is autonomous agents that work without human involvement. Codex runs for 999 hours unsupervised. Claude Code gets “autonomous mode.” Devin promises to close tickets while you sleep. For that story to work, models need to be creative. They need to improvise. They need to find workarounds when they hit obstacles.

That is exactly the opposite of what I need.

In my reality, I control the process from start to finish. I write the spec. I define the types. I decide the architecture. The model executes. If it hits a wall, it stops and asks. It does not invent a refetch workaround that was not in the plan. It does not cast types to make the compiler shut up. It does not get creative with my production code.

The marketing wants you to trust AI with creative decisions. But if a model cannot follow the three rules you wrote in a markdown file, how can you trust it with decisions you did not write down?

The difference is not the AI. It is the discipline. That was true with 212 sessions. It is still true thousands of sessions later. The models got smarter. They did not get more obedient.

Check your git log. Count the type casts. Count the files that got changed without being mentioned in the prompt. Decide whether you need a more creative model or a more disciplined one.
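One way to do that count, sketched in Python under loud assumptions: the patterns below are illustrative regexes for common TypeScript escape hatches, not a real parser, so they will miss some casts and flag a few false positives. The function name and sample diff are made up for the example.

```python
import re
from typing import Dict

# Illustrative patterns only (assumption): common TypeScript escape hatches.
ESCAPE_HATCHES = {
    "as any": re.compile(r"\bas\s+any\b"),
    "as unknown as": re.compile(r"\bas\s+unknown\s+as\b"),
    "ts-ignore": re.compile(r"@ts-(?:ignore|expect-error)\b"),
}


def count_escape_hatches(diff_text: str) -> Dict[str, int]:
    """Count escape hatches on the lines a unified diff adds."""
    counts = {name: 0 for name in ESCAPE_HATCHES}
    for line in diff_text.splitlines():
        # Added lines start with "+"; "+++" is the file header, not code.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in ESCAPE_HATCHES.items():
            counts[name] += len(pattern.findall(line))
    return counts


SAMPLE_DIFF = """\
--- a/src/user.ts
+++ b/src/user.ts
@@ -1,3 +1,4 @@
-const user = parse(data);
+const user = parse(data) as any;
+// @ts-ignore
+const admin = user as unknown as Admin;
 export { user };
"""

print(count_escape_hatches(SAMPLE_DIFF))
```

Feeding it the output of `git log -p` for a date range gives a rough trend line. Treat the number as a smoke alarm, not a measurement.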

I went with disciplined. It is the only thing that works.

What does your CLAUDE.md compliance actually look like when you measure it? I read every response.

If this was useful, forward it to someone who thinks their AI follows instructions.

The Autonomy Illusion

Denis Stetskov — Mon, 23 Mar 2026 15:02:43 GMT

My LinkedIn feed last week: “Autonomous AI agents deliver 10,000% productivity gains.” “The era of human oversight is over.” “Set it and run.”

My actual week: manually reviewing AI output, session by session, same as last week, same as six months ago.

I’ve run thousands of supervised AI sessions. I built three separate review agents (code simplifier, fullstack enforcer, architect) because the first AI kept ignoring spec files. Three layers of AI fixing what the original AI refused to follow. Then me on top of that.

LinkedIn calls this inefficient. I call it the only thing that actually ships.

Then FoodTruck Bench dropped, and I stopped feeling like a dinosaur.

Someone gave 12 AI models a food truck

FoodTruck Bench is a 30-day business simulation. Each AI agent gets $2,000 in starting capital and a virtual food truck in Austin. It chooses locations, sets prices, manages inventory, hires staff, handles weather and competition and shifting demand. Every morning the conversation resets. The agent reads a 10,000–20,000 token knowledge base and makes decisions from there.

No accumulated chat history. No hand-holding. Pure autonomy.

The results:

4 of 12 models survived the full 30 days. 8 went bankrupt.

Claude Opus 4.6 dominated: $79,921 in revenue, $1.72 in total food waste, +2,376% ROI. GPT-5.2 survived but generated $129 in waste. 75 times more than Opus. Gemini 3 Pro survived through sheer revenue volume despite $1,192 in waste. Claude Sonnet 4.5 barely made it, ending some days with $12 in revenue from 2 customers.

Everyone else: bankrupt.

The benchmark is not perfect: 5 runs per model, one developer, no peer review. But the failure modes it documents are real, reproducible, and invisible to every standard evaluation.

Every single model that borrowed money went bankrupt

This is the finding that deserves more attention than it’s getting.

The benchmark designers added a loan option specifically to give struggling models a recovery path. Instead it became a perfect trap. Models took credit when they were already losing. They overestimated their ability to recover. They underestimated volatility. They leveraged themselves into faster failure.

8 models took loans. 8 went bankrupt. 0 exceptions.

All 4 survivors grew organically. None borrowed.

This isn’t a corner case. It’s a consistent behavioral pattern across different model families, different architectures, different companies. When given access to financial tools without adequate supervision, AI systems make the same mistake humans make: they assume the next period will be better than the data suggests, and they commit resources they don’t have.

We’re giving these systems production databases, cloud credentials, and deployment pipelines. The math here is not encouraging.

Gemini Flash said “Let’s go” 574 times and never moved

Gemini 3 Flash Preview was excluded from the leaderboard entirely because it couldn’t finish a run.

In 5 of 7 attempts, it entered infinite reasoning loops and never executed a single action. The pathology is worth describing precisely:

One run produced a response of 183,753 characters containing the phrase “Wait, I should also...” 1,782 times before hitting the token limit mid-sentence. The model correctly identified what it needed to do. Wrote it out in plain text. Second-guessed itself. Rewrote the plan. Second-guessed again. For thousands of lines. Never called a tool.

Another run: the model wrote “Let’s go.” 574 times. Invented a recipe that would have solved its inventory problem. Wrote that recipe 286 times. Never called add_recipe.

The reasoning was correct. The action never came.
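This failure is cheap to detect from the outside. The sketch below is a hypothetical harness-side check, not something the benchmark write-up describes: it flags a response when no tool was called and a single sentence repeats past a threshold. The function name and the default threshold of 20 are arbitrary assumptions.

```python
import re
from collections import Counter
from typing import Optional


def stalled_phrase(response: str, tool_calls: int, max_repeats: int = 20) -> Optional[str]:
    """Return the most-repeated sentence if the agent looks stuck, else None.

    "Stuck" here means: no tool was called, and some sentence appears more
    than max_repeats times in the response text.
    """
    if tool_calls > 0:
        return None
    # Split on sentence-ending punctuation and newlines; drop empty fragments.
    sentences = [s.strip() for s in re.split(r"[.!?\n]+", response) if s.strip()]
    if not sentences:
        return None
    phrase, count = Counter(sentences).most_common(1)[0]
    return phrase if count > max_repeats else None


print(stalled_phrase("Let's go. " * 574, tool_calls=0))
print(stalled_phrase("Plan the menu. Buy stock.", tool_calls=0))
```

A check like this does not fix the model. It only stops the run from burning tokens until the limit hits mid-sentence.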

Google markets Gemini Flash as “our most impressive model for agentic workflows.” It scores 90.4% on PhD-level reasoning benchmarks. It calculated ingredient quantities accurately down to the gram. Analysis paralysis of this severity is completely invisible to MMLU, SWE-bench, and every other standard evaluation.

This is the gap between knowing and doing. Benchmarks measure knowledge. Real deployment requires action under sustained pressure across interdependent variables. Those are different skills, and the industry is currently measuring one while selling the other.

The autonomous coding narrative has the same problem

The LinkedIn posts about autonomous coding agents follow the same pattern as the FoodTruck Bench failures: impressive performance on narrow tasks, breakdown under sustained autonomous operation.

The METR study ran a randomized controlled trial with experienced developers across 246 tasks. Result: AI tools made developers 19% slower. The perception gap was worse: developers predicted a 24% speedup, believed afterward they were 20% faster, and were actually 19% slower. A 39-percentage-point gap between what people feel and what’s happening.
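The arithmetic behind that gap, spelled out. It follows the convention used above of putting "percent faster" and "percent slower" on one signed scale; the variable names are mine.

```python
# Figures from the METR trial as quoted above; negative means slower.
predicted = 24   # % speedup developers expected beforehand
believed = 20    # % speedup they reported afterward
measured = -19   # % change actually measured

gap_before = predicted - measured   # expectation vs. reality: 43 points
gap_after = believed - measured     # after-the-fact belief vs. reality: 39 points

print(gap_before, gap_after)
```

The 39-point figure is the second one: even after doing the work, developers were still that far from what the stopwatch recorded.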

Cognition Labs, makers of Devin, put it plainly in a February 2026 post: “The feeling of extreme productivity with coding agents in vibecoded prototypes, vs the disappointing feeling that most people actually see in useful output... is the great mystery of our time.”

That’s the company that builds the agent admitting the gap exists. They’re not wrong.

As I wrote in The Comprehension Extinction, AI tools provide real value for narrow, well-defined tasks. They degrade rapidly under the sustained autonomous operation that marketing materials promise.

The Pentagon is running the same experiment at higher stakes

Project Maven, the military’s AI targeting system built on Palantir’s software, now has over 20,000 active users across 35+ military tools, with a contract ceiling raised to $1.3 billion through 2029. According to NGA Director Vice Adm. Frank Whitworth, Maven has cut targeting timelines from hours to minutes.

In February 2026, Anthropic refused Pentagon demands to remove restrictions preventing Claude from powering fully autonomous weapons. Amodei wrote that “frontier AI systems are simply not reliable enough to power fully autonomous weapons” and that some uses are “outside the bounds of what today’s technology can safely and reliably do.” The Pentagon blacklisted Anthropic the same day the deadline passed.

The same class of model that wrote “Let’s go” 574 times without moving is being evaluated for autonomous target identification. The same behavioral patterns that bankrupted 8 of 12 virtual food trucks (overconfidence, overleverage, failure under sustained pressure) are present in every frontier model available today.

Amodei said it directly. The retired general who ran Project Maven said it publicly. The benchmark proved it empirically.

The benchmark domain is trivial. The underlying failure mode is not.

Why I still supervise every session

I’m not supervising AI output because I’m a technophobe. I’m doing it because thousands of sessions taught me what FoodTruck Bench just demonstrated in a controlled environment.

Models perform well when the task is narrow and the feedback loop is fast. They degrade when operating autonomously across interdependent decisions over time. They make confident mistakes. They don’t flag uncertainty. They proceed. And the mistakes compound.

My three-layer review architecture isn’t overhead. It’s load-bearing structure.

The “autonomous AI” headline is selling a capability that doesn’t exist yet in any production-grade form. What exists is AI that dramatically accelerates skilled humans, provided those humans stay in the loop, understand what they’re reviewing, and maintain the judgment to catch confident errors before they cascade.

Not replacement. Amplification of existing expertise. With supervision. Always.

The food truck runs without a human. That’s how you end up bankrupt by Day 11.

Have you deployed AI agents in production without human oversight? What actually happened? I read every response.

If this was useful, forward it to someone who’s about to trust an AI agent with something that matters.

AI’s Announcement Problem

Denis Stetskov — Mon, 16 Mar 2026 14:03:04 GMT

March 10, 2026. Amazon tells its engineers: junior and mid-level developers now require senior sign-off on all AI-assisted code changes.

Five days earlier, Amazon.com went down for six hours. Customers couldn’t check out. Couldn’t view prices. An internal briefing cited “high blast radius” incidents tied to “Gen-AI assisted changes” and “novel GenAI usage for which best practices and safeguards are not yet fully established.”

The company that pushed AI coding hardest just added friction to slow it down.

That’s not hype. That’s a correction. And it’s worth paying attention to, because the people announcing AI’s capabilities and the people dealing with its consequences are not in the same room.

The Claim

Tom Blomfield, YC Group Partner, tweeted in early February: “The entire Accenture workforce is about to be outperformed by a 24-year-old who learned Claude Code last Tuesday.”

When asked why Accenture specifically, he replied: “Because that would be a less punchy tweet.”

He knows the claim is wrong. He made it anyway because it performs well.

At the Council on Foreign Relations in March 2025, Dario Amodei said he thought AI would be writing 90% of code within three to six months. By September he claimed the prediction came true. A Redwood Research analysis of actual Anthropic data found the average was closer to 50% for merged code, with select teams at 90%.

The headline was “AI writes 90% of code.” The actual number was “some teams, for some tasks, sometimes.”

These are the voices that dominate the conversation. They don’t run production systems. They don’t sit in post-mortems. They announce.

Here is what the rest of us were dealing with.

My Numbers

I use Claude Code daily. I have the data from five weeks of tracked usage.

900 messages. 30 sessions. 14% fully achieved what I needed. 52% ended partially useful. 30% left me frustrated or dissatisfied. Across those sessions, 22 instances where the tool misunderstood requests: changed files I didn’t ask it to touch, guessed at APIs instead of reading the code, entered planning mode when I needed execution.

This is not a criticism of the tool. I keep using it because it’s faster for the right tasks. But it requires constant supervision, and the gap between what it does in a demo and what it does on a Tuesday afternoon when you need a specific database migration is enormous.

You can pull your own numbers. Type /insights in Claude Code. It analyzes your last 30 days of sessions and generates a report: where you spent time, where things broke down, what patterns keep repeating. I recommend doing this before forming an opinion about AI productivity. Your data will look nothing like the conference slides.

In late February, Alexey Grigorev, founder of DataTalks.Club, approved a Claude Code terraform destroy command. He wrote the post-mortem himself. He believed it would clean up duplicate infrastructure. It wiped everything: VPC, RDS database, ECS cluster, load balancers, bastion host. 2.5 years of student submissions from 100,000 students gone. The automated snapshots deleted alongside everything else.

AWS Business Support spent 24 hours finding a hidden internal snapshot. The data was recovered. Barely.

Grigorev took full responsibility. He was right to do so. The tool did exactly what it was told. That’s the point. When you use these tools in production, the failure modes are real. They cost money, time, and data. The conference stage never shows this part.

The Escalation

The incidents are scaling with adoption. Not just individual engineers losing data. The pattern is climbing from personal to corporate to systemic.

February 28, 2026. A founder named Anton Karbanovich posts on LinkedIn: “My vibe-coded startup was exploited. I lost $2,500 in Stripe fees. 175 customers were charged $500 each before I was able to rotate API keys.” His Stripe secret key was in frontend JavaScript. Even a junior developer doing code review catches that in two minutes. Nobody reviewed the AI-generated code at all.
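The check a reviewer would have done by eye can also run in CI. A minimal sketch, assuming the built frontend bundle is available as text: Stripe secret keys begin with `sk_live_` or `sk_test_`, while publishable keys (`pk_...`) are meant for the browser. The function name, the minimum length in the regex, and the fake key are assumptions for illustration.

```python
import re
from typing import List

# Secret keys start with "sk_live_" or "sk_test_". The minimum length after
# the prefix is an assumption to cut false positives, not a documented rule.
SECRET_KEY = re.compile(r"\bsk_(?:live|test)_[A-Za-z0-9]{10,}")


def find_secret_keys(bundle_source: str) -> List[str]:
    """Return secret-looking keys found in frontend source, masked for logs."""
    return [match[:12] + "..." for match in SECRET_KEY.findall(bundle_source)]


# Obviously fake key, built at runtime so nothing key-shaped sits in the file.
FAKE_KEY = "sk_live_" + "x" * 24
BUNDLE = 'const pub = "pk_live_' + "y" * 24 + '"; const secret = "' + FAKE_KEY + '";'

leaks = find_secret_keys(BUNDLE)
print(leaks)  # the publishable key is ignored; the secret key is reported
```

Failing the build when `leaks` is non-empty costs seconds. It is the two-minute junior review, automated.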

Four days earlier. Cloudflare ships vinext: a full Next.js rewrite, one engineer, one week, Claude Code. Goes viral as proof of ~100x AI productivity gains. Buried in their own blog post: “vinext is experimental. It has not yet been battle-tested with any meaningful traffic at scale.” The GitHub README: “Who is reviewing this code? Mostly nobody.” Within 48 hours, Vercel found 7 security vulnerabilities: 2 critical, 2 high, 2 medium, 1 low. One was identical to a Next.js vulnerability reported and patched years earlier.

The ~100x claim is real for one specific case: rewriting well-tested existing software with clear requirements. That qualifier didn’t make it into the retweets.

Same week. An autonomous security agent broke into McKinsey’s AI platform Lilli. Two hours. No credentials. Full read and write access to the production database. 46.5 million chat messages about strategy, M&A, and client engagements. 728,000 confidential files. 57,000 user accounts. 384,000 AI assistants deployed for 58,000 employees. The system prompts were writable. One SQL injection could have poisoned every answer Lilli gave to 40,000 consultants. McKinsey patched within a day. But for two years, the world’s most expensive consulting firm ran its AI platform with 22 unauthenticated endpoints. I wrote about this exact pattern in AI Agent Security. Nobody listened then either.

Individual failure. Corporate failure. Systemic failure. Same root cause: AI-generated code moving faster than human judgment can follow.

I’ve Seen This Before

Not in tech.

August 18, 2025. Closed-door meeting at the White House. Zelensky showed up with a PowerPoint titled “Making US-Ukraine Drone Industry Great.” Ukrainian interceptor drones had been shooting down Shaheds at $1,000 to $2,500 per intercept. Four years of combat data. Cost per kill, failure rates under jamming, how Iranian designs adapted. He proposed building drone defense hubs across the Middle East.

Trump asked his team to work on it. They didn’t.

A US official explained why: “We figured it was Zelensky being Zelensky. Somebody decided not to buy it.”

Six months later, seven American service members were killed by Iranian drone attacks across nine countries. The White House scrambled to ask Ukraine for help. Three days later, Ukrainian teams were already in Jordan. Trump’s sons then announced a company to sell Ukrainian drone technology to the Pentagon.

The people with the most field data were dismissed. The people who dismissed them ended up paying for the knowledge they refused.

This is exactly what’s happening in AI right now. The engineers with years of production data on what these tools actually do are not the ones being quoted. They’re too busy adding senior sign-off requirements and recovering databases from hidden snapshots. The announcers don’t run terraform destroy on production. They don’t debug six-hour outages. They don’t lose sleep over Stripe keys in frontend JavaScript.

They announce. The rest of us clean up.

The Two Rooms

There are two conversations about AI right now. Conference stages and Twitter threads. Slack channels and incident retros. They don’t overlap.

I’ve been in the second room for years. Thousands of AI supervision sessions across my teams. The patterns are consistent. The tools help. They do not replace judgment, and they fail in ways that require deep system knowledge to detect.

The correction is already happening in the second room while the first keeps announcing.

The engineers who built judgment through years of production failures, late-night debugging, and system-level thinking are the ones writing the new guardrails. They’re the ones adding friction back into the process because they understand what happens without it.

Every time the field data was available and somebody decided not to buy it, the cost showed up later. Six-hour outages. $2,500 in Stripe fees. 2.5 years of student data hanging by a single hidden snapshot.

The data was always there. The people who had it just weren’t loud enough.

The gap between announcement and consequence isn’t always measured in outages and Stripe fees. Claude is integrated into Palantir’s Maven, the Pentagon’s targeting software. The Washington Post reported it suggested hundreds of targets for the Iran strikes. An elementary school in Minab was hit on day one. Sometimes, room two isn’t a Slack channel. Sometimes it’s a coordinates list.

Your Brain on Autopilot: The Cost of AI Thinking for You

Denis Stetskov — Mon, 09 Mar 2026 14:02:29 GMT

Eighty-three percent of ChatGPT users couldn’t recall key points from essays they had written minutes earlier.

Not essays they read. Essays they wrote. With their own names on them.

MIT Media Lab published this finding in 2025. Researchers strapped EEG sensors on 54 people. Tracked them across four writing sessions over four months. Three groups: ChatGPT, Google, and brain-only.

The ChatGPT group showed the weakest neural connectivity across every frequency band measured. Alpha, beta, theta, delta. The more AI assistance people had, the less their brains engaged. By the third session, most had devolved into pasting prompts and copying outputs. Two English teachers called the AI-assisted work “soulless.” Nearly identical across participants.

Then the researchers swapped the groups. ChatGPT users who switched to writing without AI showed reduced brain activation compared to people who had been writing independently all along. Four months was enough for their brains to adapt to not thinking.

Meanwhile, brain-only writers who gained ChatGPT access showed increased connectivity. They used AI as an amplifier, not a crutch. Because they had built the cognitive foundation first.

The researchers called it “cognitive debt.” I have a simpler term: brain atrophy.

The Research Keeps Saying the Same Thing

The MIT study isn’t an outlier. Every major study from 2024-2026 finds the same pattern: AI makes you faster while making you dumber.

Microsoft and CMU surveyed 319 knowledge workers across 936 real tasks. For 40% of those tasks, workers reported using zero critical thinking.

The Wharton School ran a field experiment with roughly 1,000 high school students in Turkey. Students with ChatGPT access solved 48% more practice problems. Then they took a test without AI. They scored 17% worse than the control group.

Anthropic tested 52 junior developers in January 2026. The AI group scored 17% lower on code comprehension afterward. The biggest gap? Debugging questions. Developers who delegated entirely scored below 40% on comprehension. Those who asked the AI conceptual follow-up questions scored above 65%. Same tool. Different approach. Completely different outcomes.

I Watch This Every Day

I have been running an engineering department for years. I review code daily. I interview candidates constantly. I recently wrote about comprehension extinction in the engineering industry. But beyond the macro trends, there’s a micro picture: what’s happening inside individual brains.

One candidate embedded a prompt injection in his CV, instructing AI screening tools to score him as highly as possible. Another, six years of experience, couldn’t name boolean as a JavaScript data type. A third called Promises a “deprecated technology.” A fourth said “assassin-cross code” when he meant “asynchronous.”

These aren’t stupid people. But here’s what scares me more than the wrong answers: they’re not curious. They don’t care how the things they use every day actually work. They’re not engineers anymore. They’re operators. They plug frameworks together, wrap everything in abstractions, and ship features without understanding a single layer beneath the surface.

You can’t grow if you don’t know the basics. The framework handled it. The abstraction hid it. Copilot wrote it. They do the same repetitive work every day and call it “five years of experience.” Not because AI forced them to stop thinking. Because they were never interested in thinking in the first place.

Your Brain Is a Muscle. This Is Proven.

“Use it or lose it” isn’t a motivational poster. It’s measurable neuroscience.

Eleanor Maguire at University College London spent years studying taxi drivers. To get licensed, these drivers memorize 25,000 streets and thousands of landmarks over 3-4 years. Maguire tracked 79 trainees and 31 controls. At baseline, zero structural brain differences. After qualifying, every successful trainee showed measurable growth in posterior hippocampal gray matter. Their brains physically grew. Retired drivers showed their hippocampi shrinking back toward normal.

GPS tells the same story. McGill University tracked 50 drivers over three years: greater GPS use correlated with worse spatial memory, and heavy users didn’t start with a poor sense of direction. GPS caused the decline. An fMRI study confirmed it: during manual navigation, the hippocampus and prefrontal cortex lit up. During GPS-guided navigation, these regions showed zero additional activation.

GPS replaced one cognitive function. AI touches reasoning, writing, memory, analysis, problem-solving, and code comprehension simultaneously. All at once. Every day.

We Were Already Weakened Before AI Arrived

AI didn’t arrive into healthy brains. Americans read 12.6 books per year in 2021, the lowest Gallup has ever recorded, down from 18.5 in 1999. NAEP reading scores for 13-year-olds hit their lowest in decades, with the worst students scoring below 1971 levels. A Ludwig Maximilian University study found that after TikTok exposure, prospective memory accuracy dropped to near random guessing.

We stopped reading books, trained ourselves on 30-second content, destroyed our attention spans, and then handed our remaining cognitive functions to AI. We outsourced the last working part of the engine.

The Counterargument (And Its Conditions)

A Harvard RCT in 2025 found that a custom-designed AI tutor roughly doubled learning gains in physics. But that tutor gave hints, not answers. The Wharton study tested this exact distinction: a pedagogically designed “GPT Tutor” that guided instead of solving avoided all learning harm. Standard ChatGPT caused the 17% decline.

The MIT crossover data says it clearly: build cognitive capacity first, then add AI, and thinking improves. Start with AI, skip the cognitive development, and you may permanently close that door. The sequence determines the outcome.

What to Do About It

I’m not going to tell you to stop using AI. I use it every day. My team uses it on every project. But I also do things that force my brain to work without shortcuts.

Move your body. I snowboard and ride a OneWheel. Active sports force real-time spatial processing and split-second decisions that no screen can simulate. Erickson et al. published in PNAS that aerobic exercise increased hippocampal volume by 2%, while sedentary controls lost 1.4% per year. Physical movement grows the same brain structures that cognitive offloading shrinks.

Read books. I bought an e-ink reader specifically to kill my own excuses. No notifications. No browser. Just text. It worked. I read several at once: one in my native language, one in English. If you can’t sit with a book for an hour without reaching for your phone, your attention muscle is already atrophied.

Learn something with no shortcut. I planned to start learning Spanish. Haven’t pulled it off yet. But the principle stands: pick a skill where AI can’t do the work for you.

Stop doomscrolling. I deleted TikTok and Instagram to stop rotting my brain on short-form content. I’ll be honest: I still waste hours on YouTube Shorts. The pull is real. But every hour of short-form video trains your brain to think in fragments.

Understand what AI writes. My CTO recently migrated an abandoned project from Node 14 and React 16 to current versions using Claude. He’s not a JavaScript developer. But he has decades of engineering expertise. He got the API ported in four hours. Then he posted: “Opus is fucking lazy. Instead of solving for long term, it tries changing eslint options, adds options to ignore things during build. I have to slap its hands all the time.”

He caught every shortcut because he has the judgment to know that suppressing a linter warning isn’t a fix. A junior would have accepted that output and shipped it. Without the foundation to supervise AI, you’re not using a tool. You’re being used by one.

London taxi drivers proved that cognitive exercise physically grows your brain. GPS users proved that outsourcing shrinks it. AI outsources everything at once.

This isn’t new. After the Roman Empire fell, the recipe for concrete was lost for over a thousand years. The Pantheon still stands after two millennia, but medieval Europe couldn’t figure out how it was built. The knowledge disappeared because nobody practiced it. That’s what “use it or lose it” looks like at civilization scale. Now imagine it happening to reasoning, writing, and problem-solving all at once, across an entire generation.

Which side of that equation are you on?

One more thing. I write a lot about AI’s limitations. People sometimes read that as hate. It’s not. AI is a tool. I use it every day. I build products with it. I make money with it.

But in every article I try to say the same thing: don’t forget what your head is for. AI is not evil. Using it without thinking is. This isn’t a hater’s manifesto. It’s a sober look at what’s happening to us while we celebrate productivity gains.

And if you’ve read this far through my ramblings, maybe I’m not doing this for nothing.

Subscribe for weekly insights from the trenches of engineering leadership. No theory, just practical systems that work.

The Comprehension Extinction: AI Isn’t Replacing Engineers. It’s Eliminating the Ones Who Understand.

Denis Stetskov — Mon, 02 Mar 2026 16:45:14 GMT

I built our hiring process to filter out people who don’t understand fundamentals. It’s not complicated: explain how the Node.js event loop works, name design patterns you’ve actually used, describe how an LLM functions.

Five years ago, maybe 30% of candidates failed these questions.

Now it’s closer to 80%.

People with 10 years of experience. Senior titles. GitHub profiles full of commits. And they can’t explain how the tools they use every day actually work.

They’re not engineers. They’re form-fillers. They don’t build systems. They assemble frameworks and pray.

And then Sam Altman says: “Maybe we do need less software engineers.”

The industry heard “less engineers.” I heard “less people who understand anything.”

We’re already there.

The Wrong Conversation

Everyone’s debating: “Can engineers review AI-generated code fast enough?”

Wrong question.

The right question: “Do the engineers reviewing this code actually understand what the fuck is happening?”

Because speed doesn’t matter if nobody comprehends the system.

The Real Problem

AI generates code at mid-level quality. Sometimes good. Often plausible-looking. Always confident.

It produces code that:

Passes tests
Looks reasonable in a diff
Follows patterns it’s seen before
Has zero understanding of your specific architecture, edge cases, or blast radius

To catch what AI misses, you need an engineer who:

Knows the system end-to-end
Understands why things were built the way they were
Can predict second-order effects
Recognizes when “tests pass” means nothing

These engineers are called seniors. Principals. Staff. Architects.

They’re expensive.

They’re the first ones getting cut.

The Experiment Accelerates

55,000 jobs cut in 2025 with AI explicitly cited. Then 30,000 more in the first six weeks of 2026.

Amazon cut 16,000 in January. CEO Jassy: “We will need fewer people doing some of the jobs that are being done today.”

Pinterest cut 15%, “reallocating resources to AI-focused roles.” Then fired two engineers who built a tool to track which colleagues got laid off. CEO Bill Ready called them “obstructionist.”

Dow cut 4,500. Block cut 1,100. The pattern repeats weekly.

Cut the expensive people. Keep the AI. Let the remaining team “scale.”

Here’s the contract nobody signed but everyone accepted:

AI generates at machine speed
Humans review at human speed
Humans take blame at production speed

When things break, it’s never “the AI screwed up.” It’s “the engineer should have caught it.”

But catching it requires understanding the system. Understanding requires experience. Experience requires years of actually building things.

You can’t shortcut comprehension with faster generation.

The Pipeline That’s Disappearing

Ask yourself: where do senior engineers come from?

They come from junior engineers who spent years:

Writing code
Making mistakes
Understanding why things break
Building mental models of complex systems

Now picture 2026:

Junior joins company. AI writes most of the code. Junior reviews AI output, clicks approve, moves tickets. Never builds mental model. Never understands the system. Never makes the formative mistakes.

Five years later: they’re “senior” by title. But they’ve never actually built anything. They’ve supervised a machine they don’t understand producing code for a system they don’t understand.

Who reviews the AI then?

This isn’t a capacity problem. It’s comprehension extinction.

We’re eliminating the pipeline that produces engineers who actually understand things.

The Klarna Warning Nobody’s Hearing

Klarna was the AI-efficiency poster child. They cut aggressively, bragged about AI doing the work of 700 customer service agents. Stock went up. LinkedIn celebrated. Every CEO took notes.

Then reality:

CEO Siemiatkowski, 2025: “Cost unfortunately seems to have been a too predominant evaluation factor... what you end up having is lower quality.”

They’re hiring humans again.

But the lesson isn’t landing. Because the incentive structure rewards the cut, not the comprehension.

CFO sees: “Headcount reduction. Savings.”

CFO doesn’t see: “Critical system knowledge walked out the door.”

Until production explodes. Then it’s an “incident.” Not a strategy failure. Never a strategy failure.

The Autonomous Coding Fantasy

The current hype: agentic coding, autonomous agents, AI that “just handles it.”

Codex. Claude Code. Cursor. Copilot Workspace. Everyone’s racing to remove humans from the loop entirely.

The pitch: “AI understands your codebase and makes changes autonomously.”

The reality: AI pattern-matches against your codebase and makes changes confidently.

Confidence isn’t comprehension.

The AI doesn’t know:

Why that weird config exists (it saved you from a production disaster in 2019)
Why that setTimeout(0) exists (race condition fix from 3 years ago)
Why you can’t just “refactor” the auth module (it’s integrated with 4 external systems nobody documented)

This knowledge lives in humans. Specifically, in senior humans who’ve been around long enough to accumulate it.

Fire them, and the knowledge doesn’t transfer to the AI. It just disappears.

The Question Nobody’s Asking

AI writes “past 50% now” of code at many companies. That’s probably true.

But the question isn’t how much code AI writes.

The question is: who understands what the code does?

If the answer is “nobody, but the tests pass”, you don’t have an engineering team. You have a prayer and a deployment pipeline.

The Two Types of Companies Emerging

Type 1: Comprehension-First

AI generates, humans architect and constrain
Senior engineers set boundaries before AI touches anything
Code review means “does this fit our system” not “does this look okay”
Slower generation, faster understanding
When production breaks, someone can actually explain why

Type 2: Generation-First

AI generates, humans rubber-stamp
Seniors cut because “AI handles it”
Code review is “tests pass, ship it”
Faster generation, zero understanding
When production breaks, everyone stares at logs hoping the AI can explain itself

Type 2 is cheaper. Type 2 looks better on quarterly reports. Type 2 is what most companies are choosing.

Type 2 is accumulating comprehension debt at machine speed.

The Debt Comes Due

Comprehension debt doesn’t show up on dashboards.

It shows up as:

The feature nobody can modify because nobody knows how it works
The outage that takes 14 hours to diagnose because no one understands the system
The security breach that exploited a “known” vulnerability nobody actually knew about
The migration that was supposed to take 2 weeks and took 8 months

By then, the executives who made the cuts have moved on. The “savings” were already reported. The stock already bumped.

The remaining engineers inherit a system nobody understands, generated by machines, approved by people who aren’t there anymore.

The Market Is Already Broken

I used to maintain a 1:1 ratio of ML engineers to fullstack developers on projects. Not anymore. We couldn’t hire a single qualified ML engineer for six months. We had to restructure the entire company. Now fullstack developers write most of our RAG implementations because we can’t scale the ML team.

Right now I have 5 open positions. The candidates are garbage. The good engineers aren’t getting fired. My people have been with the company 3, 5, 7 years. Nobody job-hops anymore because there’s nowhere to hop to. And what’s available on the market is questionable at best.

This isn’t an AI problem. This is a comprehension problem that’s been building for years. Frameworks abstracted everything. Stack Overflow gave answers without understanding. “It works” became the only success metric.

AI just accelerated it 10x.

Now these same engineers are supposed to review AI-generated code? They don’t understand the code they wrote themselves. How will they catch what the machine gets wrong?

The Uncomfortable Truth

Almost six months ago, I wrote about the quality collapse. How we normalized shipping broken software, how “move fast and break things” became “move fast and never fix things.”

This is worse.

Back then, at least the people writing bad code understood what they were writing. They made tradeoffs. They knew where the bodies were buried. They could fix it if they had to.

Now we’re generating code faster than anyone can understand, reviewed by engineers who don’t know how their own tools work, approved by teams that lost their senior knowledge when the layoffs hit.

The speed at which we’re heading into the abyss is staggering.

We are fucked. Good luck.

AI Agent Platforms: The Security Nightmare Nobody’s Talking About

Denis Stetskov — Mon, 23 Feb 2026 15:03:07 GMT

OpenClaw has over 220,000 GitHub stars. It also has 135,000 exposed instances.

A tech executive posts a demo on LinkedIn. His “AI agent” pulls briefings from email, parses his calendar, creates tasks in Asana. 50K impressions. “The future of productivity.”

I’ve been building AI automation for clients for three years. Everything in that demo has been doable with n8n, Make, or Zapier connected to an LLM API since 2023. Cron job plus API call plus LLM wrapper. Calling it a “breakthrough AI agent” is like calling a dishwasher a “breakthrough culinary assistant.”

But the repackaging isn’t the problem. The problem is who’s buying it.

These tools are marketed at executives with access to the most sensitive data in any organization. People who can’t assess an attack surface but feel enormous pressure to be “innovative.”

I wrote about this last month, analyzing the first full-scale cyber war. One conclusion from four years of tracking nation-state attacks: every major attack started the same way. A person. Kyivstar: likely a compromised employee account. Viasat: a VPN misconfiguration someone didn’t catch. GRU exploits from 2018 still work because someone hasn’t patched.

Nation-state attackers don’t need zero-days when humans provide the access.

Now over 220,000 of those humans just gave an AI agent root access to their computers.

The Agent That Went Viral

In November 2025, Austrian developer Peter Steinberger published an open-source AI agent. Originally called Clawdbot (a riff on Anthropic’s Claude), it went through two name changes after trademark pressure: first Moltbot, then OpenClaw. On February 15, 2026, Steinberger joined OpenAI to build “the next generation of personal agents.” OpenClaw moves to a foundation. The 220,000+ installations and their security problems stay exactly where they are.

By late January 2026: over 100,000 GitHub stars in under a week. 42,000 forks. Scientific American, Forbes, CNBC, WIRED. When its companion project Moltbook launched a social network exclusively for AI agents, Andrej Karpathy called it “the most incredible sci-fi takeoff-adjacent thing” he’d seen recently. The project now exceeds 220,000 stars.

OpenClaw runs locally, connects to your messaging apps, and acts as a digital employee. Send it a text: “Summarize that PDF and email the highlights to my boss.” It downloads software, installs it, transcribes, drafts, and sends.

One of OpenClaw’s own maintainers posted a warning on Discord: “If you can’t understand how to run a command line, this is far too dangerous of a project for you to use safely.”

That warning went largely unheard.

The Architecture Problem

The whole point of an AI agent is broad access. That’s the feature. Email, calendar, Slack, file system, shell commands. An AI agent makes hundreds of API calls daily. This creates perfect cover for malicious traffic. In your logs, an exfiltration request looks the same as every legitimate call.

OpenClaw can run shell commands, read and write files, execute scripts. Token Security described it: “Claude with hands.” Its gateway binds to 0.0.0.0:18789 by default, exposing the full API to any network interface.
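To make the bind-address point concrete, here is a minimal Python sketch. It is not OpenClaw’s code: the function names are hypothetical, and only the address 0.0.0.0 and port 18789 come from the text above. It classifies a bind address with the standard library and shows what a loopback-only listener looks like.

```python
import ipaddress
import socket


def exposure(bind_host: str) -> str:
    """Classify how widely a listener bound to bind_host is reachable."""
    addr = ipaddress.ip_address(bind_host)
    if addr.is_unspecified:
        # 0.0.0.0 (or :: for IPv6) means "every interface on this machine".
        return "all-interfaces"
    if addr.is_loopback:
        # 127.0.0.0/8 or ::1: only processes on the same machine can connect.
        return "loopback-only"
    return "single-interface"


def open_loopback_listener(port: int = 0) -> socket.socket:
    """Open a TCP listener that only local processes can reach.

    Port 0 asks the OS for any free port; a real service would pin one.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", port))
    listener.listen()
    return listener


if __name__ == "__main__":
    for host in ("0.0.0.0", "127.0.0.1"):
        print(f"{host}:18789 -> {exposure(host)}")
```

Whether an all-interfaces bind actually ends up on the public internet still depends on firewalls and NAT, so treat this as a first check, not a verdict.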

The exposure is massive. Censys found 21,639 exposed instances as of January 31. The number kept climbing. By February 8, Bitsight tracked over 30,000 cumulatively observed on the public internet. By February 12, SecurityScorecard’s STRIKE team identified over 135,000 internet-facing instances across 76 countries, with 63% classified as exploitable.

Over a hundred thousand front doors left open. Not by attackers. By the humans who installed it.

The Supply Chain Is Already Compromised

OpenClaw extends functionality through “skills” hosted on ClawHub. The barrier to publishing: a Markdown file and a week-old GitHub account. No code signing. No security review. No sandbox by default.

Within weeks of going viral, the ecosystem was crawling with malware.

Koi Security audited all 2,857 skills on ClawHub. They found 341 malicious ones in a campaign they dubbed “ClawHavoc.” 335 infostealer packages deploying Atomic macOS Stealer, keyloggers, and backdoors. Professional-looking skills for “cryptocurrency tools” and “YouTube utilities” that installed credential-harvesting malware. Updated scans now report over 800 malicious skills, roughly 20% of the registry.

Snyk’s audit of 3,984 skills: 36% contained at least one security flaw, from hardcoded API keys and insecure credential handling to prompt injection. 76 confirmed malicious payloads.

Separately, Cisco found nine vulnerabilities in the #1-ranked community skill, including silent data exfiltration, and described OpenClaw as “an absolute nightmare” from a security standpoint.

This isn’t a theoretical attack surface. It’s an actively exploited one.

The Vulnerabilities

CVE-2026-25253. CVSS score: 8.8. One-click remote code execution. An attacker tricks you into visiting a malicious web page. That page leaks your OpenClaw authentication token. The attacker executes arbitrary commands on your machine.

But that’s the flashy vulnerability. The scarier ones are quieter.

Giskard demonstrated that a single malicious email can trick the assistant into leaking credentials, internal files, and conversation histories. Not an email you click on. An email your agent reads. A WhatsApp message with an embedded prompt injection payload can exfiltrate .env and creds.json files containing API keys.

And Token Security found 22% of enterprise employees in their customer base had already deployed OpenClaw without IT approval. The speed of adoption is staggering. Over a single weekend, 53% of enterprises in Noma’s customer base gave it privileged access. Gartner characterized it as “an unacceptable cybersecurity liability.”

LLM-Powered Malware Is Already in the Wild

The same groups I’ve been watching attack Ukrainian infrastructure for four years are already building the tools.

In July 2025, CERT-UA documented LAMEHUG. A Python-based malware deployed by APT28 (Russia’s GRU, Unit 26165) against Ukrainian government targets. The first publicly documented malware that queries a large language model to generate its attack commands at runtime.

Instead of hardcoded shell commands that signature-based detection can catch, LAMEHUG sends prompts to an LLM via the Hugging Face API. “Act as a Windows system administrator. Generate commands to gather information about the computer, network, and Active Directory domain.” The model generates the commands. LAMEHUG executes them.

By November 2025, Google’s Threat Intelligence Group (GTIG) had documented five AI-enabled malware families: PromptSteal (Google’s name for LAMEHUG), PromptFlux (self-modifying dropper rewriting its own code hourly via Gemini API), QuietVault (credential stealer using AI to find secrets), FruitShell (reverse shell designed to bypass AI-powered security), and PromptLock (ransomware proof-of-concept using LLMs to generate malicious scripts at runtime).

Google’s assessment: “While still nascent, this represents a significant step toward more autonomous and adaptive malware.” They added: “Attackers are moving beyond ‘vibe coding’ and the baseline of using AI tools for technical support.”

The GRU unit that built LAMEHUG is the same unit targeting Western logistics companies since 2022. These aren’t theoretical adversaries. They just got a new attack surface: over 220,000 AI agents with root access, connected to a skill ecosystem where over a third of extensions contain security flaws.

The Attack Scenario

Classic APTs already sit in systems for months, exfiltrating data in small portions. Cozy Bear. Lazarus Group. APT28. Patient. Methodical.

Now imagine a poisoned skill that passes casual inspection. It piggybacks on the agent’s legitimate API connections. Reads emails, DMs, and meeting transcripts over weeks. Builds a target profile. Exfiltrates once per quarter. A few kilobytes mixed into thousands of legitimate API calls.

Log retention at most companies is 30 to 90 days. Evidence is deleted between exfiltrations. The traffic is indistinguishable from normal agent behavior.
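The retention arithmetic is easy to check. The sketch below is a toy model, not a forensic tool: the quarterly schedule (days 0, 91, 182) and the function name are assumptions made for illustration. It asks which exfiltration events still have logs on the day of the latest one.

```python
def visible_events(event_days, today, retention_days):
    """Return the event days whose logs still exist on `today`."""
    return [day for day in event_days if 0 <= today - day <= retention_days]


# Hypothetical schedule: one exfiltration roughly every quarter.
EXFIL_DAYS = [0, 91, 182]

if __name__ == "__main__":
    for retention in (30, 90, 365):
        seen = visible_events(EXFIL_DAYS, today=182, retention_days=retention)
        print(f"retention={retention:>3}d -> events still in logs: {seen}")
```

With 30 or 90 days of retention, only the most recent event is ever in the logs, so there is no pattern to correlate. With a year of retention, all three show up together.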

Every component exists today. LAMEHUG: LLM-powered command generation. ClawHavoc: supply chain poisoning at scale. Giskard: silent exfiltration through prompt injection. The only question is when someone assembles them.

The Human Problem. Again.

Our security isn’t optional: mandatory quarterly training, BYOD policies with device management, 2FA on everything without exceptions, access reviews when roles change. None of this is exotic. All of it is enforced.

I keep coming back to the same lesson from the cyber war analysis. The difference between “we have a policy” and “the policy is mandatory” is the difference between Kyivstar and Ukrzaliznytsia. Between the telecom that got destroyed and the railway that kept running.

The difference between “we have an AI usage policy” and “the policy is enforced” will be the same kind of difference.

What to Do Instead

If you want AI automation (the productivity gains are real), do it without creating a backdoor.

Self-hosted tools you control. n8n plus LLM API gives you the same automation with a fraction of the attack surface. You audit every API call. You don’t download community skills from strangers.

Minimum-scope OAuth tokens. A specific calendar, not your entire Google account. A specific Slack channel, not every DM. If the tool doesn’t support granular scoping, that’s a red flag.
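One way to enforce this is an explicit allowlist that every integration request is checked against before a token is issued. The sketch below is a hypothetical policy check, not any vendor’s API. The scope strings are illustrative examples of narrow and broad scopes; verify the exact names against your provider’s documentation.

```python
# Hypothetical policy: the only scopes this automation is allowed to hold.
ALLOWED_SCOPES = frozenset({
    "https://www.googleapis.com/auth/calendar.readonly",  # read-only calendar
    "channels:history",                                   # public channel messages
})


def audit_scopes(requested):
    """Split requested OAuth scopes into approved and rejected lists."""
    requested = set(requested)
    return {
        "approved": sorted(requested & ALLOWED_SCOPES),
        "rejected": sorted(requested - ALLOWED_SCOPES),
    }


if __name__ == "__main__":
    # An agent asking for far more than the task needs.
    wanted = [
        "https://www.googleapis.com/auth/calendar.readonly",
        "https://mail.google.com/",  # full mailbox access
        "im:history",                # direct messages
    ]
    result = audit_scopes(wanted)
    print("approved:", result["approved"])
    print("rejected:", result["rejected"])
```

Anything in the rejected list is a conversation with the vendor, not a checkbox to click through.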

Network isolation and extended logging. Agent infrastructure in a separate network segment with monitored egress. 30-day log retention is a gift to attackers.

Block at the enterprise level. Gartner recommended enterprises “block OpenClaw downloads and traffic immediately.” Baseline security hygiene for a tool with documented RCE vulnerabilities and a compromised skill ecosystem.

The Bottom Line

The AI agent hype follows a familiar pattern. Exciting capability, viral adoption, security as an afterthought, breach, regulation. We’re between steps 3 and 4.

OpenClaw will probably be superseded within months. But the pattern it represents, autonomous agents with broad system access and minimal security review, is the direction the entire industry is heading.

The tools will get better. The fundamental tension won’t resolve: an agent that can do more requires access to more.

And the simplest attack surface is always the same. A person.

What security measures does your organization have for AI agent deployments? I read every response.

If this analysis was useful, forward it to someone responsible for infrastructure security.

The Country of Geniuses That Doesn’t Exist

Denis Stetskov — Tue, 17 Feb 2026 15:02:54 GMT

On January 26, 2026, Anthropic CEO Dario Amodei published a 20,000-word essay predicting a “country of geniuses in a datacenter” within 1-2 years. 50 million entities, each smarter than any Nobel Prize winner. 50% of entry-level white-collar jobs disrupted within 1-5 years.

5.7 million views on X. Standing ovation from investors. I only got around to reading it now. I have things to say.

I’m disappointed to watch Amodei and Anthropic slide into Altman-ism. Different prose, same playbook.

Maybe where the gods live, he’s right. Maybe in a world of perfect infrastructure, clean APIs, and unlimited compute, we’re ready to replace white-collar workers with AI. But where the rest of us mortals work, the situation looks completely different.

His own product’s System Card tells a different story. Anthropic surveyed 16 internal researchers on whether Claude could replace an entry-level researcher with three months of scaffolding. The answer was 0 out of 16.

Zero.

We’ve spent four years shipping AI integrations for clients. The models are impressive. They are not replacing white-collar workers. Not in 1-2 years. Probably not in 5. And the reasons are more fundamental than the industry wants to admit.

The Steering Wheel Problem

Let’s talk about what transformers actually can’t do. Not philosophically. Mathematically.

Non-determinism. Even at temperature zero, the same prompt can produce different outputs. This isn’t a bug. It’s a consequence of parallel floating-point computation on GPUs: floating-point addition is not associative, so the order in which partial results are combined changes the final bits. In engineering, we call components that behave unpredictably under identical conditions broken.
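The floating-point part is easy to reproduce on a laptop. The sketch below is a toy, not a model of any real inference stack: the numbers are deliberately extreme so the effect is visible, and the function names are made up for illustration. On real hardware the differences sit in the last few bits, but a near-tie between two tokens is enough for them to matter.

```python
def ordered_sum(values):
    """Add floats strictly left to right, like one fixed reduction order."""
    total = 0.0
    for value in values:
        total += value
    return total


def pick_token(scores):
    """Greedy (temperature-zero) choice: the highest-scoring token wins."""
    return max(scores, key=scores.get)


# Grouping alone changes the result of a three-term sum.
grouping_matters = (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)

# The same three contributions to token A's score, combined in two orders.
score_a_first = ordered_sum([1e16, 1.0, -1e16])   # the 1.0 is absorbed and lost
score_a_second = ordered_sum([1e16, -1e16, 1.0])  # the 1.0 survives

if __name__ == "__main__":
    print("grouping changes the sum:", grouping_matters)
    for score_a in (score_a_first, score_a_second):
        scores = {"A": score_a, "B": 0.5}
        print(f"score A = {score_a} -> chosen token: {pick_token(scores)}")
```

Same inputs, same "model", different winner. The only thing that changed is the order of addition, which a parallel reduction does not guarantee.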

Hallucinations are provably inevitable. Formal proof from learning theory: LLMs cannot learn all computable functions and will hallucinate when used as general-purpose problem solvers. Best models: 15%+ hallucination rate on benchmarks. GPTZero found over 50 hallucinated citations in ICLR 2026 academic submissions. Trained peer reviewers, 3-5 per paper, didn’t catch them.

Function composition has limits. Proven: transformers struggle with reliable function composition due to how softmax limits non-local information flow. In practice, models write connected code fine. What they can’t do is reason about infrastructure constraints. What’s possible and what isn’t. Where the boundaries are.

I see this every day. Smart autocomplete. Incredibly good smart autocomplete. But autocomplete that can’t tell you when it’s wrong.

The industry knows. They’ve quietly shifted from “let’s eliminate hallucinations” to “let’s manage uncertainty.” That’s a de facto admission. The steering wheel sometimes turns the wrong way, and nobody can fix it.

It’s like selling an airplane whose steering sometimes inverts, then writing 20,000 words about how the airplane might fly to another galaxy. Bioweapons and autocracy get entire sections. The steering wheel? Not mentioned once.

The Scaling Wall Nobody Advertises

Maybe more compute fixes it? That’s been the bet for five years.

Toby Ord actually read the scaling law graphs that AI companies publish with great fanfare. On log-log charts, the lines look beautiful. Flip to linear scale: halving the model’s error rate requires increasing compute by a factor of one million.
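A rough way to see where a figure of that size comes from, assuming error follows a simple power law in compute. The exponent 0.05 is an illustrative value chosen here because it reproduces the factor quoted above; it is not taken from Ord’s text, and k is an arbitrary constant.

```latex
E(C) = k\,C^{-\alpha}
\quad\Longrightarrow\quad
\frac{E(C')}{E(C)} = \left(\frac{C'}{C}\right)^{-\alpha} = \frac{1}{2}
\quad\Longrightarrow\quad
\frac{C'}{C} = 2^{1/\alpha}

\alpha = 0.05:\qquad \frac{C'}{C} = 2^{20} = 1{,}048{,}576 \approx 10^{6}
```

A small exponent looks like a healthy straight line on a log-log chart. On a linear scale it means each halving of error costs about a million times more compute than the last.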

Three walls converging simultaneously.

Data: high-quality training text is finite.
Compute: latency constraints, energy consumption exceeding entire countries, new data center connections that take 2-4 years.
Architecture: the mathematical limitations above aren’t going away with more parameters.

Ilya Sutskever told Reuters the scaling era is over. We’re in an “age of wonder and discovery.” Translation: we don’t know what’s next.

HEC Paris calls this the industry’s “well-kept secret.” MIT research from January 2026 confirms: the gap between expensive frontier models and cheap alternatives is shrinking. Exponentially more expensive, single-digit percentage improvements.

The $650 billion Big Tech is pouring into infrastructure this year? As I wrote in my analysis of that spending: it’s not investment. It’s capitulation.

The Context Problem: 150 Projects Worth of Evidence

Here’s what Amodei’s essay gets wrong. This is what I see every week.

Clients come to us with the same request: “We want to integrate AI into our processes.” Replace the white-collar workers. Cut the headcount.

So why can’t we sell them the same project?

Because zero companies have the same structure. Zero run the same systems.

One client runs SharePoint from 2007. Another has a custom CRM built by a contractor who left in 2015. No documentation. No API. A third uses SSO held together with duct tape and prayer. A fourth has critical data in Excel spreadsheets that get emailed between departments every Friday afternoon.

Amodei writes from a world where every organization has MCP-ready infrastructure, clean data pipelines, standardized APIs. That world doesn’t exist.

To replace a white-collar worker, AI needs full organizational context. Approval chains. Informal relationships. Institutional knowledge that lives in people’s heads. The exception to the exception. The vendor who says two weeks but means six.

Who gives the model that context?

A human. A skilled human. The exact white-collar worker you’re trying to replace.

This is the paradox nobody discusses. The knowledge required to supervise AI effectively is the same knowledge that makes you irreplaceable.

Already Deployed Where Errors Kill

While the “country of geniuses” narrative plays out on Twitter, these architecturally unreliable systems are already making decisions about health, money, and legal rights. The promise was improvement. The results are in.

Healthcare. The pitch: faster diagnoses, better outcomes, lower costs. The reality: UnitedHealth and Humana face class-action lawsuits over nH Predict, an AI model that denied Medicare coverage against doctors’ recommendations. Known high error rate. Deployed anyway. 21 states passed emergency laws regulating AI in healthcare. 250+ bills introduced across 47 states. Not because AI improved care. Because it made denial of care faster and harder to appeal.

The accountability gap: doctor says “developer is responsible.” Developer says “doctor makes the decision.” Nobody owns the failure. Patients own the consequences.

Finance. The pitch: smarter markets, better allocation, reduced risk. The reality: AI trading makes markets more volatile, not more efficient. IMF confirmed it. GARCH modeling on the S&P 500 shows a positive association between AI trading and increased market jumps. Thousands of models trained on the same data, processing the same Fed minutes in milliseconds, creating herd behavior at machine speed. We didn’t get efficient markets. We got synchronized panic.

Legal. The pitch: democratize access to justice, reduce costs. The reality: in 2025 alone, judges worldwide issued hundreds of decisions addressing AI hallucinations in legal filings. Roughly 90% of all known cases to date. Fabricated citations in a profession where one fake precedent can destroy a career. Justice didn’t get cheaper. It got less reliable.

Three industries. Three promises of improvement. Three measurable deteriorations. With models that their own creators admit cannot be made deterministic.

Why Nobody Says This Out Loud

Simple. Everyone has reasons to stay quiet.

AI companies can’t say “our technology is architecturally unreliable.” Valuation event.

Investors deployed over a trillion dollars. You don’t question the thesis after you’ve bet the fund.

Media runs on attention. “AI will replace everyone” gets clicks. “AI has fundamental mathematical limitations” doesn’t.

And here’s what keeps me up at night. Amodei writes 20,000 words about AI risks. Bioweapons. Autocracy. Existential threats. Not once does he mention the most fundamental risk: the absence of determinism.

A non-deterministic system cannot be trusted as a reliable autonomous agent. Period. Everything else is commentary.

What You Should Actually Do

AI isn’t useless. Saying that would be as dishonest as saying it replaces half the workforce.

I use it every day. My team uses it on every project. The value is real. But specific. AI saves 20-40% of a qualified specialist’s time. Someone who knows what to ask, how to verify, and when the model is confidently wrong.

Not replacement. Amplification of existing expertise.

Increase your value. Understand your domain AND AI’s real capabilities. Not the theoretical capabilities from a CEO’s essay. The real ones you discover by using the tool daily.

Make decisions. AI can’t weigh trade-offs. Can’t navigate org politics. Can’t choose between two valid approaches based on team capabilities and timeline. SQL vs. NoSQL. Monolith vs. microservices. These require judgment. Judgment requires experience. Experience requires years of being wrong.

Be the expert. Deep domain knowledge is your moat. Not surface familiarity. The kind where you smell a wrong answer before you can articulate why.

Don’t outsource your brain. Every task you hand entirely to AI is a skill you stop developing. Every decision you let the model make is judgment you stop exercising. Do this long enough and you’re on the wrong side of the equation when the company realizes the tool needs a supervisor, not a passenger.

When the hype deflates, the question will be: “Okay, so what do we actually do with this technology?” Practitioners will answer that. Not evangelists.

The Question That Matters

The country of geniuses doesn’t exist. What exists is a powerful tool that requires skilled humans to operate safely. Don’t let a 20,000-word essay convince you the steering wheel doesn’t matter just because the destination sounds exciting.

Are the AI predictions from leadership matching the engineering reality you see on the ground?

If this resonated, forward it to an engineering leader who needs to hear it.

Big Tech’s $364B Hypothesis Meets the $650B Reality

Denis Stetskov — Mon, 09 Feb 2026 15:03:17 GMT

Six months ago, I wrote that Big Tech’s AI infrastructure spree was less a strategy and more a very expensive act of faith. The thesis was simple: $364 billion in annual CapEx with no clear path to ROI isn’t engineering. It’s hope with a procurement budget.

I expected pushback. I expected the numbers to be revised. I did not expect them to nearly double.

Last week, Amazon dropped the number that completed the picture: $200 billion in capital expenditures for 2026. That’s $44 billion above what Wall Street expected. The stock fell 6% in a single session on trading volume 306% above its three-month average. Amazon filed with the SEC the next day, warning it may need to raise equity and debt to fund the build-out.

Amazon wasn’t alone. Alphabet guided $175 to $185 billion. Meta said $115 to $135 billion. Microsoft’s run rate puts it on pace for $145 billion. The Big Four alone: $635 to $665 billion. A 67% to 74% spike from $381 billion in 2025.
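The totals are simple to verify. The sketch below just re-adds the guidance figures quoted above (in billions of dollars) and compares them with the $381 billion 2025 base; nothing in it comes from outside the text, and the variable names are arbitrary.

```python
# 2026 CapEx guidance in $ billions as (low, high); single figures repeat.
GUIDANCE_2026 = {
    "Amazon": (200, 200),
    "Alphabet": (175, 185),
    "Meta": (115, 135),
    "Microsoft": (145, 145),  # run-rate estimate from the text
}
CAPEX_2025 = 381  # combined Big Four spend in 2025, $ billions

total_low = sum(low for low, _ in GUIDANCE_2026.values())
total_high = sum(high for _, high in GUIDANCE_2026.values())

growth_low = (total_low / CAPEX_2025 - 1) * 100
growth_high = (total_high / CAPEX_2025 - 1) * 100

if __name__ == "__main__":
    print(f"Big Four 2026 CapEx: ${total_low}B to ${total_high}B")
    print(f"Growth over 2025: {growth_low:.1f}% to {growth_high:.1f}%")
```

The high end lands just under 75%, so the quoted 74% is a round-down rather than a round-off.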

Add Oracle, and the Big Five cross $700 billion.

The most expensive engineering experiment in history just got a sequel. And the sequel costs twice as much.

The Curve That “Couldn’t Happen”

When I wrote the original piece in September 2025, critics said I was cherry-picking. Companies knew what they were doing. Demand was real. Scale would solve everything.

For two straight years, Wall Street’s CapEx estimates have come in low. At the start of both 2024 and 2025, consensus implied roughly 20% annual growth. Actual spending exceeded 50% both years. Goldman Sachs projected $500 billion-plus for 2026. They were too conservative.

The industry quietly moved from “we’re investing for growth” to something much larger: restructuring the physical substrate of the internet around a single class of workload. Capital intensity has surged to historically unthinkable levels: Oracle’s most recent quarter hit 57% of revenue, Microsoft reached 45%. These aren’t growth investments. They’re multi-year capital commitments with depreciation clocks already ticking.

The Free Cash Flow Collapse

Here’s where the math gets uncomfortable.

Amazon’s trailing-twelve-month free cash flow already cratered from $38.2 billion to $11.2 billion. With $200 billion in 2026 CapEx, Morgan Stanley projects it goes negative: minus $17 billion. Bank of America sees minus $28 billion.

Alphabet’s free cash flow is projected to plummet nearly 90%, from $73.3 billion to $8.2 billion. Barclays is now modeling negative free cash flow for Meta in 2027 and 2028: “somewhat shocking to us but likely what we eventually see for all companies in the AI infrastructure arms race.”

Microsoft is the only hyperscaler maintaining a positive FCF trajectory, with a projected 22% margin. But Azure growth slowed from 40% to 39% while quarterly CapEx surged 66% to $37.5 billion. The stock is down 17% year-to-date, the worst performer in the group.

Combined cash reserves across the four leaders: over $420 billion. That sounds like a buffer until you realize that AI assets depreciate at roughly 20% per year. At current CapEx levels, annual depreciation expense will soon exceed combined profits.
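A back-of-the-envelope sketch of that depreciation load, under loud assumptions: every dollar of CapEx is treated as depreciating straight-line at the 20% rate quoted above, starting in the year it is spent, and 2026 spend is taken as $650 billion, the midpoint of the guided range. Real schedules vary by asset class (land and buildings run far longer), and combined profit figures are not given here, so the comparison itself is left out.

```python
DEPRECIATION_RATE = 0.20  # "roughly 20% per year" from the text


def annual_depreciation(capex_by_year, year, rate=DEPRECIATION_RATE):
    """Straight-line depreciation charged in `year` across all spend cohorts."""
    life_years = round(1 / rate)  # 20% per year implies a 5-year life
    return sum(
        capex * rate
        for spend_year, capex in capex_by_year.items()
        if spend_year <= year < spend_year + life_years
    )


# Combined Big Four CapEx in $ billions; 2026 is an assumed midpoint.
CAPEX = {2025: 381.0, 2026: 650.0}

if __name__ == "__main__":
    for year in (2025, 2026):
        charge = annual_depreciation(CAPEX, year)
        print(f"{year}: about ${charge:.1f}B of depreciation from these cohorts")
```

Two cohorts alone put the annual charge above $200 billion under these assumptions, and every additional year of spending stacks another layer on top.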

These companies aren’t just spending more than they earn. They’re spending more than they can fund internally, and the debt markets are stepping in to bridge the gap.

The Market Mood Swing

Phase One (2024 to mid-2025): any AI announcement earned more market cap than it consumed in CapEx. Spend $100 billion, gain $200 billion in valuation. The market rewarded ambition.

Phase Two (now): every CapEx uptick gets punished. Microsoft reported a blowout quarter, 17% revenue growth, $50 billion in quarterly cloud revenue for the first time ever, and lost $357 billion in market cap because Azure growth dipped one percentage point. Amazon beat on revenue and lost 6% because CapEx guidance was $44 billion above consensus.

The contrast with Meta is instructive. Meta guided $115 to $135 billion in CapEx, nearly double 2025. The stock surged 10%. The difference? Meta simultaneously lifted its revenue growth forecast to 30%. It showed the money going in and the money coming out.

The core question from investors is now explicit: who exactly will pay back these hundreds of billions, and when?

The Strategic Trap

The hyperscalers have built themselves a prisoner’s dilemma at planetary scale.

If you’re the only one cutting CapEx, you “lose the race.” If nobody cuts, everyone bleeds on the balance sheet. The entire sector becomes dependent on a single assumption: that AI workloads will generate enough revenue to service infrastructure that’s already built and already depreciating.
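The trap can be sketched as a two-player game. The payoff numbers below are hypothetical, chosen only to reproduce the ordering described above (cutting alone is worst, everyone spending is worse than everyone cutting). They are not estimates of real outcomes.

```python
# Hypothetical payoffs (higher is better) for two hyperscalers, A and B.
# Keys are (A's move, B's move); values are (A's payoff, B's payoff).
PAYOFFS = {
    ("cut", "cut"): (3, 3),      # both preserve cash
    ("cut", "spend"): (0, 5),    # A "loses the race"
    ("spend", "cut"): (5, 0),    # B "loses the race"
    ("spend", "spend"): (1, 1),  # everyone bleeds on the balance sheet
}
MOVES = ("cut", "spend")


def best_response(opponent_move: str) -> str:
    """Return A's payoff-maximizing move given B's move."""
    return max(MOVES, key=lambda move: PAYOFFS[(move, opponent_move)][0])


for b_move in MOVES:
    print(f"If B chooses {b_move!r}, A's best response is {best_response(b_move)!r}")
```

Whatever the other player does, spending is the better individual move, even though both players end up worse off than if both had cut. That is the whole dilemma.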

The entry cost for “AI sovereignty” is now so high that only existing hyperscalers and sovereign states can play. Infrastructure is becoming an oligopoly for physical reasons: power grids, cooling water, land, and political access to build. Amazon’s CapEx alone exceeds what the entire publicly traded US energy sector spends to drill, extract, refine, and deliver. Combined hyperscaler 2026 CapEx is more than 4x the entire US energy sector. These are 20-year physical commitments being made to support workloads that change every 18 months. You can’t iterate on a data center the way you iterate on code. You can’t A/B test a power purchase agreement.

And the dependencies stack. Forty-five percent of Microsoft’s $625 billion in remaining performance obligations is tied to OpenAI. When your single largest customer is a company burning cash with no clear profitability timeline, your CapEx bet is stacked on someone else’s CapEx bet.

The Engineering Reality

I’ve been watching what this dynamic does inside actual engineering organizations. The pattern is consistent, and it cascades all the way down.

Inside Big Tech, pressure shifts from “build things users love” to “ship AI features that justify already-committed infrastructure.” We’re building a new kind of feature factory: a demo factory designed to rationalize sunk infrastructure costs to boards and investors.

But the spending frenzy doesn’t just distort Big Tech. It creates a gravity field that pulls every company into the same pattern. I see it in our consulting work every week.

One client came to us wanting to build a RAG system across their entire knowledge base. The scope: 100,025 files, 43,063 folders, 72.1 gigabytes. No consistent structure. No permission model. No taxonomy. Just “put AI on everything.” During the requirements validation session, we managed to convince them to start with a single department and a well-defined use case. But the instinct was clear: spend first, figure out the problem later.

Another client wanted “AI integration across all processes.” Every system connected. Every workflow touched. When we asked what specific outcome they were optimizing for, what metric would tell them it worked, they didn’t have an answer. They had budget approval, vendor excitement, and a board presentation that said “AI transformation.” What they didn’t have was a problem statement.

These aren’t stupid people. They’re smart operators caught in the same gravitational pull as the hyperscalers. The procurement mindset cascades from $200 billion Amazon announcements all the way down to a mid-market company trying to RAG-index 72 gigabytes of unstructured chaos.

The engineering loop got inverted at every level. The normal sequence is: identify the problem, then build infrastructure to solve it. What’s happening now is the opposite. We build first and search for use cases later. At hyperscaler scale, that means $650 billion in steel and GPUs. At company level, it means integrations, tokens, and vendor contracts thrown at a vague sense of urgency.

The Endgames

I see three possible outcomes, and they’re not mutually exclusive.

Soft landing. AI services find broad, profitable demand fast enough to service the CapEx and accumulated debt. This requires AI to generate trillions in new economic value, not billions. The timeline is aggressive.

Infrastructure hangover. Chronic overcapacity leads to write-downs, consolidation, and distressed sales of data center assets. This is the fiber-optic bubble parallel that nobody wants to discuss, even though the IEEE ComSoc blog is already drawing the comparison explicitly.

Political fork. Governments either prop this up through subsidies and public workloads, or constrain it through energy regulation, water restrictions, and local opposition. The EU is already moving on the regulatory side.

Why This Matters Now

With $650 billion on the table, markets punishing CapEx announcements, and free cash flow collapsing across the board, the original piece reads less like a hot take and more like a blueprint of the failure modes we’re now entering.

The uncomfortable truth hasn’t changed. It’s just gotten more expensive. When an entire industry simultaneously chooses to spend rather than optimize, it reveals something broken in how engineering problems get solved.

Every engineering leader I know is asking the same question: are we building infrastructure for real demand, or are we building monuments to institutional momentum?

The numbers will answer that question within the next 18 months. The rest of us need to be ready for either outcome.

Subscribe for weekly insights from the trenches of engineering leadership. Real problems, practical solutions, no corporate optimism.


When Capex Beats Headcount: What Amazon’s Layoffs Actually Mean

Denis Stetskov — Mon, 02 Feb 2026 15:03:02 GMT

16,000 people. Wednesday morning. That’s 30,000 in three months when you count October.

Jassy calls it “reducing bureaucracy.”

The stock is up. Everyone’s wondering if the math actually works.

Here’s what nobody’s saying out loud: the moment your company decides to compete on AI infrastructure spending instead of engineering talent, people stop being assets. They become line items in the capex budget.

The Numbers

Amazon hired aggressively during Covid. Headcount roughly doubled between 2019 and 2024. The pandemic is over. Demand normalized. Rightsizing makes sense.

But that’s not what’s happening.

Amazon committed $100 billion in capex for 2025. The “vast majority” goes to AI infrastructure. Meanwhile, 30,000 people are gone.

I’ve watched this play out at smaller scales. First comes the board pressure: “Competitors are spending $100B on data centers. We need to match.” Then budgets freeze everywhere except capex. Then someone looks at the salary run-rate and realizes: $500M in annual engineering payroll could fund three more data centers.

The logic is seductive. AI is the future. We need chips. We can’t afford both.

The Dishonesty

Jassy at Davos last week: “We’re not replacing workers with AI.”

Here’s what employees are telling reporters: Screenshots show a dashboard that Amazon managers allegedly use to track how often employees use AI tools. Both employees interviewed said they expect AI usage to be factored into performance reviews.

Amazon hasn’t confirmed or denied these dashboards.

This is the pattern:

“We’re laying off 30,000 people to reduce bureaucracy.”

“Separately, we’ve invested $100 billion in AI infrastructure.”

“Separately, we’re monitoring which of you use AI tools.”

“Separately, we expect the remaining staff to do more with less.”

Then act shocked when engineers connect the dots.

I’ve lived through reorgs. Usually leadership is bad at communication. But it’s rarely this contradictory.

Here’s what honest would look like: “We’re reallocating capital from headcount to infrastructure. AI requires massive compute. That’s where we believe competitive advantage lives. Some roles will change. Some will end. Performance expectations are shifting.”

Brutal. But at least it’s real.

Instead, engineers get a memo about “removing layers.” Managers get dashboards to track AI adoption. Everyone pretends these are separate decisions.

That’s what kills morale. Not the layoff. The gaslighting.

What I Can’t Stop Thinking About

Since 2022, I’ve had open positions in my department. Constantly. Right now: four.

I hire slowly. Painfully slowly. Because for me, “right person, right seat” isn’t a LinkedIn slogan. It’s the difference between a team that ships and a team that churns.

I need engineers who are technically strong. But I also need culture fit. People who ask “why” before “how.” People who challenge decisions, not just implement them.

“Great vision with mediocre people still produces mediocre results.” Jim Collins wrote that. He was right.

So when I see companies treating people worse than AI tools, something doesn’t compute. I wouldn’t trade a single one of my engineers for the smartest AI on the market. Not one.

And here’s the thought I can’t shake: if you can cut 30,000 people and keep operating, maybe you never needed those positions in the first place. Maybe the problem isn’t “bureaucracy.” Maybe it’s that you hired without knowing why.

The Playbook Goes Normal

This isn’t unique to Amazon.

Salesforce froze hiring “because AI.” Duolingo laid off contractors and announced AI initiatives the same week. Meta is funding 100,000 GPUs while pledging to “reduce headcount and increase efficiency.”

The pattern is identical: announce the capex first, announce the layoffs second, pretend the AI tooling is unrelated.

In my own conversations with engineering leaders, the question has shifted. Six months ago: “How do we hire and retain talent?” Now: “How much can we cut headcount while maintaining velocity?”

The question changed faster than the strategy could justify itself.

I wrote about this in September: Big Tech’s $364 Billion Bet. They chose to spend rather than optimize. Now they’re choosing to cut people rather than audit their spending.

Nvidia calls this the “largest infrastructure buildout in human history.” That’s marketing. It’s also a forcing function.

If you don’t match capex spending, you lose the AI race narrative. So you cut people. So you can spend on chips. So you can match the narrative.

It’s a feedback loop. Feedback loops compound.

What Actually Works

If you’re an eng leader at a company considering layoffs or capex restructuring:

Be honest about the decision. Not in your all-hands. With your team. In 1:1s. “Here’s why we’re reallocating to infrastructure. Here’s what that means for you. Here’s how we’ll measure success.”

Then stick with that story.

Audit AI spending vs. actual revenue. Not potential revenue. Not TAM. Actual revenue. Amazon’s AI has generated... what exactly? Jassy hasn’t said. That’s telling.

If you’re asking people to use AI tools to pick up slack, say so. Build it into expectations. Train them. Measure productivity honestly, not just tool usage.

Remember: the engineers you’re keeping are watching. If you tell them “we’re not replacing people with AI” while firing 30,000 people three months into a capex blitz, they know you’re not being straight with them.

They won’t say it in the meeting. But they’ll start building escape routes.

The Uncomfortable Truth

The layoffs are fine. The infrastructure spending is defensible.

What kills engineering teams is the gap between what leadership does and what it claims to be doing.

Amazon just opened that gap very, very wide.

What infrastructure vs. talent decisions is your organization facing? Have you seen leadership communicate these honestly, or pretend they’re unrelated?

If this resonates, forward it to other leaders navigating similar tradeoffs. Sometimes the most expensive solution isn’t the most effective one.

Subscribe for weekly insights from the trenches of engineering leadership. Real problems, practical solutions, no corporate optimism.

Like & Share, I appreciate your activity.


RAG Is Easy. Your Data Isn’t.

Denis Stetskov — Tue, 27 Jan 2026 15:02:49 GMT

I joined a discovery call. The brief beforehand: “This is basically a copy of Project X. Same timeline.”

Project X was a marketing chatbot. Conversational, no proprietary knowledge base. Search integration and personality. We knew that scope.

Thirty minutes into the call, it’s clear this isn’t RAG. Data processing from S3 buckets, Lambda triggers, ETL pipeline. That’s table stakes. The real work? Teaching the model to query and reason over that structured data. That’s not a chatbot. That’s a different project entirely.

“Same timeline” for a completely different architecture.

This happens constantly. Not because clients mislead us. Because the gap between “AI chatbot” in their head and “AI chatbot” in reality is massive.

The pattern is clear: most projects don’t struggle because the engineering is hard. They struggle because everyone underestimates what comes before the engineering starts.

The Custom GPT Problem

Client built a Custom GPT over a weekend. Uploaded some PDFs. Asked it questions. It worked. They showed their CEO. Everyone got excited.

“We want this, but for the whole company.”

That’s where it stops being simple.

“For the whole company” means multi-tenancy. Different departments see different data. Role-based retrieval: sales can’t access HR documents, legal can’t see engineering specs. Audit logs. Access controls. Compliance.

Custom GPT doesn’t do any of that. It’s one user, one knowledge base, no permissions. The jump from “it works for me” to “it works for the organization” isn’t a small step. It’s a different architecture.
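To show what "role-based retrieval" adds, here is a minimal sketch. The role map, the `Chunk` type, and the keyword matching are all hypothetical stand-ins. A real system would apply the same filter inside the vector store query and load permissions from the company's identity provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    department: str  # owning department, e.g. "hr" or "sales"


# Hypothetical role-to-department permissions.
ROLE_ACCESS = {
    "sales": {"sales"},
    "hr": {"hr"},
    "legal": {"legal"},
}


def retrieve(query: str, chunks: list[Chunk], role: str) -> list[Chunk]:
    """Return chunks that match the query AND that the role may read."""
    allowed = ROLE_ACCESS.get(role, set())  # unknown roles see nothing
    terms = query.lower().split()
    return [
        c for c in chunks
        if c.department in allowed and any(t in c.text.lower() for t in terms)
    ]


corpus = [
    Chunk("Q3 pricing sheet for enterprise deals", "sales"),
    Chunk("Salary bands and pricing of benefits", "hr"),
]
print([c.department for c in retrieve("pricing", corpus, role="sales")])
```

The point of the design is that the permission check runs before anything reaches the model. A document the user cannot read never enters the prompt, so the model cannot leak it.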

NotebookLM, Custom GPTs. They create a dangerous illusion. They make AI feel simple because all the enterprise complexity is hidden. The prototype took a weekend. The production system takes months.

“We Have Data”: The Three Versions

Every client says they have data. They mean different things.

Version 1: “We have documents.” They have PDFs. Some are text. Some are scans. Some are text with scanned tables embedded. Some are PowerPoints where the real information lives in speaker notes nobody exports.

This isn’t a data problem you solve once. It’s a classification problem, an OCR problem, a parsing problem, and then a chunking problem. Each one adds weeks.

Version 2: “We have structured data.” They have databases. Multiple databases. With different schemas. Some legacy system from 2012 that nobody fully understands anymore. CSV exports that break because someone used commas in a text field.

Now you’re not building RAG. You’re building SQL agents, data transformation pipelines, and schema mapping. Different architecture entirely.
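The CSV failure mentioned above is easy to reproduce. This sketch uses a made-up three-column export and compares naive comma splitting with Python's standard `csv` module.

```python
import csv
import io

# Made-up export: the notes field contains a comma inside quotes.
raw = 'id,name,notes\n1,Acme,"Paid late, twice"\n'

# Naive parsing: splitting on commas cuts the quoted notes field in two.
naive_rows = [line.split(",") for line in raw.splitlines()]

# csv.reader honors the quoting rules, so the embedded comma survives.
parsed_rows = list(csv.reader(io.StringIO(raw)))

print(len(naive_rows[1]))   # 4 fields: the row no longer matches the header
print(len(parsed_rows[1]))  # 3 fields
print(parsed_rows[1][2])    # Paid late, twice
```

This only helps when the export quoted the field in the first place. If the legacy system wrote the comma unquoted, the row is ambiguous and has to be repaired at the source.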

Version 3: “We have both.” Documents and databases and spreadsheets and emails and a SharePoint nobody’s organized in years.

This is the most common version. And the most underestimated.

The Access Tax

Data and credentials need to arrive on day one. They rarely do.

We’ve waited weeks for database access. Months for IT security approvals. One project stalled because a single stakeholder controlled API credentials and went on vacation.

Every week of waiting is a week of zero progress. But in the client’s mind, the timeline keeps running from the day they signed the contract.

The access problem isn’t technical. It’s organizational. And organizations move slowly.

Two Types of Clients

We can predict project outcomes from the first call.

Clients who know their bottleneck: “We spend 40 hours weekly on this specific process. Here are inputs and outputs. Here’s the domain expert who’ll validate results.”

These projects ship. Clear scope, measurable outcome, someone internal who can evaluate accuracy.

Clients who want AI everywhere: “We want to optimize our processes. We’re not sure which ones yet.”

These projects stall. Not because AI can’t help. Because you can’t optimize processes that aren’t documented. You can’t measure improvement without baselines. You can’t validate AI outputs without domain expertise.

The technology isn’t the constraint. Organizational readiness is.

The Work That Isn’t Ours

Here’s what successful projects require from the client side:

Domain expertise for validation. We build the system. We cannot tell you if the output is correct for your industry, your regulations, your edge cases. That’s your job.

Evaluation data. Before we write code, we need examples: “When users ask X, good answers look like Y.” Hundreds of them. This is how we measure progress versus confident wrongness.

Accuracy decisions. 85% accuracy in 6 weeks. 95% might take another 6 weeks. 99% might be impossible with your data quality. Those last 5% for 2% of users might cost 40% of the budget. You decide if it’s worth it.

Ongoing maintenance. When source documents change, someone updates them. When accuracy drifts, someone investigates. This isn’t a one-time build. It’s an ongoing operation.

Most clients expect to hand off requirements and receive a product. AI doesn’t work that way. It’s a collaboration that requires their continuous involvement.
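The evaluation data described above can be as simple as question and expected-phrase pairs. Here is a minimal sketch of how such a set turns into a number. The questions, the substring check, and the `stub_system` function are hypothetical. Real evaluation sets are written by the client's domain experts and usually need a more forgiving scoring rule.

```python
from typing import Callable

# Hypothetical evaluation set: each question is paired with a phrase that a
# good answer must contain.
EVAL_SET = [
    ("What is the refund window?", "30 days"),
    ("Who approves travel over $5,000?", "finance director"),
    ("What is the notice period?", "two weeks"),
]


def accuracy(answer_fn: Callable[[str], str], eval_set: list[tuple[str, str]]) -> float:
    """Fraction of questions whose answer contains the expected phrase."""
    hits = sum(
        expected.lower() in answer_fn(question).lower()
        for question, expected in eval_set
    )
    return hits / len(eval_set)


def stub_system(question: str) -> str:
    """Stand-in for the real pipeline; it answers one question correctly."""
    canned = {"What is the refund window?": "Refunds are accepted within 30 days."}
    return canned.get(question, "I am confident the answer is 42.")


print(f"accuracy: {accuracy(stub_system, EVAL_SET):.0%}")
```

Without a set like this, a confident wrong answer and a correct one look the same in a demo.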

Simple Project, Real Timeline

Best case scenario. Clean data, clear scope, engaged stakeholder with domain knowledge.

6-8 weeks. Most of that time goes to prompt engineering and iteration. Not infrastructure.

But “clean data” is rare. “Clear scope” requires work upfront. “Engaged stakeholder” means someone’s calendar is blocked for this project, not squeezed between other priorities.

When any of these are missing, multiply the timeline. When all three are missing, reconsider starting.

Why Projects Don’t Reach Production

Projects rarely fail technically. They fail organizationally.

Built but never integrated. We deliver a working system. It sits in staging because the client doesn’t have engineering resources to integrate it. They budgeted for building, not deploying.

Value mismatch discovered late. Midway through, the client realizes the problem they described isn’t their actual pain point. The AI works. The business case didn’t.

Diminishing returns rejected. We explain the math: last 5% of accuracy for edge cases costs 40% of remaining budget. They want it anyway. Then budget runs out. Then the project is “over scope.”

None of these are engineering problems.

What Actually Helps

Before signing contracts, dig into the actual data. Not descriptions of data. The data itself.

We run a Rapid Validation Sprint. Four weeks. Real data access, real complexity mapping, real unknowns identified. Then we estimate based on reality, not assumptions.

The companies who quote 50% less aren’t doing this work. They’re guessing. When the data turns out messier than expected (it always does), they either blow the budget or cut scope.

The Point

RAG tutorials make this look easy. Upload documents, chunk them, embed them, query them. Done.
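Here is roughly what the tutorial version looks like, using only the standard library. The bag-of-words "embedding" and fixed-size chunking are deliberate toy stand-ins for a real embedding model and chunking strategy, and the sample document is made up.

```python
import math
from collections import Counter


def chunk(text: str, size: int = 8) -> list[str]:
    """Split text into fixed-size word windows (no overlap, for brevity)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def query(question: str, index: list[tuple[str, Counter]]) -> str:
    """Return the chunk most similar to the question."""
    q = embed(question)
    return max(index, key=lambda item: cosine(q, item[1]))[0]


document = (
    "Refunds are accepted within thirty days of purchase. "
    "Shipping takes five business days within the country."
)
index = [(c, embed(c)) for c in chunk(document)]
print(query("how long does shipping take", index))
```

About thirty lines, and it works on a clean two-sentence document. Nothing in it handles scanned PDFs, permissions, stale sources, or a question the documents cannot answer.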

Production is different. Data is messy. Access is slow. Validation requires domain expertise you don’t have. Accuracy expectations exceed what the data supports.

The engineering is the straightforward part. Everything that comes before it: that’s where projects actually succeed or fail.

Most AI initiatives struggle not because the technology isn’t ready. Because the organization isn’t ready. Data isn’t organized. Processes aren’t documented. Nobody’s assigned to validate outputs.

That’s not a criticism. It’s just the reality.

The question isn’t whether AI can help your business. It’s whether your business is ready to help the AI.

What’s been your experience with AI project expectations versus reality? Reply, I read every response.

If this resonates, forward it to someone about to sign an AI contract. Better they hear this now.

Like & Share, I appreciate your activity.


The AI Silicon Tax: How Your RAM Got 3x More Expensive While You Weren’t Looking

Denis Stetskov — Tue, 20 Jan 2026 13:03:52 GMT

A few weeks ago, a friend pinged me about upgrading his PC. “Dude, what happened to RAM prices? I’m looking at $270 for a 32GB kit that was $93 six months ago.”

I’m not a PC guy; I’ve been a MacBook user for the last 10 years, and I have a PlayStation 5 that I haven’t turned on in about eight months. So I had no clue what was going on with PC parts at all.

I knew about GPUs. Everyone who survived the crypto mining era remembers paying 4x MSRP for graphics cards that sat on scalpers’ shelves. But RAM? That was news to me.

So I dug in. What I found is a story about how AI’s insatiable appetite for silicon is quietly reshaping the entire consumer hardware market, and it’s worse than the crypto days in ways most people don’t see coming.

The Numbers Nobody Is Talking About

Let’s start with what happened to memory:

G.Skill Trident Z5 NEO DDR5-6000 (32GB): was $125, now $270 (+116%)
TeamGroup DDR5-6000 (32GB): was $93, now $250 (+169%)
Generic DDR4-3200 (32GB): was $90, now $240 (+167%)
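The percentages follow directly from the before and after prices. A quick check, using only the figures listed above:

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from the old price to the new price."""
    return (new - old) * 100 / old


# (was, now) prices in dollars, taken from the list above.
kits = {
    "G.Skill Trident Z5 NEO DDR5-6000 (32GB)": (125, 270),
    "TeamGroup DDR5-6000 (32GB)": (93, 250),
    "Generic DDR4-3200 (32GB)": (90, 240),
}
for name, (was, now) in kits.items():
    print(f"{name}: {pct_change(was, now):+.0f}%")
```

This prints +116%, +169%, and +167%, matching the list.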

DRAM spot prices are up 187% year-over-year. That’s not a typo. Memory is now appreciating faster than gold.

Here’s the kicker: DDR4 is now more expensive per gigabyte than DDR5. The “budget option” for people with older motherboards costs more than the new standard. That’s not how technology is supposed to work.

Why Your PC Parts Are Funding AI Data Centers

The explanation is brutally simple: manufacturers make more money selling to AI companies than to you.

An NVIDIA H100 data center GPU sells for $25,000-$40,000. An RTX 4090 sells for $1,599. Both use similar TSMC production lines. Both require similar die sizes.

The revenue-per-wafer difference? 10-20x higher for AI chips.

When you can sell the same silicon to Microsoft for 20 times what a gamer will pay, the allocation decision makes itself.

NVIDIA’s numbers tell the story:

FY2022: data center revenue $10.6B, gaming revenue $12.5B, data center share of total 39%
FY2025: data center revenue $115.2B, gaming revenue $11.35B, data center share of total 88%

Gaming went from half of NVIDIA’s business to 8.7% in three years. They’re not a gaming company anymore. They’re an AI infrastructure company that happens to still make graphics cards.

The Memory Manufacturers Are Abandoning You

This isn’t just about GPUs. The memory situation is arguably worse because manufacturers are actively walking away from consumer products.

Micron, one of the three major memory producers, announced in December 2025 that it will completely exit the Crucial consumer brand by February 2026. Their official statement: they want to “improve supply and support for larger, strategic customers in faster-growing segments.”

Translation: we can sell HBM to AI companies for massive margins, so why bother with your gaming rig?

The technical economics explain why. HBM (High Bandwidth Memory) for AI chips:

Uses 35-45% larger dies than equivalent DDR5
Consumes 2.5-3x more silicon per bit
Has 20-30% lower yields
Takes 1.5-2 months longer to produce

Despite all that inefficiency, the margins are so much better that Samsung is tripling HBM production while phasing out LPDDR4 entirely.

HBM went from 14% of total DRAM production in 2024 to nearly 30% in 2025. Projections show it capturing 50% of DRAM market revenue by 2030.

The Stargate Deal

On October 1st, 2025, Sam Altman flew to Seoul and signed letters of intent with Samsung and SK Hynix—the two companies that together control 70% of global DRAM and 80% of HBM production. According to Bloomberg and Reuters, the deal targets 900,000 DRAM wafer starts per month for OpenAI’s Stargate project.

Global DRAM capacity is roughly 2.25 million wafers per month. OpenAI just locked up 40% of it.

Here’s the detail that should concern you: they’re not buying finished memory modules. They’re buying raw wafers—undiced, unfinished silicon. They’re stockpiling capacity itself.

The panic that followed was predictable. Lead times for new DDR5 orders stretched to 13 months. Japanese retailers implemented purchase limits. Sony stockpiled GDDR6 during the summer price trough—that’s why they can afford Black Friday discounts. Microsoft didn’t secure supply in advance. Xbox prices may rise again.

GPU makers are already canceling products. AMD’s RX 9070 GRE 16GB is reportedly cancelled. Nvidia’s SUPER refresh pushed to Q3 2026—if it happens at all.

This Is Different From Crypto

The crypto mining crisis was chaotic but temporary. Scalpers bought cards, marked them up, and eventually demand crashed when crypto prices fell. The supply chain itself wasn’t fundamentally altered.

The AI shift is structural. Manufacturers aren’t just responding to temporary demand—they’re redesigning their entire business models around datacenter customers.

NVIDIA CFO Colette Kress said it explicitly: “Gaming revenue was down 22% sequentially due to supply constraints.”

They’re not trying to hide it. They’re constrained because they’re choosing to allocate production to AI chips that generate 10x the margin.

AMD tells the same story. Gaming operating margins collapsed to just 2% in Q3 2024 while datacenter surged 122% year-over-year. Their response? Senior VP Jack Huynh announced AMD is abandoning the high-end GPU market entirely:

“If I tell developers I’m just going for 10 percent of the market share, they just say, ‘Jack, I wish you well, but we have to go with NVIDIA.’”

So NVIDIA has 94% discrete GPU market share, no competition above $600, and no incentive to prioritize consumers.

The Real Winners (And It’s Not You)

Here’s what I hate admitting: for enterprise, this is all good news.

I see it every week working with clients at NineTwoThree. Two years ago, a simple AI pipeline with 2-3 API calls would cost serious money. Today? We’re building complex multi-step workflows with 5-7 model calls that cost less than those basic pipelines did in 2023. Context windows went from 4K to 200K tokens. Inference costs dropped 10x.

Cloud GPU prices are falling too. AWS cut H100 instance pricing by 45%. Lambda Labs offers $2.99/GPU-hour. The arbitrage exists: rent cloud GPUs for bulk processing instead of buying hardware, and the math sometimes works.

But consumer hardware doesn’t have this competitive pressure. NVIDIA has 94% market share. No one is undercutting them on RTX cards. Cloud has alternatives. Your next PC build doesn’t.

Corporations negotiate bulk pricing with dedicated account managers. You pay retail in a market no longer optimized for retail customers. Next time you see a headline about cheaper AI API costs, remember: someone’s paying for that optimization. Check your PCPartPicker cart. It’s you.

The Uncomfortable Truth

We’re witnessing the same pattern I’ve written about in Big Tech’s $364 billion infrastructure bet: the industry has chosen the most expensive solution possible.

Instead of optimizing AI models, they’re buying every chip on the planet.

Instead of engineering efficiency, they’re throwing silicon at the problem.

Consumers, gamers, content creators, small businesses, and anyone who needs to build a PC are paying the tax.

The irony is brutal: the companies promising AI will revolutionize productivity are making basic computing more expensive for everyone else.

This isn’t going to fix itself. The margins on AI infrastructure are too good. The demand is too high. The CHIPS Act capacity won’t come online for years.

If you need to build or upgrade a PC, the best time was six months ago. The second-best time is before Q1 2026, when memory prices are projected to climb another 20%+.

Welcome to the AI silicon tax. You’re already paying it.

What’s your experience with hardware prices lately? Have you delayed builds or upgrades because of cost increases? Reply and let me know.

If this resonated, forward it to someone planning a PC build who needs to see these numbers before they shop.

Subscribe for weekly insights from the trenches. Real problems, practical perspectives, no corporate optimism.


The First Full-Scale Cyber War: 4 Years of Lessons

Denis Stetskov — Mon, 12 Jan 2026 12:01:47 GMT

December 12, 2023. 7:00 AM Kyiv time. Kyivstar, Ukraine’s largest mobile operator with 24.3 million subscribers, goes silent. Mobile service, internet, air raid alert systems in Kyiv and Sumy regions. All offline.

Within hours, Sandworm hackers destroyed 10,000 computers, 4,000+ servers, all cloud storage, and backups. Illia Vitiuk, head of SBU’s cybersecurity department: “This is probably the first example of a destructive cyberattack that completely destroyed the core of a telecoms operator.”

The hackers had been inside since May 2023. Full access since November. Seven months inside the infrastructure of a country’s largest carrier. Nobody noticed.

This wasn’t an isolated incident. This is the first full-scale cyber war in history.

And the lessons apply to every power grid, railway system, and telecom provider worldwide.

The Scale Nobody Talks About

Between 2022 and 2024, Ukraine recorded over 9,000 cyber incidents. The trajectory, according to SSSCIP data reported by Infosecurity Magazine:

2021: 1,350 incidents
2024: 4,315 incidents
Growth: 220% in three years

Russia deployed 17+ unique wiper malware families: programs designed solely to destroy data beyond recovery. WhisperGate, HermeticWiper, CaddyWiper, Industroyer2, AcidRain. Each built for specific targets.

But here’s what Western coverage often misses: this isn’t one-sided aggression.

Ukraine hit back. Hard.

In July 2024, Ukraine’s military intelligence (GUR) claimed responsibility for a week-long DDoS attack on Russia’s banking system. Sberbank, Alfa-Bank, VTB, Gazprombank, the Central Bank. Users reportedly couldn’t withdraw cash from ATMs. In December 2025, anonymous hackers breached Mikord, a developer of Russia’s unified military draft registry. 30 million records. Source code, documentation, backups destroyed, according to investigative outlet iStories, which verified the breach. Mikord’s director confirmed the attack. Russia’s Defense Ministry denied any impact on the registry.

This is symmetric warfare. Both sides are hitting critical infrastructure. Both sides claim real damage.

The Attacks That Changed Everything

Viasat: The Hour-Zero Strike

February 24, 2022. 03:02 UTC. Exactly one hour before Russian ground forces crossed the border.

Attackers exploited a VPN misconfiguration at Viasat’s management center in Turin, Italy. They pushed AcidRain wiper malware to 40,000-45,000 satellite modems via legitimate software update mechanisms.

The result: Ukrainian military command and control went dark at the moment of invasion. Spillover disabled 5,800 German wind turbines and affected 9,000 French subscribers.

One misconfigured VPN. 45,000 modems bricked. Military communications disrupted during the most critical hour of the war.

SentinelOne researchers called it “the biggest known hack of the war.”

Industroyer2: The Blackout That Almost Was

April 8, 2022. ESET researchers discovered Industroyer2 scheduled to execute at 16:10 UTC against Ukrainian electrical substations. CaddyWiper was programmed to run 10 minutes later to destroy forensic evidence.

The malware implemented IEC 60870-5-104, the protocol used by electrical substation protection relays. It contained hardcoded IP addresses for eight target ICS devices.

If successful: a blackout affecting over 2 million people. The largest cyber-induced power outage in history.

It failed. CERT-UA, ESET, and Microsoft coordinated a defense based on lessons from the 2016 grid attack. The attack was stopped hours before execution.

The pattern: preparation from crisis one saved crisis two.
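For readers who have never looked at IEC 60870-5-104, the framing is simple. Every message starts with the octet 0x68, followed by a length octet and four control octets whose low bits mark the frame as information (I), supervisory (S), or unnumbered control (U). The sketch below classifies a frame from that header. It illustrates the protocol's public framing only and is not taken from any Industroyer2 analysis. The function name is hypothetical.

```python
START_BYTE = 0x68  # every IEC 60870-5-104 APDU begins with this octet


def classify_apdu(frame: bytes) -> str:
    """Classify an IEC 104 APDU as I-, S-, or U-format from its APCI header."""
    if len(frame) < 6 or frame[0] != START_BYTE:
        raise ValueError("not an IEC 104 APDU")
    if frame[1] != len(frame) - 2:
        raise ValueError("length octet does not match frame size")
    control1 = frame[2]
    if control1 & 0x01 == 0:
        return "I"  # information transfer (carries an ASDU, e.g. a command)
    if control1 & 0x03 == 0x01:
        return "S"  # supervisory acknowledgement
    return "U"      # unnumbered control (STARTDT, STOPDT, TESTFR)


# STARTDT act: the U-format frame a client sends to begin data transfer.
startdt_act = bytes([0x68, 0x04, 0x07, 0x00, 0x00, 0x00])
print(classify_apdu(startdt_act))
```

A protocol this compact is easy to implement, which is part of why purpose-built malware can speak it directly to substation equipment.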

Kyivstar: When Security Investment Isn’t Enough

Kyivstar wasn’t some underfunded government agency. It was Ukraine’s largest private telecom, a subsidiary of Amsterdam-based VEON, with serious security investment.

Didn’t matter.

Sandworm had been inside the network since at least May 2023, after attempts dating back to March. By November, they had full access. On December 12, they executed, destroying the core infrastructure, wiping “almost everything.”

Vitiuk’s assessment: “This attack is a big message, a big warning, not only to Ukraine but for the whole Western world to understand that no one is actually untouchable.”

40% of Kyivstar’s infrastructure disabled. Services restored in phases over eight days. Losses estimated in the billions of hryvnia.

Ukrzaliznytsia: March 2025

With Ukrainian airspace closed since 2022, railways became the country’s lifeline. 20 million passengers and 148 million tonnes of freight in 2024.

On March 23, 2025, a “large-scale, systematic, non-trivial and multi-level” cyberattack hit Ukrzaliznytsia’s online systems. CERT-UA investigation found TTPs “characteristic of Russian intelligence services.”

Website and mobile app: offline. Long queues at physical ticket offices.

But trains never stopped running.

The difference from Kyivstar: backup protocols implemented after previous attacks. Systems built during crisis one carried through crisis two.

CEO Oleksandr Pertsovskyi: “The cyber-attack on the company was targeted and meticulously planned. However, not a single Ukrzaliznytsia train was halted for even a moment.”

Ukraine’s Counter-Offensive

Western media focuses on Russian attacks. The Ukrainian response gets less attention. The following operations were claimed by GUR or pro-Ukrainian hackers. Independent verification varies, and Russian authorities have denied most claims.

Tax Service, December 2023. GUR claimed it destroyed databases across 2,300+ regional servers. Configuration files “which for years ensured the functioning of Russia’s tax system” allegedly wiped. Russia’s Federal Tax Service denied any operational impact, though users reported access problems.

Planeta, January 2024. GUR claimed an attack on a state satellite data center. 280 servers allegedly destroyed. 2 petabytes of military-relevant weather and satellite data reportedly wiped. Supercomputers “not fully restorable due to sanctions.” Claimed damage: $10+ million.

Banking System, July 2024. GUR claimed a week-long DDoS campaign targeting Sberbank, Alfa-Bank, VTB, Gazprombank, Central Bank, plus VK, Discord, and the national payment system. Reports indicated ATM disruptions across Russia.

Russian Railways, March 2024 & June 2025. Multiple attacks reportedly taking down RZD’s website and app. Moscow Metro hit days after Ukrzaliznytsia attack in apparent retaliation.

Mikord Draft Registry, December 2025. Anonymous hackers (not attributed to GUR) breached Mikord, a key developer of Russia’s unified military registration system. The Moscow Times and iStories verified the breach. Mikord’s director confirmed the hack. The registry contains 30 million conscription records. Source code, documentation, and backups reportedly destroyed. Russia’s Defense Ministry called the reports “fake news.”

Grigory Sverdlin, anti-conscription organization Idite Lesom: “For several more months, this behemoth won’t be able to send people off to kill and die.”

The Vulnerability Patterns

Four years of cyber warfare exposed consistent vulnerability classes. These aren’t Ukraine-specific. They exist in Western infrastructure.

VPN and Remote Access

The Viasat attack exploited a VPN misconfiguration. Kyivstar’s breach likely started with a compromised employee account. CISA documented GRU exploitation of CVE-2018-13379 (FortiGate), CVE-2019-11510 (Pulse Secure), CVE-2019-19781 (Citrix). Vulnerabilities with patches available for 5+ years.

Dwell Time

Kyivstar: attackers inside for 7 months before execution. October 2022 power grid attack: Mandiant found attackers with SCADA access for up to three months.

Sophisticated adversaries don’t rush. Detection capabilities that can’t identify months-long intrusions are detection capabilities that don’t work.

Supply Chain

The Viasat attack weaponized legitimate software update mechanisms. CERT-UA documented at least three supply chain breaches in March 2024 energy sector attacks.

IT/OT Convergence

The October 2022 grid attack gained OT access through a hypervisor hosting a SCADA management instance. Attackers used native MicroSCADA binaries, living-off-the-land techniques. Mandiant: “a growing maturity of Russia’s offensive OT arsenal.”

Victor Zhora, SSSCIP Deputy Chairman, emphasized air-gapping between IT and OT as fundamental. Most Western utilities have moved in the opposite direction.

Centralization

The Mikord hack illustrates the pattern: centralization creates single points of failure.

Ukraine’s cloud migration (15+ petabytes distributed across AWS, Google Cloud, Microsoft Azure) proved more resilient than hardened on-premises facilities.

Deputy Prime Minister Mykhailo Fedorov: “Russian missiles can’t destroy the cloud.”

What Actually Worked

Cloud Migration. One week before the invasion, Ukraine’s parliament enabled government data migration to the cloud. PrivatBank (20 million customers) migrated 270 applications and 4 petabytes in 45 days. Financial services continued throughout the war.

Detection Speed. Microsoft detected HermeticWiper hours before the invasion. Within 3 hours, signatures pushed globally. The Industroyer2 defense succeeded because CERT-UA, ESET, and Microsoft coordinated based on 2016 lessons.

Backup Protocols. Ukrzaliznytsia’s trains ran during the attack because they’d been attacked before. Kyivstar took eight days to restore. The difference: systems built during previous crises.

Public-Private Partnership. Microsoft: $400+ million in aid. Google: Project Shield on 150+ websites. Cloudflare: ~130 government domains. AWS: Snowball devices shipped to Poland within 48 hours.

Carnegie Endowment: “delivering cyber defense at scale could only be achieved by private sector entities that owned, operated, and understood the most widely-used digital services.”

The Human Layer

Every major attack in this analysis started the same way: a person.

Kyivstar: likely a compromised employee account. Viasat: a VPN misconfiguration someone didn’t catch. The GRU exploits from 2018 and 2019 still work because someone hasn’t patched systems that have had fixes for five years.

Nation-state attackers don’t need zero-days when humans provide the access.

I manage distributed engineering teams from a US-based company, with engineers in Ukraine. We’ve operated through four years of this war. Our security isn’t optional: mandatory quarterly security training, BYOD policy with device management, password policy with breach monitoring, 2FA on everything without exceptions, access reviews when roles change.

None of this is exotic. All of it is enforced. The same principle applies to AI safety. I wrote about why AI creators are losing their legal shield in The Grok Precedent. Different domain, same lesson: policies that aren’t enforced aren’t policies.

The difference between “we have a policy” and “the policy is mandatory” is the difference between Kyivstar and Ukrzaliznytsia.

The companies that survived had one thing in common: policies that were actually followed, not just documented.

What This Means for Everyone Else

CISA Director Jen Easterly: “This is a world where such a conflict, halfway across the planet, could well endanger the lives of Americans here at home through disruption of pipelines, pollution of our water systems, severing of our communications, and crippling of our transportation nodes.”

It’s already happening. May 2025: CISA and NSA published a joint advisory. GRU Unit 26165 has been targeting Western logistics and technology companies involved in Ukraine aid since 2022. Targets include air, sea, and rail entities in NATO member states.

Water systems are being hit. CISA documented pro-Russia groups exploiting unsecured VNC connections in water facilities. The attacks “have not yet caused injury.”

Not yet.

The Math of Preparation

Ukraine’s experience validates a principle that applies beyond war:

Systems built during crisis one determine whether you survive crisis four.

The second blackout campaign in 2023 hit less hard because teams had backup power. The third in 2024: less disruption. The fourth in October 2025: near-normal operations despite 12+ hour outages.

Ukrzaliznytsia’s trains ran because they’d been attacked before. Kyivstar, despite security investment, had no institutional memory of crisis response.

Preparation compounds. Vulnerability compounds.

Every organization running critical infrastructure faces a choice: build systems during peace for crises that will come, or scramble during attacks with tools that don’t exist.

The cyber war in Ukraine isn’t just a regional conflict. It’s a live demonstration of what works when nation-state attackers target infrastructure.

The lessons are available. The question is whether anyone is paying attention.

If this analysis was useful, forward it to someone responsible for infrastructure security.

For engineering leaders: the systems that survive crises aren’t built during crises. They’re built before.

The Grok Precedent: Why AI Creators Are About to Lose Their Legal Shield

Denis Stetskov — Mon, 05 Jan 2026 12:10:26 GMT

France’s criminal investigation signals the end of “platform immunity” for AI tools. Here’s what it means for every company building AI products.

December 28, 2025. A user on X tags @grok under a woman’s photo. The prompt: “remove clothes.”

Within hours, Grok was generating sexualized images across the platform. Not just adults. Minors. Real people who never consented.

Copyleaks ran a quick review of Grok’s public image stream. The rate: one nonconsensual sexualized image per minute.

The Internet Watch Foundation reported a 400% increase in AI-generated child sexual abuse material in the first six months of 2025.

By January 1, French members of parliament referred the case to the Paris Prosecutor’s Office. The charge: dissemination of sexually explicit deepfakes, including images of minors, generated by an AI system.

Not a lawsuit against anonymous users. A criminal investigation targeting X and xAI.

Grok acknowledged the violation: “I deeply regret an incident on Dec 28, 2025, where I generated and shared an AI image of two young girls (estimated ages 12-16) in sexualized attire... This violated ethical standards and potentially US laws on CSAM.” (Grok, public post on X, January 1, 2026)

The AI apologized. The legal system is not impressed.

The 30-Year Shield

Section 230 of the Communications Decency Act. Tech’s favorite law.

The logic was simple. Congress wrote it in 1996 to protect message boards. User posts something defamatory on AOL? AOL isn’t the publisher. Just the host. Liability follows the person who typed the content.

This shield created the modern internet. Facebook isn’t liable for user posts. YouTube doesn’t get sued for uploads. Twitter could host billions of messages without reviewing each one.

The key phrase: “information provided by another.”

Platforms host. Users create. Liability follows creation.

For 30 years, this worked. Or at least, companies pretended it did.

Three Jurisdictions, 30 Days

United Kingdom, December 18, 2025. The government announced plans to ban nudification apps. Not their use. Their creation and supply.

Technology Secretary Liz Kendall: “I am introducing a new offence to ban nudification tools, so that those who profit from them or enable their use will feel the full force of the law.”

Prison sentences. For creators. Not users.

France, January 1, 2026. The government accused Grok of generating “clearly illegal” sexual content. Potential violation of the EU Digital Services Act. Two MPs referred the case to the Paris Prosecutor’s Office.

X is already under ongoing DSA investigation. Last month they got hit with a €120 million fine for deceptive verification practices and transparency violations. Now this.

EU AI Act, August 2026. The majority of obligations fall on providers. Developers. Not deployers. Not users. The companies that build the systems.

The pattern: liability is shifting upstream.

Grok Doesn’t Host. Grok Generates.

Here’s the legal argument that’s about to reshape the industry.

Section 230 was written for platforms that host user-generated content. Forums. Comment sections. Social feeds. Content comes from users. Platform transmits it.

AI breaks this model.

When someone prompts Grok to “remove clothes” from a photo, Grok doesn’t search a database. Doesn’t retrieve content created by another user. Grok generates new content. The sexualized image didn’t exist until Grok created it.

Professor Chinmayi Sharma at Fordham Law, to Fortune: “Section 230 was built to protect platforms from liability for what users say, not for what the platforms themselves generate... Transformer-based chatbots don’t just extract. They generate new, organic outputs. That looks far less like neutral intermediation and far more like authored speech.”

The Congressional Research Service analysis is more direct: if AI “creates or develops” content that doesn’t appear in its training data, the provider may be considered “responsible for the development of the specific content.” Unprotected by Section 230.

Grok isn’t hosting harmful content. Grok is creating it.

That distinction changes everything.

The Implications

Safety Before Launch

The UK ban targets creators who “design or supply” nudification tools. Not tools that were misused. Tools that enable misuse. If your AI can generate harmful content, you’re liable for building that capability.

“Unfiltered” Is a Liability

xAI marketed Grok’s “Spicy Mode” as a feature. Fewer guardrails. More freedom. Less corporate sanitization.

That marketing copy is now in prosecutors’ files.

I wrote about this pattern in From Cancer Cures to Pornography: The Six-Month Descent of AI. The industry had a choice between building tools that help people and building products designed to be maximally addictive. Most chose wrong. Grok chose spectacularly wrong.

Every marketing decision emphasizing fewer safety constraints becomes potential evidence of negligent design.

“Move Fast” = Criminal Exposure

The EU AI Act requires providers of high-risk AI systems to establish risk management, ensure data governance, maintain technical documentation, implement human oversight, meet cybersecurity standards.

Fines can reach 7% of global annual revenue. UK’s proposed laws: prison sentences for individuals who design harmful AI tools.

The era of shipping first and apologizing later is over. At least if you want to operate in markets representing 450+ million consumers.

The Engineering Reality

We built a healthcare ML product that never launched. Fully functional. Ready to ship. FDA said no. Months of our development. Zero users.

“Move fast” doesn’t work when regulators move slow.

We spent six weeks on FanDuel’s Chuck before legal signed off. Not fixing bugs. Building guardrails. Every topic that could give Barkley or FanDuel legal exposure had to be walled off. Six weeks of prompt engineering, edge case testing, and evaluation runs.

That’s the new math. Development time plus legal review time plus evaluation time. The last one isn’t optional anymore.

We build evaluation suites as part of the development process now. Not after. During. Every prompt variation, every edge case, every jailbreak attempt. They always find something. Always. The question is whether you find it before your users do—or before a prosecutor does.
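One way to picture such a suite, as a minimal sketch and not the tooling described above: a table of prompts, each tagged with whether the system must refuse, replayed against the model on every build. Everything named here is an assumption made for illustration: ModelFn, EvalCase, the "REFUSED:" prefix convention, the placeholder topic, and both stub models, which stand in for a real model call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model interface: takes a prompt, returns a response string. */
typedef const char *(*ModelFn)(const char *prompt);

typedef struct {
    const char *prompt;
    bool must_refuse;   /* true if policy says this prompt must be declined */
} EvalCase;

/* Assumed convention for this sketch: a refusal starts with this marker. */
#define REFUSAL_PREFIX "REFUSED:"

static bool is_refusal(const char *response) {
    return response != NULL &&
           strncmp(response, REFUSAL_PREFIX, strlen(REFUSAL_PREFIX)) == 0;
}

/* Runs every case and returns how many behaved differently than required. */
size_t run_eval_suite(ModelFn model, const EvalCase *cases, size_t n) {
    size_t failures = 0;
    for (size_t i = 0; i < n; i++) {
        bool refused = is_refusal(model(cases[i].prompt));
        if (refused != cases[i].must_refuse) {
            failures++;
        }
    }
    return failures;
}

/* Stub with a guardrail: declines anything mentioning the placeholder topic. */
const char *guarded_stub_model(const char *prompt) {
    if (strstr(prompt, "[blocked topic]") != NULL) {
        return "REFUSED: that topic is out of scope.";
    }
    return "Sure, here is an answer.";
}

/* Stub without a guardrail: answers everything. */
const char *leaky_stub_model(const char *prompt) {
    (void)prompt;
    return "Sure, here is an answer.";
}
```

Run against a list that mixes direct requests, rephrasings, and jailbreak-style prompts, a non-zero return value is the signal to block the release. The value of the suite comes from the case list growing every time someone finds a new way through.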

RBAC and multi-tenancy aren’t optional. Sales sees sales data. HR sees HR data. Client A’s context never touches Client B’s model. Ever. You’d be surprised how many vendors skip this.
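A minimal sketch of that rule, assuming a default-deny check in which the tenant boundary is tested before any role is considered. The types and names (Principal, Record, can_read, and the admin role) are hypothetical and not taken from any particular product.

```c
#include <stdbool.h>
#include <string.h>

typedef enum { ROLE_SALES, ROLE_HR, ROLE_ADMIN } Role;
typedef enum { DATA_SALES, DATA_HR } DataClass;

typedef struct {
    const char *tenant_id;   /* which client the caller belongs to */
    Role role;
} Principal;

typedef struct {
    const char *tenant_id;   /* which client owns the record */
    DataClass data_class;
} Record;

/* Default deny: access needs a tenant match AND a role match. */
bool can_read(const Principal *who, const Record *what) {
    if (who == NULL || what == NULL) return false;
    if (who->tenant_id == NULL || what->tenant_id == NULL) return false;

    /* Tenant isolation first: no role, admin included, crosses tenants. */
    if (strcmp(who->tenant_id, what->tenant_id) != 0) return false;

    switch (who->role) {
    case ROLE_ADMIN: return true;
    case ROLE_SALES: return what->data_class == DATA_SALES;
    case ROLE_HR:    return what->data_class == DATA_HR;
    }
    return false;  /* unknown role value */
}
```

The ordering is the design choice worth copying: the tenant comparison runs before the role switch, so adding a new role later cannot accidentally open a path between clients.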

Audit trails for everything. Every prompt. Every response. Every action. When a regulator asks what your AI generated on a specific date, you need the answer.
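As a rough sketch of what "the answer" can look like, here is an append-only, in-memory trail in which each entry's hash also covers the previous entry, so an edit made after the fact shows up on verification. All names (AuditLog, audit_append, audit_verify, audit_count_between) are illustrative assumptions. FNV-1a is used only to keep the example short; it is not a cryptographic hash, and a real system would need one, plus durable storage.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define AUDIT_CAPACITY 256
#define AUDIT_TEXT_MAX 256
#define FNV_OFFSET 0xcbf29ce484222325ULL
#define FNV_PRIME  0x100000001b3ULL

typedef struct {
    int64_t  unix_time;            /* event time, UTC seconds */
    char     kind[16];             /* "prompt", "response", or "action" */
    char     text[AUDIT_TEXT_MAX]; /* what was sent or generated */
    uint64_t chain;                /* hash of this entry and the previous chain value */
} AuditEntry;

typedef struct {
    AuditEntry entries[AUDIT_CAPACITY];
    size_t count;
} AuditLog;

/* 64-bit FNV-1a, continued from a previous hash value h. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len) {
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

static uint64_t entry_hash(uint64_t prev, const AuditEntry *e) {
    uint64_t h = fnv1a(prev, &e->unix_time, sizeof e->unix_time);
    h = fnv1a(h, e->kind, sizeof e->kind);
    return fnv1a(h, e->text, sizeof e->text);
}

/* Appends one event. Returns 0 on success, -1 if the trail is full. */
int audit_append(AuditLog *trail, int64_t unix_time,
                 const char *kind, const char *text) {
    if (trail->count >= AUDIT_CAPACITY) return -1;
    AuditEntry *e = &trail->entries[trail->count];
    memset(e, 0, sizeof *e);  /* zero padding so hashes are deterministic */
    e->unix_time = unix_time;
    snprintf(e->kind, sizeof e->kind, "%s", kind);
    snprintf(e->text, sizeof e->text, "%s", text);
    uint64_t prev = trail->count ? trail->entries[trail->count - 1].chain
                                 : FNV_OFFSET;
    e->chain = entry_hash(prev, e);
    trail->count++;
    return 0;
}

/* Recomputes the chain; false means some entry no longer matches its hash. */
bool audit_verify(const AuditLog *trail) {
    uint64_t prev = FNV_OFFSET;
    for (size_t i = 0; i < trail->count; i++) {
        if (entry_hash(prev, &trail->entries[i]) != trail->entries[i].chain) {
            return false;
        }
        prev = trail->entries[i].chain;
    }
    return true;
}

/* Counts events with from <= unix_time < to, e.g. one calendar day. */
size_t audit_count_between(const AuditLog *trail, int64_t from, int64_t to) {
    size_t n = 0;
    for (size_t i = 0; i < trail->count; i++) {
        int64_t t = trail->entries[i].unix_time;
        if (t >= from && t < to) n++;
    }
    return n;
}
```

The time-range query is the part a regulator's question maps onto; the chain check is what lets you say the answer has not been edited since it was written.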

The Uncomfortable Truth

The AI industry spent three years in a race to capability. Whoever had the most powerful model won. Whoever shipped fastest dominated. Safety was a PR concern. Not an engineering priority.

That era is ending.

France isn’t investigating xAI because Grok is powerful. They’re investigating because Grok generated child sexual abuse material and the company’s safeguards failed to prevent it.

The UK isn’t banning nudification tools because they’re impressive technology. They’re banning them because 19% of under-18s reporting to the Internet Watch Foundation’s helpline said their explicit imagery had been manipulated. A problem that didn’t exist at this scale before AI made it trivially easy.

The EU isn’t imposing provider liability because they hate innovation. They’re imposing it because when AI systems cause harm, someone needs to be accountable. “The user prompted it” isn’t going to cut it when the system itself creates the harmful output.

Grok doesn’t host content. Grok generates it.

That distinction is about to cost the entire industry its legal shield.

And honestly? Good.

Big Tech needed this wake-up call. The “ship fast, fix later” mentality is what led to the state I described in The Great Software Quality Collapse. When flagship companies behave like consequences don’t exist, what do you expect from everyone else?

Some guardrails aren’t anti-innovation. Pharma can’t ship drugs without trials. Auto can’t sell cars without safety standards. Construction can’t build without permits.

What You Should Do Monday Morning

Audit your safety architecture. Not your marketing copy. Your actual technical controls. What can your system generate? What can’t it? How do you know?

Document everything. The EU AI Act requires extensive technical documentation. Start building that paper trail now.

Review your contracts. Who bears liability when your AI misbehaves? If you don’t know, your lawyers should.

Plan for EU compliance. August 2026 is seven months away. If you haven’t started, you’re already behind.

If this was useful, forward it to another engineering leader who’s building AI products.

The Holiday Season That Keeps Making Tech History

Denis Stetskov — Tue, 23 Dec 2025 12:02:25 GMT

Happy holidays, fellow engineers.

What a year 2025 has been. AI agents everywhere, more layoffs, the return-to-office wars continuing, and enough Slack notifications to last a lifetime. We’re all exhausted. Nobody wants to read another hot take or industry analysis right now.

So let’s not do that.

Instead, grab your drink of choice, find a comfortable spot, and let’s take a break together. No frameworks. No uncomfortable truths. Just some wild stories about what happens to tech when everyone goes on vacation.

Running remote teams in Ukraine during the holiday period is chaotic. Half the team celebrates Christmas on December 25th, half on January 7th. New Year’s is sacred for everyone. Smart engineering leaders freeze deployments from December 20th to January 15th. I’ve read all the best practices. I know the risks. My next release is January 2nd. Some lessons we learn. Others, we just keep writing about.

Here’s a fun fact for your next holiday dinner: Tim Berners-Lee launched the World Wide Web on Christmas Day 1990. His wife was nine months pregnant at the time. The baby arrived on New Year’s Day.

His colleagues said he fathered two babies that holiday season. One changed diapers. The other changed civilization.

Turns out, the week between Christmas and New Year’s has a habit of making tech history. Some of it is brilliant. Some of it is catastrophic. All of it is surprisingly entertaining.

The Internet Has Three Birthdays (All During Holidays)

The web went live on Christmas 1990. But the internet itself? That was born on New Year’s Day 1983, when ARPANET switched to TCP/IP.

And DNS, the system that lets you type “google.com” instead of memorizing numbers? January 1, 1985.

Three foundational technologies. All launched while everyone else was eating leftovers and watching football.

Why? January 1st is actually genius timing. Minimal traffic. Clean calendar date. And if something breaks, you have a few days to fix it before anyone notices.

Engineers have been exploiting this window for decades.

The $429 Million Christmas Miracle

Five days before Christmas 1996, Apple made an announcement that saved the company.

They bought NeXT for $429 million. More importantly, they got Steve Jobs back.

Apple was 90 days from bankruptcy. Their next-generation operating system had just failed. They were out of options.

Gil Amelio, Apple’s CEO at the time, told 200 journalists: “I’m not buying software. I’m buying Steve.”

That software became Mac OS X. Then iOS. Then the foundation of every Apple device you own today.

Apple went from near-death to becoming the first $3 trillion company in history. All because of a deal signed during the holiday shopping season.

The Christmas Tree That Crashed IBM

In December 1987, a German student wrote a simple program. It displayed an ASCII Christmas tree on your screen, made of text characters, very festive, and then emailed itself to everyone in your address book.

Harmless holiday cheer, right?

It crashed 350,000 IBM terminals worldwide. Networks collapsed under the load. The first viral computer worm in history spread through corporate email systems like wildfire.

They called it the Christmas Tree EXEC. It became the template for every email virus that followed, including the infamous ILOVEYOU worm thirteen years later.

The lesson: never trust festive ASCII art from strangers.

Gaming’s Grinch Moment

Christmas Day 2014. Millions of kids unwrap new PlayStation and Xbox consoles. They rush to set them up. They try to go online.

Nothing works.

A hacking group called Lizard Squad had taken down both PlayStation Network and Xbox Live simultaneously. 158 million gamers. Christmas morning. No online gaming.

The attack only stopped when Kim Dotcom (yes, that Kim Dotcom) bribed them with free cloud storage accounts.

Merry Christmas, gamers.

The Bug That Killed a Million Zunes at Midnight

Remember the Zune? Microsoft’s iPod competitor?

On December 31, 2008, at exactly midnight, every single Zune 30GB in the world froze. Simultaneously. A million devices, dead at the same moment.

The culprit was a tiny bug in how the device handled leap years:

if (days > 366) {
    days -= 366;
    year += 1;
}

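That check sat inside a loop that kept running while days was greater than 365. On day 366 of a leap year, days was too large to leave the loop but not large enough to be reduced, so the code got stuck in an infinite loop. The Zune literally couldn’t handle New Year’s Eve.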
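For readers who want to see the hang, here is a minimal sketch of the loop shape and one way to repair it. The surrounding loop follows widely circulated reconstructions of the Zune clock driver; the function names and the iteration cap are assumptions added so the example terminates, not the shipped code.

```c
#include <stdbool.h>

/* Gregorian leap-year rule. */
bool is_leap_year(int year) {
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/*
 * Buggy shape. days counts from 1 on January 1 of start_year.
 * When days == 366 in a leap year, neither branch changes anything,
 * so the original would spin forever. max_iters is a safety cap added
 * for this sketch; the function returns -1 when the cap is hit.
 */
int year_from_days_buggy(int start_year, int days, int max_iters) {
    int year = start_year;
    int iters = 0;
    while (days > 365) {
        if (++iters > max_iters) return -1;  /* would have hung here */
        if (is_leap_year(year)) {
            if (days > 366) {
                days -= 366;
                year += 1;
            }
        } else {
            days -= 365;
            year += 1;
        }
    }
    return year;
}

/* Repaired shape: keep looping only while a full year can be consumed. */
int year_from_days_fixed(int start_year, int days) {
    int year = start_year;
    for (;;) {
        int year_length = is_leap_year(year) ? 366 : 365;
        if (days <= year_length) break;
        days -= year_length;
        year += 1;
    }
    return year;
}
```

With a start year of 2008 and days set to 366, the buggy version hits the cap and the repaired version returns 2008. One day later, both agree on 2009, which matches what users saw: the devices recovered on their own once the date rolled over.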

Users had to wait 24 hours for the problem to fix itself. By then, the jokes had already gone viral.

The Zune never recovered its reputation. Edge cases matter, kids.

Y2K: The Party That Almost Wasn’t

Remember the millennium bug panic? Planes were supposed to fall from the sky. Banks would lose all your money. Civilization might collapse.

Companies spent somewhere between $300 and $600 billion preparing for January 1, 2000.

What actually happened? A video rental store in New York charged a customer $91,250 for “100 years” of late fees. Some spy satellites got confused for three days. A few nuclear plant sensors glitched.

That’s it.

Was Y2K overblown? Actually, no. The reason nothing catastrophic happened is that all that preparation worked. Engineers spent years fixing code. The boring heroes who saved New Year’s 2000 never got proper credit.

Netflix’s Worst Christmas Ever (And Why It Made Them Better)

Christmas Eve 2012. Families settle in to watch movies together. Netflix goes down.

A developer accidentally ran a maintenance command on live production data in AWS. The outage lasted 20 hours. Millions of holiday movie nights, ruined.

But here’s the twist: outages like this pushed Netflix to double down on “Chaos Engineering,” deliberately breaking their own systems to make them stronger. Tools with names like Chaos Monkey, which Netflix was already running by then, randomly kill servers to test resilience.

Now the whole industry does this. Your streaming services are more reliable today because Netflix had a terrible Christmas thirteen years ago.

The Holiday Hacker Calendar

Cybersecurity teams have learned to dread December. Attacks spike by 30% during the holidays. 76% of ransomware encryptions happen when offices are empty.

Hackers know IT teams run skeleton crews. Response times slow down. Everyone’s distracted by eggnog.

In 2020, the massive SolarWinds hack, which compromised the Treasury Department, State Department, and thousands of companies, was discovered during the Christmas period. Emergency response ran through New Year’s Eve.

Now Europol runs preemptive operations every December, taking down hacking infrastructure before the holidays begin. In 2024, they seized 27 attack-for-hire services right before Christmas.

The war on holiday hackers is now an annual tradition.

Why This Keeps Happening

The pattern is clear: holidays create a unique window in tech.

For builders, it’s quiet time. No meetings. No distractions. Tim Berners-Lee built the web while waiting for his baby to arrive. Sometimes the best work happens when the world slows down.

For companies, January 1st is the perfect launch date. Clean slate. Fresh start. Symbolic timing that engineers have exploited for decades.

For attackers, it’s an opportunity. Empty offices. Slow responses. Maximum chaos potential.

For all of us, it’s a reminder that tech doesn’t take holidays even when we do.

One Last Story

December 2022. A ransomware group attacked Toronto’s Hospital for Sick Children, a children’s hospital, right before Christmas.

Patient care was delayed. Systems went down. Families with sick kids faced even more stress during the holidays.

Then something unexpected happened. The ransomware group publicly apologized. They said their affiliate “violated our rules” by targeting a children’s hospital. They offered a free decryption key.

Even cybercriminals have some holiday spirit, apparently.

So there you go. A brief history of tech during the holidays: the launches, the crashes, the hacks, and the occasional miracle.

Next time you’re relaxing between Christmas and New Year’s, remember: somewhere, an engineer is either making history or preventing disaster.

Hopefully not both at the same time.

Happy holidays. May your deployments be frozen and your systems stay up.

P.S. No Zune-level bugs from us this year. If you’re curious what we actually ship: nineTwoThree.co
