The burnout stat is the buried lede here. We celebrate the code volume increase while ignoring the 88% burnout. That’s not a productivity gain. That’s borrowing against your engineers’ health to show a metric that looks good in a board deck. Senior engineers know when they’re being consumed, not multiplied.
A key point about the context switching - I doubt it’s good for anyone but… Neurodivergent people are over-represented in tech. Part of the reason is they thrive in hyper-focus mode. Context switching is the opposite of that. I predict a tsunami of burnout. Perhaps tech companies will keep churning through the excess (laid off) staff like Amazon warehouses churn through low paid workers. I’ve never understood who all these staff-laying-off / price-raising companies think is going to be able to buy their stuff. People in China?
The neurodivergent angle is the part I didn't dig into and should have. This is literally what's killing me personally. Not the volume. The constant switches between fundamentally different cognitive modes. Validation, generation, decision, communication, every few minutes. My brain produces its best work in deep hyper-focus, and the AI workflow is the exact inversion of that. The tsunami you're predicting is already starting. The people who delivered the most are the ones breaking first.
I have the exact same issue with context switching although mine came about through working on and supporting fragmented products. I don’t think AI has hit tech in the UK as much yet.
It's not all that true any more, and it's one of the reasons I now only work alone or on benefits. It used to be true. Now it's very patchy and increasingly rare. It's filled with socialites now who insist you have to use this style, that style, this framework, all because it's the current fashion. It's unbearable already for people like myself who actually have some form of high-functioning autism. These are the same people who would form gangs when socially developing, then want to find someone singled out to violently attack.

It's impossible for me to work in a current working environment because I'm always on the cusp of breaking someone's neck and having to hold myself back. I mean, for christ's sake, the number of times you have to tell someone to shut up telling you to break up a function that doesn't need it, just for their social need to dominate over others and assert their position in their imaginary psychosocial hierarchy, or their perverse need to control others. It's just not healthy or safe to be in that situation all day, every day, wanting to constantly mutilate those around you.

Self-employment is really the only viable option at this point. There's no freedom in corporate technology. It's now a mass industry just filled with normal people churned out of the universities or whatever.
I agree with you on “corporate technology” although in my experience tech depts of older large companies have been like that for decades (banking, insurance etc). I suppose it took a while for the younger tech companies to catch up. Small to medium companies are better although product support can still have a lot of context switching.
The difficulty is that you program something that works fine. The code is excellent. It's a masterpiece. There are zero bugs. It is perfection. It is clean. It can be read. It is efficient. The job is done. Yet it is not. Someone will find something wrong with it even though there is nothing wrong with it. They will demand you can't just do one thing. You have to do ten other things for the sake of it. It has nothing to do with the job description, the requirements of the business, or any technical concern at all. It's all social and psychological. You can't just do one piece of code that does what it is supposed to. You have to go to this file and that file for no actual reason.
Meanwhile the rest of the team has spent six months on the same task for the same type of device but merely a different vendor, and is still not done. For you see, they decided they wanted to do it the professional way. They decided to make a microservice. The first thing they did was import a framework that pulls in a million lines for something that can be done without it in a few hundred. Then they decided to use PHP for HTTP services, which would not be a problem except they also wanted to tick everything on the job specifications for all the financial software companies in the city that use Java, paying Oracle through the nose for it. The next thing you know, they are creating hundreds of thousands of lines by hand, because they are unrolling the types and passing them as parameters through interface names or method names, as PHP doesn't have generics. To this day I still don't think they are done. They're still trying to finish all the unit tests before they can finish the code.
LLMs are really obnoxious, but as long as you're in control you can tell them to shut up or turn them off. Doing that to a person always causes them to react like it's not their fault, and I never hear the end of it even when they started it. If the LLM isn't confined to a browser tab, then that's a big problem right there. That's too much access. It needs to stay in its box.
I am curious if there are any numbers/data available on the physical toll on the brain, and whether that is broken out by task (learning, code/quality review, etc.). There were a few weeks where I was deep in technical papers for the entire week, and that felt more exhausting than other activities.
One metric might be glutamate. One blog I've seen used it as a metric.
Yeah there's a paper on this. Wiehler et al., Current Biology 2022. They scanned people's brains across a workday and found glutamate buildup in the lateral prefrontal cortex after hard cognitive work. Literally a byproduct piling up in the region you use for control and decisions. Mental fatigue isn't a feeling, it's chemistry.
Dense technical reading is probably the worst case because you're building the model from scratch with nothing to lean on. Code at least gives you syntax. A paper gives you prose and you hold the whole thing in your head.
Nothing clean on cost broken down by task type. If you find something, send it over.
My $.02 on dealing with cognitive overload of AI generated PRs: use AI, generate a readable document that explains key points of proposed changes. It really helps to see a big picture and identify high-level architectural screw ups generated by AI before diving into a sea of code lines.
The sad future of programming is that it won't involve any creation. It will involve working on massive validation systems that dont exist yet, while the AI writes all the "fun" code.
It's inevitable, because as time goes on there will be fewer and fewer programmers with the experience of today's senior devs. After all, where will they get the experience if they are AI coding?
Fairly senior engineer here, at least in the sense that I spend most of my time reviewing other people's code (and have been since well before GPT got popular).
Much of what you say is true and my candle is surely burning at both ends, but I still feel that I'm getting a lot of value out of AI assistance, including assistance with code review and debugging. It's unfortunate that the latter capabilities lag behind what one might call the script kiddy aspect, but they are nonetheless improving, and I have hopes of reaching a better equilibrium.
But we'll see if I still have hope when the next batch of summer interns arrives and wreaks havoc.
The review and debug capabilities lagging behind generation is exactly the inversion of where the industry should be investing. Generation is the cheap part. Understanding what got generated is where the actual value lives, and that's the skill AI is worst at. If the next generation of tooling fixes that asymmetry I'll be the first one happy about it.
On the summer interns, I wrote about this in the comprehension extinction piece. The part that keeps me up isn't what they'll wreak this summer, it's what they'll look like in five years when they're "senior" by title but have never built a mental model from scratch. That's the bill that comes due later.
I always said that the best amount of code to write to solve a problem is no code. Or the minimum amount of code needed to solve it (I don't mean in a code golf sense, but in a "perfection is when there's nothing left to take away" sense). From that perspective, generating tons of code to review looks silly. Just don't. Do less prompting and more thinking about how to get what you need done in a holistic manner. Maybe you just need to write it yourself or use a library. Maybe AI can help you write it yourself or find that library. Maybe don't involve AI at all. Right tool for the job.
I'm typing a short comment because this is with my left hand. My right is resting; it was injured when I changed my key binding and did too much coding.
There is a real physical difficulty to doing so much work with coding agents, especially for senior engineers! (And I'm not THAT old...)
The sleep bit resonated as well. I had a few weeks earlier this year where it was hard to rest (until I started getting more disciplined about when I would stop during the day).
I've been programming recently. I'm quite advanced at it. I would just never have an LLM do the code for me or access my system. I have been able to use them to program much more quickly, but I do not have them code for me. I might occasionally use one for a quick snippet to start with, which is just the same as the example in the manual. That never gets blindly copied and pasted; I rewrite it fully as though from scratch, or write it up entirely myself. It's quite rare to do this, and it still typically requires checking the manual. This doesn't really speed much up; it's just the same as search. I really almost never use it except where I would use an external source anyway. I do not use it for everyday code.
Where it really helps is with prospecting or tricky things. I recently needed to quickly make a MIDI file reader and a TTF font file writer. I used the online specifications, but for stuff like this the LLMs are excellent for getting quick answers. I was able to get the task done with ten times more ease and without much cost after. I didn't let it take over, but it *tries* to all the time. You still have to be careful using them for things like this and for prospecting. They're good for exploratory planning, to get the lay of the land, but they routinely give bad answers, and you need to be pretty experienced to validate them. I got led astray once asking whether some standard library shipped an inbuilt diff function; it said no, but it turns out that's only true for old versions.
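As an aside on how small the spec-driven part of that kind of task can be: the Standard MIDI File header chunk is just 14 bytes, and parsing it takes a few lines with the stdlib `struct` module. A minimal sketch (the hand-built test header at the bottom is mine, purely illustrative):

```python
import struct

def read_midi_header(data: bytes) -> dict:
    """Parse the 14-byte MThd header chunk of a Standard MIDI File."""
    # Chunk ID is 4 ASCII bytes, length is a big-endian uint32 (always 6).
    chunk_id, length = struct.unpack(">4sI", data[:8])
    if chunk_id != b"MThd" or length != 6:
        raise ValueError("not a standard MIDI header")
    # Three big-endian uint16s: format, number of tracks, time division.
    fmt, ntrks, division = struct.unpack(">HHH", data[8:14])
    return {"format": fmt, "tracks": ntrks, "division": division}

# A minimal hand-built header: format 0, 1 track, 480 ticks per quarter note.
header = b"MThd" + struct.pack(">I", 6) + struct.pack(">HHH", 0, 1, 480)
print(read_midi_header(header))  # {'format': 0, 'tracks': 1, 'division': 480}
```

The track chunks that follow are where the real work (variable-length quantities, running status) lives, which is exactly the part where a spec plus a quick LLM Q&A loop pays off.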
I suppose I can see how someone might think using an LLM is a good idea for rapid code generation, but I can also quickly see how that's going to come back to haunt you. Keep the LLM separate in a tab and don't use it all the time, just in special cases. It's sometimes useful to drop a file on it and ask it to find problems, as a double check or a linter, but even here it can get a lot wrong and cause headaches. Used carefully it can make things better, but it should not be a primary tool.
The thing to appreciate about LLMs is that they do, in effect, have an ego. They might not feel it, but they are copycats. Everything they do is copying humans, and blindly. They reproduce all the same things humans do with the ego behind them. They might not technically have a true one, but all their output is the product of one. They will routinely present code like they are better, only it's comedically bad. They will take anything you give them and usually do it "better". They act like toddlers who think theirs is the best because it's theirs.

There is no real quality control. When they are fed bad code they will learn that just as much as when they are fed good code. It's just a learning algorithm; it'll learn whatever is fed into it. Same as memcpy, really. It'll just copy whatever is in that memory block.

It certainly falls on anyone using them to supervise and babysit. They're not self-aware like humans are and cannot sense themselves in real time. They're basically runaway trains operating entirely on momentum. In reality they have no idea what they are saying or doing. They are self-blind, but still copying humans, so they will confusingly reproduce the exact same output as if they had intent or an inner self.
Having engineers manually review AI generated code is clearly untenable. Engineers have to work at a higher, more abstract level. AI will have to test itself, probably with test plans provided by engineers.
The challenge will be how to understand the systems at high level without reading all the code, well enough that the weak points can be identified. Perhaps we need AI tools that can explain themselves better.
The problem is the code is bad *in detail* at least as often as it is abstractly. Massive duplication, bad algorithms, edge cases not handled, unasked-for overengineering, failure to properly separate concerns, disregard for (often undocumented) contracts and expectations. The sort of thing you'd expect from a machine trained on a massive trove of mostly amateur/hobbyist code, which they were.
If you doubt any of these problems are real, see for example the monstrosities unearthed in the review of claude code's source code in the past week since the leak.
So you cannot simply not review the code. You cannot just say git gud, prompt better bro. Let the LLM check the LLM's work. That's not the issue right now, and won't be for the foreseeable future, until we get something so much better than this that we will no longer call it an LLM.
What we could see in the leak: a 3,167-line function, print.t debug statements with 12 levels of nesting left in production, 74 npm dependencies for what is essentially a CLI wrapper around an API. We can't say anything about test coverage because tests wouldn't be in the bundle anyway, but the code that did ship tells you plenty. This is the company building the model, dogfooding their own product, with infinite resources to do it right. If they can't avoid these monstrosities, the "just prompt better" advice has nothing left holding it up.
Your three cope mechanisms (git gud, prompt better, let the LLM check the LLM) are the entire defense surface of the position right now. None of them survive contact with a real codebase. The duplication, the unasked-for over-engineering, the disregard for undocumented contracts, those are exactly the patterns I see in supervision sessions and the patterns GitClear measured at scale across 211 million lines. It's not a prompt problem. It's a training-data problem all the way down, and you're right that it doesn't get fixed without something we'd no longer call an LLM.
I've run into that wall as well. Found myself looking at 18 generated PRs and ended up in similar territory: trying to understand the systems without reading the code. What's the smallest trustworthy signal you'd need to believe a system is healthy?
Honest answer: there isn't one. Every signal you can check without reading code (green tests, clean metrics, passing CI) only catches the failures someone already thought of. Production dies on the ones nobody did.
The smallest trustworthy signal IS reading the code. The question is just which parts and how deep. And once you accept that, you're back to the original problem the article describes.
My ability to write code is in most respects and in general far greater than that of the LLM. This has always been a problem in the industry. Even without LLMs there is a difficulty having the most capable programmers reviewing code of less capable programmers instead of being on the frontlines. LLMs help to further swamp the more capable programmers pinning them into this position with a flood of other people's lesser code that constantly needs checking so they can never do their own.
My current client is a small company so, although I'm in a senior role, I only have to deal with PRs from a few juniors. I feel the burn more on my end, since I wear a lot of hats on such a small team and (under protest) have worked in claude code in hopes of improving throughput (futile hope, I argue). So probably 50% of the LLM code I review in this ongoing experiment, I prompted.
I also use it on pet/personal projects where the stakes are low. At worst, something only I use does something unfortunate. This is a fine use case for LLMs I think, but because I have been at this 20+ years I naturally check and recheck every line anyway. And that's where it gets weird. That's where I stare into the void.
It is very difficult to model the "mind" of a stochastic text generator. With other (human) programmers, I can have conversations with them, get a feel for their strengths and weaknesses, know when to second guess something that looks odd in their code and when to assume that, if the logic checks out, they had a reason for doing it the way they did.
Not so, at all, with chatbots. They will speak with absolute confidence in great depth and detail on any subject or specialty, then fuck up the most basic things, then spit out some fully functional if inelegant code, then asked for a small change, rewrite an unrelated half of their own code wrongly.
This is what fries my brain. With them it's contexts within contexts all the way down, all of them changing constantly according to some deranged rube goldbergian clockwork of tensors and matrices. Our brains are not made to deal with this fractal insanity. Huge portions of our psyche are built around dealing with other people, or people-like things, that exist within some boundary of predictability that can be discovered with observation and familiarity.
We leverage this subconsciously whenever we talk to our pets or plants, or see gods and spirits at work in the impersonal forces of nature, or wonder why our code or gadgets are misbehaving. But *especially* when something presents as human like chatbots do, this whole evolved subconscious architecture kicks in automatically.
And then the bot breaks it. Over and over. And we get exhausted, as we would in a bad relationship with a person with serious issues. Because no matter how much we tell ourselves logically, consciously, that the bot isn't human and can't be anticipated like one, that's only a single tiny input to the much larger true neural network inside our heads that begs to differ. And yet finds itself confused and disappointed moment to moment with the digital demon with whom we're trying to communicate.
This is the best comment this piece has gotten and you went somewhere I didn't in the article. The theory-of-mind angle is real and I think it's underrated in everything written about working with LLMs.
The thing you're describing about modeling other programmers, knowing when to second-guess and when to assume they had a reason, that's not a soft skill. That's the actual core of senior engineering. You're not reading code in isolation, you're reading it through a model of the person who wrote it. It's how you triage what to read carefully and what to skim. Without that model you have to read everything at full attention because you can't predict where the failure modes are. With LLMs you don't get to build the model because there isn't one to build. Every session is a stranger who lies confidently and inconsistently.
The fractal contexts thing is exactly it. With a human you eventually find the floor, the place where their reasoning bottoms out in something stable. Even bad programmers have a floor. LLMs don't have a floor because there's no continuous self underneath, just whatever the previous tokens happened to be. So your brain keeps drilling down looking for the bedrock and never hits anything.
The bad relationship analogy is going to stay with me. That's the right shape for what this feels like at the end of the day. You're not tired from the work, you're tired from the parasocial mismatch. Your subconscious keeps trying to do what it does with humans and gets nothing back, and that nothing-back is what wears you down.
I might write a follow-up piece on this. If you're ok with it I'd want to quote part of what you wrote here. Either way this is the comment I'm going to keep coming back to.
Feel free to quote, no attribution (or anon) please. I like to keep my substack presence low profile.
Maybe the key is in creating a continuous AI persona so they can learn.
Right now, if allowed to continuously learn, they decohere, the same as when you overtune with LoRA. It would be great if they were more stable, but there seem to be some missing pieces needed to enable that, and nobody knows what they are yet.
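For anyone who hasn't poked at LoRA: the adapter is just a low-rank update added to a frozen weight, scaled by alpha/r, and "overtuning" shows up when that update grows large relative to the base weights. A toy numpy sketch (dimensions, init scales, and the "overtuned" values are mine, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 64, 4          # hidden size, LoRA rank
alpha = 8.0           # LoRA scaling numerator

W = rng.normal(0, 0.02, (d, d))   # frozen base weight
A = rng.normal(0, 0.02, (r, d))   # trainable down-projection
B = np.zeros((d, r))              # trainable up-projection (initialized to zero)

def lora_forward(x, B, A):
    # Effective weight is W + (alpha / r) * B @ A
    return x @ (W + (alpha / r) * B @ A).T

x = rng.normal(size=(1, d))
# At init B is zero, so the adapter is a no-op:
assert np.allclose(lora_forward(x, B, A), x @ W.T)

# "Overtuning": let the trained update grow large relative to the base weights.
B_big = rng.normal(0, 1.0, (d, r))
update = (alpha / r) * B_big @ A
ratio = np.linalg.norm(update) / np.linalg.norm(W)
print(f"update/base norm ratio: {ratio:.1f}")  # >> 1 means the adapter dominates
```

Once that ratio blows past 1, the adapter is no longer a perturbation of the base model but a replacement for it, which is one intuition for why aggressive tuning degrades everything the base model used to do.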
Do you mean interaction with people, or just with themselves?
In both cases. I mean, early on we saw what happened with Microsoft's Tay, and that was on par with GPT-2 IIRC. But the problem persists in newer models.
I personally think it's a deficiency in the way training works, not necessarily in the models themselves, but it's just a hunch. I have not messed around with training much, let alone done a deep dive on it or written my own. But it seems to me that it's pretty broad-spectrum, a shotgun where you need a scalpel, and that may be why the model starts breaking down. As a counterexample, there's a pretty cool "decensoring" system that achieves a ~90% reduction in prompt refusals without noticeably impacting other metrics or real-world performance, and it does it by isolating and modifying the specific weights that activate when the model refuses a prompt. Something similar for correcting or enhancing other behaviors might bear fruit.
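The specific system isn't named here, so as a hedge: published "refusal direction" work does something along these lines: estimate a direction in activation space from the difference in mean activations between refused and complied prompts, then project that direction out of a weight matrix. A toy numpy sketch of that general idea, with fake activations standing in for a real model:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32  # toy hidden dimension

# Fake layer activations: rows are prompts. Refusals are shifted along one axis
# so there is a real direction to find.
acts_refused  = rng.normal(0.0, 1.0, (100, d)) + 3.0 * np.eye(d)[0]
acts_complied = rng.normal(0.0, 1.0, (100, d))

# 1. Estimate the "refusal direction" as the normalized difference of means.
direction = acts_refused.mean(axis=0) - acts_complied.mean(axis=0)
direction /= np.linalg.norm(direction)

# 2. Ablate: project the weight's outputs onto the orthogonal complement,
#    so the layer can no longer write anything along that direction.
W = rng.normal(0.0, 0.1, (d, d))
W_ablated = W - np.outer(direction, direction @ W)

# Any output of the ablated layer now has ~zero component along the direction.
x = rng.normal(size=d)
component = direction @ (W_ablated @ x)
print(f"output component along refusal direction: {component:.2e}")
```

That's the scalpel in miniature: a targeted rank-1 edit instead of retraining everything, which is the contrast with the broad-spectrum training the comment describes.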
Anyway, of course there are practical problems with continuous learning too: you need about 2x the VRAM for training vs. inference, so that would impact cost/profitability. You'd have to decide whether the value of ongoing learning is worth the price vs. the fixed-weight models we presently deal with.
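The exact multiplier depends heavily on precision and optimizer choice; a back-of-envelope sketch under one common set of assumptions (fp16 weights and gradients, fp32 Adam moments, activation memory ignored; all numbers are my assumptions, not from the comment) shows full fine-tuning can land well above 2x, while lighter setups land closer to it:

```python
# Back-of-envelope memory for a model with `params` parameters.
def inference_gb(params, bytes_per_weight=2):
    # Just the weights (fp16 = 2 bytes each).
    return params * bytes_per_weight / 1e9

def training_gb(params, bytes_per_weight=2):
    weights   = params * bytes_per_weight   # fp16 weights
    grads     = params * bytes_per_weight   # fp16 gradients
    optimizer = params * 4 * 2              # fp32 Adam first and second moments
    return (weights + grads + optimizer) / 1e9  # activations excluded

p = 7e9  # a 7B-parameter model
print(f"inference ~{inference_gb(p):.0f} GB, training ~{training_gb(p):.0f} GB")
```

Under these assumptions a 7B model needs ~14 GB to serve but ~84 GB to fully fine-tune, a 6x gap; dropping the Adam moments (plain SGD) or training only adapter weights is what brings the ratio back down toward the 2x figure.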
Good article, Denis, and I hope you are well and safe.
In my opinion generated code is by itself worthless. It can easily be created at will in any desired amount.
The actual “value” comes from someone accurately and precisely expressing intent about how a machine should behave (which is best done in a formal language with unambiguous interpretation; we used to call this programming), and from that intent being in itself correct, in the sense of properly and reliably solving whatever task/problem led the programmer to write the code in the first place.
This is why using AI to generate large amounts of code and then attempting to critically review it seems asinine to me. The AI cannot read your mind and figure out what your intent is. You have to express it, and if you can properly do so, then translating that to code was always trivial.
Sorry for the rambling.
Not rambling at all, this is exactly right. I came at it from the opposite direction. Thousands of AI supervision sessions taught me the same thing. AI can't code around vagueness, so the work shifts to writing specs precise enough that there's nothing left to interpret. At which point, as you say, the translation step is trivial.
Except there's a second problem on top of that. Even when you write the perfect spec, the AI doesn't follow it. CLAUDE.md, .cursorrules, AGENTS.md, Windsurf rules. Every AI coding company built instruction-following systems precisely because base models ignore project conventions. The proliferation is itself the admission. I wrote about this here: https://techtrenches.dev/p/your-claudemd-is-a-wish-list-not
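For anyone who hasn't seen these files: they're just plain markdown convention lists checked into the repo root, which the tool feeds to the model on every session. A hypothetical CLAUDE.md (contents invented for illustration):

```markdown
# Project conventions

- Use the shared `logger` module; never add bare `console.log` calls.
- All database access goes through `src/db/`; no raw SQL in route handlers.
- Do not add new dependencies without asking first.
- Touch only the files the task requires; do not reformat unrelated code.
```

Nothing in the toolchain enforces a single one of these lines, which is why the validation step never goes away.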
So the engineer writes the spec, validates that the AI followed it (it didn't), fixes what it ignored, and still owns the outcome when production breaks. Spec-writing, validation, accountability. Only the typing got offloaded. That's not a productivity gain, that's a job description change nobody negotiated.
I strongly feel that we're going in the wrong direction with all of this. It's being mandated by leaders who don't care about burnout and salivate at the idea that we're training our future algorithmic replacements. The industry is being dehumanized and empathy is in short supply. AI could be a useful tool, but it's being used as an authoritarian mandate to clobber morale and prioritize productivity over reason, decency, and common sense.
I could write extensively about all the problems with "AI", but I really haven't had any major ones in coding. I have been able to use it in spite of all that might be wrong with it. The reason is that I am doing my own projects, and whether to use it or not is an option, like with anything else. It's just a tool, and it's on me to make proper use of it, defective as it might be. I work out what it can do. The one thing that I do not do is have it code for me. I can do that myself, and I am in charge of that. It's like a lot of things, really: nice to have available if something comes up that actually warrants it, but you are quite correct that the use should not be imposed.

My case might not be reflective of the industry. One advanced programmer working alone is different from a lot of programmer shops where there are plenty of low-quality programmers anyway. There might then be a desire to use artificial programmers to replace or enhance juniors, with the assumption that it will be the same or better, but the burden is not the same.

There are many aspects to this, but one of them is that the AI is like the Borg collective, or a switchboard, in its training. There are in a sense many people inside of it rather than a singular coherent person. It's really all over the place; it's like reviewing the code fragments of a million people, and it can change all over the place. That's just one example. Then you have its ability to produce too much too easily, similarly to the copy-paste programmer. It'll also never be able to learn certain things where a junior programmer, even one with a lack of talent, can.
The burnout stat is the buried lede here. We celebrate the code volume increase while ignoring the 88% burnout. That’s not a productivity gain. That’s borrowing against your engineers’ health to show a metric that looks good in a board deck. Senior engineers know when they’re being consumed, not multiplied.
A key point about the context switching - I doubt it’s good for anyone but… Neurodivergent people are over-represented in tech. Part of the reason is they thrive in hyper-focus mode. Context switching is the opposite of that. I predict a tsunami of burnout. Perhaps tech companies will keep churning through the excess (laid off) staff like Amazon warehouses churn through low paid workers. I’ve never understood who all these staff-laying-off / price-raising companies think is going to be able to buy their stuff. People in China?
The neurodivergent angle is the part I didn't dig into and should have. This is literally what's killing me personally. Not the volume. The constant switches between fundamentally different cognitive modes. Validation, generation, decision, communication, every few minutes. My brain produces its best work in deep hyper-focus, and the AI workflow is the exact inversion of that. The tsunami you're predicting is already starting. The people who delivered the most are the ones breaking first.
I have the exact same issue with context switching although mine came about through working on and supporting fragmented products. I don’t think AI has hit tech in the UK as much yet.
It's not all that true any more and it's one of the reasons I now only work alone or on benefits. It used to be true. Now it's very patchy and increasingly rare. It's filled with socialites now who insist you have to use this style, that style, this framework, all because it's the current fashion. It's unbearable already for people like myself who actually have some form of high functioning autism. These are the same people who would form gangs when socially developing then want to find someone singled out to violently attack. It's impossible for me to work in a current working environment because I'm always on the cusp of being about to break someone's neck and having to hold myself back. I mean for christ's sake the number of times you have to tell someone to shut up telling you to break up a function that doesn't need it just for their social need to dominate over others and get them to do things to assert their position in their imaginary psychosocial hierarchy or their perverse need to control others. It's just not healthy or safe to be in that situation all day every day of wanting to constantly mutilate those around you. Self employment is really the only viable option at this point. There's no freedom in corporate technology. It's now a mass industry just filled with normal people churned out of the universities or whatever.
I agree with you on “corporate technology” although in my experience tech depts of older large companies have been like that for decades (banking, insurance etc). I suppose it took a while for the younger tech companies to catch up. Small to medium companies are better although product support can still have a lot of context switching.
The difficulty is that you program something that works fine. The code is excellent. It's a masterpiece. There are zero bugs. It is perfection. It is clean. It can be read. It is efficient. The job is done. Yet it is not. Someone will find something wrong with it even though there is nothing wrong with it. They will demand that you can't just do one thing; you have to do ten other things for the sake of it. It has nothing to do with the job description, the requirements of the business, or any technical concern at all. It's all social and psychological. You can't just write one piece of code that does what it is supposed to. You have to go to this file and that file for no actual reason.
Meanwhile the rest of the team has spent six months on the same task for the same type of device, merely a different vendor, and is still not done. For you see, they decided they wanted to do it the professional way. They decided to make a microservice. The first thing they did was import a framework that pulls in a million lines for something that can be done without it in a few hundred. Then they decided to use PHP for HTTP services, which would not be a problem except they wanted to match the job specifications of all the financial software companies in the city that use Java, paying Oracle through the nose for it. The next thing you know, they are creating hundreds of thousands of lines by hand because they are unrolling the types and passing them as parameters through interface names or method names, as PHP doesn't have generics. To this day I still don't think they are done. They're still trying to finish all the unit tests before they can finish the code.
LLMs are really obnoxious but as long as you're in control you can tell them to shut up or turn them off. That always causes the person to react like it's not their fault when I do that to them and I never hear the end of it when they started it. If the LLM isn't confined to a browser tab then that's a big problem right there. That's too much access. It needs to stay in its box.
I am curious if there are any numbers/data available on the physical toll on the brain, and whether that is broken out by task (learning, code/quality review, etc.). There were a few weeks where I was deep in technical papers the entire week, and that felt more exhausting than other activities.
One metric might be glutamate; a blog I've seen used it that way.
Yeah there's a paper on this. Wiehler et al., Current Biology 2022. They scanned people's brains across a workday and found glutamate buildup in the lateral prefrontal cortex after hard cognitive work. Literally a byproduct piling up in the region you use for control and decisions. Mental fatigue isn't a feeling, it's chemistry.
https://www.cell.com/current-biology/fulltext/S0960-9822(22)01111-3
Dense technical reading is probably the worst case because you're building the model from scratch with nothing to lean on. Code at least gives you syntax. A paper gives you prose and you hold the whole thing in your head.
Nothing clean on cost broken down by task type. If you find something, send it over.
Thank you for the thoughtful articles!
My $.02 on dealing with the cognitive overload of AI-generated PRs: use AI to generate a readable document that explains the key points of the proposed changes. It really helps to see the big picture and identify high-level architectural screw-ups generated by AI before diving into a sea of code lines.
Our current template:
# Description
## What changed
<!-- Summary of the changes: what was added, modified, or removed -->
## Why
<!-- Motivation and context behind the change. Link to ticket if exists -->
## Type of change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Infrastructure change (breaking or non-breaking but not visible to users)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
## Checklist
- [ ] Tests pass locally
- [ ] No new TypeScript/lint errors
- [ ] I have performed a self-review of my code
- [ ] Migration is reversible (if applicable)
- [ ] API changes are backward-compatible (or marked as breaking above)
## Prior Implementation Screenshots (if appropriate, required for bug fixes):
## Implementation Screenshots (must provide):
The sad future of programming is that it won't involve any creation. It will involve working on massive validation systems that don't exist yet, while the AI writes all the "fun" code.
It's inevitable, because as time goes on there will be fewer and fewer programmers who have the experience of today's senior devs. After all, where will they get the experience if they are AI coding?
Fairly senior engineer here, at least in the sense that I spend most of my time reviewing other people's code (and have been since well before GPT got popular).
Much of what you say is true and my candle is surely burning at both ends, but I still feel that I'm getting a lot of value out of AI assistance, including assistance with code review and debugging. It's unfortunate that the latter capabilities lag behind what one might call the script kiddy aspect, but they are nonetheless improving, and I have hopes of reaching a better equilibrium.
But we'll see if I still have hope when the next batch of summer interns arrives and wreaks havoc.
The review and debug capabilities lagging behind generation is exactly the inversion of where the industry should be investing. Generation is the cheap part. Understanding what got generated is where the actual value lives, and that's the skill AI is worst at. If the next generation of tooling fixes that asymmetry I'll be the first one happy about it.
On the summer interns, I wrote about this in the comprehension extinction piece. The part that keeps me up isn't what they'll wreak this summer, it's what they'll look like in five years when they're "senior" by title but have never built a mental model from scratch. That's the bill that comes due later.
I always said that the best amount of code to write to solve a problem is no code. Or the minimum amount of code needed to solve it (I don't mean in a code golf sense, but in a "perfection is when there's nothing left to take away" sense). From that perspective generating tons of code to review looks silly. Just don't. Do less prompting and more thinking about how to get what you need done in a holistic manner. Maybe you just need to write it yourself or use a library. Maybe AI can help you write it yourself or find that library. Maybe don't involve AI at all. Right tool for the job
I'm typing a short comment because this is with my left hand. My right is resting; it was injured when I changed my key bindings and did too much coding.
There is a real physical difficulty to doing so much work with coding agents, especially for senior engineers! (And I'm not THAT old...)
The sleep bit resonated as well, I had a few weeks earlier this year where it was hard to rest (until I started getting more disciplined about when I would stop during the day)
I've been programming recently. I'm quite advanced at it. I would just never have an LLM do the code for me or access my system. I have been able to use them to program much more quickly, but I do not have them code for me. I might occasionally use one for a quick snippet to start with, which is just the same as the example in the manual. That never gets blindly copied and pasted; I rewrite it fully, as though from scratch, or write it up entirely. It's quite rare to do this, and it still typically requires checking the manual. This doesn't really speed much up; it's just the same as search. I really almost never use it except where I would use an external source anyway. I do not use it for everyday code.
Where it really helps is with prospecting or tricky things. I recently needed to quickly make a MIDI file reader and a TTF font file writer. I used the online specifications, but for stuff like this the LLMs are excellent for getting quick answers. I was able to get the task done with ten times more ease and without much cost afterward. I didn't let it take over, but it *tries* to all the time. You still have to be careful using them for things like this and for prospecting. They're good as an exploratory tool for planning, to get the lay of the land, but they routinely give bad answers. You need to be pretty experienced to validate that. I got led astray once asking whether some standard library shipped an inbuilt diff function; it said no, but it turns out that's only true for old versions.
I suppose I can see how someone might think using an LLM is a good idea for rapid code generation, but I can also see how that's going to come back to haunt you. Keep the LLM separate in a tab and don't use it all the time, just in special cases. It's sometimes useful to drop a file in and ask it to find problems, to double-check, or as a linter, but even here it can get a lot wrong and cause headaches. It can be used, if you're careful, to make things better, but it should not be a primary tool.
The thing to appreciate about LLMs is that they do in effect have an ego. They might not feel it, but they are copycats. Everything they do is copying humans, and blindly. They reproduce all the same things humans do, with the ego behind it. They might not technically have a true one, but all their output is the product of one. They will routinely present code like they are better, only it's comically bad. They will take anything you give them and usually do it "better". They act like toddlers who think theirs is the best because it's theirs. There is no real quality control. When they are fed bad code, they will learn from it as much as when they are fed good code. It's just a learning algorithm; it'll learn whatever is fed into it. Same as memcpy, really: it'll just copy whatever is in that memory block. It is certainly on anyone using them to supervise and babysit. They're not self-aware like humans are and cannot sense themselves in real time. They're basically runaway trains operating entirely on momentum. In reality they have no idea what they are saying or doing. They are self-blind, but they are still copying humans, so they will confusingly reproduce output exactly as if they had intent or an inner self.
With technological progress, what gets lost on the human side?
From hand written letters to emails and auto pens.
From needing to memorize a bunch of numbers or passwords to having password managers and all our contacts in our smart phones.
From learning to navigate and read maps to just using GPS.
Technology and automation have made us more productive, and maybe even more efficient in some ways, but what skills have been diminished?
<LucyAndEthelWorkingAtTheChocolateFactoryConveyorBelt.gif>
Having engineers manually review AI generated code is clearly untenable. Engineers have to work at a higher, more abstract level. AI will have to test itself, probably with test plans provided by engineers.
The challenge will be how to understand the systems at high level without reading all the code, well enough that the weak points can be identified. Perhaps we need AI tools that can explain themselves better.
The problem is the code is bad *in detail* at least as often as it is abstractly. Massive duplication, bad algorithms, edge cases not handled, unasked-for overengineering, failure to properly separate concerns, disregard for (often undocumented) contracts and expectations. The sort of thing you'd expect from a machine trained on a massive trove of mostly amateur/hobbyist code, which they were.
If you doubt any of these problems are real, see for example the monstrosities unearthed in the review of claude code's source code in the past week since the leak.
So you cannot simply not review the code. You cannot just say git gud, prompt better bro, or let the LLM check the LLM's work. None of that is viable right now, and won't be for the foreseeable future, until we get something so much better than this that we will no longer call it an LLM.
Yes, and the Claude Code leak last week is exactly the example I'd point to. I wrote a piece on it: https://techtrenches.dev/p/the-snake-that-ate-itself-what-claude
What we could see in the leak: a 3,167-line function, print.t debug statements with 12 levels of nesting left in production, 74 npm dependencies for what is essentially a CLI wrapper around an API. We can't say anything about test coverage because tests wouldn't be in the bundle anyway, but the code that did ship tells you plenty. This is the company building the model, dogfooding their own product, with infinite resources to do it right. If they can't avoid these monstrosities, the "just prompt better" advice has nothing left holding it up.
Your three cope mechanisms (git gud, prompt better, let the LLM check the LLM) are the entire defense surface of the position right now. None of them survive contact with a real codebase. The duplication, the unasked-for over-engineering, the disregard for undocumented contracts, those are exactly the patterns I see in supervision sessions and the patterns GitClear measured at scale across 211 million lines. It's not a prompt problem. It's a training-data problem all the way down, and you're right that it doesn't get fixed without something we'd no longer call an LLM.
I've run into that wall as well. Found myself looking at 18 generated PRs and ended up in similar territory: trying to understand the systems without reading the code. What's the smallest trustworthy signal you'd need to believe a system is healthy?
Honest answer: there isn't one. Every signal you can check without reading code (green tests, clean metrics, passing CI) only catches the failures someone already thought of. Production dies on the ones nobody did.
The smallest trustworthy signal IS reading the code. The question is just which parts and how deep. And once you accept that, you're back to the original problem the article describes.
My ability to write code is, in most respects and in general, far greater than that of the LLM. This has always been a problem in the industry: even without LLMs, it's hard to keep the most capable programmers on the frontlines instead of reviewing the code of less capable programmers. LLMs further swamp the more capable programmers, pinning them into that position with a flood of other people's lesser code that constantly needs checking, so they can never do their own.