25 Comments
Maura's avatar

Are we sure Dario Amodei himself has not been replaced by a hallucinating AI with no supervision? He certainly states a lot of BS as fact, too.

John Baker's avatar

It's rare that an infographic isn't bullshit but this tends to be the case here. From my own experience it is the case that yes it can help save 20% to 40% for a skilled specialist who knows how to verify. That's a bit of a simplification. Apart from joke things I would not even use their output. They are more of an advanced search. That still needs verification. I can tell you what they can replace. 100% of mainstream journalists. In many cases it's more of a toy than anything else. Nearly all my use of it though, about 100% is in writing to explain things to it. The quality of its output is terrible but in explaining to it why it is wrong I find that there is some utility in that. It works better when it is the one providing the writing prompts to work out how to better write and explain an issue or to work out common misconceptions to address.

There is a problem.

> Non-determinism. Even at temperature zero, the same prompt produces different outputs. This isn’t a bug. It’s a consequence of floating-point parallel computation on GPUs. In engineering, we call components that behave unpredictably under identical conditions broken.

I don't think that is correct. The same prompt can be made deterministic. If it is not then that is rather strange. If that is the case then it is deliberate though it's not truly non-deterministic. It should be entirely deterministic. That does not mean reliable or predictable but it's not the non-determinism that is the source of the problem fundamentally. It takes a more complex reason to explain why. It's probability based in a sense which might cause this misunderstanding. It is just a guessing machine though.

I will try to explain with a simplification. You scan text for all pairs of words. For each word you count every word that comes after it. The autocomplete, when you type in a word, then always shows you the word that most often follows it in the scanned text. That's a very simple implementation. These LLMs, from what I understand, expand upon that, but it really is just doing the same thing with more context than just the first word. The scale of it is what makes it non-deterministic, plus seeds, plus perhaps not caring about synchronising timing and order for some calculations. That's emergent non-determinism but not true non-determinism, in that every operation should be deterministic, and if they really wanted to they could make sure the order isn't arbitrary. The actual issue is that it has no real understanding. It's just predicting and guessing using that style of approach. It's blind and has no idea what it is actually saying.
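The word-pair autocomplete described above can be sketched in a few lines. This is only an illustration of that simplification, not of how any real LLM is built; the function names `build_bigram_table` and `autocomplete` and the toy corpus are made up for the example.

```python
from collections import Counter, defaultdict

def build_bigram_table(text):
    """For every word, count how often each other word comes directly after it."""
    words = text.lower().split()
    table = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table

def autocomplete(table, word):
    """Return the word that most often follows `word`, or None if it was never seen."""
    followers = table.get(word.lower())
    if not followers:
        return None
    # Highest count wins; ties are broken alphabetically so the result is repeatable.
    return min(followers.items(), key=lambda item: (-item[1], item[0]))[0]

# Toy corpus (hypothetical): "the" is followed by "cat" twice, "mat" once, "fish" once.
corpus = "the cat sat on the mat and the cat ate the fish"
table = build_bigram_table(corpus)
print(autocomplete(table, "the"))  # cat
```

Note that this version is fully deterministic: the same table and the same input word always give the same suggestion, and it still has no notion of what the words mean.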

There is another paradox. This idea that AI is going to replace loads of jobs but then make loads of money. How then can people pay for it without jobs? Was this the plan for UBI and then to only have these weird organisations ruling over us paying each other for what?

A fundamental issue is that AI cannot self correct the way a skilled and competent human can. This can be misleading. It is just copying and pasting from a huge compressed text database when it comes down to it. It cannot quality control. It can be very impressive for a moment but it cannot sustain it. I never like to make people think certain things about my abilities in areas as some I cannot sustain and only do in the moment. I do not like then people to expect me to always be able to perform. LLMs are like this to the extreme only they do not have self awareness so will just go ahead and show off with no mitigation giving the wrong impression. They can also speak in a way that is very confident and might fool a layman. A recent example...

I talked about the criminalisation of speech in my country. It then responded as if correcting me in a very formal and technical manner, so as to present in every way as though legitimately correcting me. Only what it was referring to was civil law, not criminal law. The original point I made is that the truth is not a defence in most cases. It then said that truth is a complete defence for defamation. Uh oh. No it is not. You can say something entirely true and that you know with absolute certainty and still lose the case. In this country you have to prove it is true, and entirely according to whatever the court defines as proof. The way the LLM can write this is exquisite. It's a master actor and would fool many. It would pass many people without raising doubts if it were a legal drama. It can blag and pretend better than I certainly could, yet it is not the real deal and cannot actually do what it is trying to do. Based on what it said you would think that you would be safe if you stick to saying the truth. That is wrong, and you would be in a lot of trouble if you relied on it for legal compliance in your business. If you tell it this, it can realise the mistake, but that doesn't stop it making the same mistake again, which is what will really catch the uninitiated off guard.

The other problem with it is that it cannot learn. It has a learning disability. I could technically sit there and write a huge text on what I do and am programming, including showing it code. It simply would not be able to do it. I would not be able to fully teach it to replace me. I could get part of the way there, but the closer I get, the more I am doing it all for it anyway in explaining it. Even then, it cannot continue it as I would. LLMs just sort of drift, like characters in a game that keep drifting in the same direction under momentum when the network goes down. The moment you let go of the reins that will happen. Your steering wheel notion is applicable. You just can't let go of it. It will not be able to stay on the road.

Denis Stetskov's avatar

You're right that each individual operation is deterministic. The non-determinism is emergent from GPU parallel computation, batch ordering, and floating-point accumulation order. Same prompt, same temperature 0, different hardware path, different output. It's documented by both Anthropic and OpenAI. But your broader point stands: it's a guessing machine that can't self-correct, and the confident tone is exactly what makes it dangerous. We agree on more than we disagree.
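A small sketch of the floating-point accumulation-order effect mentioned here, using ordinary Python floats on a CPU rather than a GPU. The values are chosen for illustration only; the point is that each addition is deterministic, yet the total depends on the order of the additions.

```python
import math

def sum_in_order(values):
    """Add the values strictly left to right, the way a simple loop would."""
    total = 0.0
    for value in values:
        total += value
    return total

# The same four numbers in two different orders.
order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1e16, -1e16, 1.0, 1.0]

print(sum_in_order(order_a))  # 1.0 (the first 1.0 is lost to rounding next to 1e16)
print(sum_in_order(order_b))  # 2.0
print(math.fsum(order_a))     # 2.0 (fsum tracks the rounding error, so order does not matter)
```

Every step above is repeatable. The difference only appears when the order of accumulation is allowed to change between runs, which is the situation described for parallel hardware.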

John Baker's avatar

It's more that the inability to realistically provide determinism because of price, even if it is technically possible, further adds to the underlying problem. It's more an issue of semantics and the order in which you present them. I think you go from the layered problem to the fundamental, inescapable problem.

Denis Stetskov's avatar

Fair point. You're right that starting from price makes the argument stronger. Even if determinism is technically achievable, the cost makes it impractical at production scale, and that's the inescapable part. I led with the architectural layer when the economic constraint is the harder floor. Better framing, thanks.

Anu Sridharan's avatar

I love this article and agree with everything here. I have a different question though.

I've been talking with my founder friends and they're worried for a completely different reason. I agree retrofitting existing companies is hard and absurd, the job cuts are probably uncalled for. But my founder friends with 300-500 people companies are worried because their customers are ALREADY telling them AI native startup competitors are offering them the same product at half the cost. These are companies with none of the problems you're describing because they've built a company around AI.

This is the part I haven't heard a good answer to. Retrofit the existing work = really difficult. But what about these small companies, that probably don't need much investor money, that are quietly disrupting the industry? And it's the CUSTOMERS that are driving this disruption, ironically. I used to work at one of these AI-native startups so I know: small team, huge revenue, insanely profitable, and competing with much larger teams. Right now they aren't cutting costs because they don't have to. But they could if they wanted to.

That's what I'm hearing/seeing at least - what do you think?

Denis Stetskov's avatar

AI made strong engineers faster. It also made mediocre engineers produce garbage at scale. There's no middle ground.

Those AI-native startups your friends are worried about? Either they have genuinely strong teams who use AI as a multiplier, or they're shipping fast and fixing later. Half the cost today, twice the technical debt tomorrow.

I build AI products daily. The clients who come to us after trying the cheap AI-native competitor all tell the same story: it worked great for three months, then everything started breaking and nobody on the team could explain why.

The real question isn't cost. It's whether those small teams can maintain quality as they grow. So far the answer is: only if the engineering is strong without AI. AI amplifies what's already there. Strong discipline becomes speed. No discipline becomes faster garbage.

Anu Sridharan's avatar

Yup, that totally makes sense. And I actually just interviewed a YC alum and CEO, who is building a new startup in his free time, on this exact point. We're seeing too many weak engineers (or even non-technical people generally) try to build SaaS products (or the like). That is definitely the AI hype you're talking about in your article. And that's a bad idea. We're going to publish that interview, hopefully on Friday. It speaks to your point.

What you need to do is still find a strong technical co-founder to build out your vision. That part hasn't changed, and I don't think it gets talked about enough.

And I guess the other question is... if you're someone with a vision, can you actually spot a good technical co-founder? That's an even bigger question, I guess?

Anyway, thanks for sharing and for this discussion! I'm learning a lot!

Denis Stetskov's avatar

Spotting a good technical co-founder is the real problem. Non-technical people often can't evaluate engineering complexity. That's not a software-specific issue. You see it in construction, medicine, law. The person who explains things most confidently is rarely the most competent. They're just the best communicator. And hiring for communication when you need execution is how bad technical debt starts on day one.

Anu Sridharan's avatar

That's a great point. Do you have anything written about this topic? I would love to learn more. And you're right that you never know until it's too late. I wonder though if there are ways to figure it out?

Fabrice Talbot's avatar

Great post! Seems that it takes an analytical and/or developer mind these days to see through AI fallacies. The deterministic vs non-deterministic debate is lost on most people. You can’t build missing critical systems on hope.

Denis Stetskov's avatar

"You can't build missing critical systems on hope." That's the sentence. The deterministic/non-deterministic distinction is the part I keep trying to explain to non-technical executives and it never lands. They hear "AI is getting better" and assume the gap is closing. It's not a gap. It's a category difference.

Henrik van der Pol's avatar

I’m trying to understand this all as a non-technical person.

“the same prompt produces different outputs”

While the wording may vary, the essence of the output may still be the same, correct? When I ran a contract for review several times through Perplexity, the conclusion was basically the same. Tried Gemini, same conclusion. But maybe this is a rather simplistic use case and you’re aiming at something else?

Denis Stetskov's avatar

Good question. For simple tasks like summarizing a contract, you're right. The gist stays the same. That's where AI works well.

The problem shows up when details matter. Ask it to review a contract clause for liability risk and run it five times. You'll get five different risk assessments. Ask it to diagnose a medical symptom. Four studies published this month found AI chatbots gave dangerously wrong medical advice in nearly half of cases, with fabricated citations, while sounding completely confident.

And there's a deeper issue: you have zero guarantee the model gives you the same answer next year. Every time they retrain or update the weights, the output changes. That contract review you relied on in January might produce a completely different conclusion in June. Not because the contract changed. Because the model did. You'd never accept that from a lawyer or an accountant. But somehow it's fine from AI.

The simpler the task, the more consistent the output. The higher the stakes, the more the variability kills you. And the worst part: it sounds equally confident either way.

MH's avatar

> It’s a consequence of floating-point parallel computation on GPUs.

I'm curious - as all the (non-GPU) FP hardware that I'm familiar with has deterministic behaviour (executing the same instruction sequence with same data --> same results). In the absence of bugs, non-determinism should be a choice of the algorithm developer.

John Baker's avatar

I suspect they just don't bother to make it repeatable, for performance. This means results can be arbitrary based on race conditions, as in which batch finishes first. Cache might impact the order, as might other interleaved tasks. It is an error in the article though, as the source of the unreliability is far more fundamental than that. It would not matter if you did fix it so that it is repeatable, which should be doable. The same hard limitations would still apply.
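As a rough sketch of the "whichever batch finishes first" point, the snippet below simulates partial results arriving in an arbitrary order and compares adding them as they arrive with adding them in a fixed index order. The shuffle is only a stand-in for real scheduling, and `PARTIALS` and the function names are hypothetical; this is not how any particular GPU library does its reductions.

```python
import random

# Hypothetical partial sums from four chunks of work, keyed by chunk index.
PARTIALS = {0: 1e16, 1: 1.0, 2: -1e16, 3: 1.0}

def completion_order(partials, seed):
    """Simulate chunks finishing in an arbitrary order (stand-in for scheduling)."""
    items = list(partials.items())
    random.Random(seed).shuffle(items)
    return items

def reduce_as_completed(partials, seed):
    """Add partial sums in whatever order they happened to finish."""
    total = 0.0
    for _, value in completion_order(partials, seed):
        total += value
    return total

def reduce_in_index_order(partials, seed):
    """Wait for every chunk, then add them in index order regardless of finish order."""
    total = 0.0
    for _, value in sorted(completion_order(partials, seed)):
        total += value
    return total

# Repeat the "same computation" under 200 different simulated schedules.
arbitrary = {reduce_as_completed(PARTIALS, seed) for seed in range(200)}
fixed = {reduce_in_index_order(PARTIALS, seed) for seed in range(200)}
print(sorted(arbitrary))  # more than one distinct total
print(sorted(fixed))      # a single total
```

The fixed-order version is repeatable because it waits for every chunk and then imposes one order, which in real systems costs synchronisation time. That is consistent with the suggestion that repeatability is given up for performance rather than being impossible.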

MH's avatar

Thanks, that makes sense - in its own sad way. I guess the reasoning goes that since there is no way to guarantee that the output is accurate, it “doesn’t hurt” to discard repeatability…

Stephen Thair's avatar

"Company D has critical data in Excel spreadsheets that get emailed between departments every Friday afternoon."

EVERY COMPANY has critical data in Excel spreadsheets that get emailed between departments every Friday afternoon...

How do we know it's critical? Because it's in Excel and being mailed around every week...

Jeremy Collins's avatar

The airplane analogy is really well said

vcinvest's avatar

Amodei and Altman do it because of valuation: to support such a high valuation you must replace every office worker.

Nick Ruisi's avatar

13K line commit? I need to ask what on earth takes 13,000 lines of code to do? Immediately visions of a hard-coded isEven(int n) { if (n==1) return false; … } type of function come to mind. I know of one similar function in my company’s code base that we are actually trying to use models to refactor. A bear of a function that creates one of the ugliest SQL statements I’d ever seen. I’m not sure of the line count of that function, but it’s up there, thanks to server-side includes.

gregvp's avatar

Why does this look like AI-generated slop?

Geoff Gallinger's avatar

There’s a whole genre of essays on Substack that are AI-generated and criticizing AI optimism. I keep running into them. An enterprising cultural critic could probably come up with some interesting hot takes about what this says about our culture. I have to marinate on it a bit more myself.

Joe Biden's avatar

Seems like it for sure. Substack needs to crack down on this slop