13 Comments
Fabrice Talbot's avatar

I'm a numbers guy and I loved this one!

The scariest point you made is the reliance on AI by juniors and (soon) seniors. It seems we created our own trap and there's no way out. Don't get me wrong, I love AI and everything it has to offer. But I'd like to use it on my own terms rather than become a slave to it.

Francis Turner's avatar

I restacked this post with a comment on exactly that point - https://substack.com/profile/13379579-francis-turner/note/c-265852080

AI dependency like that is really really bad

pobrecollie's avatar

Isn’t the promise of DeepSeek to achieve similar results at a fraction of the cost? Why hasn’t that taken off? OpenAI and Anthropic both tell us how much better each iteration of their product is, but I don’t notice much difference personally.

Denis Stetskov's avatar

You're right that you don't notice much difference. That's because there isn't much. Core capability plateaued around Opus 4.5 level. Since then both companies pivoted from model improvements to product launches. Claude Code, Artifacts, computer use, agents. Codex, Operator, Canvas, o-series. New products for PMs, designers, everyone. Not because the models got better, but because announcements are the product now. Investors need to see momentum. Momentum needs headlines. And the money from those rounds goes straight into compute to keep the lights on.

Juan Pablo Barraza's avatar

I love how this article spells out "bubble" without even mentioning the word. The saving grace is that we can kiss the AI god Armageddon goodbye. The only thing that could make any of this sustainable would be nuclear fusion.

MH's avatar

A couple of things I like about Cursor are that it operates as a broker, so I can use their own model or any of the OpenAI and Anthropic models, and that most of the time their in-house model is good enough. (I've adapted my workflow to make most requests medium-complexity; it's interesting to see which requests get routed to the high-end models.)

Denis Stetskov's avatar

You can even use your local models, though that is possible with plain VS Code too :)

Francis Turner's avatar

"Most of our clients run production RAG on OpenAI embeddings. The deprecation email meant one thing: their entire knowledge infrastructure sits on a model that a single API announcement can kill."

Time to run a local instance instead of a cloud one, I think. This seems to be increasingly practical, and as old models are mostly open-sourced, I think it ought to be possible to set one up using the same model and just point the API to your local instance instead of the cloud one.

https://www.jaredwatkins.com/posts/2026/05/smb-inference-stack/ had some very practical advice about what to build and how much it costs.
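
To make the "just point the API at your local instance" idea concrete, here is a minimal sketch, not a tested production setup. It assumes the local server exposes an OpenAI-style `/v1/embeddings` route (Ollama's OpenAI-compatible endpoint on its default port is used as the example), and the model name `nomic-embed-text` is only an illustration. The helper names `build_embedding_request` and `embed` are hypothetical.

```python
import json
import urllib.request

# Assumption: both servers accept the same OpenAI-style embeddings request.
CLOUD_BASE_URL = "https://api.openai.com/v1"
LOCAL_BASE_URL = "http://localhost:11434/v1"  # assumed local Ollama instance


def build_embedding_request(base_url, model, texts, api_key="not-needed-locally"):
    """Build (but do not send) a POST request to an OpenAI-style embeddings route."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def embed(base_url, model, texts, api_key="not-needed-locally"):
    """Send the request and return one embedding vector per input text."""
    request = build_embedding_request(base_url, model, texts, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    return [item["embedding"] for item in body["data"]]


if __name__ == "__main__":
    # Switching providers is a one-argument change; the model name is illustrative.
    vectors = embed(LOCAL_BASE_URL, "nomic-embed-text", ["deprecation email"])
    print(len(vectors), "vector(s) of dimension", len(vectors[0]))
```

The catch is that vectors from different embedding models are not interchangeable, so this only avoids a full re-embedding of the knowledge base if the local model really is the same one that built the index.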

Denis Stetskov's avatar

Works great at small scale. Once you're at enterprise volumes the math gets ugly. CPU embedding inference is 10-50x slower than GPU, and the open-source models that actually match OpenAI on quality are 7B+ params, so you need real GPU anyway. One H100 is $1,500-5,000/month, if you can even get one right now. Most enterprise clients are not going to build an ML ops pipeline just to serve embeddings. They'll keep paying OpenAI until the deprecation email arrives.

Francis Turner's avatar

I thought you said the RAG setups were on some older, smaller model. That's what I was thinking of moving local. Sure, the 7B+ models need real GPUs, but I would expect it to be easy to run the RAG stuff on something simpler like a Mac mini. Heck, my boss runs RAG with Ollama on his gaming laptop...

But the point about outsourcing makes sense. After all, people stick everything on AWS or Azure when they could trivially host it themselves in a data center for far less.

Denis Stetskov's avatar

Totally doable for personal use or a small team. For business at scale, unlikely.

Denise Heap (private)'s avatar

Sounds like the notion of vaporware has now infected the financial side of AI.