Cursor spent $3 billion to stop losing money on every user. OpenAI won't break even until 2030. AI companies are valued like software but run like utilities. The math is catching up.
The scariest point you made is the reliance on AI by juniors and seniors (soon). It seems we created our own trap and there's no way out. Don't get my wrong, I love AI and everything it has to offer. I'd like to use it on my own terms rather than becoming a slave.
Isn’t the promise of Deepseek to achieve similar results at a fraction of the cost? Why hasn’t that taken off? Open AI and Claude both tell us how much better each iteration of their product is, but I don’t notice much difference personally.
You're right that you don't notice much difference. That's because there isn't much. Core capability plateaued around Opus 4.5 level. Since then both companies pivoted from model improvements to product launches. Claude Code, Artifacts, computer use, agents. Codex, Operator, Canvas, o-series. New products for PMs, designers, everyone. Not because the models got better, but because announcements are the product now. Investors need to see momentum. Momentum needs headlines. And the money from those rounds goes straight into compute to keep the lights on.
I love how this article spells out bubble without even mentioning the word. The saving grace of this is that we can kiss the AI god Armageddon good bye. The only thing that can happen to make any of this sustainable would be nuclear fusion
A couple of things I like about Cursor is that it operates as a broker - so I can use their own model or any of the OpenAI and Anthropic models. And that most of the time their in-house model is good enough (I've adapted my workflow to make most requests medium-complexity; it's interesting to see what requests get routed to the high-end models).
"Most of our clients run production RAG on OpenAI embeddings. The deprecation email meant one thing: their entire knowledge infrastructure sits on a model that a single API announcement can kill."
Time to run a local instance instead of a cloud one I think. This seems to be increasing practical and as old models are mostly opensourced, I think it ought to be possible to set one up using the same model and just point the API to your local instance instead of the cloud one
Works great at small scale. Once you're at enterprise volumes the math gets ugly. CPU embedding inference is 10-50x slower than GPU, and the open-source models that actually match OpenAI on quality are 7B+ params, so you need real GPU anyway. One H100 is $1,500-5,000/month, if you can even get one right now. Most enterprise clients are not going to build an ML ops pipeline just to serve embeddings. They'll keep paying OpenAI until the deprecation email arrives.
I thought you said the RAGs were on some older smaller model. That's what I was thinking of moving local. Sure the 7B+ models need real GPUs but I would expect it to be easy to run the RAG stuff on something simpler like a Mac mini. Heck my boss runs RAG w Ollama on his gaming laptop....
But the point about outsourcing makes sense. After all people stick everything on AWS or Azure when they could trivially host it themselves in a data center for far less.
I'm a number guy and I loved this one!
The scariest point you made is the reliance on AI by juniors and seniors (soon). It seems we created our own trap and there's no way out. Don't get my wrong, I love AI and everything it has to offer. I'd like to use it on my own terms rather than becoming a slave.
I restacked this post with a comment on exactly that point - https://substack.com/profile/13379579-francis-turner/note/c-265852080
AI dependency like that is really really bad
Isn’t the promise of Deepseek to achieve similar results at a fraction of the cost? Why hasn’t that taken off? Open AI and Claude both tell us how much better each iteration of their product is, but I don’t notice much difference personally.
You're right that you don't notice much difference. That's because there isn't much. Core capability plateaued around Opus 4.5 level. Since then both companies pivoted from model improvements to product launches. Claude Code, Artifacts, computer use, agents. Codex, Operator, Canvas, o-series. New products for PMs, designers, everyone. Not because the models got better, but because announcements are the product now. Investors need to see momentum. Momentum needs headlines. And the money from those rounds goes straight into compute to keep the lights on.
I love how this article spells out bubble without even mentioning the word. The saving grace of this is that we can kiss the AI god Armageddon good bye. The only thing that can happen to make any of this sustainable would be nuclear fusion
https://techtrenches.dev/p/the-ai-industrial-transformation this one with lots of bubbles :)
A couple of things I like about Cursor is that it operates as a broker - so I can use their own model or any of the OpenAI and Anthropic models. And that most of the time their in-house model is good enough (I've adapted my workflow to make most requests medium-complexity; it's interesting to see what requests get routed to the high-end models).
You can even use your local models, but it is possible even with VSCode :)
"Most of our clients run production RAG on OpenAI embeddings. The deprecation email meant one thing: their entire knowledge infrastructure sits on a model that a single API announcement can kill."
Time to run a local instance instead of a cloud one I think. This seems to be increasing practical and as old models are mostly opensourced, I think it ought to be possible to set one up using the same model and just point the API to your local instance instead of the cloud one
https://www.jaredwatkins.com/posts/2026/05/smb-inference-stack/ had some very practical advice about what to build and how much it costs
Works great at small scale. Once you're at enterprise volumes the math gets ugly. CPU embedding inference is 10-50x slower than GPU, and the open-source models that actually match OpenAI on quality are 7B+ params, so you need real GPU anyway. One H100 is $1,500-5,000/month, if you can even get one right now. Most enterprise clients are not going to build an ML ops pipeline just to serve embeddings. They'll keep paying OpenAI until the deprecation email arrives.
I thought you said the RAGs were on some older smaller model. That's what I was thinking of moving local. Sure the 7B+ models need real GPUs but I would expect it to be easy to run the RAG stuff on something simpler like a Mac mini. Heck my boss runs RAG w Ollama on his gaming laptop....
But the point about outsourcing makes sense. After all people stick everything on AWS or Azure when they could trivially host it themselves in a data center for far less.
Totally doable for personal use or a small team. For business at scale, unlikely.
Sounds like the notion of vaporware has now infected the financial side of AI.