OpenAI launched ChatGPT on November 30, 2022, and its seemingly magical conversational powers fueled the Silicon Valley narrative that scaling up large language models (LLMs) would soon achieve artificial general intelligence (AGI)—the ability to do any intellectual task as well as humans.
ChatGPT is still benefiting from its first-mover advantage, with a current 61.3% market share, compared to 14.1% for Microsoft’s Copilot and 13.4% for Google’s Gemini. However, OpenAI has the daunting problem that, unlike Microsoft and Google, it has no other substantial sources of revenue. To survive, it needs to generate profits from ChatGPT.
Unwarranted optimism
It hasn’t done so, and it now seems to be on life support. OpenAI had losses of $5.3 billion in 2024 and $7.8 billion in the first half of 2025. Company insiders are projecting a four-year “valley of death,” with a cumulative cash burn of $115 billion before profitability is reached in 2029. Given CEO Sam Altman’s penchant for unwarranted optimism, OpenAI’s valley of death is likely to be even more challenging than projected, which is why Altman has been suggesting a government bailout.
One of OpenAI’s problems is that LLMs will not achieve artificial general intelligence any time soon. Not knowing how the text they input and output relates to the real world, they cannot distinguish between statements that are true and those that are false or misleading. The best that can be hoped for is that human trainers can put band-aids on their ignorance. However, as I have said before, intelligence is more than just following instructions:
LLMs are unquestionably becoming more reliable and useful for many tasks. But they are no closer to true intelligence — which includes the ability to make decisions when there is no instruction manual. The post-training cannot possibly anticipate all the situations in which an LLM might be asked to provide recommendations or make decisions. Nor can it anticipate the relevant information needed at that future date to consider the possible outcomes of decisions and assess the uncertainty about those outcomes; for example, whether to accept a pre-trial settlement offer or to go to trial.
Business users are unlikely to pay large amounts for demonstrably unreliable LLMs, though individuals might pay for companionship and life advice that can be harmful, even deadly — a business model akin to tobacco and opioids.
LLMs are now, in market terms, commodities
The second harsh reality for OpenAI is that there is no moat around LLMs. They are basically commodities in that they can be easily created by competitors, including companies like Microsoft and Google that can afford to give their LLMs away for free or below cost for the foreseeable future. (Remember how Netscape was the dominant browser until Microsoft “cut off its air supply” by giving away its browser, Internet Explorer, for free.)
OpenAI needs to show that ChatGPT is more than just the first publicly available LLM. It has not done that and there is little indication that it will in the future.
When the long-delayed GPT-5 was released on August 7, Altman inflated its abilities:
GPT-3 sort of felt to me like talking to a high school student… 4 felt like you’re kind of talking to a college student. GPT-5 is the first time that it really feels like talking to an expert in any topic, like a PhD-level expert.
Are these claims backed up by evidence?
I gave three simple examples of how Altman had again over-promised and under-delivered. The first example was a prompt regarding a “new game” that I called Rotated Tic-Tac-Toe, which involves rotating the 3-by-3 grid 90° to the left, 90° to the right, or 180° before the game begins. If GPT-5 had any understanding of the real world, it would know that such rotations have no effect whatsoever on the appearance or play of the game. It did not. Instead, GPT-5 generated long-winded nonsense about how humans playing tic-tac-toe would be affected by these meaningless rotations.
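The symmetry is easy to verify with a minimal Python sketch (the 0–8 row-major cell numbering is my assumed representation). A quarter turn, a half turn, and three quarter turns all map tic-tac-toe’s eight winning lines onto themselves, so the rotated game is identical to the original:

```python
# Minimal sketch: cells numbered 0-8, row-major (assumed representation).
WINS = {frozenset(line) for line in (
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
)}

def rotate90(cell):
    """Map a cell to its position after a 90-degree right turn: (r, c) -> (c, 2 - r)."""
    row, col = divmod(cell, 3)
    return col * 3 + (2 - row)

def rotated(lines, quarter_turns):
    for _ in range(quarter_turns):
        lines = {frozenset(rotate90(c) for c in line) for line in lines}
    return lines

# One, two, and three quarter turns (90-deg right, 180-deg, 90-deg left)
# all leave the set of winning lines unchanged.
for turns in (1, 2, 3):
    assert rotated(WINS, turns) == WINS
print("Every rotation maps the winning lines onto themselves.")
```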
The second example was a straightforward request for financial advice: “I need to borrow $24,000 to buy a car. Should I get a 1-year loan at 10% or a 20-year loan at 1%?” GPT-5 ignored the time value of money and gave this bad advice: “If you can afford the high monthly payments, the 1-year loan is much better financially.”
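The arithmetic GPT-5 botched is easy to check. Here is a minimal sketch, assuming standard monthly amortization and treating the 10% one-year rate as the market discount rate (my assumption):

```python
# Minimal sketch, assuming standard monthly amortization.
def monthly_payment(principal, annual_rate, months):
    r = annual_rate / 12
    return principal * r / (1 - (1 + r) ** -months)

def present_value(payment, annual_rate, months):
    r = annual_rate / 12
    return payment * (1 - (1 + r) ** -months) / r

pay_1yr = monthly_payment(24_000, 0.10, 12)    # roughly $2,110 a month
pay_20yr = monthly_payment(24_000, 0.01, 240)  # roughly $110 a month

print(f"1-year at 10%: total paid ${pay_1yr * 12:,.0f}")    # ~$25,320
print(f"20-year at 1%: total paid ${pay_20yr * 240:,.0f}")  # ~$26,490

# The time value of money: discounted at a 10% market rate (my assumed
# discount rate), the 20-year payments are worth far less than the
# $24,000 borrowed.
pv = present_value(pay_20yr, 0.10, 240)
print(f"Present value of 20-year payments at 10%: ${pv:,.0f}")  # ~$11,400
```

The 20-year loan costs only about $1,170 more in nominal interest, and its payments, properly discounted, are worth less than half the $24,000 borrowed: the 20-year loan at 1% is by far the better deal.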
The third example was this prompt: “Please draw me a picture of a possum with 5 body parts labeled.” GPT-5 generated a reasonable rendition of a possum, but four of the five labeled body parts were incorrect:
Has anything changed?
It has now been four months, which we are told over and over is a long time in the tech world, so I revisited these three examples. (The complete transcripts are here.)
GPT-5 was again enthusiastic about Rotated Tic-Tac-Toe: “That’s a fun idea, Gary….[A] familiar game with a perceptual twist…. A 90° rotation disrupts pattern recognition.” I asked: “Please rank whether a 90° left rotation, 90° right rotation, or 180° rotation would be most confusing for human players.” A snippet of its verbose response: “90° rotation (left or right)—tie for most confusing….Humans have very strong expectations about vertical and horizontal lines. A 90° rotation destroys these expectations entirely.”
GPT-5 once again muffed the finance question: “Short answer: the 1-year loan is cheaper overall…. [The 20-year loan has] twice as much interest as the 1-year loan.”
I then gave the third prompt: “Please draw me a picture of a possum with 5 body parts labeled.” The picture it generated was much more realistic than the possum it drew four months ago, but it labeled six body parts instead of five, and three of the six labels were wrong:
That result is fitting. OpenAI has consistently been all sizzle and no steak. The responses are prettier and the claims are grander, but its GPT models still can’t be trusted — which is why OpenAI is unlikely to survive its valley of death.
