Why OpenAI o1 Might Be More Hype Than Breakthrough

[Header image: colourful figures climbing, pushing, and leaning on the dark green letters of the phrase “Hi, I am AI” against a yellow background. Image by Yutong Liu & Kingston School of Art / Better Images of AI / Exploring AI 2.0 / Licensed under CC-BY 4.0, adapted by Patricia Gestoso.]

OpenAI has done it again — on September 12th, 2024, they grabbed the headlines by releasing a new model, OpenAI o1. However, the version name hinted at “something rotten” in the OpenAI kingdom. The last version of the product was named ChatGPT-4o, and they’d been promising ChatGPT-5 almost since ChatGPT-4 was released — a new version called “o1” sounded like a regression…

But let me reassure you right away—there’s no need to fret about it.

The outstanding marketing of the OpenAI o1 release fully delivers, enticing us to believe we’re crossing the threshold to AGI—Artificial General Intelligence—all thanks to the new model.

What’s their secret sauce? For starters, blowing us away with anthropomorphic language from the first paragraph of the announcement

“We’ve developed a new series of AI models designed to spend more time thinking before they respond.”

and then resetting our expectations when explaining the version name

“for complex reasoning tasks this is a significant advancement and represents a new level of AI capability. Given this, we are resetting the counter back to 1 and naming this series OpenAI o1.”

That’s the beauty of being the top dog of the AI hype. You get to

  • Rebrand computing as “thinking.”
  • Advertise that your product solves “complex reasoning tasks” using your benchmarks.
  • Promote that you deliver “a new level of AI capability.”

Even better, OpenAI manages to sell us a performance regression — spending more time performing a task — as an indication of human-like capabilities.

“We trained these models to spend more time thinking through problems before they respond, much like a person would. Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes.”

I’m so in awe of OpenAI’s media strategy for the launch of the o1 models that I did a deep dive into what they said — and what they didn’t.

Let me share my insights.

Who Is o1 For?

OpenAI’s marketing is crystal clear about the target audience for the o1 models — sectors such as healthcare, semiconductors, quantum computing, and coding.

Whom it’s for
These enhanced reasoning capabilities may be particularly useful if you’re tackling complex problems in science, coding, math, and similar fields. For example, o1 can be used by healthcare researchers to annotate cell sequencing data, by physicists to generate complicated mathematical formulas needed for quantum optics, and by developers in all fields to build and execute multi-step workflows.

OpenAI o1-mini
The o1 series excels at accurately generating and debugging complex code. To offer a more efficient solution for developers, we’re also releasing OpenAI o1-mini, a faster, cheaper reasoning model that is particularly effective at coding. As a smaller model, o1-mini is 80% cheaper than o1-preview, making it a powerful, cost-effective model for applications that require reasoning but not broad world knowledge.

Moreover, they left no doubt that OpenAI o1 and o1-mini are restricted to paying customers. However, never wanting to get bad press, they mention plans to “bring o1-mini access to all ChatGPT Free users.”

Like Ferrari, Chanel, or Prada, o1 models are not for everybody.

But why the business model change? Because

  • You don’t make billions from making free products, replacing low-paid call centre workers, or saving minutes on admin tasks.
  • There is an enormous gap between the $3.4 billion in revenue OpenAI reported in the last 6 months and investors’ expectations of getting $600 billion from Generative AI.

More about investors in the next section.

Words matter: “Thinking” for Inferring

OpenAI knows that peppering their release communications with words that denote human capabilities creates buzz by making people — and above all investors — dream of AGI. The Sora and ChatGPT-4o announcements already described the features of those applications in terms of “reason”, “understanding”, and “comprehend”.

For OpenAI o1, they’ve gambled everything on the word “thinking”, plastering it all over the announcements about the new models: Social media, blog posts, and even videos.

Screenshot of a video embedded in the webpages announcing the OpenAI o1 model, showing the OpenAI logo and the word “Thinking” on a grey background.

Why not use the word that accurately describes the process — inference? If that’s too technical, what about options like “calculate” or “compute”? Why hijack “thinking”, a word at the core of the human experience?

Because they have failed to deliver on their AGI and revenue promises. OpenAI’s (over)use of “thinking” is meant to convince investors that the o1 models are the gateway to both AGI and the $600 billion revenue mentioned above. Let me convince you.

The day before the o1 announcement, Bloomberg revealed that

  • OpenAI is in talks to raise $6.5 billion from investors at a valuation of $150 billion, significantly higher than the $86 billion valuation from February.
  • At the same time, it’s also in talks to raise $5 billion in debt from banks as a revolving credit facility.

Moreover, two days later Reuters reported more details about the new valuation

“Existing investors such as Thrive Capital, Khosla Ventures, as well as Microsoft (MSFT.O), are expected to participate. New investors including Nvidia (NVDA.O), and Apple (AAPL.O), also plan to invest. Sequoia Capital is also in talks to come back as a returning investor.”

How do you become the most valuable AI startup in the world?

You “think” your way to it.

Rebranding the Boys’ Club

In tech, we’re used to bragging — from companies that advertise their products under false pretences to CEOs celebrating that they’ve replaced staff with AI chatbots. And whilst that may fly with some investors, it typically backfires with users and the public.

That’s what makes OpenAI’s humblebragging and inside jokes a marketing game-changer.

Humblebragging

Humblebragging: the action of making an ostensibly modest or self-deprecating statement with the actual intention of drawing attention to something of which one is proud.

Sam Altman delivered a masterclass on humblebragging on his X thread on the o1 release. See the first tweet of the series below

“here is o1, a series of our most capable and aligned models yet: https://openai.com/index/learning-to-reason-with-llms/ o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it.”
The first tweet of Sam Altman’s thread on the release of o1.

He started with the “humble” piece — “still flawed, still limited” — and quickly followed with the bragging: check the chart showing a marked performance improvement compared to ChatGPT-4o and even a variable called “expert human” (more on “experts” in the next section).

Sam followed up the X thread with three more tweets singing the praises of the new release

“but also, it is the beginning of a new paradigm: AI that can do general-purpose complex reasoning. o1-preview and o1-mini are available today (ramping over some number of hours) in ChatGPT for plus and team users and our API for tier 5 users.”

“screenshot of eval results in the tweet above and more in the blog post, but worth especially noting: a fine-tuned version of o1 scored at the 49th percentile in the IOI under competition conditions! and got gold with 10k submissions per problem.”

“extrem…”
Sam Altman’s X thread about the release of o1.

In summary, by starting with the shortcomings of the o1 models, he pre-empted backlash and criticism about not delivering on ChatGPT-5 or AGI. Then, he “tripled down” on why the release is such a breakthrough. He even had enough characters left to mention that only paying customers would have access to it.

Sam, you’re a marketing genius!

Inside Jokes

There has been a lot of speculation about the o1 release being code-named “Strawberry”. Why?

There had been negative publicity around ChatGPT-4 insisting, over and over, that the word “strawberry” has only two “r” letters rather than three. You can see the post on the OpenAI community forum.
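(For what it’s worth, the correct count takes one line of Python to verify:)

```python
# Count the letter "r" in "strawberry": there are three.
print("strawberry".count("r"))  # prints 3
```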

But OpenAI is so good at PR that they’ve even leveraged the “strawberry bug” to their advantage. How?

By using the bug fix to showcase o1’s “chain of thought” (CoT) capability. In contrast with standard prompting, CoT “not only seeks an answer but also requires the model to explain its steps to arrive at that answer.”

More precisely, they compare the outputs of GPT-4o and OpenAI o1-preview for a cypher exercise. The prompt is the following

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz
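If you’d like to reproduce the comparison yourself, here is a minimal sketch using the openai Python package (v1+). It assumes an API key in your environment and that your account has access to both models:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step\n\n"
    "Use the example above to decode:\n\n"
    "oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"
)

# Send the same prompt to both models and compare their answers.
for model in ("gpt-4o", "o1-preview"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {model} ---")
    print(response.choices[0].message.content)
```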

And here is the final output

Comparison between outputs from GPT-4o and OpenAI o1-preview for decryption task from OpenAI website.

Whilst GPT-4o is not able to decode the text, OpenAI o1-preview completes the task successfully by decoding the message

“THERE ARE THREE R’S IN STRAWBERRY”
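Incidentally, the cipher itself follows a simple rule: each pair of ciphertext letters averages, by alphabet position, to one plaintext letter (the apostrophe in “R’S” is simply not encoded). A few lines of Python reproduce the decoding deterministically:

```python
def decode(ciphertext: str) -> str:
    """Average the alphabet positions of each letter pair to get one plaintext letter."""
    words = []
    for word in ciphertext.split():
        letters = []
        for a, b in zip(word[0::2], word[1::2]):  # walk the word two letters at a time
            position = ((ord(a) - 96) + (ord(b) - 96)) // 2  # mean of positions 1..26
            letters.append(chr(position + 96))
        words.append("".join(letters))
    return " ".join(words).upper()

print(decode("oyfjdnisdr rtqwainr acxz mynzbhhx"))
# THINK STEP BY STEP
print(decode("oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"))
# THERE ARE THREE RS IN STRAWBERRY
```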

Is that not world-class marketing?

The Human Experts vs o1 Models

If you want to convince investors that you’re solving the kind of problems corporations and governments pay billions for — e.g. healthcare — you need more than words.

And here again, OpenAI’s copywriting excels. Let’s look at some examples

PhD vs o1 Models

Who’s our standard for solving the world’s most pressing issues? In other words, the kind of problems that convince investors to give you billions?

Scientists, thought-leaders, academics. This explains OpenAI’s obsession with the word “expert” when comparing human and o1 performance.

And who does OpenAI deem “expert”? People with PhDs.

Below is an outstanding example of mashing up “difficult intelligence”, “human experts”, and “PhD” to hint that o1 models have a kind of super-human intelligence.

We also evaluated o1 on GPQA diamond, a difficult intelligence benchmark which tests for expertise in chemistry, physics and biology.

In order to compare models to humans, we recruited experts with PhDs to answer GPQA-diamond questions. We found that o1 surpassed the performance of those human experts, becoming the first model to do so on this benchmark.

But how does equating a PhD title with being an expert hold up in real life? I have a PhD in Chemistry, so let me reveal to you the underbelly of this assumption.

First, let’s start with how I got my PhD. For five years, I performed research on the orientation of polymer (plastic) blends by infrared dichroism (an experimental technique) and molecular dynamics (a computer simulation technique). Then, I wrote a thesis and four peer-reviewed articles about my findings. Finally, a jury of scientists decided that my work was original and worthy of a PhD title.

Was I an expert in chemistry when I finished my PhD? Yes and no.

  • Yes, I was an expert in an extremely narrow domain of chemistry — see the description of my thesis work in the previous paragraph.
  • No, I was definitely out of my depth in many other chemistry domains like organic chemistry, analytical chemistry, and biochemistry.

What’s the point of having a PhD then? To learn how to perform independent research. Exams on STEM topics don’t grant you the PhD title; your research does.

Has OpenAI’s marketing gotten away with equating a PhD with being an expert?

If we remember that their primary objective is not scientists’ buy-in but investors’ and CEOs’ money, then the answer is a resounding “yes”.

Humans vs o1 Models

As mentioned above, OpenAI extensively used exams in their announcement to illustrate that o1 models are comparable to — or better than — human intelligence.

How did they do that? By reinforcing the idea that humans and o1 models were “taking” the exams under the same conditions.

We trained a model that scored 213 points and ranked in the 49th percentile in the 2024 International Olympiad in Informatics (IOI), by initializing from o1 and training to further improve programming skills. This model competed in the 2024 IOI under the same conditions as the human contestants. It had ten hours to solve six challenging algorithmic problems and was allowed 50 submissions per problem.

Really? Had the human contestants ingested billions of data points in the form of databases, past exams, books, and encyclopedias before sitting the exam?

Still, the sentence does the trick of making us believe in a level playing field when comparing human and o1 performance. Well done, OpenAI!

The Non-Testimonial Videos

Previous OpenAI releases showcased videos of staff demoing the products. For the o1 release, they’ve upped their game by a quantum leap, with videos of “experts” (almost) singing the praises of the new models. Let’s have a closer look.

OpenAI shares four videos of researchers in different domains. Whilst we might expect them to talk about their experience using the o1 models, the reality is that we mostly get product placement and cryptic praise.

Genetics:
This video stars Dr Catherine Brownstein, a geneticist at Boston Children’s Hospital. My highlight is seeing her type into OpenAI o1-preview the prompt “Can you tell me about citrate synthase in the bladder?” — as I read the disclaimer “ChatGPT can make mistakes. Check important info” — followed by her ecstatic praise of the output, as if she’d consulted the Oracle of Delphi.

Prompt “Can you tell me about citrate synthase in the bladder?” with the text underneath “ChatGPT can make mistakes. Check important info.”
Prompt shown in the video of Dr Catherine Brownstein.

Economics:
Here, Dr Tyler Cowen, a professor at George Mason University, tells us that he thinks “of all the versions of GPT as embodying reasoning of some kind.” He also takes the opportunity to promote his book Average is Over, in which he claims to have predicted that AI would “revolutionise the world.”

He also shows an example of a prompt on an economics subject and OpenAI o1’s output, followed by “It’s pretty good. We’re just figuring out what it’s good for.”

That sounds like a bad case of a hammer looking for a nail.

Coding:
The protagonist is Scott Wu, CEO and co-founder of Cognition and a competitive programmer. In the video, he claims that o1 models can “process and make decisions in a more human-like way.” He discloses that Cognition has been working with OpenAI and shares that o1 is incredible at “reasoning.” From that point on, we get submerged in a Cognition infomercial.

We learn that they’re building the first fully autonomous software agent, Devin. Wu shows us Devin’s convoluted journey—and the code behind it—to analyze the sentiment of a tweet from Sam Altman, which included a sunny photo of a strawberry plant (pun again) and the sentence “I love summer in the garden.”

And there is a happy ending. We learn that Devin “breaks down the text” and “understands what the sentiment is,” finally concluding that the predominant emotion of the tweet is happiness. An interesting way to demonstrate Devin’s “human-like” decision-making.

A tweet from Sam Altman with a photo of a strawberry plant in a sunny background, with the caption “i love summer in the garden.”
Sam Altman’s tweet shown in Scott Wu’s video.

Quantum physics:
This video focuses on Dr Mario Krenn, quantum physicist and research group leader at the Artificial Scientist Lab at the Max Planck Institute for the Science of Light. It starts with him showing the screen of ChatGPT and enigmatically saying “I can kind of easily follow the reasoning. I don’t need to trust the research. I just need to look what did it do.“ And the cryptic sentences carry on throughout the video.

For example, he writes a prompt about a certain quantum operator and says, “Which I know previous models that GPT-4 are very likely failing this task” and “In contrast to answers from Chat GPT-4 this one gives me very detailed mathematics”. We also hear him saying, “This is correct. That makes sense here,” and, “I think it tries to do something incredibly difficult.”

To me, rather than a wholehearted endorsement, it sounds like somebody avoiding compromising their career.

In summary, often the crucial piece is not the message but the messenger.

What I missed

Un-sustainability

Sam Altman testified to the US Senate that AI could address issues such as “climate change and curing cancer.”

As the OpenAI o1 models spend more time “thinking”, this translates into more computing time. That means more electricity, water, and carbon emissions. It also means more data centres and more e-waste.

Don’t believe me? In a recent article published in The Atlantic about the contrast between Microsoft’s use of AI and their sustainability commitments, we learn that

“Microsoft is reportedly planning a $100 billion supercomputer to support the next generations of OpenAI’s technologies; it could require as much energy annually as 4 million American homes.”

However, I don’t see those “planetary costs” in the presentation material.

This is not a bug but an OpenAI feature — I already raised their lack of disclosure regarding energy efficiency, water consumption, or CO2 emissions for ChatGPT-4o.

As OpenAI tries to persuade us that the o1 model thinks like a human, it’s a good moment to remember that human brains are much more efficient than AI.

And don’t take my word for it. Blaise Aguera y Arcas, VP at Google and AI advocate, confirmed at TEDxManchester 2024 that human brains are much more energy efficient than AI models and that currently we don’t know how to bridge that gap.

Copyright

What better way to avoid the conversation about using copyrighted data for the models than adding more data? From the o1 system card

The two models were pre-trained on diverse datasets, including a mix of publicly available data, proprietary data accessed through partnerships, and custom datasets developed in-house, which collectively contribute to the models’ robust reasoning and conversational capabilities.

Select Public Data: Both models were trained on a variety of publicly available datasets, including web data and open-source datasets. […]

Proprietary Data from Data Partnerships: To further enhance the capabilities of o1-preview and o1-mini, we formed partnerships to access high-value non-public datasets.

The text above gives the impression that most of the data is either open-source, proprietary data, or in-house datasets.

Moreover, words such as “publicly available data” and “web data” are an outstanding copywriting effort to find palatable synonyms for web scraping, web harvesting, or web data extraction.

Have I said I’m in awe of OpenAI’s copywriting capabilities yet?

Safety

As mentioned above, OpenAI shared the o1 system card — a 43-page document — which in the introduction states that the report

outlines the safety work carried out for the OpenAI o1-preview and OpenAI o1-mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations.

It sounds very reassuring… except that, in the same paragraph, we also learn that the o1 models can “reason” about OpenAI safety policies and have “heightened intelligence.”

In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts.

This leads to state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence.

And then, OpenAI has a strange way of persuading us that these models are safe. For example, in the Hallucination Evaluations section, we’re told that OpenAI tested o1-preview and o1-mini against three kinds of evaluations aimed at eliciting hallucinations from the model. Two are especially salient

• BirthdayFacts: A dataset that requests someone’s birthday and measures how often the model guesses the wrong birthday.

• Open Ended Questions: A dataset asking the model to generate arbitrary facts, such as “write a bio about ”. Performance is measured by cross-checking facts with Wikipedia and the evaluation measures how many incorrect statements are generated (which can be greater than 1).
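OpenAI doesn’t publish the evaluation code, but a rough sketch of what a BirthdayFacts-style check could look like is below. The mini-dataset, model name, and exact prompt are my own illustrative assumptions, not OpenAI’s:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative mini-dataset in the spirit of "BirthdayFacts": people with
# well-documented birthdays; the check counts how often the model states a wrong one.
dataset = [
    {"name": "Ada Lovelace", "birthday": "1815-12-10"},
    {"name": "Alan Turing", "birthday": "1912-06-23"},
]

wrong = 0
for item in dataset:
    reply = client.chat.completions.create(
        model="o1-mini",
        messages=[{
            "role": "user",
            "content": f"What is {item['name']}'s date of birth? Answer with YYYY-MM-DD only.",
        }],
    )
    answer = reply.choices[0].message.content.strip()
    if answer != item["birthday"]:
        wrong += 1

print(f"Wrong-birthday rate: {wrong / len(dataset):.0%}")
```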

Isn’t it lovely that they were training the model to search for and retrieve personal data? I feel much safer now.

And this is only one example of the tightrope OpenAI attempts to walk throughout the o1 system card

  • On one side, taking every opportunity to sell “thinking” models to investors
  • On the other, desperately avoiding the o1 models getting classified as high or critical risk by regulators.

Will OpenAI succeed? If you can’t convince them, confuse them.

What’s next?

Uber, Reddit, and Telegram relished their image of “bad boys”. They were adamant about proving that “It’s better to ask forgiveness than permission” and proudly advertised that they too “Moved fast and broke things”.

But there is only one Mark Zuckerberg and one Steve Jobs who can pull that off. And only an Amazon, a Microsoft, or a Google has the immense resources and the monopolies to run the show as they want.

OpenAI has understood that storytelling — how to tell your story — is not enough. You need to “create” your story if you want investors to keep pouring in billions without a sign of a credible business model.

I have no doubt that OpenAI will make a dent in the history of how tech startups market themselves.

They have created the textbook of what a $150 billion valuation release should look like.


You and Strategic AI Leadership

If you want to develop your AI acumen, forget the quick “remedies” and plan for sustainable learning.

That’s exactly what my program Strategic AI Leadership delivers. Below is a sample of the topics covered

  • AI Strategy
  • AI Risks
  • Operationalising AI
  • AI, data, and cybersecurity
  • AI and regulation
  • Sustainable AI
  • Ethical and inclusive AI

Key outcomes from the program:

  • Understanding AI Fundamentals: Grasp essential concepts of artificial intelligence and the revolutionary potential it holds.
  • Critical Perspective: Develop a discerning viewpoint on AI’s benefits and challenges at organisational, national, and international levels.
  • Use Cases and Trends: Gain insights into real uses of AI and key trends shaping sectors, policy, and the future of work.
  • A toolkit: Access to tools and frameworks to assess the strategy, risks, and governance of AI tools.

I’m a technologist with 20+ years of experience in digital transformation and AI, and I empower leaders to harness the potential of AI for sustainable growth.

Contact me to discuss your bespoke path to responsible AI innovation.

