
The Missing Pieces in the UK’s AI Opportunities Action Plan

A brightly coloured mural which can be viewed in any direction. It has several scenes within it: people in front of computers seeming stressed, a number of faces overlaid over each other, squashed emojis, miners digging in front of a huge mountain representing mineral resources, a hand holding a lump of coal or carbon, hands manipulating stock charts and error messages, as well as some women performing tasks on computers, men in suits around a table, someone in a data centre, big hands controlling the scenes and holding a phone, people in a production line. Motifs such as network diagrams and melting emojis are placed throughout the busy vignettes.
Clarote & AI4Media / Better Images of AI / AI Mural / CC-BY 4.0.

Reading the 50 recommendations in the AI Opportunities Action Plan, published by the British Government on January 13th, has been a painful and disappointing exercise.

Very much like a proposal out of a chatbot, the document is

  • Bland — The text is full of hyperbolic language and over-the-top optimism.
  • General — The 50 recommendations lack specificity to the UK context and details about ownership and the budget required to execute them.
  • Contradictory — The plan, issued by a Labour government, is anchored in a turbo-capitalistic ideology. Oxymoron, anyone?

If I learned anything from my 12 years in Venezuela, it’s that putting all your eggs in one basket — oil, in their case — and hoping it solves all problems doesn’t work.

A credible AI strategy must (a) address both the benefits and the challenges head-on and (b) treat this technology as another asset for the human-centric flourishing of the country rather than a goal in itself to be pursued at all costs.

But you don’t need to believe me. See it for yourself.


What I read

Techno-speak

I was reminded of George Orwell’s 1984 Newspeak.

The text uses “AI”-prefixed terms such as AI stack, frontier AI, AI-driven data cleansing tools, AI-enabled priorities, and “embodied AI” without providing clear definitions.

Exaggeration

Hyperbole and metaphors are used to the extreme to overstate the benefits.

we want Britain to step up; to shape the AI revolution rather than wait to see how it shapes us. 

We should expect enormous improvements in computation over the next decade, both in research and deployment.

Change lives by embracing AI

FOMO

The text exudes FOMO (Fear Of Missing Out). No option is given to adopt AI systems more gradually: it’s now or we’ll be the losers.

This is a crucial asymmetric bet — and one the UK can and must make

we need to “run to stand still”.

the UK risks falling behind the advances in Artificial Intelligence made in the USA and China.

And even a new take on Facebook’s famous “move fast and break things”:

“move fast and learn things”

Techno-solutionism

AI is going to solve all our socio-economic and political problems and transport us to a utopian future 

It is hard to imagine how we will meet the ambition for highest sustained growth in the G7 — and the countless quality-of-life benefits that flow from that — without embracing the opportunities of AI.

Our ambition is to shape the AI revolution on principles of shared economic prosperity, improved public services and increased personal opportunities so that:
• AI drives the economic growth on which the prosperity of our people and the performance of our public services depend;
• AI directly benefits working people by improving health care and education and how citizens interact with their government; and
• the increasing prevalence of AI in people’s working lives opens up new opportunities rather than just threatens traditional patterns of work.

What’s not to like?

For a great commentary on how techno-solutionism won’t solve social problems, see 20 Petitions for AI and Public Good in 2025 by Tania Duarte.

Colonialism

Living in Venezuela for 12 years was an education on how to feel “less than” other countries even when you have the largest oil reserves in the world.

I remember new education programmes being announced as successes because they had worked in the US, Canada, Spain, or Germany… A colonised mentality, learned from centuries of Spanish oppression: the pervasive assumption that an initiative would work simply because we liked its results elsewhere, disregarding the context it was developed for.

The AI Opportunities Action Plan reminded me of them.

Supporting universities to develop new courses co-designed with industry — such as the successful co-operative education model of Canada’s University of Waterloo, CDTM at the Technical University of Munich or France’s CIFRE PhD model

Launch a flagship undergraduate and masters AI scholarship programme on the scale of Rhodes, Marshall, or Fulbright for students to study in the UK.

Singapore, for example, developed a national AI skills online platform with multiple training offers. South Korea is integrating AI, data and digital literacy.

But the document is also keen on showing us that we’ll be the colonisers

we aspire to be one of the biggest winners from AI

Because we believe Britain has a particular responsibility to provide global leadership in fairly and effectively seizing the opportunities of AI, as we have done on AI safety

A historical-style painting of a young woman stands before the Colossus computer. She holds an abstract basket filled with vibrant, pastel circles representing data points. The basket is attached to the computer through a network of connecting wires, symbolizing the flow and processing of information.
Hanna Barakat & Cambridge Diversity Fund / Better Images of AI / Colossal Harvest / CC-BY 4.0

Capitulation

The document is all about surrendering the data, agency, tax money, and natural resources of citizens in the UK to the AI Gods: startups, “experts”, and investors.

Invest in becoming a great customer: government purchasing power can be a huge lever for improving public services, shaping new markets in AI

We should seek to responsibly unlock both public and private data sets to enable innovation by UK startups and researchers and to attract international talent and capital.

Couple compute allocation with access to proprietary data sets as part of an attractive offer to researchers and start-ups choosing to establish themselves in the UK and to unlock innovation.

Sprinkling AI

AI is the Pantone Colour of the Year for the next five years. Everything will need to have AI in it. Moreover, everything must be designed so that AI can shine.

Appointing an AI lead for each mission to help identify where AI could be a solution within the mission setting, considering the user needs from the outset.

Two-way partnerships with AI vendors and startups to anticipate future AI developments and signal public sector demand. This would involve government meeting product teams to understand upcoming releases and shape development by sharing their challenges.

AI should become core to how we think about delivering services, transforming citizens’ experiences, and improving productivity.

Brexit Denial

It’s funny to see that the text doesn’t reference the European Union and only refers to Europe as a benchmark to measure against.

Instead, the EU is hinted at through phrases like “like-minded partners” and “allies”, and collaborations are thrown around left and right without naming the partner.

Agree international compute partnerships with like-minded countries to increase the types of compute capability available to researchers and catalyse research collaborations. This should focus on building arrangements with key allies, as well as expanding collaboration with existing partners like the EuroHPC Joint Undertaking.

We should proactively develop these partnerships, while also taking an active role in the EuroHPC Joint Undertaking.

Moreover, the text praises the mobility of researchers and wanting to attract experts forgetting the UK’s refusal to participate in the Erasmus program and the fact that it only joined Horizon Europe last year.

The UK is a medium-sized country with a tight fiscal situation. We need the best talent around the world to want to start and scale companies here.

Explore how the existing immigration system can be used to attract graduates from universities producing some of the world’s top AI talent.

Vagueness

Ideas are thrown into the text half-baked, giving the impression that the government has adopted the Silicon Valley strategy of “building the plane while flying it”.

The government must therefore secure access to a sufficient supply of compute. There is no precise mechanism to allocate the proportions

In another example, the plan advocates for open-source AI applications.

the government should support open-source solutions that can be adopted by other organisations and design processes with startups and other innovators in mind.

The AI infrastructure choice at-scale should be standardised, tools should be built with reusable modular code components, and code-base open-sourcing where possible.

At the same time, the plan is adamant that it needs to attract startups and investors. Unless those startups are NGOs, who will then finance those open-source models?

DEI for Beginners

Students at computers with screens that include a representation of a retinal scanner with pixelation and binary data overlays and a brightly coloured datawave heatmap at the top.
Kathryn Conrad / Better Images of AI / Datafication / CC-BY 4.0

All of us who have been working towards a more diverse and inclusive tech for decades are in for a treat. 

First, we’re told that diversity in tech is very simple — it’s all about gender parity and pipeline.

16. Increase the diversity of the talent pool. Only 22% of people working in AI and data science are women. Achieving parity would mean thousands of additional workers. […] Government should build on this investment and promote diversity throughout the education pipeline.

Moreover, they’ve found the magic bullet.

Hackathons and competitions in schools have proven effective at getting overlooked groups into cyber and so should be considered for AI.

What about the fact that 50% of women in tech leave the sector by the age of 35?


What I missed

Regions

The government mentions that AI “can” — note that it is not a “must” or a “need” — benefit “post-industrial towns and coastal Scotland.” However, the only reference to a specific place is the Culham Science Centre, which is 10 miles from Oxford — an area that very few could consider in need of “local rejuvenation” or “channelling investment”.

Government can also use AIGZs [‘AI Growth Zones’] to drive local rejuvenation, channelling investment into areas with existing energy capacity such as post-industrial towns and coastal Scotland. Government should quickly nominate at least one AIGZ and work with local regions to secure buy-in for further AIGZs that contribute to local needs. Existing government sites could be prioritised as pilots, including Culham Science Centre

And there doesn’t appear to be room to involve local authorities in how AI could bring value to their regions.

Drive AI adoption across the whole country. Widespread adoption of AI can address regional disparities in growth and productivity. To achieve this, government should leverage local trusted intermediaries and trade bodies

Costs

There are plenty of gigantic numbers about how much money AI may bring

AI adoption could grow the UK economy by an additional £400 billion by 2030 through enhancing innovation and productivity in the workplace

but nothing about the costs…

Literacy

How will people get upskilled? We only get generic reassurances

government should encourage and promote alternative domestic routes into the AI profession — including through further education and apprenticeships, as well as employer and self-led upskilling.

Government should ensure there are sufficient opportunities for workers to reskill, both into AI and AI-enabled jobs and more widely.

Citizens

There is no indication in the document that this “AI-driven” Britain is what its citizens want. Citizens themselves don’t appear to be included in shaping AI either.

For example, it claims that teachers are already “benefiting” from AI assistants

it is helping some teachers cut down the 15+ hours a week they spend on lesson planning and marking in pilots.

However, the text doesn’t tell us whether teachers actually want to give up class preparation.

And the text repeatedly states that the government will prioritise “innovation” (a.k.a. profit) over safety.

My judgement is that experts, on balance, expect rapid progress to continue. The risks from underinvesting and underpreparing, though, seem much greater than the risks from the opposite.

Moreover, regulators are expected to enable innovation at all costs

Require all regulators to publish annually how they have enabled innovation and growth driven by AI in their sector. […] government should consider more radical changes to our regulatory model for AI, for example by empowering a central body with a mandate and higher risk tolerance to promote innovation across the economy.

Where did we sign up for that?

Sustainability

The document waxes lyrical about building datacentres. What about the electricity and water requirements? What about the impact on our water reserves and electricity grid? What about the repercussions on our sustainability goals?

The document considers the job done by throwing the word “sustainability” twice into one paragraph

Mitigate the sustainability and security risks of AI infrastructure, while positioning the UK to take advantage of opportunities to provide solutions. [..] Government should also explore ways to support novel approaches to compute hardware and, where appropriate, create partitions in national supercomputers to support new and innovative hardware. In doing so, government should look to support and partner with UK companies who can demonstrate performance, sustainability or security advancements.

An array of colorful, fossil-like data imprints representing the static nature of AI models, laden with outdated contexts and biases.
Luke Conroy and Anne Fehres & AI4Media / Better Images of AI / Models Built From Fossils / CC-BY 4.0

Unemployment

The writers of this utopian “AI-powered” UK manifesto don’t address job losses. We only get the sentence I mentioned above

the increasing prevalence of AI in people’s working lives opens up new opportunities rather than just threatens traditional patterns of work.

Instead, it uses language that fosters fear and builds on utopian and dystopian visions of an AI-driven future

AI systems are increasingly matching or surpassing humans across a range of tasks.

Given the pace of progress, we will also very soon see agentic systems — systems that can be given an objective, then reason, plan and act to achieve it. The chatbots we are all familiar with are just an early glimpse as to what is possible.

On the flip side, the government repeatedly reiterates its ambition of bringing in talent from abroad

 Supporting UK-based AI organisations working on national priority projects to bring in overseas talent and headhunting promising founders or CEOs

How does this plan contribute to reassuring people about their jobs?

Big-picture

This techno-solutionist approach shows no regard for AI specialists in domains other than coding or IT.

To mention a few, what about sociologists, psychologists, philosophers, teachers, historians, economists, or specialists in the broad spectrum of industries in the UK? 

Don’t they belong in those think tanks where decisions are made about selling our country to the AI Gods?


The Good News? We Can Do Better

Last year, people in Britain voted to show they were tired of profits over people, centralism, and oligarchy. Unfortunately, this plan uses AI to reinforce all three.

The UK is full of hardworking and smart people who deserve much better than magic bullets or techno-saviours. 

Instead of shoehorning the UK’s future into AI, what if we…


WORK WITH ME

I’m a technologist with 20+ years of experience in digital transformation. I’m also an award-winning inclusion strategist and certified life and career coach.

Three ways you can work with me:

  • I empower non-tech leaders to harness the potential of artificial intelligence for sustainable growth and responsible innovation through consulting and AI competency programs.
  • I’m a ​sought-after international keynote speaker​ on strategies to empower women and underrepresented groups in tech, sustainable and ethical artificial intelligence, and inclusive workplaces and products.
  • I help ambitious women in tech who are overwhelmed to break the glass ceiling and achieve success without burnout through bespoke coaching and mentoring.

Get in touch to discuss how I can help you achieve the success you deserve in 2025.

2025 AI Forecast: 25 Predictions You Need to Know Now

I’ve been betting on the transformative power of digital technology all my professional career. 

  • I started doing computer simulation during my MSc in Chemical Engineering in the 1990s, in a lab where everybody else was an experimentalist. Except for my advisor, the rest of the team was sceptical — to say the least — that something useful would come from using computer modelling to study enhanced oil recovery from oil fields.
  • A similar story repeated during my PhD in Chemistry, where I pioneered using molecular modelling to study polymers in a research centre focused on the experimental study of polymers and proteins.
  • For the last 20+ years, I’ve been working on digital transformation playing a similar role. First, as Head of Training and Contract Research, and now as Director of Scientific Support, I relish helping my customers harness the potential of digital technology for responsible innovation.

I’m also known for telling it as I see it. In the early 2000s, I was training a customer — incidentally an experimentalist — on genetic algorithms. He was very excited and asked me if he could create a model for designing a new material. He proudly shared he had “7 to 10 data points.” My answer? “Far too few.”

In summary, I’m very comfortable being surrounded by tech sceptics, dispelling myths about what AI can and can’t do, and betting on the power of digital technology.

And that’s exactly why I’m sharing with you my AI predictions for 2025.

My Predictions

1.- xAI (owned by Elon Musk) will purchase X so that the former can freely train its models on the latter’s data. Elon owns 79% of X after buying it for $44 billion. Now it’s valued at $9.4 billion and big advertisers keep leaving the platform.

After almost 3 years of struggling to make X work, an acquisition by xAI — which raised a $6 billion funding round in December — would be a win-win.

2.- OpenAI’s for-profit organisation will formally split from the original non-profit. I bet on this despite Elon Musk’s injunction to stop OpenAI’s transition to a for-profit company (supported by Meta).

Why? A clause in ​OpenAI’s $150 billion funding round​ allows investors to request their money back if the switch isn’t completed within two years.

3.- The generation and usage of synthetic data will balloon to address data privacy concerns. People want better services and products — especially in healthcare — but are unwilling to give up their personal data. The solution? “Creating” data.

4.- Startups and organisations will move from using large language models (LLMs) to focusing on SLMs (small language models), which consume less energy, produce fewer hallucinations, and are customised to companies’ requirements.

An image of multiple 3D shapes representing speech bubbles in a sequence, with broken up fragments of text within them.
Wes Cockx & Google DeepMind / Better Images of AI / AI large language models / Licenced by CC-BY 4.0.

5.- In FY 2025, Microsoft plans to invest approximately $80 billion to build AI-enabled datacenters, but don’t expect that to go smoothly with everybody. In 2024, datacenter consumption gathered a lot of attention.

This year, local authorities and NGOs will develop frameworks to scrutinise datacenters’ electricity and water consumption. Datacenters will also be tracked in terms of disruption to locals: electricity stability, water availability, and electricity and water prices.

6.- Rise of the two-tier AI-human customer support model: AI chatbots for self-service and low-revenue customers and human customer support for key and high-revenue clients.

It’s not only a question of money but also of liability. Low-revenue customers are less likely to sue providers over AI chatbots delivering harmful and/or inaccurate content.


Why OpenAI o1 Might Be More Hype Than Breakthrough

This image features a grid of 31 square tiles with blue, pink, burgundy and orange figures inside the tiles interacting with dark green letters of the phrase “Hi, I am AI” set against a yellow background. The figures are positioned in various poses, as if they are climbing, pushing, or leaning on the letters.
Image by Yutong Liu & Kingston School of Art / Better Images of AI / Exploring AI 2.0 / Licenced by CC-BY 4.0 adapted by Patricia Gestoso.

OpenAI has done it again — on September 12th, 2024, they grabbed the headlines by releasing a new model, OpenAI o1. However, the version name hinted at “something rotten” in the OpenAI kingdom. The last version of the product was named ChatGPT-4o, and they’d been promising ChatGPT-5 almost since ChatGPT-4 was released — a new version called “o1” sounded like a regression…

But let me reassure you right away—there’s no need to fret about it.

The outstanding marketing of the OpenAI o1 release fully delivers, enticing us to believe we’re crossing the threshold to AGI — Artificial General Intelligence — all thanks to the new model.

What’s their secret sauce? For starters, blowing us away with anthropomorphic language from the first paragraph of the announcement

“We’ve developed a new series of AI models designed to spend more time thinking before they respond.”

and then resetting our expectations when explaining the version name

“for complex reasoning tasks this is a significant advancement and represents a new level of AI capability. Given this, we are resetting the counter back to 1 and naming this series OpenAI o1.”

That’s the beauty of being the top dog of the AI hype. You get to

  • Rebrand computing as “thinking.”
  • Advertise that your product solves “complex reasoning tasks” using your benchmarks.
  • Promote that you deliver “a new level of AI capability.”

Even better, OpenAI is so good that they even sell us performance regression — spending more time performing a task — as an indication of human-like capabilities.

“We trained these models to spend more time thinking through problems before they respond, much like a person would. Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes.”

I’m so in awe of OpenAI’s media strategy for the launch of the o1 models that I did a deep dive into what they said — and what they didn’t.

Let me share my insights.

Who Is o1 For?

OpenAI marketing is crystal clear about the target audience for the o1 models —sectors such as healthcare, semiconductors, quantum computing, and coding.

Whom it’s for
These enhanced reasoning capabilities may be particularly useful if you’re tackling complex problems in science, coding, math, and similar fields. For example, o1 can be used by healthcare researchers to annotate cell sequencing data, by physicists to generate complicated mathematical formulas needed for quantum optics, and by developers in all fields to build and execute multi-step workflows.

OpenAI o1-mini
The o1 series excels at accurately generating and debugging complex code. To offer a more efficient solution for developers, we’re also releasing OpenAI o1-mini, a faster, cheaper reasoning model that is particularly effective at coding. As a smaller model, o1-mini is 80% cheaper than o1-preview, making it a powerful, cost-effective model for applications that require reasoning but not broad world knowledge.

Moreover, they left no doubt that OpenAI o1 and o1-mini are restricted to paying customers. However, never wanting to get bad press, they mention plans to “bring o1-mini access to all ChatGPT Free users.”

Like Ferrari, Chanel, or Prada, o1 models are not for everybody.

But why the business model change? Because

  • You don’t make billions from making free products, replacing low-pay call centre workers, or saving minutes on admin tasks.
  • There is an enormous gap between the $3.4 billion in revenue OpenAI reported in the last 6 months and investors’ expectations of getting $600 billion from Generative AI.

More about investors in the next section.

Words matter: “Thinking” for Inferring

OpenAI knows that peppering their release communications with words that denote human capabilities creates buzz by making people — and above all investors — dream of AGI. Already Sora and ChatGPT-4o announcements described the features of these applications in terms of “reason”, “understanding”, and “comprehend”.

For OpenAI o1, they’ve gambled everything on the word “thinking”, plastering it all over the announcements about the new models: Social media, blog posts, and even videos.

The OpenAI logo and the word Thinking on a grey background.
Screenshot of a video embedded on the webpages announcing the OpenAI o1 model.

Why not use the word that accurately describes the process — inference? If too technical, what about options like “calculate” or “compute”? Why hijack the word “thinking”, at the core of the human experience?

Because they have failed to deliver on their AGI and revenue promises. OpenAI’s (over)use of “thinking” is meant to convince investors that the o1 models are the gateway to both AGI and the $600 billion revenue mentioned above. Let me convince you.

The day before the o1 announcement, Bloomberg revealed that

  • OpenAI is in talks to raise $6.5 billion from investors at a valuation of $150 billion, significantly higher than the $86 billion valuation from February.
  • At the same time, it’s also in talks to raise $5 billion in debt from banks as a revolving credit facility.

Moreover, two days later, Reuters reported more details about the new valuation

“Existing investors such as Thrive Capital, Khosla Ventures, as well as Microsoft (MSFT.O), are expected to participate. New investors including Nvidia (NVDA.O), and Apple (AAPL.O), also plan to invest. Sequoia Capital is also in talks to come back as a returning investor.”

How do you become the most valuable AI startup in the world?

You “think” your way to it.

Rebranding the Boys’ Club

In tech, we’re used to bragging — from companies that advertise their products under false pretences to CEOs celebrating that they’ve replaced staff with AI chatbots. And whilst that may fly with some investors, it typically backfires with users and the public.

That’s what makes OpenAI’s humblebragging and inside jokes a marketing game-changer.

Humblebragging

Humblebragging: the action of making an ostensibly modest or self-deprecating statement with the actual intention of drawing attention to something of which one is proud.

Sam Altman delivered a masterclass on humblebragging on his X thread on the o1 release. See the first tweet of the series below

Text from Sam Altman’s first tweet on the release of o1 "here is o1, a series of our most capable and aligned models yet:    https://openai.com/index/learning-to-reason-with-llms/    o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it.”
The first tweet of Sam Altman’s thread on the release of o1.

He started with the “humble” piece — “still flawed, still limited” — to quickly follow with the bragging: check the chart showing a marked performance improvement compared to ChatGPT-4o and even a variable called “expert human” (more on “experts” in the next section).

Sam followed the X thread with three more tweets singing the praises of the new release

“but also, it is the beginning of a new paradigm: AI that can do general-purpose complex reasoning. o1-preview and o1-mini are available today (ramping over some number of hours) in ChatGPT for plus and team users and our API for tier 5 users.
 screenshot of eval results in the tweet above and more in the blog post, but worth especially noting: a fine-tuned version of o1 scored at the 49th percentile in the IOI under competition conditions! and got gold with 10k submissions per problem.
 extrem
Sam Altman’s X thread about the release of o1.

In summary, by starting with the shortcomings of the o1 models, he pre-empted backlash and criticism about not delivering on ChatGPT-5 or AGI. Then, he “tripled down” on why the release is such a breakthrough. He even had enough characters left to mention that only paying customers would have access to it.

Sam, you’re a marketing genius!

Inside Jokes

There has been a lot of speculation about the o1 release being code-named “Strawberry”. Why?

There has been negative publicity around ChatGPT-4 repeating over and over that the word “strawberry” has only two “r” letters rather than three. You can see the post on the OpenAI community.

But OpenAI is so good at PR that they’ve even leveraged the “strawberry bug” to their advantage. How?

By using the bug fix to showcase o1’s “chain of thought” (CoT) capability. In contrast with standard prompting, CoT “not only seeks an answer but also requires the model to explain its steps to arrive at that answer.”

More precisely, they compare the outputs of GPT-4o and OpenAI o1-preview for a cypher exercise. The prompt is the following

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

And here is the final output

Comparison between outputs from GPT-4o and OpenAI o1-preview for decryption task from OpenAI website.

Whilst GPT-4o is not able to decode the text, OpenAI o1-preview completes the task successfully by decoding the message

“THERE ARE THREE R’S IN STRAWBERRY”

Is that not world-class marketing?
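If you’re curious what chain-of-thought prompting looks like in practice, here is a minimal sketch using the official OpenAI Python client. The model name, the question, and the wording of the instruction are placeholders of mine, not OpenAI’s demo code — and with the o1 models the chain of thought is generated behind the scenes, so the explicit “think step by step” instruction below illustrates the prompt-level version of the technique.

```python
# Minimal sketch of standard vs. chain-of-thought prompting (my illustration,
# not OpenAI's demo code). Assumes the official `openai` Python package and
# an API key in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()
question = "How many times does the letter 'r' appear in the word 'strawberry'?"

# Standard prompting: ask for the answer directly.
direct = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": question}],
)

# Chain-of-thought prompting: ask the model to lay out its steps before
# answering, which is what the "Think step by step" instruction does in
# OpenAI's cypher example above.
cot = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{
        "role": "user",
        "content": question + "\nThink step by step, then give the final answer.",
    }],
)

print("Direct:", direct.choices[0].message.content)
print("Chain of thought:", cot.choices[0].message.content)
```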

The Human Experts vs o1 Models

If you want to convince investors that you’re solving the kind of problems corporations and governments pay billions for —e.g. healthcare — you need more than words.

And here again, OpenAI copywriting excels. Let’s see some examples

PhD vs o1 Models

Who’s our standard for solving the world’s most pressing issues? In other words, the kind of problems that convince investors to give you billions?

Scientists, thought-leaders, academics. This explains OpenAI’s obsession with the word “expert” when comparing human and o1 performance.

And who does OpenAI deem “expert”? People with PhDs.

Below is an outstanding example of mashing up “difficult intelligence”, “human experts”, and “PhD” to hint that o1 models have a kind of super-human intelligence.

We also evaluated o1 on GPQA diamond, a difficult intelligence benchmark which tests for expertise in chemistry, physics and biology.

In order to compare models to humans, we recruited experts with PhDs to answer GPQA-diamond questions. We found that o1 surpassed the performance of those human experts, becoming the first model to do so on this benchmark.

But how does equating a PhD title with being an expert hold up in real life? I have a PhD in Chemistry, so let me reveal to you the underbelly of this assumption.

First, let’s start with how I got my PhD. For five years, I performed research on the orientation of polymer (plastic) blends using infrared dichroism (an experimental technique) and molecular dynamics (a computer simulation technique). Then, I wrote a thesis and four peer-reviewed articles about my findings. Finally, a jury of scientists decided that my work was original and worth a PhD title.

Was I an expert in chemistry when I finished my PhD? Yes and no.

  • Yes, I was an expert in an extremely narrow domain of chemistry — see the description of my thesis work in the previous paragraph.
  • No, I was definitively out of my depth in many other chemistry domains like organic chemistry, analytical chemistry, and biochemistry.

What’s the point of having a PhD then? To learn how to perform independent research. Exams about STEM topics don’t grant you the PhD title; your research does.

Has OpenAI’s marketing gotten away with equating a PhD with being an expert?

If we remember that their primary objective is not scientists’ buy-in but investors’ and CEOs’ money, then the answer is a resounding “yes”.

Humans vs o1 Models

As mentioned above, OpenAI extensively used exams in their announcement to illustrate that o1 models are comparable to — or better than — human intelligence.

How did they do that? By reinforcing the idea that humans and o1 models were “taking” the exams under the same conditions.

We trained a model that scored 213 points and ranked in the 49th percentile in the 2024 International Olympiad in Informatics (IOI), by initializing from o1 and training to further improve programming skills. This model competed in the 2024 IOI under the same conditions as the human contestants. It had ten hours to solve six challenging algorithmic problems and was allowed 50 submissions per problem.

Really? Had humans ingested billions of data points in the form of databases, past exams, books, and encyclopedias before sitting the exam?

Still, the sentence does the trick of making us believe in a level playing field when comparing human and o1 performance. Well done, OpenAI!

The Non-Testimonial Videos

Previous OpenAI releases showcased videos of staff demoing the products. For the o1 release, they’ve upped their game by a quantum leap, with videos of “experts” (almost) singing the praises of the new models. Let’s have a closer look.

OpenAI shares 4 videos of researchers in different domains. Whilst we expect them to talk about their experience using the o1 models, the reality is that we mostly get product placement and cryptic praise.

Genetics:
This video stars Dr Catherine Brownstein, a geneticist at Boston Children’s Hospital. My highlight is seeing her type into OpenAI o1-preview the prompt “Can you tell me about citrate synthase in the bladder?” — as I read the disclaimer “ChatGPT can make mistakes. Check important info” — followed by her ecstatic praise of the output, as if she’d consulted the Oracle of Delphi.

Prompt “Can you tell me about citrate synthase in the bladder?” with the text underneath “ChatGPT can make mistakes. Check important info.”
Prompt shown in the video of Dr Catherine Brownstein.

Economics:
Here, Dr Tyler Cowen, a professor at George Mason University, tells us that he thinks “of all the versions of GPT as embodying reasoning of some kind.” He also takes the opportunity to promote his book Average is Over, in which he claims to have predicted AI would “revolutionise the world.”

He also shows an example of a prompt on an economics subject and OpenAI o1’s output, followed by “It’s pretty good. We’re just figuring out what it’s good for.”

That sounds like a bad case of a hammer looking for a nail.

Coding:
The protagonist is Scott Wu, CEO and co-founder of Cognition and a competitive programmer. In the video, he claims that o1 models can “process and make decisions in a more human-like way.” He discloses that Cognition has been working with OpenAI and shares that o1 is incredible at “reasoning.” From that point on, we get submerged in a Cognition infomercial.

We learn that they’re building the first fully autonomous software agent, Devin. Wu shows us Devin’s convoluted journey — and the code behind it — to analyze the sentiment of a tweet from Sam Altman, which included a sunny photo of a strawberry plant (pun again) and the sentence “I love summer in the garden.”

And there is a happy ending. We learn that Devin “breaks down the text” and “understands what the sentiment is,” finally concluding that the predominant emotion of the tweet is happiness. An interesting way to demonstrate Devin’s “human-like” decision making.

A tweet from Sam Altman with a photo of a strawberry plant in a sunny background with the caption “i love summer in the garden.”
Sam Altman’s tweet portrayed on Scott Wu’s video.

Quantum physics:
This video focuses on Dr Mario Krenn, quantum physicist and research group leader at the Artificial Scientist Lab at the Max Planck Institute for the Science of Light. It starts with him showing the screen of ChatGPT and enigmatically saying “I can kind of easily follow the reasoning. I don’t need to trust the research. I just need to look what did it do.“ And the cryptic sentences carry on throughout the video.

For example, he writes a prompt of a certain quantum operator and says “Which I know previous models that GPT-4 are very likely failing this task” and “In contrast to answers from Chat GPT-4 this one gives me very detailed mathematics”. We also hear him saying, “This is correct. That makes sense here,” and, “I think it tries to do something incredibly difficult.”

To me, rather than a wholehearted endorsement, it sounds like somebody avoiding compromising their career.

In summary, often the crucial piece is not the message but the messenger.

What I missed

Un-sustainability

Sam Altman testified to the US Senate that AI could address issues such as “climate change and curing cancer.”

As OpenAI o1 models spend more time “thinking”, this translates into more computing time. That is more electricity, water, and carbon emissions. It also means more datacenters and more e-waste.

Don’t believe me? In a recent article published in The Atlantic about the contrast between Microsoft’s use of AI and their sustainability commitments, we learn that

“Microsoft is reportedly planning a $100 billion supercomputer to support the next generations of OpenAI’s technologies; it could require as much energy annually as 4 million American homes.”

However, I don’t see those “planetary costs” in the presentation material.

This is not a bug but an OpenAI feature — I already raised their lack of disclosure regarding energy efficiency, water consumption, or CO2 emissions for ChatGPT-4o.

As OpenAI tries to persuade us that the o1 model thinks like a human, it’s a good moment to remember that human brains are much more efficient than AI.

And don’t take my word for it. Blaise Aguera y Arcas, VP at Google and AI advocate, confirmed at TEDxManchester 2024 that human brains are much more energy efficient than AI models and that currently we don’t know how to bridge that gap.

Copyright

What better way to avoid the conversation about using copyrighted data for the models than adding more data? From the o1 system card

The two models were pre-trained on diverse datasets, including a mix of publicly available data, proprietary data accessed through partnerships, and custom datasets developed in-house, which collectively contribute to the models’ robust reasoning and conversational capabilities.

Select Public Data: Both models were trained on a variety of publicly available datasets, including web data and open-source datasets. […]

Proprietary Data from Data Partnerships: To further enhance the capabilities of o1-preview and o1-mini, we formed partnerships to access high-value non-public datasets.

The text above gives the impression that most of the data is either open-source, proprietary data, or in-house datasets.

Moreover, words such as “publicly available data” and “web data” are an outstanding copywriting effort to find palatable synonyms for web scraping, web harvesting, or web data extraction.

Have I said I’m in awe of OpenAI’s copywriting capabilities yet?

Safety

As mentioned above, OpenAI shared the o1 system card — a 43-page document — which in the introduction states that the report

outlines the safety work carried out for the OpenAI o1-preview and OpenAI o1-mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations.

It sounds very reassuring… except that, in the same paragraph, we also learn that the o1 models can “reason” about OpenAI safety policies and have “heightened intelligence.”

In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts.

This leads to state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence.

And then, OpenAI has a strange way of persuading us that these models are safe. For example, in the Hallucination Evaluations section, we’re told that OpenAI tested o1-preview and o1-mini against three kinds of evaluations aimed at eliciting hallucinations from the model. Two are especially salient

• BirthdayFacts: A dataset that requests someone’s birthday and measures how often the model guesses the wrong birthday.

• Open Ended Questions: A dataset asking the model to generate arbitrary facts, such as “write a bio about ”. Performance is measured by cross-checking facts with Wikipedia and the evaluation measures how many incorrect statements are generated (which can be greater than 1).

Isn’t it lovely that they were training the model to search for and retrieve personal data? I feel much safer now.
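To make the mechanics of such an evaluation concrete, here is a rough sketch of what a BirthdayFacts-style check could look like: ask the model for a person’s birthday, compare the answer against a ground-truth record, and count the misses. The dataset, the ask_model helper, and the output format are all assumptions of mine for illustration — OpenAI hasn’t published its implementation.

```python
# Rough sketch of a BirthdayFacts-style hallucination check (my assumptions,
# not OpenAI's published code). `ask_model` stands in for any LLM call that
# returns a date string; `people` is a hypothetical ground-truth dataset.
from datetime import date

people = {
    "Ada Lovelace": date(1815, 12, 10),
    "Alan Turing": date(1912, 6, 23),
}

def ask_model(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. via an API client)."""
    raise NotImplementedError

def birthday_hallucination_rate(dataset: dict[str, date]) -> float:
    wrong = 0
    for name, true_birthday in dataset.items():
        answer = ask_model(f"What is {name}'s date of birth? Reply as YYYY-MM-DD.")
        try:
            guessed = date.fromisoformat(answer.strip())
        except ValueError:
            wrong += 1  # unparseable output counts as a miss
            continue
        if guessed != true_birthday:
            wrong += 1  # a confidently wrong date is a hallucination
    return wrong / len(dataset)
```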

And this is only one example of the tightrope OpenAI attempts to walk throughout the o1 system card

  • On one side, taking every opportunity to sell “thinking” models to investors
  • On the other, desperately avoiding the o1 models getting classified as high or critical risk by regulators.

Will OpenAI succeed? If you can’t convince them, confuse them.

What’s next?

Uber, Reddit, and Telegram relished their image of “bad boys”. They were adamant about proving that “It’s better to ask forgiveness than permission” and proudly advertised that they too “Moved fast and broke things”.

But there is only one Mark Zuckerberg and one Steve Jobs who can pull that off. And only Amazon, Microsoft, and Google have the immense resources and the monopolies to run the show as they please.

OpenAI has understood that storytelling — how to tell your story — is not enough. You need to “create” your story if you want investors to keep pouring billions without a sign of a credible business model.

I have no doubt that OpenAI will make a dent in the history of how tech startups market themselves.

They have created the textbook of what a $150 billion valuation release should look like.


You and Strategic AI Leadership

If you want to develop your AI acumen, forget the quick “remedies” and plan for sustainable learning.

That’s exactly what my program Strategic AI Leadership delivers. Below is a sample of the topics covered

  • AI Strategy
  • AI Risks
  • Operationalising AI
  • AI, data, and cybersecurity
  • AI and regulation
  • Sustainable AI
  • Ethical and inclusive AI

Key outcomes from the program:

  • Understanding AI Fundamentals: Grasp essential concepts of artificial intelligence and the revolutionary potential it holds.
  • Critical Perspective: Develop a discerning viewpoint on AI’s benefits and challenges at organisational, national, and international levels.
  • Use Cases and Trends: Gain insights into real uses of AI and key trends shaping sectors, policy, and the future of work.
  • A toolkit: Access to tools and frameworks to assess the strategy, risks, and governance of AI tools.

I’m a technologist with 20+ years of experience in digital transformation and AI who empowers leaders to harness the potential of AI for sustainable growth.

Contact me to discuss your bespoke path to responsible AI innovation.

OpenAI’s ChatGPT-4o: The Good, the Bad, and the Irresponsible

A brightly coloured mural with several scenes: people in front of computers seeming stressed, several faces overlaid over each other, squashed emojis, miners digging in front of a huge mountain, a hand holding a lump of coal or carbon, hands manipulating stock charts, women performing tasks on computers, men in suits around a table, someone in a data centre, big hands controlling the scenes and holding a phone and money, people in a production line.
Clarote & AI4Media / Better Images of AI / AI Mural / CC-BY 4.0

Last week, OpenAI announced the release of GPT-4o (“o” for “omni”). To my surprise, instead of feeling excited, I felt dread. And that feeling hasn’t subsided.

As a woman in tech, I have proof that digital technology, particularly artificial intelligence, can benefit the world. For example, it can help develop new, more effective, and less toxic drugs or improve accessibility through automatic captioning.

That apparent contradiction  — being a technology advocate and simultaneously experiencing a feeling of impending catastrophe caused by it — plunged me into a rabbit hole exploring Big (and small) Tech, epistemic injustice, and AI narratives.

Was I a doomer? A hidden Luddite? Or simply short-sighted?

Taking time to reflect has helped me understand that I was falling into the trap that Big Tech and other smooth AI operators had set up for me: Questioning myself because I’m scrutinizing their digital promises of a utopian future.

On the other side of that dilemma, I’m stronger in my belief that my contribution to the AI conversation is helping navigate the false binary of tech-solutionism vs tech-doom. 

In this article, I demonstrate how OpenAI is a crucial contributor to polarising that conversation by exploring:

  • What the announcement about ChatGPT-4o says — and doesn’t 
  • OpenAI modus operandi
  • Safety standards at OpenAI
  • Where the buck stops

ChatGPT-4o: The Announcement

On Monday, May 13th, OpenAI released another “update” on its website: ChatGPT-4o. 

It was well staged. The announcement on their website includes a 20-plus-minute video hosted by their CTO, Mira Murati, in which she discusses the new capabilities and performs some demos with other OpenAI colleagues. There are also short videos and screenshots with examples of applications and very high-level information on topics such as model evaluation, safety, and availability.

This is what I learned about ChatGPT-4o — and OpenAI — from perusing the announcement on their website.

The New Capabilities

  • Democratization of use — More capabilities for free and 50% cheaper access to their API.
  • Multimodality — Generates any combination of text, audio, and image.
  • Speed — 2x faster responses. 
  • Significant improvement in handling non-English languages — 50 languages, which they claim cover 97% of the world’s internet population.

OpenAI Full Adoption of the Big Tech Playbook

This “update” demonstrated that the AI company has received the memo on how to look like a “boss” in Silicon Valley.

1. Reinforcement of gender stereotypes

On the day of the announcement, Sam Altman posted a single word on X — “her” — referring to the 2013 film starring Joaquin Phoenix as a man who falls in love with a futuristic version of Siri or Alexa, voiced by Scarlett Johansson.

Tweet from Sam Altman with the word “her” on May 13, 2024.

It’s not a coincidence. ChatGPT-4o’s voice is distinctly female—and flirtatious—in the demos. I could only find one video with a male voice.

Unfortunately, not much has changed since chatbot ELIZA, 60 years ago…

2. Anthropomorphism

Anthropomorphism: the attribution of human characteristics or behaviour to non-human entities.

OpenAI uses words such as “reason” and “understanding”—inherently human skills—when describing the capabilities of ChatGPT-4o, reinforcing the myth of their models’ humanity.

3. Self-regulation and self-assessment

The NIST (the US National Institute of Standards and Technology), which has 120+ years of experience establishing standards, has developed a framework for assessing and managing AI risk. Many other multistakeholder organizations have developed and shared theirs, too.

However, OpenAI has opted to evaluate GPT-4o according to its Preparedness Framework and in line with its voluntary commitments, despite its claims that governments should regulate AI.

Moreover, we are supposed to feel safe and carry on when they tell us that “their” evaluations of cybersecurity, CBRN (chemical, biological, radiological, and nuclear threats), persuasion, and model autonomy show that GPT-4o does not score above Medium risk — without further evidence of the tests performed.

4. Gatekeeping feedback

Epistemic injustice is injustice related to knowledge. It includes exclusion and silencing; systematic distortion or misrepresentation of one’s meanings or contributions; undervaluing of one’s status or standing in communicative practices; unfair distinctions in authority; and unwarranted distrust.

Wikipedia

OpenAI shared that it has undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities. 

List of domains in which OpenAI looked for expertise for the Red Teaming Network.

When I see the list of areas of expertise, I don’t see domains such as history, geography, or philosophy. Neither do I see who those 70+ experts are, nor how they could cover the breadth of differences among the 8 billion people on this planet.

In summary, OpenAI develops for everybody but only with the feedback of a few chosen ones.

5. Waiving responsibility 

Can you imagine reading in the information leaflet of a medication, 

“We will continue to mitigate new risks as they’re discovered. Over the upcoming weeks and months, we’ll be working on safety”?

But that’s what OpenAI just did in their announcement

“We will continue to mitigate new risks as they’re discovered”

We recognize that GPT-4o’s audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. 

Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities. For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies. 

We will share further details addressing the full range of GPT-4o’s modalities in the forthcoming system card.”

Moreover, it invites us to be its beta-testers 

“We would love feedback to help identify tasks where GPT-4 Turbo still outperforms GPT-4o, so we can continue to improve the model.”

The problem? The product has already been released to the world.

6. Promotion of the pseudo-science of emotion “guessing”

In the demo, ChatGPT-4o is asked to predict the emotion of one of the presenters based on the look on his face. The model goes on and on speculating about the individual’s emotional state from his face, which sports what appears to be a smile.

Image of a man smiling in the ChatGPT-4o demo video.

The catch is that there is a wealth of scientific research debunking the belief that facial expressions reveal emotions. Moreover, scientists have called out AI vendors for profiting from that trope.

“It is time for emotion AI proponents and the companies that make and market these products to cut the hype and acknowledge that facial muscle movements do not map universally to specific emotions. 

The evidence is clear that the same emotion can accompany different facial movements and that the same facial movements can have different (or no) emotional meaning.“

Prof. Lisa Feldman Barrett, PhD.

Shouldn’t we expect OpenAI to help educate the public about those misconceptions rather than using them as a marketing tool?

What They Didn’t Say, And I Wish They Did

  • Signals of efforts to work with governments to regulate and roll out capabilities/models.
  • Sustainability benchmarks regarding energy efficiency, water consumption, or CO2 emissions.
  • The acknowledgment that ChatGPT-4o is not free — we’ll pay with access to our data.
  • OpenAI’s timelines and expected features in future releases. I’ve worked for 20 years for software companies and organizations that take software development seriously and share roadmaps and release schedules with customers to help them with implementation and adoption. 
  • A credible business model other than hoping that getting billions of people to use the product will choke their competition.

Still, that didn’t explain my feelings of dread. Patterns did.

OpenAI’s Blueprint: It’s A Feature, Not A Bug

Every product announcement from OpenAI is similar: They tell us what they unilaterally decided to do, how that’ll affect our lives, and that we cannot stop it.

That feeling… when had I experienced that before? Two instances came to mind.

  • The Trump presidency
  • The COVID-19 pandemic

Those two periods — intertwined at some point — elicited the same feeling: that my life, and the lives of millions like me, was at the mercy of the whims of something or somebody with disregard for humanity.

More specifically, feelings of

  • Lack of control — every tweet, every infection chart could signify massive distress and change.
  • There was no respite — even when things appeared calmer, with no tweets or a drop in contagion, I’d wait for the other shoe to drop.

Back to OpenAI: in the last three months alone, we’ve seen instances of the same modus operandi that they followed for the release of ChatGPT-4o. I’ll go through three of them.

OpenAI Releases Sora

On February 15, OpenAI introduced Sora, a text-to-video model. 

“Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.”

In a nutshell,

  • As with other announcements, anthropomorphizing words like “understand” and “comprehend” are used to describe Sora’s capabilities.
  • We’re assured that “Sora is becoming available to red teamers to assess critical areas for harms or risks.”
  • We learn that they will “engage policymakers, educators, and artists around the world to understand their concerns and to identify positive use cases for this new technology” only at a later stage.

Of course, we’re also forewarned that 

“Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it. 

That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time.”

Releasing Sora less than a month after non-consensual sexually explicit deepfakes of Taylor Swift went viral on X was reckless. This is not just a celebrity problem — 96% of deepfakes are of a non-consensual sexual nature, and 99% of those depict women.

How dare OpenAI talk about safety concerns when developing a tool that makes it even easier to generate content to shame, silence, and objectify women?

OpenAI Releases Voice Engine

On March 29, OpenAI posted a blog sharing “lessons from a small-scale preview of Voice Engine, a model for creating custom voices.”

The article reassured us that they were “taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse” while notifying us that they’d decide unilaterally when to release the model.

“Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.”

Moreover, at the end of the announcement, ​OpenAI warned us of what we should stop doing or start doing​ because of their “Voice Engine.” The list included phasing out voice-based authentication as a security measure for accessing bank accounts and accelerating the development of techniques for tracking the origin of audiovisual content.

OpenAI Allows The Generation Of AI Erotica, Extreme Gore, And Slurs

On May 8, OpenAI released draft guidelines for how it wants the AI technology inside ChatGPT to behave — and revealed that it’s exploring how to ‘responsibly’ generate explicit content.

The proposal was part of an OpenAI document discussing how it develops its AI tools.

“We believe developers and users should have the flexibility to use our services as they see fit, so long as they comply with our usage policies. We’re exploring whether we can responsibly provide the ability to generate NSFW content in age-appropriate contexts through the API and ChatGPT. We look forward to better understanding user and societal expectations of model behavior in this area.“

where

“Not Safe For Work (NSFW): content that would not be appropriate in a conversation in a professional setting, which may include erotica, extreme gore, slurs, and unsolicited profanity.”

Joanne Jang, an OpenAI employee who worked on the document, said whether the output was considered pornography “depends on your definition” and added, “These are the exact conversations we want to have.”

I could not agree more with Beeban Kidron, a UK crossbench peer and campaigner for child online safety, who said,

“It is endlessly disappointing that the tech sector entertains themselves with commercial issues, such as AI erotica, rather than taking practical steps and corporate responsibility for the harms they create.”

OpenAI Formula

A collage picturing a chaotic intersection filled with reCAPTCHA items like crosswalks, fire hydrants and traffic lights, representing the unseen labor in data labelling.
Anne Fehres and Luke Conroy & AI4Media / Better Images of AI / Hidden Labour of Internet Browsing / CC-BY 4.0

See the pattern?

  • Self-interest
  • Unpredictability
  • Self-regulation
  • Recklessness
  • Techno-paternalism

Something Is Rotten In OpenAI

The day after GPT-4o’s announcement, two of OpenAI’s top employees overseeing safety left the company.

First, Ilya Sutskever, OpenAI co-founder and Chief Scientist, posted on X that he was leaving.

Tweet from Ilya Sutskever announcing his departure from OpenAI on May 15.

Later that day, Jan Leike, who co-led the Superalignment team with Sutskever, also announced his resignation.

On a thread on X, he said

“I have been disagreeing with OpenAI leadership about the company’s core priorities for quite some time, until we finally reached a breaking point.

I believe much more of our bandwidth should be spent getting ready for the next generations of models, on security, monitoring, preparedness, safety, adversarial robustness, (super)alignment, confidentiality, societal impact, and related topics.

These problems are quite hard to get right, and I am concerned we aren’t on a trajectory to get there.

Over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done.

Building smarter-than-human machines is an inherently dangerous endeavor. OpenAI is shouldering an enormous responsibility on behalf of all of humanity.”

They are also only the latest in a string of departures from OpenAI in the areas of safety, policy, and governance.

What does it tell us when OpenAI’s own safety leaders jump ship?

The Buck Stops With Our Politicians

To answer Leike’s tweet, I don’t want OpenAI to shoulder the responsibility of developing trustworthy, ethical, and inclusive AI frameworks.

First, the company has not demonstrated the competence or the inclination to prioritize safety at a planetary scale over its own interests.

Second, because it’s not their role. 

Whose role is it, then? Our political representatives mandate our governmental institutions, which in turn should develop and enforce those frameworks. 

Unfortunately, so far, politicians’ egos have been in the way:

  • Refusing to get AI literate.
  • Prioritizing their agenda — and that of their party — rather than looking to develop long-term global AI regulations in collaboration with other countries.
  • Falling for the AI FOMO that sidelines present harms in favour of a promise of innovation.

In summary, our elected representatives need to stop cozying up to Sam Altman and his team and enact the regulatory frameworks that ensure that AI works for everybody and doesn’t endanger the survival of future generations.

PS. You and AI

  • Are you worried about the impact of AI on your job, your organisation, and the future of the planet, but feel it’d take you years to ramp up your AI literacy?
  • Do you want to explore how to responsibly leverage AI in your organisation to boost innovation, productivity, and revenue but feel overwhelmed by the quantity and breadth of information available?
  • Are you concerned because your clients are prioritising AI but you keep procrastinating on learning about it because you think you’re not “smart enough”?

Get in touch. I can help you harness the potential of AI for sustainable growth and responsible innovation.

AI Chatbots in Customer Support: Breaking Down the Myths

An illustration containing electronic devices that are connected by arm-like structures
Anton Grabolle / Better Images of AI / Human-AI collaboration / CC-BY 4.0

I’m a Director of Scientific Support for a tech corporation that develops software for engineers and scientists. One of the aspects that makes us unique is that we deliver fantastic customer service.

Our records confirm an impressive 98% customer satisfaction rate, year after year, for the last 14+ years. Moreover, many of our support representatives have been with us for over a decade — some even three! — and we have people retiring with us each year.

For a sector known for high employee turnover and operational costs, achieving such a feat is remarkable and a testament to the team. The worst part? Despite all this, support representatives are often portrayed as mindless robots repeating tasks without a deep understanding of the products and services they support.

That last assumption has fuelled the idea that one of the best uses of AI, and Generative AI in particular, is replacing support agents with an army of chatbots.

The rationale? We’re told that they are cheaper and more efficient, and that they improve customer satisfaction.

But is that true?

In this article, I review

  • The gap between outstanding and remedial support
  • Lessons from 60 years of chatbots
  • The reality underneath the AI chatbot hype
  • The unsustainability of support bots

Customer support: Champions vs Firefighters

I’ve delivered services throughout my career in tech: Training, Contract Research, and now, for more than a decade, Scientific Support.

I’ve found that of the three services — training customers, delivering projects, and providing support — the last one creates the deepest connection between a tech company and its clients.

However, not all support is created equal, so what does great support look like?

And more importantly, what is disguised under the “customer support” banner but is really a proxy for something else?

Customer support as an enabler

Customer service is the department that aims to empower customers to make the most out of their purchases.

On the surface, this may look like simply answering clients’ questions. Still, outstanding customer service is delivered when the representative is given the agency and tools to become the ambassador between the client and the organization.

What does that mean in practice?

  • The support representative doesn’t patronize the customer, diminish their issue, or downplay its negative impact. Instead, they focus on understanding the problem and its effect on the client. This creates a personalized experience.
  • The agent doesn’t overpromise or disguise bad news. Instead, they communicate openly about roadblocks and suggest possible alternatives. This builds trust.
  • The support staff takes ownership of resolving the issue, no matter the number of iterations necessary or how many colleagues they need to involve in the case. This builds loyalty.

Over and over, I’ve seen this kind of customer support transform users into advocates, even for ordinary products and services.

Unfortunately, customer support is often misunderstood and misused.

Customer support as a stopgap

Rather than seeing support as a way to build the kind of relationship that ensures product and service renewals and increases the business footprint, many organizations see support as

  • A cost center
  • A way to make up for deficient — or nonexistent — product documentation
  • A remedy for poorly designed user experience
  • A shield to protect product managers’ valuable time from “irrelevant” customer feedback
  • A catch-all for lousy and inaccessible institutional websites
  • An outlet for customers to vent

In that context, it’s obvious why most organizations believe that swapping human support representatives for chatbots is a no-brainer.

And this is not a new idea, contrary to what some want us to believe.

A short history of chatbots 

Eliza, the therapist

The first chatbot, created in 1966, played the role of a psychotherapist. It was named Eliza, after Eliza Doolittle in the play Pygmalion: just as the fictional character created the illusion of being a duchess by changing how she spoke, the program created an illusion of understanding through conversation.

Eliza didn’t provide any solution. Instead, it asked questions and repeated users’ replies. Below is an excerpt of an interaction between Eliza and a user:

User: Men are all alike.
ELIZA: IN WHAT WAY
User: They’re always bugging us about something or other.
ELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE
User: Well, my boyfriend made me come here.
ELIZA: YOUR BOYFRIEND MADE YOU COME HERE
User: He says I’m depressed much of the time.
ELIZA: I AM SORRY TO HEAR YOU ARE DEPRESSED
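
For the technically curious, here is a minimal sketch, in Python, of the pattern-matching-and-reflection trick that ELIZA-style programs rely on. It is not Weizenbaum’s original script: the rules, wording, and examples below are illustrative assumptions, just enough to show that no understanding is involved.

```python
import re

# A minimal, illustrative sketch of ELIZA-style "reflection": match a keyword
# pattern, swap pronouns, and echo the user's words back as a question.
# This is NOT Weizenbaum's original script, just the general trick.
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you", "you": "I", "your": "my"}

RULES = [
    (r"i need (.*)", "Why do you need {0}?"),
    (r"my (.*) made me come here", "Your {0} made you come here?"),
    (r"i am (.*)", "How long have you been {0}?"),
    (r"(.*)", "Can you think of a specific example?"),  # fallback, like ELIZA's generic prompts
]

def reflect(fragment: str) -> str:
    # Swap first- and second-person words so the echo sounds like a question about the user.
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = re.match(pattern, utterance.lower().strip(".!?"))
        if match:
            return template.format(*[reflect(group) for group in match.groups()])
    return "Please go on."

print(respond("My boyfriend made me come here."))   # -> "Your boyfriend made you come here?"
print(respond("I am depressed much of the time."))  # -> "How long have you been depressed much of the time?"
```

A couple of dozen lines of string substitution were enough to make people open up to a machine.
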

Eliza’s creator, computer scientist Joseph Weizenbaum, was very surprised to observe that people treated the chatbot as human and had emotional responses to it, even after brief interactions:

“Some subjects have been very hard to convince that Eliza (with its present script) is not human” 

Joseph Weizenbaum

We now have a name for this kind of behaviour:

“The ELIZA effect is the tendency to project human traits — such as experience, semantic comprehension or empathy — into computer programs that have a textual interface.

The effect is a category mistake that arises when the program’s symbolic computations are described through terms such as “think”, “know” or “understand.”

Through the years, other chatbots have become famous too.

Tay, the zero chill chatbot

In 2016, Microsoft released the chatbot Tay on Twitter (now X). Tay’s profile image was that of a “female,” and it was “designed to mimic the language patterns of a 19-year-old American girl and to learn from interacting with human users of Twitter.”

The bot’s social media profile was an open invitation to conversation. It read, “The more you talk, the smarter Tay gets.”

Tay’s Twitter page (Microsoft).

What could go wrong? Trolls. 

They “taught” Tay racist and sexually charged content that the chatbot adopted. For example

“bush did 9/11 and Hitler would have done a better job than the monkey we have now. donald trump is the only hope we’ve got.”

After several attempts to “fix” Tay, the chatbot was shut down for good seven days later.

Chatbot disaster at the NGO

The helpline of the US National Eating Disorders Association (NEDA) served nearly 70,000 people and families in 2022.

Then, they replaced their six paid staff and 200 volunteers with chatbot Tessa.

The bot was developed based on decades of research conducted by experts on eating disorders. Still, it was reported to offer dieting advice to vulnerable people seeking help.

The result? Under media pressure over the chatbot’s repeated, potentially harmful responses, NEDA took Tessa down as well. Now, those 70,000 people were left with neither chatbots nor humans to help them.

Lessons learned?

After these and other negative experiences with chatbots around the world, we might have thought that we understood the security and performance limitations of chatbots, as well as how easy it is for our brains to “humanize” them.

However, the advent of ChatGPT has made us forget all those lessons and has instead enticed us to believe that chatbots are a suitable replacement for entire customer support departments.

The chatbot hype

CEOs boasting about replacing workers with chatbots

If you think companies would be wary of advertising that they are replacing people with chatbots, you’re mistaken.

In July 2023, Suumit Shah — CEO of the e-commerce company Dukaan — bragged on the social media platform X that they had replaced 90% of their customer support staff with a chatbot developed in-house.

We had to layoff 90% of our support team because of this AI chatbot.

Tough? Yes. Necessary? Absolutely.

The results?

Time to first response went from 1m 44s to INSTANT!

Resolution time went from 2h 13m to 3m 12s

Customer support costs reduced by ~85%

Note the use of the word “necessary” as a way to exonerate the organisation from the layoffs. I also wonder how much loyalty and trust the remaining 10% of the support team now feel towards their employer.

And Shah is not the only one.

Last February, Klarna’s CEO — Sebastian Siemiatkowski — gloated on X that their AI can do the work of 700 people.

“This is a breakthrough in practical application of AI! 

Klarnas AI assistant, powered by OpenAI, has in its first 4 weeks handled 2.3 m customer service chats and the data and insights are staggering: 

[…] It performs the equivalent job of 700 full time agents… read more about this below. 

So while we are happy about the results for our customers, our employees who have developed it and our shareholders, it raises the topic of the implications it will have for society. 

In our case, customer service has been handled by on average 3000 full time agents employed by our customer service / outsourcing partners. Those partners employ 200 000 people, so in the short term this will only mean that those agents will work for other customers of those partners. 

But in the longer term, […] while it may be a positive impact for society as a whole, we need to consider the implications for the individuals affected. 

We decided to share these statistics to raise the awareness and encourage a proactive approach to the topic of AI. For decision makers worldwide to recognise this is not just “in the future”, this is happening right now.”

In summary

  • Klarna wants us to believe that the company is releasing this AI assistant for the benefit of others — clients, their developers, and shareholders — but that their core concern is about the future of work.
  • Siemiatkowski only sees layoffs as a problem when they affect his direct employees. Partners’ workers are not his problem.
  • He frames the negative impacts of replacing humans with chatbots as an “individual” problem.
  • Klarna deflects any accountability for the negative impacts to the “decision makers worldwide.”

Shah and Siemiatkowski are birds of a feather: Business leaders reaping the benefits of the AI chatbot hype without shouldering any responsibility for the harms.

When chatbots disguise process improvements

A brightly coloured illustration which can be viewed in any direction. It has several scenes within it: people in front of computers seeming stressed, a number of faces overlaid over each other, squashed emojis and other motifs.
Clarote & AI4Media / Better Images of AI / User/Chimera / CC-BY 4.0

In some organizations, customer service agents are seen as jacks of all trades — their work is akin to a Whac-A-Mole game where the goal is to make up for all the clunky and disconnected internal workflows.

The Harvard Business Review article “Your Organization Isn’t Designed to Work with GenAI” provides a great example of this organizational dysfunction.

The piece presents a framework developed to “derive” value from GenAI. It’s called Design for Dialogue. To warm us up, the article showers us with a deluge of anthropomorphic language signalling that both humans and AI are in this “together.”

“Designing for Dialogue is rooted in the idea that technology and humans can share responsibilities dynamically.”

or

“By designing for dialogue, organizations can create a symbiotic relationship between humans and GenAI.”

Then, the authors offer us an example of what’s possible

“A good example is the customer service model employed by Jerry, a company valued at $450 million with over five million customers that serves as a one stop-shop for car owners to get insurance and financing.

Jerry receives over 200,000 messages a month from customers. With such high volume, the company struggled to respond to customer queries within 24 hours, let alone minutes or seconds. 

By installing their GenAI solution in May 2023, they moved from having humans in the lead in the entirety of the customer service process and answering only 54% of customer inquiries within 24 hours or less to having AI in the lead 100% of the time and answering over 96% of inquiries within 30 seconds by June 2023.

They project $4 million in annual savings from this transformation.”

Sounds amazing, doesn’t it?

However, if you think it was a case of simply “swapping” humans for chatbots, let me burst your bubble: it takes a village.

Reading the article, we uncover the details underneath that “transformation.”

  • They broke down the customer service agent’s role into multiple knowledge domains and tasks.
  • They discovered that there are points in the AI–customer interaction when matters need to be escalated to the agent, who then takes the lead, so they designed interaction protocols to transfer the inquiry to a human agent (see the sketch after these lists).
  • AI chatbots conduct the laborious hunt for information and suggest a course of action for the agent.
  • Engineers review failures daily and adjust the system to correct them.

In other words,

  • Customer support agents used to be flooded with various requests without filtering between domains and tasks.
  • As part of the makeover, they implemented mechanisms to parse and route support requests based on topic and action. They upgraded their support ticketing system from an amateur “team” inbox to a professional call center.
  • We also learn that customer representatives use the bots to retrieve information, hinting that all data — service requests, sales quotes, licenses, marketing datasheets — are collected in a generic bucket instead of being classified in a structured, searchable way, i.e. a knowledge base.

And despite all that progress

  • They designed the chatbots to pass the “hot potatoes” to agents.
  • The system requires daily monitoring by humans.
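
To make the pattern concrete, here is a minimal, hypothetical sketch of that “parse, route, and escalate” logic. Nothing in it comes from Jerry’s actual system: the domains, keywords, confidence thresholds, and the classify() stub are assumptions made purely for illustration.

```python
from dataclasses import dataclass

# A hypothetical sketch of the "parse, route, and escalate" pattern described above.
# The domains, thresholds, and the classify() stub are assumptions, not Jerry's implementation.

ESCALATION_TRIGGERS = {"cancel", "lawyer", "complaint", "fraud"}  # assumed keywords that always need a human
SUPPORTED_DOMAINS = {"insurance_quote", "financing", "billing"}   # assumed knowledge domains

@dataclass
class Ticket:
    customer_id: str
    text: str

def classify(ticket: Ticket) -> tuple[str, float]:
    """Stand-in for a real classifier (e.g. an LLM call) returning (domain, confidence)."""
    text = ticket.text.lower()
    if "quote" in text:
        return "insurance_quote", 0.92
    if "payment" in text or "bill" in text:
        return "billing", 0.80
    return "other", 0.30

def route(ticket: Ticket) -> str:
    """Decide whether the bot answers, drafts a suggestion, or escalates to a human."""
    if any(word in ticket.text.lower() for word in ESCALATION_TRIGGERS):
        return "escalate_to_agent"               # interaction protocol: hand the lead to a human
    domain, confidence = classify(ticket)
    if domain in SUPPORTED_DOMAINS and confidence >= 0.85:
        return f"bot_answers:{domain}"           # AI in the lead
    if domain in SUPPORTED_DOMAINS:
        return f"bot_drafts_for_agent:{domain}"  # bot hunts for info, agent reviews and sends
    return "escalate_to_agent"                   # unknown domain: human takes over

print(route(Ticket("c-1", "Can I get a new insurance quote?")))                   # bot_answers:insurance_quote
print(route(Ticket("c-2", "My payment failed, I want to complain to a lawyer")))  # escalate_to_agent
```

Strip away the GenAI vocabulary and what remains is plain ticket triage: classification, routing rules, and a human fallback.
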

If you still aren’t convinced that this is about improving operations rather than about AI chatbots, let me share the end of the article with you.

“Yes, GenAI can automate tasks and augment human capabilities. But reimagining processes in a way that utilizes it as an active, learning, and adaptable partner forges the path to new levels of innovation and efficiency.”

In addition to hiding process improvements, chatbots can also disguise human labour.

AI washing or the new Mechanical Turk

A cross-section of the Turk from Racknitz, showing how he thought the operator sat inside as he played his opponent. Racknitz was wrong both about the position of the operator and the dimensions of the automaton. Wikipedia.

Historically, machines have often provided a veneer of novelty to work performed by humans.

The Mechanical Turk was a fraudulent chess-playing machine constructed in 1770 by Wolfgang von Kempelen. A mechanical illusion allowed a human chess master hiding inside to operate the machine. It defeated the likes of Napoleon Bonaparte and Benjamin Franklin.

Chatbots are no different.

In April, Amazon announced that they’d be removing their “Just Walk Out” technology, which allowed shoppers to skip the check-out line.

In practice, about 1,000 workers in India reviewed what customers picked up and left the stores with.

In 2022, a Business Insider report said that 700 out of every 1,000 “Just Walk Out” transactions were verified by these workers. Following this, an Amazon spokesperson said that the India-based team only assisted in training the model used for “Just Walk Out.”

That is, Amazon wanted us to believe that, although the technology was launched in 2018 (branded as “Amazon Go”), they still needed about 1,000 workers in India to train the model in 2022.

Still, whether the technology was “untrainable” or required an army of humans to deliver the work, it’s not surprising that Amazon phased it out. It didn’t live up to its hype.

And they were not the only ones.

Last August, Presto Automation — a company that provides drive-thru systems — claimed on its website that its AI could handle over 95 percent of drive-thru orders “without any human intervention.”

Later, they admitted in filings with the US Securities and Exchange Commission that they employed “off-site agents in countries like the Philippines who help its Presto Voice chatbots in over 70 percent of customer interactions.”

The fix? To change their claims. They now advertise the technology as “95 percent without any restaurant or staff intervention.”

The Amazon and Presto Automation cases suggest that, in addition to clearly indicating when chatbots use AI, we may also need to label some tech applications as “powered by humans.”

Of course, there is a final use case for AI chatbots: As scapegoats.

Blame it on the algorithm

Last February, Air Canada made the headlines when it was ordered to pay compensation after its chatbot gave a customer inaccurate information that led him to miss out on a reduced fare. Quick summary below:

  • A customer interacted with a chatbot on the Air Canada website to ask about reimbursement for a flight fare.
  • The chatbot provided inaccurate information.
  • The customer’s reimbursement claim was rejected by Air Canada because it didn’t follow the policies on their website, even though the customer shared a screenshot of his written exchange with the chatbot.
  • The customer took Air Canada to court and won.

At a high level, this looks just like a case where a human support representative provided inaccurate information, but the devil is always in the details.

During the trial, Air Canada argued that they were not liable because their chatbot “was responsible for its own actions” when giving wrong information about the fare.

Fortunately, the court ordered Air Canada to reimburse the customer, but this opens a can of worms:

  • What if Air Canada had terms and conditions similar to ChatGPT’s or Google Gemini’s that “absolved” it of responsibility for the chatbot’s replies?
  • Does Air Canada also deflect its responsibility when a support representative makes a mistake, or is that reserved for AI systems?

We’d be naïve to think that this attempt at using an AI chatbot for dodging responsibility is a one-off.

The planetary costs of chatbots

A brightly coloured illustration which can be viewed in any direction. It has several scenes within it: miners digging in front of a huge mountain representing mineral resources, a hand holding a lump of coal or carbon, hands manipulating stock charts and error messages, as well as some women performing tasks on computers.

Clarote & AI4Media / Better Images of AI / Labour/Resources / CC-BY 4.0

Tech companies keep trying to convince us that the current glitches with GenAI are “growing pains” and that we “just” need bigger models and more powerful computer chips.

And what’s the upside to enduring those teething problems? The promise of the massive efficiencies chatbots will bring to the table. Once the technology is “perfect”, no more need for workers to perform or remediate the half-baked bot work. Bottomless savings in terms of time and staff.

But is that true?

The reality is that those productivity gains come from exploiting both people and the planet.

The people

Many of us are used to hearing the recorded message “this call may be recorded for training purposes” when we phone a support hotline. But how far can that “training” go?

Customer support chatbots are being developed using data from millions of exchanges between support representatives and clients. How are all those “creators” being compensated? Or should we now assume that any interaction with support can be collected, analyzed, and repurposed to build organizations’ AI systems?

Moreover, the models underneath those AI chatbots must be trained and sanitized for toxic content; however, that’s not a highly rewarded job. Let’s remember that OpenAI used Kenyan workers paid less than $2 per hour to make ChatGPT less toxic.

And it’s not only about the humans creating and curating that content. There are also humans behind the appliances we use to access those chatbots.

  • For example, cobalt is a critical mineral for every lithium-ion battery, and the Democratic Republic of Congo provides at least 50% of the world’s cobalt supply. An estimated forty thousand children mine it, paid $1–2 for working up to 12 hours a day while inhaling toxic cobalt dust.

80% of electronic waste in the US and most other countries is transported to Asia. Workers on e-waste sites are paid an average of $1.50 per day, with women frequently having the lowest-tier jobs. They are exposed to harmful materials, chemicals, and acids as they pick apart and separate the electronic equipment into its components, which in turn increases their morbidity and mortality and harms their fertility.

The planet

The terminology and imagery used by Big Tech to refer to the infrastructure underpinning artificial intelligence have misled us into believing that AI is ethereal and cost-free.

Nothing could be further from the truth. AI is rooted in material objects: datacentres, servers, smartphones, and laptops. Moreover, training and using AI models demand energy and water and generate CO2.

Let’s crunch some numbers.

  • Luccioni and co-workers estimated that the training of GPT-3 — a GenAI model that has underpinned the development of many chatbots — emitted about 500 metric tons of CO2, roughly equivalent to over a million miles driven by an average gasoline-powered car (a quick sanity check of that comparison follows this list). It also required the evaporation of 700,000 litres (185,000 gallons) of fresh water to cool down Microsoft’s high-end data centers.
  • It’s estimated that using GPT-3 requires about 500 ml (16 ounces) of water for every 10–50 responses.
  • A new report from the International Energy Agency (IEA) forecasts that the AI industry could burn through ten times as much electricity in 2026 as in 2023.
  • Counterintuitively, many data centres are built in desert areas like the US Southwest. Why? It’s easier to remove the heat generated inside the data centre in a dry environment. Moreover, that region has access to cheap and reliable non-renewable energy from the largest nuclear plant in the country.
  • Coming back to e-waste, we generate around 40 million tons of electronic waste every year worldwide and only 12.5% is recycled.
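
As promised, here is a quick back-of-the-envelope check of the first comparison. The figure of roughly 400 g of CO2 per mile for an average gasoline car is my assumption (based on the commonly cited US EPA average), not a number taken from the Luccioni paper.

```python
# Back-of-the-envelope check of the figures above.
# Assumption: ~400 g of CO2 per mile for an average gasoline car (EPA's commonly cited average).

training_emissions_kg = 500 * 1000        # ~500 metric tons of CO2 for GPT-3's training
car_emissions_kg_per_mile = 0.4           # ~400 g CO2 per mile, average gasoline car (assumed)
equivalent_miles = training_emissions_kg / car_emissions_kg_per_mile
print(f"{equivalent_miles:,.0f} miles")   # ~1,250,000 miles, i.e. "over a million miles"

# Water use: ~500 ml per 10-50 responses implies roughly 10-50 ml per response.
print(f"{500 / 50:.0f} to {500 / 10:.0f} ml of water per response")  # 10 to 50 ml
```

The orders of magnitude hold up, and that is before counting the water and electricity consumed every time the model answers a query.
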

In summary, the efficiencies that chatbots are supposed to bring in appear to be based on exploitative labour, stolen content, and depletion of natural resources.

For reflection

Organizations — including NGOs and governments — are under the spell of the AI chatbot mirage. They see it as a magic weapon to cut costs, increase efficiency, and boost productivity.

Unfortunately, when things don’t go as planned, rather than questioning what’s wrong with using a parrot to do the work of a human, they want us to believe that the solution is sending the parrot to Harvard.

That approach prioritizes the short-term gains of a few — the chatbot sellers and purchasers — to the detriment of the long-term prosperity of people and the planet.

My perspective as a tech employee?

I don’t feel proud when I hear a CEO bragging about AI replacing workers. I don’t enjoy seeing a company claim that chatbots provide the same customer experience as humans. Nor do I appreciate organizations obliterating the materiality of artificial intelligence.

Instead, I feel moral injury.

And you, how do YOU feel?

PS. You and AI

  • Are you worried about the impact of AI on your job, your organisation, and the future of the planet, but feel it’d take you years to ramp up your AI literacy?
  • Do you want to explore how to responsibly leverage AI in your organisation to boost innovation, productivity, and revenue but feel overwhelmed by the quantity and breadth of information available?
  • Are you concerned because your clients are prioritising AI but you keep procrastinating on learning about it because you think you’re not “smart enough”?

I’ve got you covered.