On 29 March 2024, OpenAI, the company that develops ChatGPT and other Generative AI tools, released a blog post sharing “lessons from a small-scale preview of Voice Engine, a model for creating custom voices.”
More precisely:
“a model called Voice Engine, which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.”
They reassure us that:
“We are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse. We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities.”
And they warn us that they’ll make the decision unilaterally:
“Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.”
Let’s explore why we should all be concerned.
The Generative AI mirage
In their release, OpenAI tells us about all the great applications of this new tool:
- Providing reading assistance
- Translating content
- Reaching global communities
- Supporting people who are non-verbal
- Helping patients recover their voice
Note that for all those use cases, there are already alternatives that don’t carry the downsides of recreating a clone of someone’s voice.
We also learn that other organisations have been testing this capability successfully for a while now. Yet the blog post assumes we should trust OpenAI’s judgment implicitly: there is no supporting evidence detailing how those tests were run, what challenges were uncovered, and what mitigations were put in place as a consequence.
The caveat
But the most important information is at the end of the piece.
OpenAI warns us about what we should stop or start doing because of their “Voice Engine”:
“Phasing out voice-based authentication as a security measure for accessing bank accounts and other sensitive information
Exploring policies to protect the use of individuals’ voices in AI
Educating the public in understanding the capabilities and limitations of AI technologies, including the possibility of deceptive AI content
Accelerating the development and adoption of techniques for tracking the origin of audiovisual content, so it’s always clear when you’re interacting with a real person or with an AI”
In summary, OpenAI has decided to develop a technology and plans to roll it out, expecting the rest of the world to adapt to it.
Techno-paternalism
To those of us who have been following OpenAI, the post announcing the development and active use of Voice Engine is not a bug but a feature.
Big Tech has a tradition of setting its own rules, denying accountability, and even refusing to cooperate with governments. Often, their defense has been that society either doesn’t understand the “big picture”, doesn’t deserve an explanation, or is stifling innovation by enacting laws.
Some examples:
- Microsoft — In 2001, U.S. courts found that Microsoft had illegally maintained its operating-system monopoly, partly by bundling its web browser with Windows. Microsoft claimed that “its attempts to ‘innovate’ were under attack by rival companies jealous of its success.”
- Apple — The Batterygate scandal affected people using iPhone 6, 6S, and 7 models. Customers complained that Apple had purposely slowed down their phones through software updates to push them to buy a newer device. Apple countered that it was “a safety measure to keep the phones from shutting down when the battery got too low”.
- Meta (Facebook) — After the Cambridge Analytica scandal was uncovered, exposing that the personal data of about 50 million Americans had been harvested and improperly shared with a political consultancy, it took Mark Zuckerberg five days to reappear. Interestingly, he chose to publish a post on Facebook as his form of apology. He also refused, three times, the invitation to testify in front of members of the UK Parliament.
- Google — Between 50 and 80 percent of people searching for porn deepfakes find their way to the websites and tools that create the videos or images via search. For example, in July 2023, around 44% of visits to Mrdeepfakes.com came via Google. Still, the onus is on the victims to “clean” the internet: Google requires them to manually submit content removal requests with the offending URLs.
- Amazon — For years, they refused to acknowledge that their facial recognition algorithms for predicting race and gender were biased against darker-skinned women. Instead of improving their algorithms, they chose to blame the auditor’s methodology.
OpenAI is cut from the same cloth. They apparently believe that if they develop the applications, they are entitled to set the parameters for how to use them (or not), and even to change their mind as they see fit.
Let’s examine their stance on three paramount issues that reveal the gap between their actions and their stated values.
Open source
Despite their name — OpenAI — and their origins as a nonprofit, they’ve been notorious for their inconsistent open-source practices. Still, each release has appeared to be an opportunity to lecture us about why society is much better off leaving it to them to decide how to gatekeep their applications.
For example, here’s what Ilya Sutskever, OpenAI’s chief scientist and co-founder, said about the release of GPT-4 — not an open AI model — a year ago:
“These models are very potent and they’re becoming more and more potent. At some point it will be quite easy, if one wanted, to cause a great deal of harm with those models. And as the capabilities get higher it makes sense that you don’t want to disclose them.”
“If you believe, as we do, that at some point, AI — AGI — is going to be extremely, unbelievably potent, then it just does not make sense to open-source. It is a bad idea… I fully expect that in a few years it’s going to be completely obvious to everyone that open-sourcing AI is just not wise.”
However, the reluctant content suppliers for their models — artists, writers, journalists — don’t have the same rights to decide on the use of the material they have created. For example, let’s remember how Sam Altman shrugged off the claims of newspapers that OpenAI used their copyrighted material to train ChatGPT.
Safety
The release of Voice Engine comes from the same playbook as the unilateral decision to release their text-to-video model Sora to “red teamers” and “a number of visual artists, designers, and filmmakers”.
The blog post also gives us a high-level view of the safety measures that’ll be put in place:
“For example, once in an OpenAI product, our text classifier will check and reject text input prompts that are in violation of our usage policies, like those that request extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others.
We’ve also developed robust image classifiers that are used to review the frames of every video generated to help ensure that it adheres to our usage policies, before it’s shown to the user.”
Let’s remember that OpenAI relied on Kenyan workers paid less than $2 per hour to make ChatGPT less toxic. Who’ll make Sora less toxic this time?
Moreover, who’ll decide where the line sits between “mild” violence (apparently permitted) and “extreme” violence?
Sustainability
For all their claims that their “primary fiduciary duty is to humanity”, their disregard for the environmental impact of their models is surprising.
Sam Altman has been actively talking to investors, including the United Arab Emirates government, to raise funds for a tech initiative that would boost the world’s chip-building capacity, expand its ability to power AI, and cost several trillion dollars.
An OpenAI spokeswoman said:
“OpenAI has had productive discussions about increasing global infrastructure and supply chains for chips, energy and data centers — which are crucial for AI and other industries that rely on them”
But nothing is free in the universe. A study conducted by Dr. Sasha Luccioni — Researcher and Climate Lead at Hugging Face — showed that training BLOOM, a 176-billion-parameter LLM, emitted at least 25 metric tons of CO2 equivalent.
In the article, the authors also estimated that the training of GPT-3 — a 175-billion-parameter model — emitted about 500 metric tons of CO2, roughly equivalent to over a million miles driven by an average gasoline-powered car. Why such a difference? Because, unlike BLOOM, GPT-3 was trained using carbon-intensive energy sources like coal and natural gas.
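As a sanity check on that comparison, here is a minimal back-of-the-envelope calculation. The conversion factor of roughly 400 grams of CO2 per mile for an average gasoline-powered car is my assumption (close to the EPA’s published figure); the study’s exact factor may differ.

```python
# Back-of-the-envelope check of the "over a million miles" equivalence.
# Assumption: ~400 g of CO2 per mile for an average gasoline-powered car
# (roughly the EPA figure; the study's exact factor may differ).
GPT3_TRAINING_EMISSIONS_TONS = 500       # metric tons of CO2, from the study
GRAMS_PER_METRIC_TON = 1_000_000
CAR_EMISSIONS_G_PER_MILE = 400           # assumed average

equivalent_miles = (
    GPT3_TRAINING_EMISSIONS_TONS * GRAMS_PER_METRIC_TON / CAR_EMISSIONS_G_PER_MILE
)
print(f"{equivalent_miles:,.0f} miles")  # -> 1,250,000 miles
```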
And it doesn’t stop there. Dr. Luccioni conducted further studies on the emissions associated with 10 popular Generative AI tasks:
- Generating 1,000 images was responsible for roughly as much carbon dioxide as driving the equivalent of 4.1 miles in an average gasoline-powered car.
- The least carbon-intensive text generation model was responsible for as much CO2 as driving 0.0006 miles in a similar vehicle.
- Using large generative models to create outputs was far more energy-intensive than using smaller AI models tailored for specific tasks. For example, using a generative model to classify positive and negative movie reviews consumed around 30 times more energy than using a fine-tuned model created specifically for that task.
Moreover, they discovered that the day-to-day emissions associated with using AI far exceeded the emissions from training large models.
And it’s not only emissions. The data centres where those models are trained and run need water for cooling and, in some cases, as a source of electricity.
Professor Shaolei Ren from UC Riverside found that training GPT-3 in Microsoft’s high-end data centers can directly evaporate 700,000 liters (about 185,000 gallons) of fresh water. As for usage, Ren and his colleagues estimated that GPT-3 requires about 500 ml (16 ounces) of water for every 10–50 responses.
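To put that in per-response terms, here is a quick conversion using only the figures above; the wide range comes from the study’s 10–50 responses-per-500-ml estimate, which varies with where and when the queries are served.

```python
# Per-response water cost implied by the estimate above:
# about 500 ml of water for every 10-50 responses.
WATER_ML = 500
RESPONSES_LOW, RESPONSES_HIGH = 10, 50

per_response_min = WATER_ML / RESPONSES_HIGH  # 10 ml per response
per_response_max = WATER_ML / RESPONSES_LOW   # 50 ml per response
print(f"{per_response_min:.0f}-{per_response_max:.0f} ml of water per response")
```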
Four questions for our politicians
It’s time our politicians step up to the challenge of exercising stewardship of AI for the benefit of people and the planet.
I have four questions to get them going:
- Why are you allowing OpenAI to make decisions unilaterally on technology that affects us all?
- How can you shift from a reactive stance, in which you let Big Tech companies like OpenAI drive the regulation of technologies that impact key aspects of governance — from our individual rights to national cybersecurity — to being a proactive key player in decisions that shape society’s future?
- How can you hold Big Tech accountable for the planetary environmental costs of their technologies?
- How are you ensuring the public becomes digitally literate so they can develop their own informed views about the benefits and challenges of AI and other emergent technologies?
Back to you
How comfortable are you with OpenAI deciding on the use of Generative AI on behalf of humanity?
PS. You and AI
- Are you worried about the impact of AI on your job, your organisation, and the future of the planet, but feel it’d take you years to ramp up your AI literacy?
- Do you want to explore how to responsibly leverage AI in your organisation to boost innovation, productivity, and revenue but feel overwhelmed by the quantity and breadth of information available?
- Are you concerned because your clients are prioritising AI but you keep procrastinating on learning about it because you think you’re not “smart enough”?
I’ve got you covered.