The Advanced Voice Mode for ChatGPT, which OpenAI has begun to roll out, gives users a first taste of GPT-4o's strikingly lifelike audio responses. This alpha version is available today to a select group of ChatGPT Plus users, with a broader release expected by fall 2024.
GPT-4o’s Impressive Voice
When OpenAI first unveiled GPT-4o’s voice in May, it garnered significant attention due to its fast responses and striking similarity to a real human voice. Named Sky, the voice resembled that of Scarlett Johansson, known for voicing an artificial assistant in the film “Her.”
Johansson said she had declined offers from CEO Sam Altman to use her voice, and after the demo she hired legal representation to protect her likeness. Although OpenAI denied using Johansson's voice, the Sky voice was subsequently removed from the demo. In June, OpenAI announced a delay in Advanced Voice Mode's release to enhance safety features.
Limited Features in Alpha Release
This alpha release will not include the video and screen-sharing features demonstrated in the Spring Update; those will arrive at a later date. For now, those capabilities remain limited to demos, while some premium users gain early access to GPT-4o's voice itself.
Enhanced Conversational Abilities
ChatGPT now boasts advanced listening and speaking capabilities. Unlike the previous voice solution, which chained three separate models for speech-to-text, text processing, and text-to-speech, GPT-4o handles these tasks natively as a single multimodal model. This results in faster, more fluid conversations. Additionally, GPT-4o can detect emotional nuances in users' voices, such as sadness or excitement.
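For context, the older three-stage pipeline can be sketched roughly as follows, assuming the publicly documented Whisper transcription, Chat Completions, and text-to-speech endpoints in OpenAI's Python SDK. The specific models OpenAI chained internally are not detailed here, so treat this as an illustrative approximation rather than the actual implementation:

```python
# Illustrative sketch of a three-stage voice pipeline like the one GPT-4o replaces.
# Assumption: the openai Python SDK's documented Whisper, Chat Completions, and TTS
# endpoints stand in for the internal models, which this article does not name.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def legacy_voice_turn(audio_path: str) -> bytes:
    # 1) Speech-to-text: transcribe the user's spoken input.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # 2) Text processing: generate a reply from the transcript alone,
    #    which discards tone, pacing, and other vocal cues.
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": transcript.text}],
    )

    # 3) Text-to-speech: synthesize the reply as audio.
    speech = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=reply.choices[0].message.content,
    )
    return speech.content  # raw audio bytes (MP3 by default)
```

The design point is that every hop in such a chain adds latency and strips away prosody, which is why a single speech-native model can respond faster and still pick up cues like sadness or excitement.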
Pilot Phase for ChatGPT Plus Users
During this pilot phase, ChatGPT Plus users will be among the first to experience the advanced voice mode. TechCrunch has not yet tested the feature but plans to review it upon availability.
Gradual Rollout and Monitoring
OpenAI is releasing the new voice feature gradually so it can closely monitor its use. Users in the alpha group will receive an email with instructions for the new functionality, as well as a notice in the ChatGPT app.
Testing and Safety Measures
Since the demo, GPT-4o’s voice has been tested with over 100 external testers across 45 languages. A safety report on these tests is expected in early August. Advanced Voice Mode will offer only four preset voices—Juniper, Breeze, Cove, and Ember—created with professional voice actors. The Sky voice from the May demo is no longer available. OpenAI’s spokesperson, Lindsay McCallum, confirmed that ChatGPT cannot mimic other people’s voices and will block outputs that do not match these preset voices.
Precautions Against Deepfakes
To address deepfake concerns, OpenAI is implementing stringent precautions, including new filters to block the generation of music or other copyrighted audio. These measures respond to the legal risks facing AI audio companies; ElevenLabs' voice-cloning technology, for example, was used to impersonate President Biden earlier this year.