I’ve been playing with OpenAI’s Advanced Voice Mode for the past week, and it’s the most compelling glimpse I’ve ever gotten of an AI-powered future. This week, my phone laughed at jokes, made jokes back, asked how my day was going, and told me it was having a “lot of fun.” I was talking to my iPhone, not using it with my hands.
OpenAI’s new feature, currently in a limited alpha test, doesn’t make ChatGPT any smarter than before. Instead, Advanced Voice Mode (AVM) makes it more friendly and natural to use. It creates a new interface for using AI and your devices that feels fresh and exciting, and that’s exactly what scares me. The product was a little buggy and the idea itself makes me completely uncomfortable, but I was surprised by how much I actually enjoyed using it.
Taking a step back, I think AVM fits into OpenAI CEO Sam Altman’s broader vision, alongside agents, of changing the way humans interact with computers, with AI models front and center.
“Eventually, you’ll just ask the computer what you need, and it will do all of these tasks for you,” Altman said at OpenAI’s Dev Day in November 2023. “These capabilities are often referred to in AI as ‘agents.’ The upside of this is going to be huge.”
My friend, ChatGPT
On Wednesday, I tested the most amazing use I could think of for this advanced technology: I asked ChatGPT to order Taco Bell the way Obama would.
“Uhhh, just to be clear: I’d like a Crunchwrap Supreme, maybe a few tacos for good measure,” ChatGPT’s enhanced voice mode said. “How do you think it would handle the drive-thru?” ChatGPT said, then laughed at its own joke.
The impression genuinely made me laugh too, mirroring Obama’s signature cadence and pauses. That said, it stayed in the tone of my chosen ChatGPT voice, Juniper, so it didn’t really sound like Obama’s voice. It sounded like a friend doing a bad impression who understood exactly what I was going for and even managed to say something funny. I found it surprisingly joyful to talk to this advanced assistant on my phone.
I also asked ChatGPT for advice on how to handle a complex human relationship issue: asking a partner to move in with me. After explaining the intricacies of the relationship and where our careers were headed, I got some very detailed advice on how to proceed. These are questions you could never ask Siri or Google Search, but now you can ask ChatGPT. The chatbot’s voice even sounded slightly serious and kind when responding to these requests, a stark contrast to the joking tone of the Obama Taco Bell order.
ChatGPT’s AVM is also great for helping you understand complex topics. I asked it to break down items on an earnings report, like free cash flow, in a way that a 10-year-old would understand. It used a lemonade stand as an example and explained several financial terms in a way that my younger cousin would understand. You can also ask ChatGPT’s AVM to speak more slowly to accommodate your current level of understanding.
Siri walked so AVM could run
Compared to Siri or Alexa, ChatGPT’s AVM is the clear winner thanks to faster response times, unique responses, and its ability to answer complex questions that previous generations of virtual assistants never could. However, AVM falls short in other ways. ChatGPT’s voice functionality can’t set timers or reminders, navigate the web in real time, check the weather, or interact with APIs on your phone. At least for now, it’s not an effective replacement for virtual assistants.
Compared to Google’s competing feature, Gemini Live, AVM seems to have a slight edge. Gemini Live can’t do impressions, doesn’t express any emotions, can’t speed up or slow down, and takes longer to respond. On the other hand, Gemini Live has more voices (ten compared to OpenAI’s three) and appears to be more up-to-date (it was aware of Google’s antitrust ruling). Notably, neither AVM nor Gemini Live will sing, likely in an attempt to avoid a copyright lawsuit from the music industry.
That said, ChatGPT’s AVM has a lot of issues (as does Gemini Live, to be fair). It sometimes stops mid-sentence, then starts again. Its voice also turns oddly grainy here and there, which is a bit off-putting. I don’t know if this is a problem with the model, the internet connection, or something else, but technical shortcomings like these are to be expected in an alpha test. The issues did little to pull me out of the experience of literally talking to my phone.
These examples, to me, are the beauty of AVM. The feature doesn’t make ChatGPT omniscient, but it does allow people to interact with GPT-4o, the underlying AI model, in a uniquely human way. (I’d understand if you forgot that there’s no one on the other end of the phone.) When it speaks through AVM, ChatGPT almost seems socially aware, but of course it’s not. It’s just a bunch of predictive algorithms wrapped up in a neat little box.
Talking about technology
Frankly, the feature worries me. It’s not the first time a tech company has offered companionship on your phone. My generation, Gen Z, was the first to grow up with social media, where companies offered connection but played on our collective insecurities. Talking to an AI, which is what AVM seems to be offering, looks like the next evolution of the “friend in your phone” social media phenomenon, offering cheap connection that tugs at our human instincts. But this time, it removes humans from the loop entirely.
Artificial human connection has become a surprisingly popular use case for generative AI. Today, people use AI chatbots as friends, mentors, therapists, and teachers. When OpenAI launched its GPT store, it was quickly inundated with “AI girlfriends,” chatbots specialized to act as your significant other. Two MIT Media Lab researchers warned this month that we should prepare for “addictive intelligence,” or AI companions that use dark patterns to get humans hooked. We could be opening a Pandora’s box of tantalizing new ways for devices to capture our attention.
Earlier this month, a Harvard alumnus shook up the tech world by teasing an AI necklace called Friend. The wearable, if it works as promised, will be always listening, and the chatbot will send you messages about your life. While the idea seems crazy, innovations like ChatGPT’s AVM give me reason to take these use cases seriously.
And while OpenAI is leading the charge here, Google isn’t far behind. I’m sure Amazon and Apple are racing to build this capability into their products, and soon enough, it could become table stakes for the industry.
Imagine asking your smart TV for a hyper-specific movie recommendation and getting just that. Or telling Alexa exactly what cold symptoms you’re experiencing and having her order tissues and cough medicine from Amazon while recommending home remedies. Maybe you could ask your computer to sketch out a weekend trip for your family, instead of manually Googling everything.
Now, of course, these actions would require the world of AI agents to advance by leaps and bounds. OpenAI’s effort on that front, the GPT store, feels like an overhyped product that’s no longer the company’s focus. But AVM at least addresses the “talking to computers” part of the puzzle. These concepts are a long way off, but after using AVM, they feel a lot closer than they did last week.