OpenAI launches three new real-time audio API models

OpenAI's GPT-Realtime-2 brings GPT-5-class reasoning to live voice agents, launching alongside two additional real-time audio models through the OpenAI API.

OpenAI has launched GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper through its Realtime API, now generally available for production voice agents.

Darryl Linington, Published 05/09/2026


OpenAI has launched three new real-time audio models through its API, pushing voice AI from basic question-and-answer interactions toward agents that can listen, reason, translate, and act within a single live conversation. The release also marks the Realtime API's exit from beta, making it generally available for production use for the first time.

At the center of the release is GPT-Realtime-2, OpenAI's first voice model built on GPT-5-class reasoning. Unlike the cascaded architecture that most voice systems rely on, in which speech is transcribed, passed to a language model, and then synthesized back into audio in separate stages, GPT-Realtime-2 processes audio in a continuous stream. That lets it interpret speech as it happens and respond without the gap introduced by separate transcription and synthesis stages. The model supports a 128K token context window, up from 32K in the previous version, which makes longer voice sessions and complex multi-step agentic flows practical without external memory scaffolding.
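To make the difference between staged and continuous processing concrete, here is a toy sketch in Python. It says nothing about how GPT-Realtime-2 works internally; the class and function names are hypothetical, and upper-casing a word stands in for whatever real processing a voice system does. It only shows that a staged design reads the whole utterance before producing anything, while a streaming design can emit output after the first chunk.

```python
from typing import Iterable, Iterator, List


class CountingSource:
    """Wraps a list of audio chunks and counts how many have been read."""

    def __init__(self, chunks: List[str]) -> None:
        self.chunks = chunks
        self.consumed = 0

    def __iter__(self) -> Iterator[str]:
        for chunk in self.chunks:
            self.consumed += 1
            yield chunk


def cascaded(source: Iterable[str]) -> List[str]:
    # Stage 1 (stand-in for transcription) needs the whole utterance first.
    transcript = list(source)
    # Stage 2 (stand-in for reasoning and synthesis) runs only after stage 1.
    return [word.upper() for word in transcript]


def streaming(source: Iterable[str]) -> Iterator[str]:
    # A single loop handles each chunk as soon as it arrives.
    for chunk in source:
        yield chunk.upper()


def chunks_before_first_output(mode: str) -> int:
    """Return how many chunks were read before the first output was ready."""
    source = CountingSource(["book", "a", "table", "for", "two"])
    if mode == "cascaded":
        first = cascaded(source)[0]
    else:
        first = next(streaming(source))
    assert first == "BOOK"
    return source.consumed
```

In this toy, the cascaded path reads all five chunks before its first output exists, while the streaming path reads one.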

What GPT-Realtime-2 can do

The model is built specifically for what OpenAI calls "agentic behaviour" during voice calls. Preambles let it say "Let me check that" or "One moment" while it executes tool calls, so users are not left with dead air. Parallel tool calls let it run multiple back-end requests simultaneously and narrate which one is in flight. Stronger recovery behavior means it handles failures out loud rather than freezing mid-conversation. Tone adjustment lets it shift between styles based on context: more measured for support calls and more upbeat for confirmations.
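The preamble, parallel tool call, and recovery behaviors described above follow a pattern that can be sketched with Python's asyncio. This is an illustration of the pattern only, not OpenAI's API: the tool functions, the handle_turn helper, and the say callback are all hypothetical, and the back-end calls are simulated with short sleeps.

```python
import asyncio
from typing import Callable, Dict


# Hypothetical back-end tools; a real agent would make network calls here.
async def lookup_order(order_id: str) -> str:
    await asyncio.sleep(0.01)
    return f"order {order_id} shipped"


async def check_refund(order_id: str) -> str:
    await asyncio.sleep(0.01)
    raise TimeoutError("refund service did not answer")


async def handle_turn(order_id: str, say: Callable[[str], None]) -> Dict[str, str]:
    # Preamble: speak immediately so the caller does not hear silence.
    say("Let me check that.")

    tools = {"lookup_order": lookup_order, "check_refund": check_refund}

    # Parallel tool calls: start every request at once and wait for all of them.
    # return_exceptions=True hands failures back as values instead of raising.
    outcomes = await asyncio.gather(
        *(tool(order_id) for tool in tools.values()),
        return_exceptions=True,
    )

    results: Dict[str, str] = {}
    for name, outcome in zip(tools, outcomes):
        if isinstance(outcome, Exception):
            # Recovery: report the failure out loud instead of stalling.
            say(f"I could not complete {name}: {outcome}")
        else:
            results[name] = outcome
    return results
```

Calling asyncio.run(handle_turn("A17", print)) would speak the preamble first, then report the failed refund check while still returning the successful order lookup.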

GPT-Realtime-2 scores 15.2% higher than GPT-Realtime-1.5 on Big Bench Audio, OpenAI's audio reasoning benchmark, and 13.8% higher on Audio Multichallenger for instruction following. In real-world testing, Zillow reports a 26-point lift in call success rate on its hardest adversarial benchmark, going from 69% to 95% after prompt optimization on GPT-Realtime-2. The model is priced at $32 per million audio input tokens and $64 per million audio output tokens, with $0.40 per million cached input tokens.
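For budgeting, the per-token prices quoted above translate into a session cost as follows. This is a minimal sketch: the session_cost helper and the token counts in the usage note are hypothetical, and it assumes cached input tokens are counted separately from uncached ones and that billing is strictly linear. The article does not say how many tokens a minute of audio consumes, so no per-minute figure is derived here.

```python
from decimal import Decimal

# USD per one million tokens, as quoted in the article.
AUDIO_INPUT = Decimal("32")
AUDIO_OUTPUT = Decimal("64")
CACHED_INPUT = Decimal("0.40")
MILLION = Decimal(1_000_000)


def session_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> Decimal:
    """Return the USD cost of one session, rounded to four decimal places.

    Assumption: cached_tokens are billed at the cached rate and are NOT
    also included in input_tokens.
    """
    total = (
        input_tokens * AUDIO_INPUT
        + output_tokens * AUDIO_OUTPUT
        + cached_tokens * CACHED_INPUT
    ) / MILLION
    return total.quantize(Decimal("0.0001"))
```

Under these assumptions, a session with 50,000 audio input tokens, 20,000 audio output tokens, and 100,000 cached input tokens costs $1.60 + $1.28 + $0.04 = $2.92.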

GPT-Realtime-Translate and GPT-Realtime-Whisper

The second model, GPT-Realtime-Translate, is a dedicated live speech translation system. It processes spoken input continuously and outputs translations in real time without requiring speakers to pause or finish complete sentences. The model supports more than 70 input languages and 13 output languages, targeting customer support, education, live events, and cross-border sales environments. BolnaAI, a voice AI company building for Indian language markets, reports 12.5% lower word error rates on Hindi, Tamil, and Telugu compared to the previous translation approach. GPT-Realtime-Translate is priced at $0.034 per minute of audio processing.

GPT-Realtime-Whisper is the third model, extending OpenAI's widely adopted Whisper speech recognition technology into a streaming system. Where the original Whisper was built for post-recording transcription, this version produces live captions as speech is being spoken. The use cases include live meetings, courtroom documentation, newsroom transcription, and accessibility tools for hearing-impaired users. It is the most affordable of the three at $0.017 per minute. All three models are available now through the OpenAI API and the developer playground.
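The two per-minute prices make cost comparisons simple. The sketch below assumes billing is linear in minutes with no minimums or rounding to whole minutes, which the article does not confirm; the audio_cost helper is hypothetical, and the dictionary keys are the product names from the article, not API model identifiers.

```python
from decimal import Decimal

# USD per minute of audio, as quoted in the article.
PER_MINUTE = {
    "GPT-Realtime-Translate": Decimal("0.034"),
    "GPT-Realtime-Whisper": Decimal("0.017"),
}


def audio_cost(model: str, minutes: int) -> Decimal:
    """Return the USD cost of processing `minutes` of audio with `model`.

    Assumption: cost scales linearly with duration.
    """
    return (PER_MINUTE[model] * minutes).quantize(Decimal("0.001"))
```

Under that assumption, a one-hour meeting costs $2.04 to translate and $1.02 to caption, so live translation is exactly twice the per-minute price of live transcription.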

The launch also adds MCP server support, image input capabilities, and SIP phone calling integration to the Realtime API, broadening the range of enterprise telephony and agentic workflows developers can build without leaving the API.

The AI tool space has also attracted attackers looking to exploit interest in new products. Notebookcheck reported yesterday on a fake Claude AI website that was pushing the Beagle Windows backdoor through Google-sponsored search results using a trojanized Claude-Pro Relay installer.

Source(s)

OpenAI




Darryl Linington, 2026-05-09 (Update: 2026-05-24)
