OpenAI adds three voice intelligence models to its API
On May 7, 2026, OpenAI added three real-time voice models to its API. The release is notable not because OpenAI now offers voice capabilities, which have existed for some time, but because of what the new models can do within a conversation.
The three additions are GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper.
GPT-Realtime-2 is described as running on “GPT-5-class reasoning,” which positions it above previous voice models at handling complex requests within a conversation. OpenAI says the goal is to move real-time audio “from simple call-and-response toward voice interfaces that can actually do work.” In other words, the voice layer becomes capable of the kind of reasoning previously reserved for text-based interactions.
GPT-Realtime-Translate translates speech in real time from more than 70 input languages into 13 output languages.
GPT-Realtime-Whisper provides live speech-to-text transcription.
The billing structures differ: GPT-Realtime-2 is billed per token, while the translation and transcription models are billed per minute. That distinction matters for product planning, because a voice assistant handling complex queries will have different unit economics than a transcription service handling high-volume short calls.
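To make the billing difference concrete, here is a minimal Python sketch of how a team might compare the two models when estimating costs. The rates are placeholders invented for illustration, not OpenAI's published prices, and the function names are hypothetical. The sketch also assumes that per-minute billing is prorated by the second, which the announcement does not confirm.

```python
# Placeholder rates for illustration only. These are NOT OpenAI's published prices.
HYPOTHETICAL_USD_PER_1K_TOKENS = 0.05  # assumed rate for a per-token model
HYPOTHETICAL_USD_PER_MINUTE = 0.01     # assumed rate for a per-minute model


def per_token_cost(tokens: int, usd_per_1k: float = HYPOTHETICAL_USD_PER_1K_TOKENS) -> float:
    """Cost of one session on a per-token model: scales with tokens processed."""
    return tokens / 1000 * usd_per_1k


def per_minute_cost(seconds: float, usd_per_minute: float = HYPOTHETICAL_USD_PER_MINUTE) -> float:
    """Cost of one call on a per-minute model: scales with audio duration.

    Assumes billing is prorated by the second (an assumption, not a documented rule).
    """
    return seconds / 60 * usd_per_minute


def monthly_cost(cost_per_session: float, sessions_per_month: int) -> float:
    """Total monthly spend for a given per-session cost and volume."""
    return cost_per_session * sessions_per_month


if __name__ == "__main__":
    # A reasoning-heavy assistant: few sessions, many tokens each.
    assistant = monthly_cost(per_token_cost(tokens=4000), sessions_per_month=10_000)
    # A transcription service: many short calls, billed by duration.
    transcription = monthly_cost(per_minute_cost(seconds=90), sessions_per_month=200_000)
    print(f"assistant:     ${assistant:,.2f} per month")
    print(f"transcription: ${transcription:,.2f} per month")
```

With real prices substituted, the same two functions show which input drives cost for each product: tokens per session for the assistant, and call length times call volume for the transcription service.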
For product managers, the relevant signal is that a voice product previously requiring significant engineering work to reach “assistant quality” now has a lower-cost path via API. The combination of high-reasoning voice and live transcription also opens use cases in education, customer support, and live events where the interaction cannot be reduced to a simple question-and-answer exchange.
OpenAI has built in automated conversation monitoring to detect policy violations, which is relevant for teams building in regulated industries or in consumer-facing contexts where misuse risk is a product consideration.