OpenAI launches new voice intelligence features in its API

OpenAI has released several new voice intelligence features for its API, including a voice model that can hold spoken conversations with users. The company's new GPT-Realtime-2 uses GPT-5-class reasoning to handle complex requests, unlike its predecessor, which had more limited capabilities.

The new features also include GPT-Realtime-Translate, which provides real-time translation from more than 70 input languages into 13 output languages. OpenAI has also launched a transcription model, GPT-Realtime-Whisper, which produces live speech-to-text as interactions occur.

OpenAI says the new voice models will be useful for companies looking to expand their customer service capabilities, and notes that they can also assist in areas such as education and media. To prevent misuse, OpenAI has built guardrails into the system that stop conversations violating its harmful-content guidelines.

The new voice models are available through OpenAI's Realtime API, with GPT-Realtime-2 billed by token consumption and GPT-Realtime-Translate and GPT-Realtime-Whisper billed by the minute.
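The two billing schemes can be contrasted with a short cost sketch. The rates below are placeholders chosen for illustration, not OpenAI's actual prices; only the per-token versus per-minute split comes from the article.

```python
# Hypothetical rates -- NOT OpenAI's published pricing.
TOKEN_RATE_USD = 0.00002   # assumed $ per token (GPT-Realtime-2)
MINUTE_RATE_USD = 0.006    # assumed $ per audio minute (Translate/Whisper)

def token_billed_cost(tokens: int) -> float:
    """Cost of a call billed by token consumption."""
    return tokens * TOKEN_RATE_USD

def minute_billed_cost(minutes: float) -> float:
    """Cost of a call billed by audio duration."""
    return minutes * MINUTE_RATE_USD

# A 10-minute session that consumed 50,000 tokens, under each scheme:
print(f"token-billed:  ${token_billed_cost(50_000):.2f}")   # $1.00
print(f"minute-billed: ${minute_billed_cost(10):.2f}")      # $0.06
```

The design point is that per-minute billing makes translation and transcription costs predictable from call length alone, while token billing for GPT-Realtime-2 scales with how much the model actually reasons and says.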