ChatGPT now has voice and image capabilities, revolutionizing user interactions

26 Sep 2023

News Synopsis

OpenAI continues to push the boundaries of AI technology by introducing voice and image capabilities in ChatGPT. The new features let users hold a spoken conversation with the assistant or show it what they are talking about, offering a more intuitive and immersive way to interact with the model.

Voice Conversations with ChatGPT:

A standout feature of this update is the ability to hold voice conversations with ChatGPT, enabling real-time, back-and-forth dialogue with the AI assistant. This opens up a range of uses, whether you're looking for help on the go, a bedtime story for your family, or a way to settle a dinner table debate.

To start using voice, go to Settings in the mobile app, select "New Features," and opt into voice conversations. Then tap the headphone icon in the top-right corner of the home screen and choose from five distinct voices, each created with professional voice actors for a human-like audio experience.

Whisper, OpenAI's open-source speech recognition system, transcribes the user's spoken words into text so ChatGPT can respond to them.

Image Interaction with ChatGPT:

Another game-changing feature is the ability to share images with ChatGPT. Users can now show ChatGPT one or more images for troubleshooting, exploring content, or analyzing data. Whether you're trying to figure out why your grill won't start, plan a meal based on what's in your fridge, or interpret a graph for work, ChatGPT can help.

To use this feature, tap the photo button to capture or choose an image; on iOS or Android, tap the plus button first. Users can also discuss multiple images in one conversation or use the drawing tool to guide the assistant.

Image understanding is powered by multimodal versions of GPT-3.5 and GPT-4, which apply their language reasoning skills to a wide range of visual content, such as photographs, screenshots, and documents containing both text and images.

Gradual Deployment for Safety and Responsiveness:

The voice and image capabilities are rolling out gradually, reaching Plus and Enterprise users first over the next two weeks. Voice is available on iOS and Android as an opt-in setting, while image capabilities will be accessible on all platforms. OpenAI says releasing the features gradually lets it improve them and refine its risk mitigations over time.

Voice chat, for instance, uses voices created with voice actors OpenAI worked with directly, a choice intended to limit risks such as impersonation. Spotify is also using this technology for its Voice Translation feature, which translates podcasts into additional languages in the podcasters' own voices.

For image input, OpenAI has taken technical measures to significantly limit ChatGPT's ability to analyze and make direct statements about people, both because the model is not always accurate and to protect individuals' privacy. OpenAI says real-world usage and user feedback will help it refine these safeguards while keeping the tool useful.

Additional information

ChatGPT's new voice and image capabilities could reshape how people interact with AI. For example, students could use ChatGPT to hold voice conversations about complex concepts or to analyze images of historical artifacts.

Professionals could use ChatGPT to troubleshoot technical problems, collaborate on projects, or brainstorm ideas. And anyone could use ChatGPT to explore new content, learn new things, or simply have a more engaging conversation with their AI assistant.

OpenAI's gradual rollout and safety measures demonstrate the company's commitment to responsible AI development. By carefully considering the potential risks and benefits of these new capabilities, OpenAI is helping to ensure that ChatGPT remains a safe and reliable tool for users of all ages and backgrounds.
