ChatGPT’s New Capabilities: Seeing, Hearing, and Speaking

Introduction:
OpenAI has unveiled a significant upgrade to ChatGPT: in addition to text, it can now take voice and image input and reply with speech. The change promises a richer, more natural way to interact with the assistant.

What’s New with ChatGPT:

Voice and Image Interaction:
ChatGPT now offers a more intuitive interface. Users can hold a spoken conversation with it or show it an image of what they are talking about, which makes it a more versatile virtual assistant.
Real-Life Applications:
Travel: Snap a photo of a landmark and have a live conversation about its significance.
Cooking: Capture the contents of your fridge and pantry to brainstorm dinner ideas.
Education: Get help with a math problem by photographing it, highlighting the part you are stuck on, and asking for hints or a solution.
Availability:
These features are rolling out to Plus and Enterprise users over the next two weeks. Voice is available in the iOS and Android apps, and image input is available on all platforms.

ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms). https://t.co/uNZjgbR5Bm pic.twitter.com/paG0hMshXb
— OpenAI (@OpenAI) September 25, 2023

Deep Dive into Features:

Voice Conversations:
Hold back-and-forth spoken conversations with ChatGPT.
Setup: Navigate to Settings → New Features on the mobile app and activate voice conversations.
Unique Voices: Choose from five different voice options.
Behind the Scenes: The voice feature is powered by a new text-to-speech model that can generate human-like audio from text and just a few seconds of sample speech. Whisper, OpenAI's open-source speech recognition system, transcribes the user's spoken words into text.
Image Conversations:
Share one or more images with ChatGPT for analysis or discussion.
Setup: Tap the photo button to capture or select an image.
Image Processing: Powered by multimodal versions of GPT-3.5 and GPT-4, ChatGPT can understand a wide range of images, from photographs to complex documents.

Safety and Limitations:

OpenAI says it aims to make its tools both useful and safe. The new voice technology opens up creative possibilities, but it also introduces risks such as impersonation of public figures and fraud. Vision-based models raise their own challenges, including hallucinations and users over-relying on the model's interpretation of images in high-stakes domains. OpenAI is adding technical safeguards intended to reduce these risks while keeping the tool useful.

Future Rollout:
OpenAI plans to expand access to these new features to other user groups, including developers, in the near future.

Conclusion:

ChatGPT is evolving rapidly, and the addition of voice and image capabilities marks a significant step forward. As more users gain access to these features, they could reshape how people interact with computers.

Use your voice to engage in a back-and-forth conversation with ChatGPT. Speak with it on the go, request a bedtime story, or settle a dinner table debate.

Sound on 🔊 pic.twitter.com/3tuWzX0wtS
— OpenAI (@OpenAI) September 25, 2023
