Introducing Voice and Image Features for ChatGPT

OpenAI is adding two new capabilities to ChatGPT: voice and image support.

These additions will provide users with more dynamic ways to interact with the chatbot, reducing reliance on traditional text-based prompts.

ChatGPT Voice Functionality

ChatGPT's voice functionality enables users to engage in natural, spoken conversations with the chatbot.

Whether you're on the move, seeking a bedtime story for your family, or settling a friendly dinner table debate, ChatGPT can now respond to your voice commands.

To access this feature, follow these steps:

  • In the mobile app, open the "Settings" menu.
  • Scroll down and select "New Features".
  • Opt in to voice conversations.
  • Tap the headphone icon located in the top-right corner of the home screen.
  • Choose your preferred voice from a selection of five distinctive voices.

The voice capability is powered by a new text-to-speech model that turns text into remarkably human-like audio.

OpenAI collaborated with professional voice actors to craft each unique voice.

Additionally, Whisper, an open-source speech recognition system, transcribes your spoken words into text, facilitating seamless voice interactions.
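To make that flow concrete, here is a minimal Python sketch of a single voice turn: speech is transcribed to text, the text goes to the chat model, and the reply is converted back to audio. It is an illustration only, not OpenAI's implementation. Every name in it (`run_voice_turn`, `VoiceTurn`, and the three `fake_*` stages) is hypothetical, and the stages are simple stand-ins so the example runs without any model or network access.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; none of these names come from OpenAI.
Transcriber = Callable[[bytes], str]   # speech audio -> text (the role Whisper plays)
ChatModel = Callable[[str], str]       # user text -> reply text (the role ChatGPT plays)
Synthesizer = Callable[[str], bytes]   # reply text -> speech audio (the text-to-speech role)


@dataclass
class VoiceTurn:
    transcript: str
    reply_text: str
    reply_audio: bytes


def run_voice_turn(
    audio_in: bytes,
    transcribe: Transcriber,
    chat: ChatModel,
    synthesize: Synthesizer,
) -> VoiceTurn:
    """Run one spoken exchange: audio in, audio out."""
    transcript = transcribe(audio_in)      # 1. speech recognition
    reply_text = chat(transcript)          # 2. text-based chat response
    reply_audio = synthesize(reply_text)   # 3. text-to-speech
    return VoiceTurn(transcript, reply_text, reply_audio)


# Stand-in stages (assumptions for illustration only): they treat the
# "audio" bytes as UTF-8 text so the pipeline can be exercised offline.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_chat(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    turn = run_voice_turn(
        b"tell me a bedtime story", fake_transcribe, fake_chat, fake_synthesize
    )
    print(turn.transcript)
    print(turn.reply_text)
```

Passing the three stages in as functions keeps the sketch independent of any particular speech or chat library; a real recognizer or synthesizer could be swapped in without changing `run_voice_turn`.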

Listening to Voice Samples From OpenAI

OpenAI has published voice samples that showcase the naturalness and expressiveness of the new voices.

Image Functionality for ChatGPT

ChatGPT's image feature lets users ask questions about one or more images.

You can even use the drawing tool within the mobile app to highlight specific details within an image.

To get started with image prompts, follow these steps:

  • Tap the camera button to capture or select an image (iOS and Android users, tap the plus button first).
  • Engage in discussions involving multiple images or use the drawing tool to guide your assistant.

Much like Google Lens, ChatGPT's image functionality interprets the content of images and provides relevant responses.

Using Images in ChatGPT

To further clarify your intent, you can speak or type questions alongside your uploaded images.

If ChatGPT's initial response doesn't meet your expectations, you have the option to engage in a conversation with the chatbot to refine the answer, similar to Google's multimodal search approach.

Rollout of ChatGPT Voice and Image Features

Together, the voice and image capabilities open up practical everyday uses.

For example, you can:

  • Engage in live conversations about landmarks by sharing pictures while traveling.
  • Simplify meal planning by snapping photos of your fridge and pantry, asking follow-up questions, and even receiving step-by-step recipe guidance.
  • Assist your child with math problems by taking a photo of the math question, circling the problem set, and having ChatGPT provide hints.

Who Gets ChatGPT's Voice and Image Functions?

OpenAI plans to roll out these features to Plus and Enterprise users over the next two weeks. Voice functionality will be available on both iOS and Android platforms, with image support accessible on all platforms.

ChatGPT - Continuously Evolving

With these enhancements, ChatGPT becomes a more versatile tool for users, opening up possibilities for various applications in daily life and business.
