OpenAI introduces voice and image prompts to ChatGPT

ChatGPT’s introduction of voice and image features has earned mixed reactions online.

OpenAI has addressed some risks associated with the new voice and image features in ChatGPT [File: Lionel Bonaventure/AFP]

OpenAI is bringing audio and image capabilities to ChatGPT.

The platform, long limited to written prompts, will roll out the new features to paid versions of the app over the next two weeks, OpenAI announced in a blog post on Monday.

Everyone else will be receiving the features “soon after”.

What can you do with ChatGPT’s update?

Users can have voice conversations with the chatbot, bringing it closer to popular AI assistants such as Apple’s Siri and Amazon’s Alexa.

ChatGPT’s new voice feature can also narrate bedtime stories, settle debates at the dinner table and read users’ text input aloud.

Spotify is using the technology behind the feature to help its podcasters translate their content into other languages, OpenAI said.

Users can also upload one or more images to the interface and use the drawing tool to highlight specific parts of an image.

The vision feature can be used to “troubleshoot why your grill won’t start, explore the contents of your fridge to plan a meal, or analyze a complex graph for work-related data”.

How have people responded?

OpenAI’s announcement has invited a range of reactions on X, formerly Twitter. While some users have celebrated the new update, others have raised concerns.

In a conversation with WIRED, Trevor Darrell, a professor at UC Berkeley and a co-founder of Prompt AI, said that the discomfort people feel when AI becomes too human-like is known as the “uncanny valley gap”.

While the added functions might make the chatbot feel more natural, some research suggests that complex interfaces that try but fail to convincingly mimic human interaction can feel strange, which might make the product harder to use.

Some users have raised concerns about recent lawsuits accusing OpenAI of violating copyright law and infringing intellectual property rights, and have advised others not to use ChatGPT.

Others have suggested that the updates could eventually replace smaller AI startups, software engineers and even educators.

AI-generated voices have also heightened fears of deepfakes, voice scams and identity theft.

Malicious use of AI voice generators is on the rise: scammers use AI to mimic a real person’s voice and then call that person’s relatives asking for money. A McAfee report suggests that 77 percent of people targeted by an AI voice scam lost money as a result.

Adding voice recognition might also make the feature less accessible to people who do not speak with mainstream accents, said Joel Fischer, who studies human-computer interaction at the University of Nottingham in the UK.

Because the new feature lets the AI interpret images, some users are concerned that the bot could be used to bypass image-based CAPTCHA tests on websites.

These tests, which require users to prove they are not bots by transcribing distorted text or identifying images, are designed to limit automated access.

A recent study, which has yet to be peer reviewed, found that AI bots can solve CAPTCHA tests faster and more accurately than humans.

Has OpenAI acknowledged these risks?

OpenAI has acknowledged that the voice feature in the new update could be exploited by malicious actors for fraud and impersonation. To mitigate this, the company said it is “using this technology to power a specific use case”: voice chat, with voices created with voice actors the company worked with directly.

The company has also acknowledged the limitations of using images in AI, including hallucinations, in which the AI states false information about an image.

OpenAI said it has also taken technical measures to limit ChatGPT’s ability to analyse and make direct statements about people, citing both accuracy and privacy concerns.

Source: Al Jazeera and news agencies
