Could OpenAI’s Sora text-to-video generator kill off jobs in Hollywood?

Some say it’s ‘game over’ for creative professionals in the film industry.

OpenAI's latest AI tool, which will generate video sequences from text prompts, is expected to create a storm in the film industry [File: Dado Ruvic/Illustration/Reuters]

Artificial intelligence startup OpenAI has been teasing its new AI video generator, Sora, on social media in recent weeks. Last week, it revealed that it had also given actors and directors in Hollywood a first look at the technology – and a chance to try it out – before Sora is launched publicly.

OpenAI published a blog post on March 24 titled Sora’s First Impressions, showcasing the work that several creative studios and directors had produced using the video generator.

Some media experts speculate that Sora will be extremely disruptive to the film industry.

Al Jazeera spoke to one executive who works in Hollywood, who asked us not to reveal his identity due to the sensitive nature of the subject. When asked what his initial reaction was when he saw Sora’s capability for the first time, he said: “My reaction to Sora was just like everyone else’s – my jaw hit the floor. It was like we were seeing our murderer but it was beautiful at the same time. Just immediately impressive and terrifying.”

The tremors caused by Sora have already been felt by some in the industry.

In an interview with The Hollywood Reporter in February, actor, filmmaker and studio owner Tyler Perry stated he would put his $800m studio expansion in Atlanta on hold after seeing Sora’s video-generating capability.

He added: “So I am very, very concerned that in the near future, a lot of jobs are going to be lost. I really, really feel that very strongly.”

What is Sora?

Sora is OpenAI’s text-to-video generative AI model. As with ChatGPT, you enter a text prompt, but instead of answering in text form, Sora generates videos up to one minute long.

OpenAI released a video example of Sora’s capability, generated from the prompt below:

  • Example prompt: “A movie trailer featuring the adventures of the 30-year-old spaceman wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colours.”

Sam Altman, CEO of OpenAI, also posted several examples to his X account, including the below:

  • Example prompt: “An instructional cooking session for homemade gnocchi hosted by a grandmother social media influencer set in a rustic Tuscan country kitchen with cinematic lighting.”

Sora is far from perfect. If you look closely at the “instructional cooking session” video, the spoon in the right hand disappears after the “grandmother” stops mixing. Although the videos Sora produces are hyper-realistic, some still contain flaws that give them away as fakes.

This raises the question of how well a product like this would work in the industry.

Our Hollywood insider said: “We will come out on the other side bigger and better because humans will have figured out their place atop a technology that is clearly more powerful than we can currently imagine. But the desire to act, write, direct, compose, collaborate, etc is deeply innate in humans. It’s not going anywhere. So is it bad for the humans in the industry? The answer is no, then yes, then no. Is it good for the industry itself? Yes.”

How does Sora work?

As with ChatGPT, users type in a text command, question or prompt and the AI responds – in the case of Sora, with generated video sequences.

To do this, Sora uses a combination of machine learning and natural language processing (NLP). NLP is a branch of artificial intelligence that enables computers to interpret human language, in this case the text prompt. Machine learning allows a model like Sora to learn patterns from training data and feedback, so that its output improves over time.

Sora also relies on “computer vision”, the branch of AI concerned with understanding and interpreting visual information from images or videos. This is what allows Sora to connect text descriptions that use visual language to visual representations of real-world objects, people and environments. For example, the prompts “cat moving” or “waves crashing in an ocean” indicate certain attributes and characteristics. Sora needs this visual language to interpret the text prompt, and then accurately present a visual depiction of an object.

Sora can harvest incomplete or partial data and transform it into comprehensible video content that looks very real. Sora works like a super-powered zoom tool. It starts with large, blurry blocks of colour or objects and then refines them into smaller, more defined shapes based on your prompt.
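The “blurry blocks refined into defined shapes” idea can be illustrated with a toy sketch. This is not Sora's actual model, which OpenAI has not published as code. It is a minimal illustration, assuming a one-dimensional row of pixel brightness values in place of a video frame and a fixed `target` list standing in for what a trained model would predict from the prompt. The names `refine`, `error` and `target` are hypothetical.

```python
import random


def refine(noisy, target, steps):
    """Move a noisy 1-D 'image' toward a target in small steps.

    Assumption: `target` stands in for what a trained model would
    predict from the text prompt. A real system has no such answer key.
    """
    frame = list(noisy)
    history = [frame]
    for step in range(steps):
        # Remove a growing share of the remaining difference each step,
        # so the final step closes the gap completely.
        fraction = 1.0 / (steps - step)
        frame = [x + fraction * (t - x) for x, t in zip(frame, target)]
        history.append(frame)
    return history


def error(frame, target):
    """Average absolute difference between a frame and the target."""
    return sum(abs(x - t) for x, t in zip(frame, target)) / len(target)


rng = random.Random(0)  # fixed seed so the run is repeatable
target = [0.0, 0.2, 0.9, 1.0, 0.9, 0.2, 0.0]  # a soft "blob" of brightness
noise = [rng.uniform(0, 1) for _ in target]   # the starting random frame

history = refine(noise, target, steps=5)
errors = [error(frame, target) for frame in history]

for step, value in enumerate(errors):
    print(f"step {step}: average error {value:.3f}")
```

Each pass leaves the frame closer to the intended picture than the one before, which is the sense in which the output sharpens from rough to defined.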

What does Sora mean for creative jobs in the film industry?

It is still unclear which, if any, tasks normally undertaken by human creators could be taken over by Sora. The AI’s ability to replicate camera shots, lighting and characters on the fly makes for uncharted territory for directors and filmmakers. However, film professionals expect it to shake up the industry considerably.

One Hollywood insider, who spoke to Al Jazeera on condition of anonymity, said: “I don’t see it as a threat to production so much as a threat to the way production is done as we currently know it. We’ve seen events like this in the past, particularly in post-production when folks started editing on laptops instead of the big expensive post houses. Lots of people got wiped out in that transition while others could suddenly afford a proper editor without the overhead a post house demands.”

When asked whose jobs could be replaced by AI generators, he added: “Maybe asking ‘who will get replaced’ is the wrong question. I think it’s the system that will get chipped away at and replaced. In a couple of years, maybe the term ‘director’ will refer to the guy who prompts the AI, and the rest is done completely digitally. And if that approach is accepted by audiences, and makes money, and makes people feel human emotion – then it’s game over for most of us.”

Sora pulls content from already existing images and videos, then recreates a video based on the user’s prompt. Who exactly owns that regenerated video? Should a fee be paid to each of the photo and video creators and characters whose work Sora draws upon to create the final video? These are questions which have yet to be fully answered. 

At the root of many of the above questions is how to track the originator of any content generated, including the individuals who have been included in the final video.

Speaking on his YouTube channel, technology lawyer Paul Haswell explained: “If someone’s just using an AI model and then it inadvertently somehow sucks in some data that then ends up looking like you, what are your rights – is your personal data actually being misused? How can you prove that your data was used to create that likeness?”

He added: “Suddenly you find yourself an actor in a completely AI-generated soap. You may be world famous, yet get no credit for it. You might have a squeaky voice rather than a deep voice but your face would be the same. For example, you would have no credit because you’ve essentially been used by AI, hoovered up and regurgitated into another format.”

There are also international considerations as copyright law is different depending on the country. If the video originated in one country and is distributed in another, whose copyright law applies?

On his blog, Wallace Collins, an entertainment lawyer who specialises in copyright and trademark law, warned that Sora would expand all these problems “exponentially” and could even lead to social unrest or other forms of social disruption.

“AI has already disrupted copyright law for creators, particularly in the music space, and has challenged established copyright and intellectual property norms in the entertainment world. Without some type of common sense regulations in place, Sora could be used by the most vile of individuals to create videos that could defile, mislead and scare people, or even instigate riots based on the appearance of something that is completely fabricated but entirely realistic in appearance.”

How will these issues be decided?

A significant portion of the legal discussion surrounding generative AI revolves around who should be considered the author of what these tools generate, and whether their use of existing material counts as “fair use”. Fair use provisions in copyright law allow limited use of copyrighted material without permission, including transforming a copyrighted work into a different piece of work.

At present, there is no legal precedent covering the current advances in text-to-video generation. However, in December last year, the New York Times filed a federal lawsuit against OpenAI and Microsoft for copyright infringement, over ChatGPT (a text-to-text generation tool) and Copilot, in the Southern District of New York (the federal district court in Manhattan). The Times alleges that ChatGPT provides users with content identical to what the Times has already published.

Ian Crosby, a lawyer for the Times, said: “Defendants seek to free-ride on the Times’s massive investment in its journalism by using it to build substitutive products without permission or payment. That’s not fair use by any measure.”

In February, OpenAI filed a motion to dismiss the Times’ case in federal court.

Two more copyright infringement cases were filed in the Manhattan court against OpenAI in February – one by The Intercept and the other a joint case filed by Raw Story and AlterNet.

Source: Al Jazeera