YouTube warns OpenAI against using its videos to train models
ChatGPT may no longer dominate the headlines, but more than a year after the chatbot's launch, OpenAI is back in the spotlight, this time with a text-to-video model called Sora. As its introductory video shows, the model can create lifelike videos that could easily fool viewers.
But that technological advancement has raised many questions, most of which concern Sora's training data. The Wall Street Journal's recent interview with OpenAI's CTO, Mira Murati, has made things even worse. Murati couldn't give a concrete answer when asked whether the text-to-video model was trained on YouTube videos.
In fact, the CTO wasn't clear about where Sora's training data came from at all. Following that, in an interview with Bloomberg Originals, Neal Mohan, the CEO of YouTube, warned OpenAI that it is not allowed to use videos from the platform to train its models.
Mohan explains, "From a creator's perspective, when a creator uploads their hard work to our platform, they have certain expectations." The YouTube CEO continues, "One of those expectations is that the terms of service is going to be abided by." OpenAI has yet to respond on the matter.
There is much controversy and uncertainty surrounding how OpenAI trains its models, including DALL-E, ChatGPT, and Sora. The Wall Street Journal has even reported that the company is planning to use transcripts from YouTube videos to train GPT-5. That, again, would go against the platform's terms of service.
Google's own multimodal AI, Gemini, requires similar data. However, as Mohan states, the model is trained only on certain videos, depending on the permissions granted in each creator's contract.