Here we go again? OpenAI’s CTO claims not to know what data Sora was trained on

Every time a technology company launches a new artificial intelligence model, the first question that arises is “where does the training data come from?” AI models are trained on large datasets, which help them learn to recognize patterns, make predictions, or understand language.

Quite a few AI models have been trained on data obtained illicitly, or at least dubiously, including the popular ChatGPT from OpenAI. For that reason, it is surprising, to say the least, that the company's CTO, Mira Murati, is unclear about the source of the data used to train Sora, the company's new video-generating AI.

During an interview with The Wall Street Journal published on March 13th, Murati offered vague answers when asked about the source of data for OpenAI’s Sora model, which is capable of generating videos from text instructions. “We use publicly available data and licensed data,” Murati responded regarding how the company is training its upcoming model.

Joanna Stern, a journalist at the WSJ, then asked whether Sora had been trained on data from platforms like YouTube, Instagram, or Facebook, to which Murati replied, “I’m not sure about that,” adding, “You know, if they were available to the public – available to the public to use. But I’m not sure. I’m not sure about it.”

Before moving on to another topic, Stern mentioned OpenAI’s partnership with the stock image company Shutterstock and asked whether its data could be used to train Sora. “I’m not going to go into details about the data that was used. But they were public or licensed data,” Murati added. Later, the executive confirmed to the WSJ that Shutterstock data was indeed used to train Sora.

Author: Pedro Domínguez

Publicist and audiovisual producer in love with social networks. I spend more time thinking about which video games I will play than playing them.