{"id":275693,"date":"2024-03-18T08:02:17","date_gmt":"2024-03-18T15:02:17","guid":{"rendered":"https:\/\/sftarticles.wpenginepowered.com\/es\/?p=328308"},"modified":"2025-07-01T16:55:06","modified_gmt":"2025-07-01T23:55:06","slug":"shall-we-go-back-to-square-one-the-cto-of-openai-claims-to-not-know-what-data-sora-has-been-trained-with","status":"publish","type":"post","link":"https:\/\/cms-articles.softonic.io\/en\/shall-we-go-back-to-square-one-the-cto-of-openai-claims-to-not-know-what-data-sora-has-been-trained-with\/","title":{"rendered":"Here we go again? OpenAI&#8217;s CTO claims to not know what data Sora has been trained with"},"content":{"rendered":"\n<p>Every time a technology company launches a new artificial intelligence, the first question that arises is &#8220;where do the training data come from?&#8221;. AI models <strong>are trained using large datasets<\/strong>, which help the model learn to recognize patterns, make predictions, or understand language.<\/p>\n\n\n<div class=\"sc-card-program\">\r\n  <div class=\"sc-card-program__body\">\r\n    <div class=\"sc-card-program__row clearfix\">\r\n      <div class=\"sc-card-program__col-logo\">\r\n        <img decoding=\"async\" class=\"sc-card-program__img\" alt=\"ChatGPT\" src=\"https:\/\/images.sftcdn.net\/images\/t_app-icon-s\/p\/47ef1772-2a82-4750-b97a-354b13dbd112\/3647786732\/chatgpt-ChatGPT-icon.png\" width=\"100px\" height=\"100px\">\r\n      <\/div>\r\n      <div class=\"sc-card-program__col-title\">\r\n        <span class=\"sc-card-program__title\">ChatGPT<\/span>\r\n        <a class=\"sc-card-program__button sc-card-program-internal\" href=\"https:\/\/chatgpt.en.softonic.com\/iphone\" target=\"_self\" rel=\"noopener noreferrer\">DOWNLOAD<\/a>\r\n      <\/div>\r\n      <div class=\"sc-card-program__col-rating\">\r\n        <svg class=\"rating-score__content\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" version=\"1.1\" x=\"0\" y=\"0\" viewbox=\"0 0 50 50\" enable-background=\"new 0 0 50 50\" 
xml:space=\"preserve\"><path class=\"rating-score__background rating-score--good\" fill=\"none\" stroke-width=\"6\" stroke-miterlimit=\"10\" d=\"M40 40c8.3-8.3 8.3-21.7 0-30s-21.7-8.3-30 0 -8.3 21.7 0 30\"><\/path><path class=\"rating-score__value rating-score__value--0\" fill=\"none\" stroke-width=\"6\" stroke-dashoffset=\"0\" stroke-miterlimit=\"10\" d=\"M40 40c8.3-8.3 8.3-21.7 0-30s-21.7-8.3-30 0 -8.3 21.7 0 30\"><\/path><text class=\"rating-score__number\" content=\"\" text-anchor=\"middle\" transform=\"matrix(1 0 0 1 25 31.0837)\" data-auto=\"app-user-score\"><\/text><\/svg>\r\n      <\/div>\r\n    <\/div>\r\n    <div class=\"sc-card-program__row\">\r\n      <span class=\"sc-card-program__description\"><\/span>\r\n    <\/div>\r\n    <div class=\"sc-card-program__row\">\r\n      <img decoding=\"async\" class=\"sc-card-program__bigpic\" src=\"\" onerror=\"this.style.display='none'\">\r\n    <\/div>\r\n    <a class=\"sc-card-program__link track-link sc-card-program-internal\" href=\"https:\/\/chatgpt.en.softonic.com\/iphone\" target=\"_self\" rel=\"noopener noreferrer\"><\/a>\r\n  <\/div>\r\n<\/div>\n\n\n\n<p>Quite a few AI models have been trained with data obtained illicitly, or at least dubiously, <a href=\"https:\/\/en.softonic.com\/articles\/openai-and-microsoft-sued-by-the-new-york-times\" target=\"_blank\" rel=\"noopener\" title=\"\">including the popular <strong>ChatGPT<\/strong><\/a> from <strong>OpenAI<\/strong>. 
For that reason, it is surprising, to say the least, that the company&#8217;s CTO, <strong>Mira Murati<\/strong>, is unclear about the source of the data used to train <a href=\"https:\/\/en.softonic.com\/articles\/sora-arrives-the-new-ai-video-generator-from-the-creators-of-chatgpt\" target=\"_blank\" rel=\"noopener\" title=\"\"><strong>Sora<\/strong>, the company&#8217;s new video-generating AI<\/a>.<\/p>\n\n\n\n<p>In an interview with <strong><a href=\"https:\/\/www.wsj.com\/tech\/personal-tech\/openai-cto-sora-generative-video-interview-b66320bb\" target=\"_blank\" rel=\"noopener nofollow\" title=\"\">The Wall Street Journal<\/a><\/strong> published on March 13, Murati gave <strong>vague answers<\/strong> when asked about the source of the data behind OpenAI&#8217;s Sora model, which generates videos from text instructions. &#8220;We use publicly available data and licensed data,&#8221; Murati responded when asked how the company is training its upcoming model.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/articles-img.sftcdn.net\/sft\/articles\/auto-mapping-folder\/sites\/3\/2024\/03\/Mira-Murati-OpenAI-180324-1024x683-1-1024x683.jpg\" alt=\"\" class=\"wp-image-275697\" srcset=\"https:\/\/articles-img.sftcdn.net\/auto-mapping-folder\/sites\/3\/2024\/03\/Mira-Murati-OpenAI-180324-1024x683-1-1024x683.jpg 1024w, https:\/\/articles-img.sftcdn.net\/auto-mapping-folder\/sites\/3\/2024\/03\/Mira-Murati-OpenAI-180324-1024x683-1-300x200.jpg 300w, https:\/\/articles-img.sftcdn.net\/auto-mapping-folder\/sites\/3\/2024\/03\/Mira-Murati-OpenAI-180324-1024x683-1-768x512.jpg 768w, https:\/\/articles-img.sftcdn.net\/auto-mapping-folder\/sites\/3\/2024\/03\/Mira-Murati-OpenAI-180324-1024x683-1-150x100.jpg 150w, https:\/\/articles-img.sftcdn.net\/auto-mapping-folder\/sites\/3\/2024\/03\/Mira-Murati-OpenAI-180324-1024x683-1.jpg 
1079w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p><strong>Joanna Stern<\/strong>, a journalist at the WSJ, then asked whether <a href=\"https:\/\/en.softonic.com\/articles\/will-sora-the-new-ai-from-openai-be-safe-this-is-what-the-company-says\" target=\"_blank\" rel=\"noopener\" title=\"\">Sora<\/a> had been trained with data from platforms like <strong>YouTube, Instagram or Facebook<\/strong>, to which Murati replied: <strong>&#8220;I&#8217;m not sure about that&#8221;<\/strong>, adding: &#8220;You know, if they were available to the public &#8211; available to the public to use. But I&#8217;m not sure. I&#8217;m not sure about it&#8221;.<\/p>\n\n\n\n<p>Before moving on to another topic, Stern mentioned OpenAI&#8217;s partnership with the stock image company <strong>Shutterstock<\/strong>, asking whether its data could be used to train Sora. &#8220;I&#8217;m not going to go into details about the data that was used. But they were public or licensed data,&#8221; Murati added. 
Later, <strong>the executive confirmed to the WSJ that indeed, Shutterstock data was used to train Sora<\/strong>.<\/p>\n\n\n<div class=\"sc-card-program\">\r\n  <div class=\"sc-card-program__body\">\r\n    <div class=\"sc-card-program__row clearfix\">\r\n      <div class=\"sc-card-program__col-logo\">\r\n        <img decoding=\"async\" class=\"sc-card-program__img\" alt=\"ChatGPT\" src=\"https:\/\/images.sftcdn.net\/images\/t_app-icon-s\/p\/47ef1772-2a82-4750-b97a-354b13dbd112\/3647786732\/chatgpt-ChatGPT-icon.png\" width=\"100px\" height=\"100px\">\r\n      <\/div>\r\n      <div class=\"sc-card-program__col-title\">\r\n        <span class=\"sc-card-program__title\">ChatGPT<\/span>\r\n        <a class=\"sc-card-program__button sc-card-program-internal\" href=\"https:\/\/chatgpt.en.softonic.com\/iphone\" target=\"_self\" rel=\"noopener noreferrer\">DOWNLOAD<\/a>\r\n      <\/div>\r\n      <div class=\"sc-card-program__col-rating\">\r\n        <svg class=\"rating-score__content\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" version=\"1.1\" x=\"0\" y=\"0\" viewbox=\"0 0 50 50\" enable-background=\"new 0 0 50 50\" xml:space=\"preserve\"><path class=\"rating-score__background rating-score--good\" fill=\"none\" stroke-width=\"6\" stroke-miterlimit=\"10\" d=\"M40 40c8.3-8.3 8.3-21.7 0-30s-21.7-8.3-30 0 -8.3 21.7 0 30\"><\/path><path class=\"rating-score__value rating-score__value--0\" fill=\"none\" stroke-width=\"6\" stroke-dashoffset=\"0\" stroke-miterlimit=\"10\" d=\"M40 40c8.3-8.3 8.3-21.7 0-30s-21.7-8.3-30 0 -8.3 21.7 0 30\"><\/path><text class=\"rating-score__number\" content=\"\" text-anchor=\"middle\" transform=\"matrix(1 0 0 1 25 31.0837)\" data-auto=\"app-user-score\"><\/text><\/svg>\r\n      <\/div>\r\n    <\/div>\r\n    <div class=\"sc-card-program__row\">\r\n      <span class=\"sc-card-program__description\"><\/span>\r\n    <\/div>\r\n    <div class=\"sc-card-program__row\">\r\n      <img decoding=\"async\" class=\"sc-card-program__bigpic\" src=\"\" 
onerror=\"this.style.display='none'\">\r\n    <\/div>\r\n    <a class=\"sc-card-program__link track-link sc-card-program-internal\" href=\"https:\/\/chatgpt.en.softonic.com\/iphone\" target=\"_self\" rel=\"noopener noreferrer\"><\/a>\r\n  <\/div>\r\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Every time a technology company launches a new artificial intelligence, the first question that arises is &#8220;where do the training data come from?&#8221;. AI models are trained using large datasets, which help the model learn to recognize patterns, make predictions, or understand language. And it is not few the AI that have been trained with &hellip; <a href=\"https:\/\/cms-articles.softonic.io\/en\/shall-we-go-back-to-square-one-the-cto-of-openai-claims-to-not-know-what-data-sora-has-been-trained-with\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Here we go again? OpenAI&#8217;s CTO claims to not know what data Sora has been trained with&#8221;<\/span><\/a><\/p>\n","protected":false},"author":9256,"featured_media":275695,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","wpcf-pageviews":1},"categories":[1015],"tags":[],"usertag":[],"vertical":[],"content-category":[],"class_list":["post-275693","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/posts\/275693","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/users\/9256"}],"replies":[{"embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/comments?post=275693"}],"version-his
tory":[{"count":1,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/posts\/275693\/revisions"}],"predecessor-version":[{"id":314361,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/posts\/275693\/revisions\/314361"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/media\/275695"}],"wp:attachment":[{"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/media?parent=275693"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/categories?post=275693"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/tags?post=275693"},{"taxonomy":"usertag","embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/usertag?post=275693"},{"taxonomy":"vertical","embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/vertical?post=275693"},{"taxonomy":"content-category","embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/content-category?post=275693"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}