{"id":306933,"date":"2025-06-21T03:40:58","date_gmt":"2025-06-21T10:40:58","guid":{"rendered":"https:\/\/cms-articles.softonic.io\/en\/?p=306933"},"modified":"2025-07-01T14:20:18","modified_gmt":"2025-07-01T21:20:18","slug":"ai-doesnt-copy-but-metas-ai-remembers-42-of-the-first-harry-potter-book-without-missing-a-comma","status":"publish","type":"post","link":"https:\/\/cms-articles.softonic.io\/en\/ai-doesnt-copy-but-metas-ai-remembers-42-of-the-first-harry-potter-book-without-missing-a-comma\/","title":{"rendered":"AI doesn\u2019t copy, but Meta\u2019s AI remembers 42% of the first Harry Potter book without missing a comma"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">A new study has shaken one of the tech industry\u2019s core defenses: that&nbsp;<strong>large language models don\u2019t memorize copyrighted content<\/strong>. Researchers have found that Meta\u2019s LLaMa 3.1 model can reproduce&nbsp;<strong>up to 42% of&nbsp;<em>Harry Potter and the Philosopher\u2019s Stone<\/em><\/strong>, word for word, raising fresh legal and ethical concerns.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">LLaMa 3.1 shows an unprecedented level of memorization<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The researchers evaluated LLaMa 3.1 by feeding it 100-token sequences from the book and checking whether it would&nbsp;<strong>reproduce the next 50 tokens verbatim with a probability above 50%<\/strong>. A passage that clears this bar indicates not just pattern recognition, but&nbsp;<strong>almost exact memory recall of the original text<\/strong>. Because the probability of a sequence is the product of its per-token probabilities, clearing that bar requires the model to assign, on average, roughly a&nbsp;<strong>98.5% probability to each correct token<\/strong>, suggesting it had internalized a large portion of the book.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Popular books are remembered far more than obscure ones<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">This behavior isn\u2019t uniform across all texts. 
LLaMa 3.1 tends to memorize&nbsp;<strong>very popular titles like&nbsp;<em>The Hobbit<\/em>&nbsp;or&nbsp;<em>1984<\/em><\/strong>, but retains far less of lesser-known books. For example, it retained just&nbsp;<strong>0.13% of&nbsp;<em>Sandman Slim<\/em><\/strong>, a 2009 novel by Richard Kadrey, who, ironically, is suing Meta over its training practices.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Legal risks are mounting for AI training practices<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The findings could support the argument that&nbsp;<strong>AI models may contain infringing material in their internal weights<\/strong>. The U.S. Copyright Office recently stated that if models reproduce \u201crelevant portions\u201d of protected works, those internal weights might&nbsp;<strong>constitute illegal copies<\/strong>. This weakens tech companies\u2019 claims that memorization is marginal.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Meta faces fallout beyond legal pressure<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Meta is also under strain internally:&nbsp;<strong>the loss of key engineers, delayed model launches<\/strong>, and a&nbsp;<strong>14 billion USD investment<\/strong>&nbsp;in data sourcing. The study\u2019s findings add to that pressure as the company prepares to defend its methods in court.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A new study has shaken one of the tech industry\u2019s core defenses: that&nbsp;large language models don\u2019t memorize copyrighted content. Researchers have found that Meta\u2019s LLaMa 3.1 model can reproduce&nbsp;up to 42% of&nbsp;Harry Potter and the Philosopher\u2019s Stone, word for word, raising fresh legal and ethical concerns. 
LLaMa 3.1 shows an unprecedented level of memorization The &hellip; <a href=\"https:\/\/cms-articles.softonic.io\/en\/ai-doesnt-copy-but-metas-ai-remembers-42-of-the-first-harry-potter-book-without-missing-a-comma\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;AI doesn\u2019t copy, but Meta\u2019s AI remembers 42% of the first Harry Potter book without missing a comma&#8221;<\/span><\/a><\/p>\n","protected":false},"author":9317,"featured_media":306934,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","wpcf-pageviews":0},"categories":[1015],"tags":[],"usertag":[],"vertical":[],"content-category":[6771],"class_list":["post-306933","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news","content-category-ai"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/posts\/306933","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/users\/9317"}],"replies":[{"embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/comments?post=306933"}],"version-history":[{"count":1,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/posts\/306933\/revisions"}],"predecessor-version":[{"id":306935,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/posts\/306933\/revisions\/306935"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/media\/306934"}],"wp:attachment":[{"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/media?parent=306933"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cms-articles.sof
tonic.io\/en\/wp-json\/wp\/v2\/categories?post=306933"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/tags?post=306933"},{"taxonomy":"usertag","embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/usertag?post=306933"},{"taxonomy":"vertical","embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/vertical?post=306933"},{"taxonomy":"content-category","embeddable":true,"href":"https:\/\/cms-articles.softonic.io\/en\/wp-json\/wp\/v2\/content-category?post=306933"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}