In my recent post – Postscript to my AI Series – Why Not Use the DMCA? – I discussed early developments in two cases pending against OpenAI in the Federal District Court for the Southern District of New York (SDNY). Both cases focus on the claim that, in the process of training its AI models, OpenAI illegally removed “copyright management information.”
What Is Copyright Management Information?
I’ve discovered that many people – including many lawyers – are unfamiliar with an obscure section of the Digital Millennium Copyright Act (DMCA) that addresses “copyright management information,” or “CMI.”
CMI includes copyright notices, information identifying the author, and details about the terms of use or rights associated with the work. It can appear directly on the work or be embedded in it as metadata.
Section 1202(b)(1) of the DMCA is a “double scienter” law, requiring that a plaintiff prove that (1) CMI was intentionally removed from a copyrighted work, and (2) that the alleged infringer knew or had reasonable grounds to know that the removal of CMI would “induce, enable, facilitate, or conceal” copyright infringement.
Here is an example of how this law might work.
Assume that I have copied a work and that I have a legitimate fair use defense. Assume further that, in the process of duplicating the work, I removed the copyright notice and published the copy without it. I have a fair use defense as to duplication and distribution, but could I still be liable for CMI removal?
The answer is yes. Violation of the DMCA is independent of my fair use defense. And the penalty is not trivial: liability for CMI removal can result in statutory damages ranging from $2,500 to $25,000 per violation, as well as attorneys’ fees and injunctive relief. Moreover, unlike infringement actions, a claim for CMI removal does not require prior registration of the copyright.
All of this adds up to a powerful tool for copyright plaintiffs, a fact that has not been lost on plaintiffs’ counsel in AI litigation.
Raw Story Media v. OpenAI
In the first of the two SDNY cases, Raw Story Media v. OpenAI, federal district court judge Colleen McMahon dismissed Raw Story’s claim that when training ChatGPT OpenAI had illegally removed CMI.
At the heart of Judge McMahon’s decision was her reasoning on Article III standing. Judge McMahon found that the plaintiffs failed to provide sufficient evidence to demonstrate that their content had been used to train ChatGPT or that, even if it had, such use would harm Raw Story since “the likelihood that ChatGPT would output plagiarized content from one of Plaintiffs’ articles seems remote” based on “the quantity of information contained in the repository.”
The Intercept Media v. OpenAI
In the second case, The Intercept Media v. OpenAI, The Intercept made the same allegation – it asserted that OpenAI had intentionally removed CMI from its articles used in training the ChatGPT AI model.
However, in The Intercept Media Judge Jed Rakoff came to the opposite conclusion. In November 2024 he issued a brief order declining to dismiss plaintiff’s CMI claim and stated that an opinion explaining his rationale would be forthcoming.
That opinion was issued on February 20, 2025.
As I noted in my earlier post, Section 1202(b)(1) of the DMCA prohibits the intentional removal of CMI. In this new ruling Judge Rakoff emphasized that The Intercept must prove (1) that OpenAI intentionally removed CMI and (2) that it knew or had reason to know that the removal of CMI could induce, enable, facilitate, or conceal copyright infringement.
At this stage of the case (before discovery or trial) the judge found that The Intercept met the “double scienter” standard. As to the first part of the test, The Intercept alleged that the algorithm that OpenAI uses to build its AI training data sets can only capture an article’s main text, which excludes CMI. This satisfied the intentional removal element.
As to the second component of the standard, the court was persuaded by The Intercept’s theory of “downstream infringement,” which argues that OpenAI’s model might enable users to generate content based on The Intercept’s copyrighted works without proper attribution.
Importantly, the district court held that a copyright injury “does not require publication to a third party,” finding unpersuasive OpenAI’s argument that the Intercept failed to demonstrate a concrete injury because it had not conclusively established that users had actually accessed The Intercept’s articles via ChatGPT.
Curiously, Judge Rakoff’s decision failed to mention the earlier ruling in Raw Story Media, Inc. v. OpenAI, where Judge McMahon held, on similar facts, that the plaintiffs lacked standing to assert CMI removal claims. Both cases were decided by SDNY district court judges. However, unlike the ruling in Raw Story Media, Judge Rakoff concluded that The Intercept’s alleged injury was closely related to the property-based harms traditionally protected under copyright law, satisfying the Article III standing requirement.
Thus, while Raw Story’s CMI claims against OpenAI have been dismissed, The Intercept’s CMI removal case against OpenAI will proceed.