D.C. Circuit Reaffirms Human Authorship Requirement in Thaler v. Perlmutter


In 2019, Stephen Thaler filed an unusual copyright application. The application, for a piece titled “A Recent Entrance to Paradise” (image at top), identified a nonhuman “creator”: the “Creativity Machine,” an AI system invented by Thaler. In his application for registration, Thaler informed the Copyright Office that the work was “created autonomously by machine,” and he claimed the copyright based on his “ownership of the machine.”

After appealing the Copyright Office’s denial of registration to the District Court and losing, Thaler appealed to the U.S. Court of Appeals for the District of Columbia Circuit.

On March 18, 2025, the D.C. Circuit upheld the Copyright Office as well as the District Court, holding that copyright protection under the 1976 Act cannot be granted to a work generated solely by artificial intelligence.

Notably, this ruling does not exclude AI-assisted works from protection; it merely confirms that a human must exercise genuine creative control. The key question now is how much human input is necessary to qualify as the author—a point the court left open for future clarification.

Here are the key takeaways:

Human Authorship Is Mandatory. The court held that the Copyright Act of 1976 requires an “author” to be a human being. Works generated solely by AI—where AI is listed as the sole creator—do not qualify. Under the Copyright Act “author” means human. A machine, including an AI system, is not a legal creator.

AI-Assisted Works May Still Be Protected. The court underscored that human creators remain free to use AI tools. Such works can be copyrighted, provided a person (not just AI) exercises creative control. This is consistent with the recently released Copyright Office report, “Copyright and Artificial Intelligence (Part 2),” which confirms that the use of AI tools to assist human creativity is not a bar to copyright protection of the output as long as there is sufficient human control over the expressive elements.

A Single Piece of American Cheese

In fact, on January 30, 2025, the Copyright Office registered A Single Piece of American Cheese, based on the “selection, coordination, and arrangement of material generated by artificial intelligence.” (Image at left.) See How We Received The First Copyright for a Single Image Created Entirely with AI-Generated Material.

Work-Made-for-Hire Doesn’t Save AI-Only Authorship. Dr. Thaler’s argument that AI could be his “employee” under the work-for-hire doctrine failed because the underlying creation must still have a human author in the first place.

Waived Argument. Dr. Thaler tried to claim he was effectively the author by directing the AI. The court found he had not properly raised this argument at the administrative level and therefore declined to consider it. This might have been his best argument, had he made it.

Policy Questions Left to Congress. While noting that new AI technologies could raise important policy issues, the court emphasized that it is for Congress, not the judiciary, to expand copyright beyond human authors.

Thaler v. Perlmutter (D.C. Cir. Mar. 18, 2025)

 

(For an earlier post on this case see: Court Denies Copyright Protection to AI Generated Artwork.)

A Podcast on Oracle v. Google (courtesy of NotebookLM)

In October 2024 I created (probably not the right word – delivered?) a podcast using NotebookLM: An Experiment: An AI Generated Podcast on Artificial Intelligence and Copyright Law. The podcast that NotebookLM created was quite good, so I thought I’d try another one.

This is in the nature of experimentation, simply to explore this unusual AI tool.

This time the topic is the Oracle v. Google copyright litigation. I thought this would be a good topic to experiment with, since it is complex and there are decisions by federal district court judge William Alsup, two Federal Circuit opinions, and finally the Supreme Court decision. So, here goes.

Google v. Oracle: Copyright and Fair Use of Software APIs


SDNY Courts Split Over Copyright Management Information in AI Cases


In my recent post, Postscript to my AI Series – Why Not Use the DMCA?, I discussed early developments in two cases pending against OpenAI in the U.S. District Court for the Southern District of New York (SDNY). Both cases focus on the claim that, in the process of training its AI models, OpenAI illegally removed “copyright management information.” And, as I discuss below, they reach different outcomes.

What Is Copyright Management Information?

Many people who are familiar with the Digital Millennium Copyright Act’s (DMCA) “notice and takedown” provisions are unfamiliar with a part of the DMCA that makes it illegal to remove “copyright management information,” or “CMI.”

CMI includes copyright notices, information identifying the author, and details about the terms of use or rights associated with the work. It can be visible directly on the work, or metadata in the underlying code. 

The CMI removal statute, Section 1202(b)(1) of the DMCA, is a “double scienter” law, requiring that a plaintiff prove that (1) CMI was intentionally removed from a copyrighted work, and (2) the alleged infringer knew or had reasonable grounds to know that the removal of CMI would “induce, enable, facilitate, or conceal” copyright infringement.

Here is an example of how this law might work. 

Assume that I have copied a work and that I have a legitimate fair use defense. However, assume further that I duplicated the work, removed the copyright notice and published the work without it. I have a fair use defense as to duplication and distribution, but could I still be liable for CMI removal?

The answer is yes. A violation of the DMCA is independent of my fair use defense. And the penalty is not trivial. Liability for CMI removal can result in statutory damages ranging from $2,500 to $25,000 per violation, as well as attorneys’ fees and injunctive relief. Moreover, unlike infringement actions, a claim for CMI removal does not require prior registration of the copyright.
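
To give a rough sense of scale, the per-violation range above can be multiplied out. This is only an arithmetic sketch: the function name and the count of 1,000 violations are hypothetical, and how a court would actually count "violations" in an AI training case is an open question that this example does not try to answer.

```python
# Statutory range per violation, taken from the figures quoted in the text.
MIN_PER_VIOLATION = 2_500
MAX_PER_VIOLATION = 25_000


def statutory_damages_range(violations: int) -> tuple[int, int]:
    """Return the (low, high) statutory exposure in dollars.

    `violations` is a hypothetical count; nothing here models how a
    court would decide what counts as a single violation.
    """
    if violations < 0:
        raise ValueError("violations must be non-negative")
    return violations * MIN_PER_VIOLATION, violations * MAX_PER_VIOLATION


# Hypothetical example: CMI stripped from 1,000 articles,
# assuming each article is treated as one violation.
low, high = statutory_damages_range(1_000)
print(f"${low:,} to ${high:,}")  # prints "$2,500,000 to $25,000,000"
```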

All of this adds up to a powerful tool for copyright plaintiffs, a fact that has not been lost on plaintiffs’ counsel in AI litigation.  

CMI – Why Don’t AI Companies Want To Include It?

AI companies’ removal of CMI during training stems from both technical necessities and strategic considerations. From a technical perspective, large language model training requires standardized data preparation processes that typically strip metadata, formatting, and peripheral information to create uniform training examples. This preprocessing is fundamental to how neural networks learn from text—they require clean, consistent inputs to identify linguistic patterns effectively.
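
A minimal sketch may make this concrete. The code below is not any AI company's actual pipeline; it is a hypothetical illustration, using only Python's standard library, of how a "keep the main text only" preprocessing step drops CMI as a side effect. The class name, function name, and sample page are all invented for the example, and it assumes (simplistically) that article text lives in paragraph tags.

```python
from html.parser import HTMLParser


class MainTextExtractor(HTMLParser):
    """Hypothetical extractor: keeps text inside <p> tags, drops everything else."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        # Text outside <p> (title, byline, footer notice) is silently ignored.
        if self._in_paragraph:
            self.paragraphs[-1] += data


def extract_main_text(html: str) -> str:
    """Return only the paragraph text of a page, one paragraph per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(p.strip() for p in parser.paragraphs if p.strip())


# Invented sample page: the title, author, and copyright notice are the CMI.
SAMPLE_PAGE = """
<html><head>
<title>Example Headline</title>
<meta name="author" content="Jane Reporter">
<meta name="copyright" content="(c) 2024 Example News">
</head><body>
<div class="byline">By Jane Reporter</div>
<p>First paragraph of the article.</p>
<p>Second paragraph of the article.</p>
<footer>Copyright 2024 Example News. Terms of use apply.</footer>
</body></html>
"""

text = extract_main_text(SAMPLE_PAGE)
print(text)
```

In this sketch the output contains the two article paragraphs and nothing else: the title, author, copyright notice, and terms of use never reach the training example. Whether a step like this amounts to "intentional" removal is exactly the legal question the cases below address.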

The computational overhead is also significant. Preserving and processing CMI for billions of training examples would increase storage requirements and computational costs. AI companies argue that this additional information provides minimal benefit to model performance while significantly increasing training complexity.

Content owners, however, contend that these technical justifications mask more strategic motivations. They argue that AI companies deliberately eliminate attribution information to obscure the provenance of training data, making it difficult to detect when copyrighted material has been incorporated into models. This removal, they claim, facilitates a form of “laundering” copyrighted content through AI systems, where original sources become untraceable.

More pointedly, content creators assert that CMI removal directly enables downstream infringement by making it impossible for users to identify when an AI output derives from or reproduces copyrighted works. Without embedded attribution information, neither the AI company nor end users can properly credit or license content that appears in generated outputs.

The technical reality and legal implications of this process sit at the heart of these emerging cases, with courts now being asked to determine whether standard machine learning preprocessing constitutes intentional CMI removal under the DMCA’s “double scienter” standard.

Raw Story Media v. OpenAI

In the first of the two SDNY cases, Raw Story Media v. OpenAI, federal district court judge Colleen McMahon dismissed Raw Story’s claim that when training ChatGPT, OpenAI had illegally removed CMI.  

At the heart of Judge McMahon’s decision was her observation that although OpenAI removed CMI from Raw Story articles, Raw Story was unable to allege that the works from which CMI had been removed had ever been disseminated by ChatGPT to anyone. On these facts, Judge McMahon held that Raw Story lacked standing under the Article III standing principles established by the Supreme Court in TransUnion v. Ramirez (2021). It’s worth noting her observation that “the likelihood that ChatGPT would output plagiarized content from one of Plaintiffs’ articles seems remote” based on “the quantity of information contained in the [AI model].”

The Intercept Media v. OpenAI

In the second case, The Intercept Media v. OpenAI, The Intercept made the same allegation. It asserted that OpenAI had intentionally removed CMI—in this case authors, copyright notices, terms of use and title information—from its AI training set.  

However, in this case Judge Jed Rakoff came to the opposite conclusion. In November 2024 he issued a bottom-line order declining to dismiss the plaintiff’s CMI claim and stated that an opinion explaining his rationale would be forthcoming.

That opinion was issued on February 20, 2025.  

At this early stage of the case (before discovery or trial) the judge found that The Intercept met the “double scienter” standard. As to the first part of the test, The Intercept alleged that the algorithm that OpenAI uses to build its AI training data sets can only capture an article’s main text, which excludes CMI. This satisfied the intentional removal element.

As to the second component of the standard, the court was persuaded by The Intercept’s theory of “downstream infringement,” which argues that OpenAI’s model might enable users to generate content based on The Intercept’s copyrighted works without proper attribution. And importantly, unlike in Raw Story, The Intercept was able to provide examples of verbatim regurgitation of its content from ChatGPT based on prompts from The Intercept’s data scientist.

The district court held that a copyright injury “does not require publication to a third party,” finding unpersuasive OpenAI’s argument that The Intercept failed to demonstrate a concrete injury because it had not conclusively established that users had actually accessed The Intercept’s articles via ChatGPT.

Curiously, Judge Rakoff’s decision failed to mention the earlier ruling in Raw Story Media, Inc. v. OpenAI, where Judge McMahon held, on similar facts, that the plaintiffs lacked standing to assert CMI removal claims. Both cases were decided by SDNY district court judges. However, unlike the ruling in Raw Story Media, Judge Rakoff concluded that The Intercept’s alleged injury was closely related to the property-based harms typically protected under copyright law, satisfying the Article III standing requirement.

Thus, while Raw Story’s CMI claims against OpenAI have been dismissed, The Intercept’s CMI removal case against OpenAI will proceed. 

A New Chapter in AI Copyright Litigation: Thomson Reuters v. Ross Intelligence and Its Ripple Effects on GenAI


The community of copyright AI watchers has been eagerly awaiting the first case to evaluate the legality of using copyright-protected works as training data. We finally have it, and it has a lot of copyright law experts scratching their heads and wondering what it means for the AI industry. 

On February 11, 2025, Third Circuit Judge Stephanos Bibas, sitting by designation in the U.S. District Court for the District of Delaware, issued a decision that is likely to shape the future of AI copyright litigation. By granting partial summary judgment to Thomson Reuters Enterprise Centre GmbH (“Thomson Reuters”) against Ross Intelligence Inc. (“Ross”), the court revisited and reversed its earlier 2023 opinion and rejected Ross’s fair use defense. Although this case involves a non-generative AI application, the reasoning has implications for the more than 30 AI copyright cases currently being litigated.

Case Overview

The Ross litigation centers on allegations that Ross used copyrighted material from Thomson Reuters’ Westlaw, a leading legal research platform, to train its AI-driven legal research tool. Ross wanted to use the Westlaw headnotes (concise summaries that encapsulate judicial opinions) to train its AI model, but Thomson Reuters would not grant Ross a license. Instead, Ross commissioned “Bulk Memos” from a third-party provider. These memos, designed to simulate legal questions and answers, closely mirrored the Westlaw headnotes. After determining that the memos were substantially similar to 2,243 Westlaw headnotes, the court held that this was direct copyright infringement and rejected Ross’s fair use defense.

Breaking Down the Fair Use Analysis

The court evaluated the four statutory fair use factors, with two—“purpose and character” and “market effect”—proving decisive:

1 – Purpose and Character of the Use: The court found that Ross’s use was commercial and aimed at developing a product that directly competes with Westlaw. Despite Ross’s argument that its copying was merely an “intermediate step” in a broader process, the judge rejected the intermediate copying cases (discussed below), emphasizing that “Ross was using Thomson Reuters’s headnotes as AI data to create a legal research tool to compete with Westlaw.” Importantly, the court’s analysis was informed by the framework established in the recent Supreme Court decision in Warhol v. Goldsmith, which stressed that reproduction fails to constitute a transformative use if the copying serves a similar market function as the original. The Warhol precedent underlines that transformation requires a “further purpose or different character” from the original work, a requirement Ross did not meet.

2 – Market Effect: The market effect factor proved even more influential. By positioning itself as a direct substitute for Westlaw, Ross both disrupted the existing market and undercut potential licensing markets for Thomson Reuters’s content (notwithstanding that Thomson Reuters refused to license to Ross). The court noted that any harm to this market, “undoubtedly the single most important element of fair use,” weighed decisively against Ross.

While the factors addressing the nature of the copyrighted work and the amount used modestly favored Ross, they were insufficient to overcome the adverse findings regarding the purpose of the use and market harm.

The Court’s 2023 Ruling vs. The Current Ruling

It’s worth noting the struggle the judge went through in deciding the fair use issue in this case. Judges rarely reverse themselves on major rulings, but that’s what happened here. 

As I noted, the judge in this case had issued a 2023 decision on the fair use issue. There, he held that the question of whether Ross’s use of the Westlaw headnotes was fair use was a jury issue.

In the current decision he reversed himself. 

Here’s what the judge said in 2023:

If Ross’s characterization of its activities is accurate, it translated human language into something understandable by a computer as a step in the process of trying to develop a “wholly new,” albeit competing, product—a search tool that would produce highly relevant quotations from judicial opinions in response to natural language questions. This also means that Ross’s final product would not contain or output infringing material. Under Sega [v. Accolade] and Sony [v. Connectix], this is transformative intermediate copying.

And here is what he said in his 2025 decision:

My prior opinion wrongly concluded that I had to send this factor to a jury. I based that conclusion on Sony and Sega. Since then, I have realized that the intermediate-copying cases [Sony, Sega] (1) are computer-programming copying cases; and (2) depend in part on the need to copy to reach the underlying ideas. Neither is true here. Because of that, this case fits more neatly into the newer framework advanced by Warhol. I thus look to the broad purpose and character of Ross’s use. Ross took the headnotes to make it easier to develop a competing legal research tool. So Ross’s use is not transformative. Because the AI landscape is changing rapidly, I note for readers that only non-generative AI is before me today.

This was a major change in direction, and it reflects the challenge the judge perceived in applying copyright fair use to artificial intelligence under the facts in this case. 

Implications for Generative AI Litigation

The question on the minds of most copyright AI observers is, “what does this mean for the more than 30 copyright cases against frontier AI model developers—OpenAI, Google, Anthropic, Facebook, X/Twitter, and many others”?

My answer? In most cases, likely not much.

The 2025 Ross decision underscores that even intermediate copying can fall outside fair use when it ultimately facilitates the creation of a product that directly competes with the copyrighted work. For example, unlike Authors Guild v. Google Books, where the copying enabled a unique search function without substituting for the original works, Ross’s use of headnotes was aimed squarely at developing an AI legal research tool that encroaches on Westlaw’s market. This market harm, which is central to fair use analysis, undermines the fair use defense by establishing that the copying, even if temporary or intermediate, has a direct commercial impact. The ruling aligns with recent precedents like Warhol, which require a truly transformative purpose rather than mere replication, thereby narrowing the scope of permissible intermediate copying in AI training contexts.

However, the case may not have much significance for most of the pending AI copyright cases. While the Ross decision tightens the fair use framework in situations where the end product directly competes with the original work, most current generative AI cases do not involve direct competition. Most generative AI systems produce entirely new content rather than serving as a substitute for the copyrighted materials used during training. As a result, the market harm and competitive concerns central to the Ross ruling may not be as relevant in these cases, and its impact on the broader generative AI landscape may be limited.

Conclusion

The ruling in Thomson Reuters v. Ross Intelligence sets an important precedent for how courts may evaluate the use of copyrighted works in AI training. Although fact-specific and limited to a non-generative AI context, the decision’s reliance on principles from the Warhol case—particularly the need for a transformative purpose and the critical weight of market impact—will likely influence future disputes, including those involving frontier generative AI models, particularly where the AI model competes with the owner of the training data.

Developers and content owners alike should take note: as the legal landscape adapts to the realities of AI, robust data sourcing strategies and a clear understanding of copyright limitations will be crucial. For companies working on generative AI, the challenge will be to innovate without replicating the competitive functions of existing copyrighted works—a balancing act that this decision has now brought into focus.

It’s also important to note that this ruling doesn’t end the case. There are remaining issues of fact that the judge reserved for trial. However, it appears that Ross Intelligence is bankrupt, and therefore may not have the financial resources to continue to trial. And, of course, Ross could appeal the trial judge’s rulings at the conclusion of the case, although it is questionable whether it will be able to do so for the same reason. It seems likely that this case will end here. 

Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc. (D. Del. Feb. 11, 2025)

Second Circuit Revisits DMCA “Red Flag” Safe Harbor, Building on YouTube Legacy


There was a period from roughly 2010 to 2016 when it seemed like I was posting on the DMCA take-down system every few months. Many of these posts focused on the Viacom v. YouTube litigation in the Second Circuit. This massive litigation ended with a settlement in 2014. Nevertheless, before the case settled the Second Circuit issued a significant decision, establishing an important precedent on application of the Digital Millennium Copyright Act.

The Second Circuit’s January 13, 2025 decision in Capitol Records v. Vimeo – written by Judge Pierre Leval, the Second Circuit’s widely acknowledged authority on copyright law – feels like déjà vu. Fifteen years after Capitol Records filed suit, the court has reaffirmed and expanded upon the DMCA safe harbor principles it established thirteen years ago in YouTube. Yet the Vimeo decision addresses novel issues that highlight how both technology and legal doctrine have evolved since the YouTube era.

Building on YouTube’s Foundation

In its 2012 decision in Viacom v. YouTube, the Second Circuit ruled that overcoming an internet provider’s DMCA safe harbor protection requires copyright owners to show either that a platform had actual knowledge of specific infringements or that it was aware of facts that would make infringement “obvious to a reasonable person” – so-called “red flag knowledge.” Generalized awareness that infringement has occurred on a platform wasn’t enough. This framework has served as primary guidance during the explosive growth of user-generated content over the past decade.

Vimeo: New Technology, New Challenges

The Vimeo case presented similar issues but in a transformed technological landscape. Capitol Records asserted that Vimeo lost safe harbor protection because its employees interacted with 281 user-posted videos containing copyrighted music. While YouTube dealt with a nascent video-sharing platform, Vimeo involved a sophisticated service with established content moderation practices.

The court’s analysis of “red flag” knowledge builds on YouTube while providing important new guidance. Employee interaction with content through likes, comments, or featuring videos doesn’t create red flag knowledge. Copyright owners must now prove “specialized knowledge,” and basic copyright training or work experience isn’t enough to establish the expertise needed for this level of knowledge. Even obvious use of copyrighted music doesn’t create red flag knowledge given the complexity of fair use determinations, with the court specifically citing the recent Warhol case where copyright experts split on fair use analysis.

While YouTube focused primarily on knowledge standards, Vimeo tackles a critical question for modern platforms: how much content moderation is too much? The court held that basic curation—like featuring videos in “Staff Picks” or maintaining community standards—won’t strip safe harbor protection. It left open whether more aggressive moderation or encouraging specific types of potentially infringing content might cross the line.

See No Evil, Hear No Evil

However, the decision also creates incentives for platforms to minimize their oversight of copyrighted uploads to avoid triggering red flag liability: by limiting active monitoring or interaction with user-generated content, platforms can reduce the risk of being deemed to have actual or red flag knowledge of infringement. This has the effect of reinforcing the DMCA’s notice-and-takedown framework as the primary mechanism for addressing copyright infringement. Platforms like Vimeo are likely to choose to rely more heavily on this reactive system rather than implementing robust preemptive measures. 

AI and the Future of Safe Harbor

The Vimeo decision leaves open an increasingly important question: how will courts apply these standards as platforms adopt artificial intelligence for content moderation? While the court focused on human knowledge and interaction, modern platforms increasingly rely on automated systems to identify potential infringement. Future litigation will likely need to address whether AI-powered content recognition creates the kind of “specialized knowledge” that might lead to red flag awareness, and whether algorithmic promotion of certain content categories could constitute “substantial influence.”

While Vimeo expands on YouTube’s framework, both cases highlight a fundamental flaw in the DMCA safe harbor: the time and cost of litigation effectively nullify its protections. YouTube took seven years to resolve; Vimeo took fifteen. Without legislative clarity on key terms like “red flag knowledge” and “substantial influence,” copyright owners can continue using litigation costs as a weapon against small and mid-sized platforms, which is exactly what the DMCA was meant to prevent.

As technology advances, particularly in AI-powered content moderation, platforms must carefully balance robust content management with safe harbor compliance. The Vimeo decision provides valuable guidance while highlighting the need for continued evolution in DMCA safe harbor doctrine.

Capitol Records, LLC v. Vimeo, Inc. (2d Cir. Jan. 13, 2025)