Encyclopedia Britannica Takes OpenAI to Court Over AI Training Data

Encyclopedia Britannica, along with its dictionary subsidiary Merriam‑Webster, has filed a lawsuit against OpenAI in Manhattan federal court, accusing the Microsoft‑backed company of unlawfully using its reference materials to train artificial intelligence models.

AllComputerss

3/15/2026

According to the complaint, OpenAI copied and ingested nearly 100,000 encyclopedia articles and dictionary entries into its training datasets for ChatGPT and other large language models.

Britannica argues that this practice not only infringes on its copyrights but also undermines its business by diverting web traffic. The lawsuit claims that ChatGPT often produces “near‑verbatim” reproductions of Britannica’s content, effectively replacing the need for users to visit Britannica’s own websites.

Allegations of Trademark Misuse and False Attribution

Beyond copyright concerns, Britannica also accuses OpenAI of trademark infringement. The complaint states that ChatGPT has, at times, implied Britannica’s endorsement or permission to reproduce its material. In some cases, the AI system allegedly cited Britannica in fabricated outputs, commonly referred to as “hallucinations,” which the company says damage its reputation and mislead users.

Britannica is seeking unspecified monetary damages as well as a court order to block further infringement.

Part of a Larger Legal Battle Over AI Training

This lawsuit is one of many currently unfolding in the U.S. and abroad, as publishers, authors, and media outlets challenge AI companies over the use of copyrighted material in training datasets. Similar cases have been filed against OpenAI by news organizations, book authors, and other content creators, all questioning whether AI training constitutes “fair use” or unlawful copying.

Britannica itself has already taken legal action against Perplexity AI, another AI startup, in a separate case that remains ongoing.

OpenAI and other AI developers have consistently argued that their systems transform existing content into something new, thereby qualifying as fair use under U.S. copyright law. However, courts have yet to deliver definitive rulings on these complex issues, leaving the legal landscape uncertain.

Why This Case Matters

The dispute highlights a critical tension in the AI industry: balancing innovation with respect for intellectual property. Britannica, a trusted source of knowledge for centuries, sees its carefully curated content as the foundation of its business. If AI models can freely reproduce that content without permission, Britannica argues, it risks losing both traffic and credibility.

For OpenAI, the case represents another major legal challenge at a time when its technology is under intense scrutiny. The outcome could set important precedents for how AI companies source and use training data, potentially reshaping the future of generative AI.

Looking Ahead

As of now, neither Britannica nor OpenAI has issued detailed public comments on the lawsuit. The case will likely take months, if not years, to resolve, but its implications extend far beyond the two companies involved. At stake is not only the protection of copyrighted works but also the question of how AI systems should be built and governed in an era where information is both abundant and fiercely protected.