“Meta treated the so-called ‘public availability’ of shadow datasets as a get-out-of-jail-free card, despite internal Meta records showing all relevant decision-makers at Meta, up to and including its CEO, Mark Zuckerberg, knew that LibGen was ‘a data set that we know to be pirated,’” the plaintiffs state in this motion. (Originally filed in late 2024, the filing is a motion to file a third amended complaint.)
In addition to the plaintiffs’ submissions, one other filing was unredacted in response to Chhabria’s order: Meta’s opposition to the motion to file an amended complaint. It claims the authors’ attempts to add claims to the case are an “eleventh-hour gambit based on a false and inflammatory premise” and denies that Meta waited to reveal key information in discovery. Instead, Meta claims it first disclosed to the plaintiffs that it was using the LibGen dataset in July 2024. (Because most of the discovery material remains confidential, it’s difficult for WIRED to confirm that claim.)
Meta’s argument hinges on its contention that the plaintiffs already knew about its use of LibGen and should not be given additional time to file a third amended complaint when they had plenty of time to do so before discovery ended in December 2024. “Plaintiffs knew of Meta’s download and use of LibGen and other alleged ‘shadow libraries’ from at least mid-July 2024,” the tech giant’s lawyers argue.
In November 2023, Chhabria granted Meta’s motion to dismiss some of the lawsuit’s claims, including a claim that Meta’s alleged use of the authors’ work to train artificial intelligence violated the Digital Millennium Copyright Act, a US law introduced in 1998 to prevent people from selling or duplicating copyrighted works on the internet. At the time, the judge agreed with Meta’s position that the plaintiffs had not provided enough evidence to show that the company removed what is known as “copyright management information,” such as the name of the author and the title of the work.
The unredacted documents argue that the plaintiffs should be allowed to amend their complaint, saying the information Meta disclosed is evidence that the DMCA claim was justified. They also say the discovery process revealed grounds for adding new allegations. “Meta, through a corporate representative who testified on November 20, 2024, has now admitted under oath that it has uploaded (commonly known as ‘seeding’) pirated files containing plaintiffs’ works to ‘torrent’ sites,” the motion states. (Seeding is when torrented files are shared with other peers after they have finished downloading.)
“This torrenting activity turned Meta itself into a distributor of the same pirated copyrighted material that it was also downloading for use in its commercially available AI models,” claims one of the newly unredacted documents. In other words, it alleges that Meta not only used copyrighted material without permission but also distributed it.
LibGen, an online book archive that originated in Russia around 2008, is one of the largest and most controversial “shadow libraries” in the world. In 2015, a New York judge ordered a preliminary injunction against the site, a measure designed in theory to temporarily shut down the archive, but its anonymous administrators simply changed its domain. In September 2024, another New York judge ordered LibGen to pay $30 million to rights holders for infringing their copyrights, despite not knowing who actually runs the piracy hub.
Meta’s discovery troubles in this case are not over either. In the same order, Chhabria warned the tech giant against any overly broad redaction requests in the future: “If Meta files an unreasonably broad sealing request again, all materials will simply be unsealed,” he wrote.