25 March 2025
AI Training Scandal: Tech Giant Meta Accused Of Copyright Infringement Over Pirated Book Dataset

The Use of Pirated Books in AI Training: A Concern for Authors and Writers
A controversy has emerged surrounding the use of pirated books to train artificial intelligence (AI) models. Meta, the parent company of Facebook and Instagram, is facing allegations of copyright infringement after using an allegedly pirated dataset of books to train its AI. The issue has sparked outrage among authors in Australia, who feel that their work was used without their consent and that the company’s actions are both unethical and illegal.
The controversy stems from court filings in a United States lawsuit in which authors such as Ta-Nehisi Coates and Sarah Silverman are suing Meta for copyright infringement. According to the filings, Mark Zuckerberg, Meta’s CEO, approved the use of the LibGen dataset, an online archive of books, to train the company’s AI models despite warnings from his AI executive team that it was a pirated dataset.
The LibGen dataset includes books by many Australian authors, including former prime ministers Malcolm Turnbull, Kevin Rudd, Julia Gillard, and John Howard. Holden Sheppard, who wrote the young adult novel “Invisible Boys,” expressed outrage at discovering that two of his books and two short stories were included in the dataset.
“I am furious to learn my books have been again pirated and used without my consent to train a generative AI system which is not only unethical and illegal in its current form, but something I am vehemently opposed to,” Sheppard said. “No consent has been obtained from any of the thousands of authors who have had our work taken, and not a single cent has been paid to any of us.”
Sheppard believes that Meta is in a financial position to compensate authors fairly and that the company is not above the law. He is calling for AI-specific legislation in Australia that would require generative AI developers or deployers to obtain consent from authors before using their work.
Journalist and author Tracey Spicer has also spoken out against Meta’s use of pirated books, saying she felt “violated” when she realized her two books were included in the dataset. Spicer notes that many authors do not make a living wage, especially in small markets like Australia, and that it is unfair for companies like Meta to profit from their work without permission.
“This is peak technocapitalism,” Spicer said. “It’s a bit rich for big tech to cry poor. These companies can afford to pay for content, or they can create synthetic datasets.”
Alexandra Heller-Nicholas, an award-winning film critic and author of ten books on cult movies, has also spoken out against Meta’s use of pirated books. Eight of her books, including those she co-edited, were included in the dataset.
“It is no understatement that this is my lifetime’s work,” Heller-Nicholas said. “I’m upset, angry, but mostly exhausted.” She notes that many authors pour their hearts and souls into their work, only to see it used without permission or compensation.
The Australian Society of Authors has called on affected authors to get in touch so that it can advocate on their behalf against the use of their works. The society’s chair, Sophie Cunningham, says that massive corporations like Meta are profiting from writers’ work and reducing them to “serfs.”
“Most writers are lucky to get $18,000 per year,” Cunningham said. “And they’re not even having the right to be involved in which work [is used].” She believes that Meta is treating writers with contempt.
Meta has declined to comment on the issue, citing the ongoing litigation. However, the company has reportedly lobbied the Trump administration to declare, via executive order, that training AI on copyrighted data is fair use.
In contrast, some AI companies have begun entering into agreements with publishers for the use of their work, including OpenAI, which signed a deal with The Guardian in February for use of Guardian content in ChatGPT. This approach highlights the need for clear guidelines and regulations around the use of copyrighted materials in AI training.
As the debate surrounding the use of pirated books in AI training continues to unfold, it is essential that authors, writers, and publishers advocate for their rights and interests. AI has the potential to revolutionize many industries, but only if it is developed responsibly and with respect for intellectual property rights.
The controversy raises important questions about copyright infringement, consent, and the ethics of using copyrighted materials in AI development. As we move forward, it is crucial that we prioritize the rights and interests of authors and writers, ensuring that their work is protected and respected.
The dispute also underscores the importance of transparency and accountability in the development and deployment of AI systems.
Ultimately, the future of AI development will depend on balancing innovation with respect for intellectual property rights. By prioritizing the rights and interests of authors and writers, we can ensure that AI is developed in a responsible and ethical manner.