
No ‘free ride’ for surging OpenAI

A cascade of lawsuits is seeking compensation from OpenAI for harvesting copyrighted content to train its models. University experts assess the impact the pending legal rulings could have on the fast-rising technology.
The OpenAI logo is displayed on a cell phone with an image on a computer screen generated by ChatGPT's Dall-E text-to-image model. Photo: The Associated Press

In late December, The New York Times filed a lawsuit against OpenAI and its major backer Microsoft seeking billions of dollars in “statutory and actual damages.” The Times’ claim joined a rash of other lawsuits filed by authors and media content creators, all alleging that OpenAI misappropriated copyrighted content to train the large language models (LLMs) behind its ChatGPT chatbot.

Andres Sawicki, professor and director of the Business of Innovation, Law, and Technology Concentration (BILT) at the University of Miami School of Law, noted that the case is just getting started and that it is too soon to evaluate the parties’ arguments. The case is not likely to go before a jury until 2025 at the earliest.

“The biggest challenge in these cases is trying to evaluate whether the ‘training’ of this kind of AI is the kind of activity that constitutes infringement of a copyright owner’s exclusive rights or, instead, is either non-infringing (because the copyright owner’s exclusive rights don’t extend to this kind of activity) or is protected by the fair use doctrine (which limits the copyright owner’s exclusive rights in particular contexts when doing so would further the underlying goals of the copyright system).  

“On top of that difficult legal question, the courts will have to grapple with some fairly complex technology—not typically a strong suit of our legal system,” Sawicki noted. 

Given the current state of the technology, a win for the Times would be “a very bad thing” for the generative AI industry, Sawicki ventured.

“The quantity of text required to train a large language model is so large that it would be prohibitively expensive, not to mention inordinately complicated, to obtain a license from enough copyright owners. And even if it is possible, it would be limited to a very small handful of firms with massive cash reserves, which would limit entry into a field that is still in its nascent stages,” Sawicki said.

Still, he added that some research indicates that it is possible to train LLMs on “synthetic” data (i.e., text that is itself generated by a generative AI model), which would allow firms to train models without needing access to large amounts of copyrighted text. If these initiatives pan out, then the industry might be able to develop reasonably well even if the Times wins. 

Nicholas Tsinoremas, vice provost for Research Computing and Data and founding director of the Frost Institute for Data Science and Computing (IDSC), welcomed the lawsuits for the publicity and media attention they generate and for spurring learning about the new technology.

“The awareness that it is happening is the best outcome. Whether it’s because of the court rulings or webinars about AI or conversations at universities, the attention brings us all into the conversation,” said Tsinoremas. “We cannot allow things to move ahead as usually happens with technologies and try to fix it later.

“These technologies are here to stay; they’re going to mature and progress. So, it’s better that we try to set limits and seek a good balance with safeguards and controls. I’m glad that it’s happening earlier than later so that we’re not waiting until three to four years down the road and realizing that some information has already been misused and led to ‘X,’” Tsinoremas added.

Other AI firms, including Meta and Stability AI, have already faced legal claims concerning the alleged misuse of copyrighted material.

According to Sawicki, these claims have had only preliminary rulings—and nothing that would impact the Times’ lawsuit, which is the biggest punch thrown against OpenAI to date.   

The most relevant precedents are Authors Guild v. Google, Inc. (the Google Books case) and Andy Warhol Foundation for Visual Arts, Inc. v. Goldsmith, Sawicki noted. In the Google Books case, Google scanned books under copyright protection to make them searchable by its users; Google showed users only portions of these books. The court held that the fair use doctrine protected this activity, in part because Google’s service did not provide a substitute for consumers who might instead buy the entire books.

The more recent Goldsmith case, meanwhile, may have changed how courts evaluate another factor in the fair use analysis (i.e., whether a work is “transformative”). Both of these factors, market substitution and transformative use, are implicated in The New York Times v. OpenAI litigation, according to Sawicki.

Tsinoremas called for a more expansive conversation about the new technology and especially for more trusted and “neutral” people to get involved.

“Everybody needs to be educated. We have the experts, but you can’t just leave it to the experts, you have to engage the community,” he said. “We especially need people who are honest brokers, experts who don’t have a stake in AI’s financial success.

“We need forums and the exchange of ideas—this will take time and needs to include a lot of stakeholders. When we create these technologies, we also have to understand that we have the responsibility to manage what their impact will be,” Tsinoremas added.