AI in the dock for copyright violations

China Daily | Updated: 2024-01-17 08:15

FILE PHOTO: OpenAI logo is seen in this illustration taken, Feb 3, 2023. [Photo/Agencies]

The New York Times filed a lawsuit against OpenAI and Microsoft in December, alleging that the companies illegally used millions of its articles to train their large language models. To support its case, The New York Times provided more than 100 examples in which the output from ChatGPT closely resembled its articles.

In response, OpenAI issued a statement on Jan 8 saying that using publicly available internet materials to train AI models is reasonable, and that it provides an option to opt out. It suggested that the "copying" and regurgitation of original text demonstrated by The New York Times in its lawsuit resulted from the newspaper deliberately manipulating its prompts, including by feeding in lengthy excerpts of articles, in order to make the models spit out entire passages of specific articles. It also said, however, that such regurgitation "is a rare bug that we are working to drive to zero".

In a deeper sense, their disagreement is more about the ethics of large language models (LLMs), which generate humanlike responses to natural language queries based on massive data sets. AI companies such as OpenAI argue that training LLMs is fundamentally different from copying. They say the learning and training process for AI models should be understood by analogy with how people develop: by learning from public information, building up a store of knowledge, and improving through interaction with those they serve.

Media organizations such as The New York Times, on the other hand, see the technology as a competitor and a threat, and believe that the LLMs are plagiaristic and violate media ethics.

Whatever its outcome, the lawsuit will not only set a precedent on whether companies developing LLMs have to pay high copyright fees for their data sources, but also decide which understanding of LLMs will prevail in law.
