Copyright | A lawsuit targets the way AI is designed

At the end of June, Microsoft presented a new type of artificial intelligence (AI) technology capable of generating its own computer code.


Called Copilot, this tool was designed to speed up the work of professional programmers. As they typed on their laptops, it offered ready-made blocks of computer code that they could instantly add to their own code.

Many programmers loved the new tool or were at least intrigued by it. But Matthew Butterick, a Los Angeles programmer, designer, writer and lawyer, was not one of them. This month, he and a team of other attorneys filed a lawsuit seeking class-action status against Microsoft and the other high-profile companies that designed and deployed Copilot.

A first challenge

Like many cutting-edge AI technologies, Copilot developed its skills by analyzing large amounts of data. In this case, it relied on billions of lines of computer code published on the internet. Butterick, 52, likens this process to piracy, because the system does not acknowledge its debt to existing work. In his lawsuit, he claims that Microsoft and its partners violated the legal rights of millions of programmers who spent years writing the original code.

It is believed to be the first legal challenge to a design technique called “AI training,” a way of building artificial intelligence that is poised to reshape the tech industry. In recent years, many artists, writers, experts and privacy advocates have complained that companies train their AI systems using data that does not belong to them.

The lawsuit has echoes in the history of the technology industry. In the 1990s and 2000s, Microsoft fought the rise of free, open-source software, viewing it as an existential threat to the company’s future. As open-source software grew in importance, Microsoft embraced it and even acquired GitHub, a platform where open-source programmers build and store their code.

Nearly every new generation of technology — even online search engines — has faced similar legal challenges. Often, “there’s no law or case law that covers them,” said Bradley J. Hulbert, an intellectual property attorney who specializes in this increasingly important area of law.

The complaint is part of a wave of concern about AI. Artists, writers, composers and other creators are increasingly concerned that companies and researchers are using their work to build new technologies without their consent and without compensating them. Companies are using this work to build a wide variety of AI tools, including art generators, speech recognition systems like Siri and Alexa, and even driverless cars.

OpenAI, at the forefront

Copilot is based on technology developed by OpenAI, an artificial intelligence lab in San Francisco funded to the tune of US$1 billion by Microsoft. OpenAI is at the forefront of increasingly widespread efforts to train AI technologies from digital data.

After Microsoft and GitHub introduced Copilot, Nat Friedman, CEO of GitHub, tweeted that using existing code to train the system was “fair use” of the material under copyright law, an argument often made by the companies and researchers who have built these systems. But no court case has yet tested this argument.

“Microsoft and OpenAI’s ambitions go far beyond GitHub and Copilot,” Butterick said in an interview. “They want to train on any data, anywhere, for free, without consent, forever.”

In 2020, OpenAI unveiled a system called GPT-3. Researchers trained the system using massive amounts of digital text, including thousands of books, Wikipedia articles, chat logs and other data published on the internet.

By spotting patterns in all of these texts, the system learned to predict the next word in a sequence. When someone typed a few words into this “large language model,” it could complete the thought with entire paragraphs of text. In this way, the system could write its own Twitter posts, speeches, poems and news articles.
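To make the idea of “predicting the next word” concrete, here is a minimal, purely illustrative Python sketch. It is not how GPT-3 works internally (the real system uses a huge neural network trained on billions of words); it simply counts which word tends to follow which in a tiny sample sentence, then generates a continuation one word at a time, the same basic loop of next-word prediction described above. The names and the sample text are invented for illustration.

# Illustrative only: a toy next-word predictor, not OpenAI's actual model.
from collections import Counter, defaultdict

sample_text = "the system learned to predict the next word the system could write"
words = sample_text.split()

# For each word, count which words follow it in the sample text.
following = defaultdict(Counter)
for current_word, next_word in zip(words, words[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    """Return the most frequent follower of `word` seen in the sample text."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

# Starting from a one-word prompt, repeatedly predict the next word.
generated = ["the"]
for _ in range(5):
    next_word = predict_next(generated[-1])
    if next_word is None:
        break
    generated.append(next_word)

print(" ".join(generated))  # prints: "the system learned to predict the"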

To the surprise of the researchers who built the system, it could even write computer programs, having apparently learned from countless programs posted on the internet.

So OpenAI went a step further, training a new system, Codex, on a new collection of data that specifically included code. In a research paper detailing the technology, the lab said at least some of that code came from GitHub, a popular programming service owned and operated by Microsoft.

This new system became the underlying technology for Copilot, which Microsoft distributed to programmers through GitHub. After being tested by a relatively small number of programmers for about a year, Copilot was made available to all coders on GitHub in July.

Butterick identifies as an open-source programmer, part of the community of programmers who openly share their code with the world. Over the past 30 years, open-source software has fueled most of the technologies consumers use every day, including web browsers, smartphones and mobile apps.

Although open-source code is designed to be shared freely among coders and companies, this sharing is governed by licenses designed to ensure that it is used in ways that benefit the wider community of programmers. Butterick believes that Copilot has violated these licenses and that, as it improves, it will make open-source coders obsolete.

Beginning of the legal process

After months of publicly complaining about the problem, he and a handful of other lawyers filed the suit. The complaint is still in its early stages, and the court has not yet granted it class-action status.

To the surprise of many legal experts, Butterick’s complaint does not accuse Microsoft, GitHub and OpenAI of copyright infringement. It takes a different approach, arguing that the companies violated GitHub’s terms of service and privacy policies, while also running afoul of a federal law that requires companies to display copyright information when they make use of material.

Butterick and another attorney behind the case, Joe Saveri, said the lawsuit could eventually address the copyright issue.

Asked about the lawsuit, a GitHub spokesperson declined to comment, then said in an emailed statement that the company has been “committed to responsible innovation with Copilot since the beginning, and will continue to evolve the product to better serve developers around the world.” Microsoft and OpenAI declined to comment on the lawsuit.

This article was originally published in the New York Times.

