A class-action lawsuit has been filed in a US federal court challenging the legality of GitHub Copilot and the related OpenAI Codex. The suit against GitHub, Microsoft, and OpenAI claims violation of open-source licenses and could have a wide impact in the world of artificial intelligence.
GitHub previewed Copilot, an OpenAI-powered coding assistant, in the summer of 2021 and announced its general availability last July. Powered by the artificial intelligence model OpenAI Codex, the service is a cloud-based tool to assist developers in writing new code by analyzing existing code and comments on GitHub.
The litigation was filed by Matthew Butterick, a programmer and lawyer, together with the Joseph Saveri Law Firm, which specializes in antitrust and class-action litigation. According to the plaintiffs, by training their AI systems on public repositories, the defendants violated the rights of many developers who posted code under open-source licenses that require attribution, including the MIT, GPL, and Apache licenses.
In a previous article, Butterick questions how the service was trained with machine learning on billions of lines of code written by human programmers and argues that the solution should not be a new open-source license:
Some have suggested creating an open-source license that forbids AI training. But this kind of usage-based restriction has never been part of the open-source ethos. (...) By the same token, it does not make sense to hold AI systems to a different standard than we would hold human users. Widespread open-source license violations should not be shrugged off as an unavoidable cost.
Alex Champandard, artificial intelligence expert and co-founder of creative.ai, assesses the case:
Reading through the GitHub CoPilot litigation submitted; although it was pulled off quickly — it's a solid piece of work! The defendants (...) are in a very bad position. The documents show how Codex and CoPilot act like databases; they have three different examples of JS code that is recited verbatim — with mistakes — from licensed sources. (...) The documents then proceed to cast doubt on the claim of FairUse, that even if it was applicable here, it wouldn't help circumvent (a) the breach of contract, (b) the privacy issues, and (c) the DMCA.
In a Twitter thread, Giuseppe Bertone, developer advocate at Swirlds Labs, disagrees:
Developers are liable for what they use: their brain, copy from Stack Overflow, AI tools, pen & paper, etc. GitHub Copilot is just a tool - a toy, currently - like many others. Sue developers that use copyrighted code incorrectly, regardless of why and how they did it.
The litigation is considered the first class-action case challenging the training and output of AI systems, and its impact might extend well beyond Copilot. Microsoft and GitHub are not the only companies working on ML-powered coding assistants, with AWS unveiling the preview of Amazon CodeWhisperer earlier this year.
According to the Authors Alliance, the lawsuit raises important questions about how researchers can use AI to train and produce outputs using datasets based on copyrighted materials. Jeremy Daly, author of the weekly serverless newsletter Off-by-none, comments:
Who would have thought that AI-generated code that learned from private repositories would result in a lawsuit alleging "software piracy on an unprecedented scale"?
Butterick created a separate website with some background information about the case. GitHub, Microsoft, and OpenAI have not yet commented on the lawsuit.