It is now proven that copilot returns code from codebases with non-permissive licenses [1].
I'm curious - what are the legal implications of this going forward? I've so many questions.
1. Will Microsoft ever face lawsuits for these license violations?
2. If so, who/how? Class-action?
3. Will copilot be forced to open-source in the future? Under which license? Some open source licenses are incompatible with others, but copilot uses code from probably every OSS license conceived.
4. If Microsoft faces no justice, will we start seeing more OSS license violations? Will Google start using AGPL-licensed code?
"jethrodaniel" does not appear to have the copyright to offer that license, but it's hard for Github to determine that in general, so I doubt they would be liable for the error.
Even if it's somehow available under an MIT license (which is questionable on the part of jethrodaniel), there's still infringement. MIT isn't public domain, it still has
> The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
Replicating it without complying with those terms is still infringement.
This. People are being willfully blind here, like cult members looking dead-eyed at their leader and chanting "This is great" as they drink the kool-aid.
And from Microsoft no less, once outcast for mass poisoning.
Actually, the legal system is evidence-based. Microsoft has evidence that the code they are producing is licensed under MIT as far as they can reasonably know. There's no definitive way to know who actually owns the original copyright. I could grant permission to use my repo, but maybe I got that code from someone else, who then got it from someone else, and so on and so forth. It's a similar situation with stolen goods: if you unknowingly purchase stolen goods you usually cannot be charged with theft as long as there aren't obvious signs that they're stolen, such as the goods being priced far below market value.
Microsoft has evidence that the code they are reproducing is MIT licensed, so are they intentionally violating that license or does this AI thing include the license and attribution in every snippet it generates?
Major aspects of copyright infringement are strict liability, like a lot of civil actions around damages. It doesn't matter if you thought it was OK, there's still a damaged party that needs compensation according to the law. At best you'll simply avoid the criminal and punitive penalties.
No, PornHub doesn't have liability in a lot of cases because of 17 § 512, but has still had to deal with liability in general, which is why they nuked some 80% of their library not backed by verified individuals a while back.
A huge part of 17 § 512 is the DMCA takedown process, mainly in 17 § 512(c)(3). Does Microsoft even have the ability to truly remove training data from the model? Or do they have to retrain on each DMCA takedown?
I personally don't want to have to upload proof of identity to GitHub and a signed document swearing that I own the copyright to all the code I upload to GitHub, or proof that I coded it. We need to be careful what we wish for.
> THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
If they had a reasonable basis for believing they had a license they're in the clear. "I didn't know" might not be enough but "I had good reasons to think otherwise" is.
I’m not a lawyer, but my understanding is that these are torts, so all you have to prove is that Microsoft has liability. I think this would be easy to prove due to the way neural networks work, since it’s just a way of performing a search.
Since it’s a tort, I don’t think you have to prove they should have known it would return copyrighted code; the fact that it does is enough to have liability.
IANAL. My understanding is that the general legal precedent in the US is that a) datamining text has no copyright implications (in the same way that reading a book has no copyright implications) and b) it is not a copyright violation to use a small amount of copyrighted material provided the context is sufficiently transformative. This might seem silly or unfair to you, but that is the current legal reality.
But even ignoring that, everybody uploading code to GitHub has given GitHub the right to analyze that code as per the GitHub ToS. This is the same mechanism by which you can't upload code to GitHub with a license that says "nobody is allowed to display this code on the internet" and then sue GitHub.
I can't imagine a scenario in which any lawyer would consider granting Github the right to "analyze" code anywhere close to granting Github the right to spit out that same code verbatim without your copyright notice (even if laundered by AI).
Here's Kate Downing, an IP lawyer specializing in software licensing:
> According to Downing, the answer depends to a certain extent on where that code is hosted. If it’s on GitHub, there very clearly would not be copyright infringement.
> “If you look at the GitHub Terms of Service, no matter what license you use, you give GitHub the right to host your code and to use your code to improve their products and features,” Downing says. “So with respect to code that’s already on GitHub, I think the answer to the question of copyright infringement is fairly straightforward.”
Downing cautions that Copilot output consisting of large chunks of code complete with comments is more questionable to use, but that for the most part it looks above board.
> The licence is broadly worded, and I'm confident that there is scope for argument, but if it turns out that Github does not require a licence for its activities then, in respect of the code hosted on Github, I suspect it could make a reasonable case that the mandatory licence grant in its terms covers this as against the uploader.
To me, regardless of whether it is technically legal, it certainly doesn’t feel right. Furthermore, contracts rely on people understanding what they are agreeing to, and I don’t think many developers would agree to letting their code be used outside the terms of the license they uploaded it under.
I am very surprised there hasn’t been a legal challenge to it.
I don’t think “I’m sorry, your honor, I didn’t understand what I was signing” has ever been a valid reason in a courtroom, just as “I’m sorry, I didn’t know I was committing a crime” is not a valid defense.
Courts interpret the intended and understood meaning of contracts and terms all the time. Research the term "meeting of the minds" and case law around it.
When the terms were written, it's exceedingly unlikely that they intended it or anyone understood it to be blanket permission to allow a trained AI to copy code for others and no user would have interpreted it that way. Microsoft/Github can't necessarily unilaterally increase the intended range without making it clear in the terms.
If it got to a court case, and both sides could afford it, it could be a lengthy one.
(This comment is not legal advice. I am not a lawyer.)
How does "[allowing] a trained AI to copy code" change the interpretation of the ToS?
By uploading your code, you give Github an exclusive license to use it to improve their services. Copilot is such a service. Just because it's an AI and it provides others code does not somehow invalidate the license you gave.
Again, research "meeting of the minds". It's a standard legal term directly relevant to all contracts and terms. Also, "transparency" is another important one.
Many online services have very wide terms around what they can do with your data, which most people who bother to read them interpret as being what is required for them to handle the service for you without breaking copyright law. In that context, being able to use and analyse your data to improve their services could be another catch-all that lets them do specific performance optimisation on their backend.
One party instead deciding they've got blanket permission to do whatever they like with your work, including selling it to others, may well not hold up in court.
Contracts aren't programs and one party tricking the other rarely works out in court - courts world-wide tend to rule against trickery and deception.
> “If you look at the GitHub Terms of Service, no matter what license you use, you give GitHub the right to host your code and to use your code to improve their products and features,” Downing says. “So with respect to code that’s already on GitHub, I think the answer to the question of copyright infringement is fairly straightforward.”
That's assuming that all code on GitHub is uploaded in good faith by the copyright owner, which is not always going to be the case.
Many repositories on Github were put there by people that do not own the copyright and never agreed to GitHub's Terms of Service.
Linux, for example, does not require copyright assignment. The original contributor of a change owns the copyright for that code and may have never used Github.
5. Even if it is illegal, is it actually bad?
No one can possibly sell code snippets, the transaction costs are many orders of magnitude greater than any reasonable price.
In my opinion, at least in this case the benefits massively outweigh the costs and the law should not apply here.
I really, REALLY like the idea of Copilot. I think it is a glance at what the future of AI can bring to improve programming. I understand where all the litigation and "uneasiness" is coming from, both from commercial and open-source projects.
I've not installed or used it for the same reason (don't want to use AGPL or GPLd code by accident, and don't want my closed source code to be used accidentally as well), but the thought of Copilot being "killed" due to litigation/copyright/licensing issues is sad.
For me, it's kind of like when MP3 first appeared: sharing music on Napster or downloading MP3s from Geocities was just amazing. The idea of having such things at your fingertips. Even though I understood the issue the authors had with the unpaid distribution of their music... still, the idea of "what could be..." made it amazing.
I guess Microsoft could be a bit forward thinking, and implement the "Spotify" model in code: Pay OpenSource developers (whoever owns the repo, or whoever made a commit?) a small amount whenever their code gets used through Copilot.
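To make the "Spotify" idea a bit more concrete, here is a minimal sketch of a pro-rata split. It assumes Copilot could attribute each accepted suggestion to a source repository, which is purely an assumption on my part; `split_royalties`, the repo names, and the pool size are all hypothetical.

```python
from collections import Counter


def split_royalties(pool_cents: int, usage_events: list[str]) -> dict[str, int]:
    """Split a fixed payout pool pro rata by how often each repo was used.

    usage_events holds one (hypothetical) repo identifier per accepted
    suggestion. Amounts are in whole cents and rounded down, so the total
    paid out never exceeds the pool.
    """
    counts = Counter(usage_events)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {repo: pool_cents * n // total for repo, n in counts.items()}


# Hypothetical month: four accepted suggestions, a pool of 1000 cents.
events = ["alice/parser", "alice/parser", "bob/utils", "carol/netlib"]
payouts = split_royalties(1000, events)
```

This only covers the arithmetic. It says nothing about who counts as the rights holder (repo owner or individual committers), and it does not address whether the underlying license allows the use in the first place.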
I'm super excited about what "Copilot"-related services will look like in 10 years. And I really, really hope that the technology/idea doesn't get killed by litigation.
Microsoft could have trained this on their own code and there would be no issue. The problem is that instead of doing that, they knew full well the approach would reproduce the code, and they decided they would rather breach the GPL than expose their own code. I bet Microsoft has more than enough lines to train an AI; there was a clear choice to breach other people's licenses in preference.
Huh... These comments have given me an idea: MS needs to be forced to train a model to compensate (pay) code authors and codebases based on snippet suggestions given by their tool: the Spotify model replacing Napster!
Some people won’t let you use their copyrighted work no matter how much you pay, that’s reasonable.
By all means allow repos to opt in, although if it’s licensed under something like the GPL there’s no way to convert it to non-GPL without permission from every contributor. I for one am not interested in Microsoft or anyone else paying me to close my code.
Allowing people to pay $xxx to copy my copyrighted work without my agreement is simple piracy.
Either they get an international agreement to drop copyright as a concept, or they obey the law.
Of course it's bad. No one who put up their work as open source wants some huge company taking it and selling it to get even more competitive advantage and influence in the world. And that's without mentioning the people who put that into their license pretty much explicitly. Taking GPL code and getting away with it is a failure of our justice system, and that can't be made right by throwing pennies at developers.
Is there any leaked Microsoft code on GitHub? Someone should check if Copilot regurgitates that as well, then see how Microsoft reacts when someone slaps an AGPL license on that…
It seems like Microsoft could be in the clear on the basis of it being essentially "search". But it also seems like anyone who uses it could be running a high risk of getting infected with copyright-violating code.
My question is, if it isn't a copyright infringement issue to use copilot in its current form right now, why not just claim copilot was used whenever accused of copyright infringement henceforth?
> why not just claim copilot was used whenever accused of copyright infringement henceforth?
Without speaking to the particulars of copilot, this situation where laws seem toothless because of the ease of plausible deniability is actually fairly common. And in many such cases, the law is not as toothless as it seems, because
1. Getting multiple people to stick to a script under oath is difficult and dangerous.
2. Criminals frequently send each other messages like
A: "lol I just crimed, hope nobody figures it out."
B: "lol just say you used copilot".
A: "lolol yeah fuck the law"
Obviously this only gets the worst criminals, but there seems to be lots and lots of them.
Microsoft is trying to legally position Copilot like StackOverflow. It is possible to post copyright-infringing code on SO even though their TOS requires a CC BY-SA 4.0 grant to the company and its users.
> It is now proven that copilot returns code from codebases with non-permissive licenses [1].
That same Quake example from last year is repeated every single time.
Aside from the fact that GitHub has since added a protection for this, the fact that this one example gets repeated time and time again instead of a list of examples leads me to believe this is not (and was not) a common occurrence.
3) Not likely. Worst case a judgement will go against them, they'll effectively pay a fine and then they'll retrain it on a more restricted set of source code.
4) OSS has a pretty tragic history re: enforcement. It wins nearly every skirmish but has no interest in the war so from a big picture standpoint, it loses due to apathy.
You don't think a mountain of MSFT lawyers in every state, including partner law firms around the world, has thought about this? Do you practice law or are you speculating based on emotions?
No, SCO was founded in 2002, from Caldera Software, which was a Linux distributor [0]. How could Microsoft in the 1980s own a company that wasn’t founded until 2002?
That doesn't imply ownership, and the article [20] that you pointed out doesn't make a specific claim. All good, but the point I'm trying to make is that MSFT never fully owned or operated SCO at any level.
You aren't really saying anything at all. SCO would never have existed without Microsoft and Microsoft had a very significant stake in their business and gave it direction.
[1] https://news.ycombinator.com/item?id=27710287 | Copilot regurgitating Quake code