Apparently chardet got Claude to rewrite the entire codebase from LGPL to MIT?
https://github.com/chardet/chardet/releases/tag/7.0.0
That is one way to launder GPL code I guess?
@scy
US courts are leaning towards LLM-generated code being fundamentally not copyrightable.
This is a different problem to the moral issues I have with this.
@Foxboron But does "is not copyrightable" mean that "is not a license violation of its input data"? I highly doubt it.
@scy
A license violation usually implies that there is a copyright violation to begin with.
@Foxboron Yeah but that's what I mean: Just because the end result is not copyrightable, does that automatically mean that it can't be a copyright violation?
Like, changing the format or medium of something is not a copyrightable work.
So, by that logic, if I take a copyrighted MP3 and convert it to AAC and publish that, my AAC is not copyrightable, but it's not a copyright violation to take it and publish it?
That's what I mean.
@scy
I'm not a lawyer so I'm not going to try and debate what is and isn't a copyright violation.
@Foxboron Oh ffs: https://github.com/psf/requests/issues/7223#issuecomment-3993094073
(requests is planning to switch to chardet 7+ as its only character detection library again now that the licensing is MIT.)
@Foxboron, not to mention it doesn't pass its own test suite.
Supreme Court has already dismissed such cases.
So we are getting a precedent in US law. Yet to be settled in any high court in the EU though.
Sure, but in this case we are not really looking at, nor discussing, an LLM spitting out something verbatim from another project.
@Foxboron Considering that nobody can hold a copyright on AI-generated stuff, and therefore also can't release it under a different license, doesn't that mean this rewrite is basically public domain?
@dekkia
Public domain is not really a thing in most of the world. So "yes", for the US. For the EU it's more complicated.
@Foxboron somebody should do this with the leaked Windows source code
@wronglang
That would probably not be litigated under copyright law.
@Foxboron @joshbressers @scy Open-source projects that have sought to be compatible with proprietary software, e.g. Samba trying to be compatible with Windows SMB, etc., have (if I'm not misremembering) taken a "clean room" approach and outright stated they do not want any code from any developer who had even looked at the MSFT code for fear of being accused of infringement.
The copyrightability of LLM output is not relevant here - the only question is whether a court would consider the original license infringed upon in the creation of the output.
As I understand it, though, this is a reimplementation of a codebase by the same contributors -- Dan Blanchard seems to be the primary maintainer before and after the rewrite, so ISTM he'd be able to relicense the project regardless of whether it was passed through an LLM first.
It will be interesting when this happens because a company or person decides "I don't like copyleft, so I'll just run this through the LLM wash until I get a functional copy". But this doesn't seem to be that.
@jzb @Foxboron @joshbressers Maintainers can't just change the license without asking each and every contributor for their approval. In open source projects, contributors usually keep their individual copyright, except when the project has them sign additional terms, or assign copyright to the project or something.
@scy @Foxboron @joshbressers I mean, they _can_ if they rewrite the code in question.
So here - *if* one of the LGPL code contributors is offended by the license change they could look at the new codebase and see if the new code resembles their contribution. Then they'd have to challenge it.
But projects have been relicensed without seeking permission from every contributor and/or by removing contributions if they cannot get approval. I'm not aware of any cases where a contributor has successfully challenged such - but there's always a first time.
Depends.
If you have a permissively licensed project, you can change the source to GPL by just using a poison pill approach.
This is what Forgejo did, for example.
https://forgejo.org/2024-08-gpl/
This works as the MIT license terms are met.
The other way would not work.
@Foxboron @jzb @joshbressers You're right, I should've worded that differently.
They can change the license, if the current license allows it.
Still, everyone keeps their individual copyright.
@Foxboron @scy This means that anything "new" (i.e. nothing) the "AI" brought to the work is not a creative work that you can hold copyright to just because you were the person prompting/using the "AI".
It does NOT mean that the copyright on whatever the AI plagiarized is void. But that's how the industry will try to spin these rulings. We need to point out this distinction and fight their attempts to mislead in order to seize and enclose our work.
@Foxboron @joshbressers @scy verbatim isn’t the question here, the question is infringement. is the output here substantially derivative of previous versions of chardet to the point that it could be considered infringing? US copyright precedent is a muddled mess and I think this could implicate at least one unresolved circuit split. I don’t know what the answer will be but I know I wouldn’t want to be standing in the blast radius of that decision
@Foxboron It looks like this was the PR?
https://github.com/chardet/chardet/pull/322
Even aside from the ethical and moral issues with LLMs, it doesn't seem optimal that a 15k line PR affecting almost a million dependent repos (if GitHub's count is to be believed) was up for three days before getting merged in.
@aud It's at least not systems code, so there's not a lot of potential for buffer overflow and other memory unsafety exploits, but yeah. No. chardet is not a small surface area.
@xgranade There’s just no way that’s a good idea. I’m pretty sure a human who tried to push a 15K rewrite into most libraries would be yelled at forever and the PR rejected, or asked to be broken into smaller PRs, because it’s just such a large change in one go and no one can possibly fit that entire thing into their head.
It doesn’t magically become a good idea just because claude shat it out.
@aud I've made 15k line monolithic PRs before, there's sometimes good reason to do so. But yeah, it's not a great way to review things. It's just way too big to reason about at once.
@xgranade
They have been the upstream maintainer for years, so I don't see any huge issue with that.
I would have done the same probably?
@Foxboron Posted an unkind reply and deleted, sorry. I'm getting frustrated with the whole AI thing today, and I'm not being my best self. I should probably just step offline for a bit.
This is just so... frustrating.
@xgranade
Yes.
But let's not clutch pearls over how an understaffed FOSS project decides to merge their work.
Seems like the original author saw this as well.
https://github.com/chardet/chardet/issues/327
Please do not brigade the project.
I'm just making the claim that we can't fault people for how they work on their pull requests.
For the point you are raising see this issue from 13 years ago.
@Foxboron it's always stealing, just sometimes it looks less like it than this.