Apparently chardet got Claude to rewrite the entire codebase from LGPL to MIT?

https://github.com/chardet/chardet/releases/tag/7.0.0

That is one way to launder GPL code I guess?

@Foxboron lol right, because Claude certainly wasn't trained on GPL code

@scy
US courts are leaning towards the view that LLM-generated code is fundamentally not copyrightable.

This is a different problem to the moral issues I have with this.

@Foxboron But does "is not copyrightable" mean that "is not a license violation of its input data"? I highly doubt it.

@scy
A license violation usually implies that there is a copyright violation to begin with.

@Foxboron Yeah but that's what I mean: Just because the end result is not copyrightable, does that automatically mean that it can't be a copyright violation?

Like, changing the format or medium of something is not a copyrightable work.

So, by that logic, if I take a copyrighted MP3 and convert it to AAC and publish that, my AAC is not copyrightable, but it's not a copyright violation to take it and publish it?

That's what I mean.

@scy
I'm not a lawyer so I'm not going to try and debate what is and isn't a copyright violation.

@Foxboron Oh ffs: https://github.com/psf/requests/issues/7223#issuecomment-3993094073

(requests is planning to switch to chardet 7+ as its only character detection library again now that the licensing is MIT.)

@Foxboron, not to mention it doesn't pass its own test suite.

@Foxboron @scy

This will have to go through a court case to settle it probably

But if I look at your source code, then I reproduce some of your source exactly, that's a problem

@joshbressers @scy

Supreme Court has already dismissed such cases.

https://www.cnbc.com/2026/03/02/us-supreme-court-declines-to-hear-dispute-over-copyrights-for-ai-generated-material.html

So we are getting a precedent in US law. Yet to be settled in any high court in the EU though.

@Foxboron @scy you could still have an opinion. Legal matters are not a subject to be discussed exclusively by legal professionals, as they affect non-professionals too.

@Foxboron @scy

I suspect this is different. That case was someone trying to copyright something the AI spit out, not asking whether AI can violate a copyright by copying something almost verbatim.

Of course I haven't looked to see if the chardet code is mostly a copy, if it's not, then 🤷

@muelli @scy
Sure.

But I'm not going to spend time on a strawman disguised as a logic puzzle. That isn't how laws work nor how they are formed.

@joshbressers @scy

Sure, but in this case we are not really looking at, nor discussing, an LLM spitting out something verbatim from another project.

@Foxboron Considering that nobody can hold a copyright on AI-generated stuff, and therefore also can't release it under a different license, doesn't that mean this rewrite is basically public domain?

@dekkia
Public domain is not really a thing in most of the world. So "yes", for US. For EU it's more complicated.

@Foxboron somebody should do this with the leaked Windows source code

@wronglang
That would probably not be litigated under copyright law.

@Foxboron @joshbressers @scy Open-source projects that have sought to be compatible with proprietary software, e.g. Samba trying to be compatible with Windows SMB, etc., have (if I'm not misremembering) taken a "clean room" approach and outright stated they do not want any code from any developer who had even looked at the MSFT code for fear of being accused of infringement.

The copyrightability of LLM output is not relevant here - the only question is whether a court would consider the original license infringed upon in the creation of the output.

As I understand it, though, this is a reimplementation of a codebase by the same contributors -- Dan Blanchard seems to be the primary maintainer before and after the rewrite, so ISTM he'd be able to relicense the project regardless of whether it was passed through an LLM first.

It will be interesting when this happens because a company or person decides "I don't like copyleft, so I'll just run this through the LLM wash until I get a functional copy". But this doesn't seem to be that.

@jzb @Foxboron @joshbressers Maintainers can't just change the license without asking each and every contributor for their approval. In open source projects, contributors usually keep their individual copyright, except when the project has them sign additional terms, or assign copyright to the project or something.

@scy @Foxboron @joshbressers I mean, they _can_ if they rewrite the code in question.

So here - *if* one of the LGPL code contributors is offended by the license change they could look at the new codebase and see if the new code resembles their contribution. Then they'd have to challenge it.

But projects have been relicensed without seeking permission from every contributor and/or by removing contributions if they cannot get approval. I'm not aware of any cases where a contributor has successfully challenged such - but there's always a first time.

@scy @jzb @joshbressers

Depends.

If you have a permissively licensed project, you can change the source to GPL by just using a poison pill approach.

This is what Forgejo did as an example.

https://forgejo.org/2024-08-gpl/

This works as the MIT license terms are met.

The other way would not work.

@Foxboron @jzb @joshbressers You're right, I should've worded that differently.

They can change the license, if the current license allows it.

Still, everyone keeps their individual copyright.

@Foxboron @scy This means that anything "new" (i.e. nothing) the "AI" brought to the work is not a creative work that you can hold copyright to just because you were the person prompting/using the "AI".

It does NOT mean that the copyright on whatever the AI plagiarized is void. But that's how the industry will try to spin these rulings. We need to point out this distinction and fight their attempts to mislead in order to seize and enclose our work.

@Foxboron @scy chances are high that LLM bros suspect it is, that's why they are cutting deals with Big Music. Unfortunately, there's no global-encompassing multi-billion dollar corporation protecting open-source...

@Foxboron that's... not copyrightable, therefore not licensable?

@Foxboron @joshbressers @scy verbatim isn’t the question here, the question is infringement. is the output here substantially derivative of previous versions of chardet to the point that it could be considered infringing? US copyright precedent is a muddled mess and I think this could implicate at least one unresolved circuit split. I don’t know what the answer will be but I know I wouldn’t want to be standing in the blast radius of that decision

@Foxboron It looks like this was the PR?

https://github.com/chardet/chardet/pull/322

Even aside from the ethical and moral issues with LLMs, it doesn't seem optimal that a 15k line PR affecting almost a million dependent repos (if GitHub's count is to be believed) was up for three days before getting merged in.

@xgranade @Foxboron 3 days, little review, 15K lines, on a library that seems to perform operations on text input?

yeah, that’s a big fucking “no” from a security standpoint. How many goddamn security issues are waiting in the wings here?

@aud It's at least not systems code, so there's not a lot of potential for buffer overflow and other memory unsafety exploits, but yeah. No. chardet is not a small surface area.

@xgranade There’s just no way that’s a good idea. I’m pretty sure a human who tried to push a 15K rewrite into most libraries would be yelled at forever and the PR rejected, or asked to be broken into smaller PRs, because it’s just such a large change in one go and no one can possibly fit that entire thing into their head.

It doesn’t magically become a good idea just because claude shat it out.

@aud I've made 15k line monolithic PRs before, there's sometimes good reason to do so. But yeah, it's not a great way to review things. It's just way too big to reason about at once.

@xgranade
They have been the upstream maintainer for years, so I don't see any huge issue with that.

I would have done the same probably?

@Foxboron Posted an unkind reply and deleted, sorry. I'm getting frustrated with the whole AI thing today, and I'm not being my best self. I should probably just step offline for a bit.

This is just so... frustrating.

@xgranade
Yes.

But let's not clutch pearls over how an understaffed FOSS project decides to merge their work.

Seems like the original author saw this as well.

https://github.com/chardet/chardet/issues/327

Please do not brigade the project.

@Foxboron @xgranade If it was solely his work, he could just change the license. He didn't do that - he felt he had to AI-wash it. That suggests there is in fact other people's work in there, and that he's trying to AI-wash away their copyright.

@davidgerard @xgranade

I'm just making the claim that we can't fault people for how they work on their pull requests.

For the point you are raising see this issue from 13 years ago.

https://github.com/chardet/chardet/issues/36

@Foxboron it's always stealing, just sometimes it looks less like it than this.

@Foxboron you posted the exact thread that's "heated" and then said "but dont do anything with this" you're either being intentionally disingenuous or you're just not very bright. Which is it?

@jack

I wrote "please do not brigade the thread".

Do you want to try again?

@Foxboron the real je ne sais quoi of it is the denigrating of the cancer patient.... and that people don't know.....

@Foxboron so the latter. glad to clarify. don't like to assume malice where it's just incapacity.