social.kernel.org

Morten Linderud

Foxboron@chaos.social

2 days ago

Apparently chardet got Claude to rewrite the entire codebase from LGPL to MIT?

https://github.com/chardet/chardet/releases/tag/7.0.0

That is one way to launder GPL code I guess?

scy

scy@chaos.social

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron lol right, because Claude certainly wasn't trained on GPL code

Morten Linderud

Foxboron@chaos.social

2 days ago

Reply to @scy@chaos.social

@scy
US courts are leaning towards the view that LLM-generated code is fundamentally not copyrightable.

This is a different problem to the moral issues I have with this.

scy

scy@chaos.social

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron But does "is not copyrightable" mean that "is not a license violation of its input data"? I highly doubt it.

Morten Linderud

Foxboron@chaos.social

2 days ago

Reply to @scy@chaos.social

@scy
A license violation usually implies that there is a copyright violation to begin with.

scy

scy@chaos.social

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron Yeah but that's what I mean: Just because the end result is not copyrightable, does that automatically mean that it can't be a copyright violation?

Like, changing the format or medium of something does not create a copyrightable work.

So, by that logic, if I take a copyrighted MP3 and convert it to AAC and publish that, my AAC is not copyrightable, but it's not a copyright violation to take it and publish it?

That's what I mean.

Morten Linderud

Foxboron@chaos.social

2 days ago

Reply to @scy@chaos.social

@scy
I'm not a lawyer so I'm not going to try and debate what is and isn't a copyright violation.

Bubu

Bubu@chaos.social

2 days ago

Reply to @Foxboron@chaos.social

Edited 2 days ago

@Foxboron Oh ffs: https://github.com/psf/requests/issues/7223#issuecomment-3993094073

(requests is planning to switch to chardet 7+ as its only character detection library again, now that the licensing is MIT.)

Jesus Michał "Le Sigh" 🏔 (he)

mgorny@social.treehouse.systems

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron, not to mention it doesn't pass its own test suite.

Josh Bressers

joshbressers@infosec.exchange

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron @scy

This will have to go through a court case to settle it probably

But if I look at your source code, then I reproduce some of your source exactly, that's a problem

Morten Linderud

Foxboron@chaos.social

2 days ago

Reply to @mgorny@social.treehouse.systems

@mgorny
Amazing.

Morten Linderud

Foxboron@chaos.social

2 days ago

Reply to @joshbressers@infosec.exchange

@joshbressers @scy

Supreme Court has already dismissed such cases.

https://www.cnbc.com/2026/03/02/us-supreme-court-declines-to-hear-dispute-over-copyrights-for-ai-generated-material.html

So we are getting a precedent in US law. Yet to be settled in any high court in the EU though.

muelli

muelli@chaos.social

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron @scy you could still have an opinion. Legal matters are not a subject to be discussed exclusively by legal professionals, as they affect non-professionals, too.

Josh Bressers

joshbressers@infosec.exchange

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron @scy

I suspect this is different. That case was someone trying to copyright something the AI spit out, not asking whether AI can violate a copyright by copying something almost verbatim

Of course I haven't looked to see if the chardet code is mostly a copy, if it's not, then 🤷

Morten Linderud

Foxboron@chaos.social

2 days ago

Reply to @muelli@chaos.social

@muelli @scy
Sure.

But I'm not going to spend time on a strawman disguised as a logic puzzle. That isn't how laws work nor how they are formed.

Morten Linderud

Foxboron@chaos.social

2 days ago

Reply to @joshbressers@infosec.exchange

@joshbressers @scy

Sure, but in this case we are not really looking at, nor discussing, an LLM spitting out something verbatim from another project.

Dekkia

dekkia@dekkia.com

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron Considering that nobody can hold a copyright on AI-generated stuff, and therefore also can't release it under a different license, doesn't that mean this rewrite is basically public domain?

Morten Linderud

Foxboron@chaos.social

2 days ago

Reply to @dekkia@dekkia.com

@dekkia
Public domain is not really a thing in most of the world. So "yes", for US. For EU it's more complicated.

Krzysztof Sakrejda

wronglang@bayes.club

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron somebody should do this with the leaked Windows source code

Morten Linderud

Foxboron@chaos.social

2 days ago

Reply to @wronglang@bayes.club

@wronglang
That would probably not be litigated under copyright law.

Joe Brockmeier

jzb@hachyderm.io

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron @joshbressers @scy Open-source projects that have sought to be compatible with proprietary software, e.g. Samba trying to be compatible with Windows SMB, etc., have (if I'm not misremembering) taken a "clean room" approach and outright stated they do not want any code from any developer who had even looked at the MSFT code for fear of being accused of infringement.

The copyrightability of LLM output is not relevant here - the only question is whether a court would consider the original license infringed upon in the creation of the output.

As I understand it, though, this is a reimplementation of a codebase by the same contributors -- Dan Blanchard seems to be the primary maintainer before and after the rewrite, so ISTM he'd be able to relicense the project regardless of whether it was passed through an LLM first.

It will be interesting when this happens because a company or person decides "I don't like copyleft, so I'll just run this through the LLM wash until I get a functional copy". But this doesn't seem to be that.

scy

scy@chaos.social

2 days ago

Reply to @jzb@hachyderm.io

@jzb @Foxboron @joshbressers Maintainers can't just change the license without asking each and every contributor for their approval. In open source projects, contributors usually keep their individual copyright, except when the project has them sign additional terms, or assign copyright to the project or something.

Joe Brockmeier

jzb@hachyderm.io

2 days ago

Reply to @scy@chaos.social

@scy @Foxboron @joshbressers I mean, they _can_ if they rewrite the code in question.

So here - *if* one of the LGPL code contributors is offended by the license change they could look at the new codebase and see if the new code resembles their contribution. Then they'd have to challenge it.

But projects have been relicensed without seeking permission from every contributor and/or by removing contributions if they cannot get approval. I'm not aware of any cases where a contributor has successfully challenged such - but there's always a first time.

Morten Linderud

Foxboron@chaos.social

2 days ago

Reply to @scy@chaos.social

@scy @jzb @joshbressers

Depends.

If you have a permissively licensed project, you can change the source to GPL by just using a poison pill approach.

This is what Forgejo did as an example.

https://forgejo.org/2024-08-gpl/

This works as the MIT license terms are met.

The other way would not work.

scy

scy@chaos.social

2 days ago

Reply to @Foxboron@chaos.social

Edited 2 days ago

@Foxboron @jzb @joshbressers You're right, I should've worded that differently.

They can change the license, if the current license allows it.

Still, everyone keeps their individual copyright.

Cassandrich

dalias@hachyderm.io

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron @scy This means that anything "new" (i.e. nothing) the "AI" brought to the work is not a creative work that you can hold copyright to just because you were the person prompting/using the "AI".

It does NOT mean that the copyright on whatever the AI plagiarized is void. But that's how the industry will try to spin these rulings. We need to point out this distinction and fight their attempts to mislead in order to seize and enclose our work.

Skyr

skyr@chaos.social

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron @scy chances are high that LLM bros suspect it is, that's why they are cutting deals with Big Music. Unfortunately, there's no global-encompassing multi-billion dollar corporation protecting open-source...

kat

zkat@toot.cat

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron that's... not copyrightable, therefore not licensable?

Glyph

glyph@mastodon.social

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron @joshbressers @scy verbatim isn’t the question here, the question is infringement. is the output here substantially derivative of previous versions of chardet to the point that it could be considered infringing? US copyright precedent is a muddled mess and I think this could implicate at least one unresolved circuit split. I don’t know what the answer will be but I know I wouldn’t want to be standing in the blast radius of that decision

Cassandra is only carbon now

xgranade@wandering.shop

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron It looks like this was the PR?

https://github.com/chardet/chardet/pull/322

Even aside from the ethical and moral issues with LLMs, it doesn't seem optimal that a 15k line PR affecting almost a million dependent repos (if GitHub's count is to be believed) was up for only three days before getting merged in.

Asta [AMP]

aud@fire.asta.lgbt

2 days ago

Reply to @xgranade@wandering.shop

@xgranade @Foxboron 3 days, little review, 15K lines, on a library that seems to perform operations on text input?

yeah, that’s a big fucking “no” from a security standpoint. How many goddamn security issues are waiting in the wings here?

Cassandra is only carbon now

xgranade@wandering.shop

2 days ago

Reply to @aud@fire.asta.lgbt

@aud It's at least not systems code, so there's not a lot of potential for buffer overflow and other memory unsafety exploits, but yeah. No. chardet is not a small surface area.

Asta [AMP]

aud@fire.asta.lgbt

2 days ago

Reply to @xgranade@wandering.shop

@xgranade There’s just no way that’s a good idea. I’m pretty sure a human who tried to push a 15K rewrite into most libraries would be yelled at forever and the PR rejected, or asked to be broken into smaller PRs, because it’s just such a large change in one go and no one can possibly fit that entire thing into their head.

It doesn’t magically become a good idea just because claude shat it out.

Cassandra is only carbon now

xgranade@wandering.shop

2 days ago

Reply to @aud@fire.asta.lgbt

@aud I've made 15k line monolithic PRs before, there's sometimes good reason to do so. But yeah, it's not a great way to review things. It's just way too big to reason about at once.

Morten Linderud

Foxboron@chaos.social

2 days ago

Reply to @xgranade@wandering.shop

@xgranade
They have been the upstream maintainer for years, so I don't see any huge issue with that.

I would have done the same probably?

Cassandra is only carbon now

xgranade@wandering.shop

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron Posted an unkind reply and deleted, sorry. I'm getting frustrated with the whole AI thing today, and I'm not being my best self. I should probably just step offline for a bit.

This is just so... frustrating.

Morten Linderud

Foxboron@chaos.social

2 days ago

Reply to @xgranade@wandering.shop

@xgranade
Yes.

But let's not clutch pearls over how an understaffed FOSS project decides to merge their work.

Morten Linderud

Foxboron@chaos.social

2 days ago

Reply to @Foxboron@chaos.social

Seems like the original author saw this as well.

https://github.com/chardet/chardet/issues/327

Please do not brigade the project.

David Gerard

davidgerard@circumstances.run

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron @xgranade If it was solely his work, he could just change the license. He didn't do that - he felt he had to AI-wash it. That suggests there is in fact other people's work in there, and that he's trying to AI-wash away their copyright.

Morten Linderud

Foxboron@chaos.social

2 days ago

Reply to @davidgerard@circumstances.run

@davidgerard @xgranade

I'm just making the claim that we can't fault people for how they work on their pull requests.

For the point you are raising see this issue from 13 years ago.

https://github.com/chardet/chardet/issues/36

David Gerard

davidgerard@circumstances.run

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron @xgranade right, confirming the issue I raised.

Farce Majeure

vathpela@infosec.exchange

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron it's always stealing, just sometimes it looks less like it than this.

Jarkko Sakkinen

jarkko

2 days ago

Reply to @Foxboron@chaos.social

@Foxboron https://social.kernel.org/notice/B3vwLO7kTOKLjpxCee

Jarkko Sakkinen

jarkko

2 days ago

Reply to @jarkko

@Foxboron Like this https://github.com/chardet/chardet/issues/328

jan Ki | 奇

ki@chaos.social

yesterday

Reply to @Foxboron@chaos.social

@Foxboron
this is fucking disgusting

Morten Linderud

Foxboron@chaos.social

yesterday

Reply to

@jack
Excuse me?

google stapler quartermaster

jack@status.sexyferret.science

yesterday

Reply to @Foxboron@chaos.social

@Foxboron you posted the exact thread that's "heated" and then said "but don't do anything with this". You're either being intentionally disingenuous or you're just not very bright. Which is it?