social.kernel.org

Conversation

Josh Triplett

josh@joshtriplett.org

Reply to

The board is incorrect. The OSI has corrupted the term Open Source by allowing those who want to propagate AIs that launder Open Source and proprietary code/data alike to do so under the banner of "Open Source". In particular, the so-called "Open Source AI" definition permits calling an AI "Open Source" even if it was trained on Open Source code or data and the license of its weights and outputs completely ignores the license of its training data.

This is an attempt to normalize the unacceptable practice of letting AI launder away the licenses of its training data, and to continue the practice of establishing "facts on the ground" that point toward being able to keep ignoring the licenses of training data. The flagrant behavior of current AI training should not be allowed to continue, and should not be treated as a valid negotiating position from which to "compromise". Do not normalize the violation of Open Source licenses.

Michel Lind

michelin@hachyderm.io

Reply to @josh@joshtriplett.org

@josh @osi I am increasingly refusing to use the term "open source" over this. It was a bad name from the beginning (since it means something else outside tech), and it is increasingly being watered down until it is almost meaningless.

Josh Triplett

josh@joshtriplett.org

Reply to @michelin@hachyderm.io

On the contrary, I think the term "Open Source" still has a lot of value, if we can counter this dilution. The problem is that this proposal undermines all the efforts people make when trying to fight the terminology battle against companies rugpulling Open Source projects into a no-longer-open license and describing it as "open".

Jarkko Sakkinen

jarkko

Reply to @josh@joshtriplett.org

@josh @osi I had the same thoughts a week ago, but I tend to be a bit spiky from time to time, so I thought maybe I was being a bit too judgemental (which happens too often). Good to hear others reflect my first views on this. Thanks for writing this.

https://social.kernel.org/notice/AnPuVswNBKAitj9wxc

Jarkko Sakkinen

jarkko

Reply to @jarkko

@josh @osi To me this initially looked like some sort of magic spell that a corp can recite, and then they just continue doing whatever shit they were doing before, because they've been stamped "we're not doing evil stuff" or something.

Cassandrich

dalias@hachyderm.io

Reply to

@osi It does not meet the definition whatsoever and you know it. It promotes violation of every single actually-Open-Source license out there and you know it. But you wanted a piece of the latest scam pie and the people who actually make Open Source (a term I'm likely to stop using now) don't matter in the slightest to you.

Josh Triplett

josh@joshtriplett.org

Reply to @josh@joshtriplett.org

An actionable change that would fix this problem:

"In order to be Open Source, an AI must respect the licenses of all of its training data. For example, if trained on licensed works that include an attribution requirement, the AI must provide the required attribution for each such licensed work; if trained on copyleft works, the AI must be licensed under a compatible license. This requirement applies regardless of legal requirements in any particular jurisdiction."

Note that this *doesn't* say "must be trained entirely on Open Source data". We can handle proprietary training data as a *separate* problem to be solved over time, akin to Debian's "contrib" section. In practice, what this requirement would mean for most AIs would be 1) a long attribution page, but note that the current definition *already* requires documenting the training data, 2) not training on copyleft training data, and 3) not training on unlicensed training data.

INIT_6

INIT6@infosec.exchange

Reply to @josh@joshtriplett.org

I see where you’re coming from, and I appreciate the clarity and depth of your concerns around licensing and Open Source definitions in AI training. I’d love to get your perspective on something I’ve been thinking about, which is how similar AI training data use could be to how humans learn.

When people read or are exposed to various works, even proprietary or confidential information, they incorporate this knowledge broadly rather than attributing specific ideas. In a way, we might even retain key insights from trade secrets or copyrighted material without an explicit obligation to give attribution every time a related idea is expressed.

If AI is working similarly—relying on approximations of knowledge rather than precise lookups—then, arguably, the output isn't a reproduction but more of a unique synthesis or restatement of learned concepts.

Does this approach seem too different from human learning to be applicable to AI? Or do you think AI, by the nature of its structure, necessitates a stricter adherence to the source material in terms of attribution and licensing?

Josh Triplett

josh@joshtriplett.org

Reply to @INIT6@infosec.exchange

I think attempting to apply analogies from human brains to AI is error-prone for multiple reasons, but primarily because we generally have a shared instrumental value that *of course* it would be unconscionable to apply copyright directly to a human brain (related read: https://qntm.org/lena), whereas there's no such instrumental value about artificial neural networks / LLMs.

INIT_6

INIT6@infosec.exchange

Reply to @josh@joshtriplett.org

That's fair. That related read was fascinating. Poor dude. It reminded me of that one AI company where, after a while, their AI gets bored and chooses to do other things, like look at pictures, instead of doing the assigned workload.

I do wish training data was more opt-in; I've also thought some sort of royalty scheme would work well.

I'm still developing my stance on the topic, thanks for the response and I hope you continue communicating your stance.

Schaf

schaf@netzkms.de

Reply to @josh@joshtriplett.org

@josh @osi what about the exact same but "in order to not be illegal due to copyright infringements, ..." 🥲

Josh Triplett

josh@joshtriplett.org

Reply to @schaf@netzkms.de

As much as I *wish* that were the case, some jurisdictions have made this non-infringing, and many companies are proceeding on the assumption that all jurisdictions will do so, so this shouldn't hinge on that.

Jarkko Sakkinen

jarkko

Reply to @jarkko

@josh @osi I'm not sure at what point in time open source turned from individuals inventing great things together (or, to be totally honest, sometimes having huge flame wars together) into companies making these weird announcements together.

I mean, for instance, the Linux Foundation seems to make an announcement almost every other month saying how they are driving innovation in whatever the hot topic of the day is, accompanied by endorsements from the "usual suspects" companies in IT, finance, and other business sectors. To me they have turned into more of a joke than something I would ever take seriously.

Recently I did an "acid test" on the LF to see if there was any real meat in these announcements, when they launched https://www.lfdecentralizedtrust.org/. I thought that since I'm a long-time kernel maintainer in security, and I also work for a company whose founder Gavin Wood literally invented smart contracts and coined the term "Web3", I would be a great participant in the discussions or possible conference calls.

So I sent an email to their general inquiries address, info@lfdecentralizedtrust.org. After three weeks my inbox is still silent :-) This was what I expected, as I'm an individual, not, e.g., VISA. I'm not personally disappointed, but I am disappointed that my hypothesis was confirmed by this empirical experiment.

I have voting rights in, e.g., LF TAB elections, but I also criticize Finnish politics sometimes, so I guess I can say this ;-) As LF puts it, "decentralized innovation built on trust"...

mirabilos🐈‍⬛

mirabilos@toot.mirbsd.org

Reply to @dalias@hachyderm.io

@dalias @josh but what term would be appropriate to use that isn’t already poisoned by the FSF?

Josh Triplett

josh@joshtriplett.org

Reply to @mirabilos@toot.mirbsd.org

I don't think it's too late to attempt to prevent the loss of the term "Open Source".

Mathias Hasselmann

taschenorakel@mastodon.green

Reply to @josh@joshtriplett.org

@josh The term "open source" would be of value if people realized that a term like "open" only makes sense if you associate it with "open community", but in fact many people say "open source" when they mean "public source".

mirabilos🐈‍⬛

mirabilos@toot.mirbsd.org

Reply to @taschenorakel@mastodon.green

@taschenorakel @josh @michelin @osi I think the standard term for that is “shared source”, i.e. source-available but not under an OSS licence.

Mathias Hasselmann

taschenorakel@mastodon.green

Reply to @mirabilos@toot.mirbsd.org

@mirabilos Sure, but why not use a term that fits better and is easier to understand, once you've heard it?

@josh @michelin @osi

mirabilos🐈‍⬛

mirabilos@toot.mirbsd.org

Reply to @taschenorakel@mastodon.green

@taschenorakel @josh @michelin @osi it’s more ambiguous; “public source” is too close to “public domain”

Michel Lind

michelin@hachyderm.io

Reply to @mirabilos@toot.mirbsd.org

@mirabilos @taschenorakel @josh @osi I prefer FOSS/FLOSS, really. And yeah, I'm increasingly against using permissive licensing, especially for projects where I have some say.
