
Open Source Initiative osi

“The board is confident that the process has resulted in a definition that meets the standards of Open Source as defined in the Open Source Definition and the Four Essential Freedoms, and we’re energized about how this definition positions OSI to facilitate meaningful and practical Open Source guidance for the entire industry.” https://opensource.org/blog/the-open-source-initiative-announces-the-release-of-the-industrys-first-open-source-ai-definition

The board is incorrect. The OSI has corrupted the term Open Source by allowing those who want to propagate AIs that launder Open Source and proprietary code/data alike to do so under the banner of "Open Source". In particular, the so-called "Open Source AI" definition permits calling an AI "Open Source" even if it was trained on Open Source code or data and the license of its weights and outputs completely ignores the license of its training data.

This is an attempt to normalize the unacceptable practice of letting AI launder away the licenses of its training data, and to continue the practices of establishing "facts on the ground" that augur towards being able to continue ignoring the licenses of training data. The flagrant behavior of current AI training should not be allowed to continue, and should not be treated as a valid negotiating position from which to "compromise". Do not normalize the violation of Open Source licenses.

@josh @osi I am increasingly refusing to use the term "open source" over this, it was a bad name from the beginning (since it means something else outside tech) and is increasingly watered down to be almost meaningless

On the contrary, I think the term "Open Source" still has a lot of value, if we can counter this dilution. The problem is that this proposal undermines all the efforts people make when trying to fight the terminology battle against companies rugpulling Open Source projects into a no-longer-open license and describing it as "open".
@josh @osi I had the same thoughts a week ago, but I tend to be a bit spiky from time to time, so I thought maybe I was being a bit too judgemental (which happens too often). Good to hear others reflect my first views on this. Thanks for writing this.

https://social.kernel.org/notice/AnPuVswNBKAitj9wxc
@josh @osi To me this initially looked like some sort of magic spell that a corp can say and then just continue doing whatever shit they were doing before, because they've been stamped "we're not doing evil stuff" or something.

@osi It does not meet the definition whatsoever and you know it. It promotes violation of every single actually-Open-Source license out there and you know it. But you wanted a piece of the latest scam pie and the people who actually make Open Source (a term I'm likely to stop using now) don't matter in the slightest to you.

An actionable change that would fix this problem:

"In order to be Open Source, an AI must respect the licenses of all of its training data. For example, if trained on licensed works that include an attribution requirement, the AI must provide the required attribution for each such licensed work; if trained on copyleft works, the AI must be licensed under a compatible license. This requirement applies regardless of legal requirements in any particular jurisdiction."

Note that this *doesn't* say "must be trained entirely on Open Source data". We can handle proprietary training data as a *separate* problem to be solved over time, akin to Debian's "contrib" section. In practice, what this requirement would mean for most AIs would be 1) a long attribution page, but note that the current definition *already* requires documenting the training data, 2) not training on copyleft training data, and 3) not training on unlicensed training data.

@josh

I see where you’re coming from, and I appreciate the clarity and depth of your concerns around licensing and Open Source definitions in AI training. I’d love to get your perspective on something I’ve been thinking about, which is how similar AI training data use could be to how humans learn.

When people read or are exposed to various works, even proprietary or confidential information, they incorporate this knowledge broadly rather than attributing specific ideas. In a way, we might even retain key insights from trade secrets or copyrighted material without an explicit obligation to give attribution every time a related idea is expressed.

If AI is working similarly—relying on approximations of knowledge rather than precise lookups—then, arguably, the output isn't a reproduction but more of a unique synthesis or restatement of learned concepts.

Does this approach seem too different from human learning to be applicable to AI? Or do you think AI, by the nature of its structure, necessitates a stricter adherence to the source material in terms of attribution and licensing?

I think attempting to apply analogies from human brains to AI is error-prone for multiple reasons, but primarily because we generally have a shared instrumental value that *of course* it would be unconscionable to apply copyright directly to a human brain (related read: https://qntm.org/lena), whereas there's no such instrumental value about artificial neural networks / LLMs.

@josh

That's fair. That related read was fascinating. Poor dude. It reminded me of that one AI company where, after a while, their AI gets bored and chooses to do other things, like look at pictures, instead of doing the assigned workload.

I do wish training data was more opt-in; I've also thought some sort of royalty scheme would work well.

I'm still developing my stance on the topic, thanks for the response and I hope you continue communicating your stance.


@josh @osi what about the exact same but "in order to not be illegal due to copyright infringements, ..." 🥲

As much as I *wish* that were the case, some jurisdictions have made this non-infringing, and many companies are proceeding on the assumption that all jurisdictions will do so, so this shouldn't hinge on that.
@josh @osi I'm not sure at what point open source turned from individuals inventing great things together (or, to be totally honest, sometimes having huge flame wars together) into companies making these weird announcements together.

I mean, for instance, the Linux Foundation seems to put out an announcement at least bi-monthly saying how they are driving innovation in whatever is the hot topic of the day, accompanied by endorsements from the "usual suspects" companies in IT, finance, and other business sectors. To me they have become more of a joke than something I would ever consider taking seriously.

Recently, when they launched https://www.lfdecentralizedtrust.org/, I ran an "acid test" on LF to see if there is any real meat in these announcements. I thought that since I'm a long-time kernel maintainer in security and I also work for a company whose founder Gavin Wood literally invented smart contracts and coined the term "Web3", I would be a great participant in the discussions or possible conference calls.

So I dropped an email to their general inquiries address, info@lfdecentralizedtrust.org. After three weeks my inbox has been silent :-) This was my expectation, as I'm an individual and not e.g. VISA. I'm not personally disappointed, but I am disappointed that this empirical experiment confirmed my hypothesis.

I have voting rights in e.g. LF TAB elections, but I do criticize Finnish politics sometimes too, so I guess I can say this ;-) As LF puts it, "decentralized innovation built on trust"...

@dalias @josh but what term would be appropriate to use that isn’t already poisoned by the FSF?

I don't think it's too late to attempt to prevent the loss of the term "Open Source".

@josh The term "open source" would be of value if people would realize that a term like "open" only makes sense if you associate it with "open community", but in fact many people say "open source" when they mean "public source".

@michelin @osi


@taschenorakel @josh @michelin @osi I think the standard term for that is “shared source”, i.e. source-available but not under an OSS licence.


@mirabilos Sure, but why not use a term that fits better and is easier to understand, once you've heard it?

@josh @michelin @osi


@taschenorakel @josh @michelin @osi it’s more ambiguous; “public source” is too close to “public domain”


@mirabilos @taschenorakel @josh @osi I prefer FOSS/FLOSS, really. And yeah, I'm increasingly against using permissive licensing, especially for projects where I have some say
