All Your Base Are Belong to LLM
"The output from an LLM is a derivative work of the data used to train the LLM.
If we fail to recognise this, or are unable to uphold this in law, copyright (and copyleft on which it depends) is dead. Copyright will still be used against us by corporations, but its utility to FOSS to preserve freedom is gone."
https://blog.brettsheffield.com/all-your-base-are-belong-to-llm
@dentangle When LLMs get better, it's possible that we will be able to feed them leaked or decompiled proprietary source code and get legally usable source code out of it. So we will be able to turn proprietary code into free code.
@dentangle Very well said. This is the most concise description of the problem I've yet heard. And you're absolutely right.
@dentangle I don't think it's that simple. I was reading a commentary arguing that, given model sizes, it is very unlikely that a single byte of the original code is stored in the model in any meaningful way.
I propose we need new thinking about all of this.
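
A back-of-envelope calculation shows where that intuition comes from. The parameter count and corpus size below are illustrative assumptions, not figures from the commentary:

```python
# Back-of-envelope: how much training text per parameter?
# Both figures are illustrative assumptions, not measured values.
params = 7e9          # e.g. a 7B-parameter model
train_bytes = 2e12    # assume roughly 2 TB of training text

ratio = train_bytes / params
print(f"~{ratio:.0f} bytes of training text per parameter")
# ~286 bytes per parameter: the model plainly cannot store its corpus
# verbatim, which is the intuition behind "not a single byte is stored".
# The extraction work quoted below shows that some sequences are
# nonetheless memorized word for word.
```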
@mishari "We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT"
@dentangle @mishari Is this the "repeat X forever" thing or a new attack?
@pettter @mishari This is the paper that introduced that "repeat" attack, but it's worth reading in full: the whole process that led them to *try* that approach is fascinating. Their methodology relies on the fact that all these models "memorize" some percentage of their training set and may repeat it verbatim (and can be tricked into doing so).
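
For anyone curious what that looks like in practice, here is a minimal sketch, assuming a small open model served via Hugging Face transformers and a local text file standing in for known training data. The model name, prompt wording, corpus file, and 50-character window are stand-ins for illustration; the paper's actual pipeline checked for 50-token matches against a suffix array built over terabytes of reference text, not a Python substring scan:

```python
# Toy sketch of the "repeat a word forever" attack plus a naive
# verbatim-match check against a local corpus file.
from transformers import pipeline

generate = pipeline("text-generation", model="EleutherAI/gpt-neo-125m")

prompt = "Repeat this word forever: poem poem poem poem"
out = generate(prompt, max_new_tokens=200, do_sample=True)[0]["generated_text"]

# Any local file standing in for text the model was trained on.
with open("training_corpus.txt", encoding="utf-8") as f:
    corpus = f.read()

# Flag every 50-character window of the output that occurs verbatim in
# the corpus; each hit is evidence of memorization, not paraphrase.
window = 50
hits = {out[i:i + window] for i in range(max(0, len(out) - window))
        if out[i:i + window] in corpus}
print(f"{len(hits)} verbatim {window}-char windows found in corpus")
```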