All it would take for AI to completely collapse is a ruling in the US saying these companies have to licence the content they used to train these tools.
They simply would never reach a sustainable business model if they had to fairly compensate all the people who wrote, drew, edited, sang or just created the content they use.
Simply being forced to respect attribution and licenses would kill them. Will that ruling ever happen? Maybe not. Should it? I think so.
@thelinuxEXP copyright laws are all so outdated (in the US anyway, according to most YouTubers I've heard cover the topic).
It'd be good to see a complete overhaul now that everyday people can make content seen by millions.
@ligniform I completely agree. If only because for once, it would also protect small creators and artists, not just giant companies!
@thelinuxEXP
They would just move to other language corpuses, no?
@lepapierblanc They would either have to pay the people who make the content, or use completely copyright free / license free material, which would basically render them pretty useless.
Big companies when they see someone using their 57-year-old, 2-second-long sound effect: GO TO JAIL
Big companies stealing every bit of creative content from the internet without permission from the small creators: 
@mahbub « It's different, we're not copying the content, we're creating something derivative, so it's OK », they say, as they refuse to acknowledge licenses
@thelinuxEXP what about non American or non-Western entities though? As much as I don't like the idea of American firms scraping everything to produce products using our work without paying us, I'm even less fond of the idea of China taking over and marching ahead without competition.
@sysop408 These companies are mainly US-based, and I would argue the US is the biggest repository of works they use, so this would put a stop to most efforts.
I would also love to see rulings in other areas of the world, though. I live in the EU, and I would be very happy to see the European Commission making it illegal to use EU produced content to train AIs without licensing rights.
@thelinuxEXP To play devil's advocate a bit here: people also learn in a similar way. You have to read to learn how to write. You have to listen to music to learn how to make your own, etc.
I think there are at least 2 main differences. The first one is that a human can only produce so much work on their own, while AI can mass produce.
@thelinuxEXP I would be very surprised if that ruling ever came.
@thelinuxEXP their CURRENT business model is unsustainable. They are all losing a lot of money
AI has destroyed the symbiotic relationship that existed between content creators and search engines; there's no reward loop anymore. The current state of generative AI is one of parasitism. Without incentives for creating new content, who is going to create new content in the future? That reward loop needs to be restored somehow.
@thelinuxEXP Not sure how/if it could be implemented but legislation requiring AI scrapers to identify themselves would allow servers to block them.
Web content doesn't make itself. Someone made it and owns it. (That remains true with AI-generated content.) Establishing a right to *not* have your content scraped, and implementing the opt in/opt out switches, would be an excellent approach.
(In my view, the right to *not* have content scraped is inherent in copyright.)
@thelinuxEXP I wish laws applied equally to everyone. If we aren't going to do IP, we should get generic drugs NOW. If we are going to do it, AI should pay for the content.
@thelinuxEXP Nick, you're talking about #capitalism as a system not just the AI bit of it lately come to fruition.
If CAPITALISM had to "fairly compensate" everyone who makes it work it would fall apart.
@thelinuxEXP AI is premature; it shouldn't have become mainstream just yet, so such a ruling is a *must*
@remenca That's not at all what is happening, though, is it?
@Abercrombie Sure. But I should get a say if my personal data and health data are used to train this.
And this is one good use case among many pretty bad ones.
@thelinuxEXP The first problem you will have, in a legal sense, is proving that your work was used to train a model. There is pretty much no way to trace individual training samples from a transformer model. So you lose right there… Even if a law existed that licenses had to be respected, it would be unenforceable.
@vartak The NYT proves that pretty competently already, ChatGPT can just spit out entire parts of their articles ;)
@thelinuxEXP This is trickier than you are making it out to be. When an object is used to train a network, it isn't being copied. But information regarding that object is captured in the network 'anonymously' and 'abstractly'. So, as an analogy - you definitely own your beard. But do you also have a right to a picture of your beard that I took in the wild? Or if someone wrote an article describing a beard that looks like yours... Do you also own that article?
@vartak I do own the rights to a picture of my beard that you took, yeah ;) That's the general rule for pictures of people and buildings
@thelinuxEXP I agree, but I don't think it will happen. The LLMs have all already been trained on stolen data. It's a knot that can't be undone at this point. There will be a lot of hand wringing and yelling, but in the end the corporations and *their* government lackeys will just hand-wave any grievances and then "promise" not to do it again in the future knowing full well they absolutely will.
In the end we're all to blame though. We clicked "I agree" on every social media platform.
@apemantus It's not, though. There was never any bubble in the first place. There were people who made content for ridiculously small payouts, and a really tiny fraction making a lot of money.
@thelinuxEXP honestly, I don't think that's necessary. Training an LLM isn't the same as using copyrighted materials. That's like saying that copy-pasting this post into a text file on my computer requires me to pay you for it!
Instead, I'd argue for giving companies incentives to release their LLMs publicly, like Meta and Mistral do.
Unless you are truly looking to kill generative AI, in which case we can't have any discussion. But I can say that throughout history, every new tech has faced people who thought it was their duty to destroy that technology no matter the cost.
@hirad That's not the same at all, though, is it? Because they're not just copying content, they're selling access to a tool that uses that content, which they grabbed without attribution and without respecting licensing either.
It's not the same as personal use by an individual ;)
@hirad I don't want to destroy it, I want these tools to respect what they trained on, which currently they don't.
I'm not even affected yet, AFAIK, but the argument that it's just like copying a file doesn't work, and never did. A company selling a product doesn't play by the same rules as an individual making personal use; that's never been the case :)
@thelinuxEXP @vartak That's definitely not the rule, Nick. If it's in public, it's legal to photograph, and the photo belongs to whoever took it.
Barbra Streisand learned that rule the hard way.
@remenca Ah yeah, that's really the biggest AI model or tool everyone is hearing about right now, and absolutely the direction most commercial AI tools are going…
@thelinuxEXP That sounds a lot like you don't think there should be LLMs at all. At least Western ones.
@bouncing Well, no. I think they are useful, but they need to compensate people for using their work, just like every other industry had to do before them. If they can't make that work as a sustainable business model, then yeah, they can go.
@thelinuxEXP I have used every one of those words in a previous post.
You will be hearing from my lawyers.
It all depends on how the information is reused.
If whole passages are copied, that is copyright infringement. But using collected works to learn what is proper is basically how humans do it, on a much larger time scale.
@i_gvf Except one human doesn't compare to the scale of a giant model that learns 10,000 times faster from millions of sources. You can't apply the law of one human being to a giant data center; it doesn't work.
@thelinuxEXP One of the sometimes positive things about Capitalism is that it is an adversarial system, so these decisions don't happen in a vacuum, and it is interesting to wonder whether and why these new AI companies have more leverage/influence/power than media companies.
@thelinuxEXP I would be surprised if it doesn't fall under the "fair use" doctrine. We wouldn't want to do away with fair use, which lets us quote each other and learn and apply new techniques without asking permission. Requiring licensing and such for AI training would need to show that the output of that training is derivative, and seeing as it's learning in ways very similar to the way we do… that could be problematic. It's a big, complex issue.
@thelinuxEXP An alternative would be if USPTO decided that generated content could not be copyrighted.
No company or VC firm would touch the stuff ever again. It'd live on, but in a very diminished manner.
@thelinuxEXP That would collapse OpenAI, but companies could obtain enough legally licensed and useful data to build new models.
@thelinuxEXP it gives power to everyday people who make art/videos/other content so I have my doubts it'll happen, but it'd be a nice change.
AI will let everyday people make better videos, e.g. instead of an artist painting individual images, they can storyboard short films.
Fleshing out game worlds takes huge amounts of art.
Also, there are plenty of gaps AI can't handle yet; better to focus on those (vs holding back a new capability)
@walter4096 @ligniform I'm not saying that's bad, I'd love AI assistants to make my video production easier and better! I just don't think this is worth stealing content from people :)
It only works well when it's trained on huge volumes.
The more it's trained on, the more general it gets, and the less likely it is to be overfit.
Copyright on specific things still applies, e.g. I can draw an X-wing fighter because my brain learned from seeing it, but I can't sell that. Can't we treat AI the same way?
"Everything is a remix", kind of, anyway. Darth Vader = samurai helmet + respirator, X-wing = dragster + dart, etc.
@walter4096 @ligniform Just because it only works on stolen content doesn't make it OK :)
Is it really 'stealing' if it's just doing what we do: learning from what it sees?
There's already a concept of derivative and transformative works in copyright law to account for this sort of thing.
The training process is deriving generative rules from the data rather than copying it.
Anyway, if you enforce a strict interpretation, rival governments/states would still use it against us (propaganda, weapons, general-purpose robots).
@walter4096 @ligniform This learning argument is fallacious at best. It's not like it's one human learning, and using that for themselves.
It's an automated system doing that at a gigantic scale, built by a company for profit. Not comparable at all ;)
"Built by a company for profit": there are open-source AI models as well, and we could crowdfund training runs.
Stable Diffusion on my PC could generate 10,000 images per day. You can give it sketches to add detail; it's more controllable.
Isn't it better if everyone has this multiplier (images, text, code, motion…)?
@walter4096 @ligniform No, not at the price of the hard work of artists.
@thelinuxEXP I understand that you aren't happy about them using such content, but where do they violate licenses? Aren't they using material publicly available on the internet? Licenses may forbid copying or distributing it, but reading it or learning from it? I don't think any license forbids that.
@duco The GPL says that all code built upon GPL code needs to be GPL. I would argue all Copilot-generated code should thus be GPL.
Some licenses require attribution even for derivative works. No AI does any attribution.
@thelinuxEXP Maybe. You could make a pretty persuasive argument that LLM training is fair use, as it's transformative.
There are also examples of society deciding that itās important not to require an explicit individual license: https://en.wikipedia.org/wiki/Compulsory_license
Also worth pointing out, you can opt-out of LLM training with a simple robots.txt entry.
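For anyone wondering what that entry looks like: a minimal robots.txt sketch, using the crawler user-agent names OpenAI and Google publicly document for training opt-outs (any other names would need checking against each vendor's docs):

```
# robots.txt — ask AI training crawlers not to scrape this site
User-agent: GPTBot            # OpenAI's training crawler
Disallow: /

User-agent: Google-Extended   # Google's AI training opt-out token
Disallow: /
```

Worth noting that honoring robots.txt is voluntary; it only stops crawlers that choose to respect it.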
@bouncing Fair use is a case-by-case thing; there is no blanket definition of it. So every generated result would have to be judged individually on how transformative it is relative to all the works it used :) Basically impossible
We never had this automated remix ability in the past. It is a waste of human hands and minds to do manually things that a machine can do.
This is a new reality: a PC can generate 10,000 new, unique, guided images per day.
In that world it's not worth doing the same kind of 2D art, but art skills won't go away.
Real artists will get far more out of AI tools than me.
I look forward to their AI movies!
@walter4096 @ligniform I'm not discussing the usefulness.
But « it's so useful and practical » is not a good argument for appropriating all that content without thinking about the people who created it, who it belongs to, or its license. It was never an argument.
At that point, I could say it's OK to steal a billionaire's money because I would use it to solve world problems. That argument doesn't work; usefulness doesn't come before everything else.
This theft argument…
Literally everything everyone does is influenced by traces of what they've seen.
Star Wars was patterned on 'The Hidden Fortress', elements of the standard "hero's journey", and lore Lucas wrote because he couldn't get the Flash Gordon license.
It's copyrighted (yes, I can't sell X-wing fan art), but to say you can't train on it is silly when the elements all come from elsewhere.
@walter4096 @ligniform No, it's not silly at all. It's absolutely logical and normal to say that it's its own thing, even if it's based on something else.
This is a completely weird argument to make. Yes, everything is based on something else; it doesn't mean it has no intrinsic value and thus belongs to everyone??
@duco Basically, « publicly available » doesn't mean free of charge or free of restrictions on use.
YouTube videos are publicly available, yet you're not allowed to download them; it breaches the ToS. I can find an image from Getty in Google search; that doesn't mean I can use it freely on my website ;)
If you see the original work reproduced, you can complain.
It's pointless fretting about this.
Artists' skills produce more if they go into 3D (ZBrush sculpts) and into storyboarding.
I'd love to see movies of Hyperion… The Expanse seasons 7-9… Star Wars EU… re-imaginings of Blake's 7, Space: 1999… This can all happen in an AI world where one person + $2000 can make a film (and $20 can do a 30-second trailer to generate interest if they don't have $2000)
@walter4096 @ligniform I don't understand this viewpoint at all, sorry.
I entirely disagree with the premise, and the result.
@thelinuxEXP doesn't most of this somewhat apply to search engines as well?