Conversation

All it would take for AI to completely collapse is a ruling in the US saying these companies have to license the content they used to train these tools.

They simply would never reach a sustainable business model if they had to fairly compensate all the people who wrote, drew, edited, sang or just created the content they use.

Simply being forced to respect attribution and licenses would kill them. Will that ruling ever happen? Maybe not. Should it? I think so.

@thelinuxEXP copyright laws are all so outdated (in the US anyway, according to most YouTubers I've listened to on the topic).

It'd be good to see a complete overhaul now that everyday people can make content seen by millions.

@ligniform I completely agree. If only because for once, it would also protect small creators and artists, not just giant companies!

@thelinuxEXP
They would just move to other language corpuses, no?

@lepapierblanc They would either have to pay the people who make the content, or use completely copyright free / license free material, which would basically render them pretty useless.

@thelinuxEXP

Big companies when they see someone using their 57-year-old, 2-second-long sound effect: GO TO JAIL

Big companies stealing every bit of creative content from the internet without permission from the small creators: :ageblobcat:

@mahbub « It’s different, we’re not copying the content, we’re creating something derivative so it’s ok », they say, as they refuse to acknowledge licenses

@thelinuxEXP what about non American or non-Western entities though? As much as I don't like the idea of American firms scraping everything to produce products using our work without paying us, I'm even less fond of the idea of China taking over and marching ahead without competition.

@sysop408 These companies are mainly US-based, and I would argue the US is the biggest repository of works they use, so this would put a stop to most efforts.

I would also love to see rulings in other areas of the world, though. I live in the EU, and I would be very happy to see the European Commission making it illegal to use EU produced content to train AIs without licensing rights.

@thelinuxEXP To play the devil's advocate a bit here: people also learn in a similar way. You have to read to learn how to write. You have to listen to music to learn how to make your own, etc.

I think there are at least 2 main differences. The first one is that a human can only produce so much work on their own, while AI can mass produce.

@thelinuxEXP I would be very surprised if that ruling ever came.

@thelinuxEXP their CURRENT business model is unsustainable. They are all losing a lot of money

AI has destroyed the symbiotic relationship that existed between content creators and search engines; there's no reward loop anymore. The current state of AI is parasitism. Without incentives for creating new content, who is going to create it in the future? The reward loop needs to be restored somehow.

@thelinuxEXP

I am in complete agreement with this

@thelinuxEXP Not sure how/if it could be implemented but legislation requiring AI scrapers to identify themselves would allow servers to block them.

Web content doesn't make itself. Someone made it and owns it. (That remains true with AI-generated content.) Establishing a right to *not* have your content scraped, and implementing the opt in/opt out switches, would be an excellent approach.

(In my view, the right to *not* have content scraped is inherent in copyright.)
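A sketch of what that blocking could look like, assuming (as the proposed legislation would require) that scrapers send an honest User-Agent header. The bot names and helper functions here are hypothetical illustrations, not any particular server's API; a real deployment would keep an up-to-date list.

```python
# Hypothetical sketch: refuse requests from AI scrapers that identify
# themselves via the User-Agent header. Bot names are illustrative.
AI_SCRAPER_AGENTS = ("GPTBot", "CCBot", "Google-Extended")

def is_ai_scraper(user_agent: str) -> bool:
    """Return True if the User-Agent string names a known AI crawler."""
    return any(bot.lower() in user_agent.lower() for bot in AI_SCRAPER_AGENTS)

def handle_request(headers: dict) -> int:
    """Return an HTTP status: 403 for self-identified AI scrapers, else 200."""
    if is_ai_scraper(headers.get("User-Agent", "")):
        return 403  # blocked
    return 200      # served normally
```

Of course, this only works against crawlers that identify themselves honestly, which is exactly why it would need legal backing.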

@thelinuxEXP I wish laws applied equally to everyone. If we aren't going to do IP, we should get generic drugs NOW. If we are going to do it, AI should pay for the content.

Rage Rumbles šŸ“ā€ā˜ ļøšŸ«‚ šŸ”ž

@thelinuxEXP Nick, you're talking about capitalism as a system, not just the AI bit of it that's lately come to fruition.

If CAPITALISM had to "fairly compensate" everyone who makes it work, it would fall apart.

@thelinuxEXP AI is premature; it shouldn't have become mainstream just yet, so such a ruling is a *must*

@remenca That’s not at all what is happening though, is it?

@Abercrombie Sure. But I should get a say if my personal data and health data is used to train this.

And this is one good use case among many pretty bad ones.

@thelinuxEXP The first problem you will have here, in a legal sense, is proving that your work was used to train a model. There is pretty much no way to trace original individual training samples from a transformer model. So you lose right there… Even if a law existed that licenses had to be respected, it would be unenforceable.

@vartak The NYT proves that pretty competently already, ChatGPT can just spit out entire parts of their articles ;)

@thelinuxEXP This is trickier than you are making it out to be. When an object is used to train a network, it isn't being copied. But information regarding that object is captured in the network 'anonymously' and 'abstractly'. So, as an analogy - you definitely own your beard. But do you also have a right to a picture of your beard that I took in the wild? Or if someone wrote an article describing a beard that looks like yours... Do you also own that article?

@vartak I do own the rights to a picture of my beard that you took, yeah ;) That’s the general rule for pictures of people and buildings

@thelinuxEXP I agree, but I don't think it will happen. The LLMs have all already been trained on stolen data. It's a knot that can't be undone at this point. There will be a lot of hand wringing and yelling, but in the end the corporations and *their* government lackeys will just hand-wave any grievances and then "promise" not to do it again in the future knowing full well they absolutely will.

In the end we're all to blame though. We clicked "I agree" on every social media platform.

@thelinuxEXP The ā€œcontent creatorā€ bubble is bursting.

@apemantus It’s not though. There was never any bubble in the first place. There were people who made content for ridiculously small payouts, and a really tiny fraction making a lot of money.

@thelinuxEXP honestly, I don't think that's necessary. Training an LLM isn't the same as using copyrighted materials. That's like saying that copy-pasting this post into a text file on my computer requires me to pay you for it!
Instead, I'd argue for giving companies incentives to release their LLMs publicly, like Meta and Mistral do.
Unless you are truly looking to kill generative AI, in which case we can't have any discussion. But I can say that throughout history, every new tech has faced people who thought it was their duty to destroy that technology no matter the cost.

@hirad That’s not the same at all, though, is it? Because they’re not just copying content, they’re selling access to a tool that uses that content, that they grabbed without attribution, without respecting licensing either.

It’s not the same as personal use from an individual ;)

@hirad I don’t want to destroy it, I want these tools to respect what they trained on, which currently they don’t.

I’m not even affected yet, AFAIK, but the argument that it’s just like copying a file doesn’t work, and never did. A company selling a product doesn’t use the same rules as an individual for their own use, that’s never been the case :)

@thelinuxEXP @vartak That’s definitely not the rule, Nick. If it’s in public, it’s legal to photograph and the photo belongs to whomever took it.

Barbra Streisand learned that rule the hard way.

@bouncing @vartak Nope. Try to sell a picture of the Eiffel Tower, or a painting displayed publicly, or to publish a video of people walking in the street without their consent, and see how fast you’ll have to pay damages ;)

@remenca Ah yeah, that’s really the biggest AI model or tool everyone is hearing about right now, and absolutely the direction most commercial AI tools are going…

@thelinuxEXP That sounds a lot like you don’t think there should be LLMs at all. At least western ones.

@bouncing Well, no. I think they are useful, but they need to compensate people for using their work, just like every other industry had to do before them. If they can’t make that work as a sustainable business model, then yeah, they can go.

@thelinuxEXP I have used every one of those words in a previous post.

You will be hearing from my lawyers.

It all depends on how the information is reused.

If whole passages are copied, that is copyright infringement. But using collected works to learn what is proper is basically how humans do it, on a much longer time scale.

@i_gvf Except one human doesn’t compare to the scale of a giant model that learns 10,000 times faster from millions of sources. You can’t apply rules written for one human being to a giant data center; it doesn’t work.

@ikanreed That’s very likely

@thelinuxEXP One of the sometimes positive things about Capitalism is that it is an adversarial system, so these decisions don't happen in a vacuum, and it is interesting to wonder whether and why these new AI companies have more leverage/influence/power than media companies.

@thelinuxEXP I would be surprised if it didn't fall under the "fair use" doctrine. We wouldn't want to do away with fair use, which lets us quote each other and learn and apply new techniques without asking permission. Requiring licensing and such for AI training would need to show that the output of that training is derivative, and seeing as it's learning in ways very similar to the way we do… that could be problematic. It's a big, complex issue.

@thelinuxEXP An alternative would be if the US Copyright Office decided that generated content could not be copyrighted.

No company or VC firm would touch the stuff ever again. It'd live on, but in a very diminished manner.

@thelinuxEXP That would collapse OpenAI, but companies could obtain enough legally licensed and useful data to build new models.

@not2b And that would be much better!

@thelinuxEXP it gives power to everyday people who make art/videos/other content so I have my doubts it'll happen, but it'd be a nice change.

@ligniform @thelinuxEXP

AI will let everyday people make better videos; e.g., instead of an artist painting individual images, they can storyboard short films. Fleshing out game worlds takes huge amounts of art.

Also, there are plenty of gaps AI can't handle yet; better to focus on those (vs holding back a new capability)

@walter4096 @ligniform I’m not saying that’s bad, I’d love AI assistants to make my video production easier and better! I just don’t think this is worth stealing content from people :)

@thelinuxEXP @ligniform

it only works well when it's trained on huge volumes.
The more it's trained on, the more general it gets, and the less likely it is to be overfit.

Copyright on specific things still applies; e.g., I can draw an X-wing fighter because my brain learned from seeing it, but I can't sell that. Can't we treat AI the same way?

"Everything is a remix", kind of, anyway. Darth Vader = samurai helmet + respirator, X-wing = dragster + dart, etc.

@walter4096 @ligniform Just because it only works on stolen content doesn’t make it ok :)

@thelinuxEXP Come on, copyright is overreaching enough already. Plus, this would effectively give Facebook and China a monopoly on big language models. Does not sound great.

@thelinuxEXP @ligniform

is it really 'stealing' if it's just doing what we do: learning from what it sees?

There's already a concept of derived and transformative works in copyright law to account for this sort of thing.

The training process is deriving generative rules from the data rather than copying it.

Anyway, if you enforce a strict interpretation, rival governments/states would still use it against us (propaganda, weapons, general-purpose robots).

@walter4096 @ligniform This learning argument is fallacious at best. It’s not like it’s one human learning, and using that for themselves.

It’s an automated system doing that at a gigantic scale and built by a company for profit. Not comparable at all ;)

@thelinuxEXP @ligniform

"built by a company for profit": there are open-source AI models as well, and we could crowdfund training runs.

Stable Diffusion on my PC could generate 10,000 images per day. You can give it sketches to add detail, making it more controllable.

Isn't it better if everyone has this multiplier (images, text, code, motion…)?

@walter4096 @ligniform No, not at the price of the hard work of artists.

@thelinuxEXP I understand that you aren't happy about them using such content, but where do they violate licenses? Aren't they using material publicly available on the internet? Licenses may forbid copying or distributing it, but reading it or learning from it? I don't think any license forbids that.

@duco The GPL says that all code built upon GPL code needs to be GPL. I would argue all Copilot-generated code should thus be GPL.

Some licenses require attribution even for derivative works. No AI does any attribution.

@thelinuxEXP Maybe. You could make a pretty persuasive argument that LLM training is fair use, as it’s transformative.

There are also examples of society deciding that it’s important not to require an explicit individual license: https://en.wikipedia.org/wiki/Compulsory_license

Also worth pointing out: you can opt out of LLM training with a simple robots.txt entry.
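For instance, OpenAI documents a `GPTBot` user-agent token for its crawler, so a minimal opt-out entry could look like the sketch below (other crawlers use their own tokens, and compliance with robots.txt is voluntary):

```text
# robots.txt — ask OpenAI's crawler not to fetch anything on this site
User-agent: GPTBot
Disallow: /
```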

@bouncing Fair use is a case-by-case thing; there is no blanket definition of it. So every generated result would have to be judged individually, in relation to how transformative it is with regard to all the works it used :) Basically impossible

@thelinuxEXP @ligniform

We never had this automated remix ability in the past. It is a waste of human hands and minds to do manually things that a machine can do.

This is a new reality: a PC can generate 10,000 new, unique, guided images per day.

In that world it's not worth doing the same kind of 2D art, but art skills won't go away.

Real artists will get far more out of AI tools than me.
I look forward to their AI movies!

@walter4096 @ligniform I’m not discussing the usefulness.

But « it’s so useful and practical » is not a good argument for appropriating all that content without thinking about the people who created it, who it belongs to, or its license. It was never an argument.

At that point, I could say it’s OK to steal a billionaire’s money because I would use it to solve world problems. That argument doesn’t work, usefulness doesn’t come before everything else.

@thelinuxEXP @ligniform

this theft argument…
Literally everything everyone does is influenced by traces of what they've seen.

Star Wars was patterned on 'The Hidden Fortress', elements of the standard "hero's journey", and a lore Lucas wrote because he couldn't get the Flash Gordon license.

It's copyrighted (yes, I can't sell X-wing fan art), but to say you can't train on it is silly when the elements all come from elsewhere

@walter4096 @ligniform No, it’s not silly at all. It’s absolutely logical and normal to say that it’s its own thing, even if it’s based on something else.

This is a completely weird argument to make. Yes, everything is based on something else, it doesn’t mean it has no intrinsic value and thus belongs to everyone??

@duco Basically, « publicly available » doesn’t mean free of charge or free of restrictions on use.

YouTube videos are publicly available, yet you’re not allowed to download them; it breaches the ToS. I can find an image from Getty in Google search; doesn’t mean I can use it freely on my website ;)

@thelinuxEXP @ligniform

if you see the original work reproduced, you can complain.

It's pointless fretting about this.

Artists' skills produce more if they go into 3D (ZBrush sculpts) and into storyboarding.
I'd love to see movies of Hyperion… Expanse seasons 7–9… Star Wars EU… re-imaginings of Blake's 7, Space: 1999… this can all happen in an AI world where one person + $2000 can make a film (and $20 can do a 30-second trailer to generate interest if they don't have $2000)

@walter4096 @ligniform I don’t understand at all this viewpoint, sorry.

I entirely disagree with the premise, and the result šŸ˜‚

@thelinuxEXP doesn’t most of this somewhat apply to search engines as well?
