Conversation

Stack Overflow announced that they are partnering with OpenAI, so I tried to delete my highest-rated answers.

Stack Overflow does not let you delete questions that have accepted answers and many upvotes because it would remove knowledge from the community.

So instead I changed my highest-rated answers to a protest message.

Within an hour mods had changed the answers back and suspended my account for 7 days.

I'm requesting that my questions and answers be permanently deleted under GDPR.

It's just a reminder that anything you post on any of these platforms can and will be used for profit. It's just a matter of time until all your messages on Discord, Twitter etc. are scraped, fed into a model and sold back to you.

@ben (and unfortunately the fediverse)

@ben Feels like the Enclosures (Tragedy of the Commons).

@ben Stack Overflow has already been monetizing your answers with ads for years. If “used for profit” is your main complaint, you’re a little late.

@mighty_orbot @ben The argument isn't really about profit, as is pretty clearly outlined above. OpenAI's explicit and ultimate intent is to replace people, and in the meantime it's spitting out garbage information.

@andrewfelix @mighty_orbot @ben
And their software is laundering the original source of the information from which their AI training data was derived. Doesn't the original author deserve some credit for when ChatGPT regurgitates a lossy paraphrasing of a post scraped from the Internet?

@ben Play stupid games, win stupid prizes. Why does everyone believe that sabotaging LLM development is cool?
@vbabka @ben It's not. Using LLMs to answer questions might not be a good idea, but they should work rather well at translations, including translations between programming languages.
@pavel @ben Translations are fine, but I'm not so sure about the programming languages part. Also, disagreement about using one's own content (created before LLMs took off) for LLM training is not the same thing as sabotaging, IMHO.
@vbabka @pavel @ben hint: LLMs have no understanding of anything, so they absolutely aren't suited to programming, since they'll hallucinate in (often) subtle ways that fit the syntax, and people are notoriously bad at picking up on it.

Also, they still work without credit/license etc. The fact that they appear to work for a lot of programming situations makes it even more dangerous.

It'd be one thing if people were just using them while acknowledging their limitations; it's quite another in a world where people openly lie about their capabilities.

Totally and completely appropriate to not want your work to be part of it.

@ben They’re not yours, they’re theirs. Jeff Atwood thanks you for your free labour. (I’m kidding, he doesn’t. Feel grateful he even allowed you to contribute in the first place, serf.)

Speaking of Jeff Atwood, isn’t he the guy helping fund Mastodon now? 🤔

@ben please do, you're awesome! ❤️

@felipe @ben
Particularly your carefully crafted ALT tags.

@Orb2069 @felipe @ben This is something I've thought about ever since I started here. It's great that people here take their time to make the web better for disabled people.

But unfortunately, high-quality image descriptions are a gift to AI companies training text-to-image models. There is no act of altruism these assholes will not exploit.

@datarama @Orb2069 @felipe @ben

Maybe I should add extra stuff to alt text that would be confusing to AI but amusing to the reader. I'm thinking along the lines of xkcd -- I can't imagine how a generative AI trained only on xkcd alt text would respond to prompting.

@datarama @Orb2069 @felipe @ben

In a perfect world, AI could be used to describe images to vision impaired people.
The real wrong isn't the AI itself, but that its owners use it only for selfish gains.

Kind of like GMOs, we could use them to feed more people for less but Monsanto only uses them to gouge farmers.

@bornach No, you should not. This unfortunately is really a situation where doing the right thing is self-sabotage.

You can poison the image using Nightshade, but I would not count on its effectiveness.

@bornach @datarama @felipe @ben

... That's what I do. Wax poetic or oblique instead of just flatly describing what's depicted.

As far as moral obligations go, this is social media, not a defibrillator pack interface. Nobody is going to die because they can't tell what your cat is doing in the picture.

@Phosphenes @datarama @felipe @ben
In a perfect world, AI wouldn't "hallucinate" (PR spin on just being wrong), and might be useful for that sort of thing.

(Btw: Meta already does this, but their alt tags consist of something like "Image may contain <object>, <text>, <object>" - the data exists because they have to run image analysis for automated moderation anyway - they surface it because it satisfies ADA requirements.)

@Orb2069 @Phosphenes @datarama @felipe @ben If the OCR worked better, I'd (probably) be able to tell exactly what that is. Woman/cat/salad with two lines of text is absolutely the Woman Yelling At Cat meme format. It would require some prior knowledge though.
On the flip side, you'd think they would run images through a reverse image search and tag hits on meme templates. I get hits for it which have the title of the meme in the text.

@ljs @vbabka @ben Hint: try it. It saved work for me.
@pavel @ben @vbabka sigh you're disappointing me man.

But like all LLM proponents (just like all crypto guys I spoke to before, just like all anti vax guys I spoke to before, just like all [insert religious-style belief] proponents I spoke to before) you won't actually rebut what I say, you'll just assume that 'I don't get it' on some level.

I have tried LLMs dude, thanks for patronising me by assuming I haven't.

Unfollow.

@ben All of this is not going to end well.

@ben I mean

user contributions licensed under CC BY-SA

I'm not a lawyer, but I don't think you can do anything about it, they're technically hosting a copy of your content with attribution to you, which doesn't make you an owner of the data, in particular this clause:

Adapt — remix, transform, and build upon the material for any purpose, even commercially.

gives them the right to fuck their userbase in the ass by using the data in other services

@13xforever They're then selling that data to OpenAI, which does not abide by this license. I'm not getting attribution there, and they're not licensing it as CC BY-SA, which is required.

https://m.benui.ca/@ben/112401140834395509

@ben@m.benui.ca Chaotic evil: send in an anti-circumvention DMCA notice for each question. Those have no process for disputing, so they will probably just delete your content and ban you, because it is easier.

@ben@m.benui.ca The enshittification will continue until morale improves.

Thank you for the replies. As someone pointed out, anything posted on Stack Overflow is covered by CC BY-SA 4.0.

Under this license, all usage must attribute the author and must carry a similar license. OpenAI fulfills neither.

@ben i haven't read their ToS, but are you sure it doesn't include licensing whatever you say to Stack Overflow? the last paragraph of the page you shared seems to allude to that
i mean, it's still immoral as heck, but i guess that's one of the reasons we're all here instead of on a centralized content farm

@ben if only there were a word for taking things you don't own. 🤔

Gosh it would make talking about gen AI easier if we had a word for that. 🤔

@datarama @Orb2069 @felipe @ben

It would be much better if people stopped trying to fight the AI companies and focused on making AI available for everyone.

@AeonCypher @datarama @felipe @ben

Please, Mr. Reply Guy, tell me about the inevitability of AI.

When you're done, explain to me how you reliably achieve 95%+ accuracy on k-fold validation without undetectable overfitting - my prof never could provide a simple answer, and 1-out-of-20 seems like really not good odds for a new god.

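For readers unfamiliar with the k-fold validation being invoked above: the idea is to split a dataset into k folds, train on k−1 of them, evaluate on the held-out fold, and rotate so every fold is held out once. A minimal pure-Python sketch (all function names and the toy mean-predictor "model" here are illustrative, not from any post in this thread):

```python
# Minimal k-fold cross-validation: split the data into k folds, hold each
# fold out once for evaluation, and average the per-fold scores.

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k roughly equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def cross_validate(xs, ys, fit, score, k=5):
    """Average a score over k held-out folds."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in test],
                           [ys[i] for i in test]))
    return sum(scores) / len(scores)

# Toy example: a "model" that just predicts the training-set mean,
# scored by mean absolute error on the held-out fold.
fit = lambda xs, ys: sum(ys) / len(ys)
score = lambda m, xs, ys: sum(abs(y - m) for y in ys) / len(ys)

ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
mae = cross_validate(list(range(10)), ys, fit, score, k=5)
```

The averaged held-out score is what the "without undetectable overfitting" complaint targets: a model can look good on every fold and still have leaked information between training and evaluation data.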

@Orb2069 @datarama @felipe @ben

What a strange non sequitur.
I wonder if you're actually trying to understand something, or if I should simply block you.

Also, CC claims that training an AI on data is "fair use". So fuck Creative Commons, I guess.
https://creativecommons.org/2023/02/17/fair-use-training-generative-ai/

@ben: all "" is , otherwise you'd be a perpetual to [whoever made your schoolbooks and created whatever media you ever consumed](http://felixreda.eu/2021/07/github-copilot-is-not-infringing-your-copyright/)!

@ben I'm longing for a new set of free (as in beer) software and creative licenses that prevent all this garbage.

I put my software out there so other people can use it, I'm even ok if they make money out of it. But I'm not ok with my work being swallowed by a big machine so that people can print money without even knowing it exists at all.

@ljs @ben @vbabka Well, your arguments were a bit disappointing, too. LLMs are useful for trivial tasks, and for easy tasks where you can verify the result. I do both kinds of tasks from time to time.
@pavel @ben @vbabka the ones so disappointing you entirely ignored them (because I guess it's beneath you to rebut them) and just said 'try it' as if I hadn't?

LLMs have uses; I disagree with their use for tasks like programming for the reasons previously stated, which you ignored, so I'm not going to repeat them.
@Orb2069 @AeonCypher @datarama @felipe @ben just trust them, everything will work in the next version!

Don't worry: people who stand to make hundreds of billions of dollars, like Sam Altman, say LLMs and deep learning can do things they emphatically cannot, because they're just, like, altruistic or something.

@ljs @ben @pavel @vbabka LLMs often turn one type of work (creating) into another type of work (reviewing), consuming lots of energy in the process. For some people, it may be worth it (although if they had to pay the full costs of LLMs, humans might still be cheaper).

@ptesarik @ben @pavel @vbabka the big problem is that people are very very bad at picking up on the kind of errors that an algorithm can generate.

We all implicitly assume errors are 'human shaped' i.e. the kind of errors a human would make.

An LLM can have a very good grasp of the syntax but then interpolates results in effect randomly as the missing component is a dynamic understanding of the system.

As a result, they can introduce very very subtle bugs that'll still compile/run etc.

People are also incredibly bad at assessing how much cost this incurs in practice.

Having something that can generate such errors for only trivial tasks strikes me as being worse than having nothing at all.

And the ongoing 'emperor's new clothes' issue with LLMs is that this problem is insoluble. Hallucination is an unavoidable part of how they work.

The whole machinery of the thing is trying to infer patterns from a dataset, so at a fundamental level it's broken by design.

That's before we get on to the fact that it needs human input to work (once you start feeding LLM-generated output back in, it completely collapses), so the whole thing couldn't work anyway on any long-term scale.

That's before we get on to the fact that it steals software and ignores licenses, the carbon and monetary costs of compute, and a myriad of other problems...

The whole problem with all this is it's a very very convincing magic trick and works so well that people are blinded to its flaws.

See https://en.wikipedia.org/wiki/ELIZA_effect?useskin=vector
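To make the "very very subtle bugs that'll still compile/run" point concrete, here is a hypothetical illustration written by hand for this purpose (not output from any model): two versions of a moving-average function that are both syntactically fine and both run cleanly, differing only in a one-character off-by-one that silently drops a result.

```python
# A hand-written illustration of a "syntax-perfect, subtly wrong" function:
# both versions run without error; only one is correct.

def moving_average_buggy(xs, n):
    # Off-by-one: range(len(xs) - n) silently drops the final window.
    return [sum(xs[i:i + n]) / n for i in range(len(xs) - n)]

def moving_average(xs, n):
    # Correct: there are len(xs) - n + 1 complete windows of size n.
    return [sum(xs[i:i + n]) / n for i in range(len(xs) - n + 1)]

data = [1, 2, 3, 4, 5]
# moving_average(data, 2)       -> [1.5, 2.5, 3.5, 4.5]
# moving_average_buggy(data, 2) -> [1.5, 2.5, 3.5]   (last window missing)
```

No type checker or compiler flags the buggy version; only a reviewer who already knows the intended window count would catch it, which is exactly the "errors aren't human-shaped" problem described above.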

@ljs @ben @pavel @vbabka first they were anti-pdp-11, now they are anti-vax, and next you'll see them go anti-alpha to the point they'll start removing support for old Alpha processors

@lkundrak @ben @pavel @vbabka first they came for the pdp-11 and I said nothing...

@pavel @ben @ljs @vbabka people said this about heroin too

@pavel @ben i don't think they were sabotaging anything? nobody minds stackoverflow training their models on their own. they just chose not to help them because the conditions were not fair (the original author not having rights to the derived work)

@ljs @ben @pavel @vbabka i'd also come for a pdp-11 if i lived in the u.s. and had a place for it

@lkundrak @ljs @ben @pavel @vbabka alpha considered harmful; if male, a gender stereotype even

@ljs @datarama @Orb2069 @ben @felipe

Are you saying to trust me? I'm not a 'him'.

I'm quite strongly against OpenAI. What you are saying is quite the opposite of what I said.

The comment above continues to be an irrelevancy. A strung-together set of jargonizations.

No one builds LLMs with k-fold validation. OpenAI's models are, likely intentionally, overfit, which is why they are full of exact copies of data.

However, again, whatever you two think you're arguing against it's not related to a position I hold.

@OddDev @datarama @Orb2069 @felipe @ben
Wow, third time in this conversation I've had someone use a typically male gendered word to refer to me.

Keep it up.

@tdr @kkarhan @ben Perhaps I can clarify, as I wrote the article. § 44b UrhG is the German transposition of Art. 4 DSM copyright directive, which I cover in the article: “Since the EU Copyright Directive of 2019, … where commercial uses are concerned, rightsholders who do not want their copyright-protected works to be scraped for data mining must opt-out in machine-readable form”, so although Germany had not adopted §44b yet, the article takes it into account.

@Phosphenes

Alas, it is always the Luddite question, is it not?
Ask not what the machine does, but to whom and for whose benefit.
AI should be creating a better future for the benefit of all, and mostly for those in dire need. Instead it reaps the benefits for the fat cats above, and indulges in the of our reality.
And you wrote "in a perfect world" - I don't think this should be considered in such terms. That should be our normal one.

@datarama @Orb2069 @felipe @ben

@kylotan @tdr @kkarhan @ben Correct. That is a problem I’m working on right now. I wouldn’t say it’s deliberately weak, just that implementing and enforcing new regulation takes time and all things considered, this is still a new ruleset.

@ben OpenAI are thieves - everybody knows - just jail 'em

@ben I sincerely hope the same doesn't happen to Wikipedia though. Also, I hope you backed up your answers somewhere else; it's understandable if you don't want Stack Overflow to have them, but they should be available elsewhere.

@Orb2069 @ljs @datarama @ben @felipe

Are you accusing me of being a bot? Kindly go fuck yourself.

I actually work with the technology and actively work _against_ the corporate powers trying to monopolize it.

You on the other hand are spewing jargon you do not understand in order to look smart, and fearmongering about something you know nothing about.

@AeonCypher @ljs @datarama @ben @felipe

What a strange non sequitur.
I wonder if you're actually trying to understand something, or if I should simply block you.

@Orb2069 @ljs @datarama @ben @felipe

Oh, so you're the bot...
It's the only explanation for a verbatim response like this.

@artemis @datarama @Orb2069 @felipe @ben
AI has a _projected_ energy consumption problem. This is a problem of and . Not a problem of AI as a technology.
You can run a Llama 3 model on a modern consumer graphics card.
