Jonathan Corbet

So @lwn is currently under the heaviest scraper attack seen yet. It is a DDOS attack involving tens of thousands of addresses, and that is affecting the responsiveness of the site, unfortunately.

There are many things I would like to do with my time. Defending LWN from AI shitheads is rather far from the top of that list. I *really* don't want to put obstacles between LWN and its readers, but it may come to that.

(Another grumpy day, sorry)

@corbet @lwn this, combined with search engines prioritising the stolen content!
This is why I think the web is genuinely doomed. It's not enough to steal the content, or for search engines to kill click-throughs and ad revenues; they are literally killing the ability of original authors to serve the traffic to the few real users that might want to see it.
Devastating.

@corbet @lwn
As an avid longtime subscriber and reader, I can only give thanks and hope you will also survive this blast of willfully wrong behaviour. Thank you for your openness.

@corbet @lwn Any inkling which AI (Arsehole Incorporated) it is? The crash can't come soon enough.

@foxylad @lwn There is no way to know who is after the data. The actual attack is likely perpetrated by Bright Data or one of its equally vile competitors.

@corbet @lwn
Just speaking with my user hat on here, but given the circumstances I don't mind the ever-so-slight inconvenience of a challenge.

@corbet @lwn If you need help, email me. I can work with you in case there's low hanging fruit that you missed.

@corbet @lwn Obviously that sucks, but I am super happy with the RSS integration that I get with my lwn subscription. People who are affected by the outage should check that out. Not really a solution, but maybe part of one.

Ayush Agarwal (आयुष अग्रवाल)

@corbet @lwn I'm not sure how people in the kernel community reconcile using LLMs with the effect these LLMs have on small businesses and individuals hosting their websites for fun. It's not as if the kernel community itself isn't affected by these incessant DDoS attacks.

@corbet @lwn There is subscriber.lwn.net, which is only available to subscribers. One can either join the queue with the AI bots for lwn.net or subscribe and enjoy the snappy subscriber server.

I mean that's not a great solution, but it's the only one that works.

@corbet @lwn at this point we might as well be offensive. If the client seems even slightly sus, just send them gibberish data talking about how good Chihuahua muffins are. Ideally LLM-generated (yes, gross) because this doesn't add new information (linear algebra yay) and makes models collapse (aka AI inbreeding).

@corbet feel you. Same with my Podcast Directory

@cadey @corbet @lwn
I recently saw a traffic spike to a small HTML-only website that never had WP on it, but was suddenly getting failed wp-admin logins and hundreds of PHP vuln scans, non stop. All from MSFT IP addresses. Abuse reports were sent, but there was no response, and the abuse kept happening.
So now I'm blocking every MSFT CIDR block that I can find, server-wide.
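
To illustrate the CIDR-based blocking described above, here is a minimal Python sketch. It is not the poster's actual setup. The `BLOCKED_CIDRS` list and the `is_blocked` helper are hypothetical names, and the ranges are reserved documentation prefixes used as placeholders, not real provider address blocks. In practice this kind of filtering would more likely live in a firewall or web server configuration than in application code.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges
# (placeholders only), standing in for whatever CIDR blocks an
# operator decides to block.
BLOCKED_CIDRS = ["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"]

# Parse the CIDR strings once, up front.
BLOCKED_NETWORKS = [ipaddress.ip_network(cidr) for cidr in BLOCKED_CIDRS]


def is_blocked(addr: str) -> bool:
    """Return True if addr falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    # Only compare addresses and networks of the same IP version.
    return any(
        ip.version == net.version and ip in net
        for net in BLOCKED_NETWORKS
    )
```

A request handler could call `is_blocked(client_ip)` and refuse the request when it returns `True`. Blocking whole ranges is blunt: it also shuts out any legitimate users whose addresses fall in those ranges.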

@corbet @lwn I've been experiencing about 20x more website traffic than normal, myself. It's very likely scraper bot traffic as well. Things are holding, but only because I took pains to use static site generation (absolutely minimal JavaScript, designed to be lightweight).

@suihkulokki @lwn The problem with that solution is that it may well make it harder for us to bring in new subscribers, which is something we definitely want to do. First impressions matter, so giving new folks a poor experience seems ... not great.

It may yet come to that, though.

@corbet @lwn @suihkulokki

Maybe it doesn't need to be subscriber only, just registered users only? Which can also be a PITA, but if there's no enshittification for non-registered users other than the bandwidth being shared with bots, maybe it's tolerable? Could even have a banner about this explaining the benefits of registering, and how LWN won't sell your data.

@jani @lwn @suihkulokki Such things have crossed our minds, certainly. The gotcha there is that we've already had troubles with bots creating accounts; I don't think they would hesitate to do more of that if that would improve their access.

That and, of course, the fact that everybody starts as an unregistered user. As long as we can avoid making the experience worse for them, I think we should.

@corbet @lwn @suihkulokki

Yeah, it's hard to argue against that.

And maybe you weren't seeking "helpful" advice anyway, but, uh, you know your audience. :)

@jani @lwn @suihkulokki Suggestions are much appreciated! It's not as if we've figured all this stuff out...

@corbet @lwn The "harder to onboard new users" part is certainly one reason why that solution isn't great.

I just don't really see anything else working long term. Everything else is just kind of whack-a-mole where the mole keeps getting more clever.

@corbet @lwn @jani @suihkulokki I have a simple solution: Stop being so damn relevant!!!

Wait... 🤡

@mupuf @corbet @lwn @suihkulokki

I don't think the scrapers care about that, though.

@jani @corbet @lwn @suihkulokki Sorry, I was being too optimistic... I was thinking they wanted sources with high SNR... But you are probably right...

@corbet @lwn @jani @suihkulokki One day the photocopiers will get busy after office hours again, but this time it's going to be Linux Weekly News instead of the punk fanzines.
