@osm_tech You are definitely not alone: https://lwn.net/Articles/1008897/ The situation is not sustainable but I'm not sure what we do about it beyond waiting for the AI bubble to burst.

K. Ryabitsev-Prime 🍁

The Open Source Summit North America is in May this year, in lovely Minneapolis, where nothing is happening. Nosiree, nothing that would make a bunch of people think twice about attending.

@gael @lwn https://lwn.net/Articles/1008897/ was written about a year ago, but still pretty much describes the situation.
@dfs_comedy @lwn Take a look at Bright Data's web site sometime. They advertise "automatically avoid anti-bot measures and CAPTCHAs", and "150M+ diverse IPs from real user devices". But be happy because those IPs are "100% ethically-sourced".

They aren't the only ones, and others are surely less overt about what their business is. But it would be a place to start.

Jonathan Corbet

As of the last count, @lwn has been hit by 1.6 million unique IP addresses since yesterday morning. We have managed to stabilize the site against that level of attack, but it is still annoying.

If only we could get them all to subscribe.

I do find myself wondering if there isn't material for a good class-action lawsuit here. We are far from the only ones having to cope with this crap. I'm not normally much of a fan of the US class-action lawsuit machine, but extracting money from the Bright Datas of the world to make some lawyers richer doesn't sound like an entirely bad proposition.
@lwn We're up to nearly 1.2M IPs having attacked our server today. For now we've been able to make some changes and the situation appears to have stabilized; apologies to everybody who was blocked out of the site while this was going on.
@liw @lwn Ah, so you are part of the scraper problem :)

Seriously, though, our content is CC-licensed once it escapes the paywall, so your archive is, in truth, entirely authorized.

Countermeasures are helping for the moment; I do not expect it to be a long-lasting thing.

Closing in on 1M unique IPs this morning. The net is broken.

Jonathan Corbet

For the curious, today's scraperbot attack on @lwn has run to well over 800,000 unique IP addresses in the last few hours.

We've made some tweaks that are holding it off for now, but it is ongoing and could go bad again at any time.

If you are a real user and are being turned away by the site, could you let me know what your user agent is?
@dmarti @jzb That is indeed an interesting thought. Of course, there's more than just Bright Data out there... Another idea might be an app you could put on a phone that would tell you how much your device is being used to attack others.
@trademark CPU primarily when things get really crazy. More CPU is easily arranged, of course, but it is irritating as hell to have to pay for that to feed our hard-written articles to those people.
@trademark Making things worse for real users is something we have gone far out of our way to avoid. I'm not sure that sharding in that way would help much, though; cache isn't really the problem.
@jani @lwn @suihkulokki Suggestions are much appreciated! It's not as if we've figured all this stuff out...
@jani @lwn @suihkulokki Such things have crossed our minds, certainly. The gotcha there is that we've already had troubles with bots creating accounts; I don't think they would hesitate to do more of that if that would improve their access.

That and, of course, the fact that everybody starts as an unregistered user. As long as we can avoid making the experience worse for them, I think we should.
@suihkulokki @lwn The problem with that solution is that it may well make it harder for us to bring in new subscribers, which is something we definitely want to do. First impressions matter, so giving new folks a poor experience seems ... not great.

It may yet come to that, though.
@bert_hubert @lwn Today's attack on LWN was a good 250K addresses. Gotta download all those articles from 2010, just in case they changed somehow...

Something has to be done about this, but I sure don't know what. They are using other people's devices, so they don't really care about burning some CPU time on Anubis challenges - and they have evidently learned to do that.

Sometimes I think we need to just toss the net and start over.
@foxylad @lwn There is no way to know who is after the data. The actual attack is likely perpetrated by Bright Data or one of its equally vile competitors.

Jonathan Corbet

So @lwn is currently under the heaviest scraper attack seen yet. It is a DDoS attack involving tens of thousands of addresses, and it is affecting the responsiveness of the site, unfortunately.

There are many things I would like to do with my time. Defending LWN from AI shitheads is rather far from the top of that list. I *really* don't want to put obstacles between LWN and its readers, but it may come to that.

(Another grumpy day, sorry)

Jonathan Corbet

So somebody took a couple of LWN's recent conference articles, threw them into an LLM blender with some other stuff, and produced ... this ...

https://www.webpronews.com/linux-kernels-future-tab-integrates-rust-navigates-ai-boosts-collaboration/

Google News propagates that stuff - something they have long refused to do with LWN's original material. But somehow we're supposed to continue to exist to feed material into that machine?

Sorry, having a grumpy day.
@ljs Dunno, you have to get past the docs maintainer ... but more to the point, why is he calling you a toady? :)