Conversation

Jonathan Corbet

The @lwn web site is currently under the most intense scraper attack I have seen yet. 1.3M unique IP addresses within the last couple of hours, and it's not done yet. The work we have done on defenses appears to be paying off, though; the server is holding up reasonably well — so far.

...just in case anybody wonders why I have a rather dim view of the whole AI industry...

@corbet @lwn

> ...just in case anybody wonders why I have a rather dim view of the whole AI industry...

Me too. Because (and this is just another instance of it) the greed of the few destroys life as it was for the many.

@corbet @lwn if you need more capacity, give a shout, I've a bit I can throw to the cause if you need it

@corbet @lwn same on @blenderartists right now with up to 3.7M requests/hour. Looks more like a DDoS in our case than AI scraping though (which I suspected at first)

@corbet @lwn I hate sites with only a paywall. I like open or your “open later” model far better.

But I’m surprised more sites haven’t gone paywall only at this point. How is anyone supposed to survive this?

(Longtime subscriber 👍)

@corbet @lwn
What's the user agent of the scrapers?
Are you using any captcha or cloud armor?

@corbet @lwn ouch - fingers crossed for you all!

I wonder if this is related to weather.gov going down earlier (and forecast.weather.gov still being down), or just coincidence?

Nick Silkey 📻 N5ILK 🪓🪵 🪣💧

@corbet thank you and the @lwn team for your service. ✌️💙

@corbet @lwn A few months ago one of my clients' sites was getting 5000 requests per second, about 10-20 requests from a single IP, all residential. Server held up quite well until /var filled up because the logs didn't rotate fast enough (daily rotation, /var is 16 GB and normally has around 13 GB free).

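For a rough sense of how fast access logs can eat the free space mentioned in the post above, here is a back-of-the-envelope sketch. The 5000 requests per second and the 13 GB of free space come from the post; the 250 bytes per log line is purely an assumed figure (real access-log lines vary a lot), and a GB is taken as 10^9 bytes.

```python
def hours_until_full(free_bytes: float,
                     requests_per_second: float,
                     bytes_per_log_line: float) -> float:
    """Hours until the free space is used up, assuming one log line per request."""
    bytes_per_second = requests_per_second * bytes_per_log_line
    return free_bytes / bytes_per_second / 3600


FREE_BYTES = 13e9              # "normally has around 13 GB free" (from the post)
REQUESTS_PER_SECOND = 5000     # from the post
ASSUMED_BYTES_PER_LINE = 250   # assumption, not from the post

if __name__ == "__main__":
    hours = hours_until_full(FREE_BYTES, REQUESTS_PER_SECOND, ASSUMED_BYTES_PER_LINE)
    print(f"/var would fill in roughly {hours:.1f} hours")
```

Under that assumed line size the partition fills in about three hours, well inside a daily rotation window. A size-based trigger (logrotate's `size` or `maxsize` directives, for instance) is one way to avoid depending on the clock alone.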

@StompyRobot on my site it’s 90% residential proxies that disguise themselves as some version of Chrome. Not easy to block. Some pass simple automated challenges. @corbet @lwn

@corbet @lwn Website operators should be especially vocal about this, because too many people who have a positive view of AI companies have no idea.

@StompyRobot @lwn User agent is whatever random fiction they choose to put in there; there is no useful signal there. We really don't want to inflict captchas or cloudflare or any of that onto our readers, so we've had to find other ways to defend the site.

@corbet@social.kernel.org @lwn@lwn.net Amazing. I'd be interested in a post describing what you folks did, lessons learned, etc. (Assuming one doesn't exist already!)

@BartV @corbet @lwn @blenderartists
Who would have a motive to DDoS LWN or Blender? Other than Microsoft and Adobe, of course.

Most likely those IPs are from residential proxies, so you can't do an easy filtering rule like "Block all IPs in AWS/GCP/Azure address spaces". There were revelations last week that half of all Smart TV apps include residential proxy SDKs.

@corbet @lwn I am in the process of rolling out the next major version of my WAF and I've connected to the Abuse IP DB, which I now use to short circuit all the rest of the tests if the score is >=75. It's killing about 95% of the incoming traffic, and the WAF is getting about 95% of the rest (largely through ASN-wide blocks; host an AI scraper and you're dead to me unless you're whitelisted.)

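The order of checks described in the post above might be sketched roughly as follows. This is not the poster's WAF: the `evaluate` function, the `Verdict` type, and the `later_checks` hook are hypothetical names, the reputation score is passed in rather than fetched (no AbuseIPDB API call is shown), and treating the whitelist as a set of IPs exempt from ASN blocks is an assumption. Only the >=75 threshold, the short circuit, and the ASN-wide blocking idea come from the post.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Optional

SCORE_THRESHOLD = 75  # from the post: short circuit when the score is >= 75


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str


def evaluate(ip: str,
             asn: int,
             abuse_score: int,
             blocked_asns: FrozenSet[int],
             whitelisted_ips: FrozenSet[str],
             later_checks: Iterable[Callable[[str], Optional[str]]] = ()) -> Verdict:
    """Decide whether to serve a request (hypothetical sketch)."""
    # 1. Reputation check first: a high score skips every remaining test.
    if abuse_score >= SCORE_THRESHOLD:
        return Verdict(False, "reputation")

    # 2. ASN-wide block, unless this client is whitelisted (assumed to be per IP).
    if asn in blocked_asns and ip not in whitelisted_ips:
        return Verdict(False, "asn")

    # 3. Remaining tests: each returns a block reason, or None to pass.
    for check in later_checks:
        reason = check(ip)
        if reason is not None:
            return Verdict(False, reason)

    return Verdict(True, "passed")


if __name__ == "__main__":
    # Documentation-only addresses and AS numbers, used purely as placeholders.
    blocked = frozenset({64500})
    allowed_ips = frozenset({"192.0.2.10"})
    print(evaluate("192.0.2.1", 64501, 80, blocked, allowed_ips))
    print(evaluate("192.0.2.10", 64500, 5, blocked, allowed_ips))
```

Putting the cheapest, highest-yield test first is the point of the short circuit. If both 95% figures in the post are taken at face value, about 0.05 × 0.05 = 0.25% of incoming traffic would get past both stages.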

@corbet @lwn
I had these a few months back on my sites. Had to send them to hell by blocking them.
