
Jonathan Corbet

For the curious, today's scraperbot attack on @lwn has run to well over 800,000 unique IP addresses in the last few hours.

We've made some tweaks that are holding it off for now, but it is ongoing and could go bad again at any time.

If you are a real user and are being turned away by the site, could you let me know what your user agent is?

@corbet @lwn Er, Firefox on MacOSX (work laptop I hasten to add).

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:147.0) Gecko/20100101 Firefox/147.0

@corbet @lwn Ugh, that's really rough. :(

@corbet @lwn "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/26.2 Safari/605.1.15"

@corbet @lwn "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36"

@corbet @lwn despite the attack your site is blazingly fast. Kudos!

I noticed that access to LWN timed out a few hours ago.

My user agent is: "Mozilla/5.0 (X11; Linux x86_64; rv:140.0) Gecko/20100101 Firefox/140.0"

@corbet @lwn Latest Firefox on Linux, whatever that string is. I did not get an error; it was slow, so I closed the tab.

@corbet @lwn I did not get an error, but the site took so long to open, I went to my private, highly unauthorized, personal archive of LWN weekly issues to grep for the name of a person I was trying to recall.

I'm glad to hear your technical countermeasures are helping, even if I was impatient.

@liw @lwn Ah, so you are part of the scraper problem :)

Seriously, though, our content is CC-licensed once it escapes the paywall, so your archive is entirely authorized in truth.

Countermeasures are helping for the moment; I do not expect it to be a long-lasting thing.

Closing in on 1M unique IPs this morning. The net is broken.

@corbet @lwn Not scraping as much as opening the weekly issue every week and saving it to a file with a Firefox extension (Save Page WE), and then putting that in a Git repository.

I admit I went back through the archives to backfill my collection.

Why? I wanted to count how many times you've quoted me. I'm that vain.

@corbet @lwn There's one issue where I'm mentioned in an HTML comment. That was a surprise.

@corbet getting a 403 with iPad:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/26.2 Safari/605.1.15

@corbet @lwn I typically read LWN through my RSS feed (love that LWN provides this service, btw, thank you!). Looks like the user agent is: “Mozilla/5.0 (compatible; Miniflux/2.2.16; +https://miniflux.app)”. Dug this up in the miniflux source code on GitHub. The “2.2.16” string may change depending on the miniflux version. Haven’t had any issues accessing content myself, but providing this info in case it helps others. Sorry about all the scraper attacks :(

@lwn We're up to nearly 1.2M IPs having attacked our server today. For now we've been able to make some changes and the situation appears to have stabilized; apologies to everybody who was blocked out of the site while this was going on.

@corbet @lwn thank you for doing this. I'm sorry that you have to.

@corbet @lwn I just read LWN and noticed nothing out of the ordinary: kudos!
