So after the @lwn post on being hammered by scrapers today, I ran an analysis on what I thought was a recent phenomenon: a query from what tries to pass as a browser, from an IP address that does exactly *1* query in a 24-hour period. You can't filter an IP address that makes just one visit. Turns out this happens a lot, sometimes 250k unique single-use addresses/day!
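A minimal sketch of that kind of count, not the actual analysis: it assumes an access log where the client IP is the first whitespace-separated field (as in the common/combined log formats) and that the lines passed in already cover one 24-hour window. The function name `single_hit_ips` and the sample lines are made up for illustration, and it does not check the User-Agent at all.

```python
from collections import Counter

def single_hit_ips(log_lines):
    """Return the set of client IPs that appear exactly once in log_lines.

    Assumes the client IP is the first whitespace-separated field of each
    line and that the caller has already limited the lines to one day.
    """
    hits = Counter()
    for line in log_lines:
        fields = line.split(None, 1)
        if fields:  # skip blank lines
            hits[fields[0]] += 1
    return {ip for ip, count in hits.items() if count == 1}

# Made-up sample lines using documentation-only IP ranges.
sample = [
    '192.0.2.1 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '192.0.2.1 - - [01/Jan/2025:00:05:00 +0000] "GET /feed.xml HTTP/1.1" 200 900',
    '198.51.100.7 - - [01/Jan/2025:03:10:42 +0000] "GET /a HTTP/1.1" 200 300',
    '203.0.113.9 - - [01/Jan/2025:09:30:15 +0000] "GET /b HTTP/1.1" 200 300',
]

once = single_hit_ips(sample)
print(len(once), "addresses made exactly one request")
```

On a real log you would pass the open file object instead of `sample`, since the function only needs an iterable of lines.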
@bert_hubert @lwn
Is there any pattern to the addresses?
I heard some rogue crawlers use cheaply made "free to play" mobile game apps that mainly serve as a bot platform to send queries from hard-to-block residential IP space.
@bert_hubert Ah yes. The so-called “residential proxies” aka botnets. I wrote about them a while ago at https://jan.wildeboer.net/2025/04/Web-is-Broken-Botnet-Part-2/ @lwn @kevin
@bert_hubert maybe a lot of self-hosted rss readers? /j
@SolarDavy nope, nothing like that. They also check far more than once a day!
@bert_hubert yeah, it was a very bad joke (I added the joke modifier).
I mean, self-hosted rss is soo niche, I would love it if it wasn't
@SolarDavy it is not as niche as you might think! I get a shitload of RSS queries!
@bert_hubert Same. Requests for feed.xml are in the thousands per day on my web server hosting my blog. Makes me feel good :) @SolarDavy
@jwildeboer @bert_hubert do you know if they're from self-hosted rss clients (for example miniflux)? Or more stuff like Feedly?
@SolarDavy Majority is other Mastodon servers, NetNewsWire, FreshRSS, Fever, Akregator. @bert_hubert