Conversation

K. Ryabitsev 🍁

I know I haven't been able to work on b4 and other tooling as much as I was hoping, but between the Equinix exodus, having to continuously mitigate against LLM bot DDoS'ing our infra, and just general geopolitical sh*t that lives rent-free in my head... it's been difficult. But I have high hopes and lots of good ideas -- that's got to count for something, right?
FYI, Drew isn't making it up in this article. At any given time, if you check what I'm doing, chances are I'm trying to figure out ways to deal with bots.

https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html

@monsieuricon have you tried routing known bots to a garbage generator to contain them?

To my surprise, doing that had a massive impact on my infra, in a good way. Rate-limiting known bots and serving them a static page didn't work; they came back in disguise. Serving them small amounts of cheaply generated garbage in the form of an infinite maze seems to have satisfied them.

It doesn't catch all of them, but my backend suddenly sees a lot less traffic, about 4-5 million fewer requests a day - and I'm just a nobody. That's 99% of my traffic that now never reaches the backend.

The perpetrators are the same, though, so maybe something similar might help you, too?
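For anyone curious what that can look like, here is a minimal sketch of such an infinite maze using only Python's standard library; the wordlist, the /maze/ path, the link count, and the port are made up, and a real setup would sit behind whatever rule matches the bots in the first place:

```python
# Minimal sketch of an infinite garbage maze (standard library only).
# Every path deterministically produces cheap filler text plus links
# deeper into the maze, so crawlers never run out of pages to fetch.
import hashlib
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

WORDS = ["kernel", "patch", "series", "thread", "branch", "merge", "queue", "lore"]

class MazeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Seed the RNG from the path so each URL is stable but looks unique.
        rng = random.Random(hashlib.sha256(self.path.encode()).digest())
        filler = " ".join(rng.choice(WORDS) for _ in range(200))
        links = " ".join(
            f'<a href="/maze/{rng.getrandbits(64):x}">{rng.choice(WORDS)}</a>'
            for _ in range(10)
        )
        body = f"<html><body><p>{filler}</p><p>{links}</p></body></html>".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # don't let the tarpit flood your own logs

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), MazeHandler).serve_forever()
```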

@monsieuricon can these methods be made public, or could that lead to the assholes becoming bigger assholes?

@algernon The gist of the problem is that it is impossible to identify "known bots." Yeah, there's a subset of requests that clearly identify themselves as "LLMWhatnotBot 1.x", but if you read Drew's article, the vast majority of traffic is one or two requests from random IPs with generic browser user-agents. There is no reliable way of telling them apart from legitimate requests. The only viable solution is to put everything behind CloudFlare or Fastly or Akamai and let them protect you against bot traffic, but *that is not a win*. That's capitulating and admitting that the open web has failed.

@monsieuricon But our investors and shareholders! You can't block us, think about our profit.

On a serious note: Is it possible to identify who the offending companies are? And somehow take the war to their turf?

@monsieuricon I read it, and that was my experience too (unidentifiable agents from random IPs) while I was trying to block them or serve them static content. The moment I started serving known agents infinite garbage, the unidentifiable random IPs pretty much disappeared. That was surprising, but it's been like that for two months straight now.

(I did have to special-case Alibaba, because they never identified themselves, so I match them by range, which is a bit unfortunate but sadly necessary.)
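A range match of that sort can be as simple as the sketch below; the CIDR blocks are documentation placeholders, not Alibaba's actual allocations:

```python
# Match a crawler by network range when it never identifies itself.
# These CIDRs are placeholders (TEST-NET ranges), not real allocations.
import ipaddress

BLOCKED_NETS = [
    ipaddress.ip_network(n) for n in ("198.51.100.0/24", "203.0.113.0/24")
]

def is_blocked(remote_addr: str) -> bool:
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in BLOCKED_NETS)
```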

@mariusor Everything that used to work no longer does. 🤷 First, we rate-limited by IP, but they switched to using public cloud farms. Next, we banned based on user-agent, but they started using a generic user-agent. Then, we started banning the same user-agent once it exceeded a number of requests, but that never really worked very well, and they switched to varied user-agents. Next, we started banning whole subnets and ASNs, but they switched to using residential IPs.

This is where we are now -- bots descend on your public resource from tens of thousands of IPs from all over the world, with reasonably recent, varied user-agents, with any one IP sending no more than 1-2 requests. It's clearly all bot traffic, because there's clearly nobody who is going to be suddenly interested in random commits from 5 years ago, or in random conversations on linux-fsdevel from 9 years ago, but it's impossible to turn this logic into a reliable "no, you are a bot, go away" action without turning to fronting services or various anti-bot captchas.
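For reference, the first step in that sequence amounts to roughly the sliding-window check below, keyed on client IP; the window and threshold are made-up numbers, and as described above it stops working once the traffic is spread across tens of thousands of addresses sending 1-2 requests each:

```python
# Naive per-IP sliding-window rate limiter -- the first mitigation step,
# defeated as soon as each IP only ever sends one or two requests.
import time
from collections import defaultdict, deque

WINDOW = 60.0  # seconds
LIMIT = 30     # requests allowed per IP per window

_hits = defaultdict(deque)

def allow(ip):
    now = time.monotonic()
    q = _hits[ip]
    while q and now - q[0] > WINDOW:
        q.popleft()
    if len(q) >= LIMIT:
        return False
    q.append(now)
    return True
```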
@rails There is not. There is, in fact, no reliable way to tell legitimate requests from bot traffic if you're only looking at logs or packets. The only way to reliably tell is by getting yourself into the page-rendering client. E.g. this is what happens when you get CloudFlare's "prove you're not a bot" screen -- they use javascript to collect information about your browser and to watch the pointer behaviour to figure out whether you're a bot or not (plus the massive amounts of data they hold internally on your IP address).

@monsieuricon @rails would some combination of a proof-of-work challenge like Xe’s Anubis and, for other protocols (or for legitimate clients not using javascript), requiring authentication be an acceptable tradeoff? I know it’s making normal users pay the price for the bot traffic, but maybe it’s only until it dies down?
(I have no technical expertise in the matter, so it may not make any sense)
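The proof-of-work idea, in rough outline, looks like the sketch below -- this is the generic scheme, not Anubis's actual protocol: the server hands out a random challenge and only serves the page once the client has spent CPU finding a nonce whose hash has enough leading zero bits.

```python
# Generic proof-of-work sketch: the client must find a nonce such that
# sha256(challenge + nonce) starts with DIFFICULTY_BITS zero bits.
import hashlib
import os

DIFFICULTY_BITS = 20  # arbitrary; tune so solving takes a second or two

def make_challenge():
    return os.urandom(16).hex()

def valid(challenge, nonce, bits=DIFFICULTY_BITS):
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0

def solve(challenge, bits=DIFFICULTY_BITS):
    nonce = 0
    while not valid(challenge, nonce, bits):
        nonce += 1
    return nonce
```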

@esgariot @rails Yes, it would work, but would it be an acceptable trade-off? That's not clear. Right now, I'm leaning towards setting up separate, authentication-required duplicates of some services that I can give to maintainers and developers, but that, again, is capitulating and admitting that the open web has failed.

@monsieuricon @mariusor There is always the option to poison your pages with links invisible to users that would lead crawlers into an infinite amount of random BS, no?
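A sketch of what such an invisible link could look like, generated server-side; the CSS offscreen trick and the /maze/ path are illustrative assumptions, not anyone's actual setup:

```python
# Generate a link that sighted users and screen readers won't notice,
# but that crawlers following raw hrefs will happily descend into.
import secrets

def hidden_maze_link():
    token = secrets.token_hex(8)
    return (
        f'<a href="/maze/{token}" '
        'style="position:absolute;left:-9999px" '
        'aria-hidden="true" tabindex="-1">archive</a>'
    )
```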

@monsieuricon @rails @esgariot At some point it feels like it will become admin-time effective to switch from an IP blocklist to an allowlist. Though, yes, that still gives up on the open web. :(

@monsieuricon Same here, really. Fortunately, blocking Chrome and Firefox user agents that are years out of date seems to have addressed the problem for MacPorts for now, but who knows how long this will actually last.

Unfortunately this affects a few legitimate users on legacy systems, too, but that's a price I'm afraid I have to pay now.
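Presumably that boils down to something like the version check below; the cutoff numbers are placeholders rather than MacPorts' actual policy:

```python
# Reject browser user agents whose major version is years out of date.
# Cutoff numbers are illustrative placeholders.
import re

MIN_MAJOR = {"Chrome": 110, "Firefox": 110}

def is_ancient(user_agent):
    for browser, min_major in MIN_MAJOR.items():
        m = re.search(rf"{browser}/(\d+)", user_agent)
        if m and int(m.group(1)) < min_major:
            return True
    return False
```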

@monsieuricon @algernon what if you just add links on every page to slow them down? I have an instance of Nepenthes that has served over 1M pages so far this month from a single link on a personal GH page with literally no traffic (and a robots.txt that disallows crawling).

Even if they still crawl the real pages, as long as they fall for the tarpit URLs they'll waste far more time there, not counting the satisfaction of polluting their model.
