Director of Linux Foundation IT. Currently in charge of kernel.org infra.

This account is for Linux/Kernel/FOSS topics in general: #linux, #kernel, #foss, #git, #sysadmin, #infrastructure.

For my personal account, please follow @monsieuricon@castoranxieux.ca.

Montréal, Québec, Canada 🇨🇦🇺🇦

K. Ryabitsev 🍁

I'm aware of Anubis and I'm afraid proof-of-work intermediaries are going to become the only way to deal with bots.

However, I don't like Anubis's general approach. I would prefer to have something built into varnish with some more logic that allows for more nuance. If there is a local cached page, allow the request. If there isn't, but the load/RAM usage is low, let the request through. If the load is high or if we're seeing lots of 503's, only then require proof-of-work.
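The decision order described above can be sketched in a few lines. This is plain Python rather than Varnish VCL, and it is not an existing Varnish feature: the `ServerState` fields, the `decide` function, and every threshold are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    """Snapshot of backend health; all fields are hypothetical."""
    load_ratio: float        # 1-minute load average divided by CPU count
    mem_used_ratio: float    # fraction of RAM in use, 0.0 to 1.0
    recent_503_ratio: float  # share of recent responses that were 503s


# Illustrative thresholds, not tuned values.
MAX_LOAD = 0.7
MAX_MEM = 0.8
MAX_503 = 0.05


def decide(is_cached: bool, state: ServerState) -> str:
    """Return "allow" or "challenge" for one incoming request."""
    # A cached page costs the backend nothing, so always serve it.
    if is_cached:
        return "allow"
    # Lots of 503s or high load/RAM: only now require proof-of-work.
    if (state.recent_503_ratio > MAX_503
            or state.load_ratio > MAX_LOAD
            or state.mem_used_ratio > MAX_MEM):
        return "challenge"
    # Cache miss, but the backend has headroom: let the request through.
    return "allow"
```

With this ordering, most visitors never see a challenge; it only appears on cache misses while the backend is already struggling.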

FOSS infrastructure is under attack by AI companies https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/ Please boost for awareness and reach, and to publicly shame Microsoft, Meta, OpenAI, Perplexity and other such AI companies.


K. Ryabitsev 🍁

In good news, I figured out what needed to happen so we don't share the same /64 with all other Linode systems in the same datacentre, which gets @spamhaus off our back.
That's why I keep this server: for bitching about life and multilingual dad jokes that only 1-2 people following me would get.

K. Ryabitsev 🍁

All Norwegian birds look fugl-y.
@esgariot @rails Yes, it would work, but would it be an acceptable trade-off? That's not clear. Right now, I'm leaning towards setting up separate, authentication-required duplicates for some services that I can give to maintainers and developers, but that, again, is capitulating and admitting that the open web has failed.
@rails There is not. There is, in fact, no reliable way to distinguish legitimate requests from bot traffic if you're only looking at logs or packets. The only way to reliably tell is by getting yourself into the page rendering client. E.g. this is what happens when you get CloudFlare's "prove you're not a bot" screen -- they use javascript to collect information about your browser and to watch the pointer behaviour to figure out if you're a bot or not (plus, massive amounts of data they have internally on your IP address).
@mariusor Everything that used to work no longer does. 🤷 First, we rate-limited by IP, but they switched to using public cloud farms. Next, we banned based on user-agent, but they started using a generic user-agent. Then, we started banning on "the same" user agent per number of requests, but that never really worked very well, and they switched to varied user-agents. Next, we started banning whole subnets and ASNs, but they switched to using residential IPs. This is where we are now -- bots descend on your public resource from tens of thousands of IPs from all over the world, with reasonably recent, varied user-agents, with any one IP sending no more than 1-2 requests. It's clearly all bot traffic, because there's clearly nobody who is going to be suddenly interested in random commits from 5 years ago, or in random conversations on linux-fsdevel from 9 years ago, but it's impossible to turn this logic into a reliable "no, you are a bot, go away" action without turning to fronting services or various anti-bot captchas.
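A toy illustration of why per-IP rate limiting stops helping once each address sends only one or two requests. This is a minimal sketch, not anything kernel.org runs: the `blocked_ips` helper, the client labels, the paths, and the limit of 10 are all made up for the example.

```python
from collections import Counter


def blocked_ips(requests, limit):
    """Return the set of client addresses that sent more than `limit` requests."""
    counts = Counter(ip for ip, _path in requests)
    return {ip for ip, n in counts.items() if n > limit}


# One client hammering the server: a per-IP limit catches it.
single_source = [("192.0.2.1", f"/commit/{i}") for i in range(1000)]

# The same 1000 requests spread over 1000 clients, one request each.
# "client-N" stands in for a distinct residential IP address.
distributed = [(f"client-{i}", f"/commit/{i}") for i in range(1000)]

print(blocked_ips(single_source, limit=10))
print(blocked_ips(distributed, limit=10))
```

The first call flags the single noisy address; the second flags nothing, even though the backend serves the same 1000 requests in both cases.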
@algernon The gist of the problem is that it is impossible to identify "known bots." Yeah, there's a subset of requests that clearly identify themselves as "LLMWhatnotBot 1.x", but if you read Drew's article, the vast majority of traffic is one or two requests from random IPs with generic browser user-agents. There is no reliable way of telling them apart from legitimate requests. The only viable solution is to put everything behind CloudFlare or Fastly or Akamai and let them protect you against bot traffic, but *that is not a win*. That's capitulating and admitting that the open web has failed.
FYI, Drew isn't making it up in this article. At any given time, if you check what I'm doing, chances are I'm trying to figure out ways to deal with bots.

https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html

K. Ryabitsev 🍁

I know I haven't been able to work on b4 and other tooling as much as I was hoping, but between the Equinix exodus, having to continuously mitigate against LLM bot DDoS'ing our infra, and just general geopolitical sh*t that lives rent-free in my head... it's been difficult. But I have high hopes and lots of good ideas -- that's got to count for something, right?

Supply Chain Attacks on Linux distributions (Fenrisk)

https://lwn.net/Articles/1014741/

Please stop externalizing your costs directly into my face

https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html

"Whether it’s cryptocurrency scammers mining with FOSS compute resources or Google engineers too lazy to design their software properly or Silicon Valley ripping off all the data they can get their hands on at everyone else’s expense… I am sick and tired of having all of these costs externalized directly into my fucking face. Do something productive for society or get the hell away from my servers"
@swapgs This talk may help -- it's about things we've thought about. https://www.youtube.com/watch?v=K3SVt1WCheY
@swapgs @1ns0mn1h4ck Free infra assessment? Yes please. Just give me a heads-up first. :)

K. Ryabitsev 🍁

Donald Trump proudly demonstrates all the nothingburgers he got from Russia during his call.

LLM crawlers are aggressively destroying important community infrastructures but sadly there is not an easy fix. Still: Blocking those crawlers should be high on your list of todos

(Original title: LLM crawlers continue to DDoS SourceHut)

https://status.sr.ht/issues/2025-03-17-git.sr.ht-llms/


K. Ryabitsev 🍁

Other than FSF Europe, what other free software nonprofits are there that I can send people to if they don't want to contribute to a US-based entity?