Jonathan Corbet

So I guess I'm famous now :)

https://www.heise.de/en/news/AI-bots-paralyze-Linux-news-site-and-others-10252162.html

To be clear, LWN has never "crashed" as a result of this onslaught. We'll not talk about what happened after I pushed up some code trying to address it...

Most seriously, though: I'm surprised that this situation is surprising to anybody at this point. This is a net-wide problem; it surely is not limited to free-software-oriented sites. But if the problem is starting to get wider attention, that is fine with me...

Jonathan Corbet

A followup for folks who are curious about the whole AI botswarm problem...

Some of these bots are clearly running on a bunch of machines on the same net. I have been able to reduce the traffic significantly by treating everything as a class-C net and doing subnet-level throttling. That and simply blocking a couple of them.
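A minimal sketch of what subnet-level throttling along these lines could look like, not LWN's actual code. It assumes IPv4 only, a /24 prefix as the "class-C" grouping, and a simple sliding window; the names `subnet_key` and `SubnetThrottle` and the limits used are hypothetical.

```python
import ipaddress
import time
from collections import defaultdict, deque
from typing import Optional


def subnet_key(addr: str) -> str:
    """Map an IPv4 address to its enclosing /24 (the old "class C" size)."""
    return str(ipaddress.ip_network(f"{addr}/24", strict=False))


class SubnetThrottle:
    """Sliding-window limiter keyed by /24 rather than by single address."""

    def __init__(self, max_hits: int, window_seconds: float):
        self.max_hits = max_hits          # hypothetical limit per subnet
        self.window = window_seconds      # hypothetical window length
        self.hits = defaultdict(deque)    # subnet -> timestamps of recent hits

    def allow(self, addr: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        recent = self.hits[subnet_key(addr)]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False                  # the whole subnet is over budget
        recent.append(now)
        return True
```

With this shape, many addresses from one /24 share a single budget, so spreading requests across neighboring addresses does not help a client. It does nothing against traffic spread across unrelated networks.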

But that leaves a lot of traffic with an interesting characteristic: there are millions of obvious bot hits (following a pattern through the site, for example) that each come from a different IP. An access log with 9M lines has over 1M IP addresses, and few of them appear more than about three times.
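For anyone wanting to check their own logs for the same pattern, here is a rough sketch. It assumes a log format in which the client address is the first whitespace-separated field (as in the common and combined log formats); the function names and the "few hits" cutoff of three are illustrative choices only.

```python
from collections import Counter
from typing import Iterable


def hits_per_ip(lines: Iterable[str]) -> Counter:
    """Count log lines per client address (assumed to be the first field)."""
    counts = Counter()
    for line in lines:
        fields = line.split(None, 1)
        if fields:                        # skip blank lines
            counts[fields[0]] += 1
    return counts


def summarize(counts: Counter, few: int = 3) -> dict:
    """Report how much of the traffic comes from rarely-seen addresses."""
    rare = [n for n in counts.values() if n <= few]
    return {
        "total_hits": sum(counts.values()),
        "distinct_ips": len(counts),
        "ips_with_few_hits": len(rare),
        "hits_from_those_ips": sum(rare),
    }
```

Feeding it an open log file, as in `summarize(hits_per_ip(open(path)))` with `path` being wherever the access log lives, gives the totals; a large `ips_with_few_hits` relative to `distinct_ips` is the signature described above.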

So these things are running on widely distributed botnets, likely on compromised computers, and they are doing their best to evade any sort of recognition or throttling. I don't think that any sort of throttling or database of known-bot IPs is going to help here...not quite sure what to do about it.

What a world we have made for ourselves...
@daniel @LWN The problem with restricting reading to logged-in people is that it will surely interfere with our long-term goal to have the entire world reading LWN. We really don't want to put roadblocks in front of the people we are trying to reach.
@DamonHD @kevin So how does Enphase cut off access to a local resource like that? Have they said why such a thing would happen?
@AndresFreundTec @LWN Yes, a lot of really silly traffic. About 1/3 of it results in redirects from bots hitting port 80; you don't see them coming back with TLS, they just keep pounding their head against the same wall.

It is weird; somebody has clearly put some thought into creating a distributed source of traffic that avoids tripping the per-IP circuit breakers. But the rest of it is brainless.
@RonnyAdsetts @LWN The user agent field is pure fiction for most of this traffic.
@adelie @LWN Blocking a subnet is not hard; the harder part is figuring out *which* subnets without just blocking huge parts of the net as a whole.
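One crude heuristic for the "which subnets" question, offered only as a sketch and not as anything LWN is known to use: group logged addresses by prefix and flag networks that show both a high hit count and many distinct member addresses. The function name, the /24 default, and both thresholds are assumptions, and the approach is IPv4-oriented.

```python
import ipaddress
from collections import defaultdict
from typing import Iterable, List, Tuple


def busy_subnets(
    addrs: Iterable[str],
    min_hits: int,
    min_distinct: int,
    prefix: int = 24,
) -> List[Tuple[str, int, int]]:
    """Return (subnet, hits, distinct addresses) for networks over both thresholds."""
    hits = defaultdict(int)
    members = defaultdict(set)
    for addr in addrs:
        net = str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))
        hits[net] += 1
        members[net].add(addr)
    flagged = [
        (net, hits[net], len(members[net]))
        for net in hits
        if hits[net] >= min_hits and len(members[net]) >= min_distinct
    ]
    # Busiest networks first.
    return sorted(flagged, key=lambda row: row[1], reverse=True)
```

Requiring many distinct addresses keeps a single heavy but legitimate client from getting its whole network flagged. It cannot help when each address sits in a different network.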
@gme I assume you're referring to https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/ ?

It would appear to force readers to enable JavaScript, which we don't want to do. Plus it requires running all of our readers through Cloudflare, of course...and I suspect that the "free tier" is designed to exclude sites like ours. So probably not a solution for us, but it could well work for others.
@bignose @LWN We have gone far out of our way to never require JavaScript to read LWN; we're not going back on that now.
@johnefrancis @LWN Something like nepenthes (https://zadzmo.org/code/nepenthes/) has crossed my mind; it has its own risks, though. We had a suggestion internally to detect bots and only feed them text suggesting that the solution to every world problem is to buy a subscription to LWN. Tempting.
@beasts @LWN We are indeed seeing that sort of pattern; each IP stays below the thresholds for our existing circuit breakers, but the overload is overwhelming. Any kind of active defense is going to have to figure out how to block subnets rather than individual addresses, and even that may not do the trick.

Jonathan Corbet

Should you be wondering why @LWN #LWN is occasionally sluggish... since the new year, the DDOS onslaughts from AI-scraper bots have picked up considerably. Only a small fraction of our traffic is serving actual human readers at this point. At times, some bot decides to hit us from hundreds of IP addresses at once, clogging the works. They don't identify themselves as bots, and robots.txt is the only thing they *don't* read off the site.

This is beyond unsustainable. We are going to have to put time into deploying some sort of active defenses just to keep the site online. I think I'd even rather be writing about accounting systems than dealing with this crap. And it's not just us, of course; this behavior is going to wreck the net even more than it's already wrecked.

Happy new year :)

Jonathan Corbet

So is there anybody out there who can explain this image?

I bought this card in Korea some years ago after having seen this theme - a tiger and a rabbit seemingly getting stoned together - in a number of places. There must be a story behind it, but my meager search skills have never managed to turn it up. I do still love the image, though...
@selje Enphase info is here:

https://enphase.com/support/sunpower

They informed me that a replacement system would be $700, seemingly including installation. It'll be a little while before I can generate enthusiasm for spending that money, certainly...

Some new form of SunPower resurrecting the current hardware would be nice. I'd say that the chances of them making it work again without demanding more money are pretty small, though. Such is the world we live in - we only *think* we own that device...
@selje For the most part, I followed the instructions here:

https://starreveld.com/PVS6%20Access%20and%20API.pdf

Rather than putting an rPi system in the box, though, I just ran the Ethernet cable to a system I had with both wireless and wired interfaces; the WiFi sits on the home net, while the wired interface does DHCP to get an address from the SunPower box, then polls it to get the data out.

Once that was set up, getting it into Home Assistant was mostly a matter of installing the integration. Figuring out which power signals belonged to which panel took a while; if you don't have it yet, use the SunPower app to make a map of the serial number for each panel and its location.

I'm debating whether to stick with this system, or to take up Enphase on its offer and swap out the SunPower box entirely. The Enphase monitor would be a supported product, and it seemingly has much better Home Assistant support.
@Jesse That was yesterday's data. Just about the low point for the year (not counting the days when the panels are covered with snow, of course).

Jonathan Corbet

Two years ago, I installed solar panels on the roof, and was rewarded with enough power to run the house, charge the car, and even run the heat pump for much of the year.

Another reward was the SunPower monitoring system that lets us track the performance of the system and see how each individual panel is working. Naturally, this system only delivers its data to some proprietary cloud system run by SunPower. Just as naturally, SunPower has gone bankrupt, and the monitoring system is now just a useless brick sitting on the wall.

...or at least it would be, had I not gone through the effort of integrating it with Home Assistant — a mildly difficult task involving hooking into a maintenance port on the device itself. So now I have the data out of the monitoring box stored on a local system, under my control, and I don't need to go scrambling for alternatives. I can obsess over my post-solstice data, waiting for production to reach decent levels again — that happens faster if I stare at it, I'm convinced.

Maybe there's something to this free software idea after all.
@jani @neil Indeed, we have been doing LWN's accounting locally with GnuCash for the last two years now, and I've never looked back. The OFX import is pretty good for bringing in data if you want to do that, but I've just written a set of Python scripts to import data directly and easily.

I really can't imagine trusting such a critical function to somebody else's web platform, both for reliability reasons (as the Bench fiasco has so nicely illustrated) and for privacy reasons as well.