Jonathan Corbet

One of those little details that, probably, only I care about ... a year ago, when dealing with AI scraper problems, I observed that almost all of the traffic came from IPv4 addresses — millions of them. Use of IPv6 was a pretty strong indication that there was a human involved.

Now, when we get a heavy attack wave, it is strongly dominated by IPv6 addresses; the bots seem to actively prefer IPv6.

I wonder if it's because IPv6 addresses are more likely to remain unique through NAT boxes, giving these sleazy people yet more IP addresses to bring down web sites with?

@corbet my current understanding of this problem is more or less "because mobile devices and shady proxy providers".

@corbet Yeah could be more addresses due to really badly done rate-limiters / blocking tools.

Or because it's easier to obtain blocks of IPv6 addresses and they're running out of IPv4 addresses?
@lanodan @corbet I think it's simpler. The large tech companies making all of these AI scrapers have gotten massive and underutilized IPv4 assignments because of "big tech company" privileges. So they simply abuse these massive v4 ranges since the engineers find it easier.

Meanwhile hostile actors probably buy or compromise cheap VPSes all over the place which tend to have much higher IPv6 availability than IPv4.
@akosiaris @lanodan @corbet oh yeah, there are sites where you can buy compromised IPv6 addresses in bulk.

Some users even "willingly" give access to their networks. This is what the Hola VPN extension does, for example: you get a free VPN, and in return they can sell access to traffic through your home connection to anyone who wants to pay for it.

@corbet Not everyone has access to a large pool of IPv4 addresses. Perhaps the new scrapers are therefore just different entities?

@akosiaris @quad @lanodan Ah I hadn't seen that report, thanks for sharing that! I've been developing an understanding of that economy for a while, but this fills in a lot of blanks.
@quad @lanodan Ah that's something I hadn't thought about. One VPS can easily host a big set of IPv6 addresses, appearing to come from a lot of different sources. *That* seems like a relatively easy thing to detect. Hmm...

@corbet IPv6 privacy "extensions" seem to be always on by default now, which means random addresses for the same host. Can you observe IPv6 addresses clustering in ranges/subnets? If you can identify such subnets, you could treat each one like a single IPv4 address in whatever countermeasures you use.
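
The idea of treating an IPv6 subnet like a single IPv4 address could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the /64 prefix length is a guess at a sensible grouping (hosting providers may hand out larger blocks such as /56 or /48), and the names `subnet_key` and `distinct_addresses_per_subnet` are hypothetical, not taken from any tool mentioned in this thread.

```python
import ipaddress
from collections import defaultdict


def subnet_key(addr: str, v6_prefix: int = 64) -> str:
    """Return the key a rate limiter or blocklist could count against.

    IPv4 addresses are used as-is; IPv6 addresses are collapsed to their
    enclosing prefix (assumed /64 here), so that one host cycling through
    random privacy addresses still maps to a single key.
    """
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return str(ip)
    # strict=False lets us pass a host address and get its network back.
    return str(ipaddress.ip_network(f"{ip}/{v6_prefix}", strict=False))


def distinct_addresses_per_subnet(addrs, v6_prefix: int = 64) -> dict:
    """Count how many distinct source addresses were seen under each key."""
    seen = defaultdict(set)
    for addr in addrs:
        seen[subnet_key(addr, v6_prefix)].add(ipaddress.ip_address(addr))
    return {key: len(ips) for key, ips in seen.items()}


if __name__ == "__main__":
    # Documentation-range addresses (RFC 3849 / RFC 5737), not real traffic.
    sample = [
        "2001:db8:1:2::1",
        "2001:db8:1:2:abcd::2",
        "2001:db8:1:3::1",
        "192.0.2.7",
        "192.0.2.7",
    ]
    for key, count in distinct_addresses_per_subnet(sample).items():
        print(key, count)
```

A key with an unusually large number of distinct addresses behind it would be a candidate for the "one VPS hosting many addresses" pattern; what counts as unusually large depends on the site's own traffic.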
