x.x.x.x - - [10/Nov/2024:00:02:37 +0000] "GET / HTTP/1.1" 301 162 "-" "okhttp/4.9.0"
You know what’s interesting about this log line? It repeats 56,686,963 times in www.kernel.org logs for yesterday, across 4 nodes. That’s about 700 times a second, and this has been going on for months.
These requests aren’t intentionally malicious – they issue a simple GET /, receive their 301 redirect, and terminate the connection. From what I can tell, this is some kind of appliance or software installed on mobile clients that uses “can I reach www.kernel.org” as a network test.
This wouldn’t be that big of a deal – a single plaintext “GET /” that triggers an immediate 301 is very cheap for us to generate, but the number of these requests has been steadily growing.
If you have any idea what this is and how to make it stop, please reach out?
@monsieuricon How to make it stop? Let any requests with this user agent just time out, then the connectivity check becomes useless 😈
@monsieuricon I wonder what happens if you add an iptables reject rule matching on that specific user agent for a week or so 🦧💡
@monsieuricon Perhaps OS fingerprinting with Wireshark can tell you if these are from the same device/OS or not, which could narrow things down?
@monsieuricon I'd expect this for google.com but not kernel.org. Interesting.
@monsieuricon I instantly think about microsoft and WSL, but that's just prejudice. I recall, however, working for Symantec in 2009, a QA pal being amazed at how many "needless" connections win spawned.
@R1Rail @monsieuricon my predecessor had to figure this kind of thing out a while ago, got a paper out of it. https://pages.cs.wisc.edu/~plonka/lisa/lisa2003/lisa-netgear-sntp.pdf. happy hunting Mr.Icon
@monsieuricon
If it's an app it's presumably capable of being updated.. I wonder if you can convince it the check has failed after you've seen the request headers? E.g. send an RST? Do that for 10% of requests (so as not to be a complete a-hole) and someone might notice and update...
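A minimal sketch of the "fail about 10%" idea, written as a variant: instead of sampling individual requests at random, it picks a stable 10% of client addresses by hashing them, so the same client fails consistently and the breakage is easier for someone to reproduce. The function name `should_fail`, the `salt` value, and the decision to key on the client IP are all assumptions made for illustration; actually sending the RST would be up to the proxy or firewall in front.

```python
import hashlib


def should_fail(client_ip: str, fraction: float = 0.10, salt: str = "2024-11-10") -> bool:
    """Return True for a stable `fraction` of client addresses.

    The salt is hypothetical; changing it (for example daily) rotates
    which clients fall into the failing group.
    """
    digest = hashlib.sha256(f"{salt}:{client_ip}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction
```

Because the decision depends only on the salt and the address, every node behind a load balancer would reach the same verdict without sharing state.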
@bencardoen @tony
@monsieuricon People pick the weirdest things for their connectivity tests. https://phabricator.wikimedia.org/T273741 remains the weirdest I've seen though, where an unnamed app decided to use a random picture of a flower and ended up causing ~20% of the traffic to a Wikimedia Foundation datacenter.
@monsieuricon @djh you could possibly start returning 404 for that user agent.
@monsieuricon
OkHttp appears to be a Curl alternative for Android so I'm guessing that someone used you in an example of how to use it in some guide or other. This is why example.com exists ffs!
@monsieuricon Not saying this is the culprit but this code seems to do the same thing:
https://github.com/TeamNewPipe/NewPlayer/blob/89d6f16872f656dd62e47320d9cfd904f087b601/test-app/src/main/java/net/newpipe/newplayer/testapp/TestMediaRepository.kt#L108
@monsieuricon have you checked if those clients request any other assets from *.kernel.org
Can you tell when exactly this started maybe?
I'd honestly just drop all responses to that user agent. Or delay them by ~30 seconds.
@monsieuricon
Have you seen this? It mentions an okhttp 4.9.0 critical vulnerability? Might be related?
https://github.com/strimzi/strimzi-kafka-operator/issues/6934
@monsieuricon We (Dataplane.org) see lots of the okhttp agents fetching more than /, particularly from cloud/search companies like microsoft and google.
And github.com/square/okhttp, which you may have already discovered, seems to be a web client that "perseveres when the network is troublesome".
@monsieuricon Is there any way to automatically identify these requests & return a 403 instead?
@monsieuricon If it's that common, do you just need to find a friendly company/org with a large wifi setup and ask them to look at which devices are making kernel.org connections and see if they correspond to any start-of-MAC manufacturer codes?
@monsieuricon another option is to let them fail in certain timeframes, maybe every tuesday, and see who is going to start crying 👀
@penguin42 @monsieuricon Most mobile devices default to randomised locally-assigned MACs these days, good for anti-fingerprinting in adversarial situations but makes diagnostics a right pain
@astraleureka @monsieuricon Hmm, that's unfortunately very sensible. I guess then if you're lucky with a large org (or a conference??) you might be able to get back to particular users and ask nice ones. Much trickier though.
@monsieuricon Ooof... that's not fun indeed!
Boosted and hopefully it'll get resolved soon
@monsieuricon I would lie. Keep track of the IPs doing that, and if they keep at it reply 404. That way the devices will misbehave and whoever has shipped that code will get a bug report, and eventually fix their code.
@monsieuricon might want to check if your upstream provider has DOS protection options available which would blackhole the traffic before they hit your network.
@feld @monsieuricon it's used by the vast majority of android apps, react native or not
@huitema @monsieuricon yeah, and return a response body that explains why
@monsieuricon If you return html with an img tag, does it load it?
Does it run script?
@monsieuricon mhhh I remember from my android dev days that okhttp is a very popular library for HTTP client stuff on android.
But it's not restricted to android. This could potentially be coming from any Java or Kotlin based program.
The growing install base kind of seems to indicate a popular android app though, yes.
On the other hand...it would be fucked up if what you're seeing, is traffic coming from a growing bot net 😅
@justin, but okhttp is just a Java HTTP client library, in particular popular on Android, there's nothing wrong with it per se.
@monsieuricon Drop the requests and see what starts burning.
@monsieuricon Offline the server for half an hour and see who is complaining ?
@johntimaeus @rami @monsieuricon Delaying them by 30s at 700/s would add 20,000 TCP connections at any time, that may be way harder on the system than the GETs themselves. A 429 error with Retry-After would do something similar without that load.
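The estimate above is Little's law (connections held open = arrival rate × hold time). A rough sketch of that arithmetic, plus what a bare 429 with Retry-After could look like on the wire; the function names and the one-hour retry value in the usage note are illustrative assumptions, not anything kernel.org is known to run.

```python
def concurrent_connections(arrival_rate_per_s: float, hold_time_s: float) -> float:
    """Little's law: average connections open = arrival rate x time each is held."""
    return arrival_rate_per_s * hold_time_s


def too_many_requests_response(retry_after_s: int) -> bytes:
    """Build a minimal raw HTTP/1.1 429 response carrying a Retry-After header."""
    return (
        "HTTP/1.1 429 Too Many Requests\r\n"
        f"Retry-After: {retry_after_s}\r\n"
        "Content-Length: 0\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


# 700 requests per second, each held for 30 seconds.
print(concurrent_connections(700, 30))  # 21000
```

Something like `too_many_requests_response(3600)` costs about as much to send as the existing 301, whereas the delay approach keeps roughly 21,000 sockets open at once. Whether the client honours Retry-After is a separate question.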
@monsieuricon just blocking it based on the user agent for a few hours as a brown out test. and then fully blocking it later on would be my approach.
similarly why google limits ICMP to 8.8.8.8
@chrysn @johntimaeus @monsieuricon haproxy has a "silent drop" feature for example 😎 https://www.haproxy.com/blog/use-haproxy-response-policies-to-stop-threats
@chrysn @johntimaeus @rami @monsieuricon
Unilaterally close the connection without sending a RST | FIN
@monsieuricon @bladecoder NewPipe dev here: NewPlayer is a standalone lib which is currently under development. It is intended to be NewPipe's next media player framework, but has not been integrated in NewPipe yet. What you have linked here is the test app for the new player. It is not used except by <10 devs to test their changes. If you want me to, I can change the address to something else though.
@monsieuricon This sounds eerily like another consumer device manufacturer thinking your site would be one of those that's always up and running, much like FreeBSD developer Poul-Henning Kamp discovered D-Link had done to his timeserver way back when - see eg https://www.theregister.com/2006/05/11/d-link_time_dispute_settlement/.
Happy bozo hunting!
@monsieuricon Have you talked to the okhttp team see if they have any ideas from their user data who it might be and if they can push a block into their library.
@chrysn @johntimaeus @rami @monsieuricon You only have to delay a small random subset of them by 60 seconds to create a random really annoying response lag in the app. Even more so if you take a small subset and serve them a few characters per second for 5 mins 8)
If it handled 429 with a retry then you could have also issued a 429 and blackholed them for a while. Alas not it seems.
@monsieuricon @josh I cross posted this on r/androiddev and someone suggested that this could coincide with the recent (beta) release of OxygenOS 15, which I doubt because you said this has been going on for months...
However OxygenOS 14 has started its widespread rollout in March 2024. Maybe that fits your timeline better?
I guess some OnePlus user on OxygenOS 14 could check with Wireshark 🤔
@monsieuricon first thing i'd check is if that's just something the library does by default, okhttp is this: https://github.com/square/okhttp
@tobigr @monsieuricon @bladecoder I'd recommend using a URL you control for testing purposes. You never know what will happen with something like kernel.org, from causing traffic to changes making your tests break.
@ross @tobigr @monsieuricon @bladecoder Also, devs copy and paste code all the time, so even though YOUR codebase is only directly used by a few people, someone might copy it into a production app. There have been several network overload type issues over the years. The worst I know of is NTP on home gateways which took over a decade to resolve.
@penguin42 @astraleureka @monsieuricon Despite random MAC addresses, device fingerprinting is still very easy, though slightly intrusive. Worse still, iOS "resets" the setting that turns static MAC addresses off with every upgrade, which screws up things like restaurant point-of-sale systems which rely on one iPad being "the master" at a static IP address (vs dynamically finding it)
@pitrh @monsieuricon Oh! I knew about the Linksys NTP bug way back when; I hadn't heard D-Link made a similar mistake! That's just so embarrassing.
@monsieuricon Indeed it's anecdotal and the only thing I could find on public GitHub.
I wouldn't be surprised to learn that it originates from shady Chinese phone firmwares.
@trouble @ross @monsieuricon @bladecoder As I already said, the repo linked is far from a state in which it could be used in production, let alone in a separate app.
Side note: we replaced the reference to kernel.org and now use our own domain
@marshray @monsieuricon Heh. Overnight bot army. Feed it a crypto mining script to help cover the costs.
@puppygirlhornypost2 @monsieuricon Or 1.1.1.1. You want a connectivity test? Ping the thing that boasts itself as one of the fastest DNS endpoints in the world, besides they did say loads of people were already doing exactly that.
Few suggestions:
- we can check which HTTP headers are being sent, they might show some patterns
- we can look it up with a shodan.io search
@monsieuricon @djh The nice part of making these fail is that, if this isn't malicious, it might cause enough of a problem for the people who implemented it that they might be forced to fix it. This reasoning still applies if you do that in Fastly rather than on your server.
@monsieuricon HTTP status code roulette for this specific user agent?
@monsieuricon my guess is that if you'll start dropping such connections (via user agent or any other means) you'll quickly find out what it is 😈
@rami @monsieuricon Or since it's likely to be mobile apps, see how much data you can send in the response.
@monsieuricon let’s start a world wide infrastructure collapse by deadholing all those IPs :) (it’s Java btw)
@monsieuricon @djh total non-geek wild guess here: is there one IP that sends more than others? Maybe that leads you to the mobile app *developer*?
@monsieuricon @djh so, another crazy question: could an AI crunch it? Or would that be ecologically disastrous?
#outsideTheBox 😂
@monsieuricon Easy way to contact them: replace the 301 redirect with a 4xx and an HTML body that says “please do not use this site as a health check”
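A minimal sketch of that suggestion as a plain WSGI callable, assuming the match is done on the `okhttp/` user-agent prefix from the log line and only for `GET /`. The 403 status, the `app` name, and the redirect target are illustrative choices; in practice this would more likely live in the web server or CDN configuration than in application code.

```python
# Prefix taken from the user agent in the log line ("okhttp/4.9.0").
BLOCKED_AGENT_PREFIX = "okhttp/"
NOTICE = (
    b"<html><body><p>Please do not use this site as a health check.</p>"
    b"</body></html>"
)
# Hypothetical redirect target standing in for the real 301 destination.
REDIRECT_TARGET = "https://www.kernel.org/"


def app(environ, start_response):
    """Answer okhttp probes of "/" with a 403 and a notice; redirect everyone else."""
    agent = environ.get("HTTP_USER_AGENT", "")
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "/")
    if agent.startswith(BLOCKED_AGENT_PREFIX) and method == "GET" and path == "/":
        start_response("403 Forbidden", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(NOTICE))),
        ])
        return [NOTICE]
    start_response("301 Moved Permanently", [
        ("Location", REDIRECT_TARGET),
        ("Content-Length", "0"),
    ])
    return [b""]
```

For a local experiment this can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library. The notice only helps if a developer ever inspects the response body, since the devices themselves will most likely just treat it as "network down".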
@monsieuricon
Could you just block this type of request wholesale and wait until someone starts screaming?
@anthropy
@monsieuricon I would just make the matching user agent receive an error, that will make the message go through
@monsieuricon I find it hard to believe I'd be thinking of something you haven't already considered, but, in the off chance you haven't:
1) Do some analysis on the IP addresses causing this - are they coming from a particular geographic area? Are there any patterns, groupings, or clustering of addresses?
2) Dig deeper into the full request being made. Collect the full set of headers and not just the summary line. Does that show any identifying information?
3) Is any sizable portion of these requests subsequently followed by more valid requests from the same address?
4) Is there any pattern to the timing of these requests by location?
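Points 1 and 3 can be approximated from the access logs alone. A rough sketch, assuming the logs are in the combined format shown in the original post and that client addresses are IPv4; the regex, the `/24` grouping, and the `summarize` name are illustrative choices. Point 2 needs full header capture, which an access log line does not contain.

```python
import re
from collections import Counter

# Combined log format, matching the sample line from the original post.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines, agent_prefix="okhttp/"):
    """Return (probe counts per /24 prefix, probing IPs that also sent other requests)."""
    by_prefix = Counter()   # point 1: clustering of source addresses
    probe_ips = set()
    other_ips = set()       # point 3: addresses seen making any non-probe request
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        ip = m["ip"]
        is_probe = (
            m["agent"].startswith(agent_prefix)
            and m["method"] == "GET"
            and m["path"] == "/"
        )
        if is_probe:
            probe_ips.add(ip)
            # Drop the last octet to group by /24 (IPv4 assumption).
            by_prefix[ip.rsplit(".", 1)[0]] += 1
        else:
            other_ips.add(ip)
    return by_prefix, probe_ips & other_ips
```

Called as `summarize(open("access.log"))` (file name hypothetical), `by_prefix.most_common(20)` gives a first look at clustering, and the returned set shows which probing addresses also behaved like ordinary visitors. Mapping prefixes to networks or regions would need an external ASN or GeoIP dataset.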