social.kernel.org

Conversation

OpenStreetMap Ops Team

osm_tech@en.osm.town

If you write about the messy reality behind "free" internet services: we're seeing #OpenStreetMap hammered by scrapers hiding behind residential proxy/embedded-SDK networks. We're a volunteer-run service and the costs are real. We'd love to talk to a journalist about what we're seeing + how we're responding. #AI #Bots #Abuse

BrianKrebs

briankrebs@infosec.exchange

Reply to @osm_tech@en.osm.town

@osm_tech Hey. Sorry to hear about that. Drop me a line on Signal? username: briankrebs.07. Thanks!

Jonathan Corbet

corbet

Reply to @osm_tech@en.osm.town

@osm_tech You are definitely not alone: https://lwn.net/Articles/1008897/ The situation is not sustainable but I'm not sure what we do about it beyond waiting for the AI bubble to burst.

Codeberg

Codeberg@social.anoxinon.de

Reply to @osm_tech@en.osm.town

@osm_tech You're absolutely not alone, we wish you good luck! ~n

sjvn

sjvn@mastodon.social

Reply to @osm_tech@en.osm.town

@osm_tech Tell me more. You can reach me at sjvn01 <at> gmail.com

Baloo Uriza

BalooUriza@social.tulsa.ok.us

Reply to @osm_tech@en.osm.town

@osm_tech I wonder if there's a way to fail2ban requests coming in faster than typically found in human requests.
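The detection half of that idea can be sketched as a sliding-window counter per client IP. This is a minimal illustration, not how fail2ban itself is implemented; the class name `RateDetector` and the threshold values are assumptions chosen for the example.

```python
from collections import defaultdict, deque


class RateDetector:
    """Flag clients whose request rate exceeds a chosen threshold.

    Hypothetical helper: the defaults below are illustrative guesses,
    not measured values for human browsing.
    """

    def __init__(self, max_requests: int = 30, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> request timestamps in the window

    def record(self, ip: str, now: float) -> bool:
        """Record one request; return True if the client is over the limit."""
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

A caller would feed `record()` the client address and a timestamp for each request and act on a `True` result, for example by adding the address to a ban list.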

OpenStreetMap Ops Team

osm_tech@en.osm.town

Reply to @BalooUriza@social.tulsa.ok.us

@BalooUriza We use fail2ban with custom rules to handle some of this, but fail2ban eventually becomes a bottleneck once the list passes 100,000 IP addresses.

Cassandrich

dalias@hachyderm.io

Reply to @osm_tech@en.osm.town

@osm_tech @BalooUriza For IPv4, a bitmask of the entire address space is a viable "efficient" implementation of blocking. I wonder if there are tools that can do it that way rather than needing a gigantic list.
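For scale: one bit per IPv4 address is 2^32 bits, which is 536,870,912 bytes (512 MiB) as a flat array. The sketch below shows the same bit-per-address lookup, but allocates the bitmap in 4 KiB pages on demand so the example stays light. The class name `IPv4Bitmap` and the page size are assumptions for illustration, not an existing tool.

```python
import ipaddress

PAGE_BITS = 1 << 15                  # 32,768 addresses per page (4 KiB of bits)
FLAT_BITMAP_BYTES = (1 << 32) // 8   # size of a flat bitmap covering all of IPv4


class IPv4Bitmap:
    """One bit per IPv4 address, with pages allocated lazily (hypothetical helper)."""

    def __init__(self):
        self._pages = {}  # page index -> bytearray of PAGE_BITS // 8 bytes

    @staticmethod
    def _locate(ip: str):
        """Map a dotted-quad address to (page index, byte offset, bit mask)."""
        n = int(ipaddress.IPv4Address(ip))
        page, bit = divmod(n, PAGE_BITS)
        return page, bit >> 3, 1 << (bit & 7)

    def block(self, ip: str) -> None:
        page, byte, mask = self._locate(ip)
        buf = self._pages.get(page)
        if buf is None:
            buf = self._pages[page] = bytearray(PAGE_BITS // 8)
        buf[byte] |= mask

    def unblock(self, ip: str) -> None:
        page, byte, mask = self._locate(ip)
        buf = self._pages.get(page)
        if buf is not None:
            buf[byte] &= ~mask & 0xFF

    def is_blocked(self, ip: str) -> bool:
        page, byte, mask = self._locate(ip)
        buf = self._pages.get(page)
        return buf is not None and bool(buf[byte] & mask)
```

Lookup cost is constant regardless of how many addresses are blocked, which is the property the post is pointing at.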

utf_7@mastodon.social

Reply to @osm_tech@en.osm.town

@osm_tech What is an embedded-SDK network?

AliveDevil

AliveDevil@tauri.earth

Reply to @utf_7@mastodon.social

@utf_7 @osm_tech

App developers can embed some "Sdk" into their apps or games.
The developer receives money.
The "Sdk"-provider proxies requests through these apps and games, to gain residential IPs.
And scrapers can buy these services, to tunnel their requests from residential IPs.

AliveDevil

AliveDevil@tauri.earth

Reply to @AliveDevil@tauri.earth

Viewer Discretion, example of these Sdks

@utf_7 @osm_tech

This gets ugly really fast, if you want to see the full extent: <https://alternativeto.net/software/netnut-proxy-network/> for a list of _known_ residential proxy-providers.

Cassandrich

dalias@hachyderm.io

Reply to @AliveDevil@tauri.earth

@AliveDevil @utf_7 @osm_tech So ridiculous that Google and Apple won't just permaban any developer embedding one of these "SDKs".

InsertUser

InsertUser@en.osm.town

Reply to @osm_tech@en.osm.town

@osm_tech The proxy SDK providers need to be treated like the DDoS providers they are and prosecuted.

Andrew Zonenberg

azonenberg@ioc.exchange

Reply to @InsertUser@en.osm.town

Edited 4 months ago

@InsertUser @osm_tech Pulling them from app stores and banning developers of the SDKs would be a good start. Save the criminal charges for after the damage control is done.

Pietervdvn

pietervdvn@en.osm.town

Reply to @osm_tech@en.osm.town

@osm_tech ugh.... why don't they use the exports...

InsertUser

InsertUser@en.osm.town

Reply to @pietervdvn@en.osm.town

@pietervdvn Because that would involve a human using their brains or having a shred of conscience and those both go against the basic principles of the companies doing this.

Cassandrich

dalias@hachyderm.io

Reply to @InsertUser@en.osm.town

@InsertUser @pietervdvn @osm_tech It goes against their whole ideology. The ideology says trust the machine to do what it copied from scraped Stack Overflow posts. If you try to intervene to make it do better, you're not trusting it.

Cassandrich

dalias@hachyderm.io

Reply to @dalias@hachyderm.io

@osm_tech @BalooUriza Like, a bitmask of IPv4 space is several times smaller than a Chrome instance. 🙃 🤡

AliveDevil

AliveDevil@tauri.earth

Reply to @dalias@hachyderm.io

Edited 4 months ago

@dalias I'd wish for them to enforce policies, but they get Ad- and IAP-revenue, so why bother.

Also, these "Sdks" probably have kill-switches (or rather, delayed activation) built-in, to not immediately contact their C&C servers.

Cassandrich

dalias@hachyderm.io

Reply to @AliveDevil@tauri.earth

@AliveDevil Yes but they could still be banned when caught. A few devs getting banned would be a big deterrent for others to ship this malware.

The right *technical* defense, however, is not to allow apps arbitrary network access unless they're declared in the manifest as a "browser" or other "client software" that the user can use with any service they want (like IRC clients, mail clients, Mastodon clients, etc.).

Instead, the manifest should declare a single domain the app can contact, or multiple if the developer is willing to pay for more intensive vetting of them, and only allow network access to the declared domain(s).
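The check such a policy implies can be sketched as a simple domain allowlist. This is a hypothetical illustration only: the function name `host_allowed` is invented here, and whether subdomains of a declared domain would be permitted is an assumption, not something the post specifies.

```python
from typing import Iterable


def host_allowed(host: str, declared_domains: Iterable[str]) -> bool:
    """Return True if `host` is a declared domain or a subdomain of one.

    Hypothetical policy check; allowing subdomains is an assumption.
    """
    host = host.lower().rstrip(".")
    for domain in declared_domains:
        domain = domain.lower().rstrip(".")
        # Match on a label boundary so "evil-example.org" does not pass
        # as a subdomain of "example.org".
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

Under this rule an app that declares only `example.org` could not be used to relay requests to arbitrary third-party sites, which is what a proxy SDK needs to do.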

OpenStreetMap Ops Team

osm_tech@en.osm.town

Reply to

@LMieldazis @geerlingguy oooh do we get to show him our out-of-band (remote access) Raspberry Pi with dual power feeds, 4G modem and loads of serial connections? Saved our skin a good few times.

Jeff Geerling

geerlingguy@mastodon.social

Reply to @osm_tech@en.osm.town

@osm_tech @LMieldazis would love to talk maps ops! I've seen many projects wrapping in map data and adding scripts to dl entire regions

MAgetröte

magezwitscher@det.social

Reply to @dalias@hachyderm.io

@dalias @BalooUriza But that is one of the points @osm_tech are making in their post. These crawlers resort to using massive amounts of "scrapers hiding behind residential proxy/embedded-SDK networks", meaning they are using adware-infested phones all over the world for their scraping attacks. So banning IP ranges won't help much. Playing cat-and-mouse with these scrapers is resource-intensive, which is increasingly hard for FOSS projects and is also driving up costs for commercial offerings.

Cassandrich

dalias@hachyderm.io

Reply to @magezwitscher@det.social

@magezwitscher @BalooUriza @osm_tech Not ranges. Just the single IP, and a short-lived ban. All you need to do is get them down from thousands of requests per minute to one request per hour (because they get banned for an hour each time they start again).
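A per-address ban that lapses on its own can be sketched as a table of expiry times. The class name `TemporaryBans` is an assumption for illustration; the one-hour default follows the figure in the post.

```python
class TemporaryBans:
    """Per-IP bans that lapse after a fixed duration (hypothetical helper)."""

    def __init__(self, ban_seconds: float = 3600.0):
        self.ban_seconds = ban_seconds
        self._expires = {}  # ip -> time at which the ban lapses

    def ban(self, ip: str, now: float) -> None:
        """Ban one address for `ban_seconds`, starting at `now`."""
        self._expires[ip] = now + self.ban_seconds

    def is_banned(self, ip: str, now: float) -> bool:
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if now >= expiry:
            # Expired entries are removed lazily so the table does not
            # grow without bound.
            del self._expires[ip]
            return False
        return True
```

Because entries expire, the table holds only addresses that misbehaved within the last hour rather than every address ever seen.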

soaproot

soaproot@sfba.social

Reply to @corbet

@corbet @osm_tech I don't have answers either, but I hope something emerges, because waiting for the bubble to burst may still run into the "the market can remain irrational longer than you can remain solvent" problem.

JP

froztbyte@mastodon.social

Reply to @osm_tech@en.osm.town

@osm_tech might be a thing @davidgerard could do on pivot

David Gerard

davidgerard@circumstances.run

Reply to @froztbyte@mastodon.social

@froztbyte @osm_tech yeah i'm getting the same AI assholes

as is @RationalWiki (i'm the sysadmin trying to keep the site up in the face of the hammering - we can either lose Google search listing, or we can be literally unusable for humans)

as is @corbet at Linux Weekly News - OSM might be relevant to LWN, a free content project getting hammered by the AI bots

they botnet suburban Android boxes

covered it a bit previously on Pivot:

https://pivot-to-ai.com/2025/06/02/fighting-the-ai-scraper-bots-at-pivot-to-ai-and-rationalwiki/
https://pivot-to-ai.com/2025/09/07/the-ai-scraper-bots-are-hammering-pivot-to-ai-again-please-test/

David Gerard

davidgerard@circumstances.run

Reply to @sjvn@mastodon.social

@sjvn @osm_tech do contact sjvn!

JP

froztbyte@mastodon.social

Reply to @davidgerard@circumstances.run

@davidgerard @osm_tech @RationalWiki @corbet Also getting and handling them (as you know), but I’d be pretty interested to hear how bigger projects have to handle them

Quick check on latest status since last #iocaine restart: 1.49TB across 1.05B requests served

they never ever stop…

The Orange Theme

theorangetheme@en.osm.town

Reply to @davidgerard@circumstances.run

@davidgerard @froztbyte @osm_tech @RationalWiki @corbet An aside, but I had no idea you keep Rational Wiki running! I love that site. Thank you for all your hard work! I'm sorry the slopbros are trying to ruin it.

David Gerard

davidgerard@circumstances.run

Reply to @theorangetheme@en.osm.town

@theorangetheme @froztbyte @osm_tech @RationalWiki @corbet i quit the sysadmin job nine years ago, so of course i still have it
