It feels quite uncomfortable that cloudflare is somewhat openly admitting to analysing login credentials that are going through the reverse proxy, and providing aggregated stats on it (without explicit consent of the user it appears?)
Based on Cloudflare's observed traffic between September - November 2024, 41% of successful logins across websites protected by Cloudflare involve compromised passwords.
Don't get me wrong, the results are actually pretty interesting, but I just cannot think of an ethical way of doing this, and it feels kind of jarring that they just "did that"
https://blog.cloudflare.com/password-reuse-rampant-half-user-logins-compromised/
@benjojo This is a thing you have to flick on in the WAF dashboard, it doesn’t happen automatically.
@privateger One of the throwaway test domains I have appears to have "Account abuse detection" enabled on it, at least according to the analytics. I have not opted into such scanning, and this is a free plan domain. I think they enabled it for free users by default?
@benjojo doesn't this go directly to stats collected for maintenance and operation of the system? I thought that was exempted under GDPR.
@mark I don't think this is a GDPR concern, but I am not a lawyer, I am suggesting that a very widely deployed reverse proxy seemingly out of the blue suddenly actively doing analysis of the login credentials passing through it does not provoke very good feelings
@benjojo Oooh, interesting. Can you check the WAF settings on a free account?
@privateger Seems to be yeah, I can load up the dashboard area, I will admit it's been a very long time since I've been in this area of the CF dash (or really in the CF dash at all), I spent maybe 3 years in it when I wrote a lot of the cloudflare WAF from 2015 to 2017 (ish) :)
@benjojo hold the fuck up, does that mean they collect usernames and plaintext passwords?!
Because how else would they know the passwords were compromised?
@amberage to the best of my understanding they have code that:
1) Looks in POST parameters for things that look like login credentials
2) Runs the password over something that looks like a bloom filter
3) Adds a header to the request (on its way to the actual server) to tell it that the password is known to be compromised, and I guess they increment some counter on their backend analytics.
If they are recording the username/password into their infra, that would be completely insane and probably a very fast destruction to their business. I would assume they are not doing that.
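For anyone who wants to picture the three steps above, here is a minimal Python sketch. It is purely illustrative and not Cloudflare's actual code: the field names, the `BloomFilter` class, the `inspect_login` helper, and the `X-Hypothetical-Leaked-Password` header are all made up for this example. The point is only that a proxy can flag a known-leaked password without keeping the password anywhere.

```python
import hashlib
from urllib.parse import parse_qs


class BloomFilter:
    """Tiny bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k bit positions by hashing the item with k different prefixes.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical list of form fields that "look like" a password.
PASSWORD_FIELDS = ("password", "passwd", "pass", "pwd")


def inspect_login(body, leaked):
    """Return extra headers to attach to the proxied request (assumed names)."""
    params = parse_qs(body)  # step 1: parse the x-www-form-urlencoded body
    headers = {}
    for field in PASSWORD_FIELDS:
        for value in params.get(field, []):
            if value in leaked:  # step 2: bloom filter membership test
                # step 3: tell the origin server, store nothing else
                headers["X-Hypothetical-Leaked-Password"] = "1"
    return headers
```

Usage would be something like building the filter once from a breach corpus (`leaked.add("hunter2")`) and then calling `inspect_login("username=a&password=hunter2", leaked)` per request. A real deployment would presumably size the filter for billions of entries and accept a small false positive rate.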
@benjojo that sounds much more reasonable, but also, this is "willingly hosts nazi forums and doxxing websites" Cloudflare, so I am extending absolutely zero good faith or trust towards them.
@benjojo (plus, I always thought passwords aren't POST sent in plaintext to the actual website; I always assumed the front-end – that is, the website in my browser – already does the hashing and salting and only sends the result from my browser to the server. Distressing to find out that's not the case.)
@amberage almost every website that i have ever looked at sends the password in plain text wrapped in a POST x-www-form-urlencoded.
There is not much point in hashing the password on the browser end, given that hash would just become a proxy of the password, and the channel the POST is sent through is assumed to be secure (https/tls etc).
Also, given that a lot of hashing should now be done with things like bcrypt or argon2, it would require the website to disclose the salt (which should be random bytes) to the user, and that would probably then also disclose the existence of an account (something that is typically not desirable)
If you have a compromised browser/tls proxy it is already game over, a hashing bit of JS isn't going to help you (and it will just prevent users with javascript disabled from logging into the website)
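To make the "hash just becomes a proxy of the password" point concrete, here is a small Python sketch. Everything in it is hypothetical (the `client_hash` and `server_login` helpers, the `STORED` table, the unsalted SHA-256 standing in for whatever the browser-side JS would do). It assumes a server that accepts the client-computed hash as the credential.

```python
import hashlib
import hmac


def client_hash(password):
    # What a hypothetical browser-side script would send instead of the password.
    return hashlib.sha256(password.encode()).hexdigest()


# Hypothetical server-side store: it only ever sees the client-computed hash.
STORED = {"alice": client_hash("correct horse")}


def server_login(username, submitted_hash):
    expected = STORED.get(username)
    if expected is None:
        return False
    # Constant-time comparison of the submitted hash against the stored one.
    return hmac.compare_digest(expected, submitted_hash)


# An attacker who can read the POST body never learns "correct horse",
# but replaying the captured hash logs in just the same.
captured = client_hash("correct horse")
replay_works = server_login("alice", captured)
```

In this setup `replay_works` is true, so whoever sits in the middle gains exactly as much from the hash as they would from the plaintext. The protection has to come from the TLS channel, not from the hashing.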
@benjojo and what of websites that, for reasons passing understanding, still only do HTTP? :o
@amberage none that really matter, Chrome and friends now ring the scary alarm bells if you try and enter data into an HTML form that sends data to an HTTP endpoint
@benjojo I'm pretty sure this is opt-in at the site owner level. https://developers.cloudflare.com/waf/detections/leaked-credentials/
@cthos It appears to be enabled by default for non paying sites, there is a wider discussion in replies here https://benjojo.co.uk/u/benjojo/h/cR4dJWj3KZltPv3rqX
@benjojo Do password hashes count as personal data? I don't think so, but I may be wrong.
@matclab @benjojo technically they do from a philosophical point of view, at least hashed+salted username pairs plus the site they belong to
but since you are willingly opting in to proxy your traffic through cloudflare and they are providing you a service, i would assume it is legal for them to collect and keep them, at least they can use that argument in court if it ever pops up
I don't think the "leaked credentials detection" product is a red flag per se. Maybe the automatic enablement of it is a can of worms (the reason being that people do not typically think that their web proxy is going to snoop credentials, even if it is not storing the full outputs of that snooping)
There's probably another set of discussions to be had about the data source of these leaked credentials inevitably being from actual data breaches of other people's stuff! Though this is basically the commercial exploitation of stolen user data, it is probably for the greater good to use such leaks (however dubiously obtained) to detect leaked credentials in the future.
My wider comment about all of this is that it seems relatively unsettling for a company to be very confidently showing off data outputs that have been derived from non-explicitly-consensual snooping of passwords. They are almost certainly not storing the passwords themselves (because any breach of that would probably be a company ending event), but it shows a level of hubris which is perhaps a little alarming.
I don't think any of this is a GDPR problem (other than the obvious question of an american owned company snooping the user submitted data of your requests that likely has other PII in it to provide a WAF/etc) but none of this is new to cloudflare.
It's worth stepping back a bit and acknowledging that there is a reason that people use cloudflare. It's because the product is actually kind of good, it solves a bunch of problems for people in a cheap and reasonable way. I don't think there's any foul play going on in the widespread adoption of cloudflare, it's more that people will choose what is convenient, and cloudflare is mighty convenient.