Will the webmasters run a script to dereference all the URL shorteners? They can.
@corbet
Good. I think those shorteners created more problems than they solved.
@corbet ripgrep query that finds 'goo.gl' links in a directory, e.g. archive of mail or social media dump:
rg -oNI --no-heading -e 'https?://goo\.gl/[0-9a-zA-Z/]+'
@corbet this makes me wonder whether the Wayback Machine tracks shortened links properly…
since I guess that would be a way to recover them after it gets shut off, albeit in a very annoying way
@tofugolem @corbet sure, but Google must have the resources to simply make it read only rather than breaking all those links
@corbet Recently watched a talk by @textfiles about losing our history. And apparently link shorteners were already a problem back then.
"URL shorteners are the stupidest idea we've come up with in the last 10 years."
@clarfonthey @corbet Sort of; URLTeam (part of ArchiveTeam) has been continuously archiving link shorteners: https://wiki.archiveteam.org/index.php/URLTeam, and although not a part of the Internet Archive, the crawls *do* end up there I believe
@clarfonthey @corbet yes, they do work properly inside WBM in most cases. There is a long history of doing this for dead or dying link shorteners: https://wiki.archiveteam.org/index.php/URLTeam
@corbet another Google Graveyard…and another reason to stop using Google products as much as possible.
@corbet I think the vast majority of these are from syzbot emails and many are the same, a link to the syzbot docs. Something like https://lore.kernel.org/all/?q=nq%3Agoo.gl+and+NOT+%28f%3Asyzbot+OR+s%3Asyzbot%29 returns only about 700 emails
@corbet Developing that interstitial page and writing that blog post has to be more work than keeping the service running forever! Fricking Google.
(Unless another team deprecated the infrastructure it runs on.)
@corbet We deserve this for relying on proprietary services frivolously.
weird. that’s perfect for analytics and they love data
@corbet @darrell73 If there's no image description attached, I'm not doing anything with it. I thought I'd gotten rid of this left-performative profile by unfollowing. Apparently I still have to put in a block. Too bad.
@corbet@social.kernel.org
World needs to take the hint and stop relying on Google for anything.
@corbet I guess there's an "easy" fix. Scrape all of the URLs to get the redirection and edit the links in the history.
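A minimal sketch of that "easy" fix, assuming the shortlinks have already been collected into a list fed in on stdin (the function name and the tab-separated output shape are my own invention, nothing Google provides):

```shell
#!/bin/sh
# Sketch: expand shortlinks read from stdin (one URL per line) into
# "short<TAB>target" pairs on stdout, without following the redirect.
resolve_shortlinks() {
    while IFS= read -r url; do
        # curl -s: silent; -I: HEAD request; -o /dev/null: discard headers;
        # -w '%{redirect_url}': print the redirect target curl extracted.
        target=$(curl -sI -o /dev/null -w '%{redirect_url}' "$url")
        printf '%s\t%s\n' "$url" "$target"
    done
}
```

You'd pipe the deduplicated output of the earlier `rg` search into `resolve_shortlinks > mapping.tsv`, ideally with a politeness delay before pointing it at the live service.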
@andrewt @tofugolem @corbet They did, they made it read-only 6 years ago. And now they're giving another 12 months warning, adding a little pain to links using it, even more chance for those using it to do something about it.
@corbet FWIW I captured a snapshot of those (non-syzbot) redirects here: https://github.com/vegard/vegard.github.io/blob/master/linux/2024-07-19/goo.txt
Of course some (many?) of the redirected-to URLs are themselves already defunct...
@corbet This seems to happen every month or so. It's amazing to me that people still keep going back for more. 295 products/services & counting!
Google Graveyard - Killed by Google
https://killedbygoogle.com/
@corbet I feel more vindicated self hosting everything I can every passing day.
@hatter @andrewt @tofugolem @corbet the point is it breaks every archived post that ever used it, it's a huge loss
@shiri @corbet ArchiveTeam has software you can run on your computers to help archive all kinds of services that are about to shut down, and one of their long-term projects (URLTeam) archives URL shorteners (from what I can tell, goo.gl isn’t currently being actively archived, but I assume that’ll change soon)
@corbet hopefully someone resolves all of them and stores them in an index somewhere
@vitriolix @andrewt @tofugolem @corbet Someone maintains goo.gl, someone maintains those archives. When the people still maintaining the shortener stop caring about the shortener, it's time for the archivist to do the work to preserve what they care about. Also, other archivists are doing what they can to preserve all links, regardless of immediate value that anyone else gives to each link. Likely very little will be lost in such a long deprecation cycle, and even less of value will be lost.
@corbet thankfully, ArchiveTeam had a long-going project that collected millions of those URLs, so they won’t be lost.
my infra helped at some point, I was running archiveteam runners for quite a while :3
@a1ba@suya.place I literally always thought they were both a security and a longevity risk, and I'm not glad to see that I'm right. Curse Twitter for making people feel the need to shorten their URLs so much. I've seen several other smaller shortener services dying over the years but this is the worst one.
@corbet I always wonder about the amount of history that would be lost if the old mailing lists from Google Groups were to vanish from one day to another
@corbet Google is intentionally breaking the internet. I wonder what they plan to try and sell us in the smoke of the damage?
@corbet obviously this is a bad-faith fuck-you to their customer base, with the fuck gradient correlating closely with length of customership.
seems crazy that they're capable of wanting to destroy this, but not as crazy as the fact that there are still people willingly using google products.
Wonder if they know that this is gonna break google workspace? Lots of the auto-generated URLs there are goo.gl
@oleksandr @corbet respect to the man bringing a sense of fairness to a grudge fight 😂😂💯
@corbet What a rotten thing to do. I’m sure someone else would be willing to take over running that if it’s too hard or too expensive for Google.
Self host url shorts without Google tracking 😁
I recommend https://yourls.org/docs
@corbet Happens when you put the fate of technology in the hands of organizations that only care about short-term profits. This is also why I'm against things like streaming and SaaS. If something requires you to connect with a company's servers every time you use/consume it, it will be gone as soon as it no longer serves that company's bottom line.
@corbet crazy to me, they must be running the service on a $10 a month vps, why can't they just keep it going?
@corbet yet another display of the consequences of dependency on profit-driven organisations.
@corbet You have to be really careful using almost any Google service; Their history of just dumping things on a whim is long.
They just did another update on the GitHub site last week and I don't see anything about end of life, but it's mostly PHP, so I could handle it.
Some URLs on our servers could use it, but most of ours are fairly short anyway after I started getting into the rewrite codes.
I'll keep digging to see if they are discontinuing it though, thanks.
All true but my point is not using Google for anything is the best reason.
With any of those corporate assholes using tracking and AI, all the better to take the Internet back 😀
@corbet Filed under glad I didn’t use it and you can’t trust Google for anything.
url shortening was always stupid tho... it's hecking stupid to expect google to keep anything around, especially with how shortsighted their CORS setup is and has been.
like yes let me make it even easier for the spy company to spy on me -_-
On the one hand, this sucks.
On the other hand, it might help some people realize:
1) It is always a bad idea to expect any Google service to remain in place, given their ... ha ha ha ... track record.
2) Don't use URL shorteners. Ever. WTF is wrong with you, just don't do it.
@corbet I get the feeling this is one of those things that is going to cause mass havoc because some legacy software uses these internally and that when enough people scream about it, Google will be forced to keep the links working.
@corbet
Ah, makes sense. I mean they removed their "Don't be evil" statement. Now they have to act accordingly...
@netzwerkgoettin
@corbet My first thought was that should be no problem for the Internet Archive to back up.
...then I thought, why does a scrappy nonprofit have to do it instead of the $400 billion company keeping them up in the first place?
@corbet this is irresponsible behavior. Old links should be kept alive.
@corbet I always felt that URL shorteners were the wrong solution for a problem that didn't exist. Unless, of course, you want to use them to ensure that people DON'T know what link they are clicking.
@corbet They should release the database into the public domain, so everyone can do what they can do.
Thank you. This was what I was looking for.
Is almost certainly already aware of this, but tagging him here just in case.
[Edit] never mind. He already chimed in down thread.
PrOpRiEtArY link shortener service does PrOpRiEtArY things.
Use #commons for all that has worth.
@corbet Penny-wise and Pound foolish. One more for "Killed by Google"
@hatter @vitriolix @tofugolem @corbet oh yeah, it's definitely costing them money to run it and they're going about it in as good a way as you can expect — most of the old Twitter-era services just quietly stopped working while nobody was looking. To be fair OP was right, really this is Twitter's fault for creating an artificial need for these silly forwarding services in the first place, although I'm sure analytics services would have normalised it anyway
But it does feel a bit odd. Like, Google's core business is (was?) a constantly updating, publicly searchable live index of almost every page on the internet, and they really find it too expensive to maintain a static index of a billion or so string-to-string key-value pairs, which you can only look up by primary key, with exactly zero UI, that's already set up and presumably doesn't see much traffic any more? It's going to cost them more to shut it down this gracefully than it would to run it for another decade, surely?
@corbet It's a good thing they dropped their former motto as it would be really hypocritical given their current policy of doing as much evil as possible.
Oh what? So goo.gl has been deprecated since 2018, and an automated Google bot still has goo.gl in their email footer? They really don’t know what they are doing.
Up to message 18000 these are just footer links, so maybe 2000 real messages, which includes quotes and whole mail body copies.
@corbet General recommendation: don’t “shorten” URLs. That’s just another gatekeeper/database between readers and your website.
@hatter @vitriolix @andrewt @tofugolem @corbet If no one writes a script for this, I'd say they don't even care about their archives. I mean, I would if I needed to. I bet it could be done with a one-liner!
Google is in the business of harvesting human behavioural data from its users and selling results from human prediction models to the highest bidder. They are also in the business of using their knowledge of said users to influence user decision-making and opinions, also at the behest of the highest bidder.
It is likely that the URL shortening service offers no additional behavioural data for them to harvest. Therefore, it is useless. So, shut it down as it consumes resources above 0 and running anything carries with it operational risks (however minimal) that they can do without.
@tagomago @hatter @vitriolix @tofugolem @corbet I mean they *don't* care about that, we know that. They've clearly long since decided that old services, even well liked and used ones, are going to get shut down and it's up to users to deal with that. And that's mostly fair enough, they used to experiment a lot but that means most of them would fail sooner or later, and I'm sure as much as we complain when they do it, most people don't care and it doesn't really hurt Google's numbers. But I mean, their propensity to sunset everything has got to be a big part of why more businesses don't use their cloud offering. I even run Gmail from behind a forwarder in part because I don't entirely trust it will exist in five years or I'll want to use it if it does.
@Sweetshark @corbet Good idea. @textfiles are you aware of this?
@corbet Ah, great, another step to digital wasteland. Time to admit that the internet is no cultural heritage, but ephemeral.
Truth is, (external/public) link shortening services always were a bad idea. They only exist because of microblogs, where you have to fight for each precious message character.
However, I feel like having created such an abomination should lock you forever into the obligation to keep it alive, until the last referred-to link breaks.
@corbet
Would it be possible to take all those 19,000 links, look at where they go, and make them available under another domain?
@corbet damned good reason for oss-security list to insist on including the important content from websites in posts to the list!
@danderson@hachyderm.io @shiri@foggyminds.com @corbet@social.kernel.org I thought it wasn't, because in the Warrior projects section the legend at the start says pink means currently being scraped, and the row for goo.gl is white
> Today, the time has come to turn off the serving portion of Google URL Shortener.
Not very long ago, everyone who worked at Google would have understood instinctively that there is no such time.
@corbet That's probably a substantial cultural loss. But on the other hand, who would have thought that embedding a link somewhere for the long term while needlessly relying on some specific proprietary third party service is a bad idea? I wish people would be more aware of the fact that all companies they interact with will try to lock them in in their product ecosystem if they can. Letting that happen might be convenient but always comes with a risk.
Archive.org is up for helping...
The original URL shorteners thought about this, and archived their links with archive.org .
https://archive.org/details/301works?tab=about
I hope google joins now, and gives us the host domain so we can make them continue to work (redirect into the wayback machine that would archive the redirect).
please.
@corbet After downloading the mailbox file from the linked search result and poking in it, it appears a lot of those mentions are duplicates; removing duplicates gets it down to about 600~800 unique goo[.]gl links. In the case of LKML, that's fairly easy to archive.
I'm not sure the mailbox file is everything, though, so this may still be off.
Regardless, the closing of the service is still a massive loss.
@brewsterkahle
@c3manu said in https://chaos.social/users/c3manu/statuses/112812473668724559 :
-
@drewdevault if you wanna help without the luxury of getting the db, people are currently organising in #archiveteam-bs (still deciding on a dedicated channel name)
the url shortener project has been running for a while now, including for goo.gl urls
We at MPAQ are always setting up services that the corporates use so that we don't need them anymore.
We have blogs, email, live music and many things. ATM, I'm working on a url shortener 😁
It doesn't matter what corporation it is, Fakebook and TwitterDumb are also on my hate list.
We are even hosting our own social network, Beamship 😁
@brewsterkahle @corbet thank you for having taken over purl.org when the library monopoly, err, cooperative, gave up on it
@corbet can the @internetarchive get a backup of this database for posterity?
@albertcardona @corbet not all of them can, and not all of them will. You could also query the shortener as much as possible before the shutdown and export as many of the mappings as you can.
@corbet hey, I wrote you a short thing: https://git.sr.ht/~gnomon/fetch-goo.gl-shortlink-dereferences
It turns out that the ~19,000 goo.gl shortlinks in that lore search you posted deduplicate down to about 360 unique shortlinks. The script in that repo can pull down about 285 of them. Stuffs 'em in a smol sqlite3 database with a simple index that makes lookups, even in a tight loop, close to instant.
It's only the very simplest proof of concept, but it _does_ work. The Lore picture is not as bad as I expected.
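For anyone curious what that shape looks like, here's roughly how such a lookup table can be built with the stock sqlite3 CLI (the file names and the sample row are invented for illustration; the linked repo is the real thing):

```shell
#!/bin/sh
# Sketch: a short->target table whose PRIMARY KEY doubles as the index,
# making point lookups effectively instant. The sample data is made up.
printf 'https://goo.gl/abc123\thttps://example.com/long/page\n' > mapping.tsv
sqlite3 shortlinks.db <<'SQL'
CREATE TABLE IF NOT EXISTS redirects (
    short  TEXT PRIMARY KEY,  -- the goo.gl URL
    target TEXT NOT NULL      -- where it redirected to
);
.mode tabs
.import mapping.tsv redirects
SQL
# Point lookup against the indexed column:
sqlite3 shortlinks.db \
    "SELECT target FROM redirects WHERE short = 'https://goo.gl/abc123';"
```

Because `short` is the primary key, even a tight loop of lookups stays fast without any extra index.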
@corbet @monsieuricon indeed! By coincidence I happened to do a deep dive into the public-inbox codebase last week¹; it's part of why that script I wrote is the way it is. The sqlite3 DB fits into the existing prerequisites for that codebase.
While I agree that the git history rewriting would be bad, regenerating the Xapian indices would be even worse.
However I think a _render time_ transformation might work, conceptually like git's mailmap. I am experimenting.
@corbet Maybe Google could ameliorate the pain of killing their Short URL service by setting up a system where you could query one of their short URLs that are going away, and get back a redirect to the URL it originally pointed to.
It's the #1 reason I try to avoid URL-shorteners (reason #2 is that they can insert unwanted stuff in the redirect).
Hopefully @internetarchive, @textfiles or @ArchiveTeam will step in and try to archive as much as possible, and users of googl shortener will expand the shortened URLs in time.
I will track down the ones used on my blog today and, if they're still there, expand them.
@Sweetshark@chaos.social @corbet@social.kernel.org that will take a while.
@bastelwombat @Sweetshark @corbet been working on it for 10 years so yes