Conversation

Jonathan Corbet

Ah joy ... Google is turning off its URL shortener and breaking every link that ever used it:

https://developers.googleblog.com/en/google-url-shortener-links-will-no-longer-be-available/

A quick search on lore.kernel.org:

https://lore.kernel.org/all/?q=goo.gl%2F

...turns up about 19,000 messages with affected links. That's a lot of history that is going to become harder (or impossible) to find.

@corbet burn them all into archive.org?


@corbet
Irresponsible cultural vandalism.

@bagder


@corbet

Will the webmasters run a script to dereference all the URL shorteners? They can.

@albertcardona I am thinking about hacking together a URL-replacement script for LWN. Doing that on lore, though (or the LWN email archive) would be a rather more painful prospect, to say the least. I would honestly be surprised if it actually got done.

@corbet
Good. I think those shorteners created more problems than they solved.


@corbet ripgrep query that finds 'goo.gl' links in a directory, e.g. archive of mail or social media dump:

```
rg -oNI --no-heading -e 'https?://goo\.gl/[0-9a-zA-Z/]+'
```
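
For anyone without ripgrep, here is a rough Python equivalent. It is a sketch that reuses the same regular expression as the command above. It walks a directory tree, extracts every match, and counts how often each link appears, which makes heavy duplication easy to spot. Unlike rg, it does not honour .gitignore and does not skip hidden or binary files.

```python
import pathlib
import re
from collections import Counter

# Same pattern as the rg command: http or https, the goo.gl host, then a
# path of ASCII letters, digits and slashes.
GOO_GL = re.compile(r"https?://goo\.gl/[0-9a-zA-Z/]+")


def goo_gl_links(text: str) -> Counter:
    """Count each goo.gl link in `text`, keyed by the exact matched URL."""
    return Counter(GOO_GL.findall(text))


def scan_tree(root: str) -> Counter:
    """Scan every regular file under `root`; undecodable bytes are dropped."""
    total: Counter = Counter()
    for path in pathlib.Path(root).rglob("*"):
        if path.is_file():
            total += goo_gl_links(path.read_text(encoding="utf-8", errors="ignore"))
    return total
```

`scan_tree(".").most_common(20)` then lists the most frequently repeated links first.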


@corbet this makes me wonder whether the Wayback Machine tracks shortened links properly…

since I guess that would be a way to recover them after it gets shut off, albeit in a very annoying way


@tofugolem @corbet sure, but Google must have the resources to simply make it read only rather than breaking all those links

@corbet I wonder if there's a project to archive those expanded links... wait, does archive.org archive them?

@corbet Recently watched a talk by @textfiles about losing our history. And apparently link shorteners were already a problem back then.

"URL shorteners are the stupidest idea we've come up with in the last 10 years."

https://youtube.com/watch?v=tJqZGRIwtxk


@clarfonthey @corbet Sort of; URLTeam (part of ArchiveTeam) has been continuously archiving link shorteners: https://wiki.archiveteam.org/index.php/URLTeam, and although not a part of the Internet Archive, the crawls *do* end up there I believe


@clarfonthey @corbet yes, they do work properly inside WBM in most cases. There is a long history of doing this for dead or dying link shorteners: https://wiki.archiveteam.org/index.php/URLTeam


@corbet another Google Graveyard…and another reason to stop using Google products as much as possible.


@corbet I think the vast majority of these are from syzbot emails, and many are the same link to the syzbot docs. Something like https://lore.kernel.org/all/?q=nq%3Agoo.gl+and+NOT+%28f%3Asyzbot+OR+s%3Asyzbot%29 returns only about 700 emails.


@corbet Developing that interstitial page and writing that blog post has to be more work than keeping the service running forever! Fricking Google.

(Unless another team deprecated the infrastructure it runs on.)

@vegard Better but still really painful to fix; public-inbox is pretty firmly built around the idea that archived messages do not change.

@corbet @vegard links breaking and needing to be updated could possibly have been foreseen when public-inbox was built.


@corbet We deserve this for relying on proprietary services frivolously.


@corbet @darrell73 If there's no image description attached, I'm not doing anything with it. I thought I'd gotten rid of the performatively left-wing profile by unfollowing it. Apparently I still have to slap a block on it. Too bad.


@corbet@social.kernel.org

The world needs to take the hint and stop relying on Google for anything.


@corbet I guess there's an "easy" fix. Scrape all of the URLs to get the redirection and edit the links in the history.
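
One way to do the scraping step is to request each short link and record its `Location` header without following the redirect. The Python sketch below assumes goo.gl still answers live links with an ordinary 3xx redirect. The thread also mentions an interstitial warning page, which this sketch does not handle. It sleeps between requests because rate limits come up later in the thread.

```python
import time
import urllib.error
import urllib.request
from typing import Dict, Iterable, Optional


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects; the 3xx then surfaces as HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


_OPENER = urllib.request.build_opener(NoRedirect)


def resolve(short_url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the redirect target of `short_url`, or None if it did not redirect.
    Network errors (URLError, timeouts) are left to the caller."""
    req = urllib.request.Request(short_url, method="HEAD")
    try:
        with _OPENER.open(req, timeout=timeout):
            return None  # 2xx: no redirect to record
    except urllib.error.HTTPError as err:
        if 300 <= err.code < 400:
            return err.headers.get("Location")
        return None  # 404, 410 and friends: treat the link as dead


def resolve_all(urls: Iterable[str], delay: float = 1.0) -> Dict[str, Optional[str]]:
    """Resolve each distinct URL once, sleeping `delay` seconds between requests."""
    mapping: Dict[str, Optional[str]] = {}
    for url in urls:
        if url not in mapping:
            mapping[url] = resolve(url)
            time.sleep(delay)
    return mapping
```

The resulting dictionary is exactly the short-to-long mapping that a history rewrite (or a display-time rewrite) would need.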

@corbet Thanks for the heads-up. I was more curious about the source tree. Fortunately, only a couple of shortened URLs seem to be in the source tree.

```
$ git grep goo.gl
Documentation/filesystems/9p.rst: http://goo.gl/3WPDg
net/ipv4/Kconfig: delay gradients." In Networking 2011. Preprint: http://goo.gl/No3vdg
```

I may post patches later, unless others do.

@webmink @corbet @bagder

Google also trashed the Blues by the Bay podcast which made all the old shows unavailable, lost like tears in the rain.

I'm really pissed off about this. If anybody knows where the old shows *are* available, please tell me.


@183231bcb @corbet those use "forms.gle" now so I wouldn't think so.


@andrewt @tofugolem @corbet They did, they made it read-only 6 years ago. And now they're giving another 12 months' warning and adding a little pain to links that use it, which gives those relying on it even more chance to do something about it.


@corbet FWIW I captured a snapshot of those (non-syzbot) redirects here: https://github.com/vegard/vegard.github.io/blob/master/linux/2024-07-19/goo.txt

Of course some (many?) of the redirected-to URLs are themselves already defunct...


@corbet This seems to happen every month or so. It's amazing to me that people still keep going back for more. 295 products/services & counting!

Google Graveyard - Killed by Google
https://killedbygoogle.com/


@corbet I feel more vindicated in self-hosting everything I can with every passing day.


@hatter @andrewt @tofugolem @corbet the point is it breaks every archived post that ever used it; it's a huge loss

@corbet I just mentioned the other day if you look at recent amicus briefs they use tinyurl in the citations

@shiri @corbet ArchiveTeam has software you can run on your computers to help archive all kinds of services that are about to shut down, and one of their long-term projects (URLTeam) archives URL shorteners (from what I can tell, goo.gl isn’t currently being actively archived, but I assume that’ll change soon)


@corbet hopefully someone resolves all of them and stores them in an index somewhere


@vitriolix @andrewt @tofugolem @corbet Someone maintains goo.gl, someone maintains those archives. When the people still maintaining the shortener stop caring about the shortener, it's time for the archivist to do the work to preserve what they care about. Also, other archivists are doing what they can to preserve all links, regardless of immediate value that anyone else gives to each link. Likely very little will be lost in such a long deprecation cycle, and even less of value will be lost.

Imagine relying on url shorteners in the first place.

@corbet @vegard in this particular case Google's syzbot shouldn't be using that shortener since

... checks their notes ...

since 6 years ago! (circa 2018)


@corbet thankfully, ArchiveTeam had a long-running project that collected millions of those URLs, so they won’t be lost.

my infra helped at some point, I was running archiveteam runners for quite a while :3


@corbet don't rely on google services

@a1ba imagine relying on links..

@a1ba@suya.place I literally always thought they were both a security and a longevity risk, and I'm not glad to see that I'm right. Curse Twitter for making people feel the need to shorten their URLs so much. I've seen several other smaller shortener services dying over the years but this is the worst one.

@Varyag remember when they were also used for ads?

@corbet I always wonder about the amount of history that would be lost if the old mailing lists on Google Groups were to vanish from one day to the next


@corbet Google is intentionally breaking the internet. I wonder what they plan to try and sell us in the smoke of the damage?


@corbet obviously this is a bad faith fuck you to their customer base, with the fuck gradient correlating closely with long term customership.

seems crazy they are capable of wanting to destroy this, but not as crazy as the fact there are still people willingly using google products.


@corbet

Wonder if they know that this is gonna break google workspace? Lots of the auto-generated URLs there are goo.gl


@oleksandr @corbet respect to the man bringing a sense of fairness to a grudge fight 😂😂💯


@corbet What a rotten thing to do. I’m sure someone else would be willing to take over running that if it’s too hard or too expensive for Google.

@a1ba @Varyag some minecraft-related stuff still uses those things and it's just super annoying

@corbet

Self-host your short URLs without Google tracking 😁
I recommend https://yourls.org/docs


@corbet Happens when you put the fate of technology in the hands of organizations that only care about short-term profits. This is also why I'm against things like streaming and SaaS. If something requires you to connect with a company's servers every time you use/consume it, it will be gone as soon as it no longer serves that company's bottom line.


@corbet crazy to me, they must be running the service on a $10 a month vps, why can't they just keep it going?


@corbet yet another display of the consequences of dependency on profit-driven organisations.


@corbet You have to be really careful using almost any Google service; their history of just dumping things on a whim is long.


@glitzersachen @corbet

They just did another update on the GitHub site last week and I don't see anything about end of life, but it's mostly PHP, so I could handle it.

Some URLs on our servers could use it, but most of ours are fairly short anyway since I started getting into the rewrite codes.

I'll keep digging to see if they are discontinuing it though, thanks.


@glitzersachen @corbet

All true but my point is not using Google for anything is the best reason.

With any of those corporate assholes using tracking and AI, all the better to take the Internet back 😀

@a1ba @Varyag Haha yeah AdFly or something like that, ad blockers FTW :P

@corbet Filed under glad I didn’t use it and you can’t trust Google for anything.


@corbet @vegard Perhaps a system to rewrite the URLs when displaying messages would do?

Something like git's mailmap, but for URLs.
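
Here is a rough Python sketch of that idea. The stored message is never modified; the archive's display layer swaps known goo.gl links for their targets only when rendering. The `urlmap` file format used here is made up for illustration: one `SHORT LONG` pair per line, with `#` marking comment lines. It is loosely modelled on the one-entry-per-line style of git's `.mailmap` and is not an existing public-inbox feature.

```python
import re
from typing import Dict

GOO_GL = re.compile(r"https?://goo\.gl/[0-9A-Za-z/]+")


def _key(url: str) -> str:
    """Normalise a short link so http:// and https:// variants share one entry."""
    return url.split("://", 1)[-1]


def load_urlmap(text: str) -> Dict[str, str]:
    """Parse the hypothetical urlmap format: 'SHORT LONG' per line.
    Blank lines, lines starting with '#', and lines without exactly two
    whitespace-separated fields are skipped."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) == 2:
            mapping[_key(fields[0])] = fields[1]
    return mapping


def render(body: str, mapping: Dict[str, str]) -> str:
    """Return `body` with every mapped goo.gl link replaced by its target;
    unknown links are left exactly as written."""
    return GOO_GL.sub(lambda m: mapping.get(_key(m.group(0)), m.group(0)), body)
```

Because only the rendered output changes, the archive's stored messages, and any signatures over them, stay exactly as they were received.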


url shortening was always stupid tho... it's hecking stupid to expect google to keep anything around. especially with how shortsighted their cors setup is and has been.


like yes let me make it even easier for the spy company to spy on me -_-


@corbet How long until git.io links get destroyed?


@corbet

On the one hand, this sucks.

On the other hand, it might help some people realize:

1) It is always a bad idea to expect any Google service to remain in place, given their ... ha ha ha ... track record.

2) Don't use URL shorteners. Ever. WTF is wrong with you, just don't do it.


@corbet I get the feeling this is one of those things that is going to cause mass havoc because some legacy software uses these internally and that when enough people scream about it, Google will be forced to keep the links working.


@corbet
Ah, makes sense. I mean they removed their "Don't be evil" statement. Now they have to act accordingly...
@netzwerkgoettin


@corbet My first thought was that should be no problem for the Internet Archive to back up.

...then I thought, why does a scrappy nonprofit have to do it instead of the $400 billion company keeping them up in the first place?


@corbet this is irresponsible behavior. Old links should be kept alive.


@corbet I always felt that URL shorteners were the wrong solution for a problem that didn't exist. Unless, of course, you want to use them to ensure that people DON'T know what link they are clicking.


@corbet They should release the database into the public domain, so everyone can do what they can do.


@corbet @vegard public-inbox uses git for storage right? I wonder if it's possible to use git-replace to replace the blobs for those messages with the URLs rewritten.


@luna @shiri @corbet

Thank you. This was what I was looking for.

@textfiles

Is almost certainly already aware of this, but tagging him here just in case.

[Edit] never mind. He already chimed in down thread.


@luna @shiri @corbet URLTeam's been scraping goo.gl since 2019, according to the wiki. Fingers crossed that means things are in hand. But more archiveteam warrior VMs set to "archive team's choice" are always welcome.


PrOpRiEtArY link shortener service does PrOpRiEtArY things.

Use for all that has worth.

@corbet


@corbet Penny-wise and pound-foolish. One more for "Killed by Google"

https://killedbygoogle.com/


@hatter @vitriolix @tofugolem @corbet oh yeah, it's definitely costing them money to run it and they're going about it in as good a way as you can expect — most of the old Twitter-era services just quietly stopped working while nobody was looking. To be fair OP was right, really this is Twitter's fault for creating an artificial need for these silly forwarding services in the first place, although I'm sure analytics services would have normalised it anyway

But it does feel a bit odd. Like, Google's core business is (was?) a constantly updating, publicly searchable live index of almost every page on the internet, and they really find it too expensive to maintain a static index of a billion or so string-string key value pairs you can only look up by the primary key with exactly zero UI that's already set up and presumably doesn't do much traffic any more? It's going to cost them more to shut it down this gracefully than it would to run it for another decade, surely?


@corbet It's a good thing they dropped their former motto as it would be really hypocritical given their current policy of doing as much evil as possible.


@corbet @jom this is EXACTLY the reason people were warned about using URL shorteners

"If that service goes away, all your links will be lost!" - "Nah! I'll just use a really big service that'll last longer than my server for sure"

It doesn't get much bigger than Google.

Didn't help.


@corbet

Oh what? So goo.gl has been deprecated since 2018, and an automated Google bot still has goo.gl in their email footer? They really don’t know what they are doing.

Up to message 18000 these are just footer links, so maybe 2000 real messages, which includes quotes and whole mail body copies.

@mvsde


@corbet General recommendation: don’t “shorten” URLs. That’s just another gatekeeper/database between readers and your website.


@corbet That's why I've never used a URL shortener...


@hatter @vitriolix @andrewt @tofugolem @corbet If no one writes a script for this, I’d say they don't even care about their archives? I mean, I would if I needed to. I bet it could be done with a oneliner!


@andrewt

Google is in the business of harvesting human behavioural data from its users and selling results from human prediction models to the highest bidder. They are also in the business of using their knowledge of said users to influence user decision-making and opinions, also at the behest of the highest bidder.

It is likely that the URL shortening service offers no additional behavioural data for them to harvest. Therefore, it is useless. So, shut it down as it consumes resources above 0 and running anything carries with it operational risks (however minimal) that they can do without.

@hatter @vitriolix @tofugolem @corbet


@tagomago @hatter @vitriolix @tofugolem @corbet I mean they *don't* care about that, we know that. They've clearly long since decided that old services, even well liked and used ones, are going to get shut down and it's up to users to deal with that. And that's mostly fair enough, they used to experiment a lot but that means most of them would fail sooner or later, and I'm sure as much as we complain when they do it, most people don't care and it doesn't really hurt Google's numbers. But I mean, their propensity to sunset everything has got to be a big part of why more businesses don't use their cloud offering. I even run Gmail from behind a forwarder in part because I don't entirely trust it will exist in five years or I'll want to use it if it does.


@corbet Ah, great, another step to digital wasteland. Time to admit that the internet is no cultural heritage, but ephemeral.

Truth is, (external/public) link shortening services always were a bad idea. They only exist because of microblogs, where you have to fight for each precious message character.

However, I feel like having created such an abomination should lock you forever into the obligation to keep it alive, until the last referred-to link breaks.


@corbet
Would it be possible to take all those 19,000 links, look up where they go, and make them available under another domain?


@corbet damned good reason for oss-security list to insist on including the important content from websites in posts to the list!


@danderson@hachyderm.io @shiri@foggyminds.com @corbet@social.kernel.org I thought it wasn't, because in the warrior projects section, according to the legend at the start of that section, pink means currently being scraped, and the row for goo.gl is white


@corbet

> Today, the time has come to turn off the serving portion of Google URL Shortener.

Not very long ago, everyone who worked at Google would have understood instinctively that there is no such time.


@bob @corbet
Was that the one used by the SpaceKaren dot sucks URL shortener? (which failed after a year when the domain wasn't renewed)


@corbet That's probably a substantial cultural loss. But on the other hand, who would have thought that embedding a link somewhere for the long term while needlessly relying on some specific proprietary third-party service is a bad idea? I wish people were more aware that every company they interact with will try to lock them into its product ecosystem if it can. Letting that happen might be convenient, but it always comes with a risk.


@corbet

Archive.org is up for helping...

The original URL shorteners thought about this and archived their links with archive.org.

https://archive.org/details/301works?tab=about

I hope Google joins now and gives us the host domain so we can make the links continue to work (redirect into the Wayback Machine, which would archive the redirect).

please.
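
If an archive were handed the goo.gl domain, the serving side could in principle be tiny: answer every request with a redirect into the Wayback Machine, which then replays whatever it captured for that short link. The following Python sketch is a minimal version of that idea. It assumes the `https://web.archive.org/web/<original-url>` form, which asks the Wayback Machine for its most recent capture, and it assumes a capture of the original redirect exists. It is not a description of how archive.org actually runs 301works.

```python
import http.server

WAYBACK_PREFIX = "https://web.archive.org/web/"
SHORT_ORIGIN = "https://goo.gl"  # the retired shortener's origin


def wayback_url(path: str) -> str:
    """Map a request path such as '/abc123' to a Wayback lookup of the short link."""
    return WAYBACK_PREFIX + SHORT_ORIGIN + path


class WaybackRedirect(http.server.BaseHTTPRequestHandler):
    """Answer every GET or HEAD with a 302 into the Wayback Machine."""

    def _redirect(self) -> None:
        self.send_response(302)
        self.send_header("Location", wayback_url(self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_GET = _redirect
    do_HEAD = _redirect


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Run the redirector until interrupted."""
    http.server.HTTPServer((host, port), WaybackRedirect).serve_forever()
```

A 302 is used rather than a 301 so that browsers do not cache the redirect permanently, leaving room to point individual links somewhere better later.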


@corbet After downloading the mailbox file from the linked search result and poking in it, it appears a lot of those mentions are duplicates; removing duplicates gets it down to about 600~800 unique goo[.]gl links. In the case of LKML, that's fairly easy to archive.

I'm not sure the mailbox file is everything, though, so this may still be off.

Regardless, the closing of the service is still a massive loss.
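
The deduplication described above is straightforward to reproduce on a downloaded mbox. This Python sketch uses the standard `mailbox` module and treats http and https variants of a link as the same. By default it skips messages whose `From:` header contains "syzbot", mirroring the filter suggested earlier in the thread. That sender-matching rule is an assumption and does not reproduce lore's own search syntax.

```python
import mailbox
import re
from email.message import Message
from typing import Iterator, Set

GOO_GL = re.compile(r"https?://goo\.gl/[0-9A-Za-z/]+")


def _text_parts(msg: Message) -> Iterator[str]:
    """Yield the decoded text of every text/* part of a message."""
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            yield payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in the header
            yield payload.decode("utf-8", errors="replace")


def unique_goo_gl(mbox_path: str, skip_from: str = "syzbot") -> Set[str]:
    """Return the distinct goo.gl links in an mbox, with the scheme stripped.
    Messages whose From: header contains `skip_from` (case-insensitive) are
    ignored; pass an empty string to keep everything."""
    found: Set[str] = set()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            sender = str(msg.get("From", "")).lower()
            if skip_from and skip_from.lower() in sender:
                continue
            for text in _text_parts(msg):
                for url in GOO_GL.findall(text):
                    found.add(url.split("://", 1)[1])
    finally:
        box.close()
    return found
```

`len(unique_goo_gl("results.mbox"))` then gives the number of distinct links worth resolving. The file name here is only an example.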


@brewsterkahle
@c3manu said in https://chaos.social/users/c3manu/statuses/112812473668724559 :
-
@drewdevault if you wanna help without the luxury of getting the db, people are currently organising in -bs (still deciding on a dedicated channel name)

the url shortener project has been running for a while now, including for goo.gl urls

https://tracker.archiveteam.org:1338/status

@corbet


@corbet is tinyurl still a thing? ;p (insert xkcd here)


@dec23k @corbet

I couldn't say but my thoughts are about those corporate services that track everything including the last time you use the bathroom


@glitzersachen @corbet

We at MPAQ are always setting up services that the corporates use so that we don't need them anymore.

We have blogs, email, live music and many things. ATM, I'm working on a url shortener 😁


@glitzersachen @corbet

It doesn't matter what corporation it is, Fakebook and TwitterDumb are also on my hate list.

We are even hosting our own social network, Beamship 😁


@cdenesha @corbet nope, that would be too easy! to not make our work any less insane, google also introduced a bunch of silly ratelimits, just to throw us off…


@brewsterkahle @corbet thank you for having taken over purl.org when the library monopoly, err, cooperative, gave up on it


@corbet can the @internetarchive get a backup of this database for posterity?


@albertcardona @corbet not all of them can, not all of them will. You could also try to query the shortener as much as possible before the shutdown and thus export as much as possible of the mappings.


@corbet hey, I wrote you a short thing: https://git.sr.ht/~gnomon/fetch-goo.gl-shortlink-dereferences

It turns out that the ~19,000 goo.gl shortlinks in that lore search you posted deduplicate down to about 360 unique shortlinks. The script in that repo can pull down about 285 of them. Stuffs 'em in a smol sqlite3 database with a simple index that makes lookups, even in a tight loop, close to instant.

It's only the very simplest proof of concept, but it _does_ work. The Lore picture is not as bad as I expected.
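
The script's actual schema is not shown in the thread, so the following is only a guess at its general shape: a single SQLite table keyed on the short link, so that each lookup is a primary-key search. This Python sketch uses the standard `sqlite3` module. The table and column names are hypothetical.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

# Hypothetical schema: one row per short link. A TEXT PRIMARY KEY makes
# SQLite build a unique index on `short`, so lookups are index searches.
SCHEMA = """
CREATE TABLE IF NOT EXISTS shortlinks (
    short  TEXT PRIMARY KEY,  -- e.g. 'goo.gl/abc123', scheme stripped
    target TEXT               -- expanded URL, NULL if it no longer resolved
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store(conn: sqlite3.Connection, pairs: Iterable[Tuple[str, Optional[str]]]) -> None:
    """Insert or overwrite (short, target) pairs in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO shortlinks (short, target) VALUES (?, ?)", pairs
        )


def lookup(conn: sqlite3.Connection, short: str) -> Optional[str]:
    """Return the stored target, or None if the link is unknown or unresolvable."""
    row = conn.execute(
        "SELECT target FROM shortlinks WHERE short = ?", (short,)
    ).fetchone()
    return row[0] if row else None
```

Because the primary key already provides the index, no separate `CREATE INDEX` is needed for this lookup pattern.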

@gnomon Replacing URLs isn't that hard, whether there's hundreds of them or thousands - a properly written script doesn't care.

I mentioned lore (and the LWN mailing-list archive) in particular because they are based on public-inbox. That is a great piece of software, but it has some interesting design decisions. Behind public-inbox is a Git repository with a single file called "m". Each message added to the archive goes in as a patch to "m". A mailing-list archive is a long series of Git commits to that one file.

What this means is that changing a message in the archive comes down to a rebase operation. Lots of fun in an archive with millions of messages (and thus millions of commits to rebase) in it. It's doable, but it's not fast or easy. It's not what public-inbox was designed to do. Archived emails aren't meant to change.

Changing URLs in an email will also mess with things like DKIM validation, of course.

This is why I think it's unlikely that the linux-kernel archive (or the LWN archive) will be patched; it's a massive job. But perhaps @monsieuricon has a different view of things...?
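
A toy model of a linear history shows why rewriting even one old message is so expensive. In the Python sketch below, each "commit" ID is a hash over the parent's ID plus the commit's own content, much as a Git commit records its parent. It is a simplification, not public-inbox's real object format. Editing message k changes the ID of that commit and of every commit after it.

```python
import hashlib
from typing import List


def commit_ids(messages: List[str]) -> List[str]:
    """Toy linear history: each ID hashes the parent ID plus the message,
    so every ID depends on all earlier messages."""
    ids: List[str] = []
    parent = ""
    for msg in messages:
        current = hashlib.sha1((parent + "\n" + msg).encode("utf-8")).hexdigest()
        ids.append(current)
        parent = current
    return ids


def changed_after_edit(messages: List[str], index: int, new_text: str) -> int:
    """Count how many commit IDs change when message `index` is rewritten."""
    before = commit_ids(messages)
    edited = list(messages)
    edited[index] = new_text
    after = commit_ids(edited)
    return sum(1 for old, new in zip(before, after) if old != new)
```

With ten messages, editing the fourth one changes seven IDs. In an archive with millions of messages, an edit near the start rewrites nearly all of them, which is the rebase cost described above.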

@corbet @monsieuricon indeed! By coincidence I happened to do a deep dive into the public-inbox codebase last week¹; it's part of why that script I wrote is the way it is. The sqlite3 DB fits into the existing prerequisites for that codebase.

While I agree that the git history rewriting would be bad, regenerating the Xapian indices would be even worse.

However I think a _render time_ transformation might work, conceptually like git's mailmap. I am experimenting.

¹: https://mastodon.social/@gnomon/112780218127791667


@corbet Maybe Google could ameliorate the pain of killing their Short URL service by setting up a system where you could query one of their short URLs that are going away, and get back a redirect to the URL it originally pointed to.


@corbet I thought they did that a few years ago.


@corbet @netzwerkgoettin

It's the #1 reason I try to avoid URL-shorteners (reason #2 is that they can insert unwanted stuff in the redirect).

Hopefully @internetarchive, @textfiles or @ArchiveTeam will step in and try to archive as much as possible, and users of the goo.gl shortener will expand their shortened URLs in time.

Today I will track down the ones used on my blog and, if they still resolve, expand them.


clacke: exhausted pixie dream boy 🇸🇪🇭🇰💙💛

@brouhaha @corbet So ... not killing their service?

@corbet
I never used it or any other, anticipating this.
