URLs

https://www.5snb.club/posts/2020/urls/

My requirements for URLs on websites, and what makes a good URL.

URLs MUST be stable. A link to a specific post shouldn’t change what it points to 5 years down the line. Nor should it die if the content’s still available under a different name.

URLs SHOULD be user-editable, and the parts that aren’t should be obviously opaque.

URLs MAY have immediately relevant information in them that isn’t strictly needed to resolve the URL.

Here’s a sampling of URLs:

Reddit

Take the reddit url as an example. Splitting it up into parts lets us see

All in all, the URL is fairly descriptive, if a bit long.

The only actually critical part of the URL is the post id, 6g3sc2. You can just go to https://www.reddit.com/6g3sc2 and it will take you to that post. The rest of the URL is there to show information.
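
As a rough sketch of that shortening, assuming the long form follows the shape /r/<subreddit>/comments/<post id>/<title slug>/ (the subreddit and title slug in the example are made up, only the post id 6g3sc2 comes from the text above):

```python
from urllib.parse import urlsplit


def reddit_short_url(url: str) -> str:
    """Reduce a reddit post URL to the bare post id form."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    # Assumed shape: ["r", subreddit, "comments", post_id, optional_slug]
    if len(parts) < 4 or parts[0] != "r" or parts[2] != "comments":
        raise ValueError(f"not a reddit post URL: {url}")
    return f"https://www.reddit.com/{parts[3]}"


# Hypothetical long URL; "example" and "some_title_slug" are placeholders.
print(reddit_short_url(
    "https://www.reddit.com/r/example/comments/6g3sc2/some_title_slug/"
))
```

Everything except the post id is thrown away, which is the point: it was only there for the reader.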

GitHub

The only non-descriptive element here is the issue number, but even then, because issue numbers are scoped to a single repo, they tend to be far lower than a reddit post id, so it’s more feasible for someone to remember them.

A GitHub URL is also very hierarchical. wg-allocators could be a completely different repo depending on which user it’s under, and the same goes for the issue number. This helps to keep identifiers short, as they don’t need to be globally unique, just unique under the parent namespace.
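
To make the namespacing concrete, here is a small sketch that treats an issue URL as an (owner, repo, issue number) key. It assumes the path shape /<owner>/<repo>/issues/<number>, and the owners alice and bob are hypothetical:

```python
from urllib.parse import urlsplit


def issue_key(url: str) -> tuple[str, str, int]:
    """Split an issue URL into (owner, repo, issue number)."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    # Assumed shape: [owner, repo, "issues", number]
    if len(parts) != 4 or parts[2] != "issues":
        raise ValueError(f"not an issue URL: {url}")
    owner, repo, _, number = parts
    return owner, repo, int(number)


# Same repo name and same issue number, but different parent namespaces.
a = issue_key("https://github.com/alice/wg-allocators/issues/1")
b = issue_key("https://github.com/bob/wg-allocators/issues/1")
print(a != b)
```

The short identifiers only have to be unique within their parent, so the full key is what identifies the issue.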

All in all, compact and pretty informative as to roughly where you’re going. Maybe adding the issue title would help give more context, but it would just be some text that’s taking up space, since issue titles aren’t identifiers.

Rustdoc Source

There’s very little redundancy here, and all of the information is human readable and understandable.

Rustdoc Main

All in all, readable, and you have a good chance of guessing how to link to something you’ve not seen the URL for.

Youtube channel

This isn’t all that useful.

It’s worth noting that youtube also has user pages, which look like https://www.youtube.com/user/NurdRage. This is an informative URL with a very high signal-to-noise ratio. But channel pages get the fun base64. But wait, there’s more! There are also new-style channels, which look like https://www.youtube.com/c/Nighthawkinlight. These are like the user pages, with a human-readable name, but using /c/ instead of /channel/. Why? No idea.

Youtube video

It is opaque, but you can’t get much shorter than this, and there’s no extra shit tacked on for the fun of it. (Youtube has a short link in the form of youtu.be/<video id>)

And you can modify the link to start at a specific timestamp, using t=1337, where 1337 is the number of seconds past the start of the video. I’d prefer it to be a colon-delimited timestamp though, as that’s more readable. Even so, anyone with a simple calculator can work out the number of seconds for a given timestamp.
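
The calculator step is small enough to sketch. The helper name below is made up for illustration; it accepts "SS", "MM:SS" or "HH:MM:SS" and returns the value you would put after t=:

```python
def timestamp_to_seconds(timestamp: str) -> int:
    """Convert a colon-delimited timestamp into a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        # Each step to the right is worth 60 times less than the one before.
        seconds = seconds * 60 + int(part)
    return seconds


print(timestamp_to_seconds("22:17"))  # 22 * 60 + 17 = 1337
```

So the t=1337 from above is 22 minutes and 17 seconds into the video.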

Discord

(Discord URL has been modified, but it doesn’t change the point)

I would not at all be surprised if only the message id is really needed here. And it’s not like the server id and channel id are providing any useful information.

It’s not like you’re really meant to be using these, in any case. They don’t even embed, and using them’s a pretty bad experience.

Amazon

Okay… You ready?

All of this shit isn’t needed, by the way. Just https://amazon.co.uk/dp/B0791RGQW3 works fine. So if you’re sending an amazon link, strip the tracking shit out.
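
A minimal sketch of that stripping, assuming the product id always appears as /dp/<id> in the path and is 10 uppercase letters or digits. The long URL in the example is invented (the product name, ref segment and query parameters are placeholders), only the id B0791RGQW3 is from the link above:

```python
import re
from urllib.parse import urlsplit


def amazon_short_url(url: str) -> str:
    """Keep only the host and the /dp/<id> part of a product URL."""
    parts = urlsplit(url)
    # Assumption: the id is 10 characters of A-Z and 0-9 right after /dp/.
    match = re.search(r"/dp/([A-Z0-9]{10})(?:/|$)", parts.path)
    if match is None:
        raise ValueError(f"no /dp/<id> segment found in: {url}")
    return f"https://{parts.netloc}/dp/{match.group(1)}"


# Hypothetical tracking-laden URL, made up for this example.
long_url = (
    "https://www.amazon.co.uk/Some-Product-Name/dp/B0791RGQW3/"
    "ref=sr_1_1?keywords=example&qid=1234567890"
)
print(amazon_short_url(long_url))
```

The query string and the ref segment never make it into the result, since only the host and the matched id are used to rebuild the link.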

Conclusion

Opaque URLs like youtube’s are good in that they don’t contain any information that might need to change. For instance, if you have a deadname in a URL, perhaps as a github username, that’s a problem. At best, you’re able to change it, and have hard redirects to the new name, but there will still be old links floating around with the old information.

On the other hand, this advantage of not keeping any information has the issue that… the URL provides no information.

A URL doesn’t need multiple opaque identifiers though. If you are going to do that, then either make use of the hierarchy to shorten the URL, or leave it as a direct link to the object and cut out the hierarchy.

Informational text that doesn’t help resolve the URL can be useful, but it should be obvious to a user what’s informational text, so they can strip it out.

Any URL components that are not human readable and that don’t help resolve the page can, and should, just be removed. Or at the very least, if you do feel the need to track users, use a smaller identifier than what amazon uses. And name it something obvious, like ?tracking.

If you’re going to have an identifier that’s intended to be opaque, it should be completely opaque. Don’t use the timestamp or any structured data in it. An exception is something like an issue number, because the number there is known to the users and it’s reasonably expected to be public. But seeing a URL should not let anyone who is not the website see any information about the user who created the URL.
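
One way to get an identifier with no structure in it is to make it purely random. This is only a sketch: the function name and the 8-byte length are arbitrary choices for illustration, and a real site would still need to check a new id against the ones it has already handed out.

```python
import secrets


def new_opaque_id(nbytes: int = 8) -> str:
    """Return a random URL-safe id with no timestamp or user data in it."""
    # token_urlsafe draws from the OS random source and encodes the bytes
    # as URL-safe base64 text, so nothing about the creator leaks out.
    return secrets.token_urlsafe(nbytes)


print(new_opaque_id())
```

Because the bytes come from a random source rather than a clock or a counter, there is nothing for a third party to decode.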

If you have anything after a hash, that better take you to a specific part of a page, with the URL without the hash still being valid.
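
The part after the hash is never sent to the server in an HTTP request, so the URL without it has to resolve on its own. Python’s standard library can split the two apart, shown here with a placeholder example.com URL:

```python
from urllib.parse import urldefrag

# Placeholder URL, made up for this example.
full = "https://example.com/posts/urls#conclusion"

# urldefrag returns the URL without the fragment, plus the fragment itself.
base, fragment = urldefrag(full)
print(base)      # https://example.com/posts/urls
print(fragment)  # conclusion
```

If the page at the base URL is not the same page, just scrolled to the top, the hash is being used for something it shouldn’t be.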