Date: Message-Id: https://www.5snb.club/posts/2020/urls/
My requirements for URLs on websites, and what makes a good URL.
URLs MUST be stable. A link to a specific post shouldn’t change what it points to 5 years down the line. Nor should it die if the content’s still available but under a different name.
URLs SHOULD be user editable, and the parts that aren’t should be obviously opaque.
URLs MAY have immediately relevant information in them that isn’t strictly needed to resolve the URL.
Here’s a sampling of URLs:
- https://www.reddit.com/r/rust/comments/6g3sc2/best_way_to_multithread_a_simple_function/ (reddit)
- https://github.com/rust-lang/wg-allocators/issues/17 (github)
- https://doc.rust-lang.org/src/std/io/mod.rs.html#502-964 (rustdoc)
- https://www.youtube.com/channel/UC1usFRN4LCMcfIV7UjHNuQg/videos (youtube channel)
- https://www.youtube.com/watch?v=dQw4w9WgXcQ (youtube video)
- https://discordapp.com/channels/729293063826175566/729293064267238942/738076547881893032 (discord)
- https://www.amazon.co.uk/dp/B0791RGQW3/ref=s9_acsd_al_bw_c2_x_0_t?pf_rd_m=A3P5ROKL5A1OLE&pf_rd_s=merchandised-search-11&pf_rd_r=SQHJ5TXNCSRE9PRP4QAG&pf_rd_t=101&pf_rd_p=eb3feabb-ea62-4002-b39e-5ccc29f387ba&pf_rd_i=14100223031 (amazon)
Take the reddit URL as an example. Splitting it up into parts lets us see:
- https://www.reddit.com/: We’re going to reddit
- r/: We’re going to a subreddit page
- rust/: We’re going to the rust subreddit
- comments/: We’re likely to see comments
- 6g3sc2/: Some opaque identifier
- best_way_to_multithread_a_simple_function/: The post is about multithreading
All in all, the URL is fairly descriptive, if a bit long.
The only actually critical part of the URL is the post id, 6g3sc2. You can just go to https://www.reddit.com/6g3sc2 and it will take you to that post. The rest of the URL is there to be informative.
- https://github.com/: We’re going to github
- rust-lang/: It’s under the rust-lang user or organisation
- wg-allocators/: The repo is wg-allocators
- issues/: We’re looking at issues
- 17: The 17th issue.
The only non-descriptive element here is the issue number, but even then, because it’s all hierarchical, issue numbers tend to be far lower than a reddit post id, so it’s more feasible for someone to remember issue numbers.
And a github URL is very hierarchical. wg-allocators could be completely different repos depending on what user it’s under, and the same goes for the issue number. This helps to keep identifiers short, as they don’t need to be globally unique, just unique under the parent namespace.
All in all, compact and pretty informative as to roughly where you’re going. Maybe adding the issue title would help give more context, but it would just be some text that’s taking up space, since issue titles aren’t identifiers.
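To make the namespace point concrete, here is a minimal Python sketch that splits an issue URL into its hierarchical parts. The function name parse_issue_url and the second owner, example-user, are made up for illustration; the only thing assumed about github is the owner/repo/issues/number layout seen in the URL above.

```python
from urllib.parse import urlsplit


def parse_issue_url(url):
    """Split an issue URL into (owner, repo, issue number).

    Assumes the path looks like /<owner>/<repo>/issues/<number>.
    """
    owner, repo, kind, number = urlsplit(url).path.strip("/").split("/")
    if kind != "issues":
        raise ValueError(f"not an issue URL: {url}")
    return owner, repo, int(number)


# Same repo name and same issue number, but a different (hypothetical) owner.
a = parse_issue_url("https://github.com/rust-lang/wg-allocators/issues/17")
b = parse_issue_url("https://github.com/example-user/wg-allocators/issues/17")

print(a)
print(b)
print(a == b)  # False: the number 17 only has to be unique within its parent
```

The short number is only meaningful together with everything to its left, which is exactly what lets it stay short.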
- https://doc.rust-lang.org/: We’re going to a documentation page
- src/: Viewing the source code of something
- std/io/: The io module in std
- mod.rs.html: The mod.rs file, rendered as HTML
- #502-964: Lines 502 to 964 are highlighted
There’s very little redundancy here, and all of the information is human readable and understandable.
- https://doc.rust-lang.org/: Again, we’re going to a documentation page
- std/io/: But not the source code, just the io module in std
- trait.Read.html: We’re seeing the documentation for a trait named Read
- #method.read_to_end: And going to the read_to_end method on it.
All in all, readable, and you can have a good chance of guessing how to link to something you’ve not seen the URL for.
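As a rough illustration of how guessable this layout is, here is a small Python sketch that assembles such a link from its parts. It only assumes the pattern visible in the URL above (module path, then kind.Name.html, then an optional #method.name fragment). The function rustdoc_url and its parameters are made up for this example and are not part of rustdoc.

```python
def rustdoc_url(module, kind, name, method=None,
                base="https://doc.rust-lang.org"):
    """Guess a documentation URL from the pattern seen above.

    module: Rust path such as "std::io"
    kind:   item kind as it appears in the file name, e.g. "trait"
    name:   item name, e.g. "Read"
    method: optional method name to jump to on the page
    """
    path = "/".join(module.split("::"))
    url = f"{base}/{path}/{kind}.{name}.html"
    if method is not None:
        url += f"#method.{method}"
    return url


print(rustdoc_url("std::io", "trait", "Read", method="read_to_end"))
# https://doc.rust-lang.org/std/io/trait.Read.html#method.read_to_end
```

Whether a guessed link actually resolves still depends on the item existing, but the point is that the guess can be made at all.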
- https://www.youtube.com/: We’re going to youtube
- channel/: Viewing a channel
- UC1usFRN4LCMcfIV7UjHNuQg/: Some long opaque identifier
- videos: But at least we know we’re seeing the videos.
This isn’t all that useful.
Notably, youtube also has user pages, which look like https://www.youtube.com/user/NurdRage. This is an informative URL, with a very high signal to noise ratio. But channel pages get the fun base64. But wait, there’s more! There are also new style channels, which look like https://www.youtube.com/c/Nighthawkinlight. These are like the user pages with a human readable name, but using /c/ instead of /channel/. Why? No idea.
- https://www.youtube.com: We’re going to youtube
- /watch?: Watching a video
- v=dQw4w9WgXcQ: With this opaque video id.
It is opaque, but you can’t get much shorter, and there’s no extra shit tacked on for the
fun of it. (Youtube has a short link in the form of
And you can modify the link to start at a specific timestamp, using t=1337, where 1337 is the number of seconds past the start of the video. I’d prefer it to be a colon-delimited timestamp though, as that’s more readable. Even so, anyone with a simple calculator can work out the number of seconds for a given timestamp.
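The calculator step can be sketched in a few lines of Python. The helper names timestamp_to_seconds and watch_url are made up for this example, and putting t in the query string next to v is an assumption based on the t=1337 form mentioned above.

```python
from urllib.parse import urlencode


def timestamp_to_seconds(stamp):
    """Convert "SS", "MM:SS" or "HH:MM:SS" into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def watch_url(video_id, stamp=None):
    """Build a watch URL, optionally starting at a colon-delimited timestamp."""
    params = {"v": video_id}
    if stamp is not None:
        params["t"] = timestamp_to_seconds(stamp)
    return "https://www.youtube.com/watch?" + urlencode(params)


print(timestamp_to_seconds("22:17"))      # 1337
print(watch_url("dQw4w9WgXcQ", "22:17"))
# https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=1337
```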
(Discord URL has been modified, but it doesn’t change the point)
- https://discordapp.com/: Going to discord
- channels/: Seeing a channel
- 729293063826175566: Opaque server id
- 729293064267238942: Opaque channel id
- 738076547881893032: Opaque message id.
I would not at all be surprised if only the message id is really needed here. And it’s not like the server id and channel id are providing any useful information.
It’s not like you’re really meant to be using these, in any case. They don’t even embed, and using them’s a pretty bad experience.
Okay… You ready?
- https://www.amazon.co.uk/: Going to amazon
- B0791RGQW3/: The actual unique product id
- ref=s9_acsd_al_bw_c2_x_0_t?: And a whole bunch of opaque identifiers that amazon probably cares about, but I don’t.
All of this shit isn’t needed, by the way. Just https://amazon.co.uk/dp/B0791RGQW3 works fine. So if you’re sending an amazon link, strip the tracking shit out.
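Stripping can be automated. The following Python sketch keeps only the host and the /dp/ product id and throws everything else away. The function name strip_amazon_tracking is made up for this example, and it assumes the product id is the path segment right after /dp/, as in the URL above. Other amazon link shapes are not handled and are returned unchanged.

```python
import re
from urllib.parse import urlsplit


def strip_amazon_tracking(url):
    """Reduce a product URL to scheme + host + /dp/<product id>."""
    parts = urlsplit(url)
    match = re.search(r"/dp/([^/]+)", parts.path)
    if match is None:
        return url  # not a /dp/ product link; leave it alone
    return f"{parts.scheme}://{parts.netloc}/dp/{match.group(1)}"


long_url = (
    "https://www.amazon.co.uk/dp/B0791RGQW3/ref=s9_acsd_al_bw_c2_x_0_t"
    "?pf_rd_m=A3P5ROKL5A1OLE&pf_rd_s=merchandised-search-11"
)
print(strip_amazon_tracking(long_url))
# https://www.amazon.co.uk/dp/B0791RGQW3
```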
Opaque URLs like youtube’s are good in that they don’t contain any information that might need to change. For instance, if you have a deadname in a URL, perhaps as a github username, that’s a problem. At best, you’re able to change it, and have hard redirects to the new name, but there will still be old links floating around with the old information.
On the other hand, this advantage of not keeping any information has the issue that… the URL provides no information.
A URL doesn’t need multiple opaque identifiers though. If you are going to do that, then either make use of the hierarchy to shorten the URL, or leave it as a direct link to the object and cut out the hierarchy.
Informational text that doesn’t help resolve the URL can be useful, but it should be obvious to a user what’s informational text, so they can strip it out.
Any URL components that are not human readable and that don’t help resolve the page can, and
should, just be removed. Or at the very least, if you do feel the need to track users, use a
smaller identifier than what amazon uses. And name it something obvious, like
If you’re going to have an identifier that’s intended to be opaque, it should be completely opaque. Don’t use the timestamp or any structured data in it. An exception is something like an issue number, because the number there is known to the users and it’s reasonably expected to be public. But seeing a URL should not let anyone other than the website learn anything about the user who created the URL.
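One way to get an identifier with no structure in it is to draw it from a random source. A minimal Python sketch, assuming random ids are acceptable for the site in question; the function name new_opaque_id and the default length are made up for this example, and a real site would still need to check for collisions when storing the id.

```python
import secrets


def new_opaque_id(num_bytes=9):
    """Return a URL-safe random identifier.

    Nothing about the creator or the creation time goes into it,
    so there is nothing for a reader of the URL to decode.
    """
    return secrets.token_urlsafe(num_bytes)


print(new_opaque_id())  # 12 URL-safe characters, different on every call
```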
If you have anything after a hash, it had better take you to a specific part of a page, and the URL without the hash should still be valid.
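A quick way to check this for a given link is to split off the fragment and see whether what remains is still a sensible page URL. A short Python sketch using the standard library; the helper name split_fragment is made up for this example.

```python
from urllib.parse import urldefrag


def split_fragment(url):
    """Return (page_url, fragment); the page URL should work on its own."""
    page_url, fragment = urldefrag(url)
    return page_url, fragment


page, anchor = split_fragment(
    "https://doc.rust-lang.org/src/std/io/mod.rs.html#502-964"
)
print(page)    # https://doc.rust-lang.org/src/std/io/mod.rs.html
print(anchor)  # 502-964
```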