Thoughts On Distributed Issue Tracking

Date: Wednesday, 27 May 2020

Git’s a distributed code versioning system, but software projects are more than just code. We should be able to version our issues and pull requests in the same repo. When you clone a repo from one service (say, GitHub) and push it to some other service (say, a self-hosted git server), the issues, pull requests, and other items should move along with the code.

Below I’ve written up some rough thoughts about how we should do this.

Requirements

All data should be stored in a human-readable format. Someone who finds a repo with issues logged against it, with zero knowledge of the specifics, should be able to track down an issue and read the comments on it. It doesn’t have to be the easiest thing, but it should be possible. This is so that in 15 years, when someone finds a commit labelled “fixes the bar as mentioned in issue abcdef”, they’re able to find the context needed.

It should be possible for one repo to be the issue tracker for multiple repos, or one repo to have multiple distinct issue trackers in different subdirectories. The default case is for the issue repo to be the same as the code repo.

It should work on completely unmodified servers. You should be able to type git issue add <issue title> and push it to GitHub without problems, and then anyone fetching the repo has access to those issues.

Storage

You can do this with either a branch or a constantly overwritten tag. The advantage of a branch is that it’s a lot cleaner (Git expects branches to change, so doing this on an orphan branch isn’t as much of a hack). A tag that’s constantly overwritten has the (possible) advantage that it’s not cloned by default, but every time you want to fetch it, you’ll need to update your copy of the tag with whatever’s on the remote, since git won’t overwrite a tag by default.
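As a rough sketch of the difference, here are the fetch commands each option would need, built as argument lists in Python. The ref name "issues" and the remote name "origin" are assumptions for illustration; nothing above fixes either. The only git behaviour relied on is that a leading + in a refspec forces the update.

```python
from typing import List

# Hypothetical ref name; the post does not settle on one.
ISSUES_REF = "issues"


def branch_fetch_command(remote: str = "origin") -> List[str]:
    """Fetch the issue branch into its remote-tracking ref (a normal update)."""
    refspec = f"refs/heads/{ISSUES_REF}:refs/remotes/{remote}/{ISSUES_REF}"
    return ["git", "fetch", remote, refspec]


def tag_fetch_command(remote: str = "origin") -> List[str]:
    """Fetch the issue tag; the leading '+' lets git overwrite the local tag."""
    refspec = f"+refs/tags/{ISSUES_REF}:refs/tags/{ISSUES_REF}"
    return ["git", "fetch", remote, refspec]


if __name__ == "__main__":
    print(" ".join(branch_fetch_command()))
    print(" ".join(tag_fetch_command()))
```

The branch variant is what a plain fetch already does for any branch, which is the "cleaner" part. The tag variant has to be spelled out every time, because without the + the moved tag is rejected.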

Format

The file system should be used as much as possible, as opposed to JSON / binary files, to reduce/eliminate the need for git merge drivers.

Each object type (issue, pull request, something else) is stored under the root as a folder.

Under that, each object type defines its own storage format. The only constraint is that each individual object is a folder (so an issue would be issues/arbitrary_identifier/, with its individual files stored under that).

The identifier that each item is known by would be the hash of the commit in which it was added. (The tools would ensure only one item is added per commit; adding more than one is invalid.) This makes git show make sense. This is more verbose than a sequential number such as #44580, but has the advantage of not needing a central server to maintain uniqueness. Whenever possible the full hash should be used (in comments, a hash should be expanded to the full hash when stored, and collapsed when shown).
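A minimal sketch of how a tool might handle these identifiers: build the folder path for an item, collapse a full hash for display, and expand a short hash back to the full one before storing it. The function names and the 7-character display length are assumptions for illustration, not part of the proposal.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Assumed display length; any prefix that stays unambiguous would do.
SHORT_LEN = 7


def object_path(object_type: str, commit_hash: str) -> PurePosixPath:
    """Folder for one item, e.g. issues/<full commit hash>/."""
    return PurePosixPath(object_type) / commit_hash


def collapse(full_hash: str, length: int = SHORT_LEN) -> str:
    """Shorten a full hash for display."""
    return full_hash[:length]


def expand(prefix: str, known_hashes: Iterable[str]) -> str:
    """Expand a short hash to the full one; refuse ambiguous or unknown prefixes."""
    matches = [h for h in known_hashes if h.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(f"{prefix!r} matches {len(matches)} known identifiers")
    return matches[0]
```

Expansion needs the list of known identifiers, which a tool could get by listing the folders under issues/. Refusing an ambiguous prefix rather than guessing keeps stored references exact.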

Issue Format

The body and comments are stored as markdown (so they can be easily read). The body and every comment are each stored as a separate file.

Metadata like the title is stored before the body in a header-like format, such as:

Title: This is a title for a big issue

Here we have a big issue body
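Reading a file in this shape takes very little code. The sketch below assumes that headers are "Name: value" lines, that a blank line separates them from the body, and that every file starts with at least one header; the function name parse_item is made up for illustration.

```python
from typing import Dict, Tuple


def parse_item(text: str) -> Tuple[Dict[str, str], str]:
    """Split an item file into its headers and its markdown body."""
    # Everything before the first blank line is the header section.
    head, _, body = text.partition("\n\n")
    headers: Dict[str, str] = {}
    for line in head.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip()] = value.strip()
    return headers, body


if __name__ == "__main__":
    sample = "Title: This is a title for a big issue\n\nHere we have a big issue body\n"
    headers, body = parse_item(sample)
    print(headers["Title"])
    print(body, end="")
```

Splitting on the first colon only means a title can itself contain colons without confusing the parser.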

Tags are stored as zero byte files in a tags folder.
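A sketch of what that could look like in a tool, assuming the tags folder sits inside the item's own folder (the function names are hypothetical). Because each tag is its own empty file, two people adding different tags touch different paths, so there is nothing for git to merge.

```python
from pathlib import Path
from typing import List


def add_tag(item_dir: Path, tag: str) -> None:
    """Add a tag by creating an empty file named after it."""
    tags_dir = item_dir / "tags"
    tags_dir.mkdir(parents=True, exist_ok=True)
    (tags_dir / tag).touch()


def list_tags(item_dir: Path) -> List[str]:
    """List an item's tags, sorted by name; no tags folder means no tags."""
    tags_dir = item_dir / "tags"
    if not tags_dir.is_dir():
        return []
    return sorted(p.name for p in tags_dir.iterdir() if p.is_file())
```

Removing a tag would simply be deleting its file.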

Pull Request Format

A pull request is basically an issue, only it lives in a pull requests folder instead of issues, and a Target-Branch: header is added to the body.

Pull request comments can also have their own header to indicate what they’re against: perhaps the lines, then a commit hash, then a file name. The lines come first because it’s easier to parse that way; there’s no need to handle the case where a file name contains something that looks like line numbers.

Comment-For: L25-35 abcdef0123456:/src/foo/bar.rs

This unsafe block shouldn't be here, you can instead use list.copy_within(3..6, 10);
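To show why putting the lines first keeps parsing simple, here is a sketch of a parser for the header value. The exact grammar is an assumption based on the single example above, and allowing one line (L25) as well as a range (L25-35) is my own addition.

```python
import re
from typing import NamedTuple


class CommentTarget(NamedTuple):
    first_line: int
    last_line: int
    commit: str
    path: str


# Lines, then a space, then the commit hash, then ':' and the file name.
# The file name is simply "everything after the first colon", so it may
# contain spaces or text that looks like line numbers.
_COMMENT_FOR = re.compile(r"L(\d+)(?:-(\d+))? ([0-9a-f]+):(.+)")


def parse_comment_for(value: str) -> CommentTarget:
    """Parse the value of a Comment-For header."""
    match = _COMMENT_FOR.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed Comment-For value: {value!r}")
    first = int(match.group(1))
    # A single line (assumed form "L25") is treated as a one-line range.
    last = int(match.group(2)) if match.group(2) else first
    return CommentTarget(first, last, match.group(3), match.group(4))


if __name__ == "__main__":
    print(parse_comment_for("L25-35 abcdef0123456:/src/foo/bar.rs"))
```

The line numbers and the hash have a fixed shape, so they can be matched strictly from the left, and whatever remains is the file name with no further guessing.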

That being said, there’s no real reason why issues shouldn’t be able to comment on lines of code, perhaps to point out buggy sections.

UI

Of course, this doesn’t have to be purely used through the command line, and indeed I’d expect a lot of the use to be through a web interface. Git services already have the ability to make commits in a repo, so making issues in this format would be just a matter of handling it correctly.

Whenever you click on the issues tab, it would read the issues from the current state of the repo. Whenever you perform an action, it would be handled the same way as when you edit a file online.

A secondary advantage here is that tools that want to track or create issues and pull requests (say, IDE integrations) don’t need to authenticate to the server; they just need to read and write your local repo and push the state as needed. You could even self-host the web UI completely offline if you prefer working like that.

Closing Thoughts

I’ve been vague about some specifics of this, mainly because I haven’t actually written this as a project, and have no experience writing issue tracking systems. I do have experience moving to a different source repo and not being able to find the issues that old commits reference, because they’re two issue trackers ago.

I suppose I’m not as much asking for my solution as I am asking for something to be done. We wouldn’t accept all the commit history being lost every time we move repos, so every developer should have a full copy of all the project history and relevant information stored such that they can work on it offline.

And in any case, Fossil can do it and a lot more.

Hm, I might check out (pun half intended) Fossil. Maybe a future blog post?