Friday, 16 October 2015

System Deployment (part 6 of x)

Time to wander into a slightly different topic: now that the mechani.se domain is back up and running it is time to setup a gitolite3 installation on face.mechani.se

Problem Context for gitolite.


My way of working on a linux system has evolved over the years because of some specific desires:

  • I work on several machines - my environment should always be the same
  • I dislike the hassle of maintaining backups - but I need to know that they are in-place, and every-time I switch machine I am effectively doing a restore.
  • Switching machines should break my context as little as possible.
The third point is the killer; during a session on one machine I build up a thick and fecund context. Depending on the work it may be edits to source, environmental variables, command history and other forms of state that are local to the machine. Over time each machine acquires a layering of packages and installed artefacts (libraries, modules, random pieces of related source). Even seemingly inconsequential parts of the machine state are useful: the collection of workspaces and windows, positions and combinations are all memory prompts for a particular task. 

The original dream (probably not even mine, these things tend to be infectious) was a teleporting environment: effectively hibernate a machine and transport the hibernated image to a new machine to be restored. These are the idle dreams of a grad student who works late through the night and doesn't want to start fresh when he trudges into the office. These dreams never quite found traction, although years of experimenting allowed them to morph into something more useful.

Virtualbox introduced teleportation a few years ago. The reality involves more suckage than the dream. Image files are large and cumbersome to transport. Somewhere I have a custom diff utility that syncs the hard-drive overlays inside a .VDI against a target-set to reduce the diff to a few hundred megs at a time (possible over my positively rural ADSL as well as on a flash drive). It just didn't really work out for us. Versioning entire OS images, avoiding branches and generally playing the maintain-the-repo-consistency game on a larger scale was even less fun that you would think.

It turns out that the answer is very boring and simple - its more of a process than an artefact.

Version control for everything


Most of work falls neatly into two categories:
  • Things that I know I will want to archive.
  • Things that will definitely be deleted after trying something out.
This clean taxonomic split is the stuff that programmers live for. It suggests a scratch directory that never needs to be backed up or transported off machine, and a set of version control repositories for everything that I would displeased to lose. That is where people balk at the idea of how much hassle it would be to keep everything in sync, and the server-side issue of maintaining a bare-bones repository against each of those projects. I wrote a script. Well, truth be told I wrote quite a few over the years find the right way to work, but in the end they all collapsed into a single script.

Much like a sordid-ploy by Sauron there is a single repository that rules them all, gitenv:
  • Every configuration file in my home directory is linked into this folder. Using git on the dense representation (the folder) allows version control over a sparse hierarchy of files (the overlay on my home directory). This is a nice trick. In some places these are symlinks to prevent chaos, and in other places we go straight for the jugular with hard-links (e.g. the .ssh directory is hard-linked to a directory inside the repository so that everything within can be versioned and shared across multiple machines).
  • Shell history for each machine is stored so history across all machines is searchable.
  • Custom bin directory, for all the magic.
  • A secrets file. Yup, these are a terrible idea, but then again so is losing access to a password. Which is why mine is encrypted using gpg and the plaintext contents never touch disk. In theory. Although my security needs are not particularly challenging and everytime I screw the passphrase I end up splashing the contents across the file-system. Yay!
This repository has a very strange property for source-control: the files within act as if they are in continuous change. Normally the state of a repository's contents acts discretely: things do not change in-between git commands. But linking the known_hosts file, and the shell history into this repository means that the contents are always dirty. Because it is alway dirty it always needs to merge against the remote - so each machine has a slightly different history for this repository. It is challenging to work with.

Everything else is simple in comparison, there is a single independent repository for each:
  • Project - independent tree for a piece of source, with docs and test data.
  • Course - all materials and archives of student submissions.
  • Document collections - articles, books etc
  • Web server - each active server has its contents in source control - these repositories have post-commit hooks to deploy the master branch live on the machine.
This means that each machine that I use has a collection of a few dozen repositories. This would be a serious pain to maintain by hand. Instead one script takes care of the difficult between the continuous environment repository and the server (and its mirrors), and then works out how close to consensus the rest of the repositories are. Where the actions to establish consensus are simple (i.e. the repository is purely ahead or behind) the script brings it into line automatically. This makes things sane.

Transporting state between machines is the same as using backup/restore. This is absolutely essential - it means that the backup and the restore mechanism are in use every day. When you positively, absolutely need to rely on a system, make sure that you eat your own dog-food. Mmmm chewy. The weird thing about my backup and restore system is that any two machines rarely have exactly the same contents - but they all chase the same consensus state, and progress towards synchronisation is monotonic. This is actually good enough to make sure that nothing is every lost.

Gitolite3


Git servers are nice and easy. Manually keeping track of repository details is an absolute pain in the arse. Thankfully gitosis, and now gitolite have made that process incredibly simple. Despite that simplicity I have not yet worked out how to integrate this into the preseeded process, so for now this is a dangling live piece of state on the server. [Note to self: seeing it like this it is quite obvious running this with the sudo flipped around, root or root->git should make it easy]

Each git server needs a user dedicated to gitolite3:

sudo adduser --system --shell /bin/bash --gecos 'Git version control' --group --disabled-password --home /home/git git
# Copy public key into git.pub
sudo su git
cp ../main/rsa_git.pub git.pub
gitolite setup -pk git.pub

The docs make it look much more complex, but on debian if you have installed the gitolite3 package this is all there is to it. Don't reuse a key - it may seem easier in the short-term but it actually makes things much more complex in the long term. Dedicate a key to git, and use an agent properly!

All the repositories inside gitolite are bare - this is the point of a server, guaranteed push. This has been running quite happily against a single server or years, as I do the upgrade I'm setting up a second mirror for the git server. I haven't tried automated the sync between mirrors yet - there is a bit of thought to be had first about whether or not pushes are guaranteed in a system with mirrors. I'm sure it will be fun to find out :)

I am always forgetting the admin repo URLs as there are slight differences in git-urls under the different protocol prefixes, but here it is as simple as:

git clone git@face.mechani.se:gitolite-admin face-gitolite-admin

Inside the admin repo the key is already in place so the config layout becomes completely uniform, conf/gitolite.conf looks like:

repo randomforests
    RW+ = git

repo paperBase
    RW+ = git

So now the allsync.py script needs some tweaks to handle multiple remotes as mirrors...



No comments:

Post a Comment