Monday, 19 October 2015

System Deployment (part 7 of x)

The Mechanise server is back up and running, with a mirror of each git repository on it. Before teaching starts for the winter term the web server used for the courses needs to be brought back to life. This is complicated by a massive rewrite / redesign of large chunks of it that will probably stretch long into the term.

The web server.

This should be a simple beast: python code using the twisted library for HTTP processing. Large chunks of the site are content served dynamically to each student. Over the years it has been a testbed for pedagogic projects: generating unique assignments for students, integrating automatic testing into the submission system and other crazy ideas. As a result it has sprawled out of control, and the software architecture looks like it was inspired by Picasso having a merry old time high on weapons-grade LSD.

The first step is to get the deployment system working again on the server. When the git repository hosting the server is updated, a post-update hook springs into life:

  • Copy the source tree and resources into the production tree.
  • Kill the old server.
  • Respawn the noob.
Git hooks are a strange mess of server-side state that is not versioned... Inside the bare repository on the server we update files in the hooks directory that git will execute during certain actions. The post-update hook is the one that will redeploy the server:


# Log the update, then force-checkout the new tree into the production directory
echo "Website updated from commit" | logger -t gitolite
GIT_WORK_TREE=/var/www/thesite git checkout -f | logger -t gitolite
chmod 755 -R /var/www/thesite
chown git:git -R /var/www/thesite
# Ask the running server to shut itself down; the service script respawns it
curl -s http://localhost/restart
# Debugging residue: wait, then dump evidence that the new server came up
sleep 2
top -bn1 | grep python
ps -A --forest | grep -C1 python
tail /var/log/syslog

Like all archaeologists we can find evidence of panic among the primitive people. The sleep followed by a dump of info is a sure sign that something did not work once, and that the confirmation used to debug it was so comforting that it was never removed.

Using a URL to kill the server is asking for trouble. Currently we check that the incoming transport originated on the loopback (127.0.0.1) interface. This should not be spoofable, but if it is then we can use a random number in the file-system to lock down this request. This approach works better than a direct kill from the gitolite user as:

  • No worries about serialisation; if we are in the processing hook for the restart page then any file I/O for another request is already done.
  • No worries about the privileges needed to kill a process belonging to another user without introducing a privilege escalation attack.
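A minimal sketch of the random-number lock mentioned above. The token file location and the idea of passing the token as a query parameter are my assumptions for illustration, not the site's actual code:

```shell
# Sketch of the restart lock: the hook generates a random token that only
# the server's user can read, and the restart request must echo it back.
tokfile=$(mktemp)            # the real version would live under /var/www
chmod 600 "$tokfile"         # only the owning user may read the token
token=$(head -c16 /dev/urandom | od -An -tx1 | tr -d ' \n')
echo "$token" > "$tokfile"
# The hook would then request: curl -s "http://localhost/restart?token=$token"
# and the server compares the query parameter against the file contents.
```

The file permissions do the heavy lifting: even a spoofed request from another local user cannot read the token to include it.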

First we need the user that will run the server, and their home-directory. The user www-data is already installed on debian for this purpose:

main@face:~$ grep www-data /etc/passwd
main@face:~$ sudo su www-data
[sudo] password for main:
This account is currently not available.
main@face:~$ sudo su -s /bin/bash www-data
www-data@face:/home/main$ cd
bash: cd: /var/www: No such file or directory

It is designed not to permit casual use - once upon a time it was a dreadful security hole when people would forget to set a strong password on the account, or even worse leave the default in place. We actually like it that way, so we will create the home directory that it needs and leave the account disabled, so that only the root user can log into it by forcing a different shell.

root@face:/home/main# mkdir /var/www
root@face:/home/main# ls -ld /var/www
drwxr-xr-x 2 root root 4096 Oct 19 09:34 /var/www
root@face:/home/main# chown git:git /var/www
root@face:/home/main# ls -ld /var/www
drwxr-xr-x 2 git git 4096 Oct 19 09:34 /var/www
root@face:/home/main# cat >/var/www/webservice <<'EOF'
#!/bin/bash

cd /var/www/thesite
while true ; do
    authbind python 2>&1 | logger -t www
    echo "Web server exited, restarting" | logger -t www
    sleep 2
done
EOF

root@face:/home/main# chown www-data:www-data /var/www/webservice
root@face:/home/main# chmod 744 /var/www/webservice
root@face:/home/main# su -s /bin/bash www-data
www-data@face:/home/main$ cd
www-data@face:~$ ls -al
total 12
drwxr-xr-x 2 git git 4096 Oct 19 09:37 .
drwxr-xr-x 13 root root 4096 Oct 19 09:34 ..
-rwxr--r-- 1 www-data www-data 167 Oct 19 09:37 webservice
www-data@face:~$ ./webservice
./webservice: line 3: cd: /var/www/thesite: No such file or directory
root@face:/home/main# mkdir /var/www/thesite
root@face:/home/main# chown git:git /var/www/thesite
root@face:/home/main# chmod 755 /var/www/thesite
root@face:/home/main# touch /etc/authbind/byport/80
root@face:/home/main# chown www-data:www-data /etc/authbind/byport/80
root@face:/home/main# chmod 500 /etc/authbind/byport/80
root@face:/home/main# ls -l /etc/authbind/byport/80
-r-x------ 1 www-data www-data 0 Oct 19 10:17 /etc/authbind/byport/80

Cool. The user has just enough privileges to execute the service script, but it cannot do anything else as it has no write permission anywhere. The gitolite user owns the www-data home-directory and the thesite directory inside it. This is the target that we perform the bare checkout into each time the repo is updated. The basic workflow is like this:

  • Dev work happens off the server using -local to run the server in the non-production environment.
  • Deployment happens when the dev-commits are pushed back up to face.
  • The update-hook fires:
    • The production source tree is updated with the new code.
    • The old server is killed.
    • The service script spawns a new server after a couple of seconds.

Friday, 16 October 2015

System Deployment (part 6 of x)

Time to wander into a slightly different topic: now that the domain is back up and running it is time to set up a gitolite3 installation on it.

Problem Context for gitolite.

My way of working on a linux system has evolved over the years because of some specific desires:

  • I work on several machines - my environment should always be the same
  • I dislike the hassle of maintaining backups - but I need to know that they are in place, and every time I switch machine I am effectively doing a restore.
  • Switching machines should break my context as little as possible.
The third point is the killer; during a session on one machine I build up a thick and fecund context. Depending on the work it may be edits to source, environment variables, command history and other forms of state that are local to the machine. Over time each machine acquires a layering of packages and installed artefacts (libraries, modules, random pieces of related source). Even seemingly inconsequential parts of the machine state are useful: the collection of workspaces and windows, positions and combinations are all memory prompts for a particular task.

The original dream (probably not even mine, these things tend to be infectious) was a teleporting environment: effectively hibernate a machine and transport the hibernated image to a new machine to be restored. These are the idle dreams of a grad student who works late through the night and doesn't want to start fresh when he trudges into the office. These dreams never quite found traction, although years of experimenting allowed them to morph into something more useful.

Virtualbox introduced teleportation a few years ago. The reality involves more suckage than the dream. Image files are large and cumbersome to transport. Somewhere I have a custom diff utility that syncs the hard-drive overlays inside a .VDI against a target-set to reduce the diff to a few hundred megs at a time (possible over my positively rural ADSL as well as on a flash drive). It just didn't really work out for us. Versioning entire OS images, avoiding branches and generally playing the maintain-the-repo-consistency game on a larger scale was even less fun than you would think.

It turns out that the answer is very boring and simple - it's more of a process than an artefact.

Version control for everything

Most of my work falls neatly into two categories:
  • Things that I know I will want to archive.
  • Things that will definitely be deleted after trying something out.
This clean taxonomic split is the stuff that programmers live for. It suggests a scratch directory that never needs to be backed up or transported off machine, and a set of version control repositories for everything that I would be displeased to lose. That is where people balk at the idea: how much hassle would it be to keep everything in sync, on top of the server-side issue of maintaining a bare repository for each of those projects? I wrote a script. Well, truth be told I wrote quite a few over the years trying to find the right way to work, but in the end they all collapsed into a single script.

Much like a sordid ploy by Sauron there is a single repository that rules them all, gitenv:
  • Every configuration file in my home directory is linked into this folder. Using git on the dense representation (the folder) allows version control over a sparse hierarchy of files (the overlay on my home directory). This is a nice trick. In some places these are symlinks to prevent chaos, and in other places we go straight for the jugular with hard-links (e.g. the .ssh directory is hard-linked to a directory inside the repository so that everything within can be versioned and shared across multiple machines).
  • Shell history for each machine is stored so history across all machines is searchable.
  • Custom bin directory, for all the magic.
  • A secrets file. Yup, these are a terrible idea, but then again so is losing access to a password. Which is why mine is encrypted using gpg and the plaintext contents never touch disk. In theory. Although my security needs are not particularly challenging and every time I screw up the passphrase I end up splashing the contents across the file-system. Yay!
This repository has a very strange property for source-control: the files within act as if they are in continuous change. Normally the state of a repository's contents acts discretely: things do not change in-between git commands. But linking the known_hosts file, and the shell history, into this repository means that the contents are always dirty. Because it is always dirty it always needs to merge against the remote - so each machine has a slightly different history for this repository. It is challenging to work with.
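The always-dirty cycle can be demonstrated on a throwaway repository (the file name, commit message and identity here are placeholders, not the real gitenv contents):

```shell
# Linked state files change outside git, so every sync starts by
# committing whatever has drifted since the last one.
repo=$(mktemp -d)
git init -q "$repo"
echo "ssh-ed25519 AAAA... somehost" > "$repo/known_hosts"   # stand-in for linked state
if [ -n "$(git -C "$repo" status --porcelain)" ]; then
    git -C "$repo" add -A
    git -C "$repo" -c user.email=sync@local -c user.name=sync commit -qm "machine sync"
fi
git -C "$repo" status --porcelain   # clean again - until the next command runs
```

In the real repository this commit is immediately followed by a merge against the remote, which is why each machine's history diverges slightly.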

Everything else is simple in comparison: there is a single independent repository for each:
  • Project - independent tree for a piece of source, with docs and test data.
  • Course - all materials and archives of student submissions.
  • Document collections - articles, books etc
  • Web server - each active server has its contents in source control - these repositories have post-commit hooks to deploy the master branch live on the machine.
This means that each machine that I use has a collection of a few dozen repositories. This would be a serious pain to maintain by hand. Instead one script takes care of the difference between the continuous environment repository and the server (and its mirrors), and then works out how close to consensus the rest of the repositories are. Where the actions to establish consensus are simple (i.e. the repository is purely ahead or behind) the script brings it into line automatically. This makes things sane.
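The core decision in that script can be sketched as a function. This is my reconstruction of the idea, not the actual script; the remote name origin and a configured upstream branch are assumptions:

```shell
# Classify one repository against its remote and act only when the action
# is unambiguous: purely behind => fast-forward, purely ahead => push.
sync_repo() {
    git -C "$1" fetch -q origin || return 1
    local here there base
    here=$(git -C "$1" rev-parse @)
    there=$(git -C "$1" rev-parse @{u})
    base=$(git -C "$1" merge-base @ @{u})
    if [ "$here" = "$there" ]; then
        echo "in sync"
    elif [ "$here" = "$base" ]; then
        git -C "$1" merge -q --ff-only @{u} && echo "fast-forwarded"
    elif [ "$there" = "$base" ]; then
        git -C "$1" push -q origin && echo "pushed"
    else
        echo "diverged: needs a manual merge"
    fi
}
```

Looping this over a few dozen repository paths gives the monotonic drift towards consensus described below; only genuinely diverged repositories need a human.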

Transporting state between machines is the same as using backup/restore. This is absolutely essential - it means that the backup and the restore mechanism are in use every day. When you positively, absolutely need to rely on a system, make sure that you eat your own dog-food. Mmmm chewy. The weird thing about my backup and restore system is that any two machines rarely have exactly the same contents - but they all chase the same consensus state, and progress towards synchronisation is monotonic. This is actually good enough to make sure that nothing is ever lost.


Git servers are nice and easy. Manually keeping track of repository details is an absolute pain in the arse. Thankfully gitosis, and now gitolite have made that process incredibly simple. Despite that simplicity I have not yet worked out how to integrate this into the preseeded process, so for now this is a dangling live piece of state on the server. [Note to self: seeing it like this it is quite obvious running this with the sudo flipped around, root or root->git should make it easy]

Each git server needs a user dedicated to gitolite3:

sudo adduser --system --shell /bin/bash --gecos 'Git version control' --group --disabled-password --home /home/git git
# Copy public key into
sudo su git
cp ../main/
gitolite setup -pk

The docs make it look much more complex, but on debian if you have installed the gitolite3 package this is all there is to it. Don't reuse a key - it may seem easier in the short-term but it actually makes things much more complex in the long term. Dedicate a key to git, and use an agent properly!
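The "dedicate a key" advice pairs naturally with an ssh_config stanza so the right key is picked up automatically. This is a sketch with assumed names (host face, key file id_gitolite), and it writes to a temp file so it is safe to run; in real use the stanza is appended to ~/.ssh/config:

```shell
# Wire a dedicated key to the git server through ssh_config.
cfg=$(mktemp)    # stand-in for ~/.ssh/config
cat >> "$cfg" <<'EOF'
Host face
    HostName face.example.org
    User git
    IdentityFile ~/.ssh/id_gitolite
    IdentitiesOnly yes
EOF
grep 'IdentityFile' "$cfg"
```

With a Host alias like this in place, clone URLs can collapse down to the short forms used later in the post.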

All the repositories inside gitolite are bare - this is the point of a server, a guaranteed target to push to. This has been running quite happily against a single server for years; as I do the upgrade I'm setting up a second mirror for the git server. I haven't tried automating the sync between mirrors yet - there is a bit of thought to be had first about whether or not pushes are guaranteed in a system with mirrors. I'm sure it will be fun to find out :)

I am always forgetting the admin repo URLs as there are slight differences in git-urls under the different protocol prefixes, but here it is as simple as:

git clone face-gitolite-admin

Inside the admin repo the key is already in place so the config layout becomes completely uniform, conf/gitolite.conf looks like:

repo randomforests
    RW+ = git

repo paperBase
    RW+ = git

So now the script needs some tweaks to handle multiple remotes as mirrors...
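One low-effort option for those tweaks (an assumption on my part, not what the post settles on) is to give a remote a second push URL, so a single git push updates both the primary and the mirror:

```shell
# Demonstrated on a throwaway repo; the URLs are placeholders.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" remote add origin git@face:project
git -C "$repo" remote set-url --add --push origin git@face:project
git -C "$repo" remote set-url --add --push origin git@mirror:project
git -C "$repo" remote get-url --push --all origin   # lists both push URLs
```

Note that once any --push URL is set, every push URL must be listed explicitly (including the original), which is why the face URL is added first. Fetches still come from the single fetch URL.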

Thursday, 15 October 2015

System Deployment (5 of x)

Fixing the dodgy network settings. The debian-installer picks up a hostname over DHCP belonging to another site running on my provider's service. Manually editing /etc/hostname and then rebooting solves this.

Seems like the static configuration worked out well. Seeing this output gives me a nice warm feeling:

main@face:~$ netstat -l
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 *:ssh *:* LISTEN
tcp6 0 0 [::]:ssh [::]:* LISTEN

I like to know exactly what is running on a server, and how many access points it adds for an attacker. In this state the box should be safe until another openssh exploit is discovered, and they do not happen every year. Good enough.

Next I need to bring up bind9 for the domain (ha, there goes any security hopes :). Every step now will be done twice:

  • Live steps on the server (configuration drift)
  • Replicating the changes in the file-system overlay inside the remastering .iso

At the end I'll blast away the live system to check the installer can rebuild it properly. This won't happen very often, because the KVM simulated access to the raw disk is as slow as it is possible to be. The automated install (minus the manual modprobe for the driver, which is still bugging the crap out of me) takes about 40 minutes. Not so nice.


The best way to configure bind9 is using an old configuration that works :) Otherwise proceed very carefully, and read as much as you can about security errors in bind9 configurations. For recovery of old files I took a not so elegant approach to backing up the live server: dd the disk and copy it to a workstation. Mounting this loopback (before killing the old server) gives me a simple way to recover things.

Disaster: apparently I did not dd the disk offsite. Instead I took the shortcut of tar'ing the entire file-system while it was running. This obviously made some sort of sense at the time. For inexplicable reasons it means that the /root directory is missing, along with the .bash_history inside it that was a record of the steps taken to build the server directly. Lesson: next time be ye not a dick and dd the disk offsite.

Not to worry, I deliberately cultivate an anally-retentive OCD-level journal of work. The steps to build the server will be in there... Disaster the 2nd: as I was running on 16-hour workdays with little sleep when the server was first installed, there is a blank spot of a month in my notes. Oh bugger. Lesson: kind of unclear...

Well it can't be that hard to look up again, I've already written the zone files once... Oh look, there's the zone files and everything is sealed inside a chroot jail so that I can run bind9 as a non-privileged user. That's a really cool idea, how the hell did I do that then?

Hmm, so it all changed with jessie then? I don't think that systemd and I will become friends after I prize its charred tentacles off of my drive.

The late_command in the preseed.cfg is getting a bit busy so we'll add a new script to the iso called chroot_bind9.bash:

mkdir -p /var/bind9/chroot/{etc,dev,var/cache/bind,var/run/named}
# Device nodes inside the jail: null is char major 1 minor 3, random is 1,8
mknod /var/bind9/chroot/dev/null c 1 3
mknod /var/bind9/chroot/dev/random c 1 8
chmod 660 /var/bind9/chroot/dev/{null,random}
mv /etc/bind /var/bind9/chroot/etc   # Installed default from package
ln -s /var/bind9/chroot/etc/bind /etc/bind
cp /etc/localtime /var/bind9/chroot/etc/
chown -R bind:bind /etc/bind/*
chmod 775 /var/bind9/chroot/var/{cache/bind,run/named}
chgrp bind /var/bind9/chroot/var/{cache/bind,run/named}
cp /cdrom/initd_bind9 /etc/init.d/bind9
# Next line is deliberately fragile - should be checked / rewritten if there is a major update to bind9
# Also - this is gnu sed style argument, not bsd.
sed -i 's:PIDFILE=/var/run/named/:PIDFILE=/var/bind9/chroot/var/run/named/:' /etc/init.d/bind9
echo "\$AddUnixListenSocket /var/bind9/chroot/dev/log" > /etc/rsyslog.d/bind-chroot.conf
# Skip service restarts as we will reboot soon

The makefile needs to be updated to get the new info into the .iso:

remaster.iso: copy
	cp preseed.cfg copy/
	cp isolinux.cfg copy/isolinux/
	cp /home/amoss/.ssh/ copy/
	cp chroot_bind9.bash copy/
	chmod +x copy/chroot_bind9.bash
	cp mechani.db copy/
	tar czf copy/overlay.tgz -C config etc home/main
	genisoimage -b isolinux/isolinux.bin -c isolinux/ -o remaster.iso -J -R -no-emul-boot -boot-load-size 4 -boot-info-table copy/

And lastly the preseed is updated to execute the new script:

d-i preseed/late_command string \
tar xzf /cdrom/overlay.tgz -C /target ; \
in-target chown -R main:main /home/main ; \
in-target chown root:root /etc/hosts ; \
in-target chown root:root /etc/ssh/sshd_config ; \
chmod 700 /target/home/main/.ssh ; \
in-target chown main:main /home/main/.ssh/authorized_keys ; \
chmod 600 /target/home/main/.ssh/authorized_keys ; \

Again, for emphasis: none of this has been tested yet - but what fun is life if we do not live dangerously, eh? After a robust exchange of views with my registrar about the quality of their service we have rebuilt the entente cordiale; they manually flicked some switches somewhere, and lo and behold:

dig +trace
; <<>> DiG 9.8.3-P1 <<>> +trace
;; global options: +cmd
.       14196   IN      NS      (thirteen root-server names, elided)
;; Received 228 bytes in 89 ms

se.     172800  IN      NS      (nine se. TLD server names, elided)
;; Received 492 bytes in 123 ms

(the domain's NS delegation and final A record, elided)
;; Received 63 bytes in 49 ms

Is good, no? The domain has only been offline for a month due to a "routine upgrade" :) Next up I will restore the gitolite configuration and mirror my lonely git server...

Wednesday, 14 October 2015

System Deployment (4 of x)

The server is back up and running, having managed to do a hdd install inside the KVM environment for the first time. Yay! There is nothing running on it yet. There is a rough todo list before the site is back up.

  • Find out why the static network config is broken, disable dhcp, and verify that sshd is the only open port on the machine.
  • Stick the bind9 config back on the machine and bring the DNS back to life.
  • Install gitolite and setup barebones mirrors of the git server on gimli.
  • Put the web-user back in, and setup the hooks to run the server, redeploy from git.
But first a brief detour through the boot sequence (need to convert this to lecture slides)...

Linux Boot Sequence

This has changed over the years, and will probably change again. These details are against the current stable branch of jessie (late 2015).

Step 0: BIOS

The BIOS is effectively the firmware for the specific PC that it is running on. It exists in order to bootstrap the machine: the kernel to load is somewhere in the machine storage. The BIOS should contain enough of the storage-specific drivers to access the storage and load in the next stage for boot. The first BIOS was introduced on the original IBM PC in 1981. It has not changed much since then. In 1981 the range of peripheral devices available was limited; the BIOS was meant to function as a hardware abstraction layer in an era when that meant accessing the console, keyboard and disk drives. This layer is ignored by modern kernels.

The BIOS on every machine (the industry is currently in a transition to UEFI to replace this completely) follows a simple specification to find and execute the Master Boot Record (MBR).
For each disk in the user-specified boot order (e.g. HDD, cdrom, usb etc):

  • Load sector 0 (512 bytes) into memory at 7C00h.
  • Verify the two-byte signature: 7DFEh=55h, 7DFFh=AAh.
  • Jump to the code in 16-bit real mode with the following register setup:
    • Code Segment = 0000h
    • Instruction Pointer = 7C00h

Step 1: MBR (stage 1)

The MBR used in DOS / Windows has changed over the years to include support for Logical Block Addresses / Plug'n'Play and other extensions. The BIOS-MBR interface must remain constant to guarantee that the boot sequence will work without knowing the specific combination of BIOS and O/S on the machine.

It is easy to access the MBR from a live linux system as it is at a fixed location on the disk, for example if we are on the first scsi disk in the system:

dd if=/dev/sda of=mbr_copy bs=512 count=1
dd if=mbr_copy of=/dev/sda bs=512 count=1

Booting Linux almost always means booting the GRUB MBR. If we want to see how that works then we can just disassemble the code in the mbr:

objdump -D -b binary -mi386 -Maddr16,data16 mbr_copy

mbr_copy:     file format binary

Disassembly of section .data:

00000000 <.data>:
   0:   eb 63                   jmp    0x65
  65:   fa                      cli
  66:   90                      nop
  67:   90                      nop
  68:   f6 c2 80                test   $0x80,%dl
  6b:   74 05                   je     0x72
  6d:   f6 c2 70                test   $0x70,%dl
  70:   74 02                   je     0x74
  72:   b2 80                   mov    $0x80,%dl
  74:   ea 79 7c 00 00          ljmp   $0x0,$0x7c79

Here we can see a check on a parameter passed in by the BIOS (%dl indicates which disk was booted), a quick check to see if it is a fixed or removable disk, and then a far jump to the rest of the MBR code by its absolute real-mode address (0x7c79 is offset 0x79 within the sector that the BIOS loaded at 0x7c00).

All of the stage 1 functionality has to fit into 510 bytes of 16-bit real-mode code. This is not a lot. To make life more interesting there is a data-structure embedded inside this code in a standard format: the partition table for the drive, which gives us four primary partitions. To be read and written by standard tools this table must be at specific locations inside the sector. When we access this table, e.g. with something like:

sudo fdisk /dev/sda

The fdisk tool needs to access the table without executing any code in the MBR to do so. This reduces the space for executable code to 446 bytes: the four 16-byte table entries and the two-byte signature are carved out of the 512-byte sector. This is enough to find and locate a larger boot-stage on the disk, using the raw BIOS routines, and execute this second stage. An incredibly detailed (and thus useful) walkthrough of the booting scheme can be read on the grub mailing list.
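These fixed offsets are easy to poke at safely with dd against an image file instead of a real disk (the image here is a blank stand-in, not a real MBR):

```shell
# Build a fake 512-byte sector and stamp the 55 AA signature at offsets
# 510-511; the partition table would occupy bytes 446-509.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=1 status=none
printf '\x55\xaa' | dd of="$img" bs=1 seek=510 conv=notrunc status=none
# Read the signature back, just as the BIOS check does:
sig=$(dd if="$img" bs=1 skip=510 count=2 status=none | od -An -tx1 | tr -d ' ')
echo "$sig"
```

Swap the image for /dev/sda (read-only!) and the same offsets pull the signature and table bytes off a live disk.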

Step 2: MBR (stage 2)

The second stage boot-loader is much larger. In the windows world this is called the VBR and is loaded directly from the beginning of the partition to boot. GRUB (in the classic MBR scheme) loads its second stage from a gap in the disk - the rest of the first cylinder on the disk is used as padding to align the first partition on a cylinder boundary. This padding consists of 63 sectors, or 31.5K of space. This is enough space to identify a boot partition with a known file system and load files (not raw sectors) from there. Typically the second stage of GRUB will display a splash screen with a menu and allow the user to select what to boot next. For a windows install this means chainloading the initial sectors from the installed partition. For a linux install it means loading a kernel file into memory, along with an initial ramdisk holding a bootstrap filesystem and passing control to the kernel.

The splashscreen menu is controlled by a file-format for GRUB2 that looks like this:

menuentry "Remaster ISO" {
set root='(hd0,1)'
#loopback loop /remaster.iso
linux /vmlinuz initrd=/initrd.gz root=/dev/vda1 vga=788 auto=true panic=20 priority=critical preseed/file=/cdrom/preseed.cfg ---
initrd /initrd.gz
}

The root is the partition that contains the files for grub. Numbering is somewhat chaotic: harddisks are numbered from zero (hd0, hd1...) while partitions are numbered from 1, so (hd0,1) is the first partition on the first disk. This corresponds to the partitioning scheme in the previous post (1GB partition to hold the kernel, initrd and .iso images for installers). The second stage will mount the ext2 file-system on that partition, then the filenames ('/vmlinuz') are absolute paths in that file-system. The kernel accepts some arguments from the boot-loader; panic=20 is very useful during development: when the kernel panics and refuses to boot it reboots the system back to grub after 20 seconds. When you don't have a physical keyboard to stab ctrl-alt-del on, this one saves a lot of coffee mugs from acts of extreme violence.

On some distros (gentoo springs to mind, although they may have updated it since I used it last) these config files are edited directly on the boot drive, normally under /boot/grub. Debian has some extra support for building the configuration. Each menuitem becomes an executable script under /etc/grub.d, so for example the above becomes /etc/grub.d/11_remaster by wrapping the contents in a here-document:

#!/bin/sh -e
cat << EOF
menuentry "Remaster ISO" {
set root='(hd0,1)'
#loopback loop /remaster.iso
linux /vmlinuz initrd=/initrd.gz root=/dev/vda1 vga=788 auto=true panic=20 priority=critical preseed/file=/cdrom/preseed.cfg ---
initrd /initrd.gz
}
EOF

Step 3: Kernel / initrd

GRUB mounted the boot drive in order to load the kernel files, but the kernel cannot access this mount: we want the kernel to be independent of the boot-loader, and looking through memory for its data-structures would represent a huge dependency. So the kernel will need to access the hardware and mount the file-system itself. Standard bootstrap problem: where are the drivers to do this? They are, of course, on the disk. Bugger.

In Linux the solution is quite elegant. Start with a / file-system already mounted, including all the necessary drivers. Use it to mount the real / file system on the disk, and then move the original out of the way. This is much easier to achieve, and doesn't involve its own bootstrap problem because we can just serialise the contents of a ramdisk directly onto the harddrive. The kernel can be booted with this ramdrive read back into memory. The tool for doing this is cpio, the contents get gzipped and GRUB knows how to load and unzip the initrd.gz directly into memory for the kernel to use during boot.
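The serialise/deserialise round trip is just cpio and gzip, and can be demonstrated on a toy tree; a real initrd is handled with exactly the same commands, minus the toy file:

```shell
# Pack a directory into a gzipped newc cpio archive and unpack it again -
# the same cycle used to edit an initrd.
work=$(mktemp -d)
mkdir "$work/tree" "$work/unpacked"
echo "modprobe virtio_blk" > "$work/tree/marker"
(cd "$work/tree" && find . | cpio -o -H newc 2>/dev/null | gzip > ../initrd.gz)
(cd "$work/unpacked" && gunzip -c ../initrd.gz | cpio -id 2>/dev/null)
cat "$work/unpacked/marker"
```

The newc format is the one the kernel expects for an initramfs; edit the unpacked tree, repeat the pack step, and the result boots as initrd.gz.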

An important principle here is reuse of existing tools: the initrd is a standard filesystem (before the serialisation and zipping) so we can edit any part of the boot-image by mounting it and using standard tools on it. This is what the cycleramd.bash script in the previous post did to insert the KVM para-virtualisation drivers.

Step 4: Init / inittab

Once the kernel has finished initialising itself, and gained access to the root file system, it can proceed with booting the rest of the system. The /sbin/init executable is responsible for bringing the system up to the required level. In an ideal world this is a simple isolated piece of code controlled by a single plaintext configuration file called /etc/inittab. Trying to pull this from my desktop system fell over the slimy entrails of systemd, which I will describe another time. The inittab from the hd-media initrd is a simpler example to work from:

# /etc/inittab
# busybox init configuration for debian-installer

# main rc script
::sysinit:/sbin/reopen-console /sbin/debian-installer-startup

# main setup program
::respawn:/sbin/reopen-console /sbin/debian-installer

# convenience shells

# logging
tty4::respawn:/usr/bin/tail -f /var/log/syslog

# Stuff to do before rebooting
::ctrlaltdel:/sbin/shutdown > /dev/null 2>&1

# re-exec init on receipt of SIGHUP/SIGUSR1

The configuration file defines what programs we should connect to the ttys that we can access through alt-f1, alt-f2 etc. On a desktop we would expect to see the X server connected to a login manager. Different forms of shutdown are associated with commands. On a desktop we would see different runlevels associated with the programs executed to get there. In this ramdisk we see how to turn the linux system into a one-program kiosk system, in this case to run the installer. Neat.

Step 5: Rest of the system

This depends entirely on the programs launched from init. The single-application kiosk image boots a very different system from a typical server (ha, what is that?) or a typical desktop environment. Seeing how the debian-installer manages its boot gives me some very evil ideas for single-process servers in a locked down environment that I may explore later. For lecture slides I should probably also take some time to describe on another day:

  • The modern GPT scheme and UEFI
  • Booting from cdrom and usb.
  • Ways to launch server processes
  • Access to the debian-installer source on anonscm.
  • Alternative preseeding approach with the config on the ramdisk.

Sunday, 11 October 2015

System Deployment (3 of x)

The desktop installer seems to work (it's in day-to-day use now). Currently it builds an e17 desktop on top of Debian, with enough support to rebuild everything in my repositories. It is stable enough to host the build system for remastering a server: the target environment.

Server overview.

The server is a VPS (virtual private server) running in a data-center belonging to the provider. The virtual machine runs inside a KVM hosting environment. Physical access is simulated through VNC - importantly this connects to the host rather than the guest, so it remains available during reboot and reinstall. Unfortunately the keymap is a bit screwed up and there is no way to change it. Most(!?!) important punctuation can be found by setting the local keymap to US, but it is a minimal environment.

The simulated CD can only be changed by someone with admin privileges on the host, so it requires a support ticket and 24-48hr turn-around time. For this reason it is left set to a virgin debian installer image.

Bootstrap of the installation environment.

One issue that crops up straight away is that although the netinst image can find the virtual drive for installation - it cannot find it directly at boot. This is not a problem for a simple clean install. But the re-installer will use the hd-media kernel/initrd to boot the cdrom image. And this system cannot find the virtual drive.

KVM uses paravirtualisation, so the kernel will need the virtio drivers (in particular virtio_blk) and
these are not in the hd-media images by default. The initial environment will look like this:

Partition 1: 1000MB, ext2, bootable
   /vmlinuz - default kernel image from hd-media
   /initrd.gz - modified hd-media initial ramdisk with extra modules for virtio
   /remaster.iso - preseeded debian installer for the target installer
Partition 2: 1000MB, swap
Partition 3: Remaining space, ext4, mounted as /
Grub installed on the MBR
  - Standard menuitem to boot /dev/vda3 into the target system.
  - Extra menuitem to boot kernel/initrd from /dev/vda1
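
The hd-media ramdisk can be inspected to confirm that the driver really is absent before going to the trouble of rebuilding it. A small sketch (the initrd path is whatever you unpacked the hd-media files to):

```shell
# Report whether a gzipped initrd contains the paravirtual block driver.
# Only needs gunzip, cpio and grep - all present in the desktop build.
has_virtio() {
    if gunzip -c "$1" 2>/dev/null | cpio -it 2>/dev/null | grep -q 'virtio_blk\.ko'; then
        echo "virtio_blk present"
    else
        echo "virtio_blk missing - the initrd needs rebuilding"
    fi
}
```

Against the stock jessie hd-media initrd this reports the module missing; after the rebuild script later in this post it should be present.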

The standard installer is used through VNC to partition the disk and get a working system onto /dev/vda3. To rebuild the initrd we use the following script in the desktop environment. This saves a huge amount of work: the environment created by the jessie installer is the environment that the jessie installer was built inside.

# Unpack the stock hd-media ramdisk.
rm -rf tmpramd
mkdir tmpramd
(cd tmpramd && gunzip -c ../hdmedia/initrd.gz | cpio -i)
# Add the paravirtual drivers from the host's matching kernel tree.
cp /lib/modules/3.16.0-4-amd64/kernel/drivers/block/virtio_blk.ko tmpramd/lib/modules/3.16.0-4-amd64/kernel/drivers/block/
mkdir tmpramd/lib/modules/3.16.0-4-amd64/kernel/drivers/virtio
cp /lib/modules/3.16.0-4-amd64/kernel/drivers/virtio/*.ko tmpramd/lib/modules/3.16.0-4-amd64/kernel/drivers/virtio/
#sed -e 's:start-udev$:&\n/sbin/modprobe virtio_blk:' tmpramd/init >newinit
#mv newinit tmpramd/init
#chmod +x tmpramd/init
# Force the module to load from the rcS.d stage instead.
echo /sbin/modprobe virtio_blk >>tmpramd/etc/rcS.d/virtio
chmod +x tmpramd/etc/rcS.d/virtio
#echo virtio_blk >>tmpramd/etc/modules
# Repack the ramdisk and ship it to the server.
(cd tmpramd && find . | cpio -H newc -o | gzip >../cycled.gz)
scp cycled.gz main@face:initrd.gz

As the comments in the script show, there are several approaches that do not seem to work, and I don't know why - if you have any idea please leave a comment. Trying to use /etc/modules to force loading the driver does nothing - perhaps this was a 2.4-only mechanism that has long been superseded in the kernel? Inserting the modprobe into the init for the ramdisk just causes a kernel panic when it fails. Inserting the modprobe into rcS.d means it is called later in init, when control is passed to the debian-installer init script. This seems to work. [edit: no it doesn't. Some kind of stateful problem in testing made it look like it worked, but this is currently broken. Will update in a later post]. Inside the clean debian install we create /etc/grub.d/11_remaster and execute update-grub.

#!/bin/sh -e
cat << EOF
menuentry "Remaster ISO" {
set root='(hd0,1)'
#loopback loop /remaster.iso
linux /vmlinuz initrd=/initrd.gz root=/dev/vda1 vga=788 auto=true panic=20 priority=critical preseed/file=/cdrom/preseed.cfg ---
initrd /initrd.gz
}
EOF

This puts us in the position where we can execute the preseeded installer directly from the harddrive to build the target system. There is no way to avoid the partitioning step inside the preseeded installer, so it is vital that the partitions made in the original clean install are identical to those made in the preseeded installer. Overwriting the partition table with the same data does not lose any data on the disk.
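
Since data survival hinges on the two tables matching, the comparison can be made mechanical rather than hopeful. A sketch using sfdisk dumps (run as root; the device name is the one used throughout this post):

```shell
# Dump the partition table to a text description that can be compared.
dump_layout() { sfdisk -d "$1" 2>/dev/null; }

# Compare two saved dumps.
same_layout() {
    if cmp -s "$1" "$2"; then
        echo "layout unchanged"
    else
        echo "layout differs - assume data on the old partitions is gone"
    fi
}
# Usage: dump_layout /dev/vda > before.txt
#        ... run the preseeded installer ...
#        dump_layout /dev/vda > after.txt ; same_layout before.txt after.txt
```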

The preseeded installer.

As with the desktop install, the preseed file is wrapped inside the .iso for the installer. It looks very similar, with the same partitioning scheme: 1GB for installer images, 1GB for swap, and the rest for a single / file-system. There is no dynamic network configuration - it is hardcoded to the static setup of the target server. The overlay that gets untar'd at the end overwrites the sshd config: no passwords, no root access, only strong-passphrase keys, the public halves being in the .iso and converted directly into an .ssh/authorized_keys file. There is a single random string with the password for the main user, but this can only be used over VNC. Basic package load for the server.

d-i debian-installer/language string en
d-i debian-installer/country string SE
d-i debian-installer/locale string en_SE.UTF-8
d-i keyboard-configuration/xkb-keymap select sweden

d-i netcfg/choose_interface select eth0

# To pick a particular interface instead:
#d-i netcfg/choose_interface select eth1

# To set a different link detection timeout (default is 3 seconds).
# Values are interpreted as seconds.
#d-i netcfg/link_wait_timeout string 10

# If you have a slow dhcp server and the installer times out waiting for
# it, this might be useful.
#d-i netcfg/dhcp_timeout string 60
#d-i netcfg/dhcpv6_timeout string 60

# If you prefer to configure the network manually, uncomment this line and
# the static network configuration below.
d-i netcfg/disable_autoconfig boolean true

# If you want the preconfiguration file to work on systems both with and
# without a dhcp server, uncomment these lines and the static network
# configuration below.
#d-i netcfg/dhcp_failed note
#d-i netcfg/dhcp_options select Configure network manually

# Static network configuration.
d-i netcfg/get_ipaddress string
d-i netcfg/get_netmask string
d-i netcfg/get_gateway string
d-i netcfg/get_nameservers string
d-i netcfg/confirm_static boolean true

# IPv6 example
#d-i netcfg/get_ipaddress string fc00::2
#d-i netcfg/get_netmask string ffff:ffff:ffff:ffff::
#d-i netcfg/get_gateway string fc00::1
#d-i netcfg/get_nameservers string fc00::1
#d-i netcfg/confirm_static boolean true

d-i netcfg/get_hostname string face
d-i netcfg/get_domain string
d-i netcfg/hostname string face
# Disable that annoying WEP key dialog.
d-i netcfg/wireless_wep string

### Mirror settings
d-i mirror/protocol string ftp
d-i mirror/country string se
d-i mirror/ftp/hostname string
d-i mirror/ftp/directory string /debian

### Account setup
# Skip creation of a root account (normal user account will be able to
# use sudo).
d-i passwd/root-login boolean false
# Alternatively, to skip creation of a normal user account.
#d-i passwd/make-user boolean false

# Root password, either in clear text
#d-i passwd/root-password password abc
#d-i passwd/root-password-again password abc
# or encrypted using an MD5 hash.
#d-i passwd/root-password-crypted password [MD5 hash]

# To create a normal user account.
d-i passwd/user-fullname string The main user
d-i passwd/username string main
d-i passwd/user-password password xxxxxxxx
d-i passwd/user-password-again password xxxxxxxx

d-i clock-setup/utc boolean true
d-i time/zone string Europe/Stockholm
d-i clock-setup/ntp boolean true

d-i partman-auto/disk string /dev/vda
d-i partman-auto/method string regular
# Manual use of the installer on face reports 30.1GB
d-i partman-auto/expert_recipe string                         \
      remasterPart ::                                         \
              1000 1000 1000 ext2                             \
                      $primary{ } $bootable{ }                \
                      method{ keep }                          \
              .                                               \
              1000 1000 1000 linux-swap                       \
                      $primary{ }                             \
                      method{ swap } format{ }                \
              .                                               \
              15000 15000 150000 ext4                         \
                      $primary{ } $bootable{ }                \
                      method{ format } format{ }              \
                      use_filesystem{ } filesystem{ ext4 }    \
                      mountpoint{ / }                         \
              .
#d-i partman/choose_recipe select atomic
d-i partman-auto/choose_recipe select remasterPart
d-i partman-partitioning/confirm_write_new_label boolean true
d-i partman/choose_partition select finish
d-i partman/confirm boolean true
d-i partman/confirm_nooverwrite boolean true
d-i partman-basicmethods/method_only boolean false
d-i partman-md/confirm boolean true

# Package setup
tasksel tasksel/first multiselect minimal
d-i pkgsel/include string openssh-server git python python-dateutil sudo bind9 bind9-host gitolite3 binutils dnsutils authbind curl
popularity-contest popularity-contest/participate boolean false

d-i grub-installer/only_debian boolean true
d-i grub-installer/with_other_os boolean true
d-i grub-installer/bootdev string /dev/vda
d-i finish-install/reboot_in_progress note

d-i preseed/late_command string \
tar xzf /cdrom/overlay.tgz -C /target ; \
in-target chown -R main:main /home/main ; \
in-target chown root:root /etc/hosts ; \
in-target chown root:root /etc/ssh/sshd_config ; \
chmod 700 /target/home/main/.ssh ; \
in-target chown main:main /home/main/.ssh/authorized_keys ; \
chmod 600 /target/home/main/.ssh/authorized_keys
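
Since the overlay carries the security-critical files, it is worth a sanity check before it gets baked into the .iso. A sketch (the exact paths inside the tarball are my assumption about its layout):

```shell
# Check that overlay.tgz lists the sshd config and an authorized_keys file.
check_overlay() {
    listing=$(tar tzf "$1" 2>/dev/null)
    echo "$listing" | grep -q 'etc/ssh/sshd_config' || { echo "missing sshd_config"; return 1; }
    echo "$listing" | grep -q '\.ssh/authorized_keys' || { echo "missing authorized_keys"; return 1; }
    echo "overlay ok"
}
```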

Sunday, 4 October 2015

System Deployment (part 2 of x)

There is not much interesting to say about the debian installer any more: what used to be complex has become quite simple. Hit the appropriate buttons for localisation, choose a standard task selection and drive layout then wait around for the system to build.

The first thing that we want to do is to automate this process. The debian installer was written with scripting support in mind. Preseeding is a technique for supplying the answers to each installer prompt in a simple text format. If the installer can see the preseed file then it can execute the entire installation and reboot into the new system without any user input.

What do we need to make this work?
  1. A preseed file to configure the installer.
  2. A method of getting the preseed file into the installer image.
Let's see how to solve the second problem first.

Remastering the installer image to include a preseed file.

This is easy to do in a running linux system. It seems to be next to impossible on a modern OSX version (the newest debian installers use UDF rather than ISO, and the specific UDF variant does not mount under OSX).

remaster.iso: copy
	cp preseed.cfg copy/
	cp ~/.ssh/id_rsa.pub copy/   # public half of the login key (filename may differ)
	genisoimage -b isolinux/isolinux.bin -c isolinux/boot.cat -o remaster.iso -J -R -no-emul-boot -boot-load-size 4 -boot-info-table copy/

copy: debian-8.2.0-amd64-netinst.iso
	mkdir copy 2>/dev/null || true
	mkdir loop 2>/dev/null || true
	mount debian-8.2.0-amd64-netinst.iso loop/
	rsync -rav loop/ copy
	umount loop
	rm -rf loop

# 8.2.0 has since moved to the cdimage archive; current releases live under debian-cd/
debian-8.2.0-amd64-netinst.iso:
	curl -LO http://cdimage.debian.org/mirror/cdimage/archive/8.2.0/amd64/iso-cd/debian-8.2.0-amd64-netinst.iso

A makefile may not be the best way to do this, but it is fast and cheap. We don't want to grab the original installer each time - we want to cache it to save bandwidth. We don't want to unpack the image again unless we have to. A makefile is a very natural way to express building from caches.

Unfortunately there is something funky here - using the directory copy as a target probably confuses make's time-stamp logic, as a directory's mtime changes whenever its contents do. Either way I've had to kill the copy directory a few times to get changes to propagate. A stamp file would be the usual fix, but this mostly works.

A file called preseed.cfg in the root directory of the installer image is picked up directly by the debian installer. The ssh key file is not used by the installer itself; its role is explained later in this post.

Writing a preseed file.

Writing a preseed file is like any other configuration - start with a working example (in this case the jessie example file). Run it to check it works. Tweak it until it does the right thing. In this case the right thing is a desktop install targeted at a virtualbox VM for development work. There is an extra unused partition on the harddrive - this is for storing the .ISO directly, so we can re-install from the harddrive without needing access to the virtual cdrom.
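
Each tweak-and-reboot cycle is slow, so it pays to catch trivial mistakes before burning the iso. The proper check is `debconf-set-selections -c preseed.cfg` on any debian machine; a cruder sketch that runs anywhere with awk:

```shell
# Every non-blank, non-comment line of a preseed file needs at least an
# owner, a question and a type (the value may be empty). Lines joined by
# trailing backslashes are treated as one logical line.
lint_preseed() {
    awk '
        cont { cont = /\\$/; next }   # middle of a continued value
        /\\$/ { cont = 1 }            # this line starts a continuation
        NF > 0 && $1 !~ /^#/ && NF < 3 {
            printf "line %d: too few fields\n", NR
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```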

d-i debian-installer/language string en
d-i debian-installer/country string SE
d-i debian-installer/locale string en_SE.UTF-8
d-i keyboard-configuration/xkb-keymap select sweden
d-i netcfg/choose_interface select auto
d-i netcfg/get_hostname string unassigned-hostname
d-i netcfg/get_domain string unassigned-domain

d-i netcfg/hostname string psuedo2
d-i netcfg/wireless_wep string

d-i mirror/protocol string ftp
d-i mirror/country string se
d-i mirror/ftp/hostname string
d-i mirror/ftp/directory string /debian

d-i passwd/root-login boolean false
d-i passwd/user-fullname string Amoss
d-i passwd/username string amoss
d-i passwd/user-password password abc
d-i passwd/user-password-again password abc

d-i clock-setup/utc boolean true
d-i time/zone string Europe/Stockholm
d-i clock-setup/ntp boolean true

d-i partman-auto/disk string /dev/sda
d-i partman-auto/method string regular
d-i partman-auto/expert_recipe string                         \
      remasterPart ::                                         \
              1000 1000 1000 ext2                             \
                      $primary{ } $bootable{ }                \
                      method{ keep }                          \
              .                                               \
              1000 1000 1000 linux-swap                       \
                      $primary{ }                             \
                      method{ swap } format{ }                \
              .                                               \
              15000 15000 150000 ext3                         \
                      $primary{ } $bootable{ }                \
                      method{ format } format{ }              \
                      use_filesystem{ } filesystem{ ext3 }    \
                      mountpoint{ / }                         \
              .
d-i partman-auto/choose_recipe select remasterPart
d-i partman-partitioning/confirm_write_new_label boolean true
d-i partman/choose_partition select finish
d-i partman/confirm boolean true
d-i partman/confirm_nooverwrite boolean true
d-i partman-basicmethods/method_only boolean false

tasksel tasksel/first multiselect minimal
d-i pkgsel/include string openssh-server xorg e17 xdm terminology
popularity-contest popularity-contest/participate boolean false

d-i grub-installer/only_debian boolean true
d-i grub-installer/with_other_os boolean true
d-i finish-install/reboot_in_progress note

d-i preseed/late_command string \
mkdir /target/home/amoss/.ssh ; \
in-target chown amoss:amoss /home/amoss/.ssh ; \
chmod 700 /target/home/amoss/.ssh ; \
cp /cdrom/id_rsa.pub /target/home/amoss/.ssh/authorized_keys ; \
in-target chown amoss:amoss /home/amoss/.ssh/authorized_keys ; \
chmod 600 /target/home/amoss/.ssh/authorized_keys

Some words about security. 

Don't actually set your password to abc - even when you are just testing this in a machine sitting behind NAT. It just sets you up for failure when you change the network configuration in virtualbox and forget to update the password. The idea is to set a randomised password for the only user account and disable root access.
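
The listing above hardcodes a password; the build can instead inject a fresh random one every time the iso is made. A sketch (the @PASSWORD@ placeholder and the preseed.cfg.in template name are my inventions, not anything the installer defines):

```shell
# Emit 16 random alphanumeric characters from the kernel's entropy pool.
gen_password() {
    tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 16
}
# Usage when building the iso, before genisoimage runs:
#   PW=$(gen_password) ; sed "s/@PASSWORD@/$PW/" preseed.cfg.in > copy/preseed.cfg
```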

The late_command script at the end is very specific to my configuration, but it should be adaptable for anyone. I have one ssh key that I use to log into desktop machines. It has a strong passphrase on it as sometimes I store the private key in places that I do not entirely trust (and thus it needs to be strong enough to survive brute-force attacks for the expected lifetime of the key).

I place the public half of this key into the disk image inside the makefile. The installer then creates a .ssh directory for the single user and sets the public key as an authorised key for login to the new desktop. I then have a single way into the newly installed image - ssh to the virtualbox bridge IP address using the key. Once I have logged in securely I can set the password to something memorable - this avoids storing the password in the .ISO image that is being built. Even if we never release the .ISO into the wild, storing a plaintext password of any value in it is simply a bad idea.

Saturday, 3 October 2015

System Deployment (1 of x)

The time has come for a new series of posts. As a bit of background: these series are sketches of new material relating to one of my courses - in this case the basic linux introduction, which will be getting some new material. The subject developed in this series is how we should build and deploy linux systems, covering both desktop and server builds. The desktop information is partly a record of how I have finally tamed the jungle of systems that I work on, and partly a description for students who may be installing a linux system to use within the course. The server information is partly a record of how the course server was built and how it works, and partly a guide for students: although they will not install a server system within the course, knowing how to do so is a useful body of knowledge to take away, as it may be something they need in future courses.

On a personal note, the course server was taken offline a couple of weeks ago for a major upgrade. Two things went wrong. First, the VPS provider had claimed to allow installation from a custom .ISO, which is what prompted this work originally. They did not mean that the customer can upload a custom .ISO; they meant that support staff can do it through a ticket. This means I may need to find a new provider, as there are problems with running a hdd-install inside a para-virtualised system that I may not be able to overcome. The other problem is that I've been off work with some health problems for the past three weeks, and it is now too close to the start of the course to believe this will be working in time. It seems likely that this year the course will run from an old copy inside It's Learning, and that deployment of new material on the course server will be delayed until a different academic year. Slippage is a bitch.


Regardless of the OS, and regardless of the machine, there is an observation that is both timeless and ubiquitous: when a machine is first installed it is fast and stable. Over time it becomes less so. The computer industry has some similarities with the sale of used cars: we can manufacture that "new car smell" that lasts until you get your purchase home and start to use it. Then things go slowly downhill.

Some people would have you believe that the reasons for this are difficult and not clearly understood. Because of this there is a human tendency to attribute agency, and to assume that somewhere in the process we are deliberately being screwed. I tend towards a different explanation, one that I believe most programmers would agree with. At installation time a computer system exists in a state of low entropy. The variations between systems are due to different sets of drivers on different hardware, or different selections of features and services. The installer is simply a program that builds a known target state on the system. As a programmer I always like it when my program is trying to build a value that I know. Good times.

During most uses of a system it experiences unpredictable, unknown changes. The system tends towards a higher state of entropy. Every piece of software that is installed, every change of driver, every update or upgrade of code creates more uncertainty about the state of the system. Eventually it reaches a state of maximum entropy - the heat-death of a computer - at which point an expert is summoned to "wipe all of that crap off and install it fresh". The cycle continues; after all, thermodynamics does not take prisoners and the outcome is inevitable.

Problem statements.

1. System upgrades create more entropy than system installs.

This should be read as a diagram of "non-commutativity", i.e. normal use + upgrade ≠ upgrade + normal use. The circles should be seen as estimates of the valid possible states / configurations. The amount of uncertainty / entropy is indicated by the number of circles.

2. Configuration drift destroys robustness.
Robustness should be interpreted as doing the least surprising thing. Editing the configuration of a live system until it seems to work does not create the least surprise.

3. Even the fastest human expert imposes latency on fixing a system.
If I type really fast then I can mimic a very slow script. Maybe.

4. Reliability is easiest to achieve through redundancy. Robustness requires returning to a known state.
When something breaks it needs a backup. If something has become questionable it helps if the backup is not also screwed.

5. The higher the degree of system entropy the less secure the system.
If we know things then we can rule out attack surfaces. Pop quiz: which is more secure, a dynamic IP or a static one? Sensible answer: why would it make a difference - the information exchange between the dhcp server and client only needs to convey MAC addresses and IP addresses. Real world answer: programmers are lazy idiots, and both dhcp servers and clients use bash to process strings in the protocol. Unless patched and verified, using DHCP opens a system to the Shellshock exploit.
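
The probe for this is a one-liner: export a crafted function definition and see whether bash executes the trailing command while importing its environment. A sketch wrapping the classic test:

```shell
# A patched bash ignores the trailing command in the crafted variable
# and prints only "ok"; a vulnerable one prints "vulnerable" first.
shellshock_check() {
    out=$(env x='() { :;}; echo vulnerable' bash -c 'echo ok' 2>/dev/null)
    case "$out" in
        *vulnerable*) echo "vulnerable" ;;
        *)            echo "patched" ;;
    esac
}
```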

Solution (to be developed over series)

Automated installation and deployment.

OK, so let's unpack this a little to get an idea of why this would be a solution (all details in upcoming posts):
1. Avoid upgrading the system: capture the target as a delta from the default install, and reapply the delta to an install of the new version. This is "the other pathway" on the diagram above.
2. Never reconfigure anything on the target. Preseed and script all configuration and installation. This allows deployment of the system into a development environment with some reassurance that the config matches the live system.
3. A preseeded installer can rebuild a system in about 10 minutes - not just to the clean OS install, but to a fully working system with working data downloaded from source control.
4. Configuration changes are captured in the version control for the system that builds the installer.
5. All configuration is documented. In the longer term it seems interesting to make the target system read-only to enforce this - running in a similar manner to a live distribution. This closes all the holes - if there is no root on the target box, then no attacker can own it.