Friday, 14 October 2016

Automated Testing in Docker (part 6 of 5).

An extra postscript.

Testing submissions on the compiler course has been working well with docker over the last year. The latest version builds a report summary that is stored on the web-server and then rendered into a viewable report for students. It seems to have made many platform-dependent bugs fixable for students, which has increased the quality of submissions. It has been successful enough that I ported it over to the Linux course, where it has also made an impact on the grading process.

Then it all broke.

The testing process was performed on one of two machines, depending on where I was working at the time:

  • A Mac laptop, using boot2docker inside virtualbox.
  • A linux desktop, using a local install of docker.
There were no observable differences between testing results on the two platforms, although this may have been something of a fluke. The desktop machine died two weeks ago and was replaced with a much newer desktop with a Skylake processor. Unfortunately that processor is currently unstable on debian Jessie, so the desktop is running Ubuntu 16.04 until the microcode / libc issues are resolved.

Running Docker under Ubuntu.

The benefit of docker (over manipulating raw VM images) is the convenience that the command-line tools give for handling containers and images. The performance benefits are not so important for this application. Both of these attributes arise because Docker builds images on union filesystems, stacking read-only layers on top of each other.
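The layering is easy to see with the stock tooling. For example (debian is the base image used later in this post, and localdev is the container built from it further down):

# list the read-only layers an image is built from
docker history debian

# list what a container has changed on top of those layers
docker diff localdev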

Switching to Ubuntu caused an unforeseen problem in the testing environment - all the core dumps disappeared. Investigating this revealed that the ulimit -c unlimited in the testing script was no longer sufficient to generate cores: the kernel checks /proc/sys/kernel/core_pattern to decide where (and how) to write the core image.

In a docker container this is simply a read-only view of the host's setting! When /proc only served as an informative (reflective) interface onto kernel status this was not a problem. But now that /proc is also used as a configuration interface, details of the host leak into the container. In particular Ubuntu sets this to:

      |/usr/share/apport/apport %p %s %c %P

So that cores are piped into a reporting tool - which is not installed in the docker container, and is not the desired behaviour anyway.
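The pattern is not namespaced, so a container cannot set its own value; the only way to change what a container sees is to change it on the host. A blunt workaround (not the route taken here) is roughly:

# on the Ubuntu host; every container sees the same value because
# /proc/sys/kernel/core_pattern belongs to the shared kernel
echo core | sudo tee /proc/sys/kernel/core_pattern

This does not survive a reboot, and it changes behaviour for the host as well - which is exactly the kind of coupling the container was supposed to avoid.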

Conclusion: using docker in its default mode on linux as a form of configuration management for a testing environment is fatally flawed.

Wrapping the linux docker inside boot2docker.

The official way to install docker does not seem to include a virtualized linux option. The VM approach is used on Windows and OS X, but the installer for those platforms (Docker Toolbox) is not available on linux, so this needs to be done manually:

curl -L https://github.com/docker/machine/releases/download/v0.8.2/docker-machine-`uname -s`-`uname -m` -o docker-machine
chmod 755 docker-machine
sudo mv docker-machine /usr/local/bin/
sudo chown root:root /usr/local/bin/docker-machine
docker-machine create --driver virtualbox default

Yes, I shit you not. It really is that ugly to get it onto an Ubuntu system. Life now takes a turn for the more "interesting":

Error creating machine: Error in driver during machine creation: This computer doesn't have VT-X/AMD-v enabled. Enabling it in the BIOS is mandatory

It seems that VT-X is disabled by default on the HP EliteDesks. Enabling it allows the boot2docker image to run successfully (https://github.com/docker/machine/issues/1983), and then docker-machine env default produces the right values to connect.
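Connecting the client to the new VM means loading those values into the current shell; the usual incantation is:

eval "$(docker-machine env default)"
docker info    # should now report the boot2docker VM rather than the local daemon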

Note: all the old scripts use sudo docker. This still works, but it connects to a different docker daemon - the local one on the host rather than the boot2docker VM. Running docker as an ordinary user (no sudo) picks up the docker-machine environment and talks to the right machine, where everything works. This is confusing to use.

Standard install for the debian_localdev image used to test submissions:

docker run -it --name localdev debian /bin/bash
> apt-get update
> apt-get install gcc g++ clang gdb make flex bison graphviz vim
> ^d
docker commit localdev debian_localdev
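
The same image could be captured as a Dockerfile rather than an interactive commit - a sketch of an equivalent, not what is actually in use here:

# Dockerfile (sketch)
FROM debian
RUN apt-get update && \
    apt-get install -y gcc g++ clang gdb make flex bison graphviz vim

# build with: docker build -t debian_localdev .

The interactive route is quicker for a one-off; the Dockerfile has the advantage that the recipe is recorded somewhere other than this blog post.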

After this we still get configuration leakage from the host - but now the host is boot2docker, which is minimal enough that the leakage should be tolerable:

cat /proc/sys/kernel/core_pattern
core
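
A quick sanity check that cores really appear inside a container again - a throwaway sketch, not the real test scripts:

docker run --rm debian_localdev bash -c '
  ulimit -c unlimited
  cd /tmp
  printf "int main(){ return *(int*)0; }" > crash.c
  gcc crash.c -o crash
  ./crash
  ls -l core*'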

Need to remember to update the docker scripts before retesting all the submissions.

Monday, 21 March 2016

Quick notes on unbreaking gitolite

Some quick notes on de-fuck-ifying a gitolite3 installation.


Here is a post-hoc explanation of what probably happened:

  • Installed gitolite3 on debian using a key called git.pub
  • Lost the private half.
  • "Fixed" the problem with a new key called rsa_git.pub, this was manually inserted into .ssh/authorized_keys instead of redoing the install.
  • Stuff worked (for about 5 months) long enough to dispel any suspicion that it was all funky and rotten underneath.
  • Tried to update the admin repo to add a new repo - all hell broke loose.
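
For context, a gitolite-managed key is not a plain authorized_keys line; each entry is a forced command that tells gitolite which keyname is connecting, something like the following (the gitolite-shell path varies between installs, and the key itself is elided):

command="/usr/share/gitolite3/gitolite-shell rsa_git",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAA... rsa_git

Hand-editing that file behind gitolite's back is how ssh access can keep working while the conf still refers to a keyname that no longer exists.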

Symptoms


After committing the admin change, things got kind of weird and then this happened:

git clone ssh://git@face.mechani.se/gitolite-admin.git admin_face
Cloning into 'admin_face'...
Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.

The key is in an agent that has been running for months, so that seems to be unpossible!

At this point I realised that I had screwed the admin key and googled how to fix it. The instructions for gitolite3 say:

When in doubt, run 'gitolite setup' anyway; it doesn't do any harm, though it may take a minute or so if you have more than a few thousand repos!

This is not in any way true. It is entirely possible that running setup on a live install will break that install. It is not a safe operation at all. Do not believe the lies: there is no cake.
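
For the record, the setup invocation in question (the documented way to register a replacement admin key; the exact command line here is from memory) was something like:

# as the git user on the server
gitolite3 setup -pk rsa_git.pub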

After using this to install the new key, something really bad happened. It looks like this:

FATAL: R any gitolite-admin rsa_git DENIED by fallthru
(or you mis-spelled the reponame)
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

So at this point things look really bad as access to the admin interface has been lost. This can be confirmed by trying to ssh directly into the git user on the server which shows this:


PTY allocation request failed on channel 0 
hello rsa_git, this is git@face running gitolite3 3.6.1-2 (Debian) on git 2.1.4

R W testing


Solution


A bit more googling shows that people tend to panic at this point, wipe their gitolite install, and reinstall from scratch to fix this. Instead I will now quote again from the gitolite3 docs:


Don't panic!

First, have a poke around the git user's home directory (if you no longer have a way to do this, i.e. you do not have root on the box, then go ahead and panic - that's probably the right approach). .gitolite/logs is very interesting and lets you reconstruct what has happened. More importantly:

.gitolite/conf is where the bare contents of your admin repo get blasted into!

So fix .gitolite/conf/gitolite.conf first to regain access (i.e. change the keynames on every repo).
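
A stanza in that file grants access per keyname, so the edit is mechanical; with the keynames from this story it looks roughly like:

repo gitolite-admin
    RW+     =   rsa_git       # was: git

repo testing
    RW+     =   @all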
Editing the file on its own will not do anything at all, because of a file called gitolite.conf-compiled.pm sitting next to it, which is obviously a cache. Delete it. That still does not work, but at least the error message changes to complain that the file is missing. Finally, just run:

gitolite3 compile

This will regenerate the conf cache properly and let you back in. Problem solved, panic avoided.
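
Once the cache is regenerated, the quickest check that access really is back is gitolite's own info command over ssh, which should now list the admin repo again:

ssh git@face.mechani.se info
# hello rsa_git, this is git@face running gitolite3 3.6.1-2 (Debian) on git 2.1.4
#
#  R W    gitolite-admin
#  R W    testing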