Monday, 19 October 2015

System Deployment (part 7 of x)

The Mechanise server is back up and running, with a mirror of each git repository on it. Before teaching starts for the winter term, the web server used for the courses needs to be brought back to life. This is complicated by a massive rewrite / redesign of large chunks of it that will probably stretch long into the term.

The web server.


This should be a simple beast: Python code using the Twisted library for HTTP processing. Large chunks of the site are content served dynamically to each student. Over the years it has been a testbed for pedagogic projects: generating unique assignments for students, integrating automatic testing into the submission system and other crazy ideas. As a result it has sprawled out of control, and the software architecture looks inspired by Picasso having a merry old time high on weapons-grade LSD.

The first step is to get the deployment system working again on the server. When the git repository hosting the server is updated, a post-update hook springs into life:

  • Copy the source tree and resources into the production tree.
  • Kill the old server.
  • Respawn the noob.

Git hooks are a strange mess of server-side state that is not versioned. Inside the bare repository on the server we update files in the hooks directory (a bare repository has no .git directory), which git will execute during certain actions. The post-update hook is the one that will redeploy the server:

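#!/bin/sh
# post-update hook: redeploy the web server after a push.

echo "Website updated from commit" | logger -t gitolite
# Check the pushed tree out of the bare repository into the production directory.
GIT_WORK_TREE=/var/www/thesite git checkout -f | logger -t gitolite
chmod -R 755 /var/www/thesite
chown -R git:git /var/www/thesite
# Ask the running server to exit; the service script respawns it.
curl -s http://localhost/restart
# Leftover debugging: wait for the respawn, then dump the process state and log tail.
sleep 2
top -bn1 | grep python
ps -A --forest | grep -C1 python
tail /var/log/syslog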

Like all archeologists we can find evidence of panic among the primitive people. The sleep followed by a dump of info is a sure sign that something did not work once, and that the confirmation used to debug that was so comforting that it was never removed.

Using a URL to kill the server is asking for trouble. Currently we check that the incoming transport originated on the loopback (127.0.0.1) interface. This should not be spoofable, but if it is then we can use a random number stored in the file-system to lock this request. This approach works better than a direct kill from the gitolite user as:

  • No worries about serialisation; if we are in the processing hook for the restart page then any file I/O for another request is done.
  • No worries about privileges to kill a process belonging to another user without introducing a privilege escalation attack.
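
The two checks described above (loopback origin, plus a random number kept in the file-system) could look roughly like this. It is a minimal sketch using only the Python standard library, not the code from the real server: the function names and the token file are hypothetical, and how the peer address is obtained from the HTTP request is left to the server.

```python
import hmac
import ipaddress
import secrets
from pathlib import Path


def is_loopback(peer_host: str) -> bool:
    """Return True if the peer address is a loopback address (127.0.0.0/8 or ::1)."""
    try:
        return ipaddress.ip_address(peer_host).is_loopback
    except ValueError:
        # Not a literal IP address, so refuse rather than guess.
        return False


def write_token(path: Path) -> str:
    """Generate a fresh random token and store it in the (hypothetical) token file."""
    token = secrets.token_hex(16)
    path.write_text(token)
    return token


def restart_allowed(peer_host: str, supplied: str, path: Path) -> bool:
    """Allow a restart only for a loopback peer that presents the stored token."""
    if not is_loopback(peer_host):
        return False
    try:
        expected = path.read_text().strip()
    except OSError:
        return False
    if not expected:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

The hook would read the token from the file and pass it along with the restart request. The sketch does not arrange ownership or mode of the token file, which would need to be readable by both the hook user and the server user.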

First we need the user that will run the server, and their home directory. The user www-data is already installed on Debian for this purpose:

main@face:~$ grep www-data /etc/passwd
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
main@face:~$ sudo su www-data
[sudo] password for main:
This account is currently not available.
main@face:~$ sudo su -s /bin/bash www-data
www-data@face:/home/main$ cd
bash: cd: /var/www: No such file or directory

It is designed not to permit casual use: once upon a time it was a dreadful security hole when people would forget to set a strong password on the account, or even worse leave the default in place. We actually like it that way, so we will create the home directory that it needs and leave the account disabled. Only the root user can get into it, by forcing a different shell with su -s.

root@face:/home/main# mkdir /var/www
root@face:/home/main# ls -ld /var/www
drwxr-xr-x 2 root root 4096 Oct 19 09:34 /var/www
root@face:/home/main# chown git:git /var/www
root@face:/home/main# ls -ld /var/www
drwxr-xr-x 2 git git 4096 Oct 19 09:34 /var/www
root@face:/home/main# cat >/var/www/webservice <<EOF

#!/bin/bash
cd /var/www/thesite
while true
do
authbind python server.py 2>&1 | logger -t www
echo "Web server exited, restarting" | logger -t www
sleep 2
done
EOF

root@face:/home/main# chown www-data:www-data /var/www/webservice
root@face:/home/main# chmod 744 /var/www/webservice
root@face:/home/main# su -s /bin/bash www-data
www-data@face:/home/main$ cd
www-data@face:~$ ls -al
total 12
drwxr-xr-x 2 git git 4096 Oct 19 09:37 .
drwxr-xr-x 13 root root 4096 Oct 19 09:34 ..
-rwxr--r-- 1 www-data www-data 167 Oct 19 09:37 webservice
www-data@face:~$ ./webservice
./webservice: line 3: cd: /var/www/thesite: No such file or directory
^C
^D
root@face:/home/main# mkdir /var/www/thesite
root@face:/home/main# chown git:git /var/www/thesite
root@face:/home/main# chmod 755 /var/www/thesite
root@face:/home/main# touch /etc/authbind/byport/80
root@face:/home/main# chown www-data:www-data /etc/authbind/byport/80
root@face:/home/main# chmod 500 /etc/authbind/byport/80
root@face:/home/main# ls -l /etc/authbind/byport/80
-r-x------ 1 www-data www-data 0 Oct 19 10:17 /etc/authbind/byport/80
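
The ownership and mode bits in the listings above can be checked against the classic Unix permission rule. This is a minimal sketch in Python, assuming plain owner/group/other mode bits only (no ACLs, no root override). The numeric ids for the git user are hypothetical, since only the uid and gid of 33 for www-data appear in the transcript.

```python
import stat


def may_write(mode: int, owner_uid: int, owner_gid: int, uid: int, gids: set) -> bool:
    """Classic Unix rule: the first matching class (owner, group, other) decides."""
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


WWW_DATA_UID, WWW_DATA_GID = 33, 33  # from /etc/passwd in the transcript
GIT_UID, GIT_GID = 1001, 1001        # hypothetical ids for the git user

# drwxr-xr-x git git /var/www: www-data falls into "other", which has r-x only.
print(may_write(0o755, GIT_UID, GIT_GID, WWW_DATA_UID, {WWW_DATA_GID}))  # False

# -rwxr--r-- www-data www-data webservice: www-data is the owner, so the w bit applies.
print(may_write(0o744, WWW_DATA_UID, WWW_DATA_GID, WWW_DATA_UID, {WWW_DATA_GID}))  # True
```

Under these assumptions www-data cannot create or remove anything in /var/www or thesite. Strictly, the owner bits on the webservice script still let it rewrite that one file.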

Cool. The user has just enough privileges to execute the service script, but it cannot create or change anything else under /var/www as it has no write permission on the directories. The gitolite user owns the www-data home directory and the still-empty thesite directory inside it. This is the work tree that the hook checks out into from the bare repository each time the repo is updated. The basic workflow is like this:

  • Dev work happens off the server using -local to run the server in the non-production environment.
  • Deployment happens when the dev-commits are pushed back up to face.
  • The post-update hook fires:
    • The production source tree is updated with the new code.
    • The old server is killed.
    • The service script spawns a new server after a couple of seconds.
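
The -local switch from the first step might be handled along these lines. The real server.py is not shown in this post, so this is only a sketch under assumptions: the function name is hypothetical, the switch is assumed to do nothing more than select a port, and 8080 is an invented development port. Port 80 matches the authbind setup above.

```python
import argparse


def choose_port(argv):
    """Return the port to listen on, based on a hypothetical -local switch."""
    parser = argparse.ArgumentParser(description="Course web server (sketch)")
    # argparse accepts single-dash long options such as -local.
    parser.add_argument("-local", action="store_true",
                        help="run in the non-production environment")
    args = parser.parse_args(argv)
    # 8080 is an assumed development port; 80 is the production port.
    return 8080 if args.local else 80
```

Calling choose_port(["-local"]) selects the development port, while an empty argument list selects production. An unprivileged port means a developer needs neither authbind nor root on their own machine.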
