moar articles
Tweet
Follow @nicferrier

Virtualization for Python projects

Virtualization has been a hot topic for a while but till now it's mostly been about how to get more out of hosting resources, or how to get cheap hosting resources because someone else is.

This year, developers have started to get into the benefits - using virtualization to abstract the target environment for your project so you can deliver builds in a more predictable way.

WooMe's new tool, veh, makes virtualization for Python one step easier still. It let's you package dependancies in a simple manageble way and control the distribution of them.

But first... a step back into what virtualization means.

Lighter, Faster, Cheaper: pick all three

The idea is you build a virtual machine image of some developer friendly operating system and have developers build their software on that. You can then deploy the VM to your live servers. Dealing with dependancy issues like a new version of Python or Ruby, a particular C library or a particular verion of Tomcat is much easier on a virtual machine and the sysadmins get to manage the hardware operating system independantly.

While virtual machine image virtualization is a massive step forward the biggest difficulty is the size of the files you end up with. moving VM images around is tough, they're big. This is ok for live, you can even go to the lengths twitter.com go to and deploy with bittorrent. But here at WooMe even our dev servers are very remote so pulling down a new VM to work on or pushing one up when it's complete is a real problem.

What we need is some sort of differential virtualized environment where you can start with a base and then capture the changes you make and just move those around.

Well. Python has a great tool called virtualenv which abstracts Python dependancies into a local directory system. It means to run any program you can totally specify the Python environment, completly seperately from the hosts environment. Specific versions of packages that clash with the systems' packages that the system doesn't need at all and avoiding some packages that the system does need, all these are easily doable.

Modern Python build tools even have built in support for virtualenvs. So pip, for example, can do this:

pip install -E some_virtual_env lxml=2.2.3

and that will install version 2.2.3 of lxml into the virtual env called some_virtual_env.

Making it easy

So virtualenvs are great. They don't by themselves resolve our mobility problem. If we get our developers to use them we still have to move the virtualenvs around; to production and to other developers.

Here's where our new tool called veh comes in. veh let's you specify the list of packages you want to be in your virtualenv in a version controlled file in your project and then use the program to control access to your program via the virtualenv.

For example, here's myproject/mypython.py:

print 'hello developers!'

first, make a new hg project and commit our python file to it:

cd myproject
hg  init
hg add mypython.py
hg ci -m "first python file" mypython.py

then install veh into the project and add lxml as a dependancy:

veh install
echo "lxml-frompypi = lxml" >> .veh.conf

The text on the left of the = is just a label and could be anything unique for the package. The text on the right is the package name as installable with pip.

Now, from any other directory, use veh to make a virtualenv and create an interactive shell inside it:

cd /tmp
veh -R ~/myproject shell 

Until we exit from that shell we'll be using the virtualenv Python, totally protected from the base machine Python environment or any other Python environment.

So now we have a solution to the dependancy problem because now, when the developer is happy with the dependancies he has he can:

cd myproject
hg ci -m "set dependancies for this project" .veh.conf
hg push deploymentserver

The live system can use veh to guard access to the program with the virtualenv. Perhaps the django startup script would look like this:

veh -R nicproject shell -- ./manage.py run-spawning -p 8080 -T 10 -P 5

That runs the django manage.py command and presumably runs Spawning on port 8080 with 10 threads in 5 processes. Before django is run the virtualenv is created and the dependancies pulled down, so Spawning runs inside whatever environment the developers specified.

You can rebuild and refresh the virtualenv as well:

veh -R nicproject refresh

refresh pulls in any new changes you've made to the virtualenv config.

veh -R nicproject rebuild

trashes the virtualenv and builds a new one.

Distributing packages and not losing flexibility

There's been much debate in the devops community of late about whether to try and build packages for deployment. Packaging is a great discipline but the cost is high. With virtualenv and veh however, you can get the benefits of separated packaged development without having to build a copy of an apt repository or something.

We can Python package stuff and use veh to help distribute them. There's no need to build a complex infrastructure either because pip can install from a repository url.

Consider this veh.conf:

[packages]
lxml = lxml
biglib = hg+http://devserver/hg/mydependancy_project
smalllib = hg+http://devserver/hg/another_depend
# End

this will allow veh to build a virtualenv with lxml from pypi and mydependancy_project and another_depend both pulled from a site local mercurial web server.

Stuff to do

veh isn't finished yet, but there isn't much more to do. Here's my top 5:

If you like the idea of veh or you have suggestions to make it more usable then do please hit me up on twitter or github.