
Pack Up the Wagon, We're Going Offline

We’re a Python shop at Cloudify and we needed to be able to ship our plugins to customers. The plugins are sets of Python packages that had to be available for different platforms, in different versions, for installation by our framework. We work with enterprise companies (banks and the like) that cannot simply install things from the internet, so we have to provide each package, together with all of its dependencies, to users who have no internet access. “pip install and be done with it” is not an option. This had us searching for solutions.

As such, during my work, I’ve had the distinct pleasure (note the sarcastic tone) of dealing with packaging artifacts, namely, Python packages along with their dependencies. These packages had to be installable at runtime and had to carry metadata about themselves for the installer to read.

Python doesn’t make handling dependencies easy. While you can statically link Python and ship it with dependencies, clever dependency handling is not Python’s strong suit, and achieving a high level of portability would require a lot of work, although, as you will see, it has gotten better. On top of that, Python’s default method of installing dependencies requires an internet connection which, even today, is not always a viable prerequisite.

Alternatives

At first, we thought about providing a PyPI (Python Package Index) clone along with our product. Unfortunately, PyPI is about 150GB and that is not something we can afford to ship to customers.

We needed a REAL offline solution, so we built one.

We have a lot of experience with virtualenv and Spotify’s dh-virtualenv came to mind. We could use it to package an environment with dependencies. Unfortunately, dh-virtualenv revolves around Debian packages and we needed packages for potentially any distribution. Additionally, our agents are already based on virtualenv and we just needed a way to install more plugins into the same environment.

Twitter’s PEX also came to mind. It is a way to create portable Python environments and ship them easily as single binaries. Unfortunately, after deploying such packages, we would not be able to install additional plugins at runtime, which is a requirement in our case.

Our Solution

What we came up with is Wagon, a very straightforward solution for creating and installing a Python package. It provides an abstraction for creating an archive that contains a Python package together with its dependencies, built from one of several sources (GitHub, a local path, PyPI). It also allows installing the package directly from the archive, validating it and displaying its metadata.

A Use Case

Flask is a Pythonic, easy-to-use web framework. It is widely used and actively developed by Armin Ronacher, who also wrote the Jinja templating engine.

Now let’s say you want to start writing a REST service in Python and have decided to use Flask. Flask has some dependencies of its own, like Werkzeug, Jinja2, and others.

When you run `pip install flask`, pip queries PyPI and looks for a package named “flask”. It then installs the package along with the dependencies declared in its setup.py file. If an alternative index is provided instead of PyPI, pip searches that index to resolve the dependencies. So far, so good.

Now, what if you want to install it without accessing the index? What if you don’t even have an index to access? What if you’re a bank, for instance, and need to take all of the packages required for your applications and install them in an “offline” manner within your organization?

Let’s take a step back.

In 2004, the egg format was introduced to Python by setuptools. It provided a way to distribute a Python package containing its code, metadata and accompanying resources.

For various reasons (too long to detail here), another way of distributing Python packages prevailed, and in 2012, under PEP 427 (with PEP 491 later proposed as an update), Wheels were introduced.

While there are many differences between Wheels and Eggs (literally), some of the main ones are:

  • Wheels do not contain .pyc files; the code is compiled after installation, which means that a single Wheel can be used across Python versions (as long as it doesn’t contain C extensions).
  • Wheels are versioned: every Wheel file records the version of the Wheel specification it was built against, while Eggs do not.
  • Wheel file naming conventions are more elaborate, allowing for much more information to be stored within the filename itself. We’ll get back to that.

To support the new format, pip added a wheel subcommand allowing you to create Wheels, and in addition, added a built-in mechanism for installing them.

Some of the most widely used Python modules are now packaged and distributed (via PyPI) as Wheels. Take a look at http://pythonwheels.com/ to see which commonly used packages are covered.

Let’s take Flask, again, as an example and “Wheel” it.

$ pip wheel flask
Collecting flask
  Using cached Flask-0.10.1.tar.gz
Collecting Werkzeug>=0.7 (from flask)
  Using cached Werkzeug-0.10.4-py2.py3-none-any.whl
  Saved ./wheelhouse/Werkzeug-0.10.4-py2.py3-none-any.whl
Collecting Jinja2>=2.4 (from flask)
  Using cached Jinja2-2.8-py2.py3-none-any.whl
  Saved ./wheelhouse/Jinja2-2.8-py2.py3-none-any.whl
Collecting itsdangerous>=0.21 (from flask)
  Using cached itsdangerous-0.24.tar.gz
Collecting MarkupSafe (from Jinja2>=2.4->flask)
  Using cached MarkupSafe-0.23.tar.gz
Skipping Werkzeug, due to already being wheel.
Skipping Jinja2, due to already being wheel.
Building wheels for collected packages: flask, itsdangerous, MarkupSafe
  Running setup.py bdist_wheel for flask
  Stored in directory: /home/nir0s/repos/wagon/wheelhouse
  Running setup.py bdist_wheel for itsdangerous
  Stored in directory: /home/nir0s/repos/wagon/wheelhouse
  Running setup.py bdist_wheel for MarkupSafe
  Stored in directory: /home/nir0s/repos/wagon/wheelhouse
Successfully built flask itsdangerous MarkupSafe

$ ll wheelhouse/
total 712
drwxrwxr-x 2 nir0s nir0s   4096 Oct 13 08:35 ./
drwxrwxr-x 7 nir0s nir0s   4096 Oct 13 08:35 ../
-rw-rw-r-- 1 nir0s nir0s 115947 Oct 13 08:35 Flask-0.10.1-py2-none-any.whl
-rw-rw-r-- 1 nir0s nir0s  10434 Oct 13 08:35 itsdangerous-0.24-py2-none-any.whl
-rw-rw-r-- 1 nir0s nir0s 263888 Oct 13 08:35 Jinja2-2.8-py2.py3-none-any.whl
-rw-rw-r-- 1 nir0s nir0s  26027 Oct 13 08:35 MarkupSafe-0.23-cp27-none-linux_x86_64.whl
-rw-rw-r-- 1 nir0s nir0s 293089 Oct 13 08:35 Werkzeug-0.10.4-py2.py3-none-any.whl

What we just did here was download or create Wheels for Flask and all of its dependencies. Note that MarkupSafe was built specifically for Linux (linux_x86_64), while all the others support any platform. Also note that the Python versions each Wheel can run on are stated within its file name.
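Since so much information lives in the file name, it can be pulled out with a few lines of Python. The following is only a minimal sketch based on the PEP 427 naming convention (`{name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl`); the `parse_wheel_filename` helper and the `WheelTags` tuple are hypothetical names made up for this illustration and are not part of pip or Wagon.

```python
from collections import namedtuple

# Hypothetical container for the fields encoded in a wheel file name.
WheelTags = namedtuple("WheelTags", "name version build python abi platform")


def parse_wheel_filename(filename):
    """Split a wheel file name into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %s" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the python tag.
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError("unexpected wheel file name: %s" % filename)
    # A python tag such as "py2.py3" lists several supported versions.
    return WheelTags(name, version, build, python.split("."), abi, platform)


markupsafe = parse_wheel_filename("MarkupSafe-0.23-cp27-none-linux_x86_64.whl")
jinja2 = parse_wheel_filename("Jinja2-2.8-py2.py3-none-any.whl")
print(markupsafe.platform)  # linux_x86_64
print(jinja2.python)        # ['py2', 'py3']
```

The two file names are taken from the wheelhouse listing above: MarkupSafe is tied to one platform and one interpreter, while Jinja2 runs on Python 2 and 3 on any platform.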

So, even after creating wheels for the different dependencies, a problem still remains. Wheels are archives of single Python packages. While pip’s wheel support does download all dependencies for a package, and even creates a wheel for dependencies which were not originally distributed as Wheels on PyPI, transporting an entire package along with its dependencies remains an issue.

What we were looking to do is take these wheels, package them together, add some metadata and then extract and install them elsewhere.
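To make the idea concrete, here is a rough, hand-rolled sketch of bundling a directory of wheels and a metadata file into a single tar.gz. It is not Wagon’s implementation: the `bundle_wheels` and `read_metadata` helpers, the `metadata.json` file name and the `wheels/` directory inside the archive are all assumptions chosen for this example.

```python
import io
import json
import os
import tarfile
import tempfile


def bundle_wheels(wheels_dir, metadata, archive_path):
    """Pack a directory of wheels plus a JSON metadata file into one tar.gz."""
    payload = json.dumps(metadata, indent=4).encode("utf-8")
    info = tarfile.TarInfo(name="metadata.json")  # hypothetical file name
    info.size = len(payload)
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(wheels_dir, arcname="wheels")      # hypothetical layout
        tar.addfile(info, io.BytesIO(payload))


def read_metadata(archive_path):
    """Read the metadata back without unpacking the wheels."""
    with tarfile.open(archive_path, "r:gz") as tar:
        raw = tar.extractfile("metadata.json").read()
    return json.loads(raw.decode("utf-8"))


# Demo with an empty placeholder file standing in for a real wheel.
workdir = tempfile.mkdtemp()
wheels_dir = os.path.join(workdir, "wheelhouse")
os.mkdir(wheels_dir)
open(os.path.join(wheels_dir, "Flask-0.10.1-py2-none-any.whl"), "wb").close()

archive = os.path.join(workdir, "flask-bundle.tar.gz")
bundle_wheels(
    wheels_dir,
    {"package_name": "flask", "package_version": "0.10.1"},
    archive,
)
print(read_metadata(archive)["package_name"])  # flask
```

Doing this by hand for every package, platform and Python version gets tedious quickly, which is exactly the chore we wanted a tool for.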

This is where Wagon comes in.

First, we’ll install Wagon by running `pip install wagon`. We then continue by creating an archive for a Python package. Let’s create a Flask Wagon:

$ wagon create -s flask --validate
INFO - Creating module package for flask==0.10.1...
INFO - Module name: flask
INFO - Module version: 0.10.1
INFO - Downloading Wheels for flask==0.10.1...
INFO - Creating archive: ./flask-0.10.1-py27-none-linux_x86_64-Ubuntu-trusty.wgn…
INFO - Installing ./flask-0.10.1-py27-none-linux_x86_64-Ubuntu-trusty.tar.gz
INFO - Identified machine platform: linux_x86_64
INFO - Installing flask...
INFO - Validation Passed! (Cleaning up temporary files).
INFO - Process complete!

We have now created a “Wagon” containing Flask and its dependencies and validated it. Now, let’s install the archive into a virtualenv of our choosing:

$ virtualenv flask
…

$ wagon install -s flask-0.10.1-py27-none-linux_x86_64-Ubuntu-trusty.tar.gz --virtualenv flask
INFO - Installing flask-0.10.1-py27-none-linux_x86_64-Ubuntu-trusty.tar.gz
INFO - Identified machine platform: linux_x86_64
INFO - Installing flask...

$ flask/bin/pip freeze
Flask==0.10.1
itsdangerous==0.24
Jinja2==2.8
MarkupSafe==0.23
Werkzeug==0.10.4

That’s it. We’ve now installed Flask into a designated virtualenv without accessing PyPI. Note that when creating the Wagon, we also passed the `--validate` flag which validates that the metadata is intact, the relevant Wheels are in the Wagon, and the Wagon is installable.

Validation can also be run after the fact:

$ wagon validate -s flask-0.10.1-py27-none-linux_x86_64-Ubuntu-trusty.tar.gz

Now let’s look at the metadata generated for the archive:

$ wagon showmeta -s flask-0.10.1-py27-none-linux_x86_64-Ubuntu-trusty.wgn
{
    "archive_name": "flask-0.10.1-py27-none-linux_x86_64-Ubuntu-trusty.wgn", 
    "build_server_os_properties": {
        "distribution": "ubuntu", 
        "distribution_release": "trusty", 
        "distribution_version": "14.04"
    }, 
    "excluded_wheels": [], 
    "package_name": "flask", 
    "package_source": "flask==0.10.1", 
    "package_version": "0.10.1", 
    "supported_platform": "linux_x86_64", 
    "supported_python_versions": [
        "py27"
    ], 
    "wheels": [
        "Flask-0.10.1-py2-none-any.whl", 
        "Werkzeug-0.10.4-py2.py3-none-any.whl", 
        "MarkupSafe-0.23-cp27-none-linux_x86_64.whl", 
        "Jinja2-2.8-py2.py3-none-any.whl", 
        "itsdangerous-0.24-py2-none-any.whl"
    ]
}

The metadata file offers information about the archive that enables developers to programmatically install the archive according to certain conditions. For instance, you could store the metadata for all Wagons in a document store and pull an archive for installation according to its `package_name` and `package_version`. Note that the archive is named according to the Wheel naming convention (described under PEP0427/PEP0491) with some additions which you can read about here:

Specifically for Wheels containing C extensions on Linux, please head to this link.
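As an illustration of choosing an archive by its metadata, here is a small sketch in which a plain Python list stands in for the document store. The `find_archives` helper is hypothetical, and treating a `supported_platform` of “any” as matching every platform is an assumption of this example; the keys are the ones shown in the `wagon showmeta` output above.

```python
def find_archives(catalog, package_name, package_version, platform=None):
    """Return the names of archives whose metadata matches the request."""
    matches = []
    for meta in catalog:
        if meta["package_name"] != package_name:
            continue
        if meta["package_version"] != package_version:
            continue
        # Assumption: an archive marked "any" can be installed anywhere.
        if platform and meta["supported_platform"] not in (platform, "any"):
            continue
        matches.append(meta["archive_name"])
    return matches


# Stand-in for a document store holding one metadata document per Wagon.
catalog = [
    {
        "archive_name": "flask-0.10.1-py27-none-linux_x86_64-Ubuntu-trusty.wgn",
        "package_name": "flask",
        "package_version": "0.10.1",
        "supported_platform": "linux_x86_64",
    },
]

print(find_archives(catalog, "flask", "0.10.1", platform="linux_x86_64"))
print(find_archives(catalog, "flask", "0.10.1", platform="win32"))
```

The first call finds the Flask Wagon built above, while the second returns an empty list because no archive for win32 has been stored.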

How are the wheels spinning?

Wagon uses pip’s built-in Wheel capabilities to create and install archives, which we call Wagons. It first extracts some information, such as the package’s name and version, from the requested source; then it downloads the package and its relevant dependencies, as wheels, into a single directory. If requirements.txt files are provided, it handles them as well.
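The download step can be approximated with a plain `pip wheel` invocation. The sketch below only builds the command line rather than running it, since running it needs network access; `--wheel-dir` and `-r` are regular pip options, but the `build_wheel_command` helper is hypothetical and the exact arguments Wagon passes to pip are an assumption here.

```python
import sys


def build_wheel_command(source, wheels_dir, requirement_files=()):
    """Build a `pip wheel` command that collects wheels into one directory."""
    cmd = [sys.executable, "-m", "pip", "wheel", "--wheel-dir", wheels_dir, source]
    for path in requirement_files:
        # Each requirements file contributes its own set of wheels.
        cmd.extend(["-r", path])
    return cmd


cmd = build_wheel_command("flask==0.10.1", "wheelhouse", ["requirements.txt"])
print(" ".join(cmd[1:]))
# On a machine with internet access, the command could then be run with:
#   import subprocess; subprocess.check_call(cmd)
```

Keeping everything in a single directory is what makes the later offline install possible: pip can be pointed at that directory instead of at an index.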

To attach a single platform to a set of wheels, Wagon iterates over the wheel files of the package and its dependencies and takes the first platform tag it finds that is not “any” (e.g. linux_x86_64, win32, etc.). This becomes the supported platform for the archive. Information about the OS is extracted and added to the metadata, and excluded packages are removed in the process.
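The platform lookup described above can be sketched in a few lines. This is an illustration rather than Wagon’s actual code: `detect_platform` is a hypothetical name, and falling back to “any” when every wheel is platform-independent is an assumption of this example.

```python
import os


def detect_platform(wheel_filenames):
    """Return the first platform tag that is not "any"."""
    for filename in wheel_filenames:
        stem, _ = os.path.splitext(filename)
        # The platform tag is the last dash-separated field of the name.
        platform = stem.split("-")[-1]
        if platform != "any":
            return platform
    # Assumption: a set of pure-Python wheels is usable on any platform.
    return "any"


wheels = [
    "Flask-0.10.1-py2-none-any.whl",
    "Werkzeug-0.10.4-py2.py3-none-any.whl",
    "MarkupSafe-0.23-cp27-none-linux_x86_64.whl",
    "Jinja2-2.8-py2.py3-none-any.whl",
    "itsdangerous-0.24-py2-none-any.whl",
]
print(detect_platform(wheels))  # linux_x86_64
```

With the Flask wheels from the metadata shown earlier, MarkupSafe is the only platform-specific wheel, so it determines the platform of the whole archive.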

Installation is done by extracting the archive and using the metadata to install the relevant package, resolving all of its dependencies from the wheels in the archive.
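Once the archive is extracted, the install boils down to pointing pip at the directory of wheels instead of at an index. The following sketch builds such a command from the metadata; `--no-index` and `--find-links` are regular pip options, while the `build_install_command` helper, the POSIX-style `bin/pip` path and the idea that Wagon calls pip in exactly this way are assumptions made for illustration.

```python
import os


def build_install_command(metadata, wheels_dir, virtualenv=None):
    """Build a `pip install` command that never touches a package index."""
    # Assumption: a POSIX virtualenv layout, where pip lives under bin/.
    pip = os.path.join(virtualenv, "bin", "pip") if virtualenv else "pip"
    requirement = "%s==%s" % (
        metadata["package_name"],
        metadata["package_version"],
    )
    return [
        pip, "install",
        "--no-index",                # do not contact PyPI or any other index
        "--find-links", wheels_dir,  # resolve everything from local wheels
        requirement,
    ]


metadata = {"package_name": "flask", "package_version": "0.10.1"}
cmd = build_install_command(metadata, "extracted/wheels", virtualenv="flask")
print(" ".join(cmd))
```

Because the dependencies’ wheels sit in the same directory, pip can satisfy Flask’s requirements without any network access.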

Validation is done by actually installing the Wagon and verifying that its metadata is intact.

More!

Wagon provides a slew of additional features like:

  • Creating and installing Wagons from local paths or from URLs to GitHub-like archives.
  • Excluding certain packages from your Wagon via the repeatable `--exclude` flag.
  • Fetching Wheels from requirements.txt files, given as URLs or local paths, via the `-r` flag.
  • Explicitly specifying supported Python versions for a wagon via the multi `--pyver` flag.
  • Upgrading an already installed package via the `--upgrade` flag.
  • Installing into a specified or currently active virtualenv.
  • Choosing the Wagon format (zip/tar.gz, defaulting to tar.gz).
  • Passing additional parameters to `pip` when creating or installing Wagons.
  • And MOAR

Wagon currently supports Python 2.6.x and 2.7.x only, but we’re planning on adding Python 3.x support at some point. Linux, Windows and OS X are supported. We’re hoping to establish Wagon as the standard format for packaging sets of Wheels for offline installation of Python packages.

For contributions and bug reports, see our official Wagon repo.

About the Author

Nir Cohen is an architect at GigaSpaces working on Cloudify and a co-organizer of DevOps Days Tel Aviv. Nir is a relatively short, brown-eyed human being who loves animals and holds true to ethics as a life path. A thinker, who likes to walk long distances, breathe and eat lettuce. He likes to think of himself as an innovative, think-tank type of guy who adores challenges and has an extremely strong affinity for automation and system architecture. He's primarily driven by researching and deploying new systems and services. Find Nir on GitHub, follow him on Twitter, or read some of his other stuff.
