Mor Shonrai

A blog covering everything from programming, data science, astronomy and anything that pops into my head.

Creating and Modifying Images with Apptainer

Apptainer logo: Image credit https://apptainer.org/

What is Apptainer?

Apptainer, formerly known as Singularity, is a “computer program that performs operating-system-level virtualization”, i.e. containerization. Containerization allows applications to be created and executed in a controlled environment (or namespace) without interfering with the host OS’s environment.

Apptainer addresses security concerns associated with other popular containerization software (for example Docker), making it popular in high-performance computing (HPC) environments. Apptainer enables developers to create and develop code in their preferred environment before packaging it into a Singularity Image Format (SIF) file. These images can be easily shared with others while ensuring reproducibility across computing environments.

Let’s start using Apptainer!


There are numerous sources of preexisting images. Commonly used ones are DockerHub, GitHub Container Registry, and Library.

Apptainer can convert Docker images directly into Apptainer images. This is extremely useful, allowing us to piggyback on the work of others. For example, let’s say we would like to have the latest version of Python to run a test; we could simply use the latest Python Docker image on DockerHub using something like:

apptainer shell docker://python:latest

This would create an Apptainer image and start an interactive shell in a container generated from the python:latest image hosted on DockerHub.

apptainer shell docker://python:latest
INFO:    Using cached SIF image
Apptainer> python --version
Python 3.12.2

If we wanted to access a different version, for example if we needed Python 2.7, we could use:

apptainer shell docker://python:2.7
Apptainer> python --version
Python 2.7

Unless otherwise specified, the latest tag will be pulled from DockerHub.
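
The tag defaulting described above can be mimicked in a few lines of Python. This is only an illustrative sketch: the helper name docker_uri and the choice of tags are assumptions made for the example, and nothing here is part of Apptainer itself. The commands are printed rather than run.

```python
import shlex


def docker_uri(image, tag="latest"):
    """Build a docker:// URI, falling back to the 'latest' tag (hypothetical helper)."""
    return f"docker://{image}:{tag}"


# Print the commands that would report the Python version for a few tags.
for tag in ("latest", "2.7"):
    command = ["apptainer", "exec", docker_uri("python", tag), "python", "--version"]
    print(shlex.join(command))
```

Calling docker_uri("python") without a tag gives docker://python:latest, mirroring what Apptainer does when the tag is left out.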

Creating Custom Images

There are two methods to create or modify images:

  • Editing an existing image using a sandbox.
  • Building an image from a base image and definition file.

We’ll walk through these two methods to see the different use cases. Both methods are extremely useful tools to have in your belt.

Modifying images using a sandbox

When taking a base image, it is often the case that we’re missing packages or we would like to install other packages into that image. For example, the Python container we’ve been using doesn’t have ipython.

apptainer shell docker://python
INFO:    Using cached SIF image
Apptainer> ipython
bash: ipython: command not found

We can create a sandbox, essentially unpacking the contents of the image into a directory, allowing that image to be modified.

apptainer build --sandbox python_project docker://python
INFO:    Starting build...
Getting image source signatures
Copying blob 63941d09e532 skipped: already exists  
...
2024/03/25 11:18:20  info unpack layer: sha256:09527f...
INFO:    Creating sandbox directory...
INFO:    Build complete: python_project

Here we have run apptainer build --sandbox python_project docker://python. We specify the base image as docker://python (the latest version of the Python image on DockerHub), and we build a --sandbox directory with the name python_project.

If we look inside this directory, it will look very similar to what is at / (root directory) of your own machine:

ls python_project 
bin  boot  dev  environment  etc  home  lib  
lib64  media  mnt  opt  proc  root  run  sbin  
singularity  srv  sys  tmp  usr  var

We can then create a new container by executing the following command:

apptainer shell --writable python_project

Within the container, we can proceed to install ipython.

pip install ipython

To exit the container, type exit. To confirm that ipython is now installed, we can recreate the container using the following command:

apptainer shell python_project
Apptainer> ipython --version
8.22.2 

Note that we aren’t using --writable because we no longer need the directory to be writable. We could keep the container writable; however, this is bad practice, as modifying a container unintentionally might break compatibility with others using the container.

Modifying images by first creating a sandbox is an excellent option if we want to make a slight change to a preexisting image.

Creating an image from a sandbox

We have now pulled an image from DockerHub, created a modified version of that image, and saved it to a directory.

Looking at the directory, we notice that it takes up a pretty substantial amount of disk space:

du -sh ./python_project 
 1.1G ./python_project 

If we only have a single image that we work with, 1GB might not be too bad, but as we increase the complexity of the project and use more containers, this will quickly take up a lot of space.

We can build a SIF (Singularity Image Format) file from that directory using:

apptainer build python_project.sif python_project
INFO:    Starting build...
INFO:    Creating SIF file...
INFO:    Build complete: python_project.sif

Note that the format is apptainer build <output_name> <input_name>. This command creates a file called python_project.sif, which can be thought of as a compressed version of the python_project directory:

du -sh ./*
1.1G	./python_project
346M	./python_project.sif

In this case, the sif file takes up around 1/3 of the storage of the directory.
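
As a quick sanity check on that ratio, the two sizes reported by du can be compared directly. This is a minimal sketch, assuming du's G and M suffixes are binary units (GiB and MiB); the variable names are purely illustrative.

```python
# Sizes as reported by `du -sh` above (assumed to be binary units).
sandbox_mib = 1.1 * 1024  # 1.1G sandbox directory, expressed in MiB
sif_mib = 346             # 346M SIF file, in MiB

ratio = sif_mib / sandbox_mib
print(f"SIF is {ratio:.0%} of the sandbox size")
```

Under those assumptions the SIF file comes out at a little under a third of the sandbox directory.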

Before we take a look at building our own images, let’s take a quick look at the run and exec commands, and how to bind folders within our containers.

The apptainer run and apptainer exec commands

Containers can be given a predefined command to run. This command can be executed using apptainer run <image_name>. For example, with the python_project image:

apptainer run python_project.sif 
Python 3.12.2 (main, Mar 12 2024, 11:02:14) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

We can see that this puts us into a Python interpreter as designed by the maintainers of this image.

We can also specify the command to run. Consider the Python script hello.py:

print("Hello, world!")
print("Inside of container!")

We can run this script using the container:

apptainer exec python_project.sif python hello.py
Hello, world!
Inside of container!

apptainer run executes the default command specified by the image author, while apptainer exec allows you to run custom commands within a container.

Binding folders within our container

By default, Apptainer will bind some commonly used directories (for example $HOME and /tmp) to the container, allowing them to be accessed from within the container. This behavior may depend on your environment (for example, on which directories your system is configured to bind automatically).


We can explicitly bind directories when executing/running commands using the following syntax:

apptainer exec -B </path/to/local/directory>:</path/to/container/mount/point> image.sif <command>

For example:

apptainer exec -B $HOME/Downloads:/Downloads python_project.sif bash

This command mounts the ~/Downloads folder to /Downloads within the container and runs bash to get an interactive terminal. We could then access the downloads from /Downloads.

We can bind multiple directories, for example:

apptainer exec -B $HOME/Downloads:/Downloads -B $HOME/Desktop:/Desktop  python_project.sif bash

By default, Apptainer will also mount the $HOME directory. This can occasionally be problematic; for example, if you have some config files stored in $HOME, they may be picked up when mounting the $HOME directory instead of the configs stored in the image. You can specify the $HOME directory using --home <dir_name>:

apptainer exec --home `pwd` python_project.sif bash

This command sets the $HOME to be the current directory when launching the container.

When binding directories, we can also specify the read/write permissions by adding an additional argument when binding a directory. We can specify that the user has read/write permission to a directory using rw or that it is read only using ro. For example:

apptainer exec --bind ~/Downloads/:/Downloads:ro python_project.sif touch /Downloads/test.txt
/usr/bin/touch: cannot touch '/Downloads/test.txt': Read-only file system

Here ~/Downloads on the host operating system is mounted to /Downloads within the container but specified as read only with the ro option. As a result, if we try to touch a file within that directory (which will either update the time stamp of an existing file or create a new empty file), it fails reporting that we have a read-only file system.
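
The src:dest:options pattern is easy to get wrong when several binds are involved, so it can help to assemble the arguments programmatically. The sketch below is one possible way to do that in Python; bind_spec and exec_command are hypothetical helper names, and the command is only printed, not executed.

```python
import os
import shlex


def bind_spec(src, dest, mode=None):
    """Return a 'src:dest[:mode]' bind string; mode must be 'ro' or 'rw' if given."""
    if mode not in (None, "ro", "rw"):
        raise ValueError(f"unknown bind mode: {mode!r}")
    parts = [str(src), str(dest)]
    if mode:
        parts.append(mode)
    return ":".join(parts)


def exec_command(image, command, binds=()):
    """Build an 'apptainer exec' argument list with one --bind per entry."""
    args = ["apptainer", "exec"]
    for spec in binds:
        args += ["--bind", spec]
    return args + [image] + list(command)


# Recreate the read-only Downloads example from above.
downloads = os.path.expanduser("~/Downloads")
cmd = exec_command(
    "python_project.sif",
    ["touch", "/Downloads/test.txt"],
    binds=[bind_spec(downloads, "/Downloads", "ro")],
)
print(shlex.join(cmd))
```

Keeping the command as a list also means it could later be handed to subprocess.run without any shell quoting issues.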

By default, apptainer will try to bind directories with read/write permissions. It is important to remember that apptainer will execute commands with the same permission level as the user. If we try to do something that the user doesn’t have permission to do, it will fail. For example:

apptainer exec --bind /usr/:/Downloads:rw python_project.sif touch /Downloads/test.txt

Here we’re attempting to bind the /usr directory of the host OS to the /Downloads directory within the container with read/write permission. We then try to touch a new file within that directory (which will be the /usr directory on the host OS!). The /usr directory is a very important directory on a Linux OS, so any unintended changes could be destructive to all users! Thankfully, this command fails with a permission denied error.

/usr/bin/touch: cannot touch '/Downloads/test.txt': Permission denied

This is because we’re not running this command as root or with sudo permission. Remember, if you don’t know what you’re doing or what a command does… NEVER RUN IT AS SUDO!

Creating a custom image using a definition file

Similar to Docker, we can create an image from a file with a list of instructions.

Apptainer def files follow this format:

Bootstrap: docker
From: ubuntu:{{ VERSION }}
Stage: build

%arguments
    VERSION=22.04

%setup

%files

%environment
    
%post

%runscript

%startscript
    
%test

%labels

%help

We’ll walk through these sections one by one.

Preamble

Bootstrap: docker
From: ubuntu
Stage: build
  • Bootstrap specifies where we are getting the base image from. In this case, it’s docker (DockerHub).
  • From specifies the base image. In this case, it will grab the latest Ubuntu image.
  • Stage specifies the stage of the build. Multiple stages can be used to simplify the build process and reduce the final file size (more on this later).

Arguments

%arguments
    VERSION=22.04

Arguments are variables that can be used within the definition file. Using arguments allows us to change variables only in one place rather than multiple instances, preventing bugs.

In the above example, we’ve specified an argument VERSION=22.04. This argument is then accessed in the preamble when selecting the Ubuntu image version:

From: ubuntu:{{ VERSION }}

This specifies that we will be using ubuntu:22.04.
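
To make the substitution concrete, here is a small Python sketch that mimics what happens to the {{ VERSION }} placeholder. It is not how Apptainer implements templating, just an illustration of the idea; the render function is a hypothetical helper written for this example.

```python
import re


def render(template, args):
    """Replace each {{ NAME }} placeholder with its value from args."""
    def substitute(match):
        # match.group(1) is the argument name between the braces.
        return args[match.group(1)]

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


header = "Bootstrap: docker\nFrom: ubuntu:{{ VERSION }}\nStage: build"
print(render(header, {"VERSION": "22.04"}))
```

Changing the value in one place (the args dictionary here, the %arguments section in a real definition file) updates every place the placeholder appears.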

Setup

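Setup commands are executed on the host system, outside of the container, after the base image has been retrieved and before the remaining sections are run. Because they run directly on the host, they should be used with care.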

For example, suppose we want to compress some files that will later be added to the container:

%setup
    tar -zcvf files.tar.gz ./*.txt

This command would compress all the files ending in .txt in the current directory into files.tar.gz (also in the current directory).

Files

This is where we can specify files to be copied into the container.

%files
    files.tar.gz /opt

Here, we are copying the files.tar.gz that was created in the %setup into the /opt directory of the image (/opt/files.tar.gz).

Environment

Here we specify environmental variables that we want set within the container.

%environment
    export PATH=$PATH:/app/bin
    export DEFAULT_PORT=8001

In this example, we set two environmental variables. First, we modify the PATH to include /app/bin, where the hypothetical binaries for our application reside. Second, we specify the DEFAULT_PORT to be 8001.

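These variables are set whenever the container is run. Note that %environment is applied at runtime rather than during the build, so any variable needed in %post must also be defined there.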

Post

In this section, we specify the command we want to run after the base image has downloaded. Environmental variables for the host system are not passed, so this can be considered a clean environment.

This will likely be the most detailed section of your definition script. For example:

%post
    apt-get update && apt-get install -y gcc
    pip install ipython

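In the above example, we refresh the package index and install gcc. We then install ipython using pip (this assumes pip is already available in the base image, as it is in the python images used earlier; a bare Ubuntu image would need it installed first).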

This is a simple example, but %post would be the section where dependencies would be installed and/or compiled.

Runscript

This is where we define a set of commands that will be executed when running apptainer run image.sif or when running the image itself as a command (e.g., ./image.sif).

Internally, these commands will form a simple script that will be executed.

%runscript
    ipython

This example will start an IPython interpreter. We could have something more complicated, such as:

%runscript
    echo "Received the following arguments $*"
    ipython $*

This will print the arguments it received before passing them to IPython. For example:

apptainer run ./jupyter.sif --version
Received the following arguments --version
8.22.2

Here, we’re passing --version as an argument. This gets passed on and run as ipython --version, which gives 8.22.2.

One could use the %runscript section to define a default behavior and how arguments are handled.

Startscript

This is similar to the %runscript section where we create a script to be run when running the container. Specifically, the %startscript runs when the container is launched as an instance rather than a process launched with run or exec. Instances can be considered more of a daemon, which will have a more passive interface. For example, an instance may monitor a port to receive a command that controls its behavior. It might be better to launch a web server as an instance.

Likewise, if you have multiple steps in a data pipeline, data could be passed between instances, which persist for longer than any individual analysis job.

Test

This defines a test script that is run at the end of the build process and can be used to ensure the validity of the built container.

For example, if we are building a data pipeline, we might want to make sure we get the expected answer.

%test
    python test_script.py
    if [ $? -eq 0 ]; then
        echo "Script executed successfully"
    else
        echo "Script failed"
        exit 1
    fi

Here we are running test_script.py. Its exit status is then available in $?, which holds the return code of the last command.

    if [ $? -eq 0 ]; then

This line checks if the return code is 0, which is a typical code for a successful execution. In our Python code, we would have a line like:

import sys

if successful_test:
    sys.exit(0)
else:
    sys.exit(1)

If the code executes successfully, then the return will be 0; otherwise, it will be 1.
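
A fuller version of such a test script might look like the following sketch. The run_pipeline function and the expected value are placeholders invented for this example; a real test would call into the actual pipeline code installed in the container.

```python
import sys


def run_pipeline(values):
    """Stand-in for the real pipeline step (hypothetical): returns the mean."""
    return sum(values) / len(values)


def main():
    expected = 2.0
    result = run_pipeline([1.0, 2.0, 3.0])
    # Compare with a tolerance rather than exact float equality.
    successful_test = abs(result - expected) < 1e-9
    return 0 if successful_test else 1


if __name__ == "__main__":
    # The exit code is what the shell sees in $? inside %test.
    sys.exit(main())
```

Returning the code from main() and exiting only at the very end keeps the check itself easy to call from other tests.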

Labels

%labels
    Author myuser@example.com
    Version v0.0.1
    MyLabel Hello World

Here we define a set of labels that are viewable using the apptainer inspect command.

Versioning can be super important when developing an application. Maintaining an up-to-date version number can prevent a lot of headaches when trying to debug issues.

Help

Help specifies a help message that will be outputted:

%help
    This is a container with jupyter lab and notebook installed

This can be accessed using:

apptainer run-help my_container.sif

Example definition script

Here is an example of a .def file which installs Jupyter, IPython, Matplotlib, and NumPy.

Bootstrap: docker
From: python:latest

%post
    pip install jupyter ipykernel jupyterlab notebook
    pip install matplotlib numpy

%environment
    export DEFAULT_PORT=8001

%runscript
    ipython $*

%startscript
    jupyter lab --port=$DEFAULT_PORT

This can be built with:

apptainer build jupyter.sif jupyter.def

The runscript will take arguments and pass them to IPython. For example:

./jupyter.sif hello.py
Hello, world!
Inside of container!

The startscript will start a Jupyter Lab on port 8001. This can be launched using:

apptainer instance start jupyter.sif jupyter-server 

When navigating to http://localhost:8001, we’ll notice that we need to log in. We can get a login code using:

apptainer exec instance://jupyter-server jupyter lab list
Currently running servers:
http://localhost:8001/?token=643b... :: /home/apptainer

Clicking on that link will log us in. We need to remember to stop the instance once we’re finished.

apptainer instance stop jupyter-server            

Multi-stage Builds

Multi-stage builds offer many benefits when generating a container image. As the name suggests, multi-stage builds allow us to build an image in multiple stages. This means that we can dedicate different stages of the build for different processes.

Let’s consider the following example. Let’s say we have a data analysis pipeline, which has some Rust code (see Rust Tutorial Part I for an introduction to Rust) that converts units from meters to millimeters. The code might look something like this:

use std::env;

// Function to convert meters to mm
fn convert_meters_to_mm(meters: f64) -> f64 {
    let mm = meters * 1000.0;
    mm
}

fn main() {
    // Get the command line arguments
    let args: Vec<String> = env::args().collect();
    // Parse the first argument (args[1]) as an f64
    let meters = args[1].trim().parse::<f64>().expect("Failed to parse input");
    // Convert meters to mm
    let mm = convert_meters_to_mm(meters);
    println!("{}", mm);
}

The above code takes in a single argument, which is assumed to be a float, and multiplies it by 1000, converting from meters to millimeters. We can compile this with:

rustc convert_units.rs

This produces a binary called convert_units. To compile it, we need rustc (a Rust compiler) installed. Using an Ubuntu base image, our definition file would look something like this:

Bootstrap: docker
From: ubuntu
Stage: build

%files
    convert_units.rs /build/convert_units.rs

%post
    apt-get update && apt-get upgrade -y && apt-get install -y rustc
    rustc /build/convert_units.rs -o /bin/convert_units

%runscript
    /bin/convert_units $*

Here we have a single stage called build. In this stage, we copy the source code to the /build directory at the %files stage. In the %post stage, we update the OS and install rustc, a Rust compiler. Despite pulling the latest image by default, it is good practice to update your image to make sure you have all the latest security and bug fixes in place.

We then compile the code to /bin/convert_units. We then specify this as the entry point of the %runscript stage.

We can run this as:

./single_stage.sif 1.25
1250

You’ll notice that the convert_units.rs file is no longer needed once convert_units is compiled. Likewise, we only need rustc to compile convert_units; we don’t use it later in the file. We could turn this into a multi-stage build. In the first stage of the build we’ll install rustc and compile convert_units. In the second stage we’ll simply copy across the compiled binary convert_units.

Bootstrap: docker
From: ubuntu
Stage: build


%files
    convert_units.rs /build/convert_units.rs

%post
    apt-get update && apt-get upgrade -y && apt-get install rustc -y 
    rustc /build/convert_units.rs -o /build/convert_units

Bootstrap: docker
From: ubuntu
Stage: final

%files from build
  /build/convert_units /bin/convert_units

%post
    apt-get update && apt-get upgrade -y

%runscript
    /bin/convert_units $*

The definition file is similar to the single_stage.def file; however, we have broken this up into two stages.

The first stage, tagged as build, will add the source file convert_units.rs to the image, update the OS, install rustc, and compile /build/convert_units.

The second stage, called final, uses the same Bootstrap and base image (Ubuntu) as the build stage. However, in the %files from build section, we are only copying /build/convert_units from the build stage to /bin/convert_units in the final stage. We still want to make sure we have an up-to-date OS (security updates are always important), so we still run apt-get update && apt-get upgrade -y. Finally, the %runscript section is only included in the final stage.

We can see that we get the same behavior from both images:

./single_stage.sif 1.25 ; ./multi_stage.sif 1.25
1250
1250

However, when we look at the size of the files, we see a difference:

ls -lah ./*_stage.sif
-rwxr-xr-x 1 obriens obriens  59M Aug 27 08:36 ./multi_stage.sif
-rwxr-xr-x 1 obriens obriens 270M Aug 27 08:41 ./single_stage.sif

You’ll notice that the multi_stage.sif build is around 20% the size of single_stage.sif. This is partly due to the multi_stage.sif not containing the source code (convert_units.rs), but also due to it not containing rustc.

Image Size

Another benefit of working with Apptainer is the small size of the .sif files. For comparison, let’s create a comparable Docker image for our multi-stage convert_units application:

FROM ubuntu:latest AS build

RUN apt-get update && apt-get upgrade -y && apt-get install rustc -y 
COPY convert_units.rs /build/convert_units.rs
RUN rustc /build/convert_units.rs -o /bin/convert_units


FROM ubuntu:latest AS final
RUN apt-get update && apt-get upgrade -y
COPY --from=build  /bin/convert_units  /bin/convert_units

ENTRYPOINT ["/bin/convert_units"]

In the above we once again use a multi-stage build, with the source file compiled in the build stage and only the final executable and system updates included in the final stage. We can build this image with something like:

docker build -t ms_convert .

Once built, we can get the size of this image using:

docker images | grep 'ms_convert'
ms_convert    latest    9a3db9630821   6 minutes ago   170MB

The Docker image is 170MB compared to the .sif image, which is 59MB. The .sif file is roughly one third the size of the Docker image! The small size of Apptainer images makes them ideal for systems with limited storage.
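
The size comparisons in this post can be reproduced with a little arithmetic. The sketch below treats the M reported by ls and the MB reported by Docker as the same unit, which is only approximately true, so the percentages should be read as rough figures; the dictionary keys are just labels chosen for this example.

```python
# Image sizes as reported above (units treated as equivalent for simplicity).
sizes = {"multi_stage.sif": 59, "single_stage.sif": 270, "docker ms_convert": 170}

multi_vs_single = sizes["multi_stage.sif"] / sizes["single_stage.sif"]
sif_vs_docker = sizes["multi_stage.sif"] / sizes["docker ms_convert"]

print(f"multi-stage / single-stage: {multi_vs_single:.0%}")
print(f"SIF / Docker image:         {sif_vs_docker:.0%}")
```

With these numbers the multi-stage SIF is about 22% of the single-stage SIF and about 35% of the Docker image.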

Summary

In this post we learned a little about how to use Apptainer to containerize an application. With Apptainer we can simply convert a preexisting Docker image into an Apptainer image, allowing us to build on preexisting efforts. We also learned how to build new images and how to customize and modify preexisting ones. We looked at multi-stage builds and how they can help to further decrease the size of the image, before finally comparing image sizes to an equivalent Docker image.

Apptainer is a lean and efficient containerization software that solves many security concerns of Docker, especially on shared systems (such as HPC clusters). This makes Apptainer a very useful tool to have in one’s tool bag! In future posts, we’ll look in more detail at some peculiarities of Apptainer and how we can use these to our advantage!
