
How To Containerize Jupyter Kernels With Docker And Apptainer

In this tutorial we’ll look at how to use a containerized Jupyter kernel.

The first question one might ask is: why would we want to containerize a Jupyter kernel in the first place?

We’re often faced with complex dependencies that can be difficult to install on one’s system.

For example, suppose we’re using Windows or macOS, but one of the packages we want to use is only available for Linux.

We have a few solutions:

  • Dual boot Linux: This is when we split our computer’s hard drive in two and install Linux on one partition, with the original operating system remaining on the other. A downside is that partitioning the drive reduces the storage available to each operating system. It is also not something we can do on a shared system like a high performance computing (HPC) cluster.
  • Virtual Machine (VM): In this setup we’d run a Linux virtual machine on the Windows system. This requires a full installation of the guest operating system, Linux, which can use a significant amount of storage on the host. VMs also need resources (e.g. CPU, RAM, GPU) to be allocated to them, and those resources are then no longer available to the host system’s scheduler. For example, if the VM is allocated 20% of the CPU and sits idle, the host system can still only access the remaining 80%; the idle 20% simply goes unused. The availability of VMs on a shared system will be highly dependent on the system.
  • 100% containerized workflow: With this option, we’d launch Jupyter lab itself from a container. This isn’t a bad solution, but it doesn’t allow for swapping between different kernels within one instance. Imagine we also have a dependency that only works on Windows and not Linux: we’d need to operate two different Jupyter lab instances, one from the host system and the other containerized, and swap between them. This would work in an HPC environment, but an HPC system will likely require queuing such a session like any normal job, which may require further layers of ssh-tunneling to go from a login node to an analysis node. HPC clusters that support Jupyter environments often use a web portal for interaction; for example, the Digital Research Alliance of Canada supports this system. The portal uses a python environment based on the versions available on the cluster.

The most promising and flexible solution is to use a containerized workflow.

Nonetheless, it would be nice if we could use this containerized python environment within the preexisting JupyterHub web portal.

In this post we’ll do just that.

Apptainer or Docker?

For this tutorial, we’ll work in both Apptainer and Docker to show how this can be done with either containerization tool. On a personal machine, Docker might be preferable. However, for HPC environments such as DRAC (the Digital Research Alliance of Canada; clusters such as Narval, Beluga, Cedar, Niagara) we’ll use Apptainer.

Let’s build an Image

Let’s build a small image that installs some python packages from a requirements file. For this we’ll use the python:3.12-slim image (see DockerHub) as our base image. We’ll use the following requirements.txt file as our requirements:

numpy
matplotlib
scipy
ipykernel
ipython


Docker image

# Start from a slim Python 3.12 base image
FROM python:3.12-slim

# Copy in the requirements file and install the dependencies
COPY ./requirements.txt /build/requirements.txt

RUN pip install -r /build/requirements.txt

# Default to an interactive Python session
CMD ["python"]

Which can be built with the tag kernel as:

docker build -t kernel .
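
Before wiring the image into Jupyter, we can optionally sanity-check it by importing one of the installed packages:

docker run --rm kernel python -c "import numpy; print(numpy.__version__)"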

Apptainer image

Let’s use the definition file below, kernel.def:

Bootstrap: docker
From: python:3.12-slim

%files
    ./requirements.txt /build/requirements.txt

%post
    pip install -r /build/requirements.txt

%runscript
    python

Which can be built to the file kernel.sif as:

apptainer build kernel.sif kernel.def
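
As with the Docker image, a quick optional check that the build works:

apptainer exec kernel.sif python -c "import numpy; print(numpy.__version__)"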

Installing a custom kernel

Now that we have our image created, let’s create a custom Jupyter kernel.

python -m ipykernel install --user --name custom-kernel --display-name="custom-kernel"

Which installs the kernel in my home directory, e.g.:

Installed kernelspec custom-kernel in 
/home/obriens/.local/share/jupyter/kernels/custom-kernel
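
We can confirm the kernel is registered with:

jupyter kernelspec list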

Installing the kernel also creates the kernel.json file (/home/obriens/.local/share/jupyter/kernels/custom-kernel/kernel.json):

{
 "argv": [
  "/home/obriens/miniforge3/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "custom-kernel",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}

Notice that this file will be used as a template when launching new kernels: the {connection_file} entry will be replaced with the path to the connection file for each new kernel.
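
For reference, a connection file is a small JSON document recording the transport, ports, and authentication key a kernel should use. The values below are invented for illustration:

{
 "shell_port": 53794,
 "iopub_port": 53795,
 "stdin_port": 53796,
 "control_port": 53797,
 "hb_port": 53798,
 "ip": "127.0.0.1",
 "key": "a0436f6c-1916-498b-8eb9-e81ab9368e84",
 "transport": "tcp",
 "signature_scheme": "hmac-sha256",
 "kernel_name": ""
}

We now want to modify kernel.json so that it launches our containerized kernel instead: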

Docker

{
 "argv": [
     "docker",
     "run",
     "--network=host",
     "--rm",
     "-v",
     "{connection_file}:/connection_file",
     "kernel",
     "python",
     "-m",
     "ipykernel_launcher",
     "-f",
     "/connection_file"
 ],
 "display_name": "custom-kernel (docker)",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}

Note that we pass the flag --network=host, and that we’re mounting the {connection_file} from the host to /connection_file within the container.

Apptainer

{
 "argv": [
     "apptainer",
     "exec",
     "--bind",
     "{connection_file}:/tmp/connection_file",
     "/path/to/kernel.sif",
     "python",
     "-m",
     "ipykernel_launcher",
     "-f",
     "/tmp/connection_file"
 ],
 "display_name": "custom-kernel (apptainer)",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}

Note we’re binding {connection_file} from the host to /tmp/connection_file within the container.

Launching the custom kernel

With the image created, the kernel installed, and kernel.json modified, we can now launch a Jupyter lab instance and connect to the kernel (feel free to use any custom options, such as the port for the instance):

jupyter lab
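
As an aside, if the jupyter-console package happens to be installed on the host, the custom kernel can also be smoke-tested directly from a terminal without the full Lab interface:

jupyter console --kernel custom-kernel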

Now we can access the launcher by clicking the “New Launcher” button or pressing Ctrl + Shift + L:

Once opened, we’ll see an option to launch a Notebook or Python console using our custom kernel:

If we’re using the Docker kernel, we can monitor the container using any of the standard Docker tooling, for example:

docker ps
CONTAINER ID   IMAGE                     COMMAND                  CREATED              STATUS              PORTS                                         NAMES
0c39ccf064d0   obriens/kernel:latest     "python -m ipykernel…"   About a minute ago   Up About a minute                                                 determined_chatelet

Or to read the logs of that container:

docker logs 0c39ccf064d0
NOTE: When using the `ipython kernel` entry point, Ctrl-C will not work.

To exit, you will have to explicitly quit this process, by either sending
"quit" from a client, or using Ctrl-\ in UNIX-like environments.

To read more about this, see https://github.com/ipython/ipython/issues/2049


To connect another client to this kernel, use:
    --existing /connection_file

If we shut down the kernel from Jupyter lab (Kernel -> Shut Down All Kernels), we’ll notice that the container is removed on shutdown (check with docker ps); this is the --rm flag from kernel.json doing its job.

Some Peculiarities

Number of Containers Created

If we launch multiple notebooks within Jupyter lab, we’ll have multiple containers spawned, for example with 4 notebooks:

docker ps
CONTAINER ID   IMAGE                   COMMAND                  CREATED          STATUS          PORTS     NAMES
6830bcd6b669   obriens/kernel:latest   "python -m ipykernel…"   32 seconds ago   Up 31 seconds             competent_zhukovsky
bcec50f54eed   obriens/kernel:latest   "python -m ipykernel…"   34 seconds ago   Up 34 seconds             recursing_noether
c64e9ab35482   obriens/kernel:latest   "python -m ipykernel…"   38 seconds ago   Up 37 seconds             blissful_solomon
20bc592ceb4c   obriens/kernel:latest   "python -m ipykernel…"   41 seconds ago   Up 41 seconds             boring_ardinghelli

This might be the desired behavior; however, spawning an individual container for each notebook can be very memory- and resource-hungry. Instead, we could start a single container instance and attach any new kernels to it. For this we need to create some “daemon” process that will keep the container alive. Let’s modify the Dockerfile to create such a process:

FROM python:3.12-slim

COPY ./requirements.txt /build/requirements.txt

RUN pip install -r /build/requirements.txt
# Write a keep-alive script; printf is used so the newlines are written portably
RUN printf '#!/bin/bash\nwhile true; do sleep 1; done\n' > /build/keep_alive.sh && chmod a+x /build/keep_alive.sh

# Keep the container alive until explicitly stopped
CMD ["/build/keep_alive.sh"]

Here we’re creating a script called /build/keep_alive.sh. This script contains a while loop that runs until terminated, simply sleeping for 1 second on each iteration.

Aside on docker stop

Note: When running docker stop, Docker sends a SIGTERM signal to the process and waits, by default, 10 seconds before sending SIGKILL. We can keep this in mind when setting the sleep time: if the sleep is longer than 10 seconds the process will simply be killed, while if it is shorter the process gets a chance to terminate gracefully. Graceful termination can help prevent data corruption by ensuring files are properly closed and any child processes are cleaned up. This is something to keep in mind when designing your image.
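
If graceful shutdown matters for your workload, one option (a hypothetical sketch, not used in the rest of this tutorial) is to trap SIGTERM in the keep-alive script, so that cleanup runs as soon as docker stop is issued rather than after the current sleep finishes:

#!/bin/bash
# Hypothetical keep_alive.sh variant that reacts to SIGTERM immediately
cleanup() {
    # Close files / terminate child processes here, then exit cleanly
    exit 0
}
trap cleanup SIGTERM

while true; do
    # Sleep in the background and wait on it: bash delivers traps while
    # waiting, whereas a foreground sleep would delay them until it returns
    sleep 1 &
    wait $!
done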

We then set the default command to /build/keep_alive.sh, so that when we start this container it runs this script until interrupted.

Next we need to create a start script, launch_docker.sh, that will be called whenever a Jupyter kernel is started or restarted:

#!/bin/bash

# Get the filename of the connection file
connection_file=$(basename "$1")
container_name="server"
image_name="obriens/kernel:latest"

# Check if the server is currently running
if [ "$( docker container inspect -f '{{.State.Running}}' $container_name 2>/dev/null )" = "true" ]; then
    echo "Server is running"
else
    # If not, start the server
    echo "Starting the server"
    # Get the path where the connection files will be stored
    connection_path="$(jupyter --runtime-dir)"
    # Stop and remove the container if it already exists
    docker stop $container_name
    docker rm $container_name
    # Create a new container on the host network, mounting connection_path
    docker create --name=$container_name -v "$connection_path":/connections --network=host $image_name
    # Start this server instance
    docker start $container_name
fi

# Launch an ipykernel using the connection file that was passed as arg 1
docker exec $container_name python -m ipykernel_launcher -f /connections/$connection_file

We can then make this executable with:

chmod +x launch_docker.sh

Finally, we can modify the kernel.json file to call this script:

{
 "argv": [
     "/path/to/launch_docker.sh",
     "{connection_file}"
 ],
 "display_name": "custom-kernel (docker)",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}

With this new kernel.json file and launch_docker.sh in place, we can now relaunch jupyter lab and open multiple notebooks.

If we now check the containers that we have running with docker ps:

CONTAINER ID   IMAGE                   COMMAND                  CREATED          STATUS          PORTS     NAMES
e2a7475eb14f   obriens/kernel:latest   "/build/keep_alive.sh"   15 minutes ago   Up 15 minutes             server

We can see that only one container is running, no matter how many notebooks are open.
The container can be stopped at any time with:

docker stop server

And removed with:

docker rm server

One downside is that we’ll need to manually stop the container once we’re finished.
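
If a graceful stop isn’t needed, both steps can be combined into one command; note that docker rm -f kills the container with SIGKILL, so the caveats from the docker stop aside above apply:

docker rm -f server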

What about Apptainer?

Similarly, we can limit the number of Apptainer containers by starting our container as an instance and then connecting to that instance.

Apptainer doesn’t require a daemon script (like /build/keep_alive.sh), so we don’t need to modify the image.

Let’s use the following script (launch_apptainer.sh) to launch the Apptainer instance:

#!/bin/bash

# Get the filename of the connection file
connection_file=$(basename "$1")
container_name="server"
image_name="/path/to/kernel.sif"

# Check if the server instance is currently running
# (the list output has a header line, so 2+ lines means an instance exists)
if [ $(apptainer instance list $container_name | wc -l) -ge 2 ]; then
    echo "Server is running"
else
    # If it isn't, start an instance
    echo "Starting the server"
    # Bind the runtime-dir (where the connection files are) to /connections within the container
    apptainer instance start --bind `jupyter --runtime-dir`:/connections $image_name $container_name
fi

# Attach and run ipykernel_launcher
apptainer exec instance://$container_name python -m ipykernel_launcher -f /connections/$connection_file

Making it executable with:

chmod +x launch_apptainer.sh
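
Any running instances can be listed with:

apptainer instance list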

The instance can be stopped with:

apptainer instance stop server 

Networking and Port mapping

You might have noticed that we didn’t need to pass any ports to the Docker container. Instead, we simply add the container to the host network. In this setup, the host network handles all communication between the container and Jupyter lab, and the Jupyter instance manages the ports being used.

We don’t actually need to know which ports should be mapped within the container. This is in contrast to launching a Jupyter instance from within a container. Likewise, we don’t need to worry about mounting the working directories! This greatly simplifies the setup: ports are determined at launch, without the need to reconfigure a run command or a docker-compose file.
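
To appreciate what host networking saves us, here is a rough sketch of the docker run command we would otherwise need. Each kernel picks several ports at random and records them in its connection file, so every one of them would have to be published explicitly (the port numbers below are invented for illustration):

docker run --rm \
    -p 53794:53794 -p 53795:53795 -p 53796:53796 \
    -p 53797:53797 -p 53798:53798 \
    -v /path/to/connection_file.json:/connection_file \
    kernel python -m ipykernel_launcher -f /connection_file

Since the ports change with every new kernel, a command like this could not simply be hard-coded into kernel.json.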

Summary

In this post we discussed some of the benefits and use-cases for containerizing a Jupyter kernel. We looked at containerizing kernels using either Docker or Apptainer images.

This method greatly simplifies the mounting and port-mapping issues that can arise when an inexperienced user tries to run Jupyter from inside a container. Instead, we rely on the host Jupyter instance to handle all the file IO and port management, leaving the container to focus on executing code. This allows developers to do what they do best, develop code, with the environment containerized to provide complete reproducibility!