May 26, 2020

Improving the security of Data Science containers - Using Docker's seccomp profiles and Linux capabilities features

Jacobus Herman

Data Engineer

No one wants to be the person who exposed sensitive information through their container and caused a hefty GDPR fine, right? What then should data scientists do to improve the security of their containers? I aim to address this question by providing a practical guide to improving Docker container security within a data science context. Two features are used to achieve this: seccomp and Linux capabilities.

The main objective is to have a container with its information as secure as possible. This translates into limiting the risk of privilege escalation and keeping data (i.e. files and directories) as private as possible. Most of this endeavour makes use of discretionary access control, ownership, and other security functionalities of the Linux kernel. While familiarity with these topics is needed to understand what is being done, links are given for further information.

The first security measure is creating a non-root system user that will run an application in a container. Next is using seccomp to limit what can be done inside a container. The last measure is dropping Linux capabilities as a way of limiting the superuser’s privileges. Finally, all the security measures are applied to a data science use case to show the usability thereof.

Avoiding the use of a superuser

The first step to securing a container is to create a limited-access (i.e. non-root) user. Why? Applications generally do not need superuser privileges, so running them as a regular user follows the principle of least privilege. This practice is also recommended by Docker in their user section.

A user can be created using a RUN instruction; however, doing so would prevent a fixed UID and GID. A user and group are therefore created with a script, as shown below.

The script to create a limited-access user.

Note the fixed UID and GID, which is useful when controlling permissions outside a container or between containers. Furthermore, the user has no home directory and no login capability. The reason is that when a container runs, no one will log in as that user and there will be no interactive shell. More information on the addgroup and adduser commands can be found here.
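A minimal sketch of what such a script could look like is given here, assuming a Debian or Ubuntu base image in which the adduser package provides addgroup and adduser. The name app, the ID 1001, and the function name create_limited_user are hypothetical placeholders and are not taken from the original script.

```shell
#!/bin/sh
# Sketch only: user name, group name and IDs are hypothetical placeholders.
APP_USER="app"
APP_GROUP="app"
APP_UID=1001
APP_GID=1001

# Create a system group and user with fixed IDs, no home directory and no login.
# Must be called as root, for example while the image is being built.
create_limited_user() {
    addgroup --system --gid "$APP_GID" "$APP_GROUP"
    adduser --system \
        --uid "$APP_UID" \
        --ingroup "$APP_GROUP" \
        --no-create-home \
        --disabled-login \
        --shell /usr/sbin/nologin \
        "$APP_USER"
}
```

The function is only defined here; calling create_limited_user at the end of the script (as root) would create the group and the user. Keeping the IDs in variables makes it easy to reuse the same values when setting ownership on volumes outside the container.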

After creating the script, it can be used in a Docker image as shown below.

The Dockerfile for limited-access user creation and usage.

It may be tempting to stop here and think that a limited-access user is sufficient. However, with the usual default umask of 022, files created in Linux are readable by others. The limited-access user is also easy to circumvent. As an example, build and start a container with:

docker build -t limited_user:latest .
docker run --rm --cpus=".1" --cpuset-cpus 0 --name doc-sec -d limited_user

Then connect to the container from another terminal session with:

docker exec -it -u 0 doc-sec /bin/sh

Doing so clearly circumvents the limited-access user to become the superuser. Fortunately, these issues can be addressed with seccomp.

Limiting exploitable functionality with seccomp

Seccomp, in essence, provides the ability to limit system calls. In the current case, the idea is to block those calls that allow switching to the superuser and setting permissions where other users can read and write data (e.g. chmod 666).

By default, Docker applies a seccomp profile that specifies which system calls are allowed and, optionally, when they can be allowed. Docker’s default profile can be used as the starting point for a more stringent profile. Adding a system call is accomplished by creating a JSON object in the syscalls array. For example, consider the filtering of the setreuid system call as shown below.

An example seccomp profile to filter the setreuid system call.

This filter states that the system call with the name setreuid will be allowed when its first argument (index 0) has a value not equal to zero and its second argument (index 1) also has a value not equal to zero. To determine the arguments of a system call, one should find its specification in the system calls page. For example, the setreuid specification is shown below.

The setreuid and setregid system call specification.

In general, then, the names property is an array containing one or more system calls to filter. The action property should be one of the “Valid action values” as documented here. The args array allows the filter to be applied based on the evaluation of the system call arguments. Each object in this array represents an operation (op) that must be from the “Valid comparison op values” as documented here. The operation will determine if value and/or valueTwo should be used and what their meanings are.
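The setreuid filter described earlier could be written roughly as follows. This is a sketch rather than the original profile: the output file name is a hypothetical placeholder, and the fragment is meant to be one entry of the syscalls array. It also assumes that setreuid is removed from any unconditional allow entry in the profile, since such an entry would presumably allow the call regardless of its arguments.

```shell
#!/bin/sh
# Sketch only: the file name is a hypothetical placeholder.
FRAGMENT_FILE="${TMPDIR:-/tmp}/setreuid_filter.json"

# Write one entry for the "syscalls" array of a seccomp profile.
# It allows setreuid only when both arguments (ruid at index 0, euid at index 1)
# are not equal to 0, which is the superuser UID.
cat > "$FRAGMENT_FILE" <<'EOF'
{
  "names": ["setreuid"],
  "action": "SCMP_ACT_ALLOW",
  "args": [
    { "index": 0, "value": 0, "valueTwo": 0, "op": "SCMP_CMP_NE" },
    { "index": 1, "value": 0, "valueTwo": 0, "op": "SCMP_CMP_NE" }
  ]
}
EOF

echo "wrote $FRAGMENT_FILE"
```

Both conditions sit in the same args array, so they must hold together for the call to be allowed. The valueTwo field is not used by SCMP_CMP_NE and is set to 0 here only for completeness.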

The stringent seccomp profile

Similar to the setreuid example above, I created a more stringent seccomp profile that restricts 15 system calls. The profile is too long to show but is available here. It accomplishes the following objectives:

  • It prevents changing to the superuser UID or GID, which is 0 in both cases.
  • It prevents enabling permissions for others to read or write directories and files. This does not, however, enforce the correct permissions when files or directories are created, which is addressed later.

Applying the seccomp profile is done at container creation with the --security-opt seccomp="./seccomp_profile.json" argument. After creating a container, notice that docker exec -it -u 0 now fails with an error similar to that shown below. This verifies that the first objective of the seccomp profile is accomplished.

$ docker exec -it -u 0 doc-sec /bin/sh
OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "setup user: operation not permitted": unknown

Enforcing better file permissions

As mentioned previously, there is still the need to enforce the correct permissions on files and directories upon their creation. This problem is solved using umask and a working directory with proper permissions and ownership. There is, however, a limitation on umask in that it only applies to a session. Therefore, an entrypoint script is needed to set the umask before executing your intended application. An example of such a script is given below.

The entrypoint script to set the umask before executing another command.
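A minimal sketch of such an entrypoint script is shown here. The umask value 0027 is an assumption chosen for illustration; it removes write access for the group and all access for others.

```shell
#!/bin/sh
# Sketch only: the umask value 0027 is an assumption, not necessarily the
# value used in the original script.
# With 0027, new files default to mode 640 and new directories to mode 750.
umask 0027

# Replace this shell with the requested command so that the command inherits
# the umask and receives signals directly. With no arguments this does nothing.
exec "$@"
```

In a Dockerfile the script would typically be copied into the image and referenced by an ENTRYPOINT instruction, with the application command supplied through CMD, so that every process started in the container inherits the umask.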

Creating a working directory with proper permissions and ownership is most easily shown in the completed Dockerfile below. The information of interest is in lines 15–18. Executing those lines yields a working directory that is sealed off from everyone except the limited-access user, the group to which it belongs, and the superuser.

The completed Dockerfile incorporating all the previously mentioned security measures.
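The commands behind such a sealed-off working directory might look like the sketch that follows. The path and the mode 750 are assumptions, and the current user stands in for the limited-access user so that the commands can be tried without root; in an image build they would run as root with the fixed UID and GID of the limited-access user.

```shell
#!/bin/sh
# Sketch only: the path and mode are hypothetical, and the current user stands
# in for the limited-access user so the commands can be tried without root.
WORK_DIR="${WORK_DIR:-${TMPDIR:-/tmp}/app_workdir}"
OWNER="$(id -u):$(id -g)"

mkdir -p "$WORK_DIR"
# Give ownership to the application user and its group.
chown "$OWNER" "$WORK_DIR"
# 750: the owner has full access, the group may read and enter,
# and others have no access at all.
chmod 750 "$WORK_DIR"

ls -ld "$WORK_DIR"
```

Removing all permissions for others on the directory itself means that even a file with lax permissions inside it cannot be reached by other users, which complements the umask set in the entrypoint script.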

Restricting superuser privileges with Linux capabilities

The final security measure is to drop Linux capabilities, thereby limiting the damage if someone were to become the superuser. Linux capabilities are units that together make up the privileges granted to the superuser, so disabling a capability removes the corresponding permissions from the superuser. By default, Docker removes certain capabilities while others are retained. The current concern is to drop those capabilities that relate to the superuser’s permissions and ownership.

Investigating the Docker default allowed capabilities shows at least three capabilities related to permissions and ownership, namely CHOWN, DAC_OVERRIDE, and FOWNER. By dropping these capabilities, the superuser will be similar to a regular user with regard to its power over files and directories. These capabilities are dropped by adding the --cap-drop=CHOWN --cap-drop=DAC_OVERRIDE --cap-drop=FOWNER arguments to docker run.

An alternative (more secure) strategy is to drop all capabilities using --cap-drop=ALL and then selectively add only those capabilities needed with the --cap-add= argument. This will, however, be time-consuming, and it still does not replace the previous measures.
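Both strategies can be sketched as docker run invocations that also apply the seccomp profile. The image name, container name, and profile path reuse the examples from this post; the function names are hypothetical, and NET_BIND_SERVICE is only an illustration of a capability that an application might need to have added back.

```shell
#!/bin/sh
# Sketch only: image name, container name and profile path reuse the examples
# from this post; the function names are hypothetical.

# Variant 1: drop the three permission- and ownership-related capabilities.
run_with_dropped_caps() {
    docker run --rm -d --name doc-sec \
        --security-opt seccomp="./seccomp_profile.json" \
        --cap-drop=CHOWN --cap-drop=DAC_OVERRIDE --cap-drop=FOWNER \
        limited_user
}

# Variant 2: drop everything, then add back only what the application needs.
# NET_BIND_SERVICE (binding to ports below 1024) is just an illustration.
run_with_minimal_caps() {
    docker run --rm -d --name doc-sec \
        --security-opt seccomp="./seccomp_profile.json" \
        --cap-drop=ALL --cap-add=NET_BIND_SERVICE \
        limited_user
}
```

The functions are only defined here; calling one of them starts the container. With the second variant, a reasonable way to find the required capabilities is to start with none and add them one at a time as the application reports permission errors.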

Applying the measures to a data science use case

To verify that the security measures are usable, they were applied to the data science service created for the Docker for Data Scientists webinar. Three containers are used: the first (iris_trainer) trains a model, the second (iris_predictor) exposes the trained model via a Flask API, and the last (iris_frontend) provides a web application that uses iris_predictor. Explaining the whole example and its technicalities is beyond the current scope (see the GitHub repository). What follows is only a brief overview of the important topics and difficulties.

The trainer and predictor containers make use of Docker’s multi-stage builds feature. Due to its size, the Dockerfile is not shown, but it can be found here. Both containers use a secured Ubuntu 18.04 base image that should ideally be stored in a registry for reuse. However, since this was a development effort, the multi-stage build avoided the need for a registry.

The frontend container also uses a multi-stage build, and its Dockerfile is located here. Building the AngularJS application (in the container) was the most difficult part. In particular, npm install wanted to create a HOME directory. Therefore, the HOME environment variable was set to the container working directory. Additionally, note that the frontend_builder image is unnecessary if your application is built in a build pipeline.

Docker Compose was used to integrate the containers into a service (its file is located here). In the compose file, the seccomp profile and Linux capabilities are configured globally. The service can be easily built and run with:

docker-compose build
docker-compose up -d

The service does the following:

  1. It trains a random forest model and saves it to the volume attached at /home/app/model.
  2. It serves a model loaded from the volume attached at /home/app/model. Note that it may be necessary to restart the service if the model was not trained by the time the iris_predictor was started.
  3. It runs an AngularJS web application that serves as a UI to send two input numbers to the model and display the returned output. The web application is accessed through http://localhost:4200/.

Once the service is running successfully, browsing to the web application should yield the following.

The landing page of the AngularJS application.

Entering the inputs 7 and 1 and clicking on the submit button should show a value of 1.92, as illustrated below.

The output of the AngularJS application after entering the corresponding input.

This completes the data science use case. Although the example given here may not be as thorough as what you have, it does show that despite adding more security, containers with Python/Flask, AngularJS, and Nginx still work. There is, of course, no guarantee that this will function in all situations. However, I do believe that even in such situations a little effort (without simply removing all security measures) can result in more secure containers.

Conclusion

Improving the security of Docker containers might seem like a daunting task; not everyone wants to be a security expert and often there is also other data science to be practised. I hope, however, that by reading this post you can use the code or methodology to improve your containers. At the very least you have a reference for creating a limited-access user, applying seccomp and using Linux capabilities. So until next time, keep protecting information and avoid those GDPR fines!