Dockerizing with Distroless

There are many tutorials that effectively step through the process of Dockerizing an application. These are often wonderful resources that truly help improve understanding and move applications into the exciting world of containers.

However, when the goal is to enhance security and minimize container size by using Distroless images, the path forward is not quite as clear. Keep reading for a guide through the subtleties of optimizing with Distroless images. We’ll use three languages for the examples: JavaScript (Node.js), Java, and Python.

Let’s Set the Scene

Dockerizing an application is the process of converting an application to run within a Docker container. The outcome is a much more portable and rapidly deployable application.

Distroless Docker images were pioneered by Google to improve security and reduce container size. Typically, security scanning tools help to protect the image, and small Linux distributions help to trim container size and improve performance. Distroless images address both concerns by containing only the application, its resources, and its language runtime dependencies, with no operating system distribution tooling such as a shell or package manager. This approach creates a smaller attack surface, reduces compliance scope, and results in a small, performant image. Google publishes these images for several popular languages. Check out this JFrog talk from a Google staff engineer on Distroless to learn more about the topic. Disclaimer: Google gets the full advantages of Distroless in tandem with Bazel, but that doesn’t mean Distroless isn’t for you.

Basic Dockerization

Let’s quickly summarize the normal process for application Dockerization. In essence, you’ll start with a base image (from Docker Hub), add your application code with its dependencies, and then configure a few elements like the exposed ports. For this example, let’s only use Node.js.

Let’s take this stupidly simple Express application that runs on port 8080 in server.js.

const express = require('express')

const app = express()
app.get('/', (req, res) => {
  res.send('Hello There')
})

app.listen(8080, '0.0.0.0')

With this package.json describing its dependencies.

{
  "name": "docker_web_app",
  "version": "1.0.0",
  "main": "server.js",
  "dependencies": {
    "express": "^4.16.1"
  }
}

Now, for Dockerization.

Let’s create a Dockerfile to containerize this app.

# Start with the node 12 base image from Docker Hub
FROM node:12
 
# Create app directory
WORKDIR /usr/src/app
# Bundle source, assuming the Dockerfile lives in the app's code
COPY . .
# Install app dependencies (npm ci requires a package-lock.json)
RUN npm ci --only=production
 
# Expose the port the server runs on
EXPOSE 8080
 
# Define a default command to start the server
CMD [ "node", "server.js" ]

Finally, build and run the image from the directory that contains the Dockerfile. The -p 3000:8080 flag maps port 3000 on the host to port 8080 in the container:

docker build -t yourusername/repository-name .

docker run -p 3000:8080 yourusername/repository-name

Once more, with Distroless

With the basics covered, let’s acknowledge that your ecosystem and approach will greatly affect how you use Distroless. The goal here is to cover enough topics and show enough examples that you can adapt them appropriately.

Because Distroless images have no shell or package manager, a multi-stage Docker build is used to perform setup work up front and then selectively copy artifacts into the Distroless image.

In the first stage of the build, the application is typically copied into a builder image (named build-env in the examples that follow). Next, that stage performs actions such as dependency installation or certificate configuration. Finally, the necessary items are copied into the Distroless image. Let’s look at some simple examples:

Node.js

# Use general node image as builder and install dependencies
FROM node:10.17.0 AS build-env
ADD . /app
WORKDIR /app
RUN npm ci --only=production

# Copy application with its dependencies into distroless image
FROM gcr.io/distroless/nodejs
COPY --from=build-env /app /app
WORKDIR /app
CMD ["server.js"]

In this case, the application brought into the build-env image by the ADD instruction is not an archive file of any kind. So, once its dependencies are installed, the application directory is simply copied into the Distroless image.

Java

# Use openjdk image as builder and build a jar
FROM openjdk:11-jdk-slim AS build-env
ADD . /app/examples
WORKDIR /app
RUN javac examples/*.java
RUN jar cfe main.jar examples.HelloJava examples/*.class
 
# Copy the jar into the distroless image
FROM gcr.io/distroless/java:11
COPY --from=build-env /app /app
WORKDIR /app
CMD ["main.jar"]

Here, the app has no dependencies and is compiled into a .jar file. In the jar command, c creates an archive, f names the output file (main.jar), and e sets the entry point class (examples.HelloJava). This example is simple, just compiling some plain Java files into a .jar file, but the approach with Gradle or Maven would be very similar.

Python

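FROM python:3-slim AS build-env

# Copy the application and install its dependencies next to it
COPY . /app
WORKDIR /app
RUN pip install --target=/app/deps -r requirements.txt

# Now set up distroless and run the application:
FROM gcr.io/distroless/python3

# Copy the source code and its dependencies into the distroless image
COPY --from=build-env /app /app
WORKDIR /app
# Make the installed packages importable. No venv is created here because
# a shell-form RUN cannot execute in an image that has no shell.
ENV PYTHONPATH=/app/deps

CMD ["hello.py", "/etc"]

In this Python example, we follow the same pattern: copy, install, and move to Distroless. One subtlety is that shell-form RUN instructions cannot execute in the final stage, because the Distroless image has no shell. A virtual environment therefore cannot be created there, so the dependencies are installed into a directory in the build stage and made importable through PYTHONPATH. This works most reliably when the build image and the Distroless image ship compatible Python versions, particularly for packages with compiled extensions.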

Additionally, depending on the use case, the multi-stage build might not be necessary. For example, copying in a ready-to-go .jar file might be all that’s needed. Let’s now look at some details that are a little more real world (especially for the enterprise).
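Returning to the ready-to-go .jar case for a moment, a single-stage build might look like the sketch below. It assumes the jar was already built outside of Docker; the path target/main.jar is hypothetical and should be adjusted to wherever your build places its output.

```dockerfile
# Single-stage sketch: the jar was built elsewhere (for example, by CI).
# "target/main.jar" is a hypothetical path; adjust it to your build output.
FROM gcr.io/distroless/java:11
COPY target/main.jar /app/main.jar
WORKDIR /app
# The distroless Java image's entrypoint already runs "java -jar",
# so only the jar name is passed as the command
CMD ["main.jar"]
```

With no build stage, the image build needs neither a JDK nor network access, which can keep it fast and simple.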

Vendoring Dependencies

In the previous Docker examples, all of the dependencies are declared in the source code and installed during the Docker build. Vendoring dependencies shifts that concept a bit. Package vendoring is storing the application’s dependent packages within the project.

The source code still explicitly declares the dependencies, and they still live outside the source code repository. Vendoring here concerns the artifact that is created for deployment, not the source code.

This approach is needed in certain situations. For example, you might be operating within an ecosystem where compliance regulation dictates that the application artifact be centrally stored, with the same “frozen” artifact used across multiple environments or platforms. Audit is typically the driver for procedures of this nature.

In this same vein, the (likely automated) environment building the Docker image might only have access to internal company registries, or might have no outside internet access at all to download these dependencies. In this case, the CI/CD process can download the dependencies before the image is built, for later use or installation.
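As a rough sketch of what vendoring can mean for the Dockerfile, the Node.js example could shrink to a single stage. This assumes the CI/CD job has already installed the production dependencies (for example, with npm ci against an internal registry), so that node_modules/ is part of the build context and is not excluded by a .dockerignore file.

```dockerfile
# Sketch: dependencies were vendored before the image build, so
# node_modules/ arrives with the rest of the build context and
# no network access is needed during the build.
FROM gcr.io/distroless/nodejs
COPY . /app
WORKDIR /app
CMD ["server.js"]
```

One caveat with this approach: dependencies that include native add-ons need to be installed on a platform compatible with the final image.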

Certificates

When operating within an enterprise, you often need to account for everyone’s favourite thing: certificates. Certificates are actually not too big of a deal: just place the cert in the appropriate location within the Distroless image. Depending on the application’s language or framework, the application might also need to be configured to consume the cert.
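For Node.js, one way to do this is sketched below. The file certs/internal-ca.pem is a hypothetical PEM-encoded CA certificate in the build context, and the destination path is only a convention. NODE_EXTRA_CA_CERTS is the Node.js environment variable for trusting additional CAs; other runtimes have their own mechanisms (a Java trust store, for instance).

```dockerfile
FROM node:10.17.0 AS build-env
ADD . /app
WORKDIR /app
RUN npm ci --only=production

FROM gcr.io/distroless/nodejs
# Hypothetical internal CA certificate, copied from the build context
COPY certs/internal-ca.pem /etc/ssl/certs/internal-ca.pem
# Tell Node.js to trust the CAs in this file in addition to its built-in ones
ENV NODE_EXTRA_CA_CERTS=/etc/ssl/certs/internal-ca.pem
COPY --from=build-env /app /app
WORKDIR /app
CMD ["server.js"]
```

Setting the variable in the image keeps the application code unchanged, at the cost of rebuilding the image whenever the certificate rotates.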

Debugging

Since Distroless images lack shell access, debugging can be quite the challenge. Fortunately, there is a corresponding debug image for each language that Distroless supports. The debug image provides a BusyBox shell to enter. If you’re not familiar with the Swiss army knife of Linux, learn more about BusyBox. At a minimum, know that you’ll be able to navigate the file system and use the vi editor. That ability alone should drastically improve your debugging process.

Add the :debug tag to change the final image in the multi-stage Dockerfile. Like so:

FROM gcr.io/distroless/python2.7:debug

Also, if the image already has a tag, append -debug to it. For example, java-debian10:11-debug.
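To avoid editing the Dockerfile each time, the tag can be made a build argument. The sketch below reuses the Node.js example; BASE_TAG is a hypothetical argument name, and it assumes the image in use publishes both a latest and a debug tag.

```dockerfile
# BASE_TAG is a hypothetical build argument: "latest" for the regular
# image, "debug" for the variant with a BusyBox shell
ARG BASE_TAG=latest

FROM node:10.17.0 AS build-env
ADD . /app
WORKDIR /app
RUN npm ci --only=production

# An ARG declared before the first FROM can be referenced in any FROM line
FROM gcr.io/distroless/nodejs:${BASE_TAG}
COPY --from=build-env /app /app
WORKDIR /app
CMD ["server.js"]
```

Passing --build-arg BASE_TAG=debug to docker build would then select the debug variant, while a plain build keeps the regular image.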

Then build and launch with a shell entrypoint:

$ docker build -t my_fancy_image .
$ docker run --entrypoint=sh -ti my_fancy_image
/app # ls
BUILD       Dockerfile  hello.py 

Don’t be afraid to hop into the image and take a peek at its structure or change files. This approach can be quite effective for quickly understanding and resolving issues.

When Distroless

Distroless images always bring the described benefits. However, there is a bit of a learning curve to start using them in an ecosystem or project. There is also some overhead in using them, such as the multi-stage build that is often needed. On that note, when deciding whether to use Distroless, evaluate the use case. Use the information and examples here to think about the pros and cons and choose the best path forward.

Summary:

Well, you’ve done it! You made it all the way through this rambling explanation of Distroless image use. Now, don’t let anything “contain” your efforts. Aren’t you glad I saved the puns till the end of the article? I hope you won’t “dock” me for it. Anyway, take the examples here and benefit from my many hours of struggle and research. Adapt this information to fit your ecosystem and environment to benefit from Distroless images.

