From Chaos to Conformity: Docker for Data Scientists | Egor Howell

Introducing and using Docker for Data Scientists

Towards Data Science
Photo by Ian Taylor on Unsplash

But does it work on my machine?

This is an age-old meme in the tech community, especially for Data Scientists who ship their amazing machine learning model, only to find that it behaves differently on someone else's machine.


Luckily, there is an answer to this problem: containers, and containerization tools such as Docker.

In this post, we'll go over the basics of Docker and how to build and run containers with it. Using Docker containers has become an industry standard and common practice for deploying data products. As a Data Scientist, learning this tool is one of the most valuable additions to your arsenal.

Docker is a service that helps to create, run and deploy code and applications in containers.

Now you may be wondering, what is a container?

In essence, a container is very similar to a virtual machine (VM). It's a small, isolated environment where everything is self-contained, and it can be run on any machine. The main selling point of containers is their portability, allowing your application or model to run seamlessly on any server, local machine, or cloud platform such as AWS.

The main difference between containers and VMs is how they use their host's computing resources. Containers are much more lightweight because they share the host machine's operating system kernel, whereas a VM virtualizes the full hardware stack. I won't delve into the technical details here, but if you want to understand a little more, I've linked a good article explaining their differences here.

Docker is the tool we use to easily create, manage and run these containers. It is one of the main reasons containers have become so popular, as it enables developers to easily package applications and models that will run anywhere.

Photo taken by the author.

There are three things we need to run a container using Docker:

  • Dockerfile: A script-like file that contains instructions on how to build a Docker image.
  • Docker image: An image or template to create a Docker container.
  • Docker container: An isolated environment that provides everything an application or machine learning model needs to run. It includes things like dependencies and OS versions.
Photo taken by the author.

There are also some important points to keep in mind:

  • Docker Daemon: The background process (daemon) that handles incoming Docker requests.
  • Docker Client: A command-line interface that enables the user to communicate with Docker through its daemon.
  • DockerHub: Similar to GitHub, a place where developers can share their Docker images.

Setup

The first thing you need to install is Homebrew (link here). This is dubbed 'the missing package manager for macOS' and is very useful for anyone who codes on a Mac.

To install Homebrew, just run the command provided on their website:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Verify Homebrew is installed by running brew help.


Now with Homebrew installed, you can install Docker by running brew install docker. Verify Docker is installed by running which docker; the output should not raise any errors and should look like this:



The last step is to install Colima. Simply run brew install colima and confirm it is installed with which colima. Again, the output should look like this:


Now you might be wondering, what on earth is Colima?

Colima is a tool that enables container runtimes on macOS. In effect, Colima provides the environment containers need to run on our system. It does this by spinning up a Linux virtual machine with a daemon that Docker can connect to using its client-server model.

Alternatively, you can install Docker Desktop instead of Colima. However, I prefer Colima for several reasons: it's free, very lightweight, and I like working in the terminal!

Check out this blog post here for more information about Colima


Below is an example of how Data Scientists and Machine Learning Engineers can deploy their model using Docker:

Photo taken by the author.

The first step is, of course, to build their amazing model. Next, you need to pin down everything used to run the model, such as the Python version and the package dependencies. The final step is to use that requirements file inside a Dockerfile.

If this seems a little confusing to you at this point don’t worry, we’ll go over this step by step!

Basic Model

Let's start by building a basic model. The code snippet below shows a simple implementation of a Random Forest classifier on the well-known Iris dataset:

Dataset from Kaggle with CC0.

GitHub Gist by the author.

This file is called basic_rf_model.py.
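The original gist isn't reproduced here, but a minimal sketch of what basic_rf_model.py might contain is shown below. Details such as the train/test split ratio and the random seed are assumptions, not taken from the original:

```python
# basic_rf_model.py -- a minimal sketch; the split ratio and
# random seeds are assumptions, not the original gist's values.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the Iris dataset bundled with scikit-learn
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit a Random Forest classifier on the training rows
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Evaluate on the held-out data and print the accuracy
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"Accuracy: {acc}")
```

Running it prints an accuracy line like the one we'll see later when the container runs.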

Create the Requirements File

Now that we have our model ready, we need to create a requirements.txt file listing all the dependencies required to run our model. In this simple example, we luckily only rely on the scikit-learn package. Therefore, our requirements.txt will look like this:
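For example, assuming scikit-learn version 1.2.2 is what's installed locally (pin whichever version you actually have), the file is a single line:

```text
scikit-learn==1.2.2
```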


You can check which version you are running with the pip show scikit-learn command.

Create a Dockerfile

Now we can create our Dockerfile!

So, in the same directory as requirements.txt and basic_rf_model.py, create a file named Dockerfile. Inside the Dockerfile we will have the following:

GitHub Gist by the author.
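The gist isn't shown here, but reconstructed from the line-by-line breakdown that follows, the Dockerfile should look roughly like this (the maintainer name is a placeholder):

```dockerfile
# Base image with Python 3.9 pre-installed
FROM python:3.9

# Who maintains this image (placeholder name)
MAINTAINER egor

# Set the working directory inside the image
WORKDIR /src

# Copy the current directory's files into the image
COPY . .

# Install the model's dependencies
RUN pip install -r requirements.txt

# Run the model when the container starts
CMD ["python", "basic_rf_model.py"]
```

Note that MAINTAINER is deprecated in modern Docker in favour of LABEL, but it still works.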

Let’s go through it line by line to see what it means:

  • FROM python:3.9: This is the base image for our image.
  • MAINTAINER: This indicates who maintains the image.
  • WORKDIR /src: Sets the working directory of the image to src.
  • COPY . .: Copies the current directory's files into the image's working directory.
  • RUN pip install -r requirements.txt: Installs the requirements from the requirements.txt file in the Docker environment.
  • CMD ["python", "basic_rf_model.py"]: Tells the container to run the command python basic_rf_model.py when it starts, executing the model.

Start Colima & Docker

The next step is to set up the Docker environment. First, we need to start Colima:

colima start

After Colima starts, make sure the Docker commands are running:

docker ps

It should produce something like this:


This is great and means that Colima and Docker are working as expected!

Note: the docker ps command lists all running containers.

Build the Image

Now it's time to build our first Docker image from the Dockerfile we created above:

docker build . -t docker_medium_example

The -t flag specifies the name of the image, and . tells Docker to build from the current directory.

If we now run docker images, we should see something like this:

Photo courtesy of the author.

Voilà, the image is built!

Run Container

After creating the image, we can run it as a container using the IMAGE ID listed above:

docker run bb59f770eb07


Accuracy: 0.9736842105263158

And that's it, all it did was run the script!

More Information

This tutorial has only covered the basics of what Docker can do and how to use it. There are many more features and commands to learn to fully understand Docker. Detailed tutorials are provided on Docker's website, which you can find here.

One cool feature is that you can run a container in interactive mode and enter its shell. For example, if we run:

docker run -it bb59f770eb07 /bin/bash

You will enter the Docker container and it should look like this:

Photo by author.

Here we used the ls command to display all the files in the Docker working directory.

Docker is a fantastic tool for ensuring that Data Scientists' models can run anywhere and anytime without problems. It does this by creating small, isolated environments that contain everything the model needs to run, called containers. Containers are easy to use and lightweight, which makes them popular in today's industry. In this article, we walked through a basic example of how to deploy your model in a container using Docker. The process was simple and straightforward, so it's something Data Scientists can pick up quickly.

All the code used in this article is available on my GitHub here:

(All emojis created by OpenMoji – an open source project for emoji and icons. License: CC BY-SA 4.0)

