Introducing and using Docker for Data Scientists
"But it works on my machine!"
This is an age-old meme in the software community, and Data Scientists know it well: you hand over your amazing machine learning model, only to discover that it behaves differently, or not at all, on someone else's machine.
Fortunately, there is an answer to this problem: containers, and containerization tools such as Docker.
In this post, we'll cover the basics of how to build and run containers using Docker. Containerizing data products has become an industry standard and common practice, so as a Data Scientist, Docker is one of the most valuable tools you can add to your arsenal.
Docker is a tool that helps you create, run, and deploy code and applications in containers.
Now you may be wondering, what is a container?
Conceptually, a container is very similar to a virtual machine (VM). It is a small, isolated environment in which everything an application needs is packaged, and which can be run on any machine. The main selling point of containers is their portability, allowing your application or model to run seamlessly on any server, local machine, or cloud platform such as AWS.
The main difference between containers and VMs is how they use computing resources. Containers are much more lightweight because, unlike VMs, they share the host machine's operating system kernel rather than virtualizing the hardware. I won't delve into the technical details here, but if you want to understand a little more, I've linked a good article explaining the differences here.
Docker is the tool we use to easily create, manage, and run these containers. It is one of the main reasons containers have become so popular, as it allows developers to easily ship applications and models that run anywhere.
There are three things we need to run a container using Docker:
- Dockerfile: A script file that contains instructions on how to build a Docker image.
- Docker image: An image or template to create a Docker container.
- Docker container: An isolated environment that provides everything an application or machine learning model needs to run, including dependencies and OS versions.
There are also some important points to keep in mind:
- Docker Daemon: A background process (daemon) that handles incoming requests to Docker.
- Docker Client: A command-line interface that enables the user to communicate with Docker through its daemon.
- DockerHub: Similar to GitHub, a place where developers can share their Docker images.
The first thing you need to set up is Homebrew (link here). This is known as the "missing package manager for macOS" and is very useful for anyone who codes on a Mac.
To install Homebrew, just run the command provided on their website:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Verify that Homebrew is installed and running. Now, with Homebrew installed, you can install Docker by running:
brew install docker
Verify that Docker is installed and running with:
which docker
The output should not raise errors and should look like this:
The last step is to set up Colima. In short, run:
brew install colima
and confirm that it was installed with:
which colima
Again, the output should look like this:
Now you might be wondering, what on earth is Colima?
Colima is a tool that enables container runtimes on macOS. In essence, Colima provides the environment containers need in order to work on our systems. To do this, it runs a Linux virtual machine containing a daemon that Docker can connect to using the client-server model.
Alternatively, you can install Docker Desktop instead of Colima. However, I prefer Colima for several reasons: it's free, very lightweight, and I like working in the terminal!
Check out this blog post here for more information about Colima
Below is an example of how Data Scientists and Machine Learning Engineers can deploy their model using Docker:
The first step is obviously to build their amazing model. Next, you need to capture everything required to run the model, such as the Python version and package dependencies. The last step is to use those requirement files inside a Dockerfile.
If this seems a little confusing to you at this point don’t worry, we’ll go over this step by step!
Let's start by working through a basic example. The code snippet provided shows a simple implementation of a Random Forest classifier on the well-known iris dataset:
Dataset from Kaggle with CC0.
This file is called basic_rf_model.py for reference.
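A minimal sketch of what such a script could look like, assuming scikit-learn's built-in copy of the iris dataset in place of the Kaggle CSV, with illustrative hyperparameters:

```python
# basic_rf_model.py -- a minimal sketch, using sklearn's bundled iris data
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the iris dataset (features and class labels)
X, y = load_iris(return_X_y=True)

# Hold out 20% of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a Random Forest classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out set
preds = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, preds):.3f}")
```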
Create the Requirements File
Now that we have our model ready, we need to create a
requirements.txt file that lists all the dependencies our model needs to run. In this simple example, we luckily depend only on the
scikit-learn package. Therefore, our
requirements.txt will look like this:
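For example, pinning scikit-learn to the version installed on your machine (1.0.2 below is just an illustrative version number):

```text
scikit-learn==1.0.2
```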
You can check the version you are running on your machine with the
pip show scikit-learn command.
Create a Dockerfile
Now we can create our Dockerfile!
Then, in the same directory as
basic_rf_model.py, create a file named
Dockerfile with the following contents:
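Based on the line-by-line walkthrough that follows, the Dockerfile is (the maintainer email is a placeholder):

```dockerfile
FROM python:3.9
MAINTAINER firstname.lastname@example.org
WORKDIR /src
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "basic_rf_model.py"]
```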
Let’s go through it line by line to see what it means:
FROM python:3.9: This is the base image our image is built on
MAINTAINER firstname.lastname@example.org: This shows who maintains the image
WORKDIR /src: Sets the working directory of the image to /src
COPY . .: Copies the files in the current directory into the image's working directory
RUN pip install -r requirements.txt: Installs the dependencies listed in
requirements.txt into the Docker environment
CMD ["python", "basic_rf_model.py"]: Tells the container to execute the command
python basic_rf_model.py and run the model
Start Colima & Docker
The next step is to set up the Docker environment. First, we need to start Colima:
colima start
After Colima starts, make sure the Docker commands are working:
docker ps
It should produce something like this:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
This is great and means that Colima and Docker are working as expected!
The docker ps command lists all running containers.
Build the Image
Now it's time to build our first Docker image from the
Dockerfile we created above:
docker build . -t docker_medium_example
The -t flag specifies the name (tag) of the image, and
. tells Docker to build from the current directory.
If we now run
docker images, we should see something like this:
And there we have it, the image has been built!
After creating the image, we can run it as a container using the
IMAGE ID listed above:
docker run bb59f770eb07
Since all the script does is train the model and print the results, the container runs it once and then exits.
This tutorial has only scratched the surface of what Docker can do. There are many more features and commands to learn in order to understand Docker fully. More detailed tutorials are provided on the Docker site, which you can find here.
One cool thing is that you can run the container in interactive mode and enter its shell. For example, if we run:
docker run -it bb59f770eb07 /bin/bash
You will enter the Docker container and it should look like this:
Here we used the
ls command to display all the files in the Docker working directory.
Docker is a fantastic containerization tool for ensuring that Data Scientists' models can run anywhere, anytime, without problems. It does this by packaging everything a model needs to run into a small, isolated environment called a container. Containers are easy to use and lightweight, which makes them popular in today's industry. In this article, we went through a basic example of how to deploy your model in a container using Docker. The process was simple and straightforward, so it's something Data Scientists can learn and pick up quickly.
All the code used in this article is available on my GitHub here:
(All emojis created by OpenMoji – an open source project for emoji and images. Permission: CC BY-SA 4.0)