Docker – A Remedy for the Complexity of Development Environments

W-MOSZCZYNSKI ps 11-24

What is the problem of configuring the environment?

A development environment is the set of tools, libraries, dependencies, and operating system components in which an application runs. Take Python as an example. It is a popular programming language, and most real projects written in it depend on additional libraries such as NumPy, Pandas, or TensorFlow.

Python is a programming language, while libraries are collections of ready-made code for specific tasks. Some are used for grouping and organizing data, like the aforementioned Pandas, while others, like TensorFlow or PyTorch, are used mainly for building neural networks. A neural network or a data-processing routine could be implemented in Python alone, but specialized libraries make the process much simpler and faster. When I started working with Python, I discovered libraries and enthusiastically adopted them. Unfortunately, using different libraries side by side often led to conflicts and application crashes. It was frustrating: code that should have worked without issue broke, and solutions that had already been working stopped running. This is how I learned about the need to configure environments, which is a particularly time-consuming and sometimes very irritating task.

Each of these libraries has its own versions, released at different times. New versions are not always fully compatible with earlier ones or with older Python releases. For example, an application written for Python 3.6 may work correctly with NumPy 1.18 but stop working with NumPy 1.21, which requires Python 3.7 or newer. Similarly, Python 3.6 works with Pandas 0.25.3, but Pandas 1.2.0 and later require Python 3.7.1 or newer. These version compatibility issues are a common headache for developers. In such cases, developers either have to update their entire system, which can cause other conflicts, or find a way to run each application with a compatible set of versions. Here, the Anaconda platform can help by allowing easy management of different Python and library versions.
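The version check described above can be sketched in a few lines of Python. This is only an illustration: the helper names `supports` and `installed_version` are hypothetical, and the snippet assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library.

```python
import sys
from importlib import metadata


def supports(python_version, minimum):
    """Return True if the (major, minor) Python version meets the minimum."""
    return tuple(python_version[:2]) >= tuple(minimum)


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report what this interpreter and environment actually provide.
print("Running Python", ".".join(str(part) for part in sys.version_info[:3]))
print("NumPy installed:", installed_version("numpy"))

# NumPy 1.21 requires Python 3.7 or newer, so Python 3.6 does not qualify.
print("Python 3.6 meets a 3.7 minimum:", supports((3, 6), (3, 7)))
```

A check like this only reports a mismatch. Resolving it still requires an environment in which the right versions are installed together.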

Anaconda as a Solution

Anaconda is a Python distribution that addresses the problem of library version compatibility. Its package manager, conda, allows the creation of isolated environments where each application has access to its own library versions, eliminating conflicts between them. Although Anaconda is very effective within the Python ecosystem, the same problem also arises in other programming languages and platforms. Anaconda is essentially a bundle of frequently used libraries installed alongside the other software on a computer. However, a single bundle is rarely sufficient for completing a project, and additional libraries must continually be added. Adding another library can, in principle, disrupt the consistency of the packages already installed. Packages that are not part of the distribution are therefore added with a package manager: either conda itself or pip, Python's standard package installer, which Anaconda also includes.
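The idea of an isolated environment can be illustrated without Anaconda. The sketch below uses `venv` from Python's standard library rather than conda, so it shows the same concept with a different tool. The function name `create_isolated_env` and the directory name `demo-env` are hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_isolated_env(parent_dir, name="demo-env"):
    """Create a virtual environment under parent_dir and return its path."""
    env_dir = Path(parent_dir) / name
    # with_pip=False keeps the sketch fast and offline;
    # pass with_pip=True to bootstrap pip into the new environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Build the environment in a temporary directory that is cleaned up afterwards.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_isolated_env(tmp)
    # Every venv contains a pyvenv.cfg file describing its base interpreter.
    config_exists = (env_dir / "pyvenv.cfg").is_file()
    print("Environment created:", config_exists)
```

Libraries installed into such an environment are not visible to other environments, which is what prevents one project's versions from breaking another's.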

What is pip?

The name pip is usually read as “pip installs packages.” It’s a recursive acronym, meaning that the expansion contains the name itself. Pip is the standard package manager for Python: it installs, updates, and removes libraries, downloading them by default from the Python Package Index (PyPI), the official repository that contains a vast number of libraries for tasks ranging from scientific computing to web application development. Pip also resolves and installs the dependencies that a package declares, which saves the programmer a great deal of manual work. It cannot guarantee that every combination of versions works together, however, so pinning versions and using isolated environments remain important.
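One common way to keep pip installations reproducible is to pin exact versions in a requirements.txt file, which pip reads with `pip install -r requirements.txt`. The following sketch generates such a file's content. The function name `format_requirements` is hypothetical, and the pinned versions are chosen only for illustration.

```python
def format_requirements(pins):
    """Return requirements.txt content with exact version pins, sorted by name."""
    lines = [f"{name}=={version}" for name, version in sorted(pins.items())]
    return "\n".join(lines) + "\n"


# Illustrative pins; a real project would list the versions it was tested with.
pins = {"pandas": "0.25.3", "numpy": "1.18.5"}
print(format_requirements(pins), end="")
```

Writing the returned text to requirements.txt gives every machine the same list of versions to install.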

The Idea of Containing the Entire Ecosystem in a “Box”

Software consists not only of source code but also of a set of libraries, tools, and dependencies that must be configured appropriately to work together correctly. To address this, someone came up with the brilliant idea of “boxing” the entire application ecosystem into a single portable environment. All dependencies and libraries are packaged into a single container, which works the same way regardless of the machine on which it runs. Anaconda offers a similar solution for Python libraries, while Docker addresses the problem on a broader scale: a container also includes system-level libraries and tools, whatever the programming language.

What is Docker?

Docker is an open-source platform for building, deploying, and running applications in isolated containers. It was first presented publicly in 2013 by Solomon Hykes, the founder of dotCloud, where it had begun as a tool to simplify application deployment across different environments. Docker grew out of the need to run applications in a consistent environment on different machines (local, testing, or production) without manually configuring each one.

Docker comprises several key components:

Containers – lightweight, portable environments where applications operate. Each container includes all the dependencies required for the application to function.
Docker Engine – the runtime that builds images, runs containers, and keeps them isolated from one another.
Docker Images – saved application environments that can be run as containers. A Docker image is a “snapshot” of the application and its dependencies.
Dockerfile – a configuration file containing instructions for building a Docker image.

Using Docker in Practice

Creating a Docker image is simple. To package an application in a container, the programmer writes a configuration file called a Dockerfile. Here is an example of a simple Dockerfile in a few lines:

FROM python:3.8
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
CMD ["python", "app.py"]

This configuration file selects the official Python 3.8 base image, sets the working directory to /app, copies the project files into the image, installs the dependencies listed in requirements.txt, and sets app.py as the program to run when the container starts.
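Once the Dockerfile exists, the image is typically built with `docker build` and started with `docker run`. The sketch below assembles those two commands in Python and runs them only if the Docker CLI is installed. The image name `my-python-app` and the helper function names are hypothetical.

```python
import shutil
import subprocess

IMAGE_TAG = "my-python-app"  # hypothetical image name used for illustration


def build_command(tag, context="."):
    """Command that builds an image from the Dockerfile in the context directory."""
    return ["docker", "build", "-t", tag, context]


def run_command(tag):
    """Command that starts a container from the image; --rm removes it on exit."""
    return ["docker", "run", "--rm", tag]


def run_if_docker_available(command):
    """Run the command if the docker CLI is on PATH; return True on success."""
    if shutil.which("docker") is None:
        print("docker not found; would run:", " ".join(command))
        return False
    return subprocess.run(command, check=False).returncode == 0


# Show the commands without executing them.
print(" ".join(build_command(IMAGE_TAG)))
print(" ".join(run_command(IMAGE_TAG)))
```

Calling `run_if_docker_available(build_command(IMAGE_TAG))` and then `run_if_docker_available(run_command(IMAGE_TAG))` from the directory containing the Dockerfile would build the image and start the application, assuming Docker is installed and running.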