Diving into Docker (Part 2): Images

In the previous blog...

We covered the fundamental concepts of Docker and its architecture. We saw why we need Docker in the first place and which problems it solves more easily than good old VMs.

In this blog, we are going to understand the core building block of Docker: images. We will see the difference between images and containers, and also learn about image registries.

We will look at the commands that help us pull, inspect, build, and delete Docker images, and much more. So let's get started.

What is a Docker image?

A Docker image is a ready-to-use package that has everything an app needs to run, such as:

The application code
The app’s required libraries
Basic operating system files

If you are a developer, you can think of an image as a class: a blueprint you use to create real running things (containers).

Docker images are mainly used at build time, while containers are what you use at run time when the app is actually running.

Where do Docker images come from?

Most of the time, you do not create images from scratch. You pull them from a place called an image registry, and the most common registry is Docker Hub.

So when you pull an image from Docker Hub, the Docker daemon downloads it to your computer (your Docker host). After that, Docker can use it to start one or more containers.

Images are made of layers

A Docker image is not one single file. It is built from many read-only layers stacked on top of each other.

Inside these layers is a small, cut-down operating system plus all the files and dependencies the app requires. This means you do not get a full OS with everything installed, only what the app needs.

Docker stacks the layers and shows them as one unified image.

When you pull an image like this:

docker image pull ubuntu:latest

In the output, each line that ends with “Pull complete” corresponds to one layer that was downloaded.

You can also inspect layers using:

docker image inspect ubuntu:latest

The idea is that every image starts with a base layer, and whenever we add something (like installing Python or copying in app code), Docker adds a new layer on top. For example, we start with the Ubuntu image, then add Python, and then add our source code.

The final image is then a stack of layers in that order.
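To make the stacking idea concrete, here is a minimal Python sketch. It is not how Docker's storage driver is implemented; it just models each layer as a dict of file path to content (a made-up representation) and merges the layers from bottom to top, so a file in an upper layer hides a file with the same path in a lower layer. The layer contents and the `unify_layers` name are assumptions for illustration only.

```python
from typing import Dict, List

Layer = Dict[str, str]  # hypothetical model: file path -> file content


def unify_layers(layers: List[Layer]) -> Layer:
    """Merge layers bottom to top; upper layers win when paths collide."""
    unified: Layer = {}
    for layer in layers:
        unified.update(layer)
    return unified


# Illustrative layers in the same order as the example: Ubuntu, Python, app code.
base_ubuntu = {"/etc/motd": "Welcome to Ubuntu", "/bin/sh": "shell binary"}
python_layer = {"/usr/bin/python3": "python binary"}
app_layer = {"/app/main.py": "print('hello')", "/etc/motd": "Welcome to my app"}

image_view = unify_layers([base_ubuntu, python_layer, app_layer])
print(sorted(image_view))      # every path visible in the unified image
print(image_view["/etc/motd"])  # the copy from the top-most layer
```

The three dicts never change after they are created, which mirrors the read-only nature of image layers. Only the merged view is what you "see" as the image.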

(Image source: Docker Deep Dive: Zero to Docker in a single book - Nigel Poulton)

Docker uses a component called a storage driver to handle stacking these layers. Several drivers exist, but from the user's point of view they all behave the same.

Images and containers: how they connect

We will learn about containers in the next blog. For now, just understand that you start containers from an image using commands like docker container run and docker service create. Once a container is created from an image, the two are linked, and you cannot delete the image until every container using it has been stopped and deleted. If you try, Docker returns an error.

Also, the purpose of a container is usually to run one app or one service, so the image should only include what that app needs.

For example:

If the app does not need a shell, the image should not include one.
Docker images also do not include a kernel. All containers on a machine share the host machine’s kernel.

Image registries and repositories

An image registry is a central place to store images (for example, Docker Hub).
Image registries contain one or more image repositories. In turn, image repositories contain one or more images.

Docker Hub has two types of repositories:

Official repositories
- Checked and curated by Docker
Unofficial repositories
- Can be risky
- Do not assume they are safe, well-documented, or built correctly

Image names and tags (how you pull the right one)

For official images, pulling is usually:

docker image pull <repository>:<tag>

Examples:

docker image pull alpine:latest
docker image pull redis:latest
docker image pull mongo:4.2.6
docker image pull busybox:latest

If you run:

docker image pull alpine

Docker assumes you mean:

alpine:latest

Two important notes about `latest`

If you do not specify a tag, Docker assumes latest. And if the repo does not have a latest tag, the pull will fail.

latest does not mean “newest”. It is just a label. For example, in the Alpine repository, the most recent image is often tagged edge. So be careful when using latest.

Pulling images from unofficial repos

Same idea, but you include the username or org name:

docker image pull someusername/imagename:version

Pulling from other registries (not Docker Hub)

If the image lives in another registry (Google has its own registries, for example), put the registry’s DNS name in front of the repository name:

docker image pull gcr.io/google-containers/git-sync:v3.1.5
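The naming rules from the last few sections (default tag, user or org prefix, registry DNS name) can be sketched as a small Python parser. This is a simplified illustration, not Docker's own parsing code: the `ImageRef` type and `parse_image_ref` function are hypothetical names, digests (`@sha256:...`) are not handled, and it relies on two conventions: `docker.io` is the default registry, and official Docker Hub repos live under the `library` namespace.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Expand a pull reference into registry, repository, and tag (simplified)."""
    first, _, rest = ref.partition("/")
    # The first component counts as a registry only if it looks like a host name.
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", ref

    # A tag is whatever follows a colon in the last path component.
    last_slash = remainder.rfind("/")
    colon = remainder.rfind(":")
    if colon > last_slash:
        repository, tag = remainder[:colon], remainder[colon + 1:]
    else:
        repository, tag = remainder, "latest"

    # Official Docker Hub repositories sit under the "library" namespace.
    if registry == "docker.io" and "/" not in repository:
        repository = "library/" + repository
    return ImageRef(registry, repository, tag)


for example in ("alpine", "mongo:4.2.6", "someusername/imagename:version",
                "gcr.io/google-containers/git-sync:v3.1.5"):
    print(example, "->", parse_image_ref(example))
```

Running it on the pull commands used so far shows how much Docker fills in for you when you type only `alpine`.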

Searching Docker Hub from the command line

You can search Docker Hub using:

docker search alpine

It returns repositories whose NAME field contains the search string.
NAME is the repository name.

To show only official repos:

docker search alpine --filter "is-official=true"

Pulling images by digest

Pulling by tag is common, but there is a problem: tags can change. Someone can push a new version using the same tag, and then you no longer know which exact version your running systems are using.

For example, you have exampleimage:1.5 with a known bug. You fix it and push it again as exampleimage:1.5. Now two different images have used the same tag name over time, and it becomes hard to know what is running where.

To solve this, Docker uses a content-based ID called a digest, which is a cryptographic hash.

The idea is that if the image content changes, the digest changes. So digests are immutable, and each one uniquely identifies an exact image.

When you pull an image, Docker shows its digest in the output:

docker image pull alpine

You can list digests locally:

docker image ls --digests ubuntu

And you can pull the exact same image again using the digest:

docker image pull ubuntu@sha256:d1e2e92...7e495eff4f9

This makes sure you get exactly the image you expect.
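Here is a small Python sketch of why a digest is safer than a tag. It is a simplification: a real registry computes the digest over the image manifest, while here plain byte strings stand in for the image content. The byte strings, the `tag_to_digest` dict, and the `content_digest` function are assumptions made up for this example.

```python
import hashlib


def content_digest(content: bytes) -> str:
    """Return a digest string in the 'sha256:<hex>' style that Docker prints."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


# Hypothetical stand-ins for two builds pushed under the same tag.
buggy_build = b"exampleimage 1.5 with the bug"
fixed_build = b"exampleimage 1.5 with the fix"

# A tag is just a movable pointer to a digest.
tag_to_digest = {"exampleimage:1.5": content_digest(buggy_build)}
old_digest = tag_to_digest["exampleimage:1.5"]

# Pushing the fix under the same tag moves the tag to a new digest.
tag_to_digest["exampleimage:1.5"] = content_digest(fixed_build)
new_digest = tag_to_digest["exampleimage:1.5"]

print(old_digest)
print(new_digest)
print(old_digest != new_digest)  # same tag, different content, different digest
```

The tag stayed `exampleimage:1.5` the whole time, but the digest changed as soon as the content changed. That is the property you rely on when you pull by digest.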

Multi-architecture images

Different machines can have different operating systems (Linux, Windows) and different CPU types (x64/amd64, ARM, etc.).

For example, your laptop might be Linux on x64, a Raspberry Pi is Linux on ARM, and a Windows server might be Windows on x64.

Before multi-architecture support, you had to manually pick the right image, which was confusing.

With multi-architecture images, when you run:

docker pull golang:latest

Docker automatically detects your OS and CPU and pulls the correct version for your system. You don’t need to specify anything.

How does Docker do this?

Docker relies on two things in the registry:

Manifest list: the list of platforms supported by that image tag
- Example: linux/amd64, linux/arm, windows/amd64
Manifests: each platform has its own manifest, which lists the layers and configuration for that platform

So the flow is:

docker pull golang:latest
        |
Docker checks manifest list
        |
Finds match for your OS + CPU
        |
Downloads correct layers
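The flow above can be sketched in a few lines of Python. This is only an illustration of the matching step: the `MANIFEST_LIST` structure is a flattened, made-up version of what a registry returns, the manifest names are placeholders rather than real digests, and `select_manifest` is a hypothetical helper, not a Docker API.

```python
from typing import Dict, List, Optional

# Hypothetical, heavily simplified manifest list for one image tag.
MANIFEST_LIST: List[Dict[str, str]] = [
    {"os": "linux", "architecture": "amd64", "manifest": "manifest-linux-amd64"},
    {"os": "linux", "architecture": "arm", "manifest": "manifest-linux-arm"},
    {"os": "windows", "architecture": "amd64", "manifest": "manifest-windows-amd64"},
]


def select_manifest(manifest_list: List[Dict[str, str]],
                    os_name: str, architecture: str) -> Optional[str]:
    """Return the manifest matching the host's OS and CPU, or None if there is none."""
    for entry in manifest_list:
        if entry["os"] == os_name and entry["architecture"] == architecture:
            return entry["manifest"]
    return None


# A Raspberry Pi style host (Linux on ARM) gets the ARM manifest.
print(select_manifest(MANIFEST_LIST, "linux", "arm"))
# A platform the image was never built for has no match.
print(select_manifest(MANIFEST_LIST, "linux", "riscv64"))
```

Once the matching manifest is found, the client downloads the layers that manifest lists. If nothing matches, there is simply no image for that platform under that tag.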

Docker also lets you build images for other CPU types using buildx.

Example:

docker buildx build --platform linux/arm/v7 -t myimage:arm-v7 .

You can even build ARM images while working on an x64 machine. How cool is that!!

Deleting Docker images

To delete an image:

docker image rm 02674b9cb179

Delete multiple images:

docker image rm f70734b6a266 a4d3716dbb72

Important rules:

Deleting an image removes the image and its layers.
But if a layer is shared by multiple images, it will not be removed until all those images are deleted.
If an image is used by a container, even a stopped one, Docker will not let you delete it.
- Stop and delete the container first
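The shared-layer rule is easier to see with a tiny model. The Python sketch below is an assumption-heavy illustration, not Docker's implementation: images are represented as a dict of image name to layer names, and the hypothetical `remove_image` function returns only the layers that no remaining image still uses.

```python
from typing import Dict, List, Set


def remove_image(images: Dict[str, List[str]], name: str) -> Set[str]:
    """Remove an image and return the layers that are now safe to delete."""
    layers = images.pop(name)
    # Any layer referenced by an image that is still present must be kept.
    still_used = {layer for other in images.values() for layer in other}
    return {layer for layer in layers if layer not in still_used}


# Two made-up images that share their Ubuntu and Python layers.
images = {
    "app-a": ["ubuntu-base", "python", "app-a-code"],
    "app-b": ["ubuntu-base", "python", "app-b-code"],
}

freed_first = remove_image(images, "app-a")
freed_second = remove_image(images, "app-b")
print(sorted(freed_first))   # only the layer unique to app-a
print(sorted(freed_second))  # the shared layers go once the last user is gone
```

Deleting the first image frees only its own layer. The shared base layers disappear when the last image that uses them is deleted.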

Shortcut to delete all images (force):

docker image rm $(docker image ls -q) -f

# $(docker image ls -q) expands to the IDs of all local images, so this removes every one of them.

Conclusion

This is more than enough to get you comfortable with Docker images for now. In the next blog, we will dive into containers and see what a container is and how it actually works under the hood. Until then...

https://media.giphy.com/media/v1.Y2lkPTc5MGI3NjExbjFtZG8yY3V6azR5ejRoY2p4eXltcWdzcDVrMXVhcjlzdnVrcG5zayZlcD12MV9naWZzX3NlYXJjaCZjdD1n/2nlbKhgnvAK3sR8ffw/giphy.gif
