
Docker Core Concepts Every Developer Should Know

By LearnThatStack

Summary

Topics Covered

  • Containers Are Processes, Not VMs
  • Image Is Blueprint, Container Runs It
  • Layer Order Accelerates Builds
  • Volumes Persist Beyond Containers
  • Custom Networks Enable DNS

Full Transcript

If you've ever used Docker, your first time probably went like this. You clone a project, someone tells you to just run docker compose up, and either it magically works or it doesn't. And in both cases, you have no idea why. So you start memorizing commands: docker build, docker run. You copy Dockerfiles from the internet, and things mostly work until they don't, and then you feel lost because you don't understand what Docker is doing. In this video, we're going to walk through the core concepts behind Docker. Not just the commands, but the ideas, so that Docker makes more sense to you. Let's get into it.

Let's start by clearing up a common misconception about Docker. Docker is not a virtual machine. It's not running a separate operating system for each container. This is a very common assumption people make. We've discussed that in a separate video in more detail; we strongly suggest you watch it, link in the description. Let's recap it quickly.

So what actually happens? Your operating system's kernel, the core part that manages memory, processes, networking, and file systems, has a set of features that allow it to isolate groups of processes from each other. On Linux, these are called namespaces and cgroups. Namespaces let the kernel give a process its own isolated view of the system: its own process tree, its own network interfaces, its own file system root. Cgroups let the kernel limit how much CPU and memory a process can use. A virtual machine runs a complete second operating system on top of your real one. That's heavy, that's slow to start, that uses a lot of resources. A container is just a regular process on your machine running with these isolation features turned on.

It shares your actual kernel. That's why containers start in seconds instead of minutes. They're not booting an OS. They're just starting a process with some walls around it. Docker is the tooling that makes it easy to set up those walls: to define what a process can see, what files it has access to, and what resources it can use. That's the whole idea.

Now you might be thinking: if containers need a Linux kernel, what happens on macOS or Windows? When you install Docker on a Mac or Windows machine, it quietly runs a small Linux virtual machine in the background. Your containers run inside that VM. So Docker itself isn't a virtual machine, but on non-Linux systems it needs one under the hood to provide the kernel that containers require. On Linux, containers can talk straight to your kernel. No VM needed. That's also why Docker tends to feel a bit faster on Linux.

Hopefully with this understanding, a lot of confusing Docker behavior starts making sense. When you think of a container as a process, not a machine, questions like "Why can't my container see this file?" or "Why did my container lose its data when it was removed?" have obvious answers. A process doesn't have persistent storage. A process sees only what you give it access to. Keep this in mind. We're going to build on it.

Okay. So, we know containers are isolated processes, but when you start using Docker, you'll hear two terms constantly, image and container. They're

related, but they're not the same thing.

An image is a blueprint. A container is a running instance of that blueprint. If

you're a developer, you already understand this relationship: a class versus an object, a compiled binary versus a running process. A Docker image contains everything your application needs to run. The code, the runtime, the system libraries, the environment

variables, the configuration files. It's

a complete self-contained package, but it's not running. It's just a file sitting on disk. When you say docker run, docker takes that image, creates an isolated environment, and starts a

process inside it. That running thing is a container. What makes this useful? You can create as many containers from the same image as you want, just like you can create a hundred objects from the same class. Each container is independent. It has its own writable layer, its own process space, its own network identity, but they all start from the same base. And that's where consistency comes from. Everyone on your

team pulls the same image. Everyone gets

the same environment. The "it works on my machine" problem goes away because the image is the machine. Images are also immutable, read-only. They never change after they're built. If you modify files inside a running container, those changes only exist in that container's writable layer. They don't go back into the image. When that container is removed, maybe you ran docker compose down or you deployed a new container from the same image, your fix is gone. You didn't change the image. You changed the container's writable layer.

And that layer only exists as long as that specific container does. Stopping or restarting a container is fine. The writable layer survives that. But the moment the container is removed or replaced, everything in that layer is gone. If you need to change something permanently, change the Dockerfile. Rebuild the image. That's the workflow. The image is the source of truth, not the container. And if you need data to survive beyond a container's life, database records, uploads, logs, that's a different problem entirely, and we'll solve it with volumes shortly.

Docker could have stopped there: isolated processes built from blueprint images. But the way Docker structures those images directly affects how fast your builds are. So it's worth understanding.

A Docker image is not one big blob. It's built from layers stacked on top of each other. Each layer represents a change to the file system: files added, modified, or removed. When you write a Dockerfile, instructions that modify the file system, like RUN, COPY, and ADD, each create a new layer. Other instructions, like CMD and ENV, add metadata but don't produce file system layers. Let's look at a simple example. FROM node:20-alpine: that's your base layer. It pulls in a minimal Linux distribution with Node.js pre-installed. That's already several layers, created by whoever built the Node image. WORKDIR just sets the working directory for everything that follows, like cd'ing into a folder, except it creates the folder if it doesn't exist. COPY package.json creates a new layer containing just that file. RUN npm install creates a new layer with all the installed node modules. COPY . . creates a layer with the rest of your application code.
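Assembled into a file, that walkthrough corresponds to a Dockerfile roughly like this (the exact paths and the CMD line are illustrative):

```dockerfile
# Base layer: minimal Linux distribution with Node.js pre-installed
FROM node:20-alpine

# Working directory for everything that follows
WORKDIR /app

# Copy only the dependency manifest first, so the install layer
# below can be cached independently of code changes
COPY package.json ./

# New layer with all the installed node modules
RUN npm install

# The rest of the application code; it changes most often,
# so it comes last
COPY . .

# Metadata only: the default command when a container starts
CMD ["node", "server.js"]
```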

The reason this matters: Docker caches every layer. When you rebuild your image, Docker checks each instruction, and if nothing has changed for that step, and nothing has changed in any step before it, Docker reuses the cached layer. It skips the work entirely. This is why the order of instructions in your Dockerfile matters so much. Look at our example again. We copy package.json first, then run npm install, and then copy the rest of your application code. Why? Because your application code changes all the time, but your dependencies don't change nearly as often. By putting the dependency installation before the code copy, Docker can cache the npm install layer and skip it entirely when only your code changes. That can save you minutes on every build. If you had copied everything first and then run npm install, Docker would have to reinstall all your dependencies every single time you changed a line of code. Identical result, very different build time.

Layers also make image distribution efficient. When you push or pull images, Docker only transfers the layers that have changed. If two images share the same base layers, say they're both built on node:20-alpine, those shared layers are stored and transferred only once. Once you start writing Dockerfiles with the layer cache in mind, you'll feel the difference in build speed right away.

We've already seen a Dockerfile in the layer section. Now let's zoom in on it and understand what each instruction is actually doing. Remember, a Dockerfile is just a plain text file. Each line is an instruction. Docker executes them top to bottom, and the result is your image.

We covered FROM, WORKDIR, COPY, and RUN in the layers section. The remaining instructions confuse many devs. ENV sets environment variables. These are available both during the build and when the container runs. Think database URLs, API keys, feature flags.

EXPOSE is another culprit for confusion. It doesn't actually open a port. It's documentation: it tells anyone reading the Dockerfile, and Docker tooling, which port the application inside listens on. You still need to publish the port at runtime with the -p flag. CMD is the default command that runs when a container starts. The thing to know about CMD is that it doesn't run during the build. It's stored as metadata in the image and executed later, when someone creates a container

from that image. You'll also see ENTRYPOINT, and the difference between these two also confuses people. The short version: ENTRYPOINT defines the executable, the program your container runs. CMD provides the default arguments. When someone overrides the command at runtime with docker run, they're replacing CMD, not ENTRYPOINT. So if your ENTRYPOINT is node and your CMD is server.js, running docker run myapp worker.js gives you node worker.js.
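A minimal sketch of that split (myapp and worker.js are the placeholder names from the example):

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY . .

# The stable part: the program this container always runs
ENTRYPOINT ["node"]

# The flexible part: default arguments, replaced by anything
# passed after the image name in docker run
CMD ["server.js"]
```

So `docker run myapp` runs `node server.js`, while `docker run myapp worker.js` runs `node worker.js`.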

The ENTRYPOINT stays; the arguments change. You can override ENTRYPOINT too if you need to, with the --entrypoint flag, but you have to do it explicitly. The point is that in normal usage, ENTRYPOINT is the stable part and CMD is the flexible part. In practice, if your container should always run the same program, use ENTRYPOINT. Use CMD for the default arguments someone might want to change. There's also the build context to think about. When you run docker build ., that dot tells Docker

which directory to use as the build context: the set of files available to COPY instructions. Docker's modern build engine is smart enough to only transfer files when they're actually needed, but it still scans the entire directory. And if your COPY instructions are broad, like COPY . ., everything in that directory is fair game. If your project directory has a 2 GB data folder you don't need in the image, a broad COPY will still pull it in. This is why .dockerignore exists. It works just like .gitignore, excluding files from the build context entirely. A good .dockerignore can cut your build time dramatically.
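A minimal .dockerignore for a Node project might look like this (the entries are typical examples, not a prescription):

```
node_modules
.git
*.log
data/
.env
```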

So, at this point, you know how to define an image, build it efficiently, and run it as a container. But there's a problem we hinted at earlier that we still haven't solved. Remember what we said earlier? Containers are processes, and by default, when a container is removed, everything inside it is gone. The writable layer disappears. But real applications need persistent data. Databases need to store records. Applications need to save uploads. Log files need to survive restarts. That's

what volumes are for. A volume is a piece of storage that lives outside the container's life cycle. It's managed by Docker, and it persists even after the container is removed. You can attach the same volume to a new container and all the data is still there. There are two main ways to handle persistent storage. Named volumes are fully managed by Docker. You create them, Docker decides where they're stored on disk, and you reference them by name. These are ideal for production data: database files, application state, anything that needs to survive container replacements. Here, mydata is the volume name and /var/lib/postgresql/data is where it gets mounted inside the container. Postgres writes its data to that path, and the data lives in the volume, safe from container removal.
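The command being described presumably looks something like this (the volume name mydata and the Postgres tag are illustrative):

```shell
# Create a named volume; Docker decides where it lives on disk
docker volume create mydata

# Mount it at the path where Postgres keeps its data files
docker run -d --name db \
  -e POSTGRES_PASSWORD=secret \
  -v mydata:/var/lib/postgresql/data \
  postgres:16

# Removing the container does not remove the volume
docker rm -f db
docker volume ls    # mydata is still there
```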

Bind mounts map a specific directory on your host machine directly into the container. These are what make Docker practical for development. Without them, every time you changed a line of code, you'd have to rebuild the entire image to see the result. Your local source folder is now visible inside the container, at /app/src, say. Edit a file on your laptop and the container sees the change immediately. No rebuild needed.
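A typical development setup might look like this (the paths and the image name are assumptions, not from the video):

```shell
# Map the src folder on the host into the container at /app/src;
# edits on the host are visible inside the container instantly
docker run -d -v "$(pwd)/src:/app/src" -p 3000:3000 myapp
```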

One gotcha to be aware of with bind mounts: when you bind mount a directory to a path inside the container, it completely obscures whatever was at that path in the image. If your image has files at /app/src and you bind mount an empty directory there, those image files are hidden. They're still in the image layer. They're just covered up by the mount. Named volumes behave a bit differently, though. If you mount an empty named volume to a path that already has files in the image, Docker actually copies those image files into the volume the first time. After that, the volume's contents take over. This is useful. It's how database images like Postgres seed their data directory. But it only happens once, on the first use of an empty volume. It's a subtle distinction, but it can be confusing when you expect bind mounts and named volumes to behave the same.

Containers are isolated. That's the whole point. But that creates two problems you need to solve. First, how does your host machine talk to a container? And second, how do containers talk to each other? Let's start with the first one. Remember, a container has its own isolated network namespace. If your app is listening on port 3000 inside the container, that port is only visible inside that container. Your host machine can't reach it. If you open localhost:3000 in your browser, nothing happens.

The -p flag creates a bridge. -p 3000:3000 means: when something hits port 3000 on my host machine, forward it to port 3000 inside the container. The left number is the host port. The right number is the container port. And these numbers don't have to match. -p 8080:3000 means host port 8080, container port 3000. This is actually how you run multiple copies of the same container.
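For example (myapp is a placeholder image name):

```shell
# Host port 3000 forwards to container port 3000
docker run -d -p 3000:3000 myapp

# Host port 8080 forwards to container port 3000
docker run -d -p 8080:3000 myapp

# Two more copies of the same image, each on its own host port
docker run -d -p 3001:3000 myapp
docker run -d -p 3002:3000 myapp
```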

They all listen on port 3000 internally, but you map each one to a different port on your host machine. Once you see it this way, the syntax makes sense. It's always host:container. That handles host to container. But the second problem is container to container. Your web server needs to talk to your database. Your API needs to talk to your cache. Docker solves this with networks.

When you install Docker, it creates a default network called bridge. Any container you run gets attached to this network unless you say otherwise. Containers on the same network can reach each other. There's a caveat, though. On the default bridge network, containers can only reach each other by IP address. That's fragile and inconvenient, because container IPs can change. On a custom network, one you create yourself, Docker provides automatic DNS resolution. Containers can reach each other by name. Now, inside the API container, you can connect to the database at db:5432.
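Sketched as commands (the network and container names are illustrative):

```shell
# Create a custom network; it comes with automatic DNS
docker network create app-net

# Containers attached to it can reach each other by name
docker run -d --name db --network app-net postgres:16
docker run -d --name api --network app-net myapp

# Inside the api container, the hostname "db" resolves to
# the db container, e.g. postgres://db:5432/...
```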

Docker resolves the name db to the correct container IP automatically. No hard-coded addresses, no fragile configuration. And if you've ever wondered how Docker Compose services find each other by name, this is exactly the mechanism. Compose creates a custom network behind the scenes. We'll see that in the next section. The default here isn't the best behavior: always create a custom network for your application. It gives you DNS, better isolation, and more control. Networks also provide isolation between groups of containers. Your front-end containers don't need to talk to your database directly. Put them on different networks. The front end talks to the back end, the back end talks to the database, and the front end can't reach the database at all. Clean separation.

At this point, you might be thinking: if I need to manage images, containers, volumes, networks, port mappings, and environment variables, that's a lot of docker run flags. And you'd be right. For anything beyond a single container, managing everything with CLI commands is miserable. That's the problem Docker Compose solves. Docker Compose lets you define your entire application stack, every service, every network, every volume, in a single YAML file. Look at

how readable this is. Two services, api and db. The api builds from a local Dockerfile. The database uses the official Postgres image. A named volume keeps the database data persistent. Environment variables configure both services, and depends_on tells Compose to start the database before the API.
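The file being described presumably looks something like this (service names, credentials, and ports are assumptions):

```yaml
services:
  api:
    build: .               # builds from the local Dockerfile
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgres://postgres:secret@db:5432/app
    depends_on:
      - db                 # short form: controls startup order only
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret
    volumes:
      - dbdata:/var/lib/postgresql/data   # named volume

volumes:
  dbdata:
```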

Something to watch out for, though: in this short form, depends_on only controls startup order. It does not wait for the database to actually be ready. Docker will start the Postgres container first and then immediately start the API. Even if Postgres hasn't finished initializing yet, your API might crash on its first connection attempt because the database isn't accepting connections. The fix is to use the long form of depends_on with condition: service_healthy. That way, Compose will actually wait for a health check to pass before starting your API. Or, more commonly, build retry logic into your application's database connection code so it can handle a brief delay. Either way, don't assume depends_on means ready.
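The long form might look like this (the pg_isready health check is a common pattern, assumed rather than taken from the video):

```yaml
services:
  api:
    build: .
    depends_on:
      db:
        condition: service_healthy   # wait for the health check to pass
  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5
```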

One command brings everything up. One command tears everything down.

Compose automatically creates a custom network for your stack, so the api service can reach the db service by name, exactly like we discussed in the networking section. You don't need to configure that yourself. One thing worth understanding about Compose: it's not a different system. It's not a separate orchestrator. It's just a convenient way to run the same docker build, docker run, docker network create, and docker volume create commands you'd run manually. It takes your declarative YAML file and translates it into those imperative Docker commands. Understanding the underlying concepts we covered today means you'll always understand what Compose is doing on your behalf.

Let's zoom out. So, we saw that Docker isn't a set of commands to be memorized, but a system underneath to be understood. We saw why your container couldn't see a file: it has its own isolated file system. Why your data disappeared: the writable layer lives and dies with the container. Why your builds were slow: instruction order determines what gets cached. Why your services couldn't find each other: the default network doesn't do DNS. These aren't separate topics. They're one connected model. Everything else you encounter, multi-stage builds, health checks, orchestration with Kubernetes, builds directly on these concepts. If this was useful, leaving a like helps the video reach other developers. And if there's a topic you'd want a deeper dive on, the comments are the best place to tell us. Thanks for watching, and see you in the next one.
