Docker Core Concepts Every Developer Should Know
By LearnThatStack
Summary
Topics Covered
- Containers Are Processes, Not VMs
- Image Is Blueprint, Container Runs It
- Layer Order Accelerates Builds
- Volumes Persist Beyond Containers
- Custom Networks Enable DNS
Full Transcript
If you've ever used Docker, your first time probably went like this. You clone a project, someone tells you to just run docker compose up, and either it magically works or it doesn't. And in both cases, you have no idea why. So you start memorizing commands: docker build, docker run. You copy Dockerfiles from the internet, and things mostly work until they don't, and then you feel lost because you don't understand what Docker is doing. In this video, we're going to walk through the core concepts behind Docker. Not just the commands, but the ideas, so that Docker makes more sense to you. Let's get into it.
Let's start by clearing up a common misconception about Docker. Docker is not a virtual machine. It's not running a separate operating system for each container. This is a very common assumption people make. We've discussed that in more detail in a separate video, which we strongly suggest you watch; link in the description. Let's recap it quickly.
So what actually happens? Your operating system's kernel, the core part that manages memory, processes, networking, and file systems, has a set of features that allow it to isolate groups of processes from each other. On Linux, these are called namespaces and cgroups. Namespaces let the kernel give a process its own isolated view of the system: its own process tree, its own network interfaces, its own file system root. Cgroups let the kernel limit how much CPU and memory a process can use. A virtual machine runs a complete second operating system on top of your real one. That's heavy, that's slow to start, that uses a lot of resources. A container is just a regular process on your machine running with these isolation features turned on. It shares your actual kernel. That's why containers start in seconds instead of minutes. They're not booting an OS. They're just starting a process with some walls around it. Docker is the tooling that makes it easy to set up those walls: to define what a process can see, what files it has access to, and what resources it can use. That's the whole idea.

Now you might be thinking: if containers need a Linux kernel, what happens on macOS or Windows? When you install Docker on a Mac or Windows machine, it quietly runs a small Linux virtual machine in the background. Your containers run inside that VM. So Docker itself isn't a virtual machine, but on non-Linux systems it needs one under the hood to provide the kernel that containers require. On Linux, containers can talk straight to your kernel. No VM needed. That's also why Docker tends to feel a bit faster on Linux.

Hopefully, with this understanding, a lot of confusing Docker behavior starts making sense. When you think of a container as a process, not a machine, questions like "Why can't my container see this file?" or "Why did my container lose its data when it was removed?" have obvious answers. A process doesn't have persistent storage. A process sees only what you give it access to. Keep this in mind. We're going to build on it.
Okay. So, we know containers are isolated processes, but when you start using Docker, you'll hear two terms constantly: image and container. They're related, but they're not the same thing. An image is a blueprint. A container is a running instance of that blueprint. If you're a developer, you already understand this relationship: a class versus an object, a compiled binary versus a running process. A Docker image contains everything your application needs to run: the code, the runtime, the system libraries, the environment variables, the configuration files. It's a complete, self-contained package, but it's not running. It's just a file sitting on disk. When you say docker run, Docker takes that image, creates an isolated environment, and starts a process inside it. That running thing is a container. What makes this useful? You can create as many containers from the same image as you want, just like you can create a hundred objects from the same class. Each container is independent. It has its own writable layer, its own process space, its own network identity, but they all start from the same base. And that's where consistency comes from. Everyone on your team pulls the same image. Everyone gets the same environment. The "it works on my machine" problem goes away, because the image is the machine. Images are also immutable, read-only. They never change after they're built. If you modify files inside a running container, those changes only exist in that container's writable layer. They don't go back into the image. When that container is removed, maybe you ran docker compose down or you deployed a new container from the same image, your fix is gone. You didn't change the image. You changed the container's writable layer. And that layer only exists as long as that specific container does. Stopping or restarting a container is fine. The writable layer survives that. But the moment the container is removed or replaced, everything in that layer is gone. If you need to change something permanently, change the Dockerfile. Rebuild the image. That's the workflow. The image is the source of truth, not the container. And if you need data to survive beyond a container's life, database records, uploads, logs, that's a different problem entirely, and we'll solve it with volumes shortly.
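As a quick sketch of that image-to-container workflow (the image and container names here are made up for illustration):

```shell
# Build the image once from the Dockerfile in the current directory
docker build -t myapp .

# Start as many independent containers from it as you like;
# each gets its own writable layer, process space, and network identity
docker run -d --name myapp-1 myapp
docker run -d --name myapp-2 myapp

# Removing a container discards its writable layer; the image is untouched
docker rm -f myapp-1
```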
Docker could have stopped there: isolated processes built from blueprint images. But the way Docker structures those images directly affects how fast your builds are, so it's worth understanding.
A Docker image is not one big blob. It's built from layers stacked on top of each other. Each layer represents a change to the file system: files added, modified, or removed. When you write a Dockerfile, instructions that modify the file system, like RUN, COPY, and ADD, each create a new layer. Other instructions, like CMD and ENV, add metadata but don't produce file system layers. Let's look at a simple example. FROM node:20-alpine: that's your base layer. It pulls in a minimal Linux distribution with Node.js pre-installed. That's already several layers, created by whoever built the Node image. WORKDIR just sets the working directory for everything that follows, like cd-ing into a folder, except it creates the directory if it doesn't exist. COPY package.json creates a new layer containing just that file. RUN npm install creates a new layer with all the installed node modules. COPY . creates a layer with the rest of your application code.
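Put together, the Dockerfile being walked through presumably looks like this (the server.js entry file is an assumption, not from the video):

```dockerfile
# Base layer: minimal Linux with Node.js pre-installed
FROM node:20-alpine

# Set (and create, if missing) the working directory
WORKDIR /app

# Copy only the dependency manifest first, so the npm install
# layer can be cached independently of code changes
COPY package.json ./

# Install dependencies; this layer is reused until package.json changes
RUN npm install

# Copy the rest of the application code (changes most often, so it goes last)
COPY . .

# Metadata only: the default command when a container starts
CMD ["node", "server.js"]
```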
The reason this matters: Docker caches every layer. When you rebuild your image, Docker checks each instruction, and if nothing has changed for that step, and nothing has changed in any step before it, Docker reuses the cached layer. It skips the work entirely. This is why the order of instructions in your Dockerfile matters so much. Look at our example again. We copy package.json first, then run npm install, and then copy the rest of the application code. Why? Because your application code changes all the time, but your dependencies don't change nearly as often. By putting the dependency installation before the code copy, Docker can cache the npm install layer and skip it entirely when only your code changes. That can save you minutes on every build. If you had copied everything first and then run npm install, Docker would have to reinstall all your dependencies every single time you changed a line of code. Identical result, very different build time. Layers also make image distribution efficient. When you push or pull images, Docker only transfers the layers that have changed. If two images share the same base layers, say they're both built on node:20-alpine, those shared layers are stored and transferred only once. Once you start writing Dockerfiles with the layer cache in mind, you'll feel the difference in build speed right away.
We've already seen a Dockerfile in the layers section. Now let's zoom in on it and understand what each instruction is actually doing. Remember, a Dockerfile is just a plain text file. Each line is an instruction. Docker executes them top to bottom, and the result is your image. We covered FROM, WORKDIR, COPY, and RUN in the layers section. The remaining instructions are confusing to many devs. ENV sets environment variables. These are available both during the build and when the container runs. Think database URLs, API keys, feature flags. EXPOSE, another culprit for confusion. It doesn't actually open a port. It's documentation. It tells anyone reading the Dockerfile, and Docker tooling, which port the application inside listens on. You still need to publish the port at runtime with the -p flag. CMD: the default command that runs when a container starts. The thing to know about CMD is that it doesn't run during the build. It's stored as metadata in the image, and it's executed later, when someone creates a container from that image. You'll also see ENTRYPOINT, and the difference between these two also confuses people. The short version: ENTRYPOINT defines the executable, the program your container runs. CMD provides the default arguments. When someone overrides the command at runtime with docker run, they're replacing CMD, not ENTRYPOINT. So if your entrypoint is node and your command is server.js, running docker run myapp worker.js gives you node worker.js. The entrypoint stays. The arguments change. You can override ENTRYPOINT too if you need to, with the --entrypoint flag, but you have to do it explicitly. The point is that in normal usage, ENTRYPOINT is the stable part and CMD is the flexible part. In practice: if your container should always run the same program, use ENTRYPOINT. Use CMD for the default arguments someone might want to change. There's also the build context to think about. When you run docker build ., that dot tells Docker which directory to use as the build context: the set of files available for COPY instructions. Docker's modern build engine is smart enough to only transfer files when they're actually needed, but it still scans the entire directory. And if your COPY instructions are broad, like COPY . ., everything in that directory is fair game. If your project directory has a 2 GB data folder you don't need in the image, a broad COPY will still pull it in. This is why .dockerignore exists. It works just like .gitignore, excluding files from the build context entirely. A good .dockerignore can cut your build time dramatically.
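A minimal sketch of the ENTRYPOINT/CMD split described above (file names are assumptions for illustration):

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY . .

# ENV: available at build time and at runtime
ENV NODE_ENV=production

# EXPOSE: documentation only; publish with -p at runtime
EXPOSE 3000

# ENTRYPOINT is the stable executable; CMD supplies default arguments
ENTRYPOINT ["node"]
CMD ["server.js"]
```

With this Dockerfile, docker run myapp worker.js replaces only CMD, so the container runs node worker.js, while docker run --entrypoint sh myapp replaces the executable explicitly. And a typical .dockerignore to keep the build context small might look like this (entries are common examples, not from the video):

```
node_modules
.git
*.log
data/
```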
So, at this point, you know how to define an image, build it efficiently, and run it as a container. But there's a problem we hinted at earlier that we still haven't solved. Remember what we said? Containers are processes, and by default, when a container is removed, everything inside it is gone. The writable layer disappears. But real applications need persistent data. Databases need to store records. Applications need to save uploads. Log files need to survive restarts. That's what volumes are for. A volume is a piece of storage that lives outside the container's life cycle. It's managed by Docker, and it persists even after the container is removed. You can attach the same volume to a new container and all the data is still there. There are two main ways to handle persistent storage. Named volumes are fully managed by Docker. You create them, Docker decides where they're stored on disk, and you reference them by name. These are ideal for production data: database files, application state, anything that needs to survive container replacements. Here, mydata is the volume name and /var/lib/postgresql/data is where it gets mounted inside the container. Postgres writes its data to that path, and the data lives in the volume, safe from container removal.
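The example being described presumably looks something like this (the volume name, container name, and image tag are assumptions):

```shell
# Create a named volume; Docker decides where it lives on disk
docker volume create mydata

# Mount it where Postgres writes its data; the data survives container removal
docker run -d --name db \
  -v mydata:/var/lib/postgresql/data \
  postgres:16
```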
Bind mounts map a specific directory on your host machine directly into the container. These are what make Docker practical for development. Without them, every time you changed a line of code, you'd have to rebuild the entire image to see the result. Your local source folder is now visible inside the container at the app source path. Edit a file on your laptop and the container sees the change immediately. No rebuild needed. One gotcha to be aware of with bind mounts: when you bind mount a directory to a path inside the container, it completely obscures whatever was at that path in the image. If your image has files at the app source path and you bind mount an empty directory there, those image files are hidden. They're still in the image layer. They're just covered up by the mount. Named volumes behave a bit differently, though. If you mount an empty named volume to a path that already has files in the image, Docker actually copies those image files into the volume the first time. After that, the volume's contents take over. This is useful: it's how database images like Postgres seed their data directory. But it only happens once, on the first use of an empty volume. It's a subtle distinction, but it can be confusing when you expect bind mounts and named volumes to behave the same.
Containers are isolated. That's the whole point. But that creates two problems you need to solve. First, how does your host machine talk to a container? And second, how do containers talk to each other? Let's start with the first one. Remember, a container has its own isolated network namespace. If your app is listening on port 3000 inside the container, that port is only visible inside that container. Your host machine can't reach it. If you open localhost:3000 in your browser, nothing happens. The -p flag creates a bridge. -p 3000:3000 means: when something hits port 3000 on my host machine, forward it to port 3000 inside the container. The left number is the host port. The right number is the container port. And these numbers don't have to match. -p 8080:3000 means host port 8080, container port 3000. This is actually how you run multiple copies of the same container. They all listen on port 3000 internally, but you map each one to a different port on your host machine. Once you see it this way, the syntax makes sense. It's always host:container. That handles host to container. But the second problem is container to container. Your web server needs to talk to your database. Your API needs to talk to your cache. Docker solves this with networks.
When you install Docker, it creates a default network called bridge. Any container you run gets attached to this network unless you say otherwise. Containers on the same network can reach each other. There's a caveat, though. On the default bridge network, containers can only reach each other by IP address. That's fragile and inconvenient, because container IPs can change. On a custom network, one you create yourself, Docker provides automatic DNS resolution. Containers can reach each other by name. Now, inside the API container, you can connect to the database at db:5432. Docker resolves the name db to the correct container IP automatically. No hard-coded addresses, no fragile configuration. And if you've ever wondered how Docker Compose services find each other by name, this is exactly the mechanism. Compose creates a custom network behind the scenes. We'll see that in the next section. The default behavior here isn't the best behavior. Always create a custom network for your application. It gives you DNS, better isolation, and more control. Networks also provide isolation between groups of containers. Your front-end containers don't need to talk to your database directly. Put them on different networks. The front end talks to the back end. The back end talks to the database, and the front end can't reach the database at all. Clean separation.
At this point, you might be thinking: if I need to manage images, containers, volumes, networks, port mappings, and environment variables, that's a lot of docker run flags. And you'd be right. For anything beyond a single container, managing everything with CLI commands is miserable. That's the problem Docker Compose solves. Docker Compose lets you define your entire application stack, every service, every network, every volume, in a single YAML file. Look at how readable this is. Two services, api and db. The api builds from a local Dockerfile. The database uses the official Postgres image. A named volume keeps the database data persistent. Environment variables configure both services, and depends_on tells Compose to start the database before the API.
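The file being described is presumably close to this (service names, credentials, and the Postgres tag are placeholders):

```yaml
services:
  api:
    build: .                # builds from the local Dockerfile
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgres://app:secret@db:5432/app
    depends_on:
      - db                  # short form: controls startup order only

  db:
    image: postgres:16      # official Postgres image
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
    volumes:
      - mydata:/var/lib/postgresql/data

volumes:
  mydata:                   # named volume keeps database data persistent
```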
Something to watch out for, though: in this short form, depends_on only controls startup order. It does not wait for the database to actually be ready. Docker will start the Postgres container first and then immediately start the API. If Postgres hasn't finished initializing yet, your API might crash on its first connection attempt, because the database isn't accepting connections. The fix is to use the long form of depends_on with condition: service_healthy. That way, Compose will actually wait for a health check to pass before starting your API. Or, more commonly, build retry logic into your application's database connection code so it can handle a brief delay. Either way, don't assume depends_on means ready. One command brings everything up. One command tears everything down.
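The long form of depends_on mentioned above might look like this (the pg_isready health check is a common Postgres pattern, assumed here rather than taken from the video):

```yaml
services:
  api:
    build: .
    depends_on:
      db:
        condition: service_healthy   # wait for the health check to pass

  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app"]
      interval: 5s
      timeout: 3s
      retries: 5
```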
Compose automatically creates a custom network for your stack, so the api service can reach the db service by name, exactly like we discussed in the networking section. You don't need to configure that yourself. One thing worth understanding about Compose: it's not a different system. It's not a separate orchestrator. It's just a convenient way to run the same docker build, docker run, docker network create, and docker volume create commands you'd run manually. It takes your declarative YAML file and translates it into those imperative Docker commands. Understanding the underlying concepts we covered today means you'll always understand what Compose is doing on your behalf.
Let's zoom out. We saw that Docker isn't a set of commands to be memorized, but a system underneath to be understood. We saw why your container couldn't see a file: it has its own isolated file system. Why your data disappeared: the writable layer lives and dies with the container. Why your builds were slow: instruction order determines what gets cached. Why your services couldn't find each other: the default network doesn't do DNS. These aren't separate topics. They're one connected model. Everything else you encounter, multi-stage builds, health checks, orchestration with Kubernetes, builds directly on these concepts. If this was useful, leaving a like helps the video reach other developers. And if there's a topic you'd want a deeper dive on, the comments are the best place to tell us. Thanks for watching, and see you in the next one.