System Design was HARD until I Learned these 30 Concepts
By Ashish Pratap Singh
Summary
## Key takeaways

- **Client-Server Architecture: The Foundation**: Client-server architecture is the fundamental building block for most web applications, where a client (like a browser) sends requests to a continuously running server for data operations, and the server responds. This model relies on IP addresses for server identification, which are translated from human-friendly domain names via the DNS. [00:37], [01:43]
- **Proxies: Intermediaries for Security and Performance**: Proxy servers act as intermediaries between users and the internet, hiding user IP addresses for privacy. Reverse proxies, on the other hand, manage incoming client requests and route them to backend servers, enhancing security and potentially improving performance. [02:12], [02:30]
- **Latency and CDN: Bridging Geographical Gaps**: Latency, the delay caused by physical distance between users and servers, can significantly slow down applications. Deploying services across multiple data centers and utilizing Content Delivery Networks (CDNs) helps reduce latency by serving content from geographically closer servers. [02:43], [15:40]
- **APIs: The Language of Server Communication**: APIs (Application Programming Interfaces) act as intermediaries allowing clients to communicate with servers without needing to understand low-level details. REST and GraphQL are popular API styles, with REST being stateless and resource-oriented, while GraphQL allows clients to request precisely the data they need in a single query. [04:12], [05:05]
- **SQL vs. NoSQL: Choosing the Right Database**: Databases are crucial for managing application data. SQL databases are ideal for structured, consistent data like in banking systems, while NoSQL databases offer high scalability and flexible schemas for large-scale distributed data, with many applications using both. [06:38], [07:00]
- **Scaling Strategies: Vertical vs. Horizontal**: Vertical scaling (scaling up) enhances a single server's power, but has limitations. Horizontal scaling (scaling out) distributes load across multiple servers, improving capacity and reliability, managed by load balancers that direct traffic and reroute requests if a server fails. [07:51], [08:25]
Topics Covered
- GraphQL: When Over-fetching Makes REST Inefficient
- Why Horizontal Scaling Beats Vertical for Reliability
- Sharding: Scaling Databases for Massive Data Volumes
- WebSockets vs. Polling: The Real-Time Efficiency Debate
- Idempotency: Preventing Duplicate Actions in Distributed Systems
Full Transcript
If you want to level up from a junior
developer to a senior engineer or land a
high paying job at a big tech company,
you need to learn system design. But
where do you start? To master system
design, you first need to understand the
core concepts and fundamental building
blocks that come up when designing real
world systems or tackling system design
interview questions. In this video, I
will break down the 30 most important
system design concepts you need to know.
Learning these concepts helped me land
high paying offers from multiple big
tech companies. And in my 8 years as a
software engineer, I've seen them used
repeatedly when building and scaling
large scale systems. Let's get started.
Almost every web application that you
use is built on this simple yet powerful
concept called client server
architecture. Here is how it works. On
one side, you have a client. This could
be a web browser, a mobile app or any
other frontend application. And on the
other side, you have a server, a machine
that runs continuously waiting to handle
incoming requests. The client sends a
request to store, retrieve or modify
data. The server receives the request,
processes it, performs the necessary
operations, and sends back a response.
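To make this concrete, here is a minimal sketch of a server in Python, using only the standard library. The port and the response text are illustrative choices; a browser visiting http://localhost:8000 would act as the client.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The server receives the request, processes it,
        # and sends back a response.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Hello from the server!")

# The server runs continuously, waiting for incoming requests.
HTTPServer(("localhost", 8000), HelloHandler).serve_forever()
```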
This sounds simple, right? But there is
a big question. How does the client even
know where to find a server? A client
doesn't magically know where a server
is. It needs an address to locate and
communicate with it. On the internet,
computers identify each other using IP
addresses, which works like phone
numbers for servers. Every publicly
deployed server has a unique IP address.
Something like this. When a client wants
to interact with a service, it must send
requests to the correct IP address. But
there's a problem. When we visit a
website, we don't type its IP address.
We just enter the website name. Right?
Instead of relying on hard to remember
IP addresses, we use something much more
human friendly, domain names. But we
need a way to map a domain name to its
corresponding IP address. This is where
DNS or domain name system comes in. It
maps easy-to-remember domain names like algomaster.io to their corresponding IP addresses. When you type algomaster.io into your
browser, your computer asks a DNS server
for the corresponding IP address. Once
the DNS server responds with the IP,
your browser uses it to establish a
connection with the server and make a
request. You can find the IP address of
any domain name using the ping command.
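You can do the same lookup in Python with the standard library; `socket.gethostbyname` asks your system's DNS resolver, much like ping does.

```python
import socket

# Resolve a domain name to its IP address via DNS.
ip = socket.gethostbyname("algomaster.io")
print(ip)  # the actual IP varies depending on the deployment
```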
When you visit a website, your request
doesn't always go directly to the
server. Sometimes it passes through a
proxy or reverse proxy first. A proxy
server acts as a middleman between your
device and the internet. When you
request a web page, the proxy forwards
your request to the target server,
retrieves the response, and sends it
back to you. A proxy server hides your
IP address, keeping your location and
identity private. A reverse proxy works
the other way around. It intercepts the
client requests and forwards them to the backend servers based on predefined
rules. Whenever a client communicates
with a server, there is always some
delay. One of the biggest causes of this
delay is physical distance. For example,
if our server is in New York, but a user
in India sends a request, the data has
to travel halfway across the world and
then the response has to make the same
long trip back. This roundtrip delay is
called latency. High latency can make
applications feel slow and unresponsive.
One way to reduce latency is by
deploying our service across multiple
data centers worldwide. This way, users
can connect to the nearest server
instead of waiting for data to travel
across the globe. Once a connection is
made, how do clients and servers
actually communicate? Every time you
visit a website, your browser and the
server communicate using a set of rules
called HTTP. That's why most URLs start with HTTP, or its secure version, HTTPS. The client sends a request to the
server. This request includes a header
containing details like the request
type, browser type, and cookies and
sometimes a request body which carries
additional data like form inputs. The
server processes the request and
responds with an HTTP response either
returning the requested data or an error
message if something goes wrong. HTTP
has a major security flaw: it sends data in plain text. That's why modern websites use HTTPS, which encrypts all data using the SSL/TLS protocol, ensuring that even if someone intercepts the request, they can't read or alter it.
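Here is a sketch of one request-response cycle from the client side, using Python's standard library; the URL and header values are just examples.

```python
import urllib.request

# Build a request with a header, the way a browser would.
req = urllib.request.Request(
    "https://example.com",
    headers={"User-Agent": "my-client/1.0"},  # part of the request header
)

# Because the URL is https://, the data is encrypted in transit.
with urllib.request.urlopen(req) as resp:
    print(resp.status)                    # e.g. 200 on success
    print(resp.headers["Content-Type"])   # response metadata
    body = resp.read()                    # the response body
```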
But clients and servers don't directly exchange raw HTTP requests and responses. HTTP is just a protocol for transferring data; it doesn't define how requests should be structured, what format responses should be in, or how different clients should interact with the server. This is where APIs, or application programming interfaces, come in. Think of an API as a
middleman that allows clients to
communicate with servers without
worrying about low-level details. A
client sends a request to an API. The
API hosted on a server processes the
request, interacts with databases or
other services, and prepares a response.
The API sends back the response in a
structured format, usually JSON or XML,
which the client understands and can
display. There are different API styles
to serve different needs. Two of the
most popular ones are REST and GraphQL.
Just a quick note to keep this video
concise, I'm covering these topics at a
high level, but if you want to go deeper
and learn these topics in more detail,
check out my blog at
blog.algomaster.io. Every week I
publish in-depth articles on complex
system design topics with clear
explanations and real world examples.
Make sure to subscribe so that you don't
miss my new articles. Among the
different API styles, REST is the most
widely used. A REST API follows a set of
rules that defines how clients and
servers communicate over HTTP in a
structured way. REST is stateless. Every
request is independent. Everything is treated as a resource: for example, users, orders, products. It uses standard HTTP methods: GET to retrieve data, POST to create new data, PUT to update existing data, and DELETE to remove data.
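As an illustration, here is a minimal sketch of a resource-oriented API using Flask (2.0+); the `users` resource and its in-memory storage are made up for the example. Notice that no session state is kept between requests.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
users = {1: {"name": "Ada"}}  # in-memory stand-in for a database

@app.get("/users/<int:uid>")    # GET retrieves a resource
def get_user(uid):
    return jsonify(users[uid])

@app.post("/users")             # POST creates a new resource
def create_user():
    uid = max(users) + 1
    users[uid] = request.get_json()
    return jsonify({"id": uid}), 201

@app.delete("/users/<int:uid>") # DELETE removes a resource
def delete_user(uid):
    users.pop(uid, None)
    return "", 204
```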
REST APIs are great because they are simple, scalable, and easy to cache, but they have limitations, especially when dealing with complex data retrieval. REST endpoints often return more data than needed, leading to inefficient network usage. To address
these challenges, GraphQL was introduced
in 2015 by Facebook. Unlike REST,
GraphQL lets clients ask for exactly what
they need. Nothing more, nothing less.
With a REST API, if you need a user's
profile along with their recent posts,
you might have to make multiple requests
to different endpoints. With GraphQL,
you can combine those requests into one
and fetch exactly the data you need in a
single query. The server responds with
only the requested fields. However, GraphQL also comes with trade-offs: it requires more processing on the server side, and it isn't as easy to cache as REST.
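For instance, a client could fetch a user's profile and recent posts in one round trip by POSTing a query to a GraphQL endpoint. This sketch uses the `requests` package; the endpoint URL and field names are hypothetical.

```python
import requests

# One query, one request: ask for exactly the fields we need.
query = """
{
  user(id: "42") {
    name
    posts(last: 3) { title }
  }
}
"""

resp = requests.post("https://api.example.com/graphql", json={"query": query})
print(resp.json())  # contains only the name and post titles, nothing more
```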
Now, when a client makes a request, they usually want to store or retrieve data. But this brings up another question: where is the actual data
stored. If our application deals with
small amounts of data we could store it
as a variable or as a file and load it
in memory. But modern applications
handle massive volumes of data far more
than what memory can efficiently handle.
That's why we need a dedicated server
for storing and managing data. A
database. A database is the backbone of
any modern application. It ensures that
data is stored, retrieved and managed
efficiently while keeping it secure,
consistent and durable. When a client
requests to store or retrieve data, the
server communicates with the database,
fetches the required information and
returns it to the client. But not all
databases are the same. In system
design, we typically choose between SQL
and NoSQL databases. SQL databases store
data in tables with a strict predefined
schema, and they follow ACID properties (atomicity, consistency, isolation, durability).
Because of these guarantees, SQL
databases are ideal for applications
that require strong consistency and
structured relationships such as banking
systems. NoSQL databases on the other
hand are designed for high scalability
and performance. They don't require a
fixed schema and use different data
models including key value stores,
document stores, graph databases, and
wide column stores which are optimized
for large scale distributed data. So,
which one should you use? If you need
structured, relational data with strong consistency, SQL is the better choice.
If you need high scalability and a flexible schema, NoSQL is the better choice. Many modern applications use both SQL and NoSQL together.
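To illustrate the SQL side, here is a sketch using Python's built-in sqlite3 module: a strict, predefined schema plus an atomic transfer in the style of a banking system. The table and values are made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
con.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])

# ACID in action: both updates commit together, or neither does.
with con:  # the context manager wraps a transaction
    con.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    con.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")

print(con.execute("SELECT * FROM accounts").fetchall())  # [(1, 70), (2, 80)]
```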
As our user base grows, so does the number of requests hitting our application servers. One of the
quickest solutions is to upgrade the
existing server by adding more CPU, RAM
or storage. This approach is called
vertical scaling or scaling up which
makes a single machine more powerful.
But there are some major limitations
with this approach. You can't keep
upgrading a server forever. Every
machine has a maximum capacity. More
powerful servers become exponentially
more expensive. If this one server
crashes, the entire system goes down.
So, while vertical scaling is a quick
fix, it's not a long-term solution for
handling high traffic and ensuring
system reliability. Let's look at a
better approach, one that makes our
system more scalable and fault-tolerant.
Instead of upgrading a single server,
what if we add more servers to share the
load? This approach is called horizontal
scaling or scaling out where we
distribute the workload across multiple
machines. More servers means more capacity, which means the system can
handle increasing traffic more
effectively. If one server goes down,
others can take over which improves
reliability. But horizontal scaling
introduces a new challenge. How do
clients know which server to connect to?
This is where a load balancer comes in.
A load balancer sits between clients and
backend servers acting as a traffic
manager that distributes requests across
multiple servers. If one server crashes,
the load balancer automatically
redirects traffic to another healthy
server. But how does a load balancer
decide which server should handle the
next request? It uses a load-balancing algorithm, such as round robin, least connections, or IP hashing.
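Round robin, the simplest of these, just cycles through the servers in order. Here is a toy sketch; the server addresses are placeholders.

```python
import itertools

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # healthy backend servers
rotation = itertools.cycle(servers)

def pick_server():
    # Each incoming request goes to the next server in the rotation.
    return next(rotation)

for request_id in range(4):
    print(request_id, "->", pick_server())
# 0 -> 10.0.0.1, 1 -> 10.0.0.2, 2 -> 10.0.0.3, 3 -> 10.0.0.1
```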
So far, we have talked about scaling our application servers. But as traffic
grows, the volume of data also
increases. At first we can scale a
database vertically by adding more CPU,
RAM and storage similar to application
servers. But there is a limit to how much a single machine can handle. So
let's explore other database scaling
techniques that can help manage large
volumes of data efficiently. One of the
quickest and most effective ways to
speed up database read queries is
indexing. Think of it like the index
page at the back of a book. Instead of
flipping through every page, you jump
directly to the relevant section. A
database index works the same way. It's
a super efficient lookup table that
helps the database quickly locate the
required data without scanning the
entire table. An index stores column
values along with pointers to actual
data rows in the table. Indexes are
typically created on columns that are
frequently queried such as primary keys,
foreign keys, and columns frequently
used in WHERE conditions. While indexes speed up reads, they slow down writes
since the index needs to be updated
whenever data changes. That's why we
should only index the most frequently
accessed columns. Indexing can
significantly improve read performance.
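Here is a sketch of what that looks like in practice with sqlite3; the table is hypothetical, and the query plan shows the index being used instead of a full scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

# Create an index on a frequently queried column.
con.execute("CREATE INDEX idx_users_email ON users (email)")

# The database can now jump straight to matching rows.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?", ("a@b.com",)
).fetchall()
print(plan)  # should mention 'USING INDEX idx_users_email' rather than a scan
```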
But what if even indexing isn't enough
and our single database server can't
handle the growing number of read
requests? That's where our next database
scaling technique, replication, comes
in. Just like we added more application
servers to handle increasing traffic, we
can scale our database by creating
copies of it across multiple servers.
Here is how it works. We have one
primary database also called the primary
replica that handles all write
operations. We have multiple read
replicas that handle read queries.
Whenever data is written to the primary database, it gets copied to the read
replicas so that they stay in sync.
Replication improves read performance, since read requests are
spread across multiple replicas reducing
the load on each one. This also improves availability: if the primary replica fails, a read replica can take over as the new primary.
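In application code, this often shows up as simple read/write routing: writes go to the primary, reads are spread across the replicas. A sketch, where the connection objects are stand-ins:

```python
import random

class RoutingDatabase:
    def __init__(self, primary, replicas):
        self.primary = primary    # handles all write operations
        self.replicas = replicas  # handle read queries

    def execute_write(self, sql):
        return self.primary.execute(sql)

    def execute_read(self, sql):
        # Spread reads across replicas to reduce the load on each one.
        return random.choice(self.replicas).execute(sql)
```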
Replication is great for scaling read-heavy applications. But what if we need to scale write operations or store huge
amounts of data? Let's say our service
became popular. It now has millions of
users and our database has grown to
terabytes of data. A single database
server will eventually struggle to
handle all this data efficiently.
Instead of keeping everything in one
place, we split the database into
smaller, more manageable pieces and
distribute them across multiple servers.
This technique is called sharding. Here is how it works. We divide the database into smaller parts called shards. Each shard contains a subset of the total data. Data is distributed based on a sharding key, for example, user ID. By distributing data this way, we reduce database load, since each shard handles only a portion of queries, and speed up read and write performance, since queries are distributed across multiple shards instead of hitting a single database. Sharding is also referred to as horizontal partitioning, since it splits data by rows.
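A common way to pick a shard is to hash the sharding key and take it modulo the number of shards, as in this sketch. The shard count is illustrative, and real systems often use consistent hashing instead, so that adding shards doesn't reshuffle all the data.

```python
import hashlib

NUM_SHARDS = 4  # each shard is a separate database server

def shard_for(user_id: str) -> int:
    # Hash the sharding key so users spread evenly across shards.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("user_12345"))  # always routes this user to the same shard
```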
But what if the issue isn't the number of rows but rather the number of columns? In such cases, we use vertical
partitioning where we split the database
by columns. Imagine we have a user table
that stores profile details, login
history, and billing information. As
this table grows, queries become slower
because the database must scan many columns
even when a request only needs a few
specific fields. To optimize this, we
use vertical partitioning where we split
user table into smaller, more focused
tables based on usage patterns. This
improves query performance since each
request only scans relevant columns
instead of the entire table. It also
reduces unnecessary disk I/O, making data
retrieval quicker. However, no matter
how much we optimize the database,
retrieving data from disk is always
slower than retrieving it from memory.
What if we could store frequently accessed
data in memory? This is called caching.
Caching is used to optimize the
performance of a system by storing
frequently accessed data in memory instead of repeatedly fetching it from the database.
One of the most common caching
strategies is the cache-aside pattern. Here is how it works. When a user requests data, the application first checks the cache. If the data is in the cache, it's returned instantly, avoiding a database call. If the data is not in the cache, the application retrieves it from the database, stores it in the cache for future requests, and returns it to the user. Next time the same data is requested, it's served directly from the cache, making the request much faster. To prevent outdated data from being served, we use a time-to-live (TTL) value.
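In code, cache-aside is just a check-then-fill around the database call. Here is a sketch with a plain dict standing in for a real cache like Redis; TTL handling is omitted for brevity, and `db.fetch_user` is hypothetical.

```python
cache = {}  # stand-in for a real cache such as Redis

def get_user(user_id, db):
    # 1. Check the cache first.
    if user_id in cache:
        return cache[user_id]      # cache hit: no database call
    # 2. On a miss, fetch from the database.
    user = db.fetch_user(user_id)
    # 3. Store it in the cache for future requests (a real cache
    #    would also set a TTL here) and return it.
    cache[user_id] = user
    return user
```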
Let's look at the next database scaling technique.
Most relational databases use
normalization to store data efficiently
by breaking it into separate tables.
While this reduces redundancy, it also
introduces joins. When retrieving data
from multiple tables, the database must combine them using join operations,
which can slow down queries as the data
set grows. Denormalization reduces the
number of joins by combining related
data into a single table, even if it
means some data gets duplicated. For
example, instead of keeping users and orders in separate tables, we create a user orders table that stores user
details along with the latest orders.
Now, when retrieving a user's order
history, we don't need a join operation.
The data is already stored together
leading to faster queries and better
read performance. Denormalization is
often used in read-heavy applications where speed is critical. But the downside is that it leads to increased storage and more complex update requests.
As we scale our system across multiple
servers, databases and data centers, we
enter the world of distributed systems.
One of the fundamental principles of
distributed systems is the CAP theorem,
which states that no distributed system
can achieve all three of the following
at the same time. Consistency,
availability, and partition tolerance.
Since network failures are inevitable,
we must choose between consistency plus
partition tolerance or availability plus
partition tolerance. If you want to
learn about the CAP theorem in more detail, you can check out the article on my blog called CAP Theorem Explained. Most
modern applications don't just store
text records. They also need to handle
images, videos, PDFs, and other large
files. Traditional databases are not
designed to store large unstructured
files efficiently. So, what's the
solution? We use blob storage like
Amazon S3. Blobs are individual files, such as images, videos, or documents.
These blobs are stored inside logical
containers or buckets in the cloud. Each
file gets a unique URL making it easy to
retrieve and serve over the web. There are several advantages to using blob storage: scalability, a pay-as-you-go model, automatic replication, and easy access. A common use case is to stream
audio or video files to user
applications in real time. But streaming
the video file directly from blob
storage can be slow, especially if the
data is stored in a distant location.
For example, imagine you are in India
trying to watch a YouTube video that's
hosted on a server in California. Since
the video data has to travel across the
world, this could lead to buffering and
slow load times. A content delivery
network or CDN solves this problem by
delivering content faster to users based
on their location. A CDN is a global
network of distributed servers that work
together to deliver web content like
HTML pages, JavaScript files, images,
and videos to users based on their
geographic location. Since content is
served from the closest CDN server,
users experience faster load times with
minimal buffering. Let's move to the
next system design concept which can
help us build real-time applications.
Most web applications use HTTP which
follows a request-response model. The
client sends a request. The server
processes the request and sends a
response. If the client needs new data,
it must send another request. This works
fine for static web pages but it's too
slow and inefficient for real-time
applications like live chat
applications, stock market dashboards or
online multiplayer games. With HTTP, the
only way to get real-time updates is through frequent polling, sending repeated requests every few seconds. But polling is inefficient because it increases server load and wastes bandwidth, as most responses are empty when there is no new data. WebSockets solve this problem by allowing
continuous two-way communication between
the client and the server over a single
persistent connection. The client
initiates a websocket connection with
the server. Once established, the
connection remains open. The server can
push updates to the client at any time
without waiting for a request. The
client can also send messages instantly
to the server. This enables real-time
interactions and eliminates the need for polling.
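Here is a minimal sketch of a WebSocket server that echoes messages back, using recent versions of the third-party websockets package; the port is arbitrary.

```python
import asyncio
import websockets

async def handler(ws):
    # The connection stays open; messages can flow both ways at any time.
    async for message in ws:
        await ws.send(f"server got: {message}")  # push a reply instantly

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()  # keep the server running

asyncio.run(main())
```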
WebSockets enable real-time communication between a client and a server. But what if a server needs to
notify another server when an event
occurs? For example, when a user makes a
payment, the payment gateway needs to
notify your application instantly.
Instead of constantly polling an API to check if an event has occurred, webhooks allow a server to send an HTTP
request to another server as soon as the
event occurs. Here is how it works. The
receiver, for example, your app, registers a webhook URL with the provider. When an event occurs, the provider sends an HTTP POST request to the webhook URL with the event details.
This saves server resources and reduces unnecessary API calls.
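On the receiving side, a webhook is just an HTTP endpoint the provider can POST to. A sketch with Flask; the route and payload fields are hypothetical.

```python
from flask import Flask, request

app = Flask(__name__)

@app.post("/webhooks/payment")  # the webhook URL you register with the provider
def payment_webhook():
    event = request.get_json()  # event details sent by the provider
    # React to the event, e.g. mark the order as paid.
    print("payment event received:", event)
    return "", 200  # acknowledge receipt quickly
```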
Traditionally, applications were built using a monolithic architecture, where all features live inside one large codebase.
This setup works fine for small
applications but for large scale systems
monoliths become hard to manage, scale
and deploy. The solution is to break
down your application into smaller
independent services called
microservices that work together. Each microservice handles a single responsibility, has its own database and logic so it can scale independently, and communicates with other microservices using APIs or message queues. This way, services can be scaled and deployed individually without affecting the entire system. However,
when multiple microservices need to
communicate, direct API calls aren't
always efficient. This is where message queues come in. Synchronous communication, for example, waiting for immediate responses, doesn't scale well. A message queue enables services to communicate
asynchronously, allowing requests to be
processed without blocking other
operations. Here is how it works. There
is a producer, which places a message in the queue. The queue temporarily holds the message. The consumer retrieves the
message and processes it. Using message queues, we can decouple services, improve scalability, and prevent overload on internal services within our system.
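Here is a toy producer/consumer sketch using Python's standard-library queue; a real system would use a broker such as RabbitMQ or Kafka, but the shape is the same.

```python
import queue
import threading

q = queue.Queue()  # stand-in for a message broker

def producer():
    for i in range(3):
        q.put(f"task-{i}")   # place a message in the queue and move on

def consumer():
    while True:
        message = q.get()    # retrieve the next message
        print("processing", message)
        q.task_done()

threading.Thread(target=consumer, daemon=True).start()
producer()  # the producer never blocks waiting for the consumer
q.join()    # wait until every message has been processed
```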
But how do we prevent overload for the public APIs and services that we deploy? For that, we use
rate limiting. Imagine a bot starts
making thousands of requests per second
to your website. Without restrictions,
this could crash your servers by
consuming all available resources and
degrade performance for legitimate
users. Rate limiting restricts the
number of requests a client can send
within a specific time frame. Every user
or IP address is assigned a request quota, for example, 100 requests per
minute. If they exceed this limit, the
server blocks additional requests
temporarily and returns an error. There
are various rate limiting algorithms.
Some of the popular ones are fixed window, sliding window, and token bucket.
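As an example, here is a sketch of a token bucket: tokens refill at a fixed rate, and each request spends one. The capacity and rate are illustrative.

```python
import time

class TokenBucket:
    def __init__(self, capacity=100, refill_per_sec=100 / 60):  # ~100 requests/minute
        self.capacity = capacity
        self.tokens = capacity
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on elapsed time, up to capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend a token for this request
            return True
        return False          # over the limit: reject with an error
```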
We don't need to implement our own rate limiting system; this can be handled by something called an API gateway. An API
gateway is a centralized service that
handles authentication, rate limiting,
logging, monitoring, request routing,
and much more. Imagine a
microservices-based application with
multiple services. Instead of exposing
each service directly, an API gateway
acts as a single entry point for all client requests. It routes each request to the appropriate microservice, and the response is sent back through the gateway to the client. An API gateway simplifies API management and improves scalability and security. In
distributed systems, network failures
and service retries are common. If a
user accidentally refreshes a payment
page, the system might receive two
payment request instead of one. Adam
potency ensures that repeated request
produced the same result as if the
request was made only once. Here is how
it works. Each request is assigned a
unique ID. Before processing, the system
checks if the request has already been
handled. If yes, it ignores the
duplicate request. If no, it processes the request normally.
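Here is a sketch of that check; the store is an in-memory dict for illustration, while a real system would use a durable store shared across servers.

```python
processed = {}  # idempotency key -> saved result

def handle_payment(idempotency_key, charge):
    # Before processing, check if this request was already handled.
    if idempotency_key in processed:
        return processed[idempotency_key]  # duplicate: same result, no double charge
    result = charge()                      # process the request normally
    processed[idempotency_key] = result    # remember the outcome
    return result
```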
If you enjoyed this video, I think you will love my
weekly newsletter where I dive deeper
into system design concepts with real
world examples. I also share articles on
system design interview questions and
tips to help you prepare for interviews.
You can subscribe at blog.algomaster.io. Thanks for watching, and I will see you in the next one.