Latency Numbers Programmer Should Know: Crash Course System Design #1
By ByteByteGo
Summary
## Key takeaways

- **Relative Magnitudes Trump Exact Numbers**: It is not critical to know the exact numbers. Developing a sense of the relative orders of magnitude difference between them is far more important. [00:15], [00:18]
- **CPU Registers: Sub-Nanosecond Access**: Accessing CPU registers is sub-nanosecond. It is super fast, but there are very few of them. [01:22], [01:31]
- **Main Memory Hundreds of Times Slower Than Registers**: For a modern processor like the Apple M1, referencing main memory is at the slow end of the 10-100ns range, a few hundred times slower than CPU register access. [02:02], [02:17]
- **SSD Writes 10x Slower Than Reads**: SSD write latency is about 10 times slower than read latency, at the top end of its range, taking close to a millisecond to write a page. [04:06], [04:15]
- **Bcrypt Password Hash Takes 300ms**: It takes 300ms to bcrypt a password, slow enough to render brute-force password cracking ineffective. [05:24], [05:32]
- **1GB Cloud Transfer: 10 Seconds**: Transferring 1GB over the network within the same cloud region takes about 10 seconds. [05:59], [06:05]
Topics Covered
- Registers Beat Memory by Hundreds
- System Calls Cost Hundreds of Nanoseconds
- Context Switches Take Microseconds
- SSDs Write 10x Slower Than Read
- Bcrypt Blocks Brute Force Effectively
Full Transcript
In this video, we hope to develop an intuition for some of the common latency numbers.
They could be very useful in system design.
It is not critical to know the exact numbers.
Developing a sense of the relative orders of magnitude difference between these things is way more important.
Some of these numbers like disk seek time have changed drastically as technology evolves, while others like network latency between countries stay pretty consistent because they have to obey the laws of physics.
We updated some of these numbers to more closely reflect reality in the 2020s.
But again, absolute accuracy is not the goal.
Developing an intuition of the relative differences is.
Here’s what we plan to do.
We will group the latency numbers by order of magnitude, starting with sub-nanoseconds, all the way up to seconds.
To lay the groundwork, let’s get a sense of what these time units are first.
1 nanosecond is 1 billionth of a second.
1 microsecond is 1 millionth of a second.
1 millisecond is 1 thousandth of a second.
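These unit relationships are just repeated factors of 1000, which is worth internalizing; a trivial sketch:

```python
# The three units above, expressed as counts per second.
NS_PER_SECOND = 1_000_000_000  # nanoseconds: billionths of a second
US_PER_SECOND = 1_000_000      # microseconds: millionths of a second
MS_PER_SECOND = 1_000          # milliseconds: thousandths of a second

# Each step between adjacent units is a factor of 1000.
ns_per_us = NS_PER_SECOND // US_PER_SECOND
us_per_ms = US_PER_SECOND // MS_PER_SECOND
print(ns_per_us, us_per_ms)  # 1000 1000
```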
So, here we go.
Let’s dive right in.
At the top is the sub-nanosecond range.
Accessing CPU registers is sub-nanosecond.
It is super fast to access CPU registers, but there are very few of them.
A clock cycle of a modern CPU is also in the sub-nanosecond range.
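As a back-of-envelope check, a core clocked at an assumed 3.2 GHz (an illustrative figure, not one from the video) completes each cycle in a fraction of a nanosecond:

```python
# One clock cycle at an assumed 3.2 GHz clock speed.
frequency_hz = 3.2e9            # illustrative, not tied to a specific chip
cycle_ns = 1e9 / frequency_hz   # nanoseconds per cycle
print(f"one cycle is {cycle_ns} ns")  # well under a nanosecond
```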
In the 1 to 10 ns range we have L1 and L2 cache accesses.
Some expensive CPU operations are also in this range.
Something like a branch mispredict penalty could cost up to 20 CPU clock cycles, which is also in this range.
The next range is 10 to 100ns.
L3 cache access is usually at the fast end of this range.
For a modern processor like the Apple M1, referencing main memory is at the slow end of this range.
In other words, main memory access on a modern CPU is a few hundred times slower than CPU register access.
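Plugging in rough illustrative figures (assumed for the sake of the arithmetic, not measured) shows where "a few hundred times" comes from:

```python
# Illustrative latencies only: a sub-nanosecond register access
# versus a ~100 ns main-memory reference.
register_ns = 0.3   # assumed register access time
memory_ns = 100.0   # assumed main-memory reference time
ratio = memory_ns / register_ns
print(f"main memory is ~{ratio:.0f}x slower than a register")
```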
The next range is from 100 to 1000 nanoseconds, or 1 microsecond.
The most useful thing to know in this range is the cost of a system call.
On Linux, making a simple system call takes several hundred nanoseconds.
This is just the direct cost of the trap into the kernel and back; it does not include the work the system call itself performs.
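This trap cost can be ballparked from Python, with the caveat that interpreter overhead inflates the figure, so treat it as an upper bound:

```python
import os
import time

# Time a minimal system call in a loop. os.getppid() traps into the
# kernel on each call; the measured figure also includes Python's
# own call overhead.
N = 100_000
start = time.perf_counter_ns()
for _ in range(N):
    os.getppid()
per_call_ns = (time.perf_counter_ns() - start) / N
print(f"~{per_call_ns:.0f} ns per getppid() (incl. interpreter overhead)")
```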
It takes about 200ns to md5 hash a 64-bit number.
Next up, the 1 to 10 us range.
We’ve reached the level where things are about a thousand times slower than a CPU register access.
Context switching between Linux threads takes at least a few microseconds.
This is about the best-case scenario.
Depending on the workload, if the context switch involves bringing pages of data from memory for the new thread, it could take significantly longer.
To put it in perspective, copying 64KB from one main memory location to another also takes a few microseconds.
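A rough way to see the 64KB copy cost from Python, with the caveat that allocation and interpreter overhead inflate the number:

```python
import time

# Copy a 64 KB buffer repeatedly and report the mean per-copy time.
# bytearray(src) allocates a new buffer and copies all 64 KB each time.
src = bytes(64 * 1024)
N = 10_000
start = time.perf_counter_ns()
for _ in range(N):
    dst = bytearray(src)
per_copy_us = (time.perf_counter_ns() - start) / N / 1_000
print(f"~{per_copy_us:.2f} us per 64 KB copy")
```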
Next up is 10 to 100 microseconds.
At this level, things are slow enough that we can start to include some higher-level operations.
A network proxy like Nginx would take about 50 microseconds to process a typical HTTP request.
Reading 1MB of data sequentially from the main memory takes about 50 microseconds.
The read latency of an SSD is in this range, taking about 100 microseconds to read an 8K page.
The next range is 100 to 1000 microseconds or 1 millisecond.
This range has some interesting things.
The SSD write latency is about 10 times slower than the read latency, and it is at the top end of this range, taking close to a millisecond to write a page.
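A sketch of measuring this on your own machine; `os.fsync` forces the write to stable storage, and the result varies widely by device, filesystem, and write cache:

```python
import os
import tempfile
import time

# Write one 8 KB page and fsync it, so the timing includes waiting
# for the storage device to acknowledge durability.
page = os.urandom(8 * 1024)
fd, path = tempfile.mkstemp()
try:
    start = time.perf_counter_ns()
    os.write(fd, page)
    os.fsync(fd)
    elapsed_us = (time.perf_counter_ns() - start) / 1_000
finally:
    os.close(fd)
    os.remove(path)
print(f"write + fsync of an 8 KB page: ~{elapsed_us:.0f} us")
```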
Intra-zone network round trip for modern cloud providers takes a few hundred microseconds.
This is one of the numbers updated for the 2020s.
These days they trend closer to the fast end, some even clocked in at less than 100 microseconds.
A typical Memcached or Redis get operation takes about 1 millisecond as measured by the client.
This includes the network round trip mentioned above.
Next up, 1 to 10 ms.
The inter-zone network round trip of a modern cloud is in this range.
The seek time of the hard disk drive is about 5 milliseconds.
It takes time to move the disk arm.
The next range is 10 to 100 ms.
The network round trip between the US east and west coast, or between the US east coast and Europe, is in this range.
So is reading 1GB sequentially from main memory.
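The coast-to-coast figure has a hard physical floor: light in fiber travels at roughly two-thirds the speed of light in vacuum, about 200,000 km/s. With an assumed 4,000 km path:

```python
# A physical lower bound on cross-country latency. Both numbers are
# assumed ballparks: ~4,000 km coast to coast, light in fiber at
# ~200,000 km/s (about 2/3 of c).
distance_km = 4_000
fiber_km_per_s = 200_000
round_trip_ms = 2 * distance_km / fiber_km_per_s * 1_000
print(f"fiber round-trip floor: ~{round_trip_ms:.0f} ms")
```

Real round trips are higher than this floor because of routing detours, queuing, and switching, but they can never go below it, which is why these numbers stay stable over the decades.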
There are several interesting things in the 100 to 1000 ms range.
In one of our videos, we talked about using a slow hash function like bcrypt to hash passwords.
It takes 300ms to bcrypt a password.
It is slow enough to render brute force password cracking ineffective.
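bcrypt itself requires a third-party package, so as a standard-library stand-in this sketch uses PBKDF2, which embodies the same idea: a deliberately slow, tunable hash. The iteration count here is an assumption; raise it until one hash costs a few hundred milliseconds on your hardware:

```python
import hashlib
import os
import time

# PBKDF2 as a stdlib illustration of a slow, tunable password hash.
# The iteration count is an assumed starting point, not a recommendation.
password = b"correct horse battery staple"
salt = os.urandom(16)
iterations = 600_000

start = time.perf_counter()
digest = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
elapsed_ms = (time.perf_counter() - start) * 1_000
print(f"PBKDF2 ({iterations} iters): {len(digest)}-byte hash in ~{elapsed_ms:.0f} ms")
```

The point in both cases is the same: a cost that is negligible for one login attempt becomes prohibitive across billions of brute-force guesses.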
TLS handshake is typically in the 250ms to 500ms range.
It adds several network round trips, so the number depends on the distance between the machines.
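A back-of-envelope breakdown, assuming a 100ms round trip on a long-haul link and a classic TLS 1.2 handshake (TLS 1.3 cuts one round trip):

```python
# Connection setup cost in round trips: the TCP handshake costs ~1,
# and a full TLS 1.2 handshake ~2 more. The RTT is an assumed figure.
rtt_ms = 100
tcp_rtts = 1
tls12_rtts = 2
setup_ms = (tcp_rtts + tls12_rtts) * rtt_ms
print(f"~{setup_ms} ms before the first encrypted application byte")
```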
The network round trip between the US west coast and Singapore is in this range.
Reading 1GB sequentially from an SSD is also in this range.
Lastly, here’s an example of something taking over a second.
Transferring 1GB over the network within the same cloud region takes about 10 seconds.
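That figure implies a sustained rate of roughly 100 MB/s, or just under a gigabit per second:

```python
# Throughput implied by "1 GB in about 10 seconds".
gigabyte_bytes = 1_000_000_000
seconds = 10
mb_per_s = gigabyte_bytes / seconds / 1_000_000
gbit_per_s = gigabyte_bytes * 8 / seconds / 1e9
print(f"~{mb_per_s:.0f} MB/s, ~{gbit_per_s:.1f} Gbit/s")
```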
That’s all about latency numbers.
We hope you find them useful.
If you would like to learn more about system design, check out our books and weekly newsletter.
Please subscribe if you learned something new.
Thank you so much, and we’ll see you next time.