Caching in System Design Interviews w/ Meta Staff Engineer
By Hello Interview - SWE Interview Preparation
Summary
## Key takeaways
- **Cache Speed Disparity**: Accessing data from disk like an SSD in a database takes about a millisecond on average, while accessing from RAM takes about 100 nanoseconds, roughly 10,000 times faster, and this gap adds up quickly when serving thousands of requests per second. [01:17], [01:25]
- **External vs In-Process Caching**: External caching uses a dedicated service like Redis or Memcached on its own server, allowing multiple application servers to share it and avoid redundant database hits, while in-process caching stores data directly in the application's memory for the fastest access without network hops but lacks sharing across servers, leading to potential inconsistencies. [02:03], [03:18]
- **Cache-Aside Architecture Default**: In cache-aside, the application checks the cache first; on a hit, it returns data instantly, and on a miss, it fetches from the database, stores a copy in the cache, and returns it, keeping the cache lean by only storing requested data but adding latency on misses. [09:08], [09:31]
- **Thundering Herd Problem**: A cache stampede or thundering herd occurs when a popular cache entry expires, causing a flood of requests to simultaneously rebuild it and overwhelm the database, as in a homepage feed with 100,000 requests per second expiring after 60 seconds; prevent it with request coalescing where only one request rebuilds while others wait, or cache warming by proactively refreshing just before expiration. [18:04], [19:14]
- **Cache Consistency Strategies**: Cache consistency issues arise when the cache holds stale data after database updates, like an old profile picture served despite a recent change; mitigate by invalidating the cache key on write to force a fresh fetch on the next read, using short TTLs for frequently changing data, or accepting eventual consistency for non-critical items like feeds with a 5-minute TTL. [20:10], [21:20]
- **Handling Hot Keys**: Hot keys like Taylor Swift's profile on Twitter receive millions of requests per second, bottlenecking a single cache node despite good overall hit rates; solve by replicating the hot key across multiple cache shards for load balancing or using local in-process caching as a fallback to avoid hitting the external cache repeatedly. [23:00], [24:00]
Topics Covered
- Why cache memory over disk?
- Does in-process caching beat external?
- Cache-aside simplifies writes?
- Hotkeys bottleneck even caches?
- Justify cache before adding it?
Full Transcript
Hey everyone, welcome back to the channel.
For those of you who are new, I'm Evan.
I'm a former Meta staff engineer and the current co-founder of HelloInterview.com.
If you're preparing for software interviews, head over to Hello Interview.
We've got everything you need, the overwhelming majority of which is free.
Uh, in this particular video though, we're going to be covering the basics of caching and specifically in the context of system design interviews.
And so, we'll look at where caching fits into a system.
We'll talk about the most common caching architectures, typical eviction policies you'll want to understand.
We're also going to talk about the kinds of issues that show up when you introduce caching.
These are things like consistency, stampedes, and hot keys, the things that interviewers really love to dig into and that you'll want to be prepared for.
And then finally, we'll go over how to talk about caching in an interview, when to bring it up, how much depth you should go into, and the things that interviewers are usually looking for. So, this should be fun.
Without further ado, let's get after it.
Chances are you already have a pretty decent idea of what caching is.
Uh but let's go ahead and start from just the very basics really quickly to make sure that everybody's on the same page.
And so a cache is quite simply just a temporary storage that keeps recently used data handy and close by so that you can fetch it faster the next time.
So to see why this matters, let's take a look at an example here and consider the difference in speed between where data usually lives in a database and where it can live in a cache.
And so accessing data from disk like an SSD in the case of a database takes about a millisecond on average.
Accessing data from memory or RAM, on the other hand, takes about 100 nanoseconds. This is roughly 10,000 times faster. Now that gap adds up really quickly when you're serving thousands of requests per second.
And caching takes advantage of that big difference.
It keeps copies of frequently used data in a faster layer, oftentimes memory, but not always.
We'll talk about that later on so that systems don't have to reach all the way back into that slower source every single time.
So you have the basic idea.
Caching trades a bit of storage and complexity for speed. Now the next question is: where should you cache your data?
And there's a few different layers in your system where caching can live.
Each of which have their own set of trade-offs of course. And so the first one and by far the most common in system design interviews is called external caching.
This is where you introduce a dedicated caching service like Redis or Memcached.
It runs importantly on its own server and manages its own memory and it's totally separate from your application or your database, right?
It's your own component in the system here.
And so when your application needs data, it first checks the cache.
If the data is found there, that's a cache hit and it returns your data instantly, super fast.
If it's not there, we call that a cache miss and it has to fall back to the database, fetch the data and it stores a copy of that data back in the cache and also returns it back to the client.
Right? Uh now the nice thing here is that in a scaled system which might have multiple application servers like we represented here, all of these different application servers can share that same external cache. This way once one server has fetched and cached the data, the others can all reuse it instantly instead of all hitting the database separately, right?
because this is a global view. It's a global cache that's shared by all of the different application servers.
Now the second option is what we call in-process caching, and this lets you skip the complexity of adding something like Redis entirely.
Uh, it's often overlooked, to be honest, and I think in real system design outside of the context of interviews it's probably overlooked more than it should be, but it can be incredibly effective.
Now the key thing to note here is that modern application servers usually run on really big machines nowadays that have plenty of memory and you can actually use some of that memory in order to cache data right inside of the process.
Uh this is important because it's by far the fastest kind of caching, right?
You don't have to go and have an expensive network hop here to hit some external cache anymore. The data is already sitting in the same memory space as your application, right? So you don't have that expensive network hop.
It's already right there where you want it.
But of course, this comes with trade-offs.
And the main trade-off is that unlike the external cache, each application server has its own in-process memory.
And so this means that if one server caches something, the others won't see it. So you can end up with these inconsistencies or even wasted memory if you're not too careful.
And so the chances are at least in the context of a system design interview that you probably won't need to bring this up unless you're talking about a low-level optimization or have a use case where ultra low latency matters.
For example, if you need to cache config data or small lookup tables that every single request depends upon, then caching within the application server makes sense.
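For example, in Python, something as simple as the standard library's `functools.lru_cache` gives you a minimal in-process cache for this kind of small, hot lookup data; the `load_config_from_db` helper below is hypothetical, just to illustrate the pattern.

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def get_feature_config(config_key: str) -> dict:
    # The first call per key hits the database; subsequent calls in this
    # process are served straight from memory, with no network hop at all.
    return load_config_from_db(config_key)  # hypothetical database read
```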
But your default should remain as external caching.
Next up, we have what are called CDNs or content delivery networks.
And so a CDN is a geographically distributed network of servers that can cache content closer to your users.
And I know that sounds fancier than it really is.
It's just putting servers around the world so that they're close to the people who need them. Here, we're not optimizing for the difference between memory and disk speeds like we were before.
Instead, we're optimizing for network latency.
So without a CDN, every single request has to travel all the way to your origin server. If you had a server in Virginia, like was the case here, think that this is S3 or some blob storage, right?
And your user is all the way over here in Australia, then this round trip could take 300 to 350 milliseconds.
That's huge, especially when we're talking about disk access being just 1 millisecond. With a CDN on the other hand, that same request might hit an edge server that's just a few miles away, which may be 20 to 40 milliseconds roundtrip, which is a huge difference.
Now, the way that it works is that when a user requests something like an image, that request goes to the nearest CDN edge server, like we just said.
And if that image is already cached there, it's returned immediately.
Perfect. That's the happy case.
That's the cache hit. If not, if it's a cache miss, then the CDN itself goes and fetches that media or whatever you're looking for from uh your origin server, like S3 or whatever blob storage you have, and the origin returns it back to the CDN. The CDN will then cache it so that it has it for next time and return it back to the client.
Now, modern CDNs, and this is something people oftentimes forget, they can do a lot more than just cache static media, which is what they're most known for.
They can also cache public API responses, of course, HTML pages, run edge logic even to personalize content.
But as far as a system design interview goes, the most common and the most impactful use case to bring up is on media delivery.
So things like images, videos, or static assets, files, etc. that you want to load really quickly around the world. And so if you have global users who are accessing media regularly, then a CDN is probably a great fit for you.
Lastly, we have what's called client side caching.
So, this is when data is stored directly on the user's device, either in the browser or the app, which avoids unnecessary network calls.
So, in web apps, that might be something like the HTTP cache or local storage within the browser itself. For mobile apps, native mobile apps, this could be data kept in memory or even written to the local disk on device. And it's nice because it's obviously super fast.
the data never leaves the device, but it comes with a downside, of course, and that's that you have less control over it.
Data can go stale, and invalidation, freshness, all of that is a bit harder.
And so, when it comes to your interview, you'll see this come up a lot less often.
Um, usually it's only relevant when your system involves some offline functionality or client heavy workloads.
For example, if a browser is reusing images it already downloaded or an app like Strava caching your run data locally while you're offline and then syncing it once you're reconnected.
We have a problem breakdown where we do exactly that.
But for all intents and purposes, this is the least important for you to know as it pertains to your system design interviews.
Really quick before we continue, let me tell you a little bit about the most popular feature on hellointerview.com, guided practice.
It's an interactive tool that lets you practice system design interviews step by step using the Hello Interview delivery framework.
You're going to walk through everything from the non-functional requirements to the core entities, API routes, all the way through to your high-level design and deep dives, drawing on the whiteboard and narrating your response, all while getting real-time feedback on what you're doing well and where you can improve by a model that Stephan and I have spent hundreds of hours tuning.
We've expanded the library to 25 of the most common system design interview questions now, and we're constantly adding more.
So, candidates absolutely love this feature. I think you will, too.
Check it out at hellointerview.com.
Link will be in the description.
Now that we've covered where you can cache your data, let's talk a little about cache architectures.
Cache architectures just define how your application actually interacts with the cache.
And so specifically, this is defining the order in which reads and writes happen between your cache, your database, and your application service. And so the most common caching pattern by far, and this is the one that you should default to in your interview, uh, is cache-aside.
And so with cache-aside, what happens is that the application checks the cache first.
This is the one we brought up earlier, right?
If the data is there, it returns it.
That's a cache hit. Great.
If not, it goes and fetches it from the database, and then it stores it in the cache and returns it to the user. And so cache-aside is great because it keeps the cache really lean. You only cache data when you actually need it, right?
We're only caching the things that users actually requested.
If a user never requested anything, it never made it into the cache. But the downside is that a cache miss is going to add that latency since the request then has to go hit the database, request the data, the more expensive operation, store it in the cache, and return it back to the user.
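To make that concrete, here's a minimal cache-aside read path sketched in Python. It assumes a Redis-style client (the redis-py API) and a hypothetical `fetch_user_from_db` helper; the names and TTL are just for illustration.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def get_user(user_id: str) -> dict:
    key = f"user:{user_id}"

    # 1. Check the cache first.
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)            # cache hit: return immediately

    # 2. Cache miss: fall back to the database.
    user = fetch_user_from_db(user_id)       # hypothetical database read

    # 3. Store a copy in the cache with a TTL, then return it.
    r.set(key, json.dumps(user), ex=300)     # expire after 5 minutes
    return user
```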
So, if you're only going to remember one caching architecture from this video, make it cache-aside.
This is the one you're probably going to use in the interview, but it's important for us to talk about some of the other ones as well.
Starting with write-through caching.
And so with write-through, the application actually writes directly to the cache first, and then the cache synchronously writes that data to the database before it returns to the user.
And so the write isn't considered complete until both the cache and the database have been updated.
In practice, this means that you need a caching library or a framework for this part right here that supports this write-through behavior,
something that knows to trigger your database write logic automatically, because tools like Redis or Memcached don't natively support this.
And so you need to either handle this logic yourself in your application code by writing to both at the same time, and there's all sorts of complications that come from that, or you'll want to use a library, something like Spring Cache or Hazelcast, which can automatically do that write-through for you.
Um, but the trade-off here becomes pretty obvious and that's that you have slower writes because you need to wait for both of these to happen.
Um, and then also you pollute your cache with all this data that may never be read again.
If we write everything to our cache, then it might be data that nobody ever actually accesses, and we're just bloating our cache for no reason, unlike cache-aside that we talked about a moment ago, right?
Um, and then write-through also suffers from what's called the dual-write problem.
This would be that if the cache update succeeds but the database write fails, or vice versa, then the two enter an inconsistent state, and so you would need fancy retry logic, error handling, all of these things to deal with it. But in a distributed system, this perfect consistency is incredibly hard to achieve. So there's all sorts of complications here.
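If it helps to picture it, a rough sketch of handling that write-through logic yourself in application code might look like this; `save_user_to_db` is a hypothetical helper, and a real version needs the retry and error handling just discussed.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def update_user(user_id: str, user: dict) -> None:
    key = f"user:{user_id}"

    # 1. Write to the cache first.
    r.set(key, json.dumps(user))

    # 2. Synchronously write to the database before acknowledging the write.
    #    If this call fails, the cache and database now disagree -- the
    #    dual-write problem -- so real code needs retries or compensation.
    save_user_to_db(user_id, user)           # hypothetical database write
```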
Now what should you do in a system design interview?
Well, the reality is that write-through is much less common than cache-aside because it requires all this specialized infrastructure and has all these tricky edge cases around consistency.
And so you'd really only bring it up when reads must always return the fresh data and your system can tolerate some slightly slower writes.
If that's the case, this could be a good fit. But if your mind is going in this direction, really be sure that you can convince yourself that cache-aside or some other part of your design doesn't already satisfy your use case.
Next is write-behind caching, also sometimes called write-back.
And it's really similar to write-through, which we just discussed, but instead of updating the database synchronously, the cache writes to the database asynchronously in the background.
And so the application only writes to the cache, just like we did for write-through, but then the cache flushes those updates to the database, usually in batches later on. And so this makes writes much faster than write-through.
We solve that problem, but it introduces new risk. And that's that if the cache were to crash or fail before this flush, then we would have data loss.
You would use this only when high write throughput is more important than immediate consistency or close to immediate consistency.
And so, for example, something like analytics or metric pipelines where some occasional data loss might be acceptable.
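As a rough, single-process illustration only (with the caveat below that you probably shouldn't reach for this pattern without strong justification), a write-behind flow might buffer writes in memory and flush them in batches; `bulk_write_to_db` is a hypothetical helper.

```python
import threading
import time

pending: dict[str, dict] = {}        # buffered writes waiting to be flushed
pending_lock = threading.Lock()

def write(key: str, value: dict) -> None:
    # The application only writes to the in-memory buffer and returns fast.
    with pending_lock:
        pending[key] = value

def flush_loop(interval_seconds: int = 5) -> None:
    # A background thread flushes batched updates to the database. If the
    # process dies before a flush, the buffered writes are lost -- that's
    # the data-loss risk mentioned above.
    while True:
        time.sleep(interval_seconds)
        with pending_lock:
            batch = dict(pending)
            pending.clear()
        if batch:
            bulk_write_to_db(batch)  # hypothetical batched database write

threading.Thread(target=flush_loop, daemon=True).start()
```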
But here's the reality. While this pattern is useful to know and if you are an expert in caching and you have a use case for it, by all means use it.
If you're a novice, I wouldn't suggest using this unless you can strongly justify it.
There are other ways to solve the problems that we just discussed.
Um, and you're probably opening an opportunity for more questions from your interviewer than you would like. So, my honest suggestion to you is to probably avoid it.
The last one we'll discuss is read-through caching.
It's incredibly similar to cache aside, the very first one we talked about, except that the cache handles the database lookup instead of the application.
And so on a cache miss whereas before the application server went and read from the database and updated the cache and returned the value.
Now the cache itself is going to do that.
So we try to read from the cache; if we miss, the cache goes and requests the data from the database, stores it in the cache, and returns it back to the application server.
You can basically think about it like cache-aside but with the cache acting as a proxy.
And this is essentially how CDNs work, right, as we discussed a moment ago.
When you have a CDN miss, it fetches from the origin server, caches the result, and then serves it the next time.
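One way to picture the difference from cache-aside is that the cache component owns the database lookup. A tiny in-memory sketch, with the loader function as a hypothetical stand-in for your database read:

```python
from typing import Callable

class ReadThroughCache:
    """The cache owns the database lookup, not the application."""

    def __init__(self, loader: Callable[[str], dict]):
        self.loader = loader               # function that reads from the database
        self.store: dict[str, dict] = {}

    def get(self, key: str) -> dict:
        if key in self.store:
            return self.store[key]         # cache hit
        value = self.loader(key)           # cache miss: the cache fetches it
        self.store[key] = value
        return value

# The application only ever talks to the cache:
# cache = ReadThroughCache(loader=fetch_user_from_db)  # hypothetical loader
# user = cache.get("user:123")
```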
Um, now, as far as system design interviews go, again, you'd really only bring this up in the context of CDNs or edge caching for most application level caching.
Cache-aside is still the best default because it doesn't require a special framework or caching library to handle this part right here, right?
So, you can just use Redis, Memcached, something simple, um, and not have to have some adapter right there.
So, zooming out now. Let's see if we can fit all four of them in there. There we go.
We've got cache-aside, write-through, write-behind, and read-through. And if you're anything like me, you're probably thinking, "These names are super confusing.
How am I supposed to remember all of them?" And I have good news for you.
You don't have to.
Interviewers don't care if you remember these exact names or exact terms. What matters is just that you can describe the behavior clearly.
And so, if you forget cache aside, no problem. Just say, "I'll check the cache first, and if it's not there, I'll go to the database and then update the cache.
" That's really all we as interviewers are looking for.
And so it's always better to show that you understand how caching works than to just be able to memorize any of these terms. So don't overwhelm yourself.
Understand how cache-aside works.
Most importantly, be able to describe it.
Don't stress yourself on the naming.
Next up, let's talk a little bit about cache eviction policies. So one trade-off of storing data in memory instead of on disk is that memory is limited.
You can't fit your whole data set in the cache.
At least you usually can't.
And so you need a strategy for deciding what to keep and what to remove as new data comes in.
And that's exactly what your eviction policies do.
They determine which items stay in the cache and which ones get replaced when it fills up.
And so cache eviction policies are really straightforward.
And so we're not going to make them any more complicated than they need to be. There are four that you should know about, three of which you might actually use in an interview.
One is just good to know because of its simplicity.
And so first we have least recently used or LRU. And just like the name suggests, this eviction policy evicts items that haven't been used very recently.
In practice, it's often implemented with a linked list or maybe a priority queue that tracks the access order, but you're rarely going to need to discuss that level of detail in a system design interview.
Implementation for eviction policies is almost always out of scope, right?
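(For the curious only, since implementation really is out of scope in the interview: a toy LRU can be sketched with Python's `OrderedDict`, which keeps items in order and makes "move to most recently used" a one-liner.)

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None                      # miss
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used
```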
And so the second one that you have here is least frequently used, or LFU.
It's a really similar idea, but instead of evicting based on recency, it's based on how often something is accessed.
And so the least frequently used items are evicted first, even if they were used recently, right?
So an item that was used just once, even if that one use was a second ago, is going to get evicted because it was only used one time compared to everything else that might have been accessed more frequently.
Um, third is first in first out.
This is that simple one, right?
It's exactly what it sounds like: your oldest item gets removed to make space for your newest item.
It's dead simple, and it's rarely the right choice in a system design interview.
Um, and then lastly, you have time to live or TTL.
And so here, each cached item has an expiration time. And once that time passes, say like 5 minutes, then the cache will automatically remove it.
This is great for data that can go stale like user sessions or API responses or any of that.
And so in system design interviews, least recently used is the most common and oftentimes the default.
Um, least frequently used makes sense when your access pattern is highly skewed, meaning there's a few items that are read way more often than others.
And then TTL is super common as it's perfect for when freshness really matters more than recency or frequency.
So far, caching sounds pretty straightforward.
We store data in a faster layer.
We read it when we can, and everything just gets quicker.
But in practice, adding a cache introduces a whole new set of challenges.
And they're challenges that interviewers love to ask about.
And so there's this famous saying in computer science that some of you might have heard, that there are only two hard problems in computer science:
naming things and cache invalidation.
And that second one exists for a reason.
Once you start caching, new problems start to show up. Things like uneven loads, stale data, or unexpected spikes in traffic.
And so we're going to go through a few of the most common issues that you should be ready to talk about in an interview, the ones that interviewers will be the most likely to probe into. And so let's start with what's called a cache stampede, oftentimes referred to as a thundering herd.
And so this happens when a popular cache entry expires via that TTL that we discussed a moment ago.
And suddenly a flood of requests all try to rebuild that cache at the same time.
And so even if that window lasts just a second, every single one of those cache misses is going to hit our database, turning one query into thousands or even millions and ultimately overwhelming the database.
And so let me give a concrete example.
Imagine that you have some website for which you cache the homepage feed with a TTL of 60 seconds, right?
You don't want to cache it too long because you don't want it to get too stale.
Um so 60 seconds is what you choose.
And then we get 100,000 requests every second. And so all 100,000 requests, because all users need that home feed, hit the cache, and everything works great. But after 60 seconds, it expires. And what happens when it expires, in the case of cache-aside, is that we then go ask for it from the database and update the cache.
But in that moment, 100,000 requests come into the cache, they all miss, and then they all go try to hit the database, and it could take the database down, overwhelm it, and cause cascading failures.
Now, there's two common ways that you can prevent this.
The first is what's called request coalescing or single flight.
Both fairly fancy names, but the idea is actually really simple.
When multiple requests try to rebuild the same cache key, only the first one should do the work, and the rest should just wait for the result to come in and then read from the cache. So that's the first, most common way to handle it.
The second, which is actually maybe equally as common, I'll contradict myself, is called cache warming. And the way that this works is that instead of waiting for popular keys to expire, waiting that full 60 seconds, you can proactively refresh them just before they do.
And so say at the 55 second mark, we could come in here and refresh the feed.
Um, thus giving it another 60 seconds and essentially preventing it from ever actually expiring, right? We just keep refreshing it every 55 seconds so that it remains fresh and not stale.
Um, but it never expires and causes this thundering herd.
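A minimal, single-process sketch of that request coalescing idea using one lock per key; a real system would typically coalesce across servers with a distributed lock or a library, and `rebuild_home_feed` is a hypothetical expensive database query.

```python
import threading
from collections import defaultdict

cache: dict[str, object] = {}
key_locks: defaultdict = defaultdict(threading.Lock)   # one lock per cache key

def get_with_coalescing(key: str):
    value = cache.get(key)
    if value is not None:
        return value                        # cache hit, the common case

    # Only one request per key does the rebuild; the others block on the
    # lock and then find the freshly rebuilt value already in the cache.
    with key_locks[key]:
        value = cache.get(key)              # re-check after acquiring the lock
        if value is None:
            value = rebuild_home_feed(key)  # hypothetical expensive DB query
            cache[key] = value
    return value
```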
Next up is cache consistency.
And this is probably the most common issue that interviewers like to ask about when caching comes up. And it happens because the cache and the database can return different values for the same data.
And it's really easy for this to occur, right?
Because most systems read from the cache, but they write to the database.
And so this creates this short window depending on your eviction policy where you can have stale data in the cache.
And let me give you a concrete example to make this click.
Imagine that you have some social network and a user updates their profile picture.
And so that new value is written to the database, but that old one is still sitting out there in the cache.
And so now other users who are requesting that profile are hitting the cache, and they're getting the old one, the original one, image one, despite the fact that we had already updated the database to image two. And they're going to continue to read this stale value until it ends up being evicted for some reason.
Right? Now, there's no perfect fix to this. It very much depends on how fresh your data needs to be, and that is a case-by-case basis, right?
But there are some common strategies you can reach for if your interviewer asks about this or if this ends up being an issue. And so, the first is to invalidate on write.
And so if consistency is really important here, then when that profile picture came in and we updated it in the database, then we could also go delete that key proactively from the cache. And this way the next time a read comes in, well, we would see that it's missing and we would go grab it from the database and update the cache.
Right? That would be the process of invalidating on write and it's going to make sure that you're reading the latest data for the most part.
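In code, invalidate-on-write is often just one extra step on the write path. A sketch assuming a Redis-style client and a hypothetical `save_profile_to_db` helper:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def update_profile_picture(user_id: str, image_url: str) -> None:
    # 1. Write the new value to the database, the source of truth.
    save_profile_to_db(user_id, image_url)  # hypothetical database write

    # 2. Delete the cached copy so the next read misses and repopulates
    #    the cache with the fresh value.
    r.delete(f"profile:{user_id}")
```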
You can also use just short TTLs.
And so if some staleness is acceptable, well, you can keep the cache entry here, but have a TTL that's really short.
This is like what we had a second ago, right?
So in the case of that news feed, we had a 60-second TTL, and maybe we keep this around for 60 seconds, then it'll be gone, and we'll go grab a new one because we know this is something that changes often.
It's potentially acceptable.
Alternatively, you might just accept that eventual consistency is fine.
And this is totally valid. Uh, this is for things like feeds, analytics, or metrics, where a brief delay is totally acceptable.
And so if I was discussing this specific problem that we have here, maybe I would say, you know, we have a five minute TTL on our profile data cached here.
And that's totally fine because what it means is that some users will see a stale profile image for five minutes and I'm okay with that because it's not the end of the world and it's fine if they see an old image for a while.
Life is still going to go on. Okay? Right?
Maybe that's something that I could justify based on the design and it would be a totally valid justification to make.
Another common issue which you've seen come up in many of the videos if you've watched any of our other content is called hotkeys.
And so a hot key is a cache entry that gets way more traffic than everything else. Even if your overall cache hit rate is great and everything's working as planned, that single key can still become a large bottleneck for your system. So, for example, imagine that you're building Twitter or X and everyone is viewing Taylor Swift's profile. That cache key for her user data could be receiving millions of requests per second, and that one key can overload a single Redis node or shard, even though that cache is technically working as expected, right? And so this of course isn't just a problem with caching.
You've heard us talk about this if you've watched our other videos in other contexts too. Databases can have hot rows or partitions as well, but it's a common follow-up question in interviews.
Once you introduce caching, especially if you introduced caching in order to scale your reads, right?
So, caching does increase your overall read throughput, you're now hitting memory as opposed to disk, but it doesn't completely solve the problem if there's one piece of data that is so overwhelmingly popular like with Taylor Swift here.
And so there's a couple of things that you can do. The first and most common solution is to just replicate these hot keys. And so if everybody is trying to read about Taylor Swift, well, you can put Taylor Swift on each of the different shards, or each of the different instances of the cache in your cache cluster, as you've scaled up.
Uh, if what I'm talking about here with sharding and clustering is not familiar to you, I'm going to link a video on sharding below.
But this is basically the concept that a single cache isn't enough,
so we needed to add additional instances of caches and shard our data across them,
meaning split our data across them.
Okay, so now back to the issue at hand.
Replicating hot keys, right, means that we'll take Taylor Swift and put her on each of these different caches, and then the application server can just load balance evenly amongst all of them.
So instead of all the traffic going to this cache or this node, it could hit any of the three here, right?
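One common way to implement that replication is to write the hot value under several suffixed keys, which typically hash to different nodes in a sharded cluster, and then read from a random copy. A sketch, where `cache` stands in for any client with `get`/`set`, and the replica count is an assumption:

```python
import random

NUM_REPLICAS = 3  # assumption: keep 3 copies of each known-hot key

def write_hot_key(cache, key: str, value: str) -> None:
    # Write the same value under several suffixed keys; in a sharded
    # cluster these keys typically land on different nodes.
    for i in range(NUM_REPLICAS):
        cache.set(f"{key}#rep{i}", value)

def read_hot_key(cache, key: str):
    # Each read picks a random replica, spreading load across the nodes.
    replica = random.randrange(NUM_REPLICAS)
    return cache.get(f"{key}#rep{replica}")
```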
Another fairly popular thing to do is to add a local fallback cache,
basically using that in-process caching idea that we talked about earlier in order to add a cache here, right?
And so this way we keep extremely hot values like Taylor Swift in the app's memory so that repeated requests don't ever even need to go hit Redis, and we just store them locally here.
Um, hot keys, let's see, to wrap up here, they're a really good reminder that caching helps us scale reads. Uh, but it doesn't make your system magically infinite, and interviewers love to probe you on that. So, be ready.
All right, let's wrap up with what is probably the most interesting part to all of you who are preparing for your system design interviews, and that's how you should discuss caching in your actual system design interview.
There's two things that we'll talk about.
A, when should you bring it up? And then B, once you bring it up, how you should talk about it or how you should introduce it.
Right? And so the first and most important thing is don't just add a cache for the sake of adding a cache.
I see this all the time. I see candidates who just assume we always need a cache.
They throw it down without any proper justification.
And sometimes they're wrong.
And even if they're right, the lack of justification was a red flag.
And so you want to bring up caching typically when one of four things is true.
The first is that you have a really read-heavy workload that's draining your database. And so you could say something like, we're serving 100 million daily active users, each of which is making uh 20 requests per day.
That's two billion reads hitting the database.
Somebody check my math.
I did that a little too quick in my head.
In any case, you know, it's more than our database can handle. So let's put a cache in front of it to take the read load off of our database. Cool.
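(For what it's worth, the math does check out; a quick back-of-the-envelope, assuming traffic is spread evenly over the day:)

```python
daily_active_users = 100_000_000
requests_per_user_per_day = 20

reads_per_day = daily_active_users * requests_per_user_per_day  # 2,000,000,000
avg_reads_per_second = reads_per_day / 86_400                   # seconds per day

print(f"{reads_per_day:,} reads/day, ~{avg_reads_per_second:,.0f} reads/sec on average")
# 2,000,000,000 reads/day, ~23,148 reads/sec on average
```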
For expensive queries, you could say something like in the case of newsfeed, right?
Computing a user's personalized newsfeed is going to require joining a bunch of posts, followers, likes, all these things across multiple tables.
Uh, and that's going to be really expensive to compute. And so what we can do is just cache that newsfeed with a TTL of 60 seconds or so and then serve it really quickly from something like Redis. And then high database CPU.
This isn't going to come up in an interview because you don't have these metrics. Uh, but certainly in real life, if your database is starting to peg out on its CPU, then it would make sense to add a cache in front of it.
And then latency requirements.
This certainly will come up in an interview.
So in your non-functional requirements, maybe you specified that you needed a 100 millisecond response time on some API endpoint.
Well, then you could argue or justify that that database query is going to take too long, especially if it has expensive queries, and so you'll have to cache it instead.
The pattern's pretty simple here.
Identify the bottleneck, be able to quantify it with some rough numbers, then explain how caching solves it. Now once you've done that, you'll want to actually introduce caching.
And this is where you talk about all the things that we discussed in this video. And so the first one being identify the bottleneck.
That's what we just we just mentioned a moment ago.
And then decide what to cache.
Not everything should be cached, of course.
Focus on the data that is causing the issue, that needs to be read frequently.
Maybe it doesn't change often.
It's expensive to fetch or compute.
These are the things that you want to focus on caching.
Think about what the cache keys are going to be. Be explicit about this.
Especially in junior or more mid-level interviews, I often see candidates say, I'm going to add a cache here and that's going to solve everything.
And my follow-up is always, well, what are you caching?
What is your cache key?
Uh, what values are in your cache? Right?
So, make sure you're proactive about that.
The second thing is to choose that cache architecture, right? So, this is everything that we were talking about with cache-aside, write-through, write-behind, etc. So, you can say, I'll use cache-aside: on read, we check Redis first.
If it's there, we return it. If not, we'll query the database, store the result in Redis, and then return it back to the user, right?
And then you'll want to mention the eviction policy.
And so you'll say, we'll introduce a cache.
We'll either use LRU, LFU, or maybe you'll mention that you'll have a TTL for preventing stale data, but provide some justification there as it is relevant to your system.
And then lastly, address any potential downsides.
And so don't necessarily just go through the three that I mentioned, but think about your system. Which of these are relevant?
Do I have a TTL on a popular key that could cause a cache stampede or a thundering herd? Do I have issues with cache consistency where I'm going to potentially be serving data that is no longer fresh but instead stale?
Uh, and is that a problem for my system or is it not?
And do I have any issues maybe with hot keys or anything similar?
If you discuss these five things in this order, then you'll be a caching pro.
Your interviewer will be impressed.
Um, and you can continue to talk about the rest of the design. I think the last thing that I'll mention is that caching usually comes up during your deep dives when you're discussing scale.
And so when you get to your non-functional requirements and you're either talking about scale or latency, that's oftentimes the most appropriate time to bring up caching and try to figure out where it fits into your system.
And it's super common in these interviews.
So it's a really good thing to have down pat.
All right, there you have it, folks.
Hopefully you found this useful.
Um, any questions, anything you think I got wrong, go ahead and drop a comment.
I respond to as many of those as I can.
You'll have the Excalidraw diagram that we used here in the description.
So, go ahead and check that out. My LinkedIn is there.
Connect with me. Um, I respond to as many messages as possible. I love to hear success stories, especially if you're finding what we're doing useful.
And most importantly, good luck with the upcoming interviews.
You guys are going to nail it. You're putting the work in.
See you soon.