
Caching Strategies for System Design Interviews
I've been breaking down system design concepts that seemed simple on the surface but turned out to have more depth than I expected. This one stuck with me because I kept getting the same interview feedback about caching patterns.
Every system design mock I bombed in my first year had the same note: "Your caching story is too shallow." I'd say "put Redis in front of the database" and move on. The interviewer would nod, then ask: "Which caching strategy?" And I'd freeze.
I didn't know there were different types of caching. I thought caching was just... caching.
It's not. There are four core caching strategies, and each one bets differently on consistency, latency, and failure behavior. Once I understood these tradeoffs, my system design answers got dramatically better. I could finally reason about why one approach fits a problem and another doesn't.
What Caching Actually Solves (and What It Doesn't)
Before jumping into caching strategies for system design, let's be honest about what caching does. It stores frequently accessed data in a faster storage layer, reducing latency and database load. That's it.
Caching doesn't solve write operations at scale. It doesn't fix a bad data model. It doesn't eliminate consistency problems. In many cases, it creates new ones.
The fundamental tension is between speed and truth. Your cache is fast because it's a copy. But copies go stale. Every caching strategy answers one question differently: "How much stale data can you tolerate, and who keeps things fresh?"
Two metrics define the landscape. Cache hit ratio measures the share of reads served from the cache rather than the database. As a rough rule of thumb, a ratio below about 80% suggests the cache is burning memory without pulling its weight.
The staleness window measures how long your cache might serve outdated data after the source changes. With that framing, the four strategies start making sense.
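Here is a small sketch of how I'd put numbers on the hit ratio. The function names and the latency figures (1 ms for the cache, 20 ms for the database) are assumptions picked for illustration, not measurements.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of lookups served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0


def average_read_latency_ms(ratio: float, cache_ms: float, db_ms: float) -> float:
    """Expected read latency: a hit costs cache_ms, a miss costs cache_ms + db_ms."""
    return ratio * cache_ms + (1 - ratio) * (cache_ms + db_ms)


# Assumed numbers: 900 hits, 100 misses, 1 ms cache reads, 20 ms database reads.
ratio = hit_ratio(hits=900, misses=100)
print(f"hit ratio: {ratio:.0%}")
print(f"average read latency: {average_read_latency_ms(ratio, 1.0, 20.0):.1f} ms")
```

Plugging in a lower ratio shows how quickly misses start to dominate the average.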
Cache-Aside (Lazy Loading)
This is the strategy most developers learn first. You're probably using it right now without calling it by name.
How It Works
The application manages the cache directly. On a read, the app checks the cache first. If the data is there (cache hit), it returns immediately. If not (cache miss), the app reads from the database, writes the result into the cache, then returns it.
On a write, the app updates the database and then removes the cached entry. The next read triggers a cache miss and loads fresh data into the cache.
import json

# `redis` and `db` are assumed to be already-configured client objects.

def get_user(user_id):
    # Check cache first
    cached = redis.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)
    # Cache miss: read from DB
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    # Populate cache for next time (entry expires after 3600 seconds)
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user

def update_user(user_id, data):
    # Update database first
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
    # Invalidate cache so the next read reloads fresh data
    redis.delete(f"user:{user_id}")
When Cache-Aside Wins
Cache-aside works best for read-heavy workloads where data access patterns are unpredictable. Only requested data gets cached, so you don't waste memory on things nobody reads. This approach also handles failure well. If the cache goes down, the application falls back to the database. Slower, but functional.
E-commerce product catalogs are a classic example. Millions of products exist, but only a fraction get viewed frequently. Cache-aside naturally keeps popular items warm without you predicting which ones matter.
The Failure Mode You'll Hit
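The first problem is the cache stampede. When a popular entry expires or gets invalidated, many concurrent requests miss at the same moment and all hit the database at once.
There's also a subtler problem: the stale data window. Between updating the database and removing the cache entry, any read gets the old value. In most apps this window lasts milliseconds and nobody notices. In financial systems, those milliseconds matter.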
Read-Through Cache
Read-through looks similar to cache-aside from the outside, but the responsibility shifts. The cache itself handles data loading instead of the application.
How It Works
The application only talks to the cache. On a miss, the cache layer loads the data from the database, stores it, and returns it. The application never touches the database directly for reads. This keeps database read logic out of the rest of the codebase.
# With a read-through cache, your application code simplifies to:
def get_user(user_id):
    # The cache handles miss logic internally
    return cache.get(f"user:{user_id}")

# The cache is configured with a loader function.
# The loader receives the full key, so it strips the "user:" prefix first:
# cache.set_loader(lambda key: db.query(
#     "SELECT * FROM users WHERE id = %s", key.split(":", 1)[1]
# ))
When Read-Through Wins
The big advantage is a simpler application layer. Every developer reads from the cache using the same interface. Nobody accidentally bypasses the cache and hits the database directly. The data loading logic lives in one place.
Read-through pairs well with write-through for distributed cache setups where you want the cache to act as the primary data interface. Microservice architectures benefit here because each service treats its cache as a consistent data layer.
The Failure Mode You'll Hit
The cache becomes a single point of failure for reads. With cache-aside, a cache crash means slower response times from the database. With read-through, a cache crash can break reads entirely if the app has no fallback path.
Debugging also gets harder. When data is wrong, you check: Is the loader function buggy? Is the cache storing data correctly? Is the TTL triggering cache eviction too aggressively? The abstraction saves time during development but costs time during incidents.
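To make the loader idea concrete, here is a minimal in-memory sketch of a read-through cache. The class name ReadThroughCache, the FAKE_DB dict, and the hit/miss counters are all my own assumptions for illustration. A real setup would sit on a cache library or service rather than a Python dict.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    """In-memory read-through cache: callers only ever call get()."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float = 60.0):
        self._loader = loader  # called on a miss to fetch from the source
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        entry = self._entries.get(key)
        if entry is not None and entry[0] > time.monotonic():
            self.hits += 1
            return entry[1]
        # Miss or expired entry: the cache, not the caller, loads from the source.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (time.monotonic() + self._ttl, value)
        return value


# Stand-in for a database table (hypothetical data).
FAKE_DB = {"user:1": {"name": "Ada"}}
user_cache = ReadThroughCache(loader=lambda key: FAKE_DB[key])

user_cache.get("user:1")  # miss: the loader runs
user_cache.get("user:1")  # hit: served from memory
print(user_cache.hits, user_cache.misses)
```

The counters are there because of the debugging point above. When data looks wrong, knowing whether a read was a hit or a fresh load narrows down which layer to inspect.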
Write-Through Cache
Now we shift from read strategies to write strategies. Write-through is the conservative choice, which is why it suits domains like banking where correctness matters more than write speed.
How It Works
Every write goes to the cache and the database at the same time. The write only succeeds if both confirm it. This means the cache always holds the latest data.
def update_user(user_id, data):
    # Write to both cache and DB in a single operation
    # Both must succeed for the operation to complete
    cache.put(f"user:{user_id}", data)  # internally also writes to DB

# Under the hood, the cache layer does:
# 1. Write to cache
# 2. Write to database
# 3. Return success only if both succeed
When Write-Through Wins
You get read-after-write consistency. The moment a write completes, any read sees the updated value. No staleness window. No eventual consistency.
This matters for user-facing write operations where people expect to see changes immediately. Profile updates, password changes, account settings. If someone changes their email and immediately loads their profile page, they need to see the new email. Write-through guarantees it.
The Failure Mode You'll Hit
Write latency goes up. Every write must succeed in two places before returning, so each write pays for the cache write on top of the database write. For systems with low write volume, this is fine. For systems processing thousands of writes per second, this becomes a bottleneck that hurts response times.
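Here is a rough sketch of what a write-through layer could look like. WriteThroughCache is a hypothetical class, and a plain dict stands in for the database. One design choice to flag: this sketch writes the database first and the cache second, so a failed database write never leaves a value in the cache that the database rejected. That ordering is my assumption, not the only valid one.

```python
from typing import Any, Dict


class WriteThroughCache:
    """Write-through sketch: put() writes the database and the cache together."""

    def __init__(self, database: Dict[str, Any]):
        self._db = database  # stand-in for a real database client
        self._cache: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Write the source of truth first. If this raises, the cache is left
        # untouched and the caller sees the failure.
        self._db[key] = value
        self._cache[key] = value

    def get(self, key: str) -> Any:
        if key in self._cache:
            return self._cache[key]
        # Not cached yet (for example after a restart): load and remember it.
        value = self._db[key]
        self._cache[key] = value
        return value


db: Dict[str, Any] = {}
users = WriteThroughCache(db)
users.put("user:1", {"email": "new@example.com"})
print(users.get("user:1"))  # the read sees the write immediately
```

Both stores hold the new value by the time put() returns, which is where the read-after-write behavior comes from.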
Write-Behind (Write-Back) Cache
Write-behind is the aggressive strategy. It gives you the best write performance of any caching approach, but it asks you to accept a risk that makes most engineers nervous.
How It Works
Writes go to the cache immediately and return to the caller. The database update happens later. The cache batches up pending writes and flushes them to the database on a schedule or when the batch hits a size threshold.
# From the application's perspective:
def increment_score(player_id, points):
    key = f"score:{player_id}"
    # Read the current score (assumes cache.get returns None if unset)
    new_score = (cache.get(key) or 0) + points
    # Returns instantly after cache write
    cache.put(key, new_score)
    # DB update happens async, maybe seconds later

# Behind the scenes, the cache layer:
# 1. Writes to cache immediately
# 2. Adds the write to an async queue
# 3. Periodically flushes the queue to the database
# 4. Coalesces multiple writes to the same key
When Write-Behind Wins
Gaming leaderboards are the textbook example. When millions of players update scores every second, you cannot write each update to a database one at a time. The database would collapse. Write-behind absorbs the write burst in memory and flushes to the database in controlled batches.
The coalescing feature is a hidden superpower. If a player's score changes 50 times in 10 seconds, write-behind only writes the final value to the database. That cuts database writes for that key by 50x.
The Failure Mode You'll Hit
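The big risk is data loss. If the cache crashes before flushing, the pending writes are gone, because they existed only in memory.
There's also an ordering problem. If flushes arrive at the database out of order, an older value can overwrite a newer one. Robust write-behind uses sequence numbers or timestamps to detect and resolve this.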
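A minimal sketch of coalescing plus a sequence-number guard might look like this. WriteBehindCache is a hypothetical class, the database is a plain dict mapping each key to a (sequence, value) pair, and flush() is called by hand here where a real system would call it from a timer or a size threshold.

```python
from typing import Any, Dict, Tuple


class WriteBehindCache:
    """Write-behind sketch: writes land in memory and are flushed in batches."""

    def __init__(self, database: Dict[str, Tuple[int, Any]]):
        self._db = database  # stand-in: key -> (sequence, value)
        self._cache: Dict[str, Any] = {}
        self._dirty: Dict[str, Tuple[int, Any]] = {}  # pending writes, one per key
        self._seq = 0
        self.db_writes = 0

    def put(self, key: str, value: Any) -> None:
        self._seq += 1
        self._cache[key] = value
        # Coalescing: a newer write to the same key replaces the pending one.
        self._dirty[key] = (self._seq, value)

    def get(self, key: str) -> Any:
        return self._cache[key]

    def flush(self) -> None:
        """Push pending writes to the database. Anything still pending is lost on a crash."""
        pending, self._dirty = self._dirty, {}
        for key, (seq, value) in pending.items():
            stored = self._db.get(key)
            # Ordering guard: never let an older write overwrite a newer one.
            if stored is None or stored[0] < seq:
                self._db[key] = (seq, value)
                self.db_writes += 1


db: Dict[str, Tuple[int, Any]] = {}
scores = WriteBehindCache(db)
for points in range(1, 51):  # 50 updates to the same key
    scores.put("score:player1", points)
scores.flush()
print(scores.db_writes, db["score:player1"])
```

Fifty calls to put() turn into a single database write, and the stored sequence number lets the flush skip any write that is older than what the database already holds.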
Choosing the Right Caching Strategy
The choice isn't about which strategy is "best." It's about which tradeoff you can live with. Here are some real-world mappings for different types of caching patterns:
- E-commerce product catalog: Use cache-aside. Read-heavy, unpredictable access patterns, and a few seconds of stale data is invisible to users.
- Banking account balances: Use cache-aside with write-through. You need read-after-write consistency and can tolerate higher write latency because write volume is moderate.
- Gaming leaderboard: Use write-behind. Write volume is extreme, eventual consistency is fine, and losing a few seconds of cache updates on a crash is acceptable.
- Microservice config store: Use read-through. Every service reads configuration through a distributed cache, giving you a clean interface across dozens of services.
In practice, production caching systems often combine strategies. A common pattern uses read-through for reads and write-behind for writes, reducing latency for both read and write operations. Redis can back all four strategies, but the strategy itself lives in your application or client layer rather than in Redis.
What I'd Say in a System Design Interview Now
If someone asked me to design caching for a system, I wouldn't start with "let's add Redis." I'd start with three questions:
- What's the read-to-write ratio? This tells me whether to optimize for reads or writes.
- How stale can the data be? This narrows down the consistency requirement.
- What happens if we lose the cache? This determines the failure tolerance and shapes the user experience.
From those three answers, the strategy picks itself. Then I'd name it: "I'd use a cache-aside strategy here because the access patterns are unpredictable and we can tolerate 30 seconds of stale data." That sentence shows you understand the tradeoff, not just the technology.
The failure mode is the part most candidates skip. Mentioning it unprompted shows the interviewer you've thought about what happens when things go wrong. That's the difference between "put Redis in front of the DB" and actually understanding caching strategies. For more system design concepts like this, check out Levelop's blog.
Frequently asked questions
What is the difference between cache-aside and read-through?
In cache-aside, your application code checks the cache, fetches from the database on a miss, and writes data back to the cache. In read-through, the cache handles all of this internally. You just call cache.get(key) and the cache manages database queries on its own. The practical difference is where the data loading logic lives.
When should you use write-behind instead of write-through?
Use write-behind when write volume is so high that synchronous database writes create a bottleneck, and you can tolerate losing a few seconds of writes if the cache crashes. Gaming leaderboards, real-time analytics, and session tracking are typical examples. Use write-through when every write must be durably stored the moment it completes.
What happens when a write-behind cache crashes before flushing?
The unflushed writes are lost. They existed only in memory and hadn't reached the database. This is the core tradeoff: faster writes in exchange for accepting data loss risk. You can reduce this risk with replication across multiple cache nodes, write-ahead logs that survive restarts, and shorter flush intervals.
Can you combine multiple caching strategies?
Yes, and production systems commonly do. The most frequent combination uses read-through for reads and write-behind for writes. This gives you a clean read interface and high write throughput. Another common pattern uses cache-aside for most data and write-through for critical data.
What is cache stampede and how do you prevent it?
Cache stampede happens when a popular cache entry expires and hundreds of concurrent requests hit the database at once. You can prevent it with cache locking, staggered TTLs that add random jitter to expiration times, and background refresh that loads new data before the current entry expires.
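Two of those defenses fit in a short sketch: jittered TTLs and a per-key lock so only one caller reloads a missing entry. The names jittered_ttl and SingleFlightCache are my own, the 10% jitter is an arbitrary assumption, and the cache omits expiry entirely to keep the locking logic easy to see.

```python
import random
import threading
from typing import Any, Callable, Dict


def jittered_ttl(base_seconds: int, jitter_fraction: float = 0.1) -> int:
    """Vary the TTL by up to jitter_fraction so keys don't all expire together."""
    spread = base_seconds * jitter_fraction
    return max(1, int(base_seconds + random.uniform(-spread, spread)))


class SingleFlightCache:
    """Only one thread loads a missing key; the others wait and reuse the result."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader
        self._values: Dict[str, Any] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the _locks dict
        self.loads = 0

    def get(self, key: str) -> Any:
        if key in self._values:
            return self._values[key]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have loaded the key while we waited.
            if key not in self._values:
                self.loads += 1
                self._values[key] = self._loader(key)
            return self._values[key]


hot_cache = SingleFlightCache(loader=lambda key: f"value-for-{key}")
threads = [threading.Thread(target=hot_cache.get, args=("hot-key",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(hot_cache.loads)  # the loader ran once despite 20 concurrent readers
print(jittered_ttl(3600))  # somewhere between 3240 and 3960 seconds
```

With a shared cache such as Redis, the lock would have to live in the cache itself rather than in process memory, but the re-check-after-acquiring pattern stays the same.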
