Caching — The Art of Remembering What's Expensive to Compute
A visual deep dive into caching. From CPU caches to CDNs — understand cache strategies, eviction policies, and the hardest problem in computer science: cache invalidation.
⚡
The fastest computation is
the one you never do.
Caching is the most important performance technique in all of computing.
Your CPU uses it. Your browser uses it. Netflix, Google, and every app you’ve ever used —
they all live and die by how well they cache.
↓ Scroll to understand the technique behind all of those performance gains
Why Caching Matters: The Latency Gap
Different storage systems have wildly different speeds. The gap between “fast” and “slow” is enormous — and growing.
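To make the gap concrete, here are rough order-of-magnitude access times in nanoseconds. These are approximate, hardware-dependent assumptions rather than measurements, but the ratios are what matter, and a few lines of Python make them easy to compare:

# Approximate access latencies in nanoseconds (order-of-magnitude assumptions).
LATENCY_NS = {
    "L1 CPU cache": 1,
    "Main memory (RAM)": 100,
    "SSD random read": 100_000,                 # ~0.1 ms
    "Datacenter network round trip": 500_000,   # ~0.5 ms
    "Spinning disk seek": 10_000_000,           # ~10 ms
    "Cross-continent round trip": 150_000_000,  # ~150 ms
}

for tier, ns in LATENCY_NS.items():
    ratio = ns / LATENCY_NS["L1 CPU cache"]
    print(f"{tier:<30} {ns:>12,} ns   ({ratio:,.0f}x slower than L1)")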
Cache Hit vs Cache Miss
A cache is simple: a fast layer that stores copies of frequently accessed data. When you look for something, two things can happen:
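On a hit, the data is already in the fast layer and comes back immediately; on a miss, you pay the full backend cost and then store a copy so the next lookup hits. Here is a minimal sketch of that decision, using a plain dict as the cache and a simulated database as the slow backend (the user record and the 50 ms delay are illustrative assumptions):

import time

DATABASE = {42: {"name": "Ada", "plan": "pro"}}   # pretend backing store
cache = {}

def fetch_from_database(user_id):
    time.sleep(0.05)                       # simulate a ~50 ms database round trip
    return DATABASE[user_id]

def get_user(user_id):
    if user_id in cache:                   # cache hit: served from the fast layer
        return cache[user_id]
    value = fetch_from_database(user_id)   # cache miss: pay the slow path once
    cache[user_id] = value                 # store a copy so the next lookup hits
    return value

get_user(42)   # miss: ~50 ms
get_user(42)   # hit: microseconds

This check-then-fetch-then-store flow is the Cache-Aside pattern mentioned in the recap at the end.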
Cache Hit Rate — the metric that matters
The percentage of requests served directly from cache without touching the slower backend. This single number tells you how effective your cache is — aim for 95%+ in production.
Hit Rate = hits / (hits + misses)
A 99% hit rate means only 1 in 100 requests reaches the database. Well-tuned production systems often reach this level, and at that point the rare misses add surprisingly little to the average latency, as the calculation below shows.
Your real-world latency is a weighted average of cache speed and database speed. With 99% hit rate, 1ms cache, and 50ms database: 0.99×1ms + 0.01×50ms = 1.49ms — a 33× improvement over hitting the database every time.
Avg Latency = hit_rate × cache_latency + miss_rate × db_latency
Your cache has a 95% hit rate. Cache latency is 2ms, database latency is 100ms. What's the average request latency?
💡 Use the formula: hit_rate × cache_latency + miss_rate × db_latency...
Average = 0.95 × 2ms + 0.05 × 100ms = 1.9ms + 5ms = 6.9ms ≈ 7ms. Without the cache, every request would take 100ms. That's a 14x speedup from caching, even with a 5% miss rate!
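The same weighted average takes only a few lines of Python, and it reproduces both the 99% example above and this quiz's answer:

def avg_latency_ms(hit_rate, cache_ms, db_ms):
    # Weighted average of the fast path (hits) and the slow path (misses).
    return hit_rate * cache_ms + (1 - hit_rate) * db_ms

print(round(avg_latency_ms(0.99, 1, 50), 2))    # 1.49 ms, the example above
print(round(avg_latency_ms(0.95, 2, 100), 2))   # 6.9 ms, this quiz's answer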
Cache Strategies: Read vs Write
How do you keep the cache and database in sync? Different strategies have different trade-offs.
Your e-commerce site needs to cache product inventory counts. Which strategy should you use?
💡 What happens if the cache says 'in stock' but the database says 'sold out'?
Inventory is a consistency-critical field — you can't show '5 in stock' when there are actually 0 (overselling). Write-Through ensures every inventory update is written to both the cache and database synchronously, so they're always consistent. The slower write speed is worth it to avoid selling items you don't have.
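Here is a sketch of the two write paths, using a tiny in-memory Store class as a stand-in for both the cache and the database (the class and the SKU are made up for illustration):

class Store(dict):
    # Tiny in-memory stand-in for a cache or database client.
    def set(self, key, value):
        self[key] = value
    def delete(self, key):
        self.pop(key, None)

db, cache = Store(), Store()

def update_inventory_write_through(sku, count):
    # Write-through: write both layers synchronously, so a reader can never
    # see a cached count that disagrees with the database.
    db.set(sku, count)
    cache.set(sku, count)

def update_inventory_invalidate(sku, count):
    # Alternative: write the database, then drop the cached copy so the next
    # read repopulates it. Faster writes, but a forgotten or failed delete
    # leaves a stale count behind.
    db.set(sku, count)
    cache.delete(sku)

update_inventory_write_through("sku-123", 5)   # cache and DB now agree: 5 in stock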
Cache Eviction: When Space Runs Out
Caches have limited space. When they’re full and a new item needs to be stored, which old item gets removed? This is the eviction policy.
Your cache holds 4 items and uses LRU eviction. Access pattern: A, B, C, D, A, E. Which item gets evicted when E is accessed?
💡 Trace through the access pattern. Which item has the oldest 'last used' timestamp after A is re-accessed?
After accessing A, B, C, D, the cache is full: [A, B, C, D]. Then A is accessed again, making it 'recently used'. The order from most to least recent is now: A, D, C, B. When E needs space, B is evicted because it's the least recently used. Note: A was saved because it was re-accessed!
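You can replay that exact trace with a minimal LRU cache built on OrderedDict (a sketch, not a production implementation):

from collections import OrderedDict

class LRUCache:
    # Minimal LRU cache: evict the least recently used key when full.
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()           # key order doubles as recency order

    def access(self, key):
        if key in self.items:
            self.items.move_to_end(key)      # hit: mark as most recently used
            return
        if len(self.items) >= self.capacity:
            evicted, _ = self.items.popitem(last=False)   # drop the LRU entry
            print(f"evicting {evicted} to make room for {key}")
        self.items[key] = True

cache = LRUCache(4)
for key in "ABCDAE":                         # the access pattern from the quiz
    cache.access(key)                        # prints: evicting B to make room for E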
The Hardest Problem: Cache Invalidation
When the source data changes, the cache becomes stale — it holds an old version. Serving stale data can mean showing the wrong price, the wrong balance, or the wrong inventory. Getting invalidation right is critical.
A user updates their profile photo. Your app uses 5-minute TTL caching. What happens for the next 5 minutes?
💡 If the cache has a 5-minute TTL and the user updates at minute 1, what does the cache serve for the next 4 minutes?
With TTL-only caching, the old photo sits in the cache as stale data until it expires (up to 5 minutes). During that window, other users see the old photo. Solutions: (1) Invalidate the cache key immediately on update. (2) Use a shorter TTL. (3) For user-facing changes, bypass the cache on the 'my profile' page.
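Here is a minimal sketch that combines two of those fixes: a TTL on reads plus immediate invalidation on writes. The fetch and save callables are hypothetical placeholders for whatever backend the app actually uses:

import time

CACHE = {}                      # key -> (value, expires_at)
TTL_SECONDS = 300               # the 5-minute TTL from the example

def get_profile_photo(user_id, fetch):
    entry = CACHE.get(user_id)
    if entry and entry[1] > time.time():     # fresh entry: serve from the cache
        return entry[0]
    value = fetch(user_id)                   # missing or expired: refetch
    CACHE[user_id] = (value, time.time() + TTL_SECONDS)
    return value

def update_profile_photo(user_id, new_photo, save):
    save(user_id, new_photo)                 # write to the source of truth
    CACHE.pop(user_id, None)                 # invalidate immediately so the stale
                                             # photo never outlives the update

With event-based invalidation in place, the TTL becomes a safety net for keys you forgot to invalidate rather than the only freshness mechanism.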
Caching in the Real World
Netflix serves video to 200+ million users globally. Where does most of the video content come from?
💡 If 200 million users all hit one server in the US, what would happen?
Netflix's Open Connect network places custom cache servers directly inside ISPs worldwide. When you watch a popular show, it's likely served from a cache server in your city or even your ISP's building — not from Netflix's US data center. This is caching at global scale: 95%+ of Netflix traffic is served from these edge caches.
🎓 What You Now Know
✓ Caches trade space for speed — Store computed results in a fast layer to avoid slow re-computation.
✓ Cache-Aside is the default pattern — App checks cache first, falls back to DB on miss, stores result.
✓ LRU is the most common eviction policy — When space runs out, evict whoever was used longest ago.
✓ Invalidation is the hard part — TTL for simplicity, event-based for freshness. Usually both.
✓ CDNs are caching at global scale — Edge servers near users power Netflix, YouTube, and every fast website.
Caching is the foundation of every fast system. Whether you’re building a startup or operating at Netflix scale, the concepts are the same: put frequently accessed data closer to the user, evict smartly, and invalidate carefully. 🚀
↗ Keep Learning
Database Sharding — Scaling Beyond One Machine
A visual deep dive into database sharding. From single-server bottlenecks to consistent hashing — understand how companies scale their databases to billions of rows.
Vector Databases — Search by Meaning, Not Keywords
A visual deep dive into vector databases. From embeddings to ANN search to HNSW — understand how AI-powered search finds what you actually mean, not just what you typed.