Scaling strategy

Caching

Store hot data close to the request path to cut latency and reduce origin load.

Caching keeps hot reads close to the caller. Use it when read traffic is high and the underlying data can tolerate brief staleness.

How It Works

Caching places frequently accessed data in a fast-access layer (memory, CDN, or local disk) between the client and the authoritative data store. The goal is to reduce read latency and offload the origin database. Common patterns include read-through (the cache is checked first and fills itself from the origin on a miss), write-through (each write updates the cache and the origin synchronously), and write-behind (the write lands in the cache and the origin is updated asynchronously). Cache invalidation is the hard problem: TTL-based expiry is simplest but can serve stale data until an entry expires, while event-driven invalidation is precise but adds complexity. In interviews, always name the invalidation strategy alongside the caching layer.
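To make the trade-off between TTL expiry and event-driven invalidation concrete, here is a minimal in-process sketch of a read-through cache. The class name ReadThroughCache, the load_from_origin helper, and the dict standing in for the database are all illustrative assumptions, not any particular library's API.

```python
import time
from typing import Callable, Dict, List, Tuple


class ReadThroughCache:
    """Read-through cache with TTL expiry and explicit invalidation (illustrative)."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # fetches from the origin on a miss
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be exercised without sleeping
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]        # hit: entry is still fresh
        value = self._loader(key)  # miss or expired: fill from the origin
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        """Event-driven invalidation: call this when the origin value changes."""
        self._entries.pop(key, None)


# Hypothetical origin: a dict standing in for the authoritative database.
origin: Dict[str, str] = {"user:1": "alice"}
origin_reads: List[str] = []


def load_from_origin(key: str) -> str:
    origin_reads.append(key)       # record each origin hit so offload is visible
    return origin[key]


fake_now = [0.0]                   # manual clock for the demonstration
cache = ReadThroughCache(load_from_origin, ttl_seconds=30.0,
                         clock=lambda: fake_now[0])

first = cache.get("user:1")        # miss: loaded from the origin
second = cache.get("user:1")       # hit: no origin read
origin["user:1"] = "alicia"        # origin changes behind the cache
stale = cache.get("user:1")        # TTL has not elapsed, so the old value is served
cache.invalidate("user:1")         # event-driven invalidation removes the stale entry
fresh = cache.get("user:1")        # reloaded from the origin
fake_now[0] = 31.0                 # advance past the 30-second TTL
expired = cache.get("user:1")      # entry expired, reloaded again
```

With TTL alone, the stale read persists until the entry expires; the invalidate call closes that window at the cost of wiring change events from the origin to the cache.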

Real-World Example

Twitter uses a multi-tier cache: a request-level cache in the API layer for the current user's timeline, a distributed Memcached cluster for tweet objects, and a CDN for media. When a tweet is posted, timeline fan-out invalidates the per-user cache entries of each of the author's followers. The cache hit rate for tweet reads is above 99%.
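The tiered lookup and fan-out invalidation described above can be sketched roughly as follows. This is a simplified illustration, not Twitter's implementation: plain dicts stand in for the request-level and distributed tiers, and TieredCache, invalidate_follower_timelines, and build_timeline are hypothetical names.

```python
from typing import Callable, Dict, List


class TieredCache:
    """Checks tiers fastest-first and back-fills faster tiers on a hit (illustrative)."""

    def __init__(self, tiers: List[Dict[str, str]],
                 origin: Callable[[str], str]) -> None:
        self._tiers = tiers        # ordered fastest to slowest; dicts stand in for real stores
        self._origin = origin      # authoritative source used when every tier misses

    def get(self, key: str) -> str:
        for depth, tier in enumerate(self._tiers):
            if key in tier:
                value = tier[key]
                for faster in self._tiers[:depth]:
                    faster[key] = value   # promote the value into the faster tiers
                return value
        value = self._origin(key)         # full miss: go to the origin
        for tier in self._tiers:
            tier[key] = value
        return value

    def invalidate(self, key: str) -> None:
        for tier in self._tiers:
            tier.pop(key, None)


def invalidate_follower_timelines(cache: TieredCache, follower_ids: List[str]) -> None:
    """Fan-out on write: drop each follower's cached timeline so the next read rebuilds it."""
    for follower_id in follower_ids:
        cache.invalidate(f"timeline:{follower_id}")


request_local: Dict[str, str] = {}
distributed: Dict[str, str] = {"timeline:bob": "t1"}
origin_calls: List[str] = []


def build_timeline(key: str) -> str:
    origin_calls.append(key)       # record each rebuild from the origin
    return "t1,t2"                 # hypothetical timeline that includes the new tweet


cache = TieredCache([request_local, distributed], build_timeline)
warm = cache.get("timeline:bob")               # found in the distributed tier
invalidate_follower_timelines(cache, ["bob"])  # a followed account posts a tweet
rebuilt = cache.get("timeline:bob")            # every tier misses, origin rebuilds
```

The cost of this design is that one write triggers one invalidation per follower, so accounts with very large follower counts make writes expensive.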

Test Yourself

When is caching the wrong first fix?
