Concept Library

Master the building blocks of massive scale. Organized by the 7 dimensions used to evaluate your system design interviews.

Requirements & Scoping

Break down vague prompts into actionable requirements.

Consistency→

Keep replicas and reads aligned enough for the product guarantees you promised.

1 min read

Scope Decomposition→

Break a vague problem into functional requirements, non-functional requirements, and explicit non-goals.

1 min read

SLA Definition→

Define measurable availability, latency, and throughput targets before designing.

1 min read

Constraint Identification→

Surface the hard constraints (regulatory, latency, budget) that narrow the design space.

1 min read

Clarifying Questions Checklist→

Before drawing boxes, ask five questions. The answers change every downstream decision.

1 min read

Functional vs Non-functional Requirements→

FRs describe WHAT the system does. NFRs describe HOW WELL. Miss either and you're designing in the dark.

1 min read

Non-goal Articulation→

Explicitly saying what you won't build is as valuable as saying what you will.

1 min read

Scale Estimation

Convert user counts into infrastructure numbers.

Back-of-Envelope Math→

Estimate QPS, storage, and bandwidth from DAU using simple arithmetic.

1 min read
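
As a rough illustration of the arithmetic this card refers to, the sketch below turns a DAU figure into average QPS, peak QPS, daily storage, and average write bandwidth. The function name `estimate` and every input number (10 million DAU, 20 requests per user, a 3x peak factor, 2 writes per user, 1,000 bytes per write) are made-up assumptions for the example, not figures from this library.

```python
SECONDS_PER_DAY = 86_400


def estimate(dau, requests_per_user, peak_factor, writes_per_user, bytes_per_write):
    """Back-of-envelope numbers from daily active users (all inputs are assumptions)."""
    avg_qps = dau * requests_per_user / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_factor                      # crude multiplier for daily peaks
    storage_bytes_per_day = dau * writes_per_user * bytes_per_write
    write_bandwidth_bps = storage_bytes_per_day / SECONDS_PER_DAY  # bytes per second, averaged
    return {
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        "storage_bytes_per_day": storage_bytes_per_day,
        "write_bandwidth_bytes_per_sec": write_bandwidth_bps,
    }


numbers = estimate(
    dau=10_000_000,
    requests_per_user=20,
    peak_factor=3,
    writes_per_user=2,
    bytes_per_write=1_000,
)
for name, value in numbers.items():
    print(f"{name}: {value:,.0f}")
```

With these inputs the average load works out to roughly 2,300 requests per second and about 20 GB of new data per day; the useful part is the order of magnitude, not the exact digits.
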

Capacity Planning→

Size infrastructure for current load plus growth with headroom for spikes.

1 min read

Traffic Modeling→

Map read/write ratios and access patterns to identify which operations dominate.

1 min read

Bandwidth Arithmetic→

Compute the network bandwidth your system actually needs — most designs miss this until it's too late.

1 min read

Concurrent Connections Math→

Real-time systems are bounded by concurrent connection count — count them before you count QPS.

1 min read

Latency Percentiles→

Understand what p50/p95/p99 mean and why averages lie about latency.

1 min read
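
A small sketch of why averages mislead, using the nearest-rank definition of a percentile (one of several common definitions). The sample data is invented: 98 requests at 10 ms and 2 requests at 2,000 ms. The helper name `percentile` is hypothetical.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Invented data: most requests are fast, a few are very slow.
latencies_ms = [10] * 98 + [2000] * 2

mean_ms = statistics.mean(latencies_ms)
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)

print(f"mean={mean_ms} p50={p50} p95={p95} p99={p99}")
```

Here the mean is 49.8 ms, a latency that no request actually experienced, while p50 and p95 are both 10 ms and p99 is 2,000 ms. The percentiles describe the two populations; the average blurs them together.
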

Storage Growth Projection→

Extrapolate how much storage you'll need in 6 and 12 months — not just day one.

1 min read

API Design

Design clean contracts and handle edge cases.

REST Contract Design→

Design clean API endpoints with proper resource naming, methods, and response shapes.

1 min read

Pagination Patterns→

Choose between offset, cursor, and keyset pagination based on data characteristics.

1 min read

Idempotency→

Ensure a repeated request has the same effect as a single one, which is critical for payment and write-heavy APIs.

1 min read
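
One common way to get this behavior is a client-supplied idempotency key. The sketch below is a minimal in-memory illustration, assuming a hypothetical `PaymentService` class; a real service would persist the keys, expire them, and guard against concurrent requests carrying the same key.

```python
class PaymentService:
    """Toy service: a retry with the same idempotency key does not charge twice."""

    def __init__(self):
        self._responses = {}  # idempotency key -> response returned the first time
        self.charges = []     # side effects actually performed

    def charge(self, idempotency_key, account, amount_cents):
        # A key we have seen before means this is a retry: replay the stored response.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]

        # First time we see this key: perform the side effect once and remember the result.
        self.charges.append((account, amount_cents))
        response = {"charge_id": len(self.charges), "status": "succeeded"}
        self._responses[idempotency_key] = response
        return response


service = PaymentService()
first = service.charge("key-123", "acct-1", 5_000)
retry = service.charge("key-123", "acct-1", 5_000)  # e.g. the client timed out and retried
print(first == retry, len(service.charges))
```

The retry returns the original response and the account is charged only once, which is the property the client relies on when it cannot tell whether its first attempt succeeded.
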

API Versioning→

Pick a versioning strategy before your first breaking change forces one — three options, one easy answer.

1 min read

Data Model Design→

Design the data model before sketching endpoints — storage layout constrains every API choice downstream.

1 min read

Error Response Shapes→

Consistent error responses prevent clients from writing error handling as an afterthought.

1 min read

Real-time API Patterns→

Long-polling, Server-Sent Events, and WebSockets — pick based on direction, frequency, and client capability.

1 min read

High-Level Design

Decompose systems into components with clear boundaries.

Component Decomposition→

Break a system into services/components with clear responsibilities and interfaces.

1 min read

Data Flow Mapping→

Trace how data moves through the system for each key operation (read path, write path).

1 min read

Service Boundaries→

Draw boundaries so each service owns its data and communicates through APIs, not shared DBs.

1 min read

Diagramming Conventions→

A sketch becomes a diagram when every box and arrow carries meaning. Four conventions do most of the work.

1 min read

Event-driven vs RPC→

RPC says "do this and tell me the result." Events say "this happened, fan out." Architectures get complex when you conflate them.

1 min read

Sync vs Async Communication→

Synchronous calls block the caller; async calls don't. Most coupling bugs come from mixing these up.

1 min read

Bottleneck Analysis

Identify and resolve performance chokepoints.

Hotspot Identification→

Find the specific key, partition, or path that receives disproportionate traffic.

1 min read

Thundering Herd→

Prevent all clients from hitting the origin simultaneously when a cache entry expires.

1 min read

Connection Pool Exhaustion→

Pooled connections are a finite resource — one slow query can block the entire app.

1 min read

Latency Percentiles→

Understand what p50/p95/p99 mean and why averages lie about latency.

1 min read

Lock Contention Analysis→

A shared lock can make a 64-core server behave like a 1-core one — identify it before you blame hardware.

1 min read

Queue Saturation & Backpressure→

When consumers fall behind producers, queues grow unbounded — and then everything crashes at once.

1 min read

Read/Write Amplification→

One logical operation often triggers many physical ones — spot when 1 → N is breaking you.

1 min read

SPOF Detection→

Find the single point of failure — especially the hidden ones nobody thinks about.

1 min read

Tail Latency Reasoning→

Reason about p95/p99, not just averages — tail latency is what users actually feel.

1 min read

Scaling Strategy

Choose the right replication, sharding, and caching patterns.

Caching→

Store hot data close to the request path to cut latency and reduce origin load.

1 min read

Queues→

Decouple producers and consumers so slow work can run asynchronously.

1 min read

Load Balancing→

Spread traffic across workers so no single instance becomes the bottleneck.

1 min read

Sharding Strategies→

Partition data across multiple database instances to distribute write load.

1 min read

Replication Topologies→

Choose single-leader, multi-leader, or leaderless replication based on availability and consistency needs.

1 min read

Consensus Protocols→

When multiple nodes must agree on a single value, you need consensus. Raft is usually the right answer.

1 min read

Failure Mode Planning→

Design for the failures you expect, not the happy path you hope for. Three categories: slow, broken, wrong.

1 min read

Geo-distribution & Multi-region→

Multi-region is expensive. Know the specific reason you need it (latency, availability, or regulation) before you commit to it.

1 min read

Stateless Service Design→

Stateless services scale horizontally by adding instances. Stateful ones require coordination. Most "why won't this scale" problems start with accidental state.

1 min read

Trade-Offs

Reason about consistency, availability, and latency.

Queues→

Decouple producers and consumers so slow work can run asynchronously.

1 min read

Consistency→

Keep replicas and reads aligned enough for the product guarantees you promised.

1 min read

CAP Reasoning→

Explicitly state your availability vs consistency choice and justify it for the use case.

1 min read

Latency-Consistency Spectrum→

Map where your feature sits on the spectrum from low-latency/eventual to high-latency/strong.

1 min read

Build vs Buy→

Building in-house is cheaper at small scale, more expensive at large scale — and the crossover is different for every category.

1 min read

Cost vs Performance→

Performance optimizations have costs. Make the tradeoff explicit — "this costs $X to save Y ms" — or you'll over-engineer.

1 min read

Monolith vs Microservices→

Microservices add network, deployment, and operational complexity. Monoliths have fewer moving parts. Pick based on team coordination pain, not headcount.

1 min read

Push vs Pull Fan-out→

For social fan-out, push-on-write is fast to read but expensive to write. Pull-on-read is the opposite. Real systems use both, chosen by follower count.

1 min read

Strong vs Weak Consistency Per Feature→

Different features in the same system can have different consistency needs. Don't pay for strong where eventual is fine.

1 min read

Other Concepts

Concepts that span multiple dimensions.

Top-Down Structuring→

1 min read

Interview Time Management→

1 min read

Proactive Signposting→

1 min read