
Latency Percentiles


Understand what p50/p95/p99 mean and why averages lie about latency.

How It Works

Latency percentiles describe how a service's response times are distributed across many requests. p50 (the median) is the latency that 50% of requests beat and 50% exceed. p95 means 95% of requests finish at or below that value and 5% are slower. p99 means 99% finish at or below it, so 1% of requests hit the slow tail. p99.9 (often written p999, "three nines") covers 99.9% of requests, leaving the slowest 0.1%. The key insight: averages hide the tail. A service with an average latency of 50ms but a p99 of 2 seconds is effectively broken for 1% of requests, which at 1M requests per second means 10,000 slow requests every second.
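To make the gap between the average and the tail concrete, here is a minimal sketch in Python. The `percentile` helper uses the nearest-rank method, which is only one of several common percentile definitions, and the latency samples are made up for illustration rather than measured from any real service.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical workload: 980 fast requests at 20 ms, 20 slow ones at 2000 ms.
latencies_ms = [20] * 980 + [2000] * 20

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

# Requests per second that land in the slowest 1% at an assumed load.
requests_per_second = 1_000_000
tail_requests_per_second = requests_per_second // 100

print(f"mean={mean_ms} ms, p50={p50_ms} ms, p99={p99_ms} ms")
print(f"requests/sec in the slowest 1%: {tail_requests_per_second}")
```

In this made-up data set the mean comes out at 59.6 ms and the median at 20 ms, while the p99 is 2000 ms. Neither the mean nor the median reveals that some requests take two full seconds.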

Real-World Example

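Amazon's Dynamo paper (2007) describes internal SLAs that are expressed and measured at the 99.9th percentile rather than as a mean or median; its example is a service that must respond within 300ms for 99.9% of requests at a peak load of 500 requests per second. AWS describes DynamoDB as delivering single-digit-millisecond performance, although the public DynamoDB SLA covers availability rather than latency. Systems like these are designed around the tail target, not the median, because p50 speed does not matter if 1% of requests stall long enough to time out.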
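Checking a service against separate per-percentile targets can be sketched as below. The target values, the sample data, and the `check_targets` helper are all hypothetical and chosen only to show a median that passes while the tail fails; they are not taken from any published SLA.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def check_targets(samples, targets_ms):
    """Map each percentile to (observed_ms, target_ms, meets_target)."""
    report = {}
    for p, target in targets_ms.items():
        observed = percentile(samples, p)
        report[p] = (observed, target, observed <= target)
    return report


# Hypothetical targets: p50 within 10 ms, p99 within 20 ms.
targets_ms = {50: 10, 99: 20}

# Hypothetical samples: mostly fast, with 1.5% of requests stalling at 400 ms.
latencies_ms = [5] * 970 + [15] * 15 + [400] * 15

report = check_targets(latencies_ms, targets_ms)
for p, (observed, target, ok) in report.items():
    status = "OK" if ok else "VIOLATED"
    print(f"p{p}: observed={observed} ms, target={target} ms, {status}")
```

With these numbers the p50 target is met comfortably, yet the p99 target is violated, which is why the tail needs its own target rather than being inferred from the median.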

Test Yourself

Scenario: A payments API reports p50 latency of 40ms and p99 latency of 1.5s, running at 20,000 requests/sec steady-state. The team says "average latency is great, users are happy." Is that true? Do the math and explain the actual user impact.
