Latency Percentiles
Understand what p50/p95/p99 mean and why averages lie about latency.
How It Works
Latency percentiles describe how a service performs across many requests. The p50 (median) is the latency that 50% of requests beat and 50% exceed. The p95 is the latency that 95% of requests beat, leaving 5% slower. The p99 is the latency that 99% of requests beat, so 1% of requests land in the slow tail. The p999 (also written p99.9, "three nines") is the latency that 99.9% of requests beat, leaving 0.1% slower. The key insight: averages hide the tail. A service with an average latency of 20ms but a p99 of 2 seconds is effectively broken for 1% of requests, which at 1M requests per second means 10,000 slow requests, and potentially that many unhappy users, every second.
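A minimal sketch of the idea in Python, using only the standard library. The sample data is made up for illustration (985 fast requests at 10ms and 15 slow ones at 2000ms), the helper names `percentile` and `requests_beyond` are hypothetical, and the nearest-rank method shown is only one of several common percentile definitions; monitoring tools may interpolate or use histograms instead.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def requests_beyond(requests_per_second, pct):
    """Requests per second that fall beyond the given percentile."""
    return requests_per_second * (100 - pct) / 100


# Illustrative data (an assumption, not a real measurement):
# 98.5% of requests are fast, 1.5% hit a slow path.
latencies_ms = [10] * 985 + [2000] * 15

if __name__ == "__main__":
    print(f"mean = {mean(latencies_ms):.2f} ms")
    for pct in (50, 95, 99):
        print(f"p{pct} = {percentile(latencies_ms, pct)} ms")
    print(f"beyond p99 at 1M req/s: {requests_beyond(1_000_000, 99):,.0f} req/s")
```

On this sample the mean comes out just under 40ms and both p50 and p95 sit at 10ms, while p99 is 2000ms. The mean and the median both look healthy even though the slowest requests take two full seconds, which is the gap the percentiles expose.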
Real-World Example
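AWS describes DynamoDB as delivering single-digit-millisecond latency, and Amazon's engineers have long said they set latency targets at high percentiles rather than at the average or median. As an illustration (these figures are illustrative targets, not a published SLA), a single-row read might aim for a p50 under 10ms and a p99 under 20ms. The architecture is designed around the p99 target, not the median, because a fast p50 does not matter if 1% of requests stall long enough to time out.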
Test Yourself
Scenario: A payments API reports p50 latency of 40ms and p99 latency of 1.5s, running at 20,000 requests/sec steady-state. The team says "average latency is great, users are happy." Is that true? Do the math and explain the actual user impact.