Lock Contention Analysis

A shared lock can make a 64-core server behave like a 1-core one — identify it before you blame hardware.

How It Works

Lock contention happens when multiple threads or requests serialize on the same resource: each waits its turn before doing work. The tell-tale signature is low CPU utilization combined with high latency under load. CPUs sit idle because threads are blocked waiting for the lock, not because they are busy doing work. (Spinlocks are the exception: waiters burn CPU while they spin, so contention there shows up as high utilization with little useful throughput.)

Classic offenders: global sequence counters (everyone increments the same variable), singleton configuration caches, shared rate limiters, hot rows in a database, and single-writer leader nodes.

Mitigations depend on the access pattern: shard the lock (split one lock into N smaller ones, each guarding a different subset of keys), use lock-free data structures (atomic compare-and-swap operations instead of locks), partition counters (each shard increments its own and the totals are reconciled periodically), or switch to optimistic concurrency (retry on conflict instead of blocking).
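To make lock sharding and partitioned counters concrete, here is a minimal sketch in Python. The class name StripedCounter, the stripe count, and the key scheme are all hypothetical choices made for illustration. In standard CPython the global interpreter lock already limits parallelism, so read this as a picture of the structure rather than a benchmark.

```python
import threading


class StripedCounter:
    """Counter split into N stripes, each guarded by its own lock.

    Hypothetical illustration: writers that hash to different stripes
    never wait on each other, unlike a single global counter lock.
    """

    def __init__(self, stripes: int = 16) -> None:
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._counts = [0] * stripes

    def increment(self, key: str, amount: int = 1) -> None:
        # Pick a stripe from the key; only that stripe's lock is taken.
        i = hash(key) % len(self._locks)
        with self._locks[i]:
            self._counts[i] += amount

    def total(self) -> int:
        # Reconcile by summing stripes, locking each one briefly.
        # This is not an atomic snapshot across all stripes.
        result = 0
        for i, lock in enumerate(self._locks):
            with lock:
                result += self._counts[i]
        return result


def demo(workers: int = 8, per_worker: int = 1_000) -> int:
    counter = StripedCounter(stripes=4)

    def work(n: int) -> None:
        for _ in range(per_worker):
            counter.increment(f"worker-{n}")

    threads = [threading.Thread(target=work, args=(n,)) for n in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.total()


if __name__ == "__main__":
    print(demo())
```

The trade-off is on the read side: increments get cheaper as stripes are added, while total() has to visit every stripe and only sees a point-in-time approximation if writers are still active.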

Real-World Example

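Before MySQL 5.1.22, InnoDB took a table-level AUTO-INC lock (a special lock guarding the auto-increment counter) for every INSERT into a table with an AUTO_INCREMENT column, and held it until the end of the statement. Concurrent inserts into the same table therefore serialized on it regardless of which rows were touched. A 64-core server could manage only about 10,000 inserts/sec into a hot table because the lock became the bottleneck well before CPUs or disks did. MySQL 5.1.22 introduced the innodb_autoinc_lock_mode setting, whose default at the time replaced the statement-long lock with a lightweight mutex held only while auto-increment values are allocated, for inserts whose row count is known in advance. The same anti-pattern appears constantly in distributed caches with a shared stats counter.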
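A general lesson from auto-increment style locks is that how long a lock is held matters as much as how many locks there are. The sketch below is not InnoDB's code. It is a hypothetical IdAllocator that reserves a block of consecutive ids under a short mutex and leaves the slow per-row work outside the critical section; every name in it is invented for illustration.

```python
import threading


class IdAllocator:
    """Hands out consecutive ids; the lock covers only the counter bump."""

    def __init__(self, start: int = 1) -> None:
        self._next = start
        self._lock = threading.Lock()

    def reserve(self, count: int) -> range:
        # Short critical section: read and advance the counter, nothing else.
        with self._lock:
            first = self._next
            self._next += count
        return range(first, first + count)


def insert_batch(allocator: IdAllocator, rows: list) -> list:
    ids = allocator.reserve(len(rows))
    # Stand-in for the slow row-writing work, done without holding the lock.
    return list(zip(ids, rows))


def demo(workers: int = 4, batch: int = 250) -> list:
    allocator = IdAllocator()
    results = [None] * workers

    def work(n: int) -> None:
        rows = [f"w{n}-row{i}" for i in range(batch)]
        results[n] = insert_batch(allocator, rows)

    threads = [threading.Thread(target=work, args=(n,)) for n in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Every id should appear exactly once across all workers.
    return sorted(row_id for batch_result in results for row_id, _ in batch_result)


if __name__ == "__main__":
    ids = demo()
    print(len(ids), ids[0], ids[-1])
```

Holding the lock across the whole batch would give the same ids but force the workers to run one at a time, which is the serialization the section describes.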

Test Yourself

Scenario: An inventory service running on a 32-core server handles ~2,000 QPS at 40ms p50 during the day. During a flash sale, QPS climbs to 8,000 and p50 jumps to 600ms, but CPU utilization sits at just 18% and disk I/O is idle. Diagnose.
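One way to start on a scenario like this is Little's law: requests in flight are roughly throughput multiplied by latency, in steady state. The sketch below plugs in the numbers from the scenario. Using p50 as a stand-in for mean latency is an assumption, so treat the results as rough, and the function names are only illustrative.

```python
def in_flight(qps: float, latency_s: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return qps * latency_s


def busy_cores(cores: int, utilization: float) -> float:
    """Average number of cores doing work at the given utilization."""
    return cores * utilization


normal = in_flight(2_000, 0.040)   # daytime: 2,000 QPS at 40 ms
sale = in_flight(8_000, 0.600)     # flash sale: 8,000 QPS at 600 ms
running = busy_cores(32, 0.18)     # 32 cores at 18% utilization

print(f"in flight, normal day:  {normal:.0f}")
print(f"in flight, flash sale:  {sale:.0f}")
print(f"cores actually busy:    {running:.1f}")
print(f"requests per busy core: {sale / running:.0f}")
```

Traffic grew 4x while the number of requests in flight grew about 60x, and fewer than six cores' worth of work is running at any moment. Almost every request is waiting rather than executing, and with disk I/O idle, the question becomes what they are all waiting on.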
