
Queues

Decouple producers and consumers so slow work can run asynchronously.

Queues move slow or bursty work off the request path. They are the cleanest way to absorb spikes without making users wait.

How It Works

Message queues (Kafka, SQS, RabbitMQ) sit between services to buffer requests, absorb traffic spikes, and decouple processing speed from user-facing latency. The producer writes a message and returns immediately; the consumer processes it at its own pace. Key design decisions: at-least-once vs exactly-once delivery (with at-least-once, consumers must tolerate duplicate messages), ordering guarantees, dead-letter queues for messages that keep failing, and consumer group scaling. In interviews, queues solve the "slow downstream" problem: any time a request triggers work that takes seconds, put a queue in between.
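The producer/consumer split, retry-based redelivery, and a dead-letter list can be sketched in-process with Python's standard library. This is a minimal illustration, not how Kafka, SQS, or RabbitMQ work internally: `MAX_ATTEMPTS`, `run_worker`, and `demo` are hypothetical names, and a `queue.Queue` stands in for the broker.

```python
import queue
import threading

MAX_ATTEMPTS = 3  # hypothetical limit before a message is dead-lettered


def run_worker(jobs, handler, results, dead_letters):
    """Consume (payload, attempts) tuples until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut the worker down
            jobs.task_done()
            return
        payload, attempts = job
        try:
            results.append(handler(payload))
        except Exception:
            if attempts + 1 < MAX_ATTEMPTS:
                # Redeliver: the same message may be handled more than once,
                # which is the at-least-once trade-off.
                jobs.put((payload, attempts + 1))
            else:
                dead_letters.append(payload)  # park it for inspection
        finally:
            jobs.task_done()


def demo():
    """Enqueue three messages, one of which always fails."""
    jobs = queue.Queue()
    results, dead_letters = [], []

    def handler(payload):
        if payload == "bad":
            raise ValueError("cannot process")
        return payload.upper()

    # Producer side: put() returns immediately; no work happens here.
    for payload in ["a", "bad", "b"]:
        jobs.put((payload, 0))

    worker = threading.Thread(
        target=run_worker, args=(jobs, handler, results, dead_letters)
    )
    worker.start()
    jobs.join()     # wait until every message, including retries, is settled
    jobs.put(None)  # then ask the worker to stop
    worker.join()
    return results, dead_letters
```

Calling `demo()` leaves the two good messages in `results` and the failing one in `dead_letters` after three attempts. Scaling the consumer side would mean starting more worker threads on the same queue, at the cost of losing strict ordering across messages.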

Real-World Example

YouTube uses a job queue between upload ingestion and transcoding. When a user uploads a video, the API writes metadata to the database and enqueues a transcode job. A fleet of GPU workers pulls jobs, transcodes to multiple resolutions, and writes outputs to object storage. Because transcoding runs after the response rather than inside it, upload latency is ~2 seconds regardless of video length.
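The upload flow above can be sketched as two separate code paths that share only a queue. Everything here is an assumption made for illustration: the function names, the resolution list, and the in-memory dictionaries standing in for the database and object storage are hypothetical and do not describe YouTube's actual system.

```python
from collections import deque

RESOLUTIONS = ("360p", "720p", "1080p")  # illustrative ladder only

metadata_db = {}          # stand-in for the metadata database
object_store = {}         # stand-in for object storage
transcode_jobs = deque()  # stand-in for the job queue


def handle_upload(video_id, title):
    """API path: record metadata, enqueue a job, return without transcoding."""
    metadata_db[video_id] = {"title": title, "status": "processing"}
    transcode_jobs.append(video_id)
    return {"video_id": video_id, "status": "processing"}


def transcode_worker_step():
    """Worker path: pull one job and write one output per resolution."""
    if not transcode_jobs:
        return False  # nothing to do
    video_id = transcode_jobs.popleft()
    for resolution in RESOLUTIONS:
        # A real worker would run an encoder here; this writes a placeholder.
        object_store[f"{video_id}/{resolution}.mp4"] = f"<encoded {resolution}>"
    metadata_db[video_id]["status"] = "ready"
    return True
```

The point of the split is that `handle_upload` does a constant amount of work no matter how long the video is, while the cost that grows with video length lives entirely in `transcode_worker_step`, which can be run by as many workers as the backlog requires.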

Test Yourself

What does a queue buy you between an API and a worker fleet?
