Message Queues & Async Processing

~25 min · Foundations · DDIA §11 · Alex Xu Vol 2, Ch 4

Primary Source

DDIA — Chapter 11: Stream Processing

Kleppmann's treatment of message streams, logs, and stream processing is the deepest available. Start with the first half (message brokers). Also: ByteByteGo's "Message Queue" video for visual intuition.

Why Async Processing?

Synchronous architectures are brittle. If your checkout service calls the email service directly, a slow email server makes checkout slow, and an email outage can make checkout fail. Message queues decouple services in time and space: producers enqueue work and move on, while consumers process it independently.

Sync vs async — decoupling with a queue makes each service independently scalable and fault-tolerant
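
To make the difference concrete, here is a minimal in-process sketch using Python's standard queue and threading modules. The names send_email, checkout_sync, and checkout_async are hypothetical stand-ins, and the 50 ms sleep simulates a slow email service; a real system would put a broker between separate processes.

```python
import queue
import threading
import time

email_queue = queue.Queue()
sent = []  # order IDs whose confirmation email went out


def send_email(order_id):
    """Stand-in for a slow email service (hypothetical)."""
    time.sleep(0.05)
    sent.append(order_id)


def checkout_sync(order_id):
    # Checkout blocks until the email call returns.
    send_email(order_id)


def checkout_async(order_id):
    # Checkout only enqueues the work and returns immediately.
    email_queue.put(order_id)


def email_worker():
    # The consumer drains the queue at its own pace; None is a stop signal.
    while True:
        order_id = email_queue.get()
        if order_id is None:
            break
        send_email(order_id)


def timed(fn, order_id):
    start = time.perf_counter()
    fn(order_id)
    return time.perf_counter() - start


worker = threading.Thread(target=email_worker)
worker.start()
sync_seconds = timed(checkout_sync, "order-1")
async_seconds = timed(checkout_async, "order-2")
email_queue.put(None)
worker.join()
print(f"sync checkout: {sync_seconds:.3f}s, async checkout: {async_seconds:.3f}s")
```

The synchronous path pays the full email latency inside checkout; the asynchronous path pays only the cost of an enqueue, and the email still goes out once the worker gets to it.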

Key Benefits of Message Queues

Decoupling — producer and consumer don't know about each other. Change one without touching the other.
Traffic spike buffering — if 10,000 orders come in at once, the queue absorbs the spike while workers process at their own pace.
Retry on failure — if a consumer crashes mid-processing, the message goes back to the queue for another worker.
Independent scaling — scale producers and consumers separately based on their own load.
Fan-out — one message can be delivered to many consumers (pub/sub).
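
The retry benefit depends on acknowledgements: a message only leaves the queue for good once a consumer confirms it was processed. The sketch below is a toy in-memory model of that idea, not any broker's API; AckQueue, run_worker, and flaky_handler are hypothetical names, and an exception stands in for a worker crash.

```python
from collections import deque


class AckQueue:
    """Toy in-memory queue: a message is removed for good only when acked."""

    def __init__(self):
        self._ready = deque()
        self._in_flight = {}
        self._next_tag = 0

    def publish(self, body):
        self._ready.append(body)

    def receive(self):
        """Hand out the next message as (tag, body), or None if none is ready."""
        if not self._ready:
            return None
        body = self._ready.popleft()
        self._next_tag += 1
        self._in_flight[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        del self._in_flight[tag]

    def nack(self, tag):
        # Return the message to the queue so another worker can pick it up.
        self._ready.append(self._in_flight.pop(tag))


def run_worker(q, handler):
    """Process until the queue is empty; failed messages are requeued."""
    while (item := q.receive()) is not None:
        tag, body = item
        try:
            handler(body)
        except Exception:
            q.nack(tag)
        else:
            q.ack(tag)


attempts = {}
processed = []


def flaky_handler(body):
    attempts[body] = attempts.get(body, 0) + 1
    if body == "order-2" and attempts[body] == 1:
        raise RuntimeError("worker crashed mid-processing")
    processed.append(body)


q = AckQueue()
for order in ("order-1", "order-2", "order-3"):
    q.publish(order)
run_worker(q, flaky_handler)
print(processed)
```

Note that the retried message is processed after order-3, so retries can reorder work, and a handler that always fails would loop forever here. Real brokers typically bound this with a retry limit and a dead-letter queue.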

Point-to-Point vs Pub/Sub

Point-to-Point Queue

Producer → Queue → One consumer (per message)

Each message is delivered to exactly one consumer.

Use: task distribution, work queues
Example: "Process this image"
Tools: SQS, RabbitMQ (default)
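
A work queue can be sketched in-process with competing worker threads, as below. This is an illustration under assumptions: run_work_queue and the worker names are hypothetical, and threads stand in for separate consumer processes reading from a broker.

```python
import queue
import threading


def run_work_queue(tasks, num_workers=3):
    """Distribute tasks across competing workers; return {task: [worker names]}."""
    work = queue.Queue()
    handled_by = {}
    lock = threading.Lock()

    def worker(name):
        while True:
            task = work.get()
            if task is None:  # stop signal, one per worker
                break
            with lock:
                handled_by.setdefault(task, []).append(name)

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for task in tasks:
        work.put(task)
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    return handled_by


result = run_work_queue([f"image-{i}" for i in range(10)])
for task, workers in sorted(result.items()):
    print(task, "->", workers[0])
```

Which worker handles which image varies from run to run, but every task is picked up by exactly one of them.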

Pub/Sub (Publish-Subscribe)

Publisher → Topic → Many subscribers

Each subscriber gets a copy of every message.

Use: event broadcasting, fanout
Example: "User signed up" → email, analytics, CRM
Tools: Kafka, Google Pub/Sub, SNS
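
By contrast, a topic copies each message to every subscriber. The following is a minimal sketch with a hypothetical Topic class that keeps one inbox per subscriber; real brokers add persistence, acknowledgements, and network delivery on top of this idea.

```python
class Topic:
    """Toy pub/sub topic: every subscriber inbox receives its own copy."""

    def __init__(self):
        self._inboxes = {}

    def subscribe(self, name):
        inbox = []
        self._inboxes[name] = inbox
        return inbox

    def publish(self, event):
        for inbox in self._inboxes.values():
            inbox.append(dict(event))  # copy, so subscribers cannot affect each other


signups = Topic()
email_inbox = signups.subscribe("email")
analytics_inbox = signups.subscribe("analytics")
crm_inbox = signups.subscribe("crm")

signups.publish({"type": "user_signed_up", "user_id": 42})
print(len(email_inbox), len(analytics_inbox), len(crm_inbox))
```

Adding a fourth subscriber requires no change to the publisher, which is the decoupling that makes fan-out attractive.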

Kafka vs Traditional Queues

Feature	Kafka	RabbitMQ / SQS
Model	Distributed log — messages persisted and replayed	Queue — messages deleted after consumption
Consumer groups	Multiple groups each get all messages independently	Competing consumers share the queue
Ordering	Strict ordering within a partition	FIFO within a queue (SQS FIFO, RabbitMQ); standard SQS queues offer only best-effort ordering
Throughput	Millions of messages/second	Thousands to low millions
Replay	Yes — reprocess from any offset	No — consumed messages are gone
Complexity	High — needs cluster management, ZooKeeper/KRaft	Low — managed services available
Use when	Event streaming, audit logs, data pipelines	Task queues, simple async processing
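
The log model in the first rows of the table can be sketched as an append-only list plus one read position per consumer group. This is a toy single-partition model, not Kafka's client API; PartitionLog, poll, and seek are hypothetical names chosen for illustration, and poll advances the group's position automatically.

```python
class PartitionLog:
    """Toy single-partition log: records are kept, and each group tracks its own offset."""

    def __init__(self):
        self._records = []
        self._next_offset = {}  # group name -> next offset to read

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def poll(self, group, max_records=10):
        start = self._next_offset.get(group, 0)
        batch = self._records[start:start + max_records]
        self._next_offset[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        # Rewind (or skip ahead) without touching the stored records.
        self._next_offset[group] = offset


log = PartitionLog()
for event in ("signup", "purchase", "refund"):
    log.append(event)

analytics_first = log.poll("analytics")
billing_first = log.poll("billing")      # independent of the analytics group
log.seek("analytics", 0)
analytics_replay = log.poll("analytics")  # same records again
print(analytics_first, billing_first, analytics_replay)
```

Because reading never deletes anything, a second group sees the full history and any group can rewind to reprocess it. A traditional queue has no equivalent of seek once a message has been consumed and acknowledged.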

💡 Interview rule

Use Kafka when you need replay, high throughput, or fan-out to multiple consumer groups. Use SQS/RabbitMQ for simpler task distribution. Don't over-engineer — SQS works for most things.

Delivery Guarantees

Guarantee	Meaning	Trade-off
At-most-once	Message delivered 0 or 1 times. May be lost.	Fastest. OK for metrics, logs where loss is acceptable.
At-least-once	Message delivered 1 or more times. May be duplicated.	Most common. Consumer must be idempotent.
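Exactly-once	Delivered exactly once.	Hardest to achieve end to end. Kafka provides it for Kafka-to-Kafka processing through idempotent producers and transactions, at some throughput and latency cost; side effects in external systems still need idempotent handling.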
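
In practice, the first two guarantees often come down to when the consumer acknowledges a message relative to processing it. The sketch below simulates both orderings with a single crash; the function names are hypothetical, a deque stands in for the broker, and a RuntimeError stands in for a consumer crash.

```python
from collections import deque


def consume_at_most_once(messages, process):
    """Ack first, then process: a crash during processing loses the message."""
    pending = deque(messages)
    while pending:
        msg = pending.popleft()  # acknowledged before processing
        try:
            process(msg)
        except RuntimeError:
            pass  # the message is already gone; nothing to retry


def consume_at_least_once(messages, process):
    """Process first, then ack: a crash before the ack causes redelivery."""
    pending = deque(messages)
    while pending:
        msg = pending[0]  # peek; the queue still owns the message
        try:
            process(msg)
        except RuntimeError:
            continue  # never acked, so the same message is delivered again
        pending.popleft()  # ack only after success


def make_handler(crash_after_effect):
    """Return (process, seen); process crashes once, before or after its side effect."""
    seen = []
    state = {"crashed": False}

    def process(msg):
        if crash_after_effect:
            seen.append(msg)
        if not state["crashed"]:
            state["crashed"] = True
            raise RuntimeError("simulated crash")
        if not crash_after_effect:
            seen.append(msg)

    return process, seen


lossy_process, lossy_seen = make_handler(crash_after_effect=False)
consume_at_most_once(["a", "b", "c"], lossy_process)

dup_process, dup_seen = make_handler(crash_after_effect=True)
consume_at_least_once(["a", "b", "c"], dup_process)

print("at-most-once saw:", lossy_seen)
print("at-least-once saw:", dup_seen)
```

The at-most-once run never records "a", while the at-least-once run records it twice. That duplicate is why at-least-once consumers need to be idempotent.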

Idempotency is essential

With at-least-once delivery (the default in most brokers), your consumer must handle duplicate messages safely. Charge a credit card twice? That's a serious bug. Give every message a unique ID, record each ID once it has been processed, and skip any message whose ID is already recorded: if already_processed(message_id): return.
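
A minimal sketch of that deduplication check follows. PaymentConsumer and the message fields (message_id, card, amount_cents) are hypothetical, and the in-memory set is an assumption made to keep the example short; a real consumer would keep processed IDs in durable storage and record the ID atomically with the charge, or pass an idempotency key to the payment provider.

```python
class PaymentConsumer:
    """Toy idempotent consumer: each message ID is charged at most once."""

    def __init__(self):
        self._processed_ids = set()  # assumption: in-memory; use durable storage in practice
        self.charges = []

    def handle(self, message):
        """Return True if the charge was applied, False if it was a duplicate."""
        message_id = message["message_id"]
        if message_id in self._processed_ids:
            return False  # already processed: skip the side effect
        self.charges.append((message["card"], message["amount_cents"]))
        self._processed_ids.add(message_id)
        return True


consumer = PaymentConsumer()
charge = {"message_id": "msg-001", "card": "card-123", "amount_cents": 4999}

first = consumer.handle(charge)
second = consumer.handle(charge)  # redelivered duplicate
print(first, second, consumer.charges)
```

The second delivery is recognized by its ID and ignored, so the card is charged once even though the message arrived twice.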

Backpressure

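If consumers are slower than producers, the queue grows without bound until the broker runs out of memory or disk, and messages wait longer and longer before being processed. This is a backpressure problem. Mitigations: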

Add more consumers — scale horizontally
Rate limit producers — slow down the source
Set queue size limits — reject or drop messages when full (only for non-critical data)
Dead-letter queue (DLQ) — messages that fail repeatedly go to a DLQ for inspection
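
Two of these mitigations, a size limit and a dead-letter queue, can be sketched with Python's standard queue module. The names try_enqueue and drain and the retry budget MAX_ATTEMPTS are assumptions for illustration; managed brokers expose the same ideas as configuration rather than code.

```python
import queue

MAX_ATTEMPTS = 3  # assumed retry budget before a message is dead-lettered


def try_enqueue(q, item):
    """Non-blocking put: returns False (item dropped) when the queue is full."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False


def drain(q, handler, dead_letters):
    """Process everything in q; park items that fail MAX_ATTEMPTS times."""
    while True:
        try:
            item, attempts = q.get_nowait()
        except queue.Empty:
            return
        try:
            handler(item)
        except Exception:
            if attempts + 1 >= MAX_ATTEMPTS:
                dead_letters.append(item)  # keep for inspection, stop retrying
            else:
                q.put_nowait((item, attempts + 1))


handled = []
dead_letters = []


def handler(item):
    if item == "poison":
        raise ValueError("cannot parse message")
    handled.append(item)


bounded = queue.Queue(maxsize=2)
accepted = [try_enqueue(bounded, (name, 0)) for name in ("a", "poison", "c")]
drain(bounded, handler, dead_letters)
print(accepted, handled, dead_letters)
```

The third message is rejected because the queue is full, which is only acceptable for non-critical data. To slow the producer down instead of dropping work, a blocking put with a timeout achieves the rate-limiting effect described above.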

Check Your Understanding

1. When a user places an order, you need to send a confirmation email, update analytics, and trigger warehouse processing. Which pattern fits best?

2. Your consumer uses at-least-once delivery and processes payment charges. What must you implement?

3. You need to replay all events from the last week to rebuild a corrupted analytics database. Which tool supports this?

🎓 Message queues appear in almost every case study. Ask me how queues are used in the notification system (Lesson 21), or the video processing pipeline in YouTube (Lesson 25), or how to choose between Kafka and SQS for a specific use case.