Message Queues & Async Processing

~25 min · Foundations · DDIA §11 · Alex Xu Vol 2, Ch 4

Ref
Primary Source
DDIA — Chapter 11: Stream Processing

Kleppmann's treatment of message streams, logs, and stream processing is the deepest available. Start with the first half (message brokers). Also: ByteByteGo's "Message Queue" video for visual intuition.

Why Async Processing?

Synchronous architectures are brittle. If your checkout service calls the email service directly, a slow email server makes checkout slow. A crashed email service can crash checkout. Message queues decouple services in time and space — producers enqueue work and move on; consumers process it independently.

[Diagram, two panels]
❌ Synchronous (Coupled): Checkout calls Email Svc and Analytics Svc (slow) directly. Slow analytics → checkout hangs. Analytics crash → checkout fails.
✅ Async (Decoupled): Checkout writes to a Queue (Kafka / SQS) and returns immediately; Email and Analytics workers read from the queue. Workers fail independently. Queue buffers spikes. Scale workers independently of producers.
Sync vs async — decoupling with a queue makes each service independently scalable and fault-tolerant
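The decoupling described above can be sketched with Python's standard library. This is a minimal in-process illustration, not a real broker: `work_queue` stands in for SQS or Kafka, and `checkout`, `email_worker`, and the 50 ms delay are hypothetical names and values chosen for the demo.

```python
import queue
import threading
import time

# Hypothetical in-process stand-in for a broker such as SQS or Kafka.
work_queue = queue.Queue()
sent_emails = []


def checkout(order_id):
    """Producer: enqueue the follow-up work and return without waiting."""
    work_queue.put({"type": "send_email", "order_id": order_id})
    return f"order {order_id} confirmed"


def email_worker():
    """Consumer: drains the queue at its own pace."""
    while True:
        message = work_queue.get()
        if message is None:  # sentinel: shut the worker down
            work_queue.task_done()
            break
        time.sleep(0.05)  # simulate a slow email server
        sent_emails.append(message["order_id"])
        work_queue.task_done()


worker = threading.Thread(target=email_worker, daemon=True)
worker.start()

# Checkout returns as soon as the message is enqueued;
# the slow email step happens later on the worker thread.
responses = [checkout(f"o{i}") for i in range(3)]

work_queue.put(None)
work_queue.join()  # wait only so the demo can inspect the result
```

The producer never touches the email code path, so a slow or crashed worker delays emails but not the checkout response.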

Key Benefits of Message Queues

Point-to-Point vs Pub/Sub

Point-to-Point Queue

One producer → Queue → One consumer

Each message is delivered to exactly one consumer.

Use: task distribution, work queues
Example: "Process this image"
Tools: SQS, RabbitMQ (default)

Pub/Sub (Publish-Subscribe)

Publisher → Topic → Many subscribers

Each subscriber gets a copy of every message.

Use: event broadcasting, fanout
Example: "User signed up" → email, analytics, CRM
Tools: Kafka, Google Pub/Sub, SNS
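To make the difference concrete, here is a small in-memory sketch of both models. The class names `PointToPointQueue` and `Topic` are invented for illustration, and round-robin is only one possible way a broker might pick the single receiving consumer.

```python
from collections import deque
from itertools import cycle


class PointToPointQueue:
    """Each message goes to exactly one of the registered consumers."""

    def __init__(self, consumers):
        self._pending = deque()
        self._consumers = cycle(consumers)  # round-robin is one possible policy

    def send(self, message):
        self._pending.append(message)

    def dispatch(self):
        while self._pending:
            next(self._consumers)(self._pending.popleft())


class Topic:
    """Every subscriber receives its own copy of every message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, message):
        for handler in self._subscribers:
            handler(message)


# Point-to-point: four image tasks are split between two workers.
worker_a, worker_b = [], []
tasks = PointToPointQueue([worker_a.append, worker_b.append])
for image in ["img1", "img2", "img3", "img4"]:
    tasks.send(image)
tasks.dispatch()

# Pub/sub: one signup event reaches all three subscribers.
email, analytics, crm = [], [], []
signups = Topic()
for inbox in (email, analytics, crm):
    signups.subscribe(inbox.append)
signups.publish("user signed up: ada")
```

In the first case the work is divided, so adding workers increases throughput. In the second the work is multiplied, so adding subscribers adds new reactions to the same event.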

Kafka vs Traditional Queues

| Feature | Kafka | RabbitMQ / SQS |
| --- | --- | --- |
| Model | Distributed log: messages persisted and replayable | Queue: messages deleted after consumption |
| Consumer groups | Multiple groups each get all messages independently | Competing consumers share the queue |
| Ordering | Strict ordering within a partition | FIFO within a queue (SQS FIFO, RabbitMQ) |
| Throughput | Millions of messages/second | Thousands to low millions |
| Replay | Yes: reprocess from any offset | No: consumed messages are gone |
| Complexity | High: needs cluster management, ZooKeeper/KRaft | Low: managed services available |
| Use when | Event streaming, audit logs, data pipelines | Task queues, simple async processing |
💡 Interview rule

Use Kafka when you need replay, high throughput, or fan-out to multiple consumer groups. Use SQS/RabbitMQ for simpler task distribution. Don't over-engineer — SQS works for most things.
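The log model is easier to see in code. The sketch below is a toy single-partition log, not Kafka's API: `EventLog`, `poll`, and `seek` are hypothetical names that loosely mirror the ideas of offsets and consumer groups.

```python
class EventLog:
    """Append-only log; reading never deletes anything."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer group name -> next offset to read

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def poll(self, group, max_records=10):
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Move a group's position, for example to replay old records."""
        self._offsets[group] = offset


log = EventLog()
for event in ["signup", "purchase", "refund"]:
    log.append(event)

billing_first = log.poll("billing")
analytics_first = log.poll("analytics")  # independent of billing's position
analytics_empty = log.poll("analytics")  # caught up, nothing new
log.seek("analytics", 1)                 # rewind to offset 1
analytics_replay = log.poll("analytics")
```

Because each group only tracks its own offset, adding a consumer group or replaying history never affects the other readers. A traditional queue has no equivalent: once a message is acknowledged, it is removed.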

Delivery Guarantees

| Guarantee | Meaning | Trade-off |
| --- | --- | --- |
| At-most-once | Message delivered 0 or 1 times. May be lost. | Fastest. OK for metrics, logs where loss is acceptable. |
| At-least-once | Message delivered 1 or more times. May be duplicated. | Most common. Consumer must be idempotent. |
| Exactly-once | Each message takes effect exactly once. | Expensive. Requires transactions or idempotent writes end to end. Kafka supports it (idempotent producer plus transactions) with added overhead. |
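In practice, the difference between the first two guarantees often comes down to when the consumer acknowledges a message relative to processing it. The sketch below simulates that with a plain list as the queue. All names are hypothetical, and the crash is injected once so both failure modes are visible.

```python
class ConsumerCrash(Exception):
    """Simulated crash inside the consumer."""


def make_handler(crash_once=None):
    """Build a handler that records messages and crashes on its first call.

    crash_once is None, "before_effect", or "after_effect".
    """
    processed = []
    state = {"crashed": False}

    def handle(message):
        first_crash = crash_once is not None and not state["crashed"]
        if first_crash and crash_once == "before_effect":
            state["crashed"] = True
            raise ConsumerCrash
        processed.append(message)  # the side effect (email sent, row written)
        if first_crash and crash_once == "after_effect":
            state["crashed"] = True
            raise ConsumerCrash

    return handle, processed


def consume_at_most_once(pending, handle):
    """Acknowledge (remove) first, then process: a crash loses the message."""
    while pending:
        message = pending.pop(0)  # ack before processing
        try:
            handle(message)
        except ConsumerCrash:
            pass  # message is already gone; it is never retried


def consume_at_least_once(pending, handle):
    """Process first, acknowledge after: a crash causes redelivery."""
    while pending:
        message = pending[0]
        try:
            handle(message)
        except ConsumerCrash:
            continue  # not acked, so the message is delivered again
        pending.pop(0)  # ack only after success


lossy_handle, lossy_seen = make_handler(crash_once="before_effect")
consume_at_most_once(["a", "b"], lossy_handle)

dup_handle, dup_seen = make_handler(crash_once="after_effect")
consume_at_least_once(["a", "b"], dup_handle)
```

With ack-first, message "a" is lost. With ack-last, a crash after the side effect but before the ack makes "a" take effect twice, which is why at-least-once consumers need to be idempotent.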
Idempotency is essential

With at-least-once delivery (the default in most brokers), your consumer must handle duplicate messages safely. Charge a credit card twice? That's a serious bug. Use a unique message ID to deduplicate: if already_processed(message_id): return.
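A minimal sketch of that deduplication check follows. `PaymentConsumer` and the message fields are hypothetical, and the in-memory set is an assumption for the demo: a real consumer would keep processed IDs in a durable store and, ideally, record the ID in the same transaction as the charge.

```python
class PaymentConsumer:
    """Idempotent consumer: a repeated message_id has no further effect."""

    def __init__(self):
        self._processed_ids = set()  # assumption: durable storage in production
        self.charges = []

    def handle(self, message):
        message_id = message["message_id"]
        if message_id in self._processed_ids:
            return False  # duplicate delivery: skip the side effect
        self.charges.append((message["customer"], message["amount_cents"]))
        self._processed_ids.add(message_id)
        return True


consumer = PaymentConsumer()
charge = {"message_id": "msg-1", "customer": "ada", "amount_cents": 1999}

first = consumer.handle(charge)
second = consumer.handle(charge)  # broker redelivers the same message
third = consumer.handle(
    {"message_id": "msg-2", "customer": "ada", "amount_cents": 500}
)
```

The ID must be assigned by the producer and stay the same across redeliveries. An ID generated on the consumer side would differ for each delivery and defeat the check.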

Backpressure

If consumers are slower than producers, the queue grows unbounded. This is a backpressure problem. Mitigations:
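One common mitigation is to bound the queue so that a full buffer pushes back on the producer, which can then slow down, retry later, or shed load. The sketch below uses Python's `queue.Queue` with `maxsize`. The limit of 3 and the `try_enqueue` helper are arbitrary choices for illustration.

```python
import queue

buffer = queue.Queue(maxsize=3)  # bound chosen arbitrarily for the demo
accepted, rejected = [], []


def try_enqueue(item):
    """Enqueue without blocking; report failure when the buffer is full."""
    try:
        buffer.put_nowait(item)
    except queue.Full:
        rejected.append(item)  # signal to the producer: back off or drop
        return False
    accepted.append(item)
    return True


# The producer outpaces the consumer: only the first three items fit.
for i in range(5):
    try_enqueue(i)

drained = buffer.get_nowait()  # a consumer frees one slot
retry_ok = try_enqueue(99)     # now there is room again
```

A blocking `put()` is the other option: instead of rejecting, it makes the producer wait until a consumer frees a slot.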

Check Your Understanding

1. When a user places an order, you need to send a confirmation email, update analytics, and trigger warehouse processing. Which pattern fits best?
2. Your consumer uses at-least-once delivery and processes payment charges. What must you implement?
3. You need to replay all events from the last week to rebuild a corrupted analytics database. Which tool supports this?

🎓 Message queues appear in almost every case study. Ask me how queues are used in the notification system (Lesson 21), or the video processing pipeline in YouTube (Lesson 25), or how to choose between Kafka and SQS for a specific use case.