📨 Messaging Systems - Kafka, RabbitMQ, SQS & Pub/Sub
The Senior Mindset: Don’t ask “which is better?” Ask “What are the delivery guarantees I need?” and “How do I want to handle state?” Choosing a broker is a trade-off between throughput, latency, and the complexity of the consumer logic.
🚦 Broker Architectures: Two Main Philosophies
1. Message Queues (Smart Broker, Dumb Consumer)
The broker tracks which messages are consumed and deletes them once acknowledged.
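- RabbitMQ: Rich feature set (flexible routing via exchanges, message priorities). Best for complex workflows.
- AWS SQS: Fully managed, and Standard queues scale to a nearly unlimited request rate. Routing is minimal, though: fan-out usually means putting SNS in front of several queues.
- Behavior: Usually point-to-point (competing consumers). Each message is handed to one consumer at a time and is redelivered only if that consumer fails to acknowledge it.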
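The queue semantics above can be sketched as a toy in-memory broker. This is a minimal illustration, not any broker's real API. `InMemoryQueue`, `receive`, and `ack` are hypothetical names. Redelivery is modeled on an SQS-style visibility timeout with an explicit `now` clock, so the behavior is deterministic. (RabbitMQ instead requeues unacknowledged messages when the consumer's channel or connection closes.)

```python
import itertools
from typing import Optional


class InMemoryQueue:
    """Toy 'smart broker': tracks in-flight messages and deletes them on ack."""

    def __init__(self, visibility_timeout: float = 30.0) -> None:
        self.visibility_timeout = visibility_timeout
        self._ids = itertools.count(1)
        self._ready: dict[int, str] = {}  # msg_id -> body, visible to consumers
        self._in_flight: dict[int, tuple[str, float]] = {}  # msg_id -> (body, redelivery deadline)

    def send(self, body: str) -> int:
        msg_id = next(self._ids)
        self._ready[msg_id] = body
        return msg_id

    def receive(self, now: float) -> Optional[tuple[int, str]]:
        """Hand the oldest visible message to one consumer and hide it from the others."""
        self._requeue_expired(now)
        if not self._ready:
            return None
        msg_id = min(self._ready)
        body = self._ready.pop(msg_id)
        self._in_flight[msg_id] = (body, now + self.visibility_timeout)
        return msg_id, body

    def ack(self, msg_id: int) -> None:
        """The broker forgets the message for good once it is acknowledged."""
        self._in_flight.pop(msg_id, None)

    def _requeue_expired(self, now: float) -> None:
        # Unacked messages whose deadline has passed become visible again (redelivery).
        for msg_id in [m for m, (_, deadline) in self._in_flight.items() if deadline <= now]:
            body, _ = self._in_flight.pop(msg_id)
            self._ready[msg_id] = body


queue = InMemoryQueue(visibility_timeout=30)
queue.send("resize-image-42")
first = queue.receive(now=0)    # consumer A gets the message
second = queue.receive(now=1)   # consumer B gets nothing: the message is in flight
again = queue.receive(now=31)   # A never acked, so the message is redelivered
queue.ack(again[0])             # processed successfully: deleted for good
after_ack = queue.receive(now=100)
```

Because an unacknowledged message comes back, a slow or crashed consumer leads to redelivery. This is why queue consumers are normally written to be idempotent.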
2. Log-based Streaming (Dumb Broker, Smart Consumer)
The broker is a distributed, append-only log, and reading a message does not remove it. Each consumer group tracks its own “offset” (position in the log) and decides when to commit it. Kafka does persist committed offsets on the broker, but advancing them is the consumer’s job.
- Apache Kafka / Amazon MSK: Built for massive throughput and data retention.
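- Google Cloud Pub/Sub: A managed hybrid that scales globally. Like a queue, it tracks acknowledgments per subscription. Like a log, every subscription receives its own copy of the stream, and Seek can replay retained messages.
- Behavior: Fan-out. Multiple consumer groups can read the same stream of events independently, each at its own pace.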
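For contrast, a log-based broker can be sketched the same way. This is a hypothetical toy, and `InMemoryLog`, `poll`, and `commit` are illustrative names, not the Kafka client API. The broker only appends and serves ranges, and each consumer group owns its offset. Real Kafka stores committed offsets in the internal `__consumer_offsets` topic, but the consumer decides when to commit.

```python
class InMemoryLog:
    """Toy 'dumb broker': an append-only list; reading never deletes anything."""

    def __init__(self) -> None:
        self._records: list[str] = []
        self._committed: dict[str, int] = {}  # consumer group -> next offset to read

    def append(self, record: str) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def poll(self, group: str, max_records: int = 10) -> list[str]:
        # This toy starts unknown groups at offset 0 (Kafka's default auto.offset.reset is "latest").
        start = self._committed.get(group, 0)
        return self._records[start:start + max_records]

    def commit(self, group: str, next_offset: int) -> None:
        # The consumer, not the broker, decides when its position moves forward.
        self._committed[group] = next_offset


log = InMemoryLog()
for event in ["user.created", "order.placed", "order.shipped"]:
    log.append(event)

billing = log.poll("billing")
log.commit("billing", len(billing))   # billing has processed offsets 0..2
analytics = log.poll("analytics")     # a second group still sees the full stream
billing_next = log.poll("billing")    # nothing new for billing yet
```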
🔍 The Big Three Comparison
| Feature | RabbitMQ | Apache Kafka | AWS SQS |
|---|---|---|---|
| Model | Push (Broker pushes to consumer) | Pull (Consumer requests data) | Pull (Short/Long polling) |
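| Persistence | Deleted after ACK | Persistent (Retention policy) | Deleted after ACK (`DeleteMessage`); expires after the retention period (max 14 days) |
| Ordering | Per queue (multiple consumers and redeliveries can reorder processing) | Guaranteed within a Partition | Best-effort (Standard); strict per message group (FIFO) |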
| Scaling | Vertical / Cluster-based | Horizontal (Adding partitions) | Native / Serverless |
| Best For | Task Queues, RPC, Complex Routing | Log Aggregation, Event Sourcing, Big Data | Decoupling Microservices (Cloud-native) |
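The “Guaranteed within a Partition” cell is why Kafka producers key related events. Here is a minimal sketch, assuming string keys and a fixed partition count. `zlib.crc32` stands in for Kafka's default murmur2-based partitioner, and `partition_for` and `events_for` are hypothetical helpers.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in for Kafka's default partitioner (which hashes the key bytes with murmur2);
    # any stable hash gives the same property: one key always maps to one partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = defaultdict(list)  # partition number -> records in append order
events = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-2", "paid"),
    ("order-1", "shipped"),
]
for key, event in events:
    partitions[partition_for(key)].append((key, event))


def events_for(key: str) -> list[str]:
    """Read one key's events back from its partition, in the order they were appended."""
    return [e for k, e in partitions[partition_for(key)] if k == key]
```

Events with different keys are ordered relative to each other only if they happen to share a partition. Changing `NUM_PARTITIONS` remaps keys, which is why adding partitions to a live topic can break per-key ordering.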
🛠️ Delivery Guarantees (The Senior Perspective)
You must choose which “lie” you can live with:
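- At-Most-Once: Messages may be lost, but never duplicated (e.g., ack or commit before processing). (Fastest, lowest overhead.)
- At-Least-Once: Messages are never lost, but may be delivered more than once (ack only after processing). This is the industry standard, and it requires consumers to be idempotent.
- Exactly-Once: Exactly-once delivery is impossible in general over an unreliable network. Kafka achieves effectively-once processing for read-process-write pipelines that stay inside Kafka, using idempotent producers, transactions, and `read_committed` consumers. Side effects in external systems (databases, emails, payment APIs) still need idempotent handling. (Highest overhead.)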
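At-least-once delivery is only safe when duplicates are harmless. A common technique is to deduplicate on a message ID. The sketch below makes two assumptions: every message carries a unique ID (the `tx-…` IDs are hypothetical), and the processed-ID set fits in memory. A production consumer would persist the IDs, ideally in the same database transaction as the side effect, so a crash cannot record one without the other.

```python
class IdempotentConsumer:
    """Applies each message's side effect once, even if the broker redelivers it."""

    def __init__(self) -> None:
        self.balance = 0
        # In production this would be a table with a unique constraint on message_id.
        self._processed_ids: set[str] = set()

    def handle(self, message_id: str, amount: int) -> bool:
        if message_id in self._processed_ids:
            return False  # duplicate: ack it again, but skip the side effect
        self.balance += amount  # the side effect
        self._processed_ids.add(message_id)
        return True


consumer = IdempotentConsumer()
# "tx-1" arrives twice, as it would if the first ack were lost.
deliveries = [("tx-1", 100), ("tx-2", 50), ("tx-1", 100)]
results = [consumer.handle(mid, amt) for mid, amt in deliveries]
```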
⚖️ Strategic Decision Framework
When to choose RabbitMQ?
- You need complex routing (e.g., using headers or topic exchanges).
- You need built-in message priority (priority queues).
- You need standard messaging protocols such as AMQP, MQTT, or STOMP (MQTT and STOMP via plugins).
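Topic-exchange routing is easy to reason about once the matching rule is explicit. Routing keys are dot-separated words; in a binding key, `*` matches exactly one word and `#` matches zero or more. The function below is a simplified re-implementation for illustration only. RabbitMQ's own matcher is implemented differently, and `topic_matches` is a hypothetical name.

```python
def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Does a topic-exchange binding key match a message's routing key?"""
    return _match(binding_key.split("."), routing_key.split("."))


def _match(pattern: list[str], words: list[str]) -> bool:
    if not pattern:
        return not words  # match only if both are exhausted
    head, rest = pattern[0], pattern[1:]
    if head == "#":
        # '#' may absorb zero or more words: try every possible split point.
        return any(_match(rest, words[i:]) for i in range(len(words) + 1))
    if not words:
        return False
    if head == "*" or head == words[0]:
        return _match(rest, words[1:])  # '*' (or a literal) consumes exactly one word
    return False
```

For example, a queue bound with `order.*.created` receives `order.eu.created` but not `order.eu.west.created`, while `order.#` receives both.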
When to choose Kafka?
- You need to replay data (e.g., rebuilding a database from an event log).
- You have massive throughput requirements (millions of events per second).
- You are implementing Event Sourcing or CQRS.
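Replaying a log to rebuild state (the first Kafka use case above) boils down to folding events, in order, into a read model. This is a minimal sketch with hypothetical event types (`Deposited`, `Withdrawn`). In Kafka, the replay itself would come from reading the topic from the beginning, for example with a fresh consumer group and `auto.offset.reset=earliest`.

```python
def rebuild_balances(events: list[dict]) -> dict[str, int]:
    """Fold an ordered event log into current state (a projection / read model)."""
    balances: dict[str, int] = {}
    for event in events:
        account = event["account"]
        if event["type"] == "Deposited":
            balances[account] = balances.get(account, 0) + event["amount"]
        elif event["type"] == "Withdrawn":
            balances[account] = balances.get(account, 0) - event["amount"]
        # Other event types are ignored, so this projection tolerates new event kinds.
    return balances


event_log = [
    {"type": "Deposited", "account": "alice", "amount": 100},
    {"type": "Withdrawn", "account": "alice", "amount": 30},
    {"type": "Deposited", "account": "bob", "amount": 50},
]
```

Replaying only a prefix of the log (e.g., `event_log[:1]`) yields the state as of that point. That same property lets a fixed projection be rebuilt from scratch after a bug.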
When to choose SQS?
- You are in the AWS ecosystem and want near-zero operational maintenance.
- You need to handle huge spikes in volume without managing a cluster.
- Your architecture is mostly “Fire and Forget” task processing.
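A typical SQS worker long-polls a batch, processes each message, and deletes only the ones that succeeded. The sketch assumes a boto3 SQS client (`boto3.client("sqs")`) and uses only its `receive_message` and `delete_message` calls. `drain_batch`, `send_email`, and `StubSQS` are hypothetical; the stub exists only so the loop can run without AWS.

```python
from typing import Any, Callable


def drain_batch(sqs: Any, queue_url: str, handle: Callable[[str], None]) -> int:
    """Long-poll one batch, process it, and delete only the messages that succeeded.

    `sqs` is expected to behave like a boto3 SQS client, e.g. boto3.client("sqs").
    """
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # the SQS per-call maximum
        WaitTimeSeconds=20,      # long polling: wait up to 20 s for messages to arrive
    )
    processed = 0
    for msg in resp.get("Messages", []):  # the key is absent when nothing arrived
        try:
            handle(msg["Body"])
        except Exception:
            continue  # real code would log this; undeleted, it reappears after the visibility timeout
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
        processed += 1
    return processed


class StubSQS:
    """Offline stand-in that mimics only the two client calls used above."""

    def __init__(self, bodies: list[str]) -> None:
        self.messages = [{"Body": b, "ReceiptHandle": f"rh-{i}"} for i, b in enumerate(bodies)]
        self.deleted: list[str] = []

    def receive_message(self, **kwargs: Any) -> dict:
        return {"Messages": self.messages[: kwargs["MaxNumberOfMessages"]]}

    def delete_message(self, QueueUrl: str, ReceiptHandle: str) -> None:
        self.deleted.append(ReceiptHandle)


def send_email(body: str) -> None:  # hypothetical task handler
    if body == "bad-payload":
        raise ValueError("cannot parse payload")


stub = StubSQS(["welcome:alice", "bad-payload", "welcome:bob"])
done = drain_batch(stub, "https://example.invalid/queue", send_email)
```

Deleting after processing gives at-least-once behavior: a worker that crashes mid-batch simply lets its undeleted messages reappear. SQS Standard queues can also deliver a message more than once on their own, so the handler should be idempotent.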
🚩 Common Pitfalls for Seniors
The “Poison Pill” Message
A message that causes a consumer to crash every time it’s read.
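- Solution: Use a Dead Letter Queue (DLQ). After N failed attempts, the message moves to a separate queue for inspection and manual replay, instead of crashing the consumer forever. SQS does this with a redrive policy (`maxReceiveCount`), and RabbitMQ with dead-letter exchanges. Kafka has no broker-side DLQ, so the consumer (or a framework such as Kafka Connect) publishes failed records to a separate topic.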
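The retry-then-park logic behind a DLQ fits in a few lines. This sketch assumes an in-memory queue and a hypothetical `process_with_dlq` helper. `max_receives` plays the role of SQS's `maxReceiveCount`; on Kafka, the same loop would live in consumer code that publishes failures to a separate DLQ topic.

```python
from collections import deque
from typing import Callable


def process_with_dlq(messages: list[str], handler: Callable[[str], None],
                     max_receives: int = 3) -> tuple[list[str], list[str]]:
    """Retry failing messages up to max_receives times, then park them in a DLQ."""
    queue = deque((msg, 0) for msg in messages)  # (message, times received so far)
    done: list[str] = []
    dlq: list[str] = []
    while queue:
        msg, receives = queue.popleft()
        receives += 1
        try:
            handler(msg)
            done.append(msg)
        except Exception:
            if receives >= max_receives:
                dlq.append(msg)  # stop retrying; keep it for inspection and replay
            else:
                queue.append((msg, receives))  # retry after the other messages
    return done, dlq


def flaky_handler(msg: str) -> None:  # hypothetical handler with one poison pill
    if msg == "poison":
        raise RuntimeError("crashes on every attempt")


done, dead = process_with_dlq(["a", "poison", "b"], flaky_handler)
```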
Backpressure & Slow Consumers
If the producer is faster than the consumer, the queue grows.
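- RabbitMQ: The backlog consumes broker memory. When the memory high watermark is reached, RabbitMQ raises an alarm and blocks publishers. Left unchecked, the backlog can still destabilize the node.
- Kafka: The backlog is just the log, which is retained by time or size regardless of consumption, so a slow consumer adds little load to the broker. The real risk is a consumer falling behind the retention window and missing records that have already been deleted.
- Strategy: Monitor consumer lag religiously. If lag keeps growing, scale out consumer instances. In Kafka, a consumer group cannot usefully have more members than the topic has partitions.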
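Consumer lag is a per-partition subtraction: the log-end offset minus the group's committed offset. The sketch below computes it from plain dictionaries; in practice, the numbers come from your client library or monitoring stack. It also adds a hypothetical sizing helper that caps the consumer count at the partition count, because extra group members beyond that sit idle.

```python
import math


def consumer_lag(end_offsets: dict[int, int], committed: dict[int, int]) -> dict[int, int]:
    """Per-partition lag: log-end offset minus the committed offset.

    Assumes a partition with no committed offset would be read from offset 0.
    """
    return {p: end - committed.get(p, 0) for p, end in end_offsets.items()}


def consumers_needed(total_lag: int, msgs_per_sec_per_consumer: float,
                     target_seconds: float, num_partitions: int) -> int:
    """Hypothetical sizing rule: enough consumers to clear the backlog in target_seconds.

    Ignores messages that keep arriving meanwhile; capped at the partition count
    because each partition is consumed by at most one member of a group.
    """
    needed = math.ceil(total_lag / (msgs_per_sec_per_consumer * target_seconds))
    return max(1, min(needed, num_partitions))


lag = consumer_lag(end_offsets={0: 1000, 1: 1500, 2: 800},
                   committed={0: 900, 1: 1000, 2: 800})
total = sum(lag.values())
```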
💡 Seniority Note: A message broker is stateful infrastructure, and it is much harder to operate than a stateless API. Before adding Kafka to your stack, ask whether something simpler is enough for your current scale. Options include Redis Pub/Sub (fire-and-forget; nothing is persisted) or a database-backed queue (e.g., Postgres `SELECT ... FOR UPDATE SKIP LOCKED`).
🔗 Related Links
- [[Event-Driven-Architecture]]
- [[Architecture-Resilience-Patterns]]
- [[Infrastructure-Cloud-Providers]]