📨 Messaging Systems - Kafka, RabbitMQ, SQS & Pub/Sub

The Senior Mindset: Don’t ask “which is better?” Ask “What are the delivery guarantees I need?” and “How do I want to handle state?” Choosing a broker is a trade-off between throughput, latency, and the complexity of the consumer logic.

🚦 Broker Architectures: Two Main Philosophies

1. Message Queues (Smart Broker, Dumb Consumer)

The broker tracks which messages are consumed and deletes them once acknowledged.

RabbitMQ: High feature set (routing, priorities). Best for complex workflows.
AWS SQS: Fully managed and scales automatically with load, but offers only simple routing.
Behavior: Usually Point-to-Point. Each message is handed to one consumer among those competing on the queue (it can be redelivered after a failure or timeout).

2. Log-based Streaming (Dumb Broker, Smart Consumer)

The broker is a distributed append-only log. It doesn’t track per-message acknowledgements or delete messages once they are read; each consumer group tracks its own “offset” (position in the log) and commits it when it chooses.

Apache Kafka / Amazon MSK: Built for massive throughput and data retention.
Google Cloud Pub/Sub: A managed hybrid: per-message acks like a queue, fan-out through multiple subscriptions, and optional retention with replay (seek).
Behavior: Fan-out. The same stream of events can be read by multiple different service groups simultaneously.
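
To make the two philosophies concrete, here is a minimal in-memory sketch. SimpleQueue and SimpleLog are hypothetical toy classes, not the API of any real broker; they only illustrate who owns the consumption state.

```python
from collections import deque


class SimpleQueue:
    """Queue model: the broker owns delivery state and deletes on ack."""

    def __init__(self):
        self._ready = deque()
        self._unacked = {}
        self._next_tag = 0

    def publish(self, body):
        self._ready.append(body)

    def receive(self):
        """Hand the next message to one consumer; return (tag, body) or None."""
        if not self._ready:
            return None
        body = self._ready.popleft()
        self._next_tag += 1
        self._unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        # Once acknowledged, the message is gone for good.
        del self._unacked[tag]


class SimpleLog:
    """Log model: the broker only appends; readers supply their own offset."""

    def __init__(self):
        self._records = []

    def append(self, body):
        self._records.append(body)

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


queue = SimpleQueue()
log = SimpleLog()
for event in ["order-1", "order-2"]:
    queue.publish(event)
    log.append(event)

# Queue: two workers compete, so each message goes to only one of them.
tag_a, worker_a_msg = queue.receive()
tag_b, worker_b_msg = queue.receive()
queue.ack(tag_a)
queue.ack(tag_b)
queue_after = queue.receive()  # None: nothing left to deliver

# Log: two groups read the same records from their own offsets.
offsets = {"billing": 0, "analytics": 0}
billing_seen = log.read(offsets["billing"])
offsets["billing"] += len(billing_seen)
analytics_seen = log.read(offsets["analytics"])
offsets["analytics"] += len(analytics_seen)

# Replay is simply reading again from an older offset.
replayed = log.read(0)
```

In the queue model the data disappears with the ack; in the log model "consumption" is only a number each group stores, which is why replay and fan-out come for free.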

🔍 The Big Three Comparison

Feature	RabbitMQ	Apache Kafka	AWS SQS
Model	Push (Broker pushes to consumer)	Pull (Consumer requests data)	Pull (Short/Long polling)
Persistence	Deleted after ACK	Persistent (Retention policy)	Deleted after ACK
Ordering	FIFO within a queue (requeues and competing consumers can reorder processing)	Guaranteed within a Partition	Best-effort (strict per message group with FIFO queues)
Scaling	Vertical / Cluster-based	Horizontal (adding brokers and partitions)	Native / Serverless
Best For	Task Queues, RPC, Complex Routing	Log Aggregation, Event Sourcing, Big Data	Decoupling Microservices (Cloud-native)
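
"Guaranteed within a Partition" only helps if related events land in the same partition, which is what message keys are for. The sketch below illustrates the idea with a CRC32 hash; that hash is an assumption for illustration (Kafka's default partitioner uses murmur2), and partition_for and route are hypothetical helper names.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a key to a partition deterministically (CRC32 stands in for murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    """Append each (key, payload) event to the partition chosen by its key."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


events = [
    ("user-1", "created"),
    ("user-2", "created"),
    ("user-1", "updated"),
    ("user-1", "deleted"),
]
partitions = route(events, num_partitions=3)

# All events for user-1 sit in one partition, in the order they were produced.
user1_partition = partitions[partition_for("user-1", 3)]
user1_events = [payload for key, payload in user1_partition if key == "user-1"]
```

Ordering across different keys is not guaranteed, and changing the partition count can move a key to a different partition, so pick the key and the partition count deliberately.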

🛠️ Delivery Guarantees (The Senior Perspective)

You must choose which “lie” you can live with:

At-Most-Once: Messages may be lost, but never duplicated. (Fastest, lowest overhead).
At-Least-Once: Messages are never lost, but may be delivered more than once. (The industry standard). Requires consumers to be idempotent.
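Exactly-Once: Exactly-once delivery is impossible over an unreliable network, but exactly-once processing is “effectively” achieved by Kafka through idempotent producers and transactional writes. The guarantee covers read-process-write flows that stay inside Kafka; side effects in external systems still need idempotent handling. (Highest overhead).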
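
Since at-least-once is the usual default, the consumer has to tolerate duplicates. A minimal sketch of an idempotent consumer, assuming every message carries a unique id set by the producer (the id, account, and amount fields and the IdempotentConsumer class are hypothetical):

```python
class IdempotentConsumer:
    """Applies each message at most once by remembering processed ids."""

    def __init__(self):
        self.processed_ids = set()
        self.balances = {}

    def handle(self, message):
        """Apply the message; return False if it was a duplicate and was skipped."""
        msg_id = message["id"]
        if msg_id in self.processed_ids:
            return False  # redelivery: the effect was already applied
        account = message["account"]
        self.balances[account] = self.balances.get(account, 0) + message["amount"]
        self.processed_ids.add(msg_id)
        return True


consumer = IdempotentConsumer()
deposit = {"id": "msg-001", "account": "acc-7", "amount": 50}

first = consumer.handle(deposit)
second = consumer.handle(deposit)  # the broker redelivers the same message
```

In a real service the processed-id record and the side effect would need to be committed atomically (for example in the same database transaction); an in-memory set is used here only to keep the sketch self-contained.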

⚖️ Strategic Decision Framework

When to choose RabbitMQ?

You need complex routing (e.g., using Header or Topic exchanges).
You need built-in Message Priority.
You need standard messaging protocols: AMQP 0-9-1 natively, or MQTT and STOMP via plugins.
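
As an illustration of what "complex routing" means, the sketch below re-implements the matching rule of a Topic exchange: binding patterns are dot-separated words, `*` matches exactly one word, and `#` matches zero or more words. The function topic_matches is a hypothetical stand-in written for this note, not RabbitMQ's own code.

```python
def topic_matches(pattern, routing_key):
    """Return True if a topic-exchange binding pattern matches a routing key."""
    return _match(pattern.split("."), routing_key.split("."))


def _match(pattern, words):
    if not pattern:
        return not words  # both exhausted means a full match
    head, rest = pattern[0], pattern[1:]
    if head == "#":
        # '#' either absorbs nothing, or absorbs one word and stays active.
        return _match(rest, words) or (bool(words) and _match(pattern, words[1:]))
    if not words:
        return False
    if head == "*" or head == words[0]:
        return _match(rest, words[1:])
    return False


bindings = {
    "audit-queue": "#",
    "eu-orders-queue": "orders.eu.*",
    "created-queue": "orders.*.created",
}


def queues_for(routing_key):
    """List the queues whose binding pattern matches the routing key."""
    return sorted(q for q, p in bindings.items() if topic_matches(p, routing_key))
```

For example, `queues_for("orders.eu.created")` selects all three queues, while `queues_for("orders.us.created")` skips the EU-only binding. The producer knows nothing about the queues; routing lives entirely in the broker.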

When to choose Kafka?

You need to replay data (e.g., rebuilding a database from an event log).
You have very high throughput requirements (a well-sized cluster can sustain millions of events per second).
You are implementing Event Sourcing or CQRS.

When to choose SQS?

You are in the AWS ecosystem and want Zero Maintenance.
You need to handle huge spikes in volume without managing a cluster.
Your architecture is mostly “Fire and Forget” task processing.

🚩 Common Pitfalls for Seniors

The “Poison Pill” Message

A message that causes a consumer to crash every time it’s read.

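Solution: Use Dead Letter Queues (DLQ). After X failed delivery attempts, the message is moved to a separate queue for manual debugging. SQS (redrive policy with maxReceiveCount) and RabbitMQ (dead-letter exchanges, plus a delivery limit on quorum queues) handle this in the broker; Kafka has no built-in DLQ, so the consumer or framework must publish failed records to a dead-letter topic itself.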
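
The retry-then-dead-letter flow can be sketched in a few lines. This is an in-memory simulation under stated assumptions: MAX_RECEIVES, drain, and handler are hypothetical names, and a real broker would track the receive count for you.

```python
from collections import deque

MAX_RECEIVES = 3  # hypothetical limit, similar in spirit to a max receive count


def drain(queue, handler, max_receives=MAX_RECEIVES):
    """Process the queue, moving messages that keep failing to a dead-letter list."""
    dead_letters = []
    receive_counts = {}
    while queue:
        message = queue.popleft()
        receive_counts[message] = receive_counts.get(message, 0) + 1
        try:
            handler(message)
        except Exception:
            if receive_counts[message] >= max_receives:
                dead_letters.append(message)  # stop retrying, park for inspection
            else:
                queue.append(message)  # requeue for another attempt
    return dead_letters, receive_counts


processed = []


def handler(message):
    """Toy handler that always fails on one specific payload."""
    if message == "corrupt-payload":
        raise ValueError("cannot parse message")
    processed.append(message)


work = deque(["task-a", "corrupt-payload", "task-b"])
dead_letters, receive_counts = drain(work, handler)
```

The poison pill is attempted three times and then parked, while the healthy messages behind it still get processed. Without the limit, the same message would cycle forever and block or slow everything else.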

Backpressure & Slow Consumers

If the producer is faster than the consumer, the queue grows.

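RabbitMQ: Memory and disk alarms block publishers once a high watermark is reached; a backlog that keeps growing can still degrade or take down a node.
Kafka: Disk space fills up until the retention policy removes old segments. Broker performance is affected less, but a consumer that lags beyond the retention window loses those messages.
Strategy: Monitor Consumer Lag religiously. If lag keeps increasing, scale your consumer instances (in Kafka, a consumer group cannot usefully exceed one consumer per partition).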
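
For a log-based broker, lag per partition is the newest offset in the log minus the offset the consumer group has committed. A minimal sketch of that arithmetic and of a "lag is growing" check; consumer_lag and lag_is_growing are hypothetical helpers, and in practice the offsets would come from your broker's admin tooling or metrics.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Return per-partition lag: log end offset minus committed offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


def lag_is_growing(samples, min_samples=3):
    """True if the last min_samples total-lag readings are strictly increasing."""
    if len(samples) < min_samples:
        return False
    recent = samples[-min_samples:]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


lag = consumer_lag(
    log_end_offsets={0: 100, 1: 250},
    committed_offsets={0: 90, 1: 250},
)
total_lag = sum(lag.values())
```

A single high reading is usually noise (a deploy, a burst); a trend that keeps rising across several samples is the signal to scale consumers or investigate a slow handler.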

💡 Seniority Note: A message broker is stateful infrastructure. It is much harder to maintain than a stateless API. Before adding Kafka to your stack, ask if a simple Redis Pub/Sub (no persistence, so offline subscribers miss messages) or even a Database-backed queue (like Postgres SELECT ... FOR UPDATE SKIP LOCKED) is enough for your current scale.

[[Event-Driven-Architecture]]
[[Architecture-Resilience-Patterns]]
[[Infrastructure-Cloud-Providers]]