Message Queues — RabbitMQ, Kafka, and SQS Deep Dive

Q: When should I choose Kafka over RabbitMQ?

Choose Kafka when you need event streaming, replay, or very high throughput (100K+ messages/sec). Kafka treats the log as the primary abstraction, making it ideal for event sourcing, log aggregation, and real-time analytics. Choose RabbitMQ when you need complex routing (exchanges, routing keys), RPC patterns, or priority queues. RabbitMQ is more flexible for traditional messaging patterns; Kafka is better for data pipelines.

Q: How do I prevent duplicate message processing?

Use idempotent consumers: design your processing logic so that processing the same message twice produces the same result. Store a processed-message ID in a database with a unique constraint. Alternatively, use exactly-once semantics where supported (Kafka transactions, SQS FIFO with deduplication). But remember: exactly-once has performance overhead and complexity. Idempotency is simpler and more reliable.

Overview

Message queues are the nervous system of distributed systems. They decouple producers from consumers, absorb traffic spikes, and enable async workflows that would be impossible with synchronous HTTP alone. But choosing the wrong queue or using it incorrectly turns a resilience tool into a source of data loss, ordering bugs, and operational nightmares. This guide compares RabbitMQ, Kafka, and AWS SQS — the three most common choices — and explains when to use each, how to configure them, and what pitfalls to avoid.

When to Use

Use this guide when:

You need to choose between RabbitMQ, Kafka, and SQS for a new architecture
You are experiencing message loss, duplicate processing, or ordering issues in your queue system
Your team is designing event-driven microservices and needs to understand queue patterns

Solution

Comparison Matrix

Concern	RabbitMQ	Kafka	AWS SQS
Throughput	10K–50K msg/s	1M+ msg/s	Unlimited (auto-scales)
Ordering	Per-queue (FIFO via plugin)	Per-partition	FIFO queues available
Persistence	Disk + memory	Durable by design	14 days max retention
Delivery Model	Push to consumer	Pull by consumer	Pull by consumer
Replay	No (unless DLQ)	Yes (any offset)	No
Routing	Exchange + routing keys	Topic partitions	No native routing
Operational Model	Self-hosted / managed	Self-hosted / managed	Fully managed
Best For	Task queues, RPC	Event streaming, logs	Simple decoupling

RabbitMQ Task Queue Example

# Producer
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='task_queue', durable=True)

message = "Process order #12345"
channel.basic_publish(
    exchange='',
    routing_key='task_queue',
    body=message,
    properties=pika.BasicProperties(delivery_mode=2)  # persistent
)
connection.close()

# Consumer
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='task_queue', durable=True)
channel.basic_qos(prefetch_count=1)  # fair dispatch

def callback(ch, method, properties, body):
    print(f"Received {body.decode()}")
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue='task_queue', on_message_callback=callback)
channel.start_consuming()

Kafka Producer/Consumer

from kafka import KafkaProducer, KafkaConsumer
import json

# Producer
producer = KafkaProducer(
    bootstrap_servers=['kafka:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    acks='all',  # wait for all replicas
    retries=3
)
producer.send('orders', {'order_id': '12345', 'amount': 99.99})
producer.flush()

# Consumer
consumer = KafkaConsumer(
    'orders',
    bootstrap_servers=['kafka:9092'],
    group_id='order-processors',
    auto_offset_reset='earliest',
    enable_auto_commit=False  # manual commit for at-least-once
)

for message in consumer:
    process_order(message.value)
    consumer.commit()  # commit after successful processing

SQS Producer/Consumer (Boto3)

import boto3

sqs = boto3.client('sqs')
queue_url = 'https://sqs.us-east-1.amazonaws.com/123456789/my-queue'

# Send
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='Process order #12345',
    MessageAttributes={
        'OrderType': {'StringValue': 'premium', 'DataType': 'String'}
    }
)

# Receive
response = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=20,  # long polling
    VisibilityTimeout=300
)

for msg in response.get('Messages', []):
    process_message(msg['Body'])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg['ReceiptHandle'])

Dead Letter Queue Pattern

# Pseudo-code for DLQ routing
MAX_RETRIES = 3

def process_with_dlq(message):
    try:
        process(message)
        ack(message)
    except Exception as e:
        retry_count = get_retry_count(message)
        if retry_count >= MAX_RETRIES:
            send_to_dlq(message, reason=str(e))
        else:
            nack_with_delay(message, delay=2 ** retry_count)

Explanation

The fundamental difference between these systems is push vs pull and persistence model. RabbitMQ pushes messages to consumers and keeps them in memory (with disk backup); this gives low latency but limits throughput. Kafka consumers pull from durable logs; this enables replay and massive throughput but introduces pull latency. SQS is a managed pull system with no replay but infinite scalability.

Ordering is a common source of confusion. RabbitMQ guarantees order within a single queue only if you have one consumer. Kafka guarantees order within a partition, but if you scale to multiple partitions, order is only preserved per partition key. SQS standard queues offer no ordering; FIFO queues do but at lower throughput.

The dead letter queue is not a luxury — it is a requirement. Without it, poison messages (messages that always fail processing) will block your queue indefinitely. Every production queue system should have a DLQ with alerting.

Variants

Pattern	Queue Type	Use Case
Work Queue	RabbitMQ, SQS	Distribute tasks among workers
Pub/Sub	RabbitMQ (fanout), Kafka	Broadcast events to multiple consumers
Event Sourcing	Kafka	Immutable event log as system of record
Request/Reply	RabbitMQ (RPC over queues)	Async request-response
Scheduled Jobs	RabbitMQ (delayed plugin), SQS	Delayed or recurring task execution
Priority Queue	RabbitMQ	Process high-priority messages first

Best Practices

Always use durable queues and persistent messages in production; memory-only queues lose data on restart
Set visibility timeouts longer than your max processing time; SQS will re-deliver mid-processing messages
Implement idempotent consumers; at-least-once delivery means you will process duplicates
Monitor consumer lag (Kafka) or approximate age of oldest message (SQS); lag is your canary
Use schema validation (Avro, JSON Schema) before publishing; invalid messages are expensive to debug in production

Common Mistakes

Treating queues as databases; queues are for transient messaging, not long-term storage
Not handling message ordering explicitly; assuming global order when only per-partition/per-queue order exists
Using auto-commit in Kafka without understanding the at-most-once vs at-least-once trade-off
Not configuring DLQs; one poison message can block an entire queue
Ignoring backpressure; if consumers are slower than producers, your queue grows until it crashes

Frequently Asked Questions

When should I choose Kafka over RabbitMQ?

Choose Kafka when you need event streaming, replay, or very high throughput (100K+ messages/sec). Kafka treats the log as the primary abstraction, making it ideal for event sourcing, log aggregation, and real-time analytics. Choose RabbitMQ when you need complex routing (exchanges, routing keys), RPC patterns, or priority queues. RabbitMQ is more flexible for traditional messaging patterns; Kafka is better for data pipelines.

How do I prevent duplicate message processing?

Use idempotent consumers: design your processing logic so that processing the same message twice produces the same result. Store a processed-message ID in a database with a unique constraint. Alternatively, use exactly-once semantics where supported (Kafka transactions, SQS FIFO with deduplication). But remember: exactly-once has performance overhead and complexity. Idempotency is simpler and more reliable.

What is consumer lag and how do I fix it?

Consumer lag is the difference between the newest message in the queue and the oldest unprocessed message. High lag means your consumers are slower than your producers. Fix it by: (1) scaling out consumers (add more instances), (2) optimizing processing time (profile your consumer code), (3) reducing message size, or (4) splitting into more partitions (Kafka) or queues (RabbitMQ). If lag is chronic, your architecture may need redesign — queues absorb spikes, not sustained overload.

Message Queues — RabbitMQ, Kafka, and SQS Deep Dive

Overview

When to Use

Solution

Comparison Matrix

RabbitMQ Task Queue Example

Kafka Producer/Consumer

SQS Producer/Consumer (Boto3)

Dead Letter Queue Pattern

Explanation

Variants

Best Practices

Common Mistakes

Frequently Asked Questions

When should I choose Kafka over RabbitMQ?

How do I prevent duplicate message processing?

What is consumer lag and how do I fix it?

Message Processing Idempotency

Dead Letter Queues

Event-Driven Microservices

Event Streaming with Apache Kafka and Node.js

Task Queues and RPC with RabbitMQ and AMQP

Overview

When to Use

Solution

Comparison Matrix

RabbitMQ Task Queue Example

Kafka Producer/Consumer

SQS Producer/Consumer (Boto3)

Dead Letter Queue Pattern

Explanation

Variants

Best Practices

Common Mistakes

Frequently Asked Questions

When should I choose Kafka over RabbitMQ?

How do I prevent duplicate message processing?

What is consumer lag and how do I fix it?

Related Resources

Message Processing Idempotency

Dead Letter Queues

Event-Driven Microservices

Event Streaming with Apache Kafka and Node.js

Task Queues and RPC with RabbitMQ and AMQP