NoSQL Database Selection — MongoDB, DynamoDB, Cassandra
A practical guide to choosing the right NoSQL database. Compare document, key-value, wide-column, and graph stores with selection criteria and migration tips.
NoSQL Database Selection
Introduction
NoSQL databases trade the strict consistency and relational model of SQL for flexibility, horizontal scalability, and specialized access patterns. Choosing the right one means matching your data shape, query patterns, and consistency requirements to the right store.
The Four NoSQL Families
| Family | Structure | Best For | Examples |
|---|---|---|---|
| Document | JSON-like documents with nested structures | Content management, user profiles, catalogs | MongoDB, Firestore, Couchbase |
| Key-Value | Simple key → value lookups | Sessions, caching, feature flags | Redis, DynamoDB, Riak |
| Wide-Column | Column families with rows as sparse maps | Time-series, high-write telemetry, messaging | Cassandra, HBase, ScyllaDB |
| Graph | Nodes and relationships with properties | Social networks, recommendation engines, fraud detection | Neo4j, Amazon Neptune |
Document Stores: MongoDB
When to Choose
- Rich, nested data structures with arrays and subdocuments
- Flexible schema that evolves over time
- Need for secondary indexes and aggregation pipelines
- Queries that look like JavaScript object matching
Example
// A product document with nested attributes, embedded reviews, and a tag array
db.products.insertOne({
  sku: "SHOE-001",
  name: "Trail Runner",
  price: 89.99,
  attributes: { color: "red", size: 42 },
  reviews: [
    { user_id: 42, rating: 5, comment: "Great grip!" }
  ],
  tags: ["running", "trail", "waterproof"]
})
// Flexible query: any review rated 4 or higher, and tagged "trail"
db.products.find({ "reviews.rating": { $gte: 4 }, tags: "trail" })
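To make the matching semantics of that query concrete, here is a minimal pure-Python sketch of the same predicate. It assumes documents are plain dicts and that `reviews` and `tags` are lists; `matches_query` is a hypothetical helper for illustration, not part of any MongoDB driver.

```python
def matches_query(doc):
    """Plain-dict mirror of {"reviews.rating": {$gte: 4}, tags: "trail"}."""
    # A dotted path into an array matches if ANY element satisfies the condition.
    has_good_review = any(
        review.get("rating", 0) >= 4 for review in doc.get("reviews", [])
    )
    # Equality against an array field matches if the array contains the value.
    has_trail_tag = "trail" in doc.get("tags", [])
    return has_good_review and has_trail_tag


product = {
    "sku": "SHOE-001",
    "reviews": [{"user_id": 42, "rating": 5, "comment": "Great grip!"}],
    "tags": ["running", "trail", "waterproof"],
}
low_rated = {"sku": "SHOE-002", "reviews": [{"rating": 3}], "tags": ["trail"]}

matched = matches_query(product)
rejected = matches_query(low_rated)
```

The point of the sketch is that both conditions reach inside arrays without any explicit loop in the query itself, which is what makes the document model convenient for nested data.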
Trade-offs
| Pro | Con |
|---|---|
| Flexible schema | Schema validation must be configured explicitly |
| Rich query language | Joins are expensive and limited |
| Secondary indexes | Indexes consume RAM and slow writes |
| Horizontal scaling (sharding) | Sharding adds operational complexity |
Key-Value Stores: DynamoDB and Redis
DynamoDB (AWS)
Best for: predictable latency at any scale, simple read/write patterns, serverless architectures.
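import boto3
from boto3.dynamodb.conditions import Key  # needed for Key(...) conditions below
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Users')
# Get by primary key (single-digit ms latency); if the table has a sort key,
# Key must include it as well
table.get_item(Key={'user_id': 'user-123'})
# Query by partition + sort key (assumes 'timestamp' is the table's sort key)
table.query(
    KeyConditionExpression=Key('user_id').eq('user-123') &
        Key('timestamp').gt('2024-01-01')
)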
Critical design constraint: Access patterns must be known upfront. DynamoDB is optimized for known query paths, not ad-hoc exploration.
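One way to see why access patterns must be known upfront is to model a table as a map from partition key to items ordered by sort key. The sketch below is an in-memory illustration in plain Python, not DynamoDB's API; `TinyTable` and its methods are hypothetical names.

```python
from collections import defaultdict


class TinyTable:
    """In-memory stand-in for a table with a partition key and a sort key."""

    def __init__(self):
        # partition key -> {sort key: item}
        self._partitions = defaultdict(dict)

    def put(self, partition_key, sort_key, item):
        self._partitions[partition_key][sort_key] = item

    def query_after(self, partition_key, lower_bound):
        """Cheap path: touches exactly one partition, ordered by sort key."""
        rows = self._partitions.get(partition_key, {})
        return [rows[key] for key in sorted(rows) if key > lower_bound]

    def scan_where(self, predicate):
        """Expensive path: visits every item in every partition."""
        return [
            item
            for rows in self._partitions.values()
            for item in rows.values()
            if predicate(item)
        ]


table = TinyTable()
table.put("user-123", "2024-02-01", {"event": "login"})
table.put("user-123", "2023-12-31", {"event": "signup"})
table.put("user-456", "2024-03-05", {"event": "login"})

recent = table.query_after("user-123", "2024-01-01")
logins = table.scan_where(lambda item: item["event"] == "login")
```

A question the key schema anticipates ("events for this user after a date") reads one partition. A question it does not anticipate ("all logins") has to walk the whole table, which is the cost that ad-hoc exploration runs into.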
Redis
Best for: caching, real-time leaderboards, rate limiting, session stores.
# Cache a computed value for 5 minutes
SET user:123:profile '{"name":"Alice"}' EX 300
# Atomic increment for rate limiting
INCR rate_limit:ip:192.168.1.1
EXPIRE rate_limit:ip:192.168.1.1 60
Critical constraint: All data must fit in RAM. Redis is not a primary data store for large datasets.
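The rate-limiting commands above implement a fixed-window counter. As a rough illustration of the logic, here is an in-memory Python imitation of the INCR + EXPIRE pattern. It is not a Redis client; `FixedWindowLimiter`, its parameters, and the injectable `clock` are assumptions made for the example.

```python
import time


class FixedWindowLimiter:
    """In-memory imitation of INCR + EXPIRE for fixed-window rate limiting."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._counters = {}  # key -> (count, expires_at)

    def allow(self, key):
        now = self.clock()
        count, expires_at = self._counters.get(key, (0, 0.0))
        if count == 0 or now >= expires_at:
            # New window: reset the counter and set the expiry exactly once.
            count, expires_at = 0, now + self.window_seconds
        count += 1  # the INCR step
        self._counters[key] = (count, expires_at)
        return count <= self.limit


# A fake clock keeps the example deterministic.
fake_now = [0.0]
limiter = FixedWindowLimiter(limit=2, window_seconds=60, clock=lambda: fake_now[0])

first_three = [limiter.allow("ip:192.168.1.1") for _ in range(3)]
fake_now[0] = 61.0  # the window has expired
after_window = limiter.allow("ip:192.168.1.1")
```

Note that the sketch sets the expiry only when a window starts. In Redis, calling EXPIRE on every request resets the TTL each time, so a steady stream of requests would keep extending the window; setting the expiry only when INCR returns 1 avoids that.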
Wide-Column Stores: Cassandra
When to Choose
- Write-heavy workloads (time-series, IoT, messaging)
- Need linear scalability across commodity hardware
- Tolerance for eventual consistency and for the restricted query model of CQL (Cassandra Query Language)
Data Model
-- Time-series sensor data
CREATE TABLE sensor_readings (
  sensor_id UUID,
  day TEXT, -- partition key component
  timestamp TIMESTAMP,
  temperature DOUBLE,
  humidity DOUBLE,
  PRIMARY KEY ((sensor_id, day), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
-- Query: last 100 readings for a sensor today
SELECT * FROM sensor_readings
WHERE sensor_id = ? AND day = '2024-06-12'
LIMIT 100;
Cassandra is query-first: tables are designed around specific read queries, not normalized entities.
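Because the partition key in the table above is `(sensor_id, day)`, the application has to compute the `day` bucket itself, and a time range that spans several days means one query per bucket. The following Python sketch shows that bookkeeping under two assumptions that are not in the original schema: buckets are UTC calendar days formatted as `YYYY-MM-DD`, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def day_bucket(ts):
    """Return the 'day' partition component for a timezone-aware timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def buckets_between(start, end):
    """List every day bucket that the range [start, end] touches, oldest first."""
    first = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    span_days = (last - first).days
    return [
        (first + timedelta(days=offset)).isoformat()
        for offset in range(span_days + 1)
    ]


start = datetime(2024, 6, 11, 22, 0, tzinfo=timezone.utc)
end = datetime(2024, 6, 13, 1, 0, tzinfo=timezone.utc)

write_bucket = day_bucket(start)
read_buckets = buckets_between(start, end)
```

Each entry in `read_buckets` would become the `day` value of a separate partition-scoped SELECT. Choosing the bucket size is a trade-off: smaller buckets keep partitions bounded, larger ones mean fewer queries per range.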
Trade-offs
| Pro | Con |
|---|---|
| Massive write throughput | No JOINs, no subqueries, no aggregations across partitions |
| Linear scalability | Operational complexity (gossip, repairs, compaction) |
| Multi-datacenter replication | Eventual consistency by default |
| Tunable consistency | CQL is limited compared to SQL |
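Tunable consistency can be reasoned about with simple arithmetic: with a replication factor of N, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The Python sketch below encodes that standard overlap rule; the function names are hypothetical, and the rule ignores failure scenarios such as writes that time out partway.

```python
def quorum(replication_factor):
    """Smallest majority of replicas, for example 2 of 3 or 3 of 5."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor, write_acks, read_replicas):
    """True when any read set must share at least one replica with any write set."""
    return write_acks + read_replicas > replication_factor


rf = 3
quorum_both = read_overlaps_write(rf, quorum(rf), quorum(rf))  # QUORUM + QUORUM
one_both = read_overlaps_write(rf, 1, 1)                       # ONE + ONE
all_then_one = read_overlaps_write(rf, rf, 1)                  # ALL + ONE
```

With a replication factor of 3, QUORUM writes plus QUORUM reads overlap, ONE plus ONE does not, and writing at ALL lets reads at ONE overlap at the cost of write availability.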
Decision Matrix
| Requirement | Best Choice | Why |
|---|---|---|
| Flexible, nested JSON documents | MongoDB | Native document model, rich query language |
| Predictable low-latency key lookups at scale | DynamoDB | Single-digit ms, auto-scaling, serverless |
| High-throughput time-series writes | Cassandra | Log-structured storage, excellent write performance |
| Caching and ephemeral data | Redis | In-memory speed, rich data structures |
| Complex relationship traversal | Neo4j | Optimized graph traversals |
| Multi-item ACID transactions | PostgreSQL | Mature multi-row, multi-table transactions; NoSQL support is more limited (MongoDB and DynamoDB offer transactions, with restrictions) |
Migration Tips from SQL
| SQL Habit | NoSQL Adaptation |
|---|---|
| Normalized tables | Embed related data when accessed together; reference when accessed separately |
| JOINs everywhere | Design tables/collections around query patterns, not entities |
| Auto-increment IDs | Use UUIDs or composite keys (user_id + timestamp) |
| Ad-hoc analytics | Use change data capture (CDC) to stream to a data warehouse |
| Single source of truth | Accept that different stores may have different views of truth (CQRS) |
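As an illustration of the embed-versus-reference rule in the first row, the Python sketch below folds normalized rows into a single document. The row shapes, field names, and `build_order_document` are hypothetical; the assumption is that line items are always read with their order, while customers are read and updated independently.

```python
def build_order_document(order, order_items):
    """Fold an order row and its item rows into one document."""
    return {
        "_id": order["order_id"],
        # Reference: customers are accessed separately, so keep only the ID.
        "customer_id": order["customer_id"],
        # Embed: line items are always read together with the order.
        "items": [
            {"sku": row["sku"], "qty": row["qty"]}
            for row in order_items
            if row["order_id"] == order["order_id"]
        ],
    }


order = {"order_id": 10, "customer_id": 1}
order_items = [
    {"order_id": 10, "sku": "SHOE-001", "qty": 2},
    {"order_id": 10, "sku": "SOCK-007", "qty": 1},
    {"order_id": 11, "sku": "HAT-003", "qty": 1},
]

document = build_order_document(order, order_items)
```

Reading the order now needs one lookup instead of a JOIN, while a change to the customer's details touches only the customer record.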
Best Practices
- Model for your reads, not your writes — NoSQL performance is access-pattern dependent
- Avoid hot partitions — distribute writes evenly across partition keys (use random suffixes or time bucketing)
- Set TTLs where appropriate — expire old data automatically instead of running cleanup jobs
- Test with production-like data volumes — behavior at 1K rows is not predictive of behavior at 1B rows
- Have a migration path — data gravity is real; choose carefully because migrating later is expensive
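The hot-partition advice above can be sketched as key sharding: append a suffix to a busy partition key so that writes spread over several partitions. This Python example uses a hash-derived suffix rather than a random one, so a given item always lands in the same shard and can be found again without fan-out. `SHARD_COUNT` and both helper names are assumptions made for illustration.

```python
import hashlib

SHARD_COUNT = 8  # hypothetical; size it to the write rate of the hottest key


def sharded_key(base_key, item_id, shard_count=SHARD_COUNT):
    """Spread writes for one logical key across shard_count partitions."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{base_key}#{shard}"


def all_shard_keys(base_key, shard_count=SHARD_COUNT):
    """Every partition key a reader must query to see the full logical key."""
    return [f"{base_key}#{shard}" for shard in range(shard_count)]


write_key = sharded_key("2024-06-12", "event-42")
read_keys = all_shard_keys("2024-06-12")
used_keys = {sharded_key("2024-06-12", f"event-{n}") for n in range(1000)}
```

The cost is on the read side: a query for the whole logical key has to fan out to every entry in `read_keys` and merge the results.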
Common Mistakes
- Using MongoDB as a cache (an in-memory store such as Redis is usually faster and simpler for this)
- Using DynamoDB for ad-hoc analytics (Athena/BigQuery are better suited)
- Using Cassandra for OLTP with complex queries (Cassandra excels at simple, partition-scoped queries)
- Treating NoSQL as "SQL that scales better": the data model is fundamentally different
- Ignoring operational complexity — Cassandra and sharded MongoDB require dedicated operational expertise
Frequently Asked Questions
Should I migrate from PostgreSQL to MongoDB for flexibility?
Not for flexibility alone. PostgreSQL has JSONB, which gives you document flexibility while keeping ACID transactions. Migrate to MongoDB when you need horizontal sharding or a document-native query language.
Can I use multiple NoSQL databases in one application?
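Yes, and it is common. For example, use Redis for cache and sessions, DynamoDB for user profiles, and Elasticsearch for search. This approach is called polyglot persistence. The trade-off is operational complexity: every additional store must be deployed, monitored, and backed up.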
How do I handle transactions across NoSQL databases?
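No ACID transaction spans separate databases, and many NoSQL stores offer only limited transactions within a single system (MongoDB supports multi-document transactions and DynamoDB supports transactional writes, both with restrictions). Across stores, use sagas, outbox patterns, or idempotent operations with at-least-once delivery to achieve eventual consistency.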
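Idempotent operations are what make at-least-once delivery safe: a message that arrives twice must change state only once. A minimal in-memory Python sketch follows, with a hypothetical message shape (`id`, `account`, `amount`) and a hypothetical `IdempotentConsumer` class.

```python
class IdempotentConsumer:
    """Applies each message at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.processed_ids = set()
        self.balances = {}

    def handle(self, message):
        """Return True if the message was applied, False if it was a duplicate."""
        if message["id"] in self.processed_ids:
            return False  # redelivery: skip without changing state
        account = message["account"]
        self.balances[account] = self.balances.get(account, 0) + message["amount"]
        self.processed_ids.add(message["id"])
        return True


consumer = IdempotentConsumer()
deposit = {"id": "msg-1", "account": "alice", "amount": 50}

first_delivery = consumer.handle(deposit)
second_delivery = consumer.handle(deposit)  # same message delivered again
```

In a real system the processed-ID record and the state change would need to be written atomically in the same store; otherwise a crash between the two steps can still apply a message twice or lose it.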