NoSQL Database Selection — MongoDB, DynamoDB, Cassandra

A practical guide to choosing the right NoSQL database. Compare document, key-value, wide-column, and graph stores with selection criteria and migration tips.

Topics: databases

Introduction

NoSQL databases trade the strict consistency and relational model of SQL for flexibility, horizontal scalability, and specialized access patterns. Choosing the right one means matching your data shape, query patterns, and consistency requirements to the right store.

The Four NoSQL Families

| Family | Structure | Best For | Examples |
|---|---|---|---|
| Document | JSON-like documents with nested structures | Content management, user profiles, catalogs | MongoDB, Firestore, Couchbase |
| Key-Value | Simple key → value lookups | Sessions, caching, feature flags | Redis, DynamoDB, Riak |
| Wide-Column | Column families with rows as sparse maps | Time-series, high-write telemetry, messaging | Cassandra, HBase, ScyllaDB |
| Graph | Nodes and relationships with properties | Social networks, recommendation engines, fraud detection | Neo4j, Amazon Neptune |

Document Stores: MongoDB

When to Choose

  • Rich, nested data structures with arrays and subdocuments
  • Flexible schema that evolves over time
  • Need for secondary indexes and aggregation pipelines
  • Queries that look like JavaScript object matching

Example

// A product document with nested reviews and variants
db.products.insertOne({
  sku: "SHOE-001",
  name: "Trail Runner",
  price: 89.99,
  attributes: { color: "red", size: 42 },
  reviews: [
    { user_id: 42, rating: 5, comment: "Great grip!" }
  ],
  tags: ["running", "trail", "waterproof"]
})

// Flexible query with nested matching
db.products.find({ "reviews.rating": { $gte: 4 }, tags: "trail" })

Trade-offs

| Pro | Con |
|---|---|
| Flexible schema | Schema validation must be configured explicitly |
| Rich query language | Joins are expensive and limited |
| Secondary indexes | Indexes consume RAM and slow writes |
| Horizontal scaling (sharding) | Sharding adds operational complexity |
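
Since MongoDB enforces no schema unless told to, the validation trade-off can be made concrete. The following is a minimal sketch of a `$jsonSchema` validator for the product document shown earlier. The required fields and bounds are assumptions chosen for illustration, and `create_products_collection` is a hypothetical helper that expects a PyMongo `Database` object.

```python
# Illustrative $jsonSchema validator for the products collection.
# The required fields and bounds are assumptions, not rules from a real system.
PRODUCT_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["sku", "name", "price"],
        "properties": {
            "sku": {"bsonType": "string"},
            "name": {"bsonType": "string"},
            "price": {"bsonType": ["double", "int", "decimal"], "minimum": 0},
            "tags": {"bsonType": "array", "items": {"bsonType": "string"}},
        },
    }
}


def create_products_collection(db):
    """Create the collection with validation enabled.

    `db` is assumed to be a pymongo.database.Database. The function is
    defined but not called here because it needs a running MongoDB server.
    """
    return db.create_collection("products", validator=PRODUCT_VALIDATOR)
```

With a validator in place, inserts that omit `sku` or carry a negative `price` are rejected by the server instead of being silently stored.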

Key-Value Stores: DynamoDB and Redis

DynamoDB (AWS)

Best for: predictable latency at any scale, simple read/write patterns, serverless architectures.

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Users')

# Get by partition key (single-digit ms latency)
table.get_item(Key={'user_id': 'user-123'})

# Query by partition + sort key
table.query(
    KeyConditionExpression=Key('user_id').eq('user-123') &
                           Key('timestamp').gt('2024-01-01')
)

Critical design constraint: Access patterns must be known upfront. DynamoDB is optimized for known query paths, not ad-hoc exploration.
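
To see why access patterns matter, here is a toy in-memory model of a table keyed by partition key and sort key. It is a sketch for intuition only, not DynamoDB's implementation; `PartitionedTable` and its method names are hypothetical.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy model of a table addressed by partition key + sort key."""

    def __init__(self):
        # partition key -> {sort key: item}
        self._partitions = defaultdict(dict)

    def put(self, pk, sk, item):
        # Writing the same (pk, sk) again replaces the item.
        self._partitions[pk][sk] = item

    def query(self, pk, sk_greater_than=None):
        """Designed-for path: one partition, optional sort-key range."""
        rows = self._partitions.get(pk, {})
        return [
            rows[sk]
            for sk in sorted(rows)
            if sk_greater_than is None or sk > sk_greater_than
        ]

    def scan(self, predicate):
        """Ad-hoc path: must touch every item in every partition."""
        return [
            item
            for rows in self._partitions.values()
            for item in rows.values()
            if predicate(item)
        ]


table = PartitionedTable()
table.put("user-123", "2023-12-31", {"event": "signup"})
table.put("user-123", "2024-02-01", {"event": "login"})
table.put("user-456", "2024-03-01", {"event": "login"})

recent = table.query("user-123", sk_greater_than="2024-01-01")
logins = table.scan(lambda item: item["event"] == "login")
```

The `query` call reads a single partition, while the `scan` call walks the whole table. A question such as "all logins across all users" needs either a scan or a key (or index) designed for it in advance.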

Redis

Best for: caching, real-time leaderboards, rate limiting, session stores.

# Cache a computed value for 5 minutes
SET user:123:profile '{"name":"Alice"}' EX 300

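# Fixed-window rate limiting: INCR is atomic. Set the TTL only when the key
# has none (the NX option requires Redis 7.0+), so repeated requests do not
# keep extending the window.
INCR rate_limit:ip:192.168.1.1
EXPIRE rate_limit:ip:192.168.1.1 60 NX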

Critical constraint: All data must fit in RAM. Redis is not a primary data store for large datasets.
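
The counter-plus-expiry idea behind the rate-limiting commands above can be sketched in plain Python. A dictionary stands in for Redis here, and `FixedWindowLimiter` plus its injected `clock` parameter are hypothetical names used for illustration.

```python
import time


class FixedWindowLimiter:
    """Fixed-window rate limiter; a dict stands in for Redis."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._counters = {}  # key -> (count, expires_at)

    def allow(self, key):
        now = self.clock()
        entry = self._counters.get(key)
        if entry is None or now >= entry[1]:
            # Key is missing or expired: start a new window, which mirrors
            # INCR on a fresh key followed by setting its TTL once.
            count, expires_at = 0, now + self.window
        else:
            count, expires_at = entry
        count += 1  # the INCR step
        self._counters[key] = (count, expires_at)
        return count <= self.limit
```

The expiry is set only when a window starts, never on later requests, so a steady stream of traffic cannot keep the counter alive indefinitely.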

Wide-Column Stores: Cassandra

When to Choose

  • Write-heavy workloads (time-series, IoT, messaging)
  • Need linear scalability across commodity hardware
  • Tolerance for eventual consistency and for the restricted query model of CQL (Cassandra Query Language)

Data Model

-- Time-series sensor data
CREATE TABLE sensor_readings (
    sensor_id UUID,
    day TEXT,        -- partition key component
    timestamp TIMESTAMP,
    temperature DOUBLE,
    humidity DOUBLE,
    PRIMARY KEY ((sensor_id, day), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);

-- Query: last 100 readings for a sensor today
SELECT * FROM sensor_readings
WHERE sensor_id = ? AND day = '2024-06-12'
LIMIT 100;

Cassandra is query-first: tables are designed around specific read queries, not normalized entities.
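
On the application side, the query-first design means the client computes the partition key before reading or writing. The helpers below are a minimal sketch for the `(sensor_id, day)` key used in the table above; the function names are hypothetical, and timestamps are assumed to be timezone-aware and bucketed by UTC day.

```python
from datetime import date, datetime, timedelta, timezone
from uuid import UUID


def partition_key(sensor_id: UUID, ts: datetime) -> tuple[UUID, str]:
    """Return the (sensor_id, day) partition key for one reading.

    Assumes `ts` is timezone-aware; the day bucket is taken in UTC.
    """
    return sensor_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def day_buckets(start: date, end: date) -> list[str]:
    """List every day bucket a query over [start, end] has to visit."""
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]
```

A range query over several days becomes one partition-scoped query per bucket, which the client issues separately and merges.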

Trade-offs

| Pro | Con |
|---|---|
| Massive write throughput | No JOINs, no subqueries, no aggregations across partitions |
| Linear scalability | Operational complexity (gossip, repairs, compaction) |
| Multi-datacenter replication | Eventual consistency by default |
| Tunable consistency | CQL is limited compared to SQL |
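
Tunable consistency comes down to simple arithmetic: a read is guaranteed to see the latest acknowledged write when the number of replicas read (R) plus the number of replicas written (W) exceeds the replication factor (RF). A small sketch, with hypothetical function names:

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas, e.g. 2 of 3 or 3 of 5."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set overlaps every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor
```

With RF = 3, writing and reading at quorum (2 + 2 > 3) overlaps, while writing and reading a single replica each (1 + 1 ≤ 3) does not, which is the eventually consistent case.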

Decision Matrix

| Requirement | Best Choice | Why |
|---|---|---|
| Flexible, nested JSON documents | MongoDB | Native document model, rich query language |
| Predictable low-latency key lookups at scale | DynamoDB | Single-digit ms, auto-scaling, serverless |
| High-throughput time-series writes | Cassandra | Log-structured storage, excellent write performance |
| Caching and ephemeral data | Redis | In-memory speed, rich data structures |
| Complex relationship traversal | Neo4j | Optimized graph traversals |
| Multi-item ACID transactions | PostgreSQL | Transactions are first-class in relational databases; NoSQL support varies (MongoDB and DynamoDB offer multi-item transactions, with limits) |

Migration Tips from SQL

| SQL Habit | NoSQL Adaptation |
|---|---|
| Normalized tables | Embed related data when accessed together; reference when accessed separately |
| JOINs everywhere | Design tables/collections around query patterns, not entities |
| Auto-increment IDs | Use UUIDs or composite keys (user_id + timestamp) |
| Ad-hoc analytics | Use change data capture (CDC) to stream to a data warehouse |
| Single source of truth | Accept that different stores may have different views of truth (CQRS) |
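
The first and third adaptations can be sketched together: fold child rows into their parent document and replace the auto-increment ID with a UUID. The row shapes below (`id`, `product_id`) are assumptions about a hypothetical SQL schema, and `embed_reviews` is an illustrative helper.

```python
import uuid
from collections import defaultdict


def embed_reviews(products, reviews):
    """Fold normalized review rows into their parent product documents."""
    by_product = defaultdict(list)
    for review in reviews:
        row = dict(review)  # copy so the input rows are not mutated
        by_product[row.pop("product_id")].append(row)

    return [
        {
            "_id": str(uuid.uuid4()),   # replaces the auto-increment id
            "legacy_id": product["id"],  # kept so old references still resolve
            "sku": product["sku"],
            "name": product["name"],
            "reviews": by_product[product["id"]],
        }
        for product in products
    ]
```

Embedding suits reviews that are always read with the product. If reviews grow without bound or are queried on their own, keeping them in a separate collection and referencing the product is the safer choice.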

Best Practices

  • Model for your reads, not your writes — NoSQL performance is access-pattern dependent
  • Avoid hot partitions — distribute writes evenly across partition keys (use random suffixes or time bucketing)
  • Set TTLs where appropriate — expire old data automatically instead of running cleanup jobs
  • Test with production-like data volumes — behavior at 1K rows is not predictive of behavior at 1B rows
  • Have a migration path — data gravity is real; choose carefully because migrating later is expensive

Common Mistakes

  • Using MongoDB as a cache (Redis is cheaper and faster)
  • Using DynamoDB for ad-hoc analytics (Athena/BigQuery are better suited)
  • Using Cassandra for OLTP with complex queries (Cassandra excels at simple, partition-scoped queries)
  • Treating NoSQL as "SQL that scales better" (the data model is fundamentally different)
  • Ignoring operational complexity — Cassandra and sharded MongoDB require dedicated operational expertise

Frequently Asked Questions

Should I migrate from PostgreSQL to MongoDB for flexibility?

Not for flexibility alone. PostgreSQL has JSONB, which gives you document flexibility while keeping ACID transactions. Migrate to MongoDB when you need horizontal sharding or a document-native query language.

Can I use multiple NoSQL databases in one application?

Yes, and it is common. A typical split is Redis for cache and sessions, DynamoDB for user profiles, and Elasticsearch for search. This approach is called polyglot persistence; the trade-off is more operational complexity.
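
When a cache sits in front of a primary store, the usual glue is the cache-aside pattern: read the cache first, fall back to the primary store on a miss, then populate the cache with a TTL. A minimal sketch follows, with a dictionary standing in for Redis and a caller-supplied loader standing in for the primary store; `CacheAside` is a hypothetical name.

```python
import time


class CacheAside:
    """Cache-aside reads with a TTL; a dict stands in for the cache."""

    def __init__(self, load_from_primary, ttl_seconds=300, clock=time.monotonic):
        self.load_from_primary = load_from_primary
        self.ttl = ttl_seconds
        self.clock = clock
        self._cache = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and self.clock() < entry[1]:
            return entry[0]  # cache hit
        value = self.load_from_primary(key)  # miss or expired entry
        self._cache[key] = (value, self.clock() + self.ttl)
        return value
```

Writes go to the primary store and should delete or overwrite the cached entry; otherwise readers see stale data until the TTL expires.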

How do I handle transactions across NoSQL databases?

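No single ACID transaction spans separate databases, and support within one store varies: MongoDB and DynamoDB offer multi-item transactions with limits, while Cassandra provides only lightweight transactions scoped to a single partition. Across stores, use sagas, outbox patterns, or idempotent operations with at-least-once delivery to achieve eventual consistency.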
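
Idempotent handling is the piece that makes at-least-once delivery safe: a message that arrives twice must change state only once. A minimal in-memory sketch, in which the message fields (`id`, `account`, `amount`) and the `IdempotentConsumer` class are hypothetical:

```python
class IdempotentConsumer:
    """Apply each message once, even if it is delivered repeatedly."""

    def __init__(self):
        self.processed_ids = set()  # stand-in for a durable dedup table
        self.balances = {}

    def handle(self, message):
        if message["id"] in self.processed_ids:
            return False  # duplicate delivery, already applied
        account = message["account"]
        self.balances[account] = self.balances.get(account, 0) + message["amount"]
        self.processed_ids.add(message["id"])
        return True


consumer = IdempotentConsumer()
message = {"id": "msg-1", "account": "user-123", "amount": 50}
first = consumer.handle(message)
second = consumer.handle(message)  # simulated redelivery
```

In a real system the state change and the dedup record must be written atomically, for example with a conditional write in the same store, or a crash between the two steps reintroduces the duplicate.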