Seed Database
How to seed databases with realistic data for development, testing, and staging environments using seed scripts, migrations, and factories across PostgreSQL, MongoDB, and Prisma.
Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.
Overview
A seeded database is the fastest way to onboard new developers, reproduce bugs, and run integration tests that depend on realistic data distributions. Without seeding, every environment starts empty, forcing developers to manually create accounts, orders, and relationships before they can test any feature. This recipe shows safe, repeatable seeding strategies that do not pollute production.
When to Use
- Onboarding new developers who need a working local database in minutes
- Integration and E2E tests that require users, products, or transactions to exist
- Staging environments that should mirror production data distributions
- Load testing with realistic data volumes and relationships
- Demonstrating features to stakeholders without manually creating demo data
When NOT to Use
- Seeding production databases — use controlled migrations and import scripts instead
- Environments with PII or sensitive data — never seed real customer data from exports
- Systems where the seed script takes longer than the CI timeout (typically 10 minutes)
- Microservices with event-sourced architectures where seeding must emit domain events
Step-by-Step Implementation
PostgreSQL (SQL + pg)
# Directory: db/seeds/
# 01_users.sql
INSERT INTO users (email, name, role, created_at)
VALUES
('admin@example.com', 'Admin User', 'admin', NOW()),
('user1@example.com', 'Alice', 'user', NOW()),
('user2@example.com', 'Bob', 'user', NOW())
ON CONFLICT (email) DO NOTHING;
# 02_products.sql
INSERT INTO products (sku, name, price, stock)
VALUES
('SKU-001', 'Wireless Mouse', 29.99, 150),
('SKU-002', 'Mechanical Keyboard', 89.99, 75),
('SKU-003', 'USB-C Hub', 49.99, 200)
ON CONFLICT (sku) DO NOTHING;
# 03_orders.sql (with references)
INSERT INTO orders (user_id, total, status, created_at)
SELECT id, 119.98, 'completed', NOW() - INTERVAL '2 days'
FROM users WHERE email = 'user1@example.com';
// seed.js — Node.js runner using pg
import { Client } from 'pg';
import { readdirSync, readFileSync } from 'fs';
import { join } from 'path';
async function seed() {
const client = new Client({ connectionString: process.env.DATABASE_URL });
await client.connect();
const seedsDir = './db/seeds';
const files = readdirSync(seedsDir)
.filter(f => f.endsWith('.sql'))
.sort();
for (const file of files) {
const sql = readFileSync(join(seedsDir, file), 'utf8');
console.log(`Running: ${file}`);
await client.query(sql);
}
await client.end();
console.log('Seeding complete');
}
seed().catch(console.error);
# Run with environment guard
if [ "$NODE_ENV" != "production" ]; then
node db/seed.js
else
echo "Refusing to seed production database"
exit 1
fi
MongoDB (mongosh / Node.js)
// seeds/users.js
const { MongoClient } = require('mongodb');
async function seed() {
const client = new MongoClient(process.env.MONGODB_URI);
await client.connect();
const db = client.db('myapp');
// Idempotent: drop and recreate, or use ordered inserts with upsert
const users = db.collection('users');
await users.deleteMany({ email: { $regex: '@example\\.com$' } });
await users.insertMany([
{ email: 'admin@example.com', name: 'Admin', role: 'admin', createdAt: new Date() },
{ email: 'user1@example.com', name: 'Alice', role: 'user', createdAt: new Date() }
]);
// Products with embedded reviews
const products = db.collection('products');
await products.deleteMany({ sku: /^SKU-/ });
await products.insertMany([
{
sku: 'SKU-001',
name: 'Wireless Mouse',
price: 29.99,
reviews: [
{ userId: 'user1@example.com', rating: 5, comment: 'Great mouse' }
]
}
]);
// Create indexes that the application expects
await users.createIndex({ email: 1 }, { unique: true });
await products.createIndex({ sku: 1 }, { unique: true });
await client.close();
}
seed().catch(console.error);
Prisma (TypeScript)
// prisma/seed.ts
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient();
async function main() {
// Upsert ensures idempotency
const admin = await prisma.user.upsert({
where: { email: 'admin@example.com' },
update: {},
create: {
email: 'admin@example.com',
name: 'Admin User',
role: 'ADMIN'
}
});
const categories = await prisma.$transaction(
['Electronics', 'Books', 'Home'].map(name =>
prisma.category.upsert({
where: { name },
update: {},
create: { name }
})
)
);
const product = await prisma.product.create({
data: {
sku: 'SKU-001',
name: 'Wireless Mouse',
price: 29.99,
categoryId: categories[0].id,
stock: { create: { quantity: 150 } }
}
});
console.log(`Created ${admin.name}, ${product.name}`);
}
main()
.catch(e => { console.error(e); process.exit(1); })
.finally(() => prisma.$disconnect());
// package.json
{
"scripts": {
"db:seed": "ts-node prisma/seed.ts"
},
"prisma": {
"seed": "ts-node prisma/seed.ts"
}
}
# Prisma native seed command
npx prisma db seed
# Reset database and re-seed (development only)
npx prisma migrate reset --force
Python (SQLAlchemy + Alembic)
# scripts/seed_database.py
import os
import sys
from sqlalchemy.orm import Session
from app.models import User, Product, Order
from app.database import SessionLocal
from app.core.security import get_password_hash
def seed():
if os.getenv("ENV") == "production":
raise RuntimeError("Cannot seed production database")
db: Session = SessionLocal()
try:
# Idempotent seeding
if db.query(User).filter(User.email == "admin@example.com").first():
print("Database already seeded")
return
admin = User(
email="admin@example.com",
name="Admin User",
role="admin",
hashed_password=get_password_hash("changeme")
)
db.add(admin)
products = [
Product(sku="SKU-001", name="Wireless Mouse", price=29.99, stock=150),
Product(sku="SKU-002", name="Keyboard", price=89.99, stock=75),
]
db.add_all(products)
db.flush()
order = Order(
user_id=admin.id,
total=119.98,
status="completed"
)
db.add(order)
db.commit()
print("Seeding complete")
finally:
db.close()
if __name__ == "__main__":
seed()
Best Practices
- Make seeds idempotent. Use
ON CONFLICT DO NOTHING,upsert, or existence checks so running the seed script twice does not create duplicates or crash. - Never seed production. Gate seed scripts with an environment check. Production data should enter through controlled migrations, admin tools, or ETL pipelines.
- Keep seed data realistic but small. 10-50 representative rows per table is enough for development. Use factories (not seed scripts) for load testing that needs millions of rows.
- Version seed files like migrations. Name them
01_users.sql,02_products.sqlso they run in deterministic order and can be tracked in git. - Seed in CI for integration tests. A
db:seedstep before the test suite ensures every CI run starts from a known state.
Common Mistakes
- Hardcoding auto-increment IDs. Inserting
id = 1into an auto-increment column can cause conflicts when the application later creates records. Let the database assign IDs or use UUIDs. - Seeding without foreign key order. Inserting an order before the user exists causes a foreign key violation. Order seed files by dependency graph.
- Forgetting to create indexes. Seeding bypasses the application, so indexes that the app relies on may not exist if the seed script skips
CREATE INDEXstatements. - Using production dumps as seeds. A production SQL dump may contain PII, GDPR-sensitive data, or internal IDs that should not be in git or developer machines.
- Non-deterministic seeds. Randomly generated seed data makes reproducing bugs across environments impossible. Use a fixed seed for random generators in seed scripts.