SQL Performance Tuning — Indexes, Queries, and Explain Plans
A practical guide to optimizing SQL queries: indexing strategies, query rewriting, EXPLAIN plan analysis, and common anti-patterns to avoid.
SQL Performance Tuning
Introduction
Slow queries are one of the most common causes of application performance problems. This guide covers practical techniques to identify, diagnose, and fix SQL performance issues across PostgreSQL, MySQL, and SQL Server.
Finding Slow Queries
PostgreSQL
-- pg_stat_statements extension
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Active slow queries
SELECT pid, query, now() - query_start AS duration
FROM pg_stat_activity
WHERE state = 'active'
AND now() - query_start > interval '1 second'
ORDER BY duration DESC;
MySQL
-- Slow query log (enable in my.cnf)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
-- Performance Schema
SELECT sql_text, COUNT_STAR, AVG_TIMER_WAIT/1000000000 AS avg_sec
FROM performance_schema.events_statements_summary_by_digest
ORDER BY AVG_TIMER_WAIT DESC
LIMIT 10;
Understanding EXPLAIN Plans
The EXPLAIN plan reveals how the database executes your query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;
Key Plan Operations
| Operation | Meaning | Performance Impact |
|---|---|---|
| Seq Scan | Table scan (reads every row) | Slow on large tables; needs index |
| Index Scan | Reads index, then fetches matching rows | Fast for selective queries |
| Index Only Scan | Reads only the index, no table access | Fastest; requires covering index |
| Bitmap Heap Scan | Builds bitmap from index, then fetches rows in batches | Good for moderate selectivity |
| Nested Loop | For each row in outer table, scan inner table | Fine for small outer sets |
| Hash Join | Builds hash table of inner table, probes with outer | Good for large joins |
| Merge Join | Sorts both inputs, merges them | Good for pre-sorted data |
Indexing Strategies
B-Tree Indexes (Default)
Best for equality and range queries:
-- Single column index
CREATE INDEX idx_users_email ON users(email);
-- Composite index (column order matters!)
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Good for: WHERE user_id = ? AND status = ?
-- Bad for: WHERE status = ? (user_id not leading)
Index Column Order
Place columns in this priority:
- Equality filters (
=) - Range filters (
>,<,BETWEEN,LIKE 'prefix%') - Columns used in ORDER BY
- Columns used in SELECT (for covering indexes)
Covering Indexes
An index that contains all columns needed for the query, avoiding table lookups:
CREATE INDEX idx_orders_covering
ON orders(user_id, status, created_at, total)
INCLUDE (id);
-- Query can be satisfied entirely from the index
SELECT id, created_at, total
FROM orders
WHERE user_id = 123 AND status = 'shipped';
Partial Indexes
Index only a subset of rows, reducing size and maintenance cost:
-- Only index active users
CREATE INDEX idx_users_active_email
ON users(email)
WHERE is_active = true;
Query Rewriting Techniques
1. Avoid SELECT *
-- Bad
SELECT * FROM orders WHERE user_id = 123;
-- Good: fetch only needed columns
SELECT id, status, total FROM orders WHERE user_id = 123;
2. Use EXISTS Instead of IN for Subqueries
-- Bad: materializes full subquery result
SELECT * FROM customers
WHERE id IN (SELECT customer_id FROM orders WHERE amount > 1000);
-- Good: stops at first match
SELECT * FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.id AND o.amount > 1000
);
3. Avoid Functions on Indexed Columns
-- Bad: function prevents index usage
SELECT * FROM orders WHERE DATE(created_at) = '2024-01-15';
-- Good: range query uses index
SELECT * FROM orders
WHERE created_at >= '2024-01-15'
AND created_at < '2024-01-16';
4. Prefer JOINs Over Correlated Subqueries
-- Bad: correlated subquery runs once per row
SELECT name,
(SELECT COUNT(*) FROM orders WHERE user_id = users.id) AS order_count
FROM users;
-- Good: JOIN is more efficient
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
5. Batch Updates Instead of One-by-One
-- Bad: N+1 updates
UPDATE orders SET status = 'shipped' WHERE id = 1;
UPDATE orders SET status = 'shipped' WHERE id = 2;
...
-- Good: single UPDATE with WHERE IN or JOIN
UPDATE orders
SET status = 'shipped'
WHERE id IN (1, 2, 3, ...);
Common Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| N+1 Queries | One query per row | Use JOIN or WHERE IN |
| Missing LIMIT | Fetching millions of rows | Add LIMIT and pagination |
| Implicit conversions | Function on column prevents index use | Cast the constant, not the column |
SELECT DISTINCT to fix duplicates | Hides a join problem | Fix the join or schema |
| Counting all rows | SELECT COUNT(*) on huge tables | Use approximate counts or triggers |
| No connection pooling | Connection overhead dominates | Use pgBouncer, HikariCP, etc. |
Best Practices
- Index foreign keys automatically — joins depend on them
- Monitor slow query logs weekly and address the top offenders
- Analyze tables regularly —
ANALYZEupdates statistics for the query planner - Avoid over-indexing — each index slows writes and consumes space
- Use appropriate data types —
INTEGERis faster thanVARCHARfor IDs - Partition large tables by date or range when they exceed 10M rows
Frequently Asked Questions
Q: How many indexes is too many?
A: There is no fixed number, but each index slows INSERT/UPDATE/DELETE. Audit indexes quarterly and drop unused ones. PostgreSQL’s pg_stat_user_indexes shows index usage.
Q: Should I index every column used in WHERE? A: No. Composite indexes often serve multiple queries. Also, the query planner may choose a sequential scan if the table is small or the query returns most rows.
Q: Why does my query use a sequential scan when I have an index? A: The planner estimates that reading the whole table is faster than reading the index plus random table lookups. This is often correct for queries returning >5-10% of rows.