Skip to content
SP StackPractices
intermediate By StackPractices

Database Indexing Strategies — From B-Trees to BRIN

A practical guide to database indexes: B-Trees, Hash, GIN, GiST, BRIN, and partial indexes. Learn when to use each and how to avoid common indexing mistakes.

Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.

Overview

Indexes are the primary mechanism for speeding up database queries. They are data structures that allow the database to locate rows without scanning every record. But indexes are not free — they consume storage, slow down writes, and can hurt performance if used incorrectly. Understanding the different index types and when to apply them is one of the highest-leverage database skills.

When to Use

  • Queries filtering on specific columns (WHERE, JOIN)
  • Sorting large result sets (ORDER BY)
  • Enforcing uniqueness constraints
  • Accelerating full-text search and geospatial queries

B-Tree Indexes

The default index type in most relational databases. B-Trees maintain sorted data that allows O(log n) lookups, range scans, and ordered traversal.

CREATE INDEX idx_users_email ON users(email);

-- Uses index: exact match
SELECT * FROM users WHERE email = 'alice@example.com';

-- Uses index: range scan
SELECT * FROM users WHERE email BETWEEN 'a' AND 'c';

-- Uses index: ORDER BY
SELECT * FROM users ORDER BY email LIMIT 10;

Composite B-Tree Indexes

Column order matters. A composite index on (a, b, c) supports queries on a, (a, b), and (a, b, c).

CREATE INDEX idx_orders_customer_date ON orders(customer_id, created_at);

-- Uses index: matches leading column
SELECT * FROM orders WHERE customer_id = 42;

-- Uses index: matches leading columns
SELECT * FROM orders WHERE customer_id = 42 AND created_at > '2024-01-01';

-- Does NOT use index: skips leading column
SELECT * FROM orders WHERE created_at > '2024-01-01';

Hash Indexes

Optimized for equality comparisons only. Smaller and faster than B-Trees for exact matches, but cannot support range queries or sorting.

CREATE INDEX idx_sessions_token ON sessions USING HASH(token);

-- Fast: equality
SELECT * FROM sessions WHERE token = 'abc123';

-- Cannot use hash index: range
SELECT * FROM sessions WHERE token > 'abc';

GIN Indexes (Generalized Inverted Index)

Designed for multi-value columns and full-text search. Efficient for arrays, JSONB, and tsvector.

-- Array containment
CREATE INDEX idx_products_tags ON products USING GIN(tags);
SELECT * FROM products WHERE tags @> ARRAY['electronics', 'wireless'];

-- JSONB search
CREATE INDEX idx_events_data ON events USING GIN(data jsonb_path_ops);
SELECT * FROM events WHERE data @> '{"status": "error"}';

-- Full-text search (PostgreSQL)
CREATE INDEX idx_articles_search ON articles USING GIN(to_tsvector('english', content));
SELECT * FROM articles WHERE to_tsvector('english', content) @@ to_tsquery('database & indexing');

GiST Indexes (Generalized Search Tree)

A framework for building indexes on complex data types: geometric, range, and nearest-neighbor queries.

-- Geospatial (PostGIS)
CREATE INDEX idx_locations_geom ON locations USING GIST(geom);
SELECT * FROM locations WHERE ST_DWithin(geom, ST_Point(0,0)::geography, 1000);

-- Range queries
CREATE INDEX idx_reservations_period ON reservations USING GIST(period);
SELECT * FROM reservations WHERE period && daterange('2024-01-01', '2024-01-10');

BRIN Indexes (Block Range Indexes)

Compact indexes for very large, naturally ordered tables. Store min/max values per block instead of per row.

-- Time-series data: logs, events, metrics
CREATE INDEX idx_logs_created ON logs USING BRIN(created_at);

-- Size: ~1% of B-Tree, but only useful for ordered data
-- Best for: billions of rows, time-series, append-only workloads

Partial Indexes

Index only a subset of rows, reducing size and improving write performance.

-- Only index active users (80% of queries filter on active)
CREATE INDEX idx_users_active_email ON users(email) WHERE active = true;

-- Only index unpaid orders for aging reports
CREATE INDEX idx_orders_unpaid ON orders(created_at) WHERE status = 'unpaid';

Covering Indexes (Index-Only Scans)

Include additional columns so the database can answer queries without touching the heap.

-- PostgreSQL: INCLUDE adds columns to the index leaf
CREATE INDEX idx_orders_customer_total ON orders(customer_id) INCLUDE(total, status);

-- Query uses index only — no heap access
SELECT total, status FROM orders WHERE customer_id = 42;

-- MySQL: composite index naturally covers
CREATE INDEX idx_orders_customer_total ON orders(customer_id, total, status);

Index Selection Matrix

Index TypeBest ForAvoid When
B-TreeEquality, range, sortingHigh-cardinality text search
HashExact match on large textRange queries needed
GINArrays, JSONB, full-textSimple scalar columns
GiSTGeospatial, rangesStandard scalar lookups
BRINLarge ordered datasetsRandom access patterns
PartialFrequently filtered subsetsQueries scan all rows

Common Mistakes

  • Indexing every column — slows writes dramatically; indexes have maintenance cost
  • Wrong column order in composites — the leading column must be the most selective
  • Indexing low-cardinality columns — gender, boolean flags; bitmap indexes handle these better
  • Ignoring partial indexes — indexing 100% of rows when queries always filter
  • Not updating statistics — stale stats lead to bad index choices by the query planner

Monitoring Index Usage

-- PostgreSQL: find unused indexes
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexrelname NOT LIKE 'pg_toast%'
ORDER BY schemaname, tablename, indexname;

-- MySQL: index usage via performance_schema
SELECT object_schema, object_name, index_name, count_read
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
ORDER BY count_read DESC;

FAQ

How many indexes is too many? There is no fixed number, but each index adds write overhead. Audit unused indexes quarterly.

Should I index foreign keys? Yes. The referencing side (many side) of a foreign key should almost always be indexed for JOIN performance.

Do indexes slow down INSERT? Yes. Every index on a table adds write amplification. Batch inserts and consider dropping indexes during bulk loads.