Find and Remove Duplicate Rows in SQL

Overview

Duplicate rows creep into tables through application bugs, import scripts, or race conditions. They waste space, distort analytics, and block you from adding the unique constraints you intended to enforce. Finding them requires grouping by the columns that define uniqueness; removing them safely means keeping one canonical row per group and deleting the rest without losing related data.

When to Use

Use this resource when:

You need to identify duplicate records in a table.
A unique constraint violation prevents adding a required index.
You are cleaning data after an import or migration.
You want to deduplicate before enforcing a new primary key or unique index.

Solution

Find duplicates in PostgreSQL

-- Find duplicate emails in the users table
SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

-- Keep the oldest row and delete the rest
WITH duplicates AS (
  SELECT id,
         ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
  FROM users
)
DELETE FROM users
WHERE id IN (
  SELECT id FROM duplicates WHERE rn > 1
);

Explanation

The first query groups rows by the column that should be unique and uses HAVING COUNT(*) > 1 to return only the values that occur more than once. The second query uses a common table expression (CTE) with ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at). Rows within each email group are numbered starting from 1, and the DELETE removes every row whose number is greater than 1. The ORDER BY clause determines which row is kept; here it is the oldest record. If two rows in a group can share the same created_at, add a tiebreaker such as id to the ORDER BY so the kept row is deterministic. Always run the SELECT version of the CTE before DELETE to confirm what will be removed.
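The preview step mentioned above might look like the following sketch. It reuses the CTE from the solution and labels each row instead of deleting it. The group_size and action column names are illustrative and not part of the original queries.

```sql
-- Preview only: nothing is deleted.
WITH duplicates AS (
  SELECT id,
         email,
         created_at,
         ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn,
         COUNT(*)     OVER (PARTITION BY email)                     AS group_size
  FROM users
)
SELECT id,
       email,
       created_at,
       -- rn = 1 is the row the DELETE would leave in place
       CASE WHEN rn = 1 THEN 'keep' ELSE 'delete' END AS action
FROM duplicates
WHERE group_size > 1          -- show only groups that contain duplicates
ORDER BY email, rn;
```

Filtering on group_size rather than on rn > 1 shows the kept row next to the rows that would go, which makes it easier to spot a wrong ORDER BY.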

Variants

Database	Technique	Notes
PostgreSQL	`ROW_NUMBER() OVER`	Flexible and safe
MySQL 8+	`ROW_NUMBER() OVER`	Same syntax as PostgreSQL
MySQL 5.7	Self-join	Use `MIN(id)` to keep one row
SQLite	`DELETE` with `IN` subquery	Works with window functions in 3.25+
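For the MySQL 5.7 row, where window functions are not available, one possible form of the self-join is sketched below. It assumes users has an integer id primary key and that the lowest id in each group is the row to keep; adjust if your canonical row is chosen differently.

```sql
-- MySQL 5.7 sketch: keep the row with the smallest id per email.
-- The derived table is materialized first, so MySQL allows it
-- even though it reads from the table being deleted from.
DELETE u
FROM users AS u
JOIN (
  SELECT email, MIN(id) AS keep_id
  FROM users
  GROUP BY email
) AS k
  ON u.email = k.email
WHERE u.id <> k.keep_id;
```

Because the join uses =, rows whose email is NULL are never matched and are left untouched by this statement.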

Best Practices

Always preview before deleting. Run the CTE as a SELECT first to see which rows will be kept.
Back up the table or use a transaction. A single bad DELETE can remove thousands of rows.
Choose the canonical row with business logic. Oldest, newest, or most complete record depends on the use case.
Add a unique constraint after cleanup. This prevents duplicates from returning.
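Consider foreign keys. If child rows reference a duplicate you are deleting, the DELETE fails under the default foreign key action, or removes the child rows too under ON DELETE CASCADE. Without a foreign key constraint the child rows are silently orphaned. Repoint references to the kept row first.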
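Several of these practices can be combined in one transaction. The PostgreSQL sketch below is illustrative only: the orders table, its user_id column, the user_dedup_map temporary table, and the users_email_key constraint name are all hypothetical, and id is added to the ORDER BY as a tiebreaker.

```sql
BEGIN;

-- Map each duplicate row to the row that will be kept (oldest per email).
CREATE TEMP TABLE user_dedup_map ON COMMIT DROP AS
SELECT duplicate_id, keep_id
FROM (
  SELECT id AS duplicate_id,
         FIRST_VALUE(id) OVER (PARTITION BY email
                               ORDER BY created_at, id) AS keep_id
  FROM users
) AS ranked
WHERE duplicate_id <> keep_id;

-- Repoint child rows (hypothetical orders table) to the kept row.
UPDATE orders AS o
SET user_id = m.keep_id
FROM user_dedup_map AS m
WHERE o.user_id = m.duplicate_id;

-- Delete only the rows listed in the map.
DELETE FROM users AS u
USING user_dedup_map AS m
WHERE u.id = m.duplicate_id;

-- Prevent duplicates from returning.
ALTER TABLE users ADD CONSTRAINT users_email_key UNIQUE (email);

COMMIT;
```

Replacing COMMIT with ROLLBACK on a first run lets you inspect the row counts reported by each statement without changing any data.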

Common Mistakes

Deleting without a WHERE clause. A missing WHERE turns the query into a table wipe.
Keeping the wrong row. If the ORDER BY is arbitrary or does not reflect your business rules, you may discard the most valuable duplicate.
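Ignoring NULL values. GROUP BY and PARTITION BY place all NULL keys in a single group, so rows with a NULL key are reported, and deleted, as duplicates of each other, while a self-join on a.email = b.email never matches them because NULL does not equal NULL. Decide explicitly whether NULL keys count as duplicates, and filter with IS NOT NULL if they do not.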
Running on production during peak traffic. Lock contention can block writes; use a batch approach or low-traffic window.
Forgetting to update related sequences. If you delete the highest id, you may need to reset a sequence, though it is rarely required.
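For the NULL case in the list above, it can help to look at NULL keys separately before any delete. A minimal sketch, assuming that a NULL email means "unknown" and should not be treated as a duplicate:

```sql
-- How many rows have no key at all? Handle these separately.
SELECT COUNT(*) AS null_email_rows
FROM users
WHERE email IS NULL;

-- Preview duplicates among non-NULL keys only.
SELECT id, email, created_at
FROM (
  SELECT id,
         email,
         created_at,
         ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
  FROM users
  WHERE email IS NOT NULL
) AS ranked
WHERE rn > 1;
```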

Frequently Asked Questions

Q: What if duplicates have different values in other columns? A: Choose the canonical row by business rules, then either merge the data or keep the row with the most complete or most recent data.

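Q: Can I delete duplicates in batches? A: Yes. In PostgreSQL, add LIMIT 1000 to the subquery, so the filter becomes id IN (SELECT id FROM duplicates WHERE rn > 1 LIMIT 1000), and run the delete repeatedly until it removes no rows. MySQL does not allow LIMIT inside an IN subquery, so there the batch has to be selected another way.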
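A batched version of the delete could be sketched as follows for PostgreSQL. The batch size of 1000 comes from the answer above; tune it to your workload.

```sql
-- Run repeatedly; stop when the statement reports DELETE 0.
WITH duplicates AS (
  SELECT id,
         ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
  FROM users
)
DELETE FROM users
WHERE id IN (
  SELECT id
  FROM duplicates
  WHERE rn > 1
  LIMIT 1000        -- at most 1000 rows per run
);
```

Each run recomputes the row numbers, so the row numbered 1 in every group is never selected for deletion.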

Q: How do I prevent duplicates from reappearing? A: Add a unique constraint or unique index on the columns that define uniqueness, and handle duplicate key exceptions in your application.
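As an alternative to catching the duplicate key error in application code, PostgreSQL 9.5 and later can skip the conflicting insert in SQL. The sketch below assumes a unique constraint or index on email already exists, that id is generated automatically, and uses a made-up address.

```sql
-- Inserts the row, or does nothing if the email is already present.
INSERT INTO users (email, created_at)
VALUES ('ada@example.com', now())
ON CONFLICT (email) DO NOTHING;
```

Whether silently skipping is appropriate depends on the application; ON CONFLICT (email) DO UPDATE is the variant to consider when the newer data should win.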

Related Resources

Read Replicas — Scale Reads Without Changing Application Logic

SQL CTEs — Common Table Expressions Explained

Database Failover Runbook

Database Schema Documentation Template

Full-Text Search — Implement Search That Actually Works