Implement Mutation Testing

Overview

Code coverage tells you which lines were executed, but not whether the tests would fail if the behavior changed. Mutation testing addresses this by introducing small, semantically meaningful bugs (mutants) into your code — changing + to -, inverting a condition, removing a method call — and verifying that at least one test fails. A high mutation score means your tests are genuinely asserting behavior, not just passing through code.

When to Use

Coverage is high (80%+) but bugs still slip through to production
You suspect tests lack meaningful assertions (mock-heavy tests that verify nothing)
Critical business logic needs confidence beyond line coverage
You are refactoring legacy code and want to ensure tests will catch regressions
Code reviews repeatedly surface “this test passes even if I delete the implementation”

When NOT to Use

The codebase has low coverage to begin with — fix coverage before mutation testing
Test suites already take 30+ minutes — mutation testing multiplies that time significantly
You are in early prototype phase and tests are intentionally minimal
The team lacks time to investigate and strengthen tests that fail to kill mutants

Step-by-Step Implementation

Python (MutPy)

# Install
pip install mutpy

# Run mutation testing on a module
mutpy --target mymodule --unit-test tests/ --runner pytest

# Generate HTML report
mutpy --target mymodule --unit-test tests/ --runner pytest --report-html mutpy_report/

# Show surviving mutants (tests that should have failed but didn't)
mutpy --target mymodule --unit-test tests/ --runner pytest --show-mutants

# .mutpy.yml configuration
target:
  - myproject/core/
tests:
  - tests/core/
runner: pytest
show_mutants: true
exclude:
  - "*/migrations/*"
  - "*/tests/*"

# Mutation operators to apply
operators:
  - AOR  # Arithmetic operator replacement (+ to -, etc.)
  - ROR  # Relational operator replacement (> to >=, etc.)
  - COR  # Conditional operator replacement (and to or)
  - UOI  # Unary operator insertion/deletion
  - ABS  # Absolute value insertion

# Example: a test that survives mutation (bad)
def test_calculate_discount():
    result = calculate_discount(100, 0.2)
    # No assertion — any mutant survives

# Strengthened test that kills mutants
def test_calculate_discount():
    result = calculate_discount(100, 0.2)
    assert result == 80  # Mutants returning 81, 79, 100, 0 will fail
    assert isinstance(result, (int, float))

JavaScript (Stryker)

# Install
npm install --save-dev @stryker-mutator/core @stryker-mutator/jest-runner

# Initialize configuration
npx stryker init

// stryker.config.mjs
export default {
  testRunner: 'jest',
  reporters: ['progress', 'clear-text', 'html'],
  mutate: [
    'src/**/*.js',
    '!src/**/*.test.js',
    '!src/vendor/**'
  ],
  coverageAnalysis: 'perTest',
  thresholds: {
    high: 80,
    low: 60,
    break: 50    // CI fails below 50% mutation score
  }
};

# Run mutation testing
npx stryker run

# Check mode (faster, doesn't generate full report)
npx stryker run --check

# Dashboard reporter for PR comments
npx stryker run --reporters dashboard

// Example: surviving mutant detection
// Original code
function isEligible(age, income) {
  return age >= 18 && income > 30000;  // Stryker mutates >= to >
}

// Weak test (survives the >= to > mutant)
test('eligibility', () => {
  expect(isEligible(18, 40000)).toBe(true);
});

// Strong test (kills the mutant)
test('eligibility boundary', () => {
  expect(isEligible(18, 30000)).toBe(false);  // income > 30000 fails here
  expect(isEligible(17, 40000)).toBe(false);  // age >= 18 fails here
  expect(isEligible(18, 40000)).toBe(true);
});

Java (PIT)

<!-- pom.xml -->
<plugin>
  <groupId>org.pitest</groupId>
  <artifactId>pitest-maven</artifactId>
  <version>1.15.0</version>
  <configuration>
    <targetClasses>
      <param>com.example.core.*</param>
    </targetClasses>
    <targetTests>
      <param>com.example.core.*Test</param>
    </targetTests>
    <mutators>
      <mutator>CONDITIONALS_BOUNDARY</mutator>
      <mutator>MATH</mutator>
      <mutator>NEGATE_CONDITIONALS</mutator>
      <mutator>RETURN_VALS</mutator>
      <mutator>VOID_METHOD_CALLS</mutator>
    </mutators>
    <thresholds>
      <mutation>70</mutation>
      <coverage>80</coverage>
    </thresholds>
    <outputFormats>
      <format>HTML</format>
      <format>XML</format>
    </outputFormats>
  </configuration>
</plugin>

# Run PIT mutation testing
mvn org.pitest:pitest-maven:mutationCoverage

# Faster with incremental analysis (only changed code)
mvn org.pitest:pitest-maven:mutationCoverage -DwithHistory

Interpreting Results

Term	Meaning
Mutant	A modified version of your code with one semantic change
Killed	At least one test failed on the mutant — good
Survived	All tests passed on the mutant — test gap found
Timeout	Mutant caused an infinite loop or extreme slowdown
Equivalent	Mutant behaves identically to original (false positive)
Mutation Score	Killed / (Killed + Survived) × 100

# Sample Stryker output
Ran 12.4k mutants in 4m 32s
- Killed: 10,210 (82%)
- Survived: 2,134 (17%)
- Timed out: 56 (<1%)
- Equivalent: ~120 (excluded from score)

Surviving mutants:
src/cart.js:45  # changed >= to > in calculateTotal
src/cart.js:67  # removed null check in applyDiscount

Best Practices

Target high-value code first. Run mutation testing on core business logic, not on controller wiring or DTO mappings. Mutation testing is expensive; focus where it matters.
Distinguish equivalent mutants from real gaps. An equivalent mutant (a + 0 changed to a - 0) cannot be killed. Mark them in configuration to avoid noise.
Use incremental mode in CI. PIT’s history mode and Stryker’s incremental analysis only mutate changed files, reducing runtime from hours to minutes.
Set realistic thresholds. A 100% mutation score is usually not worth the effort. 70-80% on core modules is a strong signal of test quality.
Treat surviving mutants as tickets. Each surviving mutant is a potential bug hiding in production. Prioritize them like code review comments.

Common Mistakes

Running mutation tests on the full suite without filtering. A large codebase can take hours. Start with one package or module.
Chasing 100% mutation score. The last 10% often requires testing trivial getters or logging calls that provide no business value.
Ignoring equivalent mutants. They create noise and make developers distrust the tool. Configure exclusions or annotations.
Using mutation score as a team KPI. It encourages writing tests specifically to kill mutants rather than testing real requirements.
Running mutation tests on unmocked integration tests. Database calls and HTTP requests make mutation testing impossibly slow; target unit tests.