beginner

Regular Expressions

How to use regular expressions for pattern matching, validation, and text extraction across Python, JavaScript, and Java.

Topics: data

Overview

Regular expressions (regex) are sequences of characters that define search patterns. They are the standard tool for text validation, extraction, substitution, and parsing across virtually every programming language and text editor.

Despite its cryptic syntax, regex is indispensable for working with unstructured text, form validation, log parsing, and data cleaning.

When to Use

Use this recipe when:

  • Validating email addresses, phone numbers, or IDs
  • Extracting data from unstructured text or log files
  • Replacing or formatting strings with complex rules
  • Splitting text on dynamic delimiters
  • Searching for patterns within large documents

Solution

Python

import re

text = "Contact us at support@example.com or sales@example.org"

# Search for email pattern
pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
matches = re.findall(pattern, text)
print(matches)  # ['support@example.com', 'sales@example.org']

# Extract groups
match = re.search(r'(\w+)@(\w+\.\w+)', text)
if match:
    print(match.group(1))  # support
    print(match.group(2))  # example.com

# Replace
new_text = re.sub(r'\b\w+@\w+\.\w+\b', '[REDACTED]', text)
print(new_text)  # Contact us at [REDACTED] or [REDACTED]

JavaScript

const text = "Contact us at support@example.com or sales@example.org";

// Match all emails
const pattern = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g;
const matches = text.match(pattern);
console.log(matches);  // ['support@example.com', 'sales@example.org']

// Extract groups
const groupPattern = /(\w+)@(\w+\.\w+)/;
const match = text.match(groupPattern);
if (match) {
  console.log(match[1]); // support
  console.log(match[2]); // example.com
}

// Replace
const newText = text.replace(/\b\w+@\w+\.\w+\b/g, '[REDACTED]');
console.log(newText); // Contact us at [REDACTED] or [REDACTED]

Java

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {
    public static void main(String[] args) {
        String text = "Contact us at support@example.com or sales@example.org";

        // Match all emails
        Pattern pattern = Pattern.compile("\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b");
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println(matcher.group());  // support@example.com, then sales@example.org
        }

        // Extract groups
        Pattern groupPattern = Pattern.compile("(\\w+)@(\\w+\\.\\w+)");
        Matcher groupMatcher = groupPattern.matcher(text);
        if (groupMatcher.find()) {
            System.out.println(groupMatcher.group(1));  // support
            System.out.println(groupMatcher.group(2));  // example.com
        }
    }
}

Explanation

  • Pattern: The regex string that defines what to search for
  • Matcher / Match object: Holds the result of applying a pattern to text
  • Groups (()): Capture sub-expressions for extraction
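  • Flags (i, g, m): Modify matching behavior (case-insensitive, global, multiline). The g flag is specific to JavaScript; Python finds all matches with findall() or finditer(), and Java with repeated Matcher.find() calls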
  • Character classes ([a-z], \d, \w): Match sets of characters
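The terms above can be seen working together in a minimal Python sketch. The log text and the group names `level` and `msg` are invented for illustration and are not part of the recipe.

```python
import re

# Hypothetical log text, invented for this example.
log = "INFO: started\nerror: disk full\nWARN: retrying"

# Pattern: compiled once; flags change how it is applied.
# re.IGNORECASE lets [a-z] match upper-case letters too;
# re.MULTILINE makes ^ and $ match at every line boundary.
line_re = re.compile(
    r"^(?P<level>[a-z]+): (?P<msg>.+)$",
    re.IGNORECASE | re.MULTILINE,
)

# Each Match object holds one result; named groups expose the captured parts.
entries = [(m.group("level"), m.group("msg")) for m in line_re.finditer(log)]
print(entries)
# [('INFO', 'started'), ('error', 'disk full'), ('WARN', 'retrying')]
```

Named groups are optional; numbered groups as in the Solution section work the same way, but names tend to keep longer patterns readable.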

Common Patterns

| Pattern | Description | Example |
| --- | --- | --- |
| \d{3}-\d{2}-\d{4} | US Social Security Number | 123-45-6789 |
| \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b | IPv4 address | 192.168.1.1 |
| https?://[^\s]+ | URL | https://example.com |
| ^\d{4}-\d{2}-\d{2}$ | ISO date (YYYY-MM-DD) | 2024-03-15 |
| ^[A-Za-z0-9+_.-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ | Email (basic) | user@domain.com |
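These patterns check the shape of a value, not whether it is valid. The Python sketch below illustrates the difference for two of them; the helper `is_ipv4` is hypothetical and adds a numeric range check on top of the regex.

```python
import re

# Patterns copied from the table above.
IPV4_SHAPE = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def is_ipv4(candidate: str) -> bool:
    """Hypothetical helper: regex shape check, then a range check per octet."""
    if not IPV4_SHAPE.fullmatch(candidate):
        return False
    return all(0 <= int(octet) <= 255 for octet in candidate.split("."))


print(is_ipv4("192.168.1.1"))              # True
print(is_ipv4("999.999.999.999"))          # False: right shape, octets out of range
print(bool(ISO_DATE.match("2024-03-15")))  # True
print(bool(ISO_DATE.match("2024-13-45")))  # True: digits are checked, the calendar is not
```

For dates, a reasonable follow-up is to pass the matched string to a real date parser rather than growing the regex.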

Best Practices

  • Always escape special characters when building regex dynamically
  • Use raw strings in Python (r'...') to avoid double escaping
  • Prefer explicit character classes over . (dot) for predictable matching
  • Anchor your patterns with ^ and $ when validating entire strings
  • Test with edge cases: empty strings, Unicode, very long inputs
  • Document complex patterns with comments or the (?x) verbose flag (available in Python and Java, not in JavaScript regex literals)
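Two of these practices, escaping dynamic input and documenting a pattern with the verbose flag, can be sketched in Python as follows. The search term and sample sentence are made up for the example.

```python
import re

# Hypothetical user-supplied term that contains regex metacharacters.
user_term = "price (USD) $5.00"
sentence = "Total price (USD) $5.00 today"

# re.escape backslash-escapes metacharacters so the term is matched literally.
escaped_hit = re.search(re.escape(user_term), sentence) is not None
# Unescaped, "(USD)" becomes a group and "$" an anchor, so the literal text is missed.
raw_hit = re.search(user_term, sentence) is not None
print(escaped_hit, raw_hit)  # True False

# re.VERBOSE (inline form: (?x)) ignores whitespace in the pattern and allows comments.
iso_date = re.compile(
    r"""
    ^(?P<year>\d{4})    # four-digit year
    -(?P<month>\d{2})   # two-digit month
    -(?P<day>\d{2})$    # two-digit day
    """,
    re.VERBOSE,
)
parts = iso_date.match("2024-03-15").groupdict()
print(parts)  # {'year': '2024', 'month': '03', 'day': '15'}
```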

Common Mistakes

  • Forgetting to escape backslashes (use raw strings in Python)
  • Using greedy quantifiers (.*) when non-greedy (.*?) is needed
  • Not anchoring validation patterns, allowing partial matches
  • Ignoring Unicode and international characters in real-world text
  • Writing overly complex regex when a simple string function suffices
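The greedy and anchoring mistakes are easy to reproduce. A short Python sketch, using invented sample strings:

```python
import re

quoted = 'say "one" and "two"'

# Greedy: .* runs to the last quote and swallows everything in between.
greedy = re.findall(r'"(.*)"', quoted)
# Non-greedy: .*? stops at the first closing quote that lets the match succeed.
lazy = re.findall(r'"(.*?)"', quoted)
print(greedy)  # ['one" and "two']
print(lazy)    # ['one', 'two']

# Unanchored validation accepts any string that merely contains a valid part.
date_re = r"\d{4}-\d{2}-\d{2}"
suspicious = "x2024-03-15; DROP"
print(bool(re.search(date_re, suspicious)))       # True: partial match slips through
print(bool(re.fullmatch(date_re, suspicious)))    # False: whole string must match
print(bool(re.fullmatch(date_re, "2024-03-15")))  # True
```

In Python, re.fullmatch() has the same effect as wrapping the pattern in anchors, which avoids forgetting one of them.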

Frequently Asked Questions

Q: Should I use regex to parse HTML? A: No. HTML is not a regular language. Use a proper HTML parser (BeautifulSoup, DOM API, Jsoup).

Q: What is the difference between match() and search() in Python? A: match() checks only at the beginning of the string. search() scans the entire string.
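A small Python illustration of the difference, with re.fullmatch() included for comparison (the sample string is invented):

```python
import re

text = "order 42 shipped"

at_start = re.match(r"\d+", text)        # None: the string does not start with digits
anywhere = re.search(r"\d+", text)       # finds "42" in the middle
prefix = re.match(r"order", text)        # matches: "order" is at position 0
whole = re.fullmatch(r"order", text)     # None: the entire string must match

print(at_start)          # None
print(anywhere.group())  # 42
print(prefix.group())    # order
print(whole)             # None
```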

Q: How do I make a regex case-insensitive? A: Use the i flag (JavaScript), re.IGNORECASE (Python), or Pattern.CASE_INSENSITIVE (Java).
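In Python, for instance, the flag can be passed as an argument or written inline at the start of the pattern. A minimal sketch with an invented sample string:

```python
import re

text = "Error, ERROR, error"

case_sensitive = re.findall(r"error", text)
with_flag = re.findall(r"error", text, re.IGNORECASE)
inline_flag = re.findall(r"(?i)error", text)  # inline form of the same flag

print(case_sensitive)  # ['error']
print(with_flag)       # ['Error', 'ERROR', 'error']
print(inline_flag)     # ['Error', 'ERROR', 'error']
```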