Regular Expressions
How to use regular expressions for pattern matching, validation, and text extraction across Python, JavaScript, and Java.
Overview
Regular expressions (regex) are sequences of characters that define search patterns. They are the standard tool for text validation, extraction, substitution, and parsing across virtually every programming language and text editor.
Despite their cryptic syntax, regular expressions are indispensable for working with unstructured text, form validation, log parsing, and data cleaning.
When to Use
Use this recipe when:
- Validating email addresses, phone numbers, or IDs
- Extracting data from unstructured text or log files
- Replacing or formatting strings with complex rules
- Splitting text on dynamic delimiters
- Searching for patterns within large documents
Solution
Python
import re
text = "Contact us at support@example.com or sales@example.org"
# Search for email pattern
pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
matches = re.findall(pattern, text)
print(matches) # ['support@example.com', 'sales@example.org']
# Extract groups
match = re.search(r'(\w+)@(\w+\.\w+)', text)
if match:
    print(match.group(1)) # support
    print(match.group(2)) # example.com
# Replace
new_text = re.sub(r'\b\w+@\w+\.\w+\b', '[REDACTED]', text)
print(new_text) # Contact us at [REDACTED] or [REDACTED]
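The Python examples above cover searching, extracting, and replacing. Splitting on dynamic delimiters, which the "When to Use" list also mentions, can be sketched with `re.split`. The `record` string and its mix of delimiters are made up for illustration.

```python
import re

# Hypothetical input: names separated by commas, semicolons, or pipes,
# with inconsistent whitespace around the delimiters.
record = "alice, bob;carol |  dave"

# Split on any one of , ; | and swallow the surrounding whitespace.
names = re.split(r"\s*[,;|]\s*", record)
print(names)  # ['alice', 'bob', 'carol', 'dave']
```

A plain `str.split` would be enough if the delimiter were a single fixed string; the regex only earns its place because the delimiter varies.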
JavaScript
const text = "Contact us at support@example.com or sales@example.org";
// Match all emails
const pattern = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g;
const matches = text.match(pattern);
console.log(matches); // ['support@example.com', 'sales@example.org']
// Extract groups
const groupPattern = /(\w+)@(\w+\.\w+)/;
const match = text.match(groupPattern);
if (match) {
  console.log(match[1]); // support
  console.log(match[2]); // example.com
}
// Replace
const newText = text.replace(/\b\w+@\w+\.\w+\b/g, '[REDACTED]');
console.log(newText); // Contact us at [REDACTED] or [REDACTED]
Java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexExample {
    public static void main(String[] args) {
        String text = "Contact us at support@example.com or sales@example.org";
        // Match all emails
        Pattern pattern = Pattern.compile("\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b");
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            System.out.println(matcher.group()); // support@example.com, then sales@example.org
        }
        // Extract groups
        Pattern groupPattern = Pattern.compile("(\\w+)@(\\w+\\.\\w+)");
        Matcher groupMatcher = groupPattern.matcher(text);
        if (groupMatcher.find()) {
            System.out.println(groupMatcher.group(1)); // support
            System.out.println(groupMatcher.group(2)); // example.com
        }
    }
}
Explanation
- Pattern: The regex string that defines what to search for
- Matcher / Match object: Holds the result of applying a pattern to text
- Groups (`()`): Capture sub-expressions for extraction
- Flags (`i`, `g`, `m`): Modify behavior (case-insensitive, global, multiline)
- Character classes (`[a-z]`, `\d`, `\w`): Match sets of characters
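As a rough illustration of how these pieces fit together, the sketch below combines a compiled pattern, two flags, named groups, and match objects in Python. The `log` text and the group names `level` and `message` are invented for the example.

```python
import re

# Hypothetical log text, used only for illustration.
log = "ERROR disk full\nwarning: low memory\nError: timeout"

# Pattern: compiled once and reused.
# Flags: IGNORECASE ignores letter case; MULTILINE lets ^ and $ match
# at the start and end of every line instead of only the whole string.
# Groups: (?P<name>...) captures a sub-expression under a name.
line_pattern = re.compile(
    r"^(?P<level>error|warning)\b:?\s*(?P<message>.+)$",
    re.IGNORECASE | re.MULTILINE,
)

# Each item yielded by finditer is a match object holding one result.
entries = [
    (m.group("level").lower(), m.group("message"))
    for m in line_pattern.finditer(log)
]
print(entries)
# [('error', 'disk full'), ('warning', 'low memory'), ('error', 'timeout')]
```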
Common Patterns
| Pattern | Description | Example |
|---|---|---|
| `\d{3}-\d{2}-\d{4}` | US Social Security Number | 123-45-6789 |
| `\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b` | IPv4 address | 192.168.1.1 |
| `https?://[^\s]+` | URL | https://example.com |
| `^\d{4}-\d{2}-\d{2}$` | ISO date (YYYY-MM-DD) | 2024-03-15 |
| `^[A-Za-z0-9+_.-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$` | Email (basic) | user@domain.com |
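The patterns in this table check the shape of a value, not whether it is valid: the IPv4 pattern accepts `999.999.999.999`, and the date pattern accepts month 13. One possible approach, sketched below, is to pair the shape check with Python's standard `ipaddress` and `datetime` modules. The helper names are hypothetical.

```python
import re
import ipaddress
from datetime import date

# Shape-only patterns, adapted from the table (anchors are replaced by fullmatch).
IPV4_SHAPE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
ISO_DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")


def looks_like_ipv4(value: str) -> bool:
    """Shape check only: four dot-separated groups of 1 to 3 digits."""
    return IPV4_SHAPE.fullmatch(value) is not None


def is_valid_ipv4(value: str) -> bool:
    """Range-aware check delegated to the standard library."""
    try:
        ipaddress.IPv4Address(value)
    except ValueError:
        return False
    return True


def is_valid_iso_date(value: str) -> bool:
    """Shape check first, then a real calendar check."""
    if ISO_DATE_SHAPE.fullmatch(value) is None:
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


print(looks_like_ipv4("999.999.999.999"))  # True: the shape is right
print(is_valid_ipv4("999.999.999.999"))    # False: octets are out of range
print(is_valid_iso_date("2024-03-15"))     # True
print(is_valid_iso_date("2024-13-45"))     # False: no month 13
```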
Best Practices
- Always escape special characters when building regex dynamically
- Use raw strings in Python (`r'...'`) to avoid double escaping
- Prefer explicit character classes over `.` (dot) for predictable matching
- Anchor your patterns with `^` and `$` when validating entire strings
- Test with edge cases: empty strings, Unicode, very long inputs
- Document complex patterns with comments or the `(?x)` verbose flag
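Two of these practices are easier to see in code: escaping dynamic input with `re.escape`, and documenting a pattern with the verbose flag. The sketch below is in Python; the `count_literal` helper and the sample strings are hypothetical.

```python
import re


def count_literal(needle: str, haystack: str) -> int:
    """Count occurrences of needle as literal text, not as a pattern."""
    return len(re.findall(re.escape(needle), haystack))


price_text = "Total: $5.00 (was $5x00)"

# Unescaped, "$" is an end-of-string anchor and "." matches any character,
# so the raw needle never matches the literal price.
unescaped_hits = len(re.findall("$5.00", price_text))
escaped_hits = count_literal("$5.00", price_text)
print(unescaped_hits, escaped_hits)  # 0 1

# Verbose mode ignores whitespace in the pattern and allows # comments.
ISO_DATE = re.compile(
    r"""
    ^
    (?P<year>\d{4})  -   # four-digit year
    (?P<month>\d{2}) -   # two-digit month
    (?P<day>\d{2})       # two-digit day
    $
    """,
    re.VERBOSE,
)

parsed = ISO_DATE.match("2024-03-15")
print(parsed.group("year"), parsed.group("month"), parsed.group("day"))
# 2024 03 15
```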
Common Mistakes
- Forgetting to escape backslashes (use raw strings in Python)
- Using greedy quantifiers (`.*`) when non-greedy (`.*?`) is needed
- Not anchoring validation patterns, allowing partial matches
- Ignoring Unicode and international characters in real-world text
- Writing overly complex regex when a simple string function suffices
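A short Python sketch of three of these mistakes: greedy versus non-greedy matching, unanchored validation, and reaching for regex where a string method would do. All sample strings are invented for the example.

```python
import re

# Greedy vs non-greedy: .* grabs as much as possible, .*? as little as possible.
quoted = 'say "hello" and "bye"'
greedy = re.findall(r'".*"', quoted)
lazy = re.findall(r'".*?"', quoted)
print(greedy)  # ['"hello" and "bye"']
print(lazy)    # ['"hello"', '"bye"']

# Unanchored validation: search() accepts input that merely contains a date.
date_shape = r"\d{4}-\d{2}-\d{2}"
noisy = "id-2024-03-15-extra"
partial_ok = re.search(date_shape, noisy) is not None
whole_ok = re.fullmatch(date_shape, noisy) is not None
print(partial_ok, whole_ok)  # True False

# A simple string method is often enough; no regex is needed for a fixed prefix.
line = "status=ok"
has_prefix = line.startswith("status=")
print(has_prefix)  # True
```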
Frequently Asked Questions
Q: Should I use regex to parse HTML?
A: No. HTML is not a regular language. Use a proper HTML parser (BeautifulSoup, DOM API, Jsoup).
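As one possible illustration of the parser approach, the sketch below collects link targets with `html.parser` from Python's standard library. That module is swapped in here so the example runs without installing anything; it is not one of the parsers named in the answer. The `LinkCollector` class and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Attribute order and quoting style vary here, which a regex would
# have to anticipate explicitly; the parser handles both.
markup = (
    '<p>See <a href="https://example.com">this</a> and '
    "<a class=\"x\" href='/docs'>docs</a>.</p>"
)

collector = LinkCollector()
collector.feed(markup)
collector.close()
print(collector.links)  # ['https://example.com', '/docs']
```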
Q: What is the difference between match() and search() in Python?
A: match() checks only at the beginning of the string. search() scans the entire string.
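A minimal sketch of the difference, with `re.fullmatch` added for comparison because it is the stricter choice when the whole string must match. The `text` value is made up for illustration.

```python
import re

text = "Order 12345 shipped"

# match() only tries at position 0, so digits in the middle are not found.
at_start = re.match(r"\d+", text)
print(at_start)  # None

# search() scans forward until it finds the first match anywhere.
anywhere = re.search(r"\d+", text)
print(anywhere.group())  # 12345

# match() succeeds when the pattern fits at the very beginning.
prefix = re.match(r"Order", text)
print(prefix.group())  # Order

# fullmatch() requires the pattern to cover the entire string.
print(re.fullmatch(r"\d+", "12345") is not None)  # True
print(re.fullmatch(r"\d+", text) is not None)     # False
```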
Q: How do I make a regex case-insensitive?
A: Use the i flag (JavaScript), re.IGNORECASE (Python), or Pattern.CASE_INSENSITIVE (Java).