Skip to content
SP StackPractices
beginner By StackPractices

Generate URL Slugs

How to generate clean, URL-friendly slugs from strings in multiple programming languages.

Topics: data

Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.

Overview

URL slugs are human-readable identifiers used in web addresses. A well-formed slug improves SEO, usability, and shareability. Generating slugs involves transliterating non-ASCII characters, removing special symbols, collapsing whitespace, and ensuring uniqueness.

When to Use

Use this resource when:

  • Converting article titles, product names, or user-generated content into permalink URLs
  • Building CMSs, blogs, e-commerce platforms, or any app with user-facing URLs
  • Normalizing filenames for uploaded assets to avoid encoding issues
  • Creating SEO-friendly paths for multi-language content

Solution

Python

import re
import unicodedata

def generate_slug(text):
    # Normalize unicode and remove accents
    text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')
    # Lowercase and replace non-alphanumeric with hyphens
    text = re.sub(r'[^\w\s-]', '', text.lower())
    # Collapse multiple hyphens/whitespace
    text = re.sub(r'[-\s]+', '-', text).strip('-_')
    return text

print(generate_slug("Hello, World! 2024"))  # hello-world-2024
print(generate_slug("Café & Crème Brûlée"))  # cafe-creme-brulee

JavaScript

function generateSlug(text) {
    return text
        .normalize('NFKD')
        .replace(/[\u0300-\u036f]/g, '')  // Remove diacritics
        .toLowerCase()
        .trim()
        .replace(/[^\w\s-]/g, '')
        .replace(/[-\s]+/g, '-');
}

console.log(generateSlug("Hello, World! 2024"));  // hello-world-2024
console.log(generateSlug("Café & Crème Brûlée"));  // cafe-creme-brulee
// Using the popular slugify library
// npm install slugify
import slugify from 'slugify';

slugify('Hello, World! 2024');  // hello-world-2024
slugify('Café & Crème Brûlée');  // cafe-creme-brulee

Java

import java.text.Normalizer;
import java.util.Locale;

public class SlugGenerator {
    public static String generateSlug(String input) {
        String normalized = Normalizer.normalize(input, Normalizer.Form.NFKD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        return normalized.toLowerCase(Locale.ROOT)
            .replaceAll("[^\\w\\s-]", "")
            .replaceAll("[-\\s]+", "-")
            .replaceAll("^-+|$-+", "");
    }

    public static void main(String[] args) {
        System.out.println(generateSlug("Hello, World! 2024"));  // hello-world-2024
        System.out.println(generateSlug("Café & Crème Brûlée"));  // cafe-creme-brulee
    }
}

Explanation

The core algorithm is consistent across languages:

  1. Unicode normalization (NFKD) decomposes accented characters into base + combining mark, allowing removal of diacritics
  2. Lowercasing ensures case-insensitive URLs
  3. Strip special characters except alphanumeric, hyphens, and spaces
  4. Collapse separators into single hyphens and trim leading/trailing ones

This produces ASCII-only, lowercase, hyphen-delimited strings that are safe for URLs, filenames, and IDs.

Variants

TechnologyLibraryApproachNotes
Pythonpython-slugifyslugify()Handles unicode, supports language-specific rules
Pythonunicodedata + reManualNo dependencies, full control
JavaScriptslugifyslugify()Lightweight, supports custom replacements
JavaScriptManualnormalize() + regexZero dependencies
Javaslugify (Maven)SlugifySupports custom replacements and language rules
JavaNormalizer + regexManualBuilt-in JDK, no external deps

Best Practices

  • Always normalize Unicode before stripping accents to handle é, ñ, 中文 correctly
  • Enforce maximum length (e.g., 100 characters) to prevent URL bloat and database issues
  • Check for uniqueness against existing slugs in your database; append -2, -3 if needed
  • Avoid trailing/leading hyphens that look unprofessional and may break relative URL resolution
  • Use lowercase exclusively; URLs are case-sensitive on most Unix servers

Common Mistakes

  • Stripping accents without NFKD normalization: é stays as é instead of becoming e
  • Allowing reserved URL characters: #, %, ?, & have special meaning in URLs and must be removed
  • Not limiting slug length: Extremely long slugs hurt readability and may exceed database column limits
  • Ignoring duplicate slugs: Two articles with the same title will collide without a uniqueness strategy
  • Translating instead of transliterating: Translating “hello” to Spanish is not the same as making it URL-safe

Frequently Asked Questions

How do I handle completely non-Latin scripts like Chinese or Arabic?

For Chinese, use Pinyin romanization libraries (pypinyin in Python, pinyin in JS). For Arabic, Persian, or Cyrillic, use transliteration libraries (unidecode, transliteration). As a fallback, generate a random ID or use a numeric suffix.

Should slugs be unique globally or per user?

It depends on your URL structure. If URLs are /posts/:slug, slugs must be globally unique. If they are /:username/:slug, uniqueness only needs to be enforced per user. Always index the slug column for fast lookups.

Can I change a slug after publishing?

Changing slugs breaks existing links and bookmarks. If you must change a slug, implement a 301 redirect from the old slug to the new one. Store the old slug in a redirects table to preserve SEO value.