Truncate Text
How to truncate text with ellipsis and word boundaries in Python, Java, and JavaScript.
Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.
Overview
Truncating text is a common UI and data-processing task: previews, notification snippets, search result summaries, and CSV exports all need to cut long strings down to a maximum length without breaking words or HTML. This recipe covers character-based, word-boundary, and HTML-aware truncation in Python, JavaScript, and Java.
When to Use
Use this resource when:
- Displaying article previews, comment summaries, or product descriptions with “Read more” links
- Exporting report data to fixed-width columns or spreadsheets
- Generating email subject lines or push notification bodies with platform length limits
- Trimming user-generated content before storing or indexing
Solution
Python
# Character-based truncation with ellipsis
def truncate(text: str, max_length: int = 100) -> str:
if len(text) <= max_length:
return text
return text[:max_length - 3].rstrip() + '...'
print(truncate("This is a very long sentence that needs to be shortened."))
# Output: 'This is a very long sentence that needs to be shor...'
# Word-boundary truncation with textwrap
import textwrap
def truncate_words(text: str, max_length: int = 100) -> str:
if len(text) <= max_length:
return text
shortened = textwrap.shorten(text, width=max_length, placeholder='...')
return shortened
print(truncate_words("This is a very long sentence that needs to be shortened."))
# Output: 'This is a very long sentence that needs to be...'
JavaScript
// Character-based truncation
function truncate(text, maxLength = 100) {
if (text.length <= maxLength) return text;
return text.slice(0, maxLength - 3).trimEnd() + '...';
}
console.log(truncate("This is a very long sentence that needs to be shortened."));
// Output: 'This is a very long sentence that needs to be shor...'
// Word-boundary truncation
function truncateWords(text, maxLength = 100) {
if (text.length <= maxLength) return text;
const truncated = text.slice(0, maxLength - 3);
const lastSpace = truncated.lastIndexOf(' ');
return (lastSpace > 0 ? truncated.slice(0, lastSpace) : truncated) + '...';
}
console.log(truncateWords("This is a very long sentence that needs to be shortened."));
// Output: 'This is a very long sentence that needs to be...'
Java
// Apache Commons Lang StringUtils
// Maven: org.apache.commons:commons-lang3
import org.apache.commons.lang3.StringUtils;
public class TextTruncator {
public static String truncate(String text, int maxLength) {
return StringUtils.abbreviate(text, maxLength);
}
}
// truncate("This is a very long sentence...", 30)
// Output: "This is a very long sente..."
// Word-boundary truncation with Streams
import java.util.Arrays;
import java.util.stream.Collectors;
public class WordTruncator {
public static String truncateWords(String text, int maxLength) {
String[] words = text.split(" ");
StringBuilder result = new StringBuilder();
for (String word : words) {
if (result.length() + word.length() + 1 > maxLength) break;
if (result.length() > 0) result.append(" ");
result.append(word);
}
return result.toString() + (result.length() < text.length() ? "..." : "");
}
}
Explanation
Character truncation is straightforward but can split words in half, producing awkward output like “shor…”. Word-boundary truncation searches backward from the cutoff point to the nearest space, preserving readability. textwrap.shorten (Python) handles both character and word truncation with a single call. JavaScript requires manual slicing and index search. Java’s StringUtils.abbreviate defaults to character truncation; word-boundary logic must be built manually or with a library like Truncation.
HTML-aware truncation is more complex: you must close any opened tags before appending the ellipsis, or use a dedicated HTML parser. For plain text, word-boundary truncation is usually the best balance of simplicity and readability.
Variants
| Technology | Library / Approach | Strategy | Notes |
|---|---|---|---|
| Python | Slicing + ellipsis | Character | Fast, simple, may split words |
| Python | textwrap.shorten | Word + character | Stdlib, handles word breaks gracefully |
| JavaScript | slice + trimEnd | Character | Fast, built-in, no dependencies |
| JavaScript | lastIndexOf(' ') | Word | Manual, no dependencies |
| Java | StringUtils.abbreviate | Character | Apache Commons, configurable placeholder |
| Java | Custom stream builder | Word | Full control over delimiter and ellipsis |
Best Practices
- Respect word boundaries for UI text: “Readability is more important than exact character count in user-facing strings”
- Use character truncation for machine output: Fixed-width files, database columns, and logs need exact lengths
- Strip trailing whitespace before measuring: Leading/trailing spaces skew length calculations and produce
"..."on empty strings - Handle surrogate pairs and combining characters: JavaScript
lengthcounts UTF-16 code units, not grapheme clusters; useIntl.Segmenterfor proper Unicode counting - Add title attributes for truncated links:
<a title="Full text">truncated...</a>improves accessibility
Common Mistakes
- Splitting HTML tags: Truncating raw HTML at position 100 can break
<a href="...mid-tag; use an HTML parser or strip tags first - Forgetting to add ellipsis length: A 100-char limit with
...means the slice should end at 97, not 100 - Not handling multibyte characters: A 20-character slice of Japanese text may cut a 2-byte kanji in half in some encodings
- Trimming before length check:
trim()then slice can still exceed the limit if the original string had no trailing spaces - Assuming spaces are the only word boundary: Hyphens, em-dashes, and CJK characters have different boundary rules
Frequently Asked Questions
How do I truncate HTML without breaking tags?
Use an HTML-aware library. Python has html-truncate and BeautifulSoup; JavaScript has truncate-html; Java has Jsoup combined with manual node traversal. The rule is: count visible text characters, and when the limit is reached, close all open tags before appending the ellipsis.
How do I handle Unicode grapheme clusters when truncating?
A grapheme cluster is what a human perceives as one character (e.g., emoji with skin-tone modifiers). JavaScript’s .length counts UTF-16 code units, not graphemes. Use Intl.Segmenter (modern browsers) or the grapheme-splitter package. In Python, len() counts code points; use the grapheme library for true cluster counting. In Java, use BreakIterator.getCharacterInstance().
Should I truncate on the client or the server?
For UI previews, client-side truncation with CSS (text-overflow: ellipsis) is simplest and preserves the full text for screen readers. For fixed-length exports, database constraints, or search result snippets, truncate on the server. Server truncation is required when the full text is too large to transfer to the client.