Write Large Files
How to write large files efficiently using buffered and streaming output.
Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.
Overview
Writing massive datasets or logs to disk requires buffered and streaming techniques to avoid memory spikes and I/O bottlenecks. This recipe covers efficient file writing patterns across Python, JavaScript, and Java.
When to Use
Use this resource when:
- Generating large export files (CSV, JSONL, XML) from database queries
- Appending to ever-growing log files in long-running services
- Streaming transformed data to disk without holding the entire payload in memory
Solution
Python
# Buffered text writing
with open('output.log', 'w', encoding='utf-8') as f:
for record in data_source:
f.write(f"{record}\n")
# Chunked binary writing
with open('output.bin', 'wb') as f:
for chunk in byte_generator():
f.write(chunk)
JavaScript
const fs = require('fs');
// Stream writer
const stream = fs.createWriteStream('output.log');
for (const record of dataSource) {
stream.write(`${record}\n`);
}
stream.end();
// Promise-based completion
await new Promise((resolve, reject) => {
stream.on('finish', resolve);
stream.on('error', reject);
});
Java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
public class LargeFileWriter {
// Buffered text writer
public void writeLines(String path, Iterable<String> lines) throws IOException {
try (BufferedWriter writer = new BufferedWriter(new FileWriter(path))) {
for (String line : lines) {
writer.write(line);
writer.newLine();
}
}
}
// Chunked binary writer
public void writeChunks(String path, Iterable<byte[]> chunks) throws IOException {
try (FileChannel channel = FileChannel.open(Paths.get(path),
StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
for (byte[] chunk : chunks) {
channel.write(ByteBuffer.wrap(chunk));
}
}
}
}
Explanation
Buffered writers reduce the number of system calls by accumulating data in memory before flushing to disk. Streaming writes process and emit data incrementally, keeping memory usage flat regardless of total output size. FileChannel in Java provides direct buffer-to-channel transfers, minimizing copies between user and kernel space.
Variants
| Technology | Approach | Notes |
|---|---|---|
| Python | tempfile + atomic rename | Write to temp, then move for crash safety |
| JavaScript | pipeline() | Backpressure-aware piping between streams |
| Java | FileOutputStream with BufferedOutputStream | Classic IO, simpler but slightly slower than NIO |
Best Practices
- Always close or end streams to flush internal buffers and release file descriptors
- Use atomic rename patterns (write to temp file, then rename) to prevent partial files on crash
- Tune buffer sizes based on disk block size (typically 4 KB or 8 KB)
- Handle stream errors to avoid silent data loss
- For concurrent writers, use file locking or append-only modes
Common Mistakes
- Building a giant string in memory before writing rather than streaming
- Ignoring write stream errors, which can leave files truncated
- Using synchronous write calls in performance-critical loops
- Not flushing before process exit, losing buffered data
- Overwriting original files in-place without a backup strategy
Frequently Asked Questions
Should I use append mode or rewrite?
Use append ('a' in Python, 'a' flag in Node, StandardOpenOption.APPEND in Java) for logs. Use atomic rename for data files that must remain consistent.
How do I handle disk-full errors?
Catch IOException (Java), error event on streams (JS), or OSError (Python). Pre-checking available space with shutil.disk_usage (Python) or fs.statvfs (Node) can help.
Is BufferedWriter faster than FileWriter?
Yes. BufferedWriter batches writes, reducing syscalls. The difference is dramatic for many small writes and negligible for large block writes.
Related Resources
Read Large Files
How to read large files efficiently without running out of memory.
RecipeFile Upload Validation
How to handle file uploads securely with size, type, and content validation.
RecipeGenerate PDFs
How to generate PDF documents programmatically from HTML, templates, or raw data.
RecipeProcess Large Files with Streams
How to read, transform, and write large files efficiently using streams without loading entire files into memory in Python, Node.js, and Java.
PatternAbstract Factory Pattern
Create families of related objects without specifying concrete classes. A creational design pattern for consistent object families.