Logging Standards Document
A document template for defining structured logging conventions, log levels, retention, and observability requirements across services.
Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.
Overview
A Logging Standards Document defines how services, applications, and infrastructure produce logs. Consistent logging makes debugging, monitoring, security investigation, and compliance easier. This template covers log levels, structured formats, required fields, retention, sampling, and security rules.
When to Use
- Onboarding a new service or development team.
- Consolidating logs from multiple systems into a central observability platform.
- Preparing for a security audit or compliance review.
- Investigating a production incident where logs are incomplete or inconsistent.
- Defining a logging strategy for microservices or serverless environments.
Prerequisites
- A log aggregation platform such as ELK, Splunk, Datadog, Grafana Loki, or CloudWatch.
- A shared timestamp standard and timezone policy.
- A list of critical events that must always be logged.
- Agreement on sensitive data classification and log redaction rules.
Solution
Document
1. Log Levels
| Level | Use | Example |
|---|---|---|
| DEBUG | Detailed diagnostic information during development | cache miss for key user:1234 |
| INFO | Normal application events | user logged in, order completed |
| WARN | Unexpected but recoverable situations | connection timeout, retrying |
| ERROR | Failures that affect operation | payment gateway returned 500 |
| FATAL | Critical failures requiring immediate attention | database unavailable, service shutdown |
Guidelines:
- DEBUG must be off in production by default.
- INFO is the default production level for most services.
- ERROR must trigger an alert or ticket.
- FATAL must page the on-call team.
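The level guidelines above could be wired into a Python service roughly as follows. This is a minimal sketch: the `LOG_LEVEL` environment variable and the `resolve_level` helper are assumptions made for illustration, not part of this standard.

```python
import logging
from typing import Mapping

# Python's logging module has no separate FATAL level: logging.FATAL is an
# alias for logging.CRITICAL, and WARN corresponds to logging.WARNING.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "ERROR": logging.ERROR,
    "FATAL": logging.CRITICAL,
}


def resolve_level(env: Mapping[str, str]) -> int:
    """Return the effective log level for a service.

    INFO is the default, so DEBUG stays off unless someone explicitly sets
    the (hypothetical) LOG_LEVEL variable. Unknown values fall back to INFO.
    """
    name = env.get("LOG_LEVEL", "INFO").strip().upper()
    return LEVELS.get(name, logging.INFO)
```

A service would typically call `logging.getLogger().setLevel(resolve_level(os.environ))` once at startup, so that temporarily enabling DEBUG is a configuration change rather than a code change.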
2. Structured Log Format
All logs must be emitted as JSON with the following required fields:
| Field | Type | Description | Example |
|---|---|---|---|
| timestamp | ISO 8601 | Event time in UTC | 2026-06-27T14:30:00Z |
| level | string | Log level | INFO |
| service | string | Service or application name | payment-service |
| environment | string | Environment | production |
| message | string | Human-readable summary | Order 12345 completed |
| correlation_id | string | Request trace ID | abc-123-def |
| span_id | string | OpenTelemetry span ID | span-xyz-789 |
Optional fields:
- user_id: Identity of the user associated with the event.
- tenant_id: Identifier for multi-tenant isolation.
- duration_ms: Time taken to complete an operation.
- error_code: Stable error code for programmatic handling.
- source_file: File and line where the log was emitted.
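One way to produce this format with Python's standard `logging` module is a custom formatter. The sketch below is illustrative only: the `JsonFormatter` class name is an assumption, context is assumed to arrive through the `extra` argument of the logging call, and a required field with no value is emitted as `null` rather than omitted.

```python
import json
import logging
from datetime import datetime, timezone

# Python names these levels WARNING and CRITICAL; this standard uses WARN and FATAL.
LEVEL_NAMES = {logging.WARNING: "WARN", logging.CRITICAL: "FATAL"}
OPTIONAL_FIELDS = ("user_id", "tenant_id", "duration_ms", "error_code", "source_file")


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object with the required fields."""

    def __init__(self, service: str, environment: str) -> None:
        super().__init__()
        self.service = service
        self.environment = environment

    def format(self, record: logging.LogRecord) -> str:
        # record.created is a Unix timestamp; convert it to ISO 8601 in UTC.
        event_time = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": event_time.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": LEVEL_NAMES.get(record.levelno, record.levelname),
            "service": self.service,
            "environment": self.environment,
            "message": record.getMessage(),
            # Values passed via `extra=` become attributes on the record.
            "correlation_id": getattr(record, "correlation_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        # Optional fields are included only when the caller supplied them.
        for name in OPTIONAL_FIELDS:
            value = getattr(record, name, None)
            if value is not None:
                entry[name] = value
        return json.dumps(entry, ensure_ascii=False)
```

To use it, attach the formatter to a handler with `handler.setFormatter(JsonFormatter("payment-service", "production"))` and log with `logger.info("Order 12345 completed", extra={"correlation_id": "abc-123-def", "duration_ms": 42})`.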
3. Required Event Categories
| Category | Events to Log | Level |
|---|---|---|
| Authentication | Login, logout, failed login, MFA challenge | INFO / WARN |
| Authorization | Access denied, permission escalation | WARN |
| Data changes | Create, update, delete on sensitive records | INFO |
| Errors | Exceptions, external failures, retries | ERROR |
| Performance | Slow queries, high latency, timeouts | WARN |
| Security | Suspicious activity, rate limit hits, blocked requests | WARN |
| Operational | Startup, shutdown, configuration changes | INFO |
| Business | Order placed, payment received, workflow completed | INFO |
4. Sensitive Data and Redaction
| Data Type | Logged | Redaction |
|---|---|---|
| Passwords | Never | Redact or exclude |
| Credit card numbers | Never | Tokenize or exclude |
| API keys | Never | Redact or exclude |
| Personal names | With approval | Mask if not required |
| Email addresses | Allowed | Partial mask for non-admins |
| IP addresses | Allowed | None required in security logs |
| User IDs | Allowed | None required |
Rules:
- Never log secrets or credentials.
- Use allowlists for personal data fields.
- Redact or tokenize values before logging.
- Encrypt logs if they contain sensitive data.
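The allowlist and redaction rules above might be applied just before a record is serialized. The following is a minimal sketch under stated assumptions: the contents of `ALLOWED_FIELDS`, the `[REDACTED]` placeholder, and the email masking style are all hypothetical, and only flat (non-nested) fields are handled.

```python
from typing import Any, Dict, Mapping

# Hypothetical allowlist; each team would maintain its own approved set.
ALLOWED_FIELDS = frozenset(
    {"user_id", "tenant_id", "order_id", "duration_ms", "error_code", "email", "ip_address"}
)
REDACTED = "[REDACTED]"


def mask_email(value: str) -> str:
    """Keep the first character and the domain: alice@example.com -> a***@example.com."""
    local, sep, domain = value.partition("@")
    if not sep or not local:
        return REDACTED  # not a recognizable address; hide it entirely
    return local[0] + "***@" + domain


def redact(fields: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of `fields` that is safe to log.

    Any key that is not explicitly allowlisted has its value replaced, so a
    newly added field is hidden by default instead of leaking by default.
    """
    safe: Dict[str, Any] = {}
    for key, value in fields.items():
        if key not in ALLOWED_FIELDS:
            safe[key] = REDACTED
        elif key == "email" and isinstance(value, str):
            safe[key] = mask_email(value)
        else:
            safe[key] = value
    return safe
```

Replacing the value while keeping the key leaves a visible trace that a field was suppressed. Dropping the key entirely is an equally valid choice when even field names are considered sensitive.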
5. Retention and Sampling
| Log Type | Retention | Sampling | Notes |
|---|---|---|---|
| Application logs | 30 days | 100% | Keep all for debugging |
| Security logs | 1 year | 100% | Compliance requirement |
| Audit logs | 7 years | 100% | Legal and regulatory |
| Debug logs | 7 days | 100% | Only when enabled |
| High-volume trace logs | 14 days | 1% or dynamic | Cost control |
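
For the 1% sampling of high-volume trace logs, one common approach is to derive the decision from a hash of the correlation ID, so that every service keeps or drops the same requests. This is a sketch of that technique, not a prescribed implementation; the function name `keep_trace_log` is an assumption.

```python
import hashlib


def keep_trace_log(correlation_id: str, sample_rate: float = 0.01) -> bool:
    """Decide deterministically whether to keep trace logs for a request.

    The same correlation ID always yields the same answer, so a sampled
    request keeps its logs in every service that handles it.
    """
    if sample_rate >= 1.0:
        return True
    if sample_rate <= 0.0:
        return False
    digest = hashlib.sha256(correlation_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```

A dynamic policy could pass a different `sample_rate` per service or raise it during an incident. Security and audit logs should bypass this check entirely, since the table requires 100% retention for them.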
6. Log Aggregation and Transport
| Requirement | Rule |
|---|---|
| Transport | Send logs to the central platform with backpressure handling. |
| Ordering | Use timestamps for ordering; tolerate minor clock skew. |
| Buffering | Buffer locally if the collector is unavailable. |
| Encoding | Use UTF-8 JSON. |
| Backups | Replicate critical logs to secondary storage. |
| Alerting | Route ERROR and FATAL logs to the alerting system. |
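
The buffering and encoding rules above can be sketched as a small in-memory shipper. Everything here is an assumption made for illustration: the `BufferedShipper` class, the injected `send` callable that stands in for the real collector client, and the policy of dropping the oldest entries when the buffer is full. A production setup would more likely rely on a log agent with disk-backed buffering.

```python
import json
from collections import deque
from typing import Any, Callable, Deque, Mapping


class BufferedShipper:
    """Hold log entries in a bounded buffer while the collector is unavailable."""

    def __init__(self, send: Callable[[bytes], None], max_buffered: int = 10_000) -> None:
        self._send = send  # hypothetical transport; expected to raise OSError on failure
        self._buffer: Deque[bytes] = deque(maxlen=max_buffered)
        self.dropped = 0  # number of entries evicted because the buffer was full

    def emit(self, entry: Mapping[str, Any]) -> None:
        line = json.dumps(entry, ensure_ascii=False).encode("utf-8")  # UTF-8 JSON
        if len(self._buffer) == self._buffer.maxlen:
            self.dropped += 1  # the append below evicts the oldest entry
        self._buffer.append(line)
        self.flush()

    def flush(self) -> None:
        """Send buffered entries in order; stop at the first failure."""
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except OSError:  # includes ConnectionError and socket timeouts
                return  # collector unavailable; keep the entry for the next attempt
            self._buffer.popleft()
```

Bounding the buffer keeps a long collector outage from exhausting memory, and the `dropped` counter makes any resulting loss visible as a metric.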
Explanation
Consistent logging transforms noisy text files into searchable, structured data. By defining levels, fields, and retention, teams can correlate events across services, investigate incidents faster, and meet compliance requirements. Structured logs also integrate with tracing and metrics to create a complete observability picture.
Variants
- Cloud logging standards: Tailored for AWS CloudWatch, Azure Monitor, or Google Cloud Logging.
- Container and Kubernetes logging: Covers sidecar log shippers, Fluentd, and pod log conventions.
- Security-focused logging: Emphasizes audit events, integrity, and tamper detection.
- Serverless logging: Addresses short-lived functions, cold starts, and centralized log collection.
- Mobile or client logging: Focuses on privacy, batching, and offline buffering.
Best Practices
- Use a single structured format across all services.
- Include a correlation ID in every request to enable distributed tracing.
- Log outcomes at business boundaries, not every internal step.
- Keep log messages concise and add context as structured fields.
- Avoid logging sensitive data by default.
- Use log levels consistently so alerts are meaningful.
- Review retention policies against cost and compliance needs.
- Test log parsing and alerting rules as part of deployments.
Common Mistakes
- Logging everything at INFO, making it hard to spot real issues.
- Writing logs as plain text that cannot be parsed automatically.
- Omitting timestamps or using inconsistent formats.
- Including passwords or tokens in logs.
- Not including enough context to reproduce a failure.
- Keeping logs forever and increasing storage costs unnecessarily.
- Not correlating logs across services during an incident.
FAQs
Should we log at DEBUG level in production?
No, DEBUG should be off by default. Enable it temporarily for targeted troubleshooting, and disable it when the issue is resolved.
What is a correlation ID?
A correlation ID is a unique identifier passed through all services that handle a single request. It allows you to group related log entries across a distributed system.
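As a rough illustration of how a correlation ID can travel with a request inside a Python service, the sketch below stores it in a context variable and attaches it to every log record through a filter. The `X-Correlation-ID` header name and the helper names are assumptions, and the header lookup is a plain case-sensitive dictionary lookup.

```python
import contextvars
import logging
import uuid
from typing import Dict, Mapping, Optional

HEADER = "X-Correlation-ID"  # hypothetical header name; use whatever your platform agrees on

_correlation_id: contextvars.ContextVar = contextvars.ContextVar("correlation_id", default=None)


def begin_request(headers: Mapping[str, str]) -> str:
    """Reuse the caller's correlation ID, or create one at the edge of the system."""
    cid = headers.get(HEADER) or str(uuid.uuid4())
    _correlation_id.set(cid)
    return cid


class CorrelationIdFilter(logging.Filter):
    """Copy the current correlation ID onto every record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = _correlation_id.get()
        return True  # never suppress the record; only annotate it


def outbound_headers() -> Dict[str, str]:
    """Headers to add to downstream calls so the next service logs the same ID."""
    cid: Optional[str] = _correlation_id.get()
    return {HEADER: cid} if cid else {}
```

Attaching the filter to a handler with `handler.addFilter(CorrelationIdFilter())` means application code never has to pass the ID to each logging call.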
How do we handle sensitive data in logs?
Use an allowlist approach: only log fields that are explicitly approved, and redact or tokenize sensitive values before they reach the log stream.