beginner By StackPractices

Logging Standards Document

A document template for defining structured logging conventions, log levels, retention, and observability requirements across services.

Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.

Overview

A Logging Standards Document defines how services, applications, and infrastructure produce logs. Consistent logging makes debugging, monitoring, security investigation, and compliance easier. This template covers log levels, structured formats, required fields, retention, sampling, and security rules.

When to Use

  • Onboarding a new service or development team.
  • Consolidating logs from multiple systems into a central observability platform.
  • Preparing for a security audit or compliance review.
  • Investigating a production incident where logs are incomplete or inconsistent.
  • Defining a logging strategy for microservices or serverless environments.

Prerequisites

  • A log aggregation platform such as ELK, Splunk, Datadog, Grafana Loki, or CloudWatch.
  • A shared timestamp standard and timezone policy.
  • A list of critical events that must always be logged.
  • Agreement on sensitive data classification and log redaction rules.

Solution

Document

1. Log Levels

| Level | Use | Example |
| --- | --- | --- |
| DEBUG | Detailed diagnostic information during development | cache miss for key user:1234 |
| INFO | Normal application events | user logged in, order completed |
| WARN | Unexpected but recoverable situations | connection timeout, retrying |
| ERROR | Failures that affect operation | payment gateway returned 500 |
| FATAL | Critical failures requiring immediate attention | database unavailable, service shutdown |

Guidelines:

  • DEBUG must be off in production by default.
  • INFO is the default production level for most services.
  • ERROR must trigger an alert or ticket.
  • FATAL must page the on-call team.
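As an illustration of the guidelines above, here is a minimal sketch using Python's standard `logging` module. The environment names and the `DEFAULT_LEVELS`, `level_for`, and `configure_logger` helpers are assumptions made for this example, not part of the template. Python has no separate FATAL level (`logging.FATAL` is an alias for `logging.CRITICAL`), so the sketch renames the built-in levels to match the table.

```python
import logging

# Python names these levels CRITICAL and WARNING; rename them so emitted
# records use the FATAL and WARN labels from the level table.
logging.addLevelName(logging.CRITICAL, "FATAL")
logging.addLevelName(logging.WARNING, "WARN")

# Hypothetical per-environment defaults; adjust to your own environment names.
DEFAULT_LEVELS = {
    "development": logging.DEBUG,
    "staging": logging.INFO,
    "production": logging.INFO,  # DEBUG stays off in production by default
}


def level_for(environment: str) -> int:
    """Return the default level for an environment, falling back to INFO."""
    return DEFAULT_LEVELS.get(environment, logging.INFO)


def configure_logger(name: str, environment: str) -> logging.Logger:
    """Create or fetch a logger and apply the environment's default level."""
    logger = logging.getLogger(name)
    logger.setLevel(level_for(environment))
    return logger
```

Routing ERROR to an alert or ticket and FATAL to a page is normally done in the aggregation platform rather than in application code, so it is left out of this sketch.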

2. Structured Log Format

All logs must be emitted as JSON with the following required fields:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| timestamp | ISO 8601 | Event time in UTC | 2026-06-27T14:30:00Z |
| level | string | Log level | INFO |
| service | string | Service or application name | payment-service |
| environment | string | Environment | production |
| message | string | Human-readable summary | Order 12345 completed |
| correlation_id | string | Request trace ID | abc-123-def |
| span_id | string | OpenTelemetry span ID (16 hex characters) | 00f067aa0ba902b7 |

Optional fields:

  • user_id: Identity of the user associated with the event.
  • tenant_id: Identifier for multi-tenant isolation.
  • duration_ms: Time taken to complete an operation.
  • error_code: Stable error code for programmatic handling.
  • source_file: File and line where the log was emitted.
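One way to produce this format is a custom formatter for Python's standard `logging` module. The sketch below is illustrative only: the `JsonFormatter` class and the `OPTIONAL_FIELDS` tuple are names invented for this example, and it assumes callers supply `correlation_id`, `span_id`, and optional fields through the logger's `extra` argument. The `source_file` field is omitted for brevity.

```python
import json
import logging
from datetime import datetime, timezone

# Optional fields copied into the entry only when the caller supplied them.
OPTIONAL_FIELDS = ("user_id", "tenant_id", "duration_ms", "error_code")


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object with the required fields."""

    def __init__(self, service: str, environment: str) -> None:
        super().__init__()
        self.service = service
        self.environment = environment

    def format(self, record: logging.LogRecord) -> str:
        # record.created is seconds since the epoch; convert it to UTC.
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": timestamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "service": self.service,
            "environment": self.environment,
            "message": record.getMessage(),
            # None (JSON null) if the caller did not provide an ID.
            "correlation_id": getattr(record, "correlation_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        for field in OPTIONAL_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry, ensure_ascii=False)
```

Attach it with `handler.setFormatter(JsonFormatter("payment-service", "production"))` and pass per-event fields when logging, for example `logger.info("Order 12345 completed", extra={"correlation_id": "abc-123-def", "duration_ms": 42})`.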

3. Required Event Categories

| Category | Events to Log | Level |
| --- | --- | --- |
| Authentication | Login, logout, failed login, MFA challenge | INFO / WARN |
| Authorization | Access denied, permission escalation | WARN |
| Data changes | Create, update, delete on sensitive records | INFO |
| Errors | Exceptions, external failures, retries | ERROR |
| Performance | Slow queries, high latency, timeouts | WARN |
| Security | Suspicious activity, rate limit hits, blocked requests | WARN |
| Operational | Startup, shutdown, configuration changes | INFO |
| Business | Order placed, payment received, workflow completed | INFO |

4. Sensitive Data and Redaction

| Data Type | Logged | Redaction |
| --- | --- | --- |
| Passwords | Never | Redact or exclude |
| Credit card numbers | Never | Tokenize or exclude |
| API keys | Never | Redact or exclude |
| Personal names | With approval | Mask if not required |
| Email addresses | Allowed | Partial mask for non-admins |
| IP addresses | Allowed | Allowed for security logs |
| User IDs | Allowed | Allowed |

Rules:

  • Never log secrets or credentials.
  • Use allowlists for personal data fields.
  • Redact or tokenize values before logging.
  • Encrypt logs if they contain sensitive data.
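The allowlist rule can be sketched as a small sanitizing step applied to structured fields before they are logged. Everything named here (`ALLOWED_FIELDS`, `REDACTED`, `mask_email`, `sanitize`, and the `email` and `ip_address` field names) is an assumption for illustration; a real allowlist would come from your own data classification. The sketch always masks email addresses and does not model the admin versus non-admin distinction.

```python
from typing import Any, Dict, Mapping

# Hypothetical allowlist: only these field names may appear in log output.
ALLOWED_FIELDS = frozenset(
    {"user_id", "tenant_id", "duration_ms", "error_code", "email", "ip_address"}
)
REDACTED = "[REDACTED]"


def mask_email(email: str) -> str:
    """Keep the first character and the domain, e.g. a***@example.com."""
    local, separator, domain = email.partition("@")
    if not separator or not local:
        return REDACTED  # not a recognizable address, so hide it entirely
    return f"{local[0]}***@{domain}"


def sanitize(fields: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of fields that is safe to attach to a log entry."""
    clean: Dict[str, Any] = {}
    for key, value in fields.items():
        if key not in ALLOWED_FIELDS:
            # Anything not explicitly approved is replaced, never passed through.
            clean[key] = REDACTED
        elif key == "email":
            clean[key] = mask_email(str(value))
        else:
            clean[key] = value
    return clean
```

Replacing unapproved values with a marker instead of dropping the key keeps it visible that something was withheld. Dropping the key entirely is an equally valid choice under the "redact or exclude" rule.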

5. Retention and Sampling

| Log Type | Retention | Sampling | Notes |
| --- | --- | --- | --- |
| Application logs | 30 days | 100% | Keep all for debugging |
| Security logs | 1 year | 100% | Compliance requirement |
| Audit logs | 7 years | 100% | Legal and regulatory |
| Debug logs | 7 days | 100% | Only when enabled |
| High-volume trace logs | 14 days | 1% or dynamic | Cost control |
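For the sampled row, one possible approach is to decide per request rather than per log line, so that a sampled request keeps all of its entries. The sketch below is an assumption-laden illustration, not a prescribed method: `TraceSamplingFilter` is a hypothetical name, it expects records to carry a `correlation_id` attribute, and it always keeps WARN and above regardless of the sample rate.

```python
import hashlib
import logging


class TraceSamplingFilter(logging.Filter):
    """Keep a fixed fraction of requests, chosen by hashing the correlation ID."""

    def __init__(self, sample_rate: float) -> None:
        super().__init__()
        if not 0.0 <= sample_rate <= 1.0:
            raise ValueError("sample_rate must be between 0.0 and 1.0")
        # Integer threshold in the 64-bit hash space avoids float rounding issues.
        self.threshold = int(sample_rate * 2**64)

    def filter(self, record: logging.LogRecord) -> bool:
        # Never sample away warnings and errors.
        if record.levelno >= logging.WARNING:
            return True
        correlation_id = getattr(record, "correlation_id", None)
        if correlation_id is None:
            # Without an ID there is no stable sampling key; keep the record.
            return True
        digest = hashlib.sha256(str(correlation_id).encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big")
        # Same correlation ID always lands in the same bucket, on every service.
        return bucket < self.threshold
```

Attaching `TraceSamplingFilter(0.01)` to the handler for high-volume trace logs keeps roughly 1% of requests. Because the decision depends only on the ID, different services make the same keep-or-drop choice for the same request.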

6. Log Aggregation and Transport

| Requirement | Rule |
| --- | --- |
| Transport | Send logs to the central platform with backpressure handling. |
| Ordering | Use timestamps for ordering; tolerate minor clock skew. |
| Buffering | Buffer locally if the collector is unavailable. |
| Encoding | Use UTF-8 JSON. |
| Backups | Replicate critical logs to secondary storage. |
| Alerting | Route ERROR and FATAL logs to the alerting system. |
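In many deployments a log shipper or agent handles buffering and transport. Where the application itself must buffer, a bounded in-process queue is one option. The sketch below uses the standard library's `QueueHandler` and `QueueListener`; the `DroppingQueueHandler` class, the `build_buffered_logger` helper, and the default capacity are assumptions for illustration. Its backpressure policy is to drop new records and count them when the buffer is full, so the application never blocks on logging; blocking instead is an equally valid policy for logs that must not be lost.

```python
import logging
import logging.handlers
import queue
from typing import Tuple


class DroppingQueueHandler(logging.handlers.QueueHandler):
    """Queue records for a background listener; drop and count when full."""

    def __init__(self, log_queue: queue.Queue) -> None:
        super().__init__(log_queue)
        self.dropped = 0

    def enqueue(self, record: logging.LogRecord) -> None:
        try:
            self.queue.put_nowait(record)
        except queue.Full:
            # Backpressure policy: shed load rather than block the caller.
            self.dropped += 1


def build_buffered_logger(
    name: str, target: logging.Handler, capacity: int = 10_000
) -> Tuple[logging.Logger, DroppingQueueHandler, logging.handlers.QueueListener]:
    """Wire logger -> bounded queue -> listener thread -> target handler."""
    log_queue: queue.Queue = queue.Queue(maxsize=capacity)
    handler = DroppingQueueHandler(log_queue)
    # The listener forwards queued records to the target once started.
    listener = logging.handlers.QueueListener(log_queue, target)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger, handler, listener
```

Call `listener.start()` at service startup and `listener.stop()` at shutdown. Exposing `handler.dropped` as a metric makes lost records visible.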

Explanation

Consistent logging transforms noisy text files into searchable, structured data. By defining levels, fields, and retention, teams can correlate events across services, investigate incidents faster, and meet compliance requirements. Structured logs also integrate with tracing and metrics to create a complete observability picture.

Variants

  • Cloud logging standards: Tailored for AWS CloudWatch, Azure Monitor, or Google Cloud Logging.
  • Container and Kubernetes logging: Covers sidecar log shippers, Fluentd, and pod log conventions.
  • Security-focused logging: Emphasizes audit events, integrity, and tamper detection.
  • Serverless logging: Addresses short-lived functions, cold starts, and centralized log collection.
  • Mobile or client logging: Focuses on privacy, batching, and offline buffering.

Best Practices

  • Use a single structured format across all services.
  • Propagate a correlation ID with every request and include it in every log entry so events can be correlated across services.
  • Log outcomes at business boundaries, not every internal step.
  • Keep log messages concise and add context as structured fields.
  • Avoid logging sensitive data by default.
  • Use log levels consistently so alerts are meaningful.
  • Review retention policies against cost and compliance needs.
  • Test log parsing and alerting rules as part of deployments.
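Testing log parsing can start with a check that emitted lines match the required-field table. The sketch below is one possible shape for such a check; `validate_log_line` and the constants around it are hypothetical names, and the timestamp pattern assumes UTC timestamps ending in `Z`, as in the example row of the format table.

```python
import json
import re
from typing import List

REQUIRED_FIELDS = (
    "timestamp", "level", "service", "environment",
    "message", "correlation_id", "span_id",
)
VALID_LEVELS = {"DEBUG", "INFO", "WARN", "ERROR", "FATAL"}
# ISO 8601 in UTC with optional fractional seconds, e.g. 2026-06-27T14:30:00Z
UTC_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$")


def validate_log_line(line: str) -> List[str]:
    """Return a list of problems found in one log line (empty if it conforms)."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(entry, dict):
        return ["not a JSON object"]

    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry
    ]
    level = entry.get("level")
    if "level" in entry and not (isinstance(level, str) and level in VALID_LEVELS):
        problems.append(f"unknown level: {level!r}")
    timestamp = entry.get("timestamp")
    if "timestamp" in entry and not (
        isinstance(timestamp, str) and UTC_TIMESTAMP.match(timestamp)
    ):
        problems.append(f"bad timestamp: {timestamp!r}")
    return problems
```

A deployment pipeline could run this over a sample of a service's output and fail the build if any line returns a non-empty list.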

Common Mistakes

  • Logging everything at INFO, making it hard to spot real issues.
  • Writing logs as plain text that cannot be parsed automatically.
  • Omitting timestamps or using inconsistent formats.
  • Including passwords or tokens in logs.
  • Not including enough context to reproduce a failure.
  • Keeping logs indefinitely, which increases storage costs unnecessarily.
  • Not correlating logs across services during an incident.

FAQs

Should we log at DEBUG level in production?

No, DEBUG should be off by default. Enable it temporarily for targeted troubleshooting, and disable it when the issue is resolved.

What is a correlation ID?

A correlation ID is a unique identifier passed through all services that handle a single request. It allows you to group related log entries across a distributed system.
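A minimal sketch of that propagation in Python, assuming the ID travels in an HTTP header: the header name `X-Correlation-ID` and the helpers `start_request`, `outgoing_headers`, and `CorrelationIdFilter` are illustrative choices, not a standard. The ID is held in a context variable so concurrent requests do not see each other's values.

```python
import contextvars
import logging
import uuid
from typing import Dict, Mapping

# Hypothetical header name; use whatever your services have agreed on.
HEADER_NAME = "X-Correlation-ID"

# Holds the current request's ID; each thread or async task gets its own value.
_correlation_id = contextvars.ContextVar("correlation_id", default=None)


def start_request(headers: Mapping[str, str]) -> str:
    """Reuse the caller's correlation ID, or mint one at the system edge."""
    # Note: this lookup is case-sensitive; normalize header names if needed.
    correlation_id = headers.get(HEADER_NAME) or str(uuid.uuid4())
    _correlation_id.set(correlation_id)
    return correlation_id


def outgoing_headers() -> Dict[str, str]:
    """Headers to attach to downstream calls so the ID keeps flowing."""
    correlation_id = _correlation_id.get()
    return {HEADER_NAME: correlation_id} if correlation_id else {}


class CorrelationIdFilter(logging.Filter):
    """Stamp every log record with the current request's correlation ID."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = _correlation_id.get()
        return True
```

With the filter attached to a handler, every entry logged while handling a request carries the same `correlation_id`, and downstream services that call `start_request` on the forwarded header continue the chain.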

How do we handle sensitive data in logs?

Use an allowlist approach: only log fields that are explicitly approved, and redact or tokenize sensitive values before they reach the log stream.