Observability Dashboards with Grafana and Prometheus
Build interactive Grafana dashboards that visualize Prometheus metrics with panels, variables, and alerts for comprehensive service observability
Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.
Observability Dashboards with Grafana and Prometheus
Create rich, interactive dashboards in Grafana to visualize Prometheus metrics and understand service behavior at a glance. This recipe covers panel types, template variables, row organization, and dashboard-as-code practices for consistent observability across teams.
When to Use This
- Teams need a centralized view of service health and performance
- On-call engineers must quickly identify which service is failing
- Business stakeholders want uptime and latency visibility without querying metrics directly
Solution
1. Provision Data Sources
# provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
access: proxy
url: http://prometheus:9090
isDefault: true
editable: false
2. Dashboard JSON Model
{
"dashboard": {
"title": "API Service Overview",
"tags": ["api", "production"],
"timezone": "utc",
"panels": [
{
"title": "Request Rate",
"type": "timeseries",
"targets": [
{
"expr": "sum(rate(http_requests_total[5m])) by (route)",
"legendFormat": "{{ route }}"
}
],
"fieldConfig": {
"defaults": {
"unit": "reqps",
"min": 0
}
},
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 }
},
{
"title": "P95 Latency",
"type": "timeseries",
"targets": [
{
"expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, route))",
"legendFormat": "{{ route }}"
}
],
"fieldConfig": {
"defaults": {
"unit": "s",
"custom": {
"drawStyle": "line",
"lineWidth": 2
}
}
},
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 0 }
},
{
"title": "Error Rate",
"type": "stat",
"targets": [
{
"expr": "sum(rate(http_requests_total{status_code=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))",
"legendFormat": "Error %"
}
],
"fieldConfig": {
"defaults": {
"unit": "percentunit",
"thresholds": {
"steps": [
{ "color": "green", "value": 0 },
{ "color": "yellow", "value": 0.01 },
{ "color": "red", "value": 0.05 }
]
}
}
},
"gridPos": { "h": 4, "w": 6, "x": 0, "y": 8 }
}
]
}
}
3. Template Variables for Dynamic Filtering
{
"templating": {
"list": [
{
"name": "service",
"type": "query",
"query": "label_values(http_requests_total, job)",
"multi": true,
"includeAll": true
},
{
"name": "route",
"type": "query",
"query": "label_values(http_requests_total{job=~\"$service\"}, route)",
"multi": true,
"includeAll": true
}
]
}
}
4. Dashboard Provisioning
# provisioning/dashboards/dashboards.yml
apiVersion: 1
providers:
- name: default
folder: Services
type: file
options:
path: /var/lib/grafana/dashboards
foldersFromFilesStructure: true
5. Dashboard as Code with Terraform
# terraform/grafana.tf
resource "grafana_dashboard" "api" {
config_json = jsonencode({
title = "API Overview"
panels = [
{
title = "Request Rate"
type = "timeseries"
targets = [{
expr = "sum(rate(http_requests_total[5m]))"
}]
}
]
})
}
How It Works
- Panels display queries in tables, graphs, gauges, and stat formats
- Variables allow filtering by service, region, or route dynamically
- Rows organize panels into collapsible sections for focused views
- Alerts can be configured directly in Grafana or via Prometheus Alertmanager
Variation: Node Exporter System Dashboard
# CPU usage
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory usage
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes
# Disk I/O
rate(node_disk_io_time_seconds_total[5m])
Production Considerations
- Use dashboard provisioning to version control dashboards in Git
- Set appropriate refresh intervals; 5s for real-time, 30s-1m for overview
- Limit dashboard variables to prevent expensive queries on large labels
Common Mistakes
- Overloading a single dashboard with 50+ panels, making it slow to load
- Not using variables, leading to duplicated dashboards per service
- Forgetting to set min/max thresholds on stat panels for quick health assessment
FAQ
Q: How does Grafana compare to Prometheus built-in UI? A: Grafana is a dedicated visualization platform with rich panel types, variables, and layout options. The Prometheus UI is useful for ad-hoc queries but lacks dashboard composition features.
Q: Can I use Grafana with other data sources? A: Yes. Grafana supports Elasticsearch, InfluxDB, CloudWatch, Loki, Jaeger, and many others natively.