Skip to content
SP StackPractices
intermediate By StackPractices

Expose Custom Application Metrics with Python and Prometheus

Build a custom Prometheus metrics exporter in Python using prometheus_client for counters, gauges, histograms, and summaries.

Note: This guide follows English-language naming conventions and terminology standards common in international development teams. Examples use English identifiers and comments to maximize compatibility across codebases and tooling.

Overview

Prometheus is a pull-based monitoring system. Your application exposes a /metrics endpoint, and Prometheus scrapes it at regular intervals. The prometheus_client Python library provides built-in metric types (counter, gauge, histogram, summary) and an HTTP server to expose them. This recipe shows how to instrument a Python app with custom metrics.

When to Use

  • You need application-level metrics (request count, latency, queue depth)
  • You use Prometheus or Grafana for monitoring and alerting
  • You want to track custom business metrics (active users, orders processed)
  • You need to expose metrics from a Python service for scraping

Solution

Basic metrics endpoint

from prometheus_client import start_http_server, Counter, Gauge, Histogram
import time
import random

# Define metrics
REQUEST_COUNT = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "endpoint", "status"]
)

REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds",
    "HTTP request latency in seconds",
    ["endpoint"],
    buckets=[0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
)

ACTIVE_CONNECTIONS = Gauge(
    "active_connections",
    "Number of active connections"
)

QUEUE_DEPTH = Gauge(
    "queue_depth",
    "Number of items in the processing queue",
    ["queue_name"]
)

def handle_request(method: str, endpoint: str):
    start = time.time()
    status = 200

    try:
        time.sleep(random.uniform(0.01, 0.3))
        if random.random() < 0.05:
            status = 500
    except Exception:
        status = 500

    REQUEST_COUNT.labels(method=method, endpoint=endpoint, status=str(status)).inc()
    REQUEST_LATENCY.labels(endpoint=endpoint).observe(time.time() - start)

if __name__ == "__main__":
    start_http_server(8000)  # Metrics on port 8000
    print("Metrics server on http://localhost:8000/metrics")

    while True:
        handle_request("GET", "/api/users")
        handle_request("POST", "/api/orders")
        ACTIVE_CONNECTIONS.set(random.randint(1, 50))
        QUEUE_DEPTH.labels(queue_name="email").set(random.randint(0, 100))
        time.sleep(0.1)

Integrating with Flask

from flask import Flask, request
from prometheus_client import Counter, Histogram, make_wsgi_app
from werkzeug.middleware.dispatcher import DispatcherMiddleware
import time

app = Flask(__name__)

REQUEST_COUNT = Counter(
    "flask_requests_total",
    "Total Flask requests",
    ["method", "endpoint", "status"]
)

REQUEST_LATENCY = Histogram(
    "flask_request_duration_seconds",
    "Flask request latency",
    ["endpoint"]
)

@app.before_request
def before_request():
    request.start_time = time.time()

@app.after_request
def after_request(response):
    endpoint = request.path
    method = request.method
    status = response.status_code

    REQUEST_COUNT.labels(method=method, endpoint=endpoint, status=str(status)).inc()
    REQUEST_LATENCY.labels(endpoint=endpoint).observe(time.time() - request.start_time)

    return response

@app.route("/health")
def health():
    return {"status": "healthy"}, 200

@app.route("/api/users")
def get_users():
    return {"users": []}, 200

# Mount Prometheus metrics endpoint
app.wsgi_app = DispatcherMiddleware(app.wsgi_app, {
    "/metrics": make_wsgi_app()
})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

Integrating with FastAPI

from fastapi import FastAPI, Request
from prometheus_client import Counter, Histogram, make_asgi_app
import time

app = FastAPI()

REQUEST_COUNT = Counter(
    "fastapi_requests_total",
    "Total FastAPI requests",
    ["method", "endpoint", "status"]
)

REQUEST_LATENCY = Histogram(
    "fastapi_request_duration_seconds",
    "FastAPI request latency",
    ["endpoint"]
)

@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    start_time = time.time()
    response = await call_next(request)
    duration = time.time() - start_time

    REQUEST_COUNT.labels(
        method=request.method,
        endpoint=request.url.path,
        status=str(response.status_code)
    ).inc()
    REQUEST_LATENCY.labels(endpoint=request.url.path).observe(duration)

    return response

@app.get("/health")
async def health():
    return {"status": "healthy"}

@app.get("/api/users")
async def get_users():
    return {"users": []}

# Mount Prometheus metrics
app.mount("/metrics", make_asgi_app())

Custom collector for external data

from prometheus_client import CollectorRegistry, Gauge, generate_latest
import requests

class DatabaseCollector:
    """Custom collector that scrapes database stats."""

    def __init__(self, db_url: str):
        self.db_url = db_url
        self.registry = CollectorRegistry()

        self.active_queries = Gauge(
            "db_active_queries",
            "Number of active database queries",
            registry=self.registry
        )
        self.connection_pool = Gauge(
            "db_connection_pool_size",
            "Database connection pool size",
            ["state"],
            registry=self.registry
        )

    def collect(self):
        # Fetch stats from database
        stats = requests.get(f"{self.db_url}/stats").json()

        self.active_queries.set(stats["active_queries"])
        self.connection_pool.labels(state="idle").set(stats["pool"]["idle"])
        self.connection_pool.labels(state="active").set(stats["pool"]["active"])
        self.connection_pool.labels(state="waiting").set(stats["pool"]["waiting"])

        yield from self.registry.collect()

# Usage in a metrics endpoint
from flask import Flask, Response

app = Flask(__name__)
collector = DatabaseCollector("http://localhost:8080")

@app.route("/metrics")
def metrics():
    collector.collect()
    return Response(
        generate_latest(collector.registry),
        mimetype="text/plain; version=0.0.4; charset=utf-8"
    )

Summary metric for percentiles

from prometheus_client import Summary

REQUEST_SIZE = Summary(
    "request_size_bytes",
    "Request payload size in bytes",
    ["endpoint"]
)

# Summary provides _sum, _count, and quantiles (0.5, 0.9, 0.99 by default)
REQUEST_SIZE.labels(endpoint="/upload").observe(1024)
REQUEST_SIZE.labels(endpoint="/upload").observe(5120)
REQUEST_SIZE.labels(endpoint="/upload").observe(256)

# Access quantiles: p50, p90, p99

Prometheus scrape configuration

# prometheus.yml
scrape_configs:
    - job_name: "python-app"
      scrape_interval: 15s
      metrics_path: /metrics
      static_configs:
          - targets: ["localhost:8000"]

    - job_name: "flask-app"
      scrape_interval: 15s
      static_configs:
          - targets: ["localhost:5000"]

Docker Compose with Prometheus + Grafana

# docker-compose.yml
services:
    app:
        build: .
        ports:
            - "8000:8000"

    prometheus:
        image: prom/prometheus:v2.52.0
        ports:
            - "9090:9090"
        volumes:
            - ./prometheus.yml:/etc/prometheus/prometheus.yml

    grafana:
        image: grafana/grafana:11.0.0
        ports:
            - "3000:3000"
        environment:
            - GF_SECURITY_ADMIN_PASSWORD=admin

Explanation

Prometheus metric types:

  • Counter: Monotonically increasing value. Use for request counts, error counts, bytes sent. Never decreases. Use .inc() to add 1 or .inc(value) to add a specific amount.
  • Gauge: Value that goes up and down. Use for active connections, queue depth, memory usage. Use .set(value) to set, .inc() / .dec() to change.
  • Histogram: Groups observations into buckets. Use for latency distributions. Provides _bucket, _sum, and _count time series. Define custom buckets based on your SLOs.
  • Summary: Similar to histogram but computes quantiles on the client side. Use when you need exact percentiles and have few instances.

Labels add dimensions to metrics. Each label combination creates a separate time series. Avoid high-cardinality labels (user IDs, request IDs) — they explode the number of time series.

The prometheus_client library exposes metrics in the Prometheus text exposition format at /metrics. Prometheus scrapes this endpoint at the configured scrape_interval.

Variants

Metric TypeUse CaseExample
CounterCumulative countsTotal requests, errors sent
GaugeCurrent stateActive connections, queue depth
HistogramDistributionsRequest latency, response size
SummaryClient-side quantilesp99 latency per instance

Guidelines

  • Use counters for monotonically increasing values (requests, errors, bytes).
  • Use gauges for values that go up and down (connections, queue depth, memory).
  • Use histograms for latency distributions. Define buckets that match your SLOs.
  • Keep label cardinality low. Avoid user IDs, session IDs, or request IDs as labels.
  • Use the default buckets or define custom ones. Default buckets are optimized for ~0.005s to ~10s.
  • Expose metrics on a separate port or path from your application.
  • Set scrape_interval to 15-60s. Higher intervals reduce storage but miss short spikes.
  • Use middleware to instrument all requests automatically (Flask before_request, FastAPI middleware).
  • Track business metrics alongside technical ones (orders processed, active users).
  • Monitor your metrics endpoint health — if it goes down, Prometheus has no data.

Common Mistakes

  • Using high-cardinality labels. Each unique label combination creates a new time series. A label with 10K user IDs creates 10K series per metric.
  • Using a gauge for counters. Counters should never decrease. Using a gauge loses the rate calculation capability.
  • Not defining custom histogram buckets. Default buckets may not match your latency profile.
  • Exposing metrics on the same port as the app without authentication. In production, protect the metrics endpoint.
  • Forgetting to call .inc() or .observe(). Metrics that are never updated are useless.
  • Using summaries instead of histograms. Summaries cannot be aggregated across instances. Use histograms for distributed systems.
  • Not instrumenting error paths. If you only track successful requests, your error rate appears as zero.

Frequently Asked Questions

What is the difference between a histogram and a summary?

Histograms group observations into configurable buckets and let Prometheus compute quantiles server-side (aggregatable across instances). Summaries compute quantiles client-side (not aggregatable). Use histograms for latency in distributed systems. Use summaries only when you need exact per-instance percentiles.

How do I choose histogram buckets?

Set buckets based on your SLOs. If your SLO is 99 percent of requests under 200ms, use buckets like [0.01, 0.05, 0.1, 0.2, 0.5, 1.0]. The +Inf bucket is added automatically.

Can I use prometheus_client with Django?

Yes. Use django-prometheus package for Django-specific integration, or mount make_wsgi_app() in your URL configuration.

How do I test my metrics locally?

Run prometheus_client.start_http_server(8000) and open http://localhost:8000/metrics in a browser. You should see the text exposition format with all your metrics.