TL;DR
"Process up" monitoring is operationally meaningless for HAProxy. The HAProxy process can be running while the service is completely unusable: all backends down, queues saturated, or reloads stuck. This guide covers the metrics that actually matter: queue depth as the earliest failure signal, retry storms that mask degradation, backend health flapping, latent saturation that averages hide, TLS handshake bottlenecks, and alerts that wake you up for the right reasons.
- Queue depth is the earliest failure signal: it increases before error rates spike
- Retry storms amplify backend load and mask partial failures until cascading collapse
- Backend health flapping causes traffic shifts that overload healthy backends
- Percentiles reveal saturation that averages hide-p99 latency matters, not averages
- File descriptor exhaustion happens before process crashes: monitor resource limits
- Alert on queue depth, per-backend health, and resource usage, not process status
For HAProxy architecture patterns and failure modes, see our production operations guide. For safe HAProxy automation strategies and HAProxy on Kubernetes deployment patterns, see our deep-dive guides.
Why "Process Up" Is a Useless Signal
The HAProxy process can be healthy while the service is unusable. We've seen incidents where process monitoring showed HAProxy running while all backends were down, while queues were saturated, and while reloads were stuck.
Process monitoring tells you the binary is executing. It doesn't tell you if HAProxy is serving traffic, if backends are healthy, or if queues are building. You need application-level monitoring that checks if HAProxy is actually working.
We've seen teams rely on "process up" checks and miss failures. In one incident, HAProxy was running, but TLS certificates had expired. The process accepted connections, but TLS handshakes failed. Health checks that didn't use TLS passed, masking the failure until users reported errors.
In another incident, HAProxy was running, but all backends were marked down due to health check misconfiguration. The process was healthy, but no traffic could be served. Process monitoring showed green, but the service was completely unavailable.
Process monitoring is necessary but not sufficient. You need to monitor queue depth, backend health, connection counts, and resource usage. These metrics reveal failures that process status hides.
In one incident, the HAProxy process was running, but file descriptors were exhausted. New connections failed silently. Process monitoring showed healthy, but the service was degraded. Monitoring file descriptor usage would have caught this before users noticed.
Queue Depth: The Earliest Signal You're Already Missing
Queue buildup is the earliest failure signal. When backends slow down or fail, requests queue in HAProxy. Queue depth increases before error rates spike. We monitor queue depth as a leading indicator of backend problems.
Queue buildup happens when backends respond slowly, causing requests to queue. It happens when backends fail health checks, reducing available capacity. It happens when connection limits are reached, preventing new connections. It happens during reloads when connection draining can't keep up with new connection rates.
Frontend vs Backend Queues
Frontend queues are requests waiting for backend capacity. When all backends are busy, new requests queue in the frontend. Frontend queue depth is the earliest indicator of capacity problems.
Backend queues are requests waiting for specific backend responses. When a backend slows down, requests queue waiting for that backend. Backend queue depth indicates per-backend problems.
Both matter, but frontend queues are the earliest indicator. When frontend queue depth increases, capacity is constrained. When backend queue depth increases, specific backends are degraded.
We've seen teams miss queue buildup because they only monitor error rates. By the time error rates spike, queues are saturated and recovery takes longer. Monitor queue depth, not just error rates.
Queue depth increases before latency becomes visible. Monitor frontend and backend queue depth as leading indicators. Alert when queue depth exceeds ~10% of maximum queue capacity as a starting point, adjusted per workload and traffic patterns.
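As a concrete starting point, here is a minimal Python sketch that reads queue depth from the stats socket. It assumes a `stats socket /var/run/haproxy.sock` line in haproxy.cfg (the path is an assumption) and uses the CSV header that `show stat` prints, so column positions don't need to be hardcoded; `qcur` is the current-queue column.

```python
import csv
import io
import socket

# Assumed path; set via "stats socket /var/run/haproxy.sock" in haproxy.cfg.
STATS_SOCKET = "/var/run/haproxy.sock"

def show_stat(path=STATS_SOCKET):
    """Send 'show stat' to the HAProxy stats socket and return rows as dicts."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(b"show stat\n")
        raw = b""
        while chunk := sock.recv(4096):
            raw += chunk
    # The first line is the CSV header, prefixed with "# ".
    return list(csv.DictReader(io.StringIO(raw.decode().lstrip("# "))))

def queue_depths(rows):
    """Return (proxy, server, current queue) for rows that report a queue."""
    return [
        (row["pxname"], row["svname"], int(row["qcur"]))
        for row in rows
        if (row.get("qcur") or "").isdigit()
    ]

if __name__ == "__main__":
    for pxname, svname, qcur in queue_depths(show_stat()):
        if qcur > 0:
            print(f"{pxname}/{svname}: queue depth {qcur}")
```

The point is to trend qcur continuously in whatever time-series system you already run, not to check it by hand during incidents.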
Retry Storms and Silent Amplification
Retry storms occur when applications retry failed requests, amplifying backend load. HAProxy health checks can trigger retry storms when they're too aggressive. We've seen health checks that mark backends down, causing applications to retry, which overloads backends, which fail health checks, creating a feedback loop.
How retries mask failures: Applications retry failed requests, increasing backend load. When backends are partially degraded, retries amplify load on remaining healthy backends. This amplification can cause cascading failures that look like sudden collapse but were building over time.
Silent Amplification Under Partial Backend Degradation
When some backends are down, traffic shifts to healthy backends. Healthy backends become overloaded and start failing. Retries from failed backends amplify load on remaining healthy backends. This creates a cascading failure that looks sudden but was building.
We've seen this pattern repeatedly. In one incident, 30% of backends failed. Traffic shifted to healthy backends. Applications retried failed requests, amplifying load. Healthy backends became overloaded and failed. Within minutes, all backends were down.
How to detect: Monitor health check state changes, retry rates, and backend load distribution. Rapid health check state changes indicate flapping. High retry rates indicate amplification. Uneven backend load distribution indicates traffic shifting.
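To make the detection side concrete, the sketch below reuses the rows returned by the show_stat helper from the queue-depth example and flags servers carrying far more than their share of sessions, which is what traffic shifting plus retry amplification looks like from HAProxy's side. The 2x-average factor is an assumed starting threshold, not a recommendation.

```python
from statistics import mean

def load_skew(rows, backend, factor=2.0):
    """Flag servers in `backend` whose current sessions exceed factor * average.

    `rows` is the list of dicts produced by the show_stat sketch above;
    `factor` is an assumed starting threshold to tune per workload.
    """
    sessions = {
        row["svname"]: int(row["scur"])
        for row in rows
        if row["pxname"] == backend
        and row["svname"] not in ("FRONTEND", "BACKEND")
        and (row.get("scur") or "").isdigit()
    }
    if not sessions:
        return []
    avg = mean(sessions.values())
    return [(name, cur) for name, cur in sessions.items() if avg and cur > factor * avg]
```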
In one incident, health check sensitivity caused a retry storm that took down the entire backend pool: health checks marked backends down too aggressively, applications retried and amplified the load, and within 5 minutes all backends were down. Tuning health check intervals and thresholds prevented recurrence.
Backend Health Flapping
Backend health flapping happens when health checks are too sensitive. Backends that are slow but healthy get marked down, then marked up, then marked down again. This flapping causes traffic to shift between backends, amplifying load on healthy backends.
False Positives vs Slow Backends
False positives occur when backends are healthy but health checks fail. Network issues, check timeouts, or overly aggressive check intervals cause false positives. False positives reduce available capacity unnecessarily.
Slow backends are actually degraded but health checks pass. Check intervals that are too long or thresholds that are too lenient cause slow backends to pass health checks. Slow backends degrade user experience without triggering alerts.
Both cause problems but require different solutions. False positives require tuning health check intervals and timeouts. Slow backends require stricter health check thresholds and response time monitoring.
We've seen health check flapping during traffic spikes cause cascading failures. Backends that were slow but healthy got marked down. Traffic shifted to remaining backends, overloading them. Those backends got marked down, causing more traffic shifts. This created a feedback loop that took down the service.
How to tune: Health check intervals, thresholds, and failure counts determine sensitivity. Longer intervals reduce flapping but delay failure detection. Shorter intervals detect failures faster but increase flapping risk. Finding the right balance requires understanding your backend behavior under load.
Monitor health check state changes: Rapid state changes indicate flapping. We alert when backend state changes exceed ~5 per minute in many environments, though this threshold varies with backend count and health check frequency. This catches flapping before it causes cascading failures.
Start with health check intervals of 2-5 seconds, failure thresholds of 2-3 consecutive failures, and success thresholds of 2-3 consecutive successes as initial defaults. Adjust based on backend behavior, network latency, and application response times. Monitor state change rates to detect flapping.
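A sketch of the state-change watcher described above, assuming the show_stat helper from the queue-depth example is passed in as the poller. The 5-changes-per-minute threshold mirrors the starting point mentioned earlier and should be tuned to your backend count and check frequency.

```python
import time

def watch_flapping(poll, interval=10, window=60, threshold=5):
    """Poll server status and report backends whose state changes too often.

    `poll` is a callable returning show_stat-style rows; `threshold` state
    changes per `window` seconds is a starting point, not a rule.
    """
    last_status = {}
    changes = []  # (timestamp, "backend/server") pairs
    while True:
        now = time.time()
        for row in poll():
            if row["svname"] in ("FRONTEND", "BACKEND"):
                continue
            key = f"{row['pxname']}/{row['svname']}"
            status = row.get("status")
            if key in last_status and status != last_status[key]:
                changes.append((now, key))
            last_status[key] = status
        # Keep only changes inside the sliding window, then count per server.
        changes = [(t, k) for t, k in changes if now - t <= window]
        for key in {k for _, k in changes}:
            count = sum(1 for _, k in changes if k == key)
            if count >= threshold:
                print(f"FLAPPING: {key} changed state {count} times in {window}s")
        time.sleep(interval)
```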
Latent Saturation
Latent saturation occurs when aggregate metrics look healthy, but individual components are saturated. We've seen incidents where average response times were normal, but p99 response times were spiking. We've seen incidents where overall error rates were low, but specific backends were failing.
Why Averages Lie
Aggregate metrics hide problems. Average response times can be normal while tail latency is spiking. Overall error rates can be low while specific routes are failing. Overall queue depth can be normal while specific frontends are saturated.
Averages smooth out outliers. When 90% of requests are fast and 10% are slow, averages look healthy. But those slow requests degrade user experience. Percentiles reveal these outliers.
We've seen incidents where average metrics looked healthy minutes before complete failure. Average response times were 50ms, but p99 response times were 5 seconds. Average queue depth was 10, but p99 queue depth was 1000. These percentiles indicated saturation before averages showed problems.
Why Percentiles Matter
Percentiles reveal tail latency that averages hide. p50 response times can be normal while p99 indicates saturation. p95 queue depth can be normal while p99 shows queue buildup. Percentiles catch problems before averages.
Monitor per-backend metrics, per-route metrics, and percentile metrics. p50 response times show typical behavior. p95 response times show most requests. p99 response times show tail latency that affects user experience.
Latent saturation indicates capacity problems before they become outages. When p99 latency increases, capacity is constrained. When p99 queue depth increases, queues are building. These are early warning signs that averages miss.
In one incident, this exact pattern played out: averages looked healthy minutes before failure, and by the time they showed problems, queues were saturated and recovery required reducing traffic. Monitoring percentiles would have caught it earlier.
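HAProxy doesn't compute percentiles for you, so they have to come from a window of observed request times, parsed from HAProxy's HTTP logs or collected by your metrics pipeline. A small sketch of the calculation, with a synthetic sample window showing how a sub-100ms average can coexist with a multi-second p99:

```python
from statistics import quantiles

def latency_percentiles(samples_ms):
    """Compute p50/p95/p99 from a window of observed request latencies (ms)."""
    if len(samples_ms) < 2:
        return {}
    cuts = quantiles(samples_ms, n=100)  # 99 cut points: p1 .. p99
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

# Synthetic window: 99% of requests at 50ms, 1% stuck at 5 seconds.
window = [50] * 990 + [5000] * 10
print(sum(window) / len(window))    # average looks healthy (~100ms)
print(latency_percentiles(window))  # p99 lands near 5000ms
```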
TLS and Connection Exhaustion Failures
TLS handshake bottlenecks and file descriptor exhaustion cause failures that process monitoring misses. HAProxy can be "healthy" but unable to accept new connections when these limits are reached.
TLS Handshake Bottlenecks
TLS handshakes are CPU-intensive. Under high connection rates, handshakes queue. Handshake queue depth increases before connection errors. Certificate validation, key exchange, and cipher negotiation all add latency.
When TLS handshake rates exceed CPU capacity, handshakes queue. Queue depth increases, causing connection delays. Eventually, handshakes timeout, causing connection failures. This happens before process crashes or obvious errors.
We've seen incidents where TLS handshake queues saturated during traffic spikes. Connection acceptance rates dropped, but process monitoring showed healthy. Monitoring TLS handshake queue depth would have caught this earlier.
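HAProxy doesn't expose a handshake queue directly, but the stats socket's show info output includes TLS counters that work as a proxy for handshake pressure. The sketch below watches the full-handshake rate and session reuse percentage; field names like SslFrontendKeyRate appear in OpenSSL-enabled builds but can vary by version, and the 500-handshakes-per-second capacity figure is a placeholder to replace with a measured number.

```python
import socket

STATS_SOCKET = "/var/run/haproxy.sock"  # assumed path, same as earlier sketches

def show_info(path=STATS_SOCKET):
    """Send 'show info' to the stats socket and return its key/value pairs."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(b"show info\n")
        raw = b""
        while chunk := sock.recv(4096):
            raw += chunk
    pairs = (line.partition(":") for line in raw.decode().splitlines() if ":" in line)
    return {key.strip(): value.strip() for key, _, value in pairs}

def tls_pressure(info, handshake_capacity=500):
    """Warn when full TLS handshakes per second approach the assumed CPU capacity."""
    key_rate = int(info.get("SslFrontendKeyRate", 0) or 0)
    reuse_pct = info.get("SslFrontendSessionReuse_pct", "n/a")
    if key_rate > 0.8 * handshake_capacity:
        print(f"TLS pressure: {key_rate} full handshakes/s, session reuse {reuse_pct}%")

if __name__ == "__main__":
    tls_pressure(show_info())
```

A low session reuse percentage alongside a rising key rate usually means clients are forcing full handshakes, which is where the CPU goes.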
File Descriptor Exhaustion
Each connection consumes a file descriptor. When file descriptor limits are reached, new connections fail. This happens before process crashes or obvious errors. HAProxy can be "healthy" but unable to accept new connections.
File descriptor limits often default to 1024 or 4096 per process unless explicitly tuned. Under high connection rates, these limits are reached quickly. When limits are reached, new connections fail silently. Process monitoring shows healthy, but the service is degraded.
We've seen incidents where file descriptor exhaustion during traffic spikes caused silent connection failures. Process monitoring showed healthy, but users couldn't connect. Monitoring file descriptor usage would have caught this before users noticed.
Resource limits that metrics don't capture: connection limits, file descriptor limits, memory limits. When these limits are reached, failures occur suddenly. Monitoring resource usage, not just application metrics, prevents these surprises.
Monitor file descriptor usage, TLS handshake queue depth, and connection acceptance rates. Alert when file descriptor usage exceeds ~80% of limit as a starting point, adjusted based on connection patterns. Alert when TLS handshake queue depth exceeds ~100 depending on workload and CPU capacity. Alert when connection acceptance rate drops below baseline.
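File descriptor usage is visible from the operating system even when HAProxy itself looks healthy. Below is a Linux-only sketch that compares open descriptors against the soft limit for the HAProxy process; the pid-file path is an assumption (it depends on how you start HAProxy), and the 80% threshold is the starting point discussed above.

```python
import os

def fd_usage(pid):
    """Return (open_fds, soft_limit) for a process by reading /proc (Linux only)."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                value = line.split()[3]  # "Max open files  <soft>  <hard>  files"
                soft_limit = int(value) if value.isdigit() else None
    return open_fds, soft_limit

def check_fd(pid, warn_ratio=0.8):
    """Warn when a process uses more than warn_ratio of its descriptor limit."""
    open_fds, limit = fd_usage(pid)
    if limit and open_fds > warn_ratio * limit:
        print(f"pid {pid}: {open_fds}/{limit} file descriptors in use")

if __name__ == "__main__":
    # Assumed pid-file location; adjust to how HAProxy is launched on your hosts.
    with open("/var/run/haproxy.pid") as pidfile:
        check_fd(int(pidfile.readline()))
```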
Alerts That Actually Wake You Up for the Right Reasons
What to alert on: actionable, early signals that indicate problems before they become outages. What not to alert on: noise, lagging indicators that don't help prevent incidents.
What to Alert On
Queue depth exceeding thresholds: Alert when frontend or backend queue depth exceeds ~10% of maximum queue capacity as a starting point, adjusted per workload. This catches capacity problems early.
Backend health state changes: Alert on rapid state changes (more than ~5 per minute in many environments) to detect flapping. Alert when backends are marked down to catch failures early.
File descriptor usage: Alert when file descriptor usage exceeds ~80% of limit as a starting point, adjusted based on connection patterns. This catches resource exhaustion before connections fail.
TLS handshake queue depth: Alert when TLS handshake queue depth exceeds ~100 depending on workload and CPU capacity. This catches TLS bottlenecks before connections fail.
Per-backend error rates: Alert when any backend error rate exceeds threshold. Aggregate error rates hide per-backend problems.
Percentile latency: Alert on p95 and p99 latency exceeding thresholds. Averages hide tail latency that affects user experience.
Connection acceptance rate: Alert when connection acceptance rate drops below baseline. This catches connection failures early.
What Not to Alert On
Process up/down: Meaningless without context. Process can be up while service is degraded. Use application-level health checks instead.
Aggregate error rates: Hides per-backend problems. Alert on per-backend error rates instead.
Average latency: Hides tail latency that affects user experience. Alert on percentile latency instead.
Overall connection counts: Doesn't indicate saturation. Alert on queue depth and connection acceptance rates instead.
In one incident, too many alerts on meaningless metrics caused alert fatigue: the team ignored alerts because most were false positives, and when queue depth increased, the alert was missed. Focusing on actionable metrics reduced alert noise and improved incident response.
- Queue depth: Alert when > ~10% of max queue capacity (starting point, adjust per workload)
- Backend health: Alert on state changes, not just down state
- File descriptors: Alert when > ~80% of limit (starting point, adjust per connection patterns)
- Latency: Alert on p99, not average
- Connection acceptance: Alert when rate drops below baseline
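Tying these thresholds together, here is a small sketch of what the evaluation logic might look like if you feed it a snapshot built from the earlier sketches; every metric name and factor is a hypothetical starting point, not a recommendation.

```python
# Hypothetical thresholds mirroring the starting points above; tune per workload.
ALERT_RULES = {
    "queue_depth":      lambda m: m["queue_depth"] > 0.10 * m["queue_capacity"],
    "backend_flapping": lambda m: m["state_changes_per_min"] > 5,
    "fd_usage":         lambda m: m["open_fds"] > 0.80 * m["fd_limit"],
    "p99_latency":      lambda m: m["p99_latency_ms"] > m["p99_budget_ms"],
    "accept_rate":      lambda m: m["conn_accept_rate"] < 0.5 * m["accept_baseline"],
}

def evaluate(metrics):
    """Return the names of the alert rules that fire for one metrics snapshot."""
    return [name for name, rule in ALERT_RULES.items() if rule(metrics)]
```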
Frequently Asked Questions
Everything you need to know about HAProxy monitoring metrics that matter
Are HAProxy reloads zero-downtime?
HAProxy reloads use connection draining, but they're not guaranteed zero-downtime. When new connections arrive faster than old connections drain, queues build and connections can drop. Long-lived connections can prevent draining from completing, causing timeouts. For true zero-downtime requirements, use canary reloads and staged rollouts, or alternatives with hot reload capabilities like Envoy.
Why isn't "process up" monitoring enough?
The HAProxy process can be running while the service is degraded. We've seen incidents where HAProxy was running but all backends were down, queues were saturated, or reloads were stuck. Process monitoring only confirms the binary is executing; it doesn't indicate whether HAProxy is serving traffic, whether backends are healthy, or whether queues are building. Monitor queue depth, per-backend health, and connection counts as leading indicators.
How does HAProxy behave on Kubernetes?
HAProxy works in Kubernetes but behaves like a stateful edge component forced into a dynamic scheduler. Pod restarts drop in-flight connections unless you configure termination grace periods and readiness probes for connection draining. ConfigMap updates propagate asynchronously, causing routing inconsistencies during updates. HAProxy requires manual configuration or custom controllers: it's not an ingress controller and doesn't auto-discover services. Expect config propagation delays and plan for pod lifecycle management. See our HAProxy on Kubernetes guide for detailed deployment patterns.
Why is queue depth the earliest failure signal?
Queue depth is a leading indicator: it increases before error rates spike. When backends slow down or fail, requests queue in HAProxy. By the time error rates increase, queues are already saturated and recovery takes longer. Monitor frontend and backend queue depth as early warning signals. Alert when queue depth exceeds ~10% of maximum queue capacity as a starting point, adjusted per workload.
Why do percentiles matter more than averages?
Averages hide tail latency that affects user experience. When 90% of requests are fast and 10% are slow, averages look healthy, but those slow requests degrade user experience. Percentiles reveal saturation that averages hide: p50 shows typical behavior, p95 shows most requests, p99 shows tail latency. We've seen incidents where average latency was 50ms but p99 was 5 seconds; the percentiles caught the problem before the averages did.
Which resource limits should I monitor?
Monitor file descriptor usage, TLS handshake queue depth, and connection acceptance rates. File descriptor exhaustion causes silent connection failures; alert when usage exceeds ~80% of limit as a starting point, adjusted based on connection patterns. TLS handshake bottlenecks occur when handshake rates exceed CPU capacity; alert when handshake queue depth exceeds ~100 depending on workload and CPU capacity. Connection acceptance rate drops indicate connection failures; alert when the rate drops below baseline. These resource limits cause failures that process monitoring misses.
How do I prevent health check flapping?
Health check flapping occurs when checks are too sensitive. Backends that are slow but healthy get marked down, then up, then down again. Tune health check intervals (start with 2-5 seconds as initial defaults), failure thresholds (2-3 consecutive failures), and success thresholds (2-3 consecutive successes). Monitor health check state changes: alert when backend state changes exceed ~5 per minute in many environments, though this varies with backend count. Adjust thresholds based on backend behavior under load. False positives require longer intervals; slow backends require stricter thresholds.
Conclusion
Monitoring HAProxy requires understanding what breaks before metrics show problems. "Process up" monitoring is operationally meaningless. Queue depth, backend health, and resource usage reveal failures that process status hides.
The metrics that actually matter are the ones that catch failures early: queue depth as the earliest signal, retry storms that amplify degradation, backend health flapping, latent saturation that percentiles reveal, TLS handshake bottlenecks, and file descriptor exhaustion. Alert on these, not process status.
This guide covers monitoring approaches we've seen prevent incidents in production. Use it as a starting point, but adapt alert thresholds and monitoring strategies to your specific traffic patterns and backend behavior. The key is monitoring the right signals before they become outages. See also Why HAProxy Outages Are Invisible Until It's Too Late for why experienced teams still miss failures even with dashboards and alerts.