Monitoring said everything was fine
In many environments, monitoring is treated as the primary source of truth.
If all checks are green, the system is assumed to be healthy.
In practice, this assumption can be dangerously wrong.
---
The situation
In one environment, a digital content platform experienced a complete loss of revenue for an entire day.
No sales were recorded.
At the same time, all monitoring systems reported that everything was functioning normally.
- servers were up
- web services were running
- databases were operational
- system resources were healthy
From an infrastructure perspective, everything appeared to be in order.
From a business perspective, the system was effectively down.
---
What was actually happening
Anyone visiting the site saw blank pages.
The cause was a coding error in a shared PHP include file.
Because that file was included across most of the application, every page rendered as empty output.
Technically:
- Apache was running
- PHP was executing
- databases were responding
But the application produced no usable output.
---
Why monitoring failed
The monitoring setup was comprehensive, but it focused on system health rather than business outcomes.
It checked:
- server availability
- service uptime
- database connectivity
- resource usage
All of these checks were correct.
What it did not check was:
> Does the system actually produce usable output?
This is a common gap.
Monitoring systems often validate that components are running, but not that the system is delivering value.
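The gap can be illustrated with a minimal sketch. These are hypothetical checks, not the platform's actual monitoring code: an infrastructure check that only verifies the server responded, next to an output check that verifies the response actually contains content.

```python
def infrastructure_check(status_code: int) -> bool:
    """Classic health check: the web server answered at all."""
    return status_code == 200

def output_check(body: str) -> bool:
    """Business-level check: the response contains real content,
    not an empty or whitespace-only page."""
    return len(body.strip()) > 0

# A blank page served with HTTP 200 passes the first check
# but fails the second -- exactly the failure described above.
status, body = 200, ""               # what the broken site returned
print(infrastructure_check(status))  # True  -> monitoring stays green
print(output_check(body))            # False -> the signal that was missing
```

The point of the second check is that it asserts something about the output itself, not about the components that produced it.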
---
The impact
The impact extended beyond lost sales.
Affiliate partners continued sending traffic and expected compensation based on normal conversion rates.
This created both financial loss and reputational risk.
---
The fix
The solution required changes in two areas.
1. Reduce the blast radius
The problematic code was refactored into a separate, isolated function.
This ensured that a failure in that component could no longer take down the entire application.
In the worst case, a non-critical feature would fail while core functionality remained available.
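The blast-radius idea can be sketched as follows (in Python rather than the platform's PHP, with hypothetical function names): the non-critical component is wrapped so that its failure degrades the page instead of blanking it.

```python
def render_core() -> str:
    # Core functionality: product listing, checkout, etc.
    return "<main>product catalog</main>"

def render_recommendations() -> str:
    # Hypothetical non-critical feature, standing in for the
    # shared include that originally broke every page.
    raise RuntimeError("bug in shared code")

def render_page() -> str:
    parts = [render_core()]
    try:
        parts.append(render_recommendations())
    except Exception:
        # Degrade gracefully: the feature disappears,
        # the page does not.
        parts.append("<!-- recommendations unavailable -->")
    return "\n".join(parts)

print(render_page())  # core content still renders despite the failure
```

The design choice here is that only the core path is allowed to fail the whole request; everything else fails closed into an empty placeholder.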
---
2. Monitor business outcomes
A new monitoring approach was introduced.
In addition to system health, the platform began tracking:
- conversions per product
- conversions per mobile provider
- conversions per country
- time-based conversion patterns
These values were compared against expected ranges.
If current conversions deviated significantly from normal behavior, an alert was triggered.
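The comparison against expected ranges can be sketched like this (illustrative numbers and thresholds, not the platform's actual values): the current conversion count for a dimension is checked against a historical baseline, and a large deviation raises an alert.

```python
def should_alert(current: float, baseline_mean: float,
                 baseline_std: float, threshold: float = 3.0) -> bool:
    """Alert when the current conversion count deviates from the
    historical baseline by more than `threshold` standard deviations."""
    if baseline_std <= 0:
        # No variance in the baseline: any difference is anomalous.
        return current != baseline_mean
    return abs(current - baseline_mean) / baseline_std > threshold

# Example: a product that normally converts ~50 times per hour (std 8).
print(should_alert(47, 50, 8))  # False: normal fluctuation
print(should_alert(0, 50, 8))   # True: the "blank site" scenario
```

Run per product, per provider, per country, and per time window, the same check catches both total outages (conversions drop to zero everywhere) and partial ones (a single country or provider goes quiet).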
---
The result
This approach proved highly effective.
In some cases, the system detected issues in external systems before those systems identified them internally.
The original failure scenario, in which the entire site produced empty output without triggering an alert, did not recur.
---
The lesson
Infrastructure monitoring should answer two questions:
- Are systems running?
- Is the business functioning?
If the second question is not covered, critical failures can remain invisible.
The most expensive outages are often not caused by systems going down, but by systems continuing to run without delivering value.
---
Closing thought
If your monitoring focuses primarily on system health, it may be missing the signals that matter most.
A structured infrastructure assessment can help identify these gaps and define monitoring approaches that reflect real operational and business risk.