Performance Bottlenecks in a Travel Platform

More servers, same problem

In performance-critical systems, the first instinct is often to add capacity.

More servers, more processing power, more throughput.

But sometimes, performance problems are not caused by lack of resources.

They are caused by inefficiencies that scale with the system.

The situation

A travel platform relied heavily on metasearch engines such as Kayak and Skyscanner.

To be included in search results, responses needed to be delivered within a strict time window.

If results arrived too late, they were simply ignored.

This meant:

no visibility to customers
no opportunity to sell
ongoing cost without return

Despite significant infrastructure investments, performance remained insufficient.

The system was unable to consistently deliver results within the required time frame.
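
To make this constraint concrete, here is a minimal sketch of how delivery within the time window could be measured. The function name, the 3-second window and the latency samples are illustrative assumptions, not figures from this case.

```python
def share_within_deadline(latencies_s, deadline_s):
    """Return the fraction of responses delivered within the deadline."""
    if not latencies_s:
        return 0.0
    on_time = sum(1 for latency in latencies_s if latency <= deadline_s)
    return on_time / len(latencies_s)


# Hypothetical response times in seconds and a hypothetical 3-second window.
samples = [1.8, 2.4, 3.6, 5.1, 2.9, 4.2]
print(f"{share_within_deadline(samples, 3.0):.0%} of responses arrived in time")
```

Tracking this share, rather than average response time alone, shows directly how many requests had any chance of being displayed.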

What had already been done

The organization had already taken several steps:

provisioned additional servers
implemented an MPLS connection to the aggregator (Amadeus)
scaled infrastructure to handle more requests

Yet the result remained the same.

On average, each server generated only a few successful transactions per day during peak periods.

What was actually happening

The issue was not a single bottleneck.

It was a combination of inefficiencies across the system.

1. Network routing issue

An MPLS connection had been provisioned for faster communication with the aggregator.

However, due to routing or access control configuration, traffic was still using the public internet.

The intended optimization existed, but was not actually used.
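
A check like this can be automated. The sketch below asks the operating system which local address it would use to reach a destination and compares it with the subnet of the private link. It assumes the MPLS link has its own local subnet on the host. The subnet 10.20.0.0/24 and the addresses are hypothetical.

```python
import ipaddress
import socket


def local_source_ip(dest_host, dest_port=443):
    """Ask the kernel which local address it would use to reach dest_host.

    Connecting a UDP socket sends no packets; it only performs a route lookup.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((dest_host, dest_port))
        return sock.getsockname()[0]


def uses_expected_path(source_ip, expected_network):
    """True if the chosen source address lies in the expected (MPLS-facing) network."""
    return ipaddress.ip_address(source_ip) in ipaddress.ip_network(expected_network)


# Hypothetical values: 10.20.0.0/24 stands in for the MPLS-facing subnet.
print(uses_expected_path("10.20.0.15", "10.20.0.0/24"))
print(uses_expected_path("203.0.113.7", "10.20.0.0/24"))
```

In practice, the result of local_source_ip for the aggregator's address would be passed to uses_expected_path. If the path is chosen on an upstream router instead of on the host, this check is not sufficient and a traceroute from the server is the better tool.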

2. Internal network congestion

The internal network was handling both:

application traffic
storage traffic (iSCSI)

This created contention and reduced effective throughput.

3. Limited network capacity

The network infrastructure was built for 100 Mbps.

Combined with shared usage and lack of segmentation, this led to:

congestion
collisions
unpredictable latency
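
The effect of a shared 100 Mbps link can be estimated with simple arithmetic. The sketch below is a rough model under stated assumptions: the 1 MB payload and the 25% share left over by storage traffic are hypothetical, and protocol overhead, queuing and retransmissions are ignored.

```python
def transfer_seconds(payload_bytes, link_mbps, available_share=1.0):
    """Time to move a payload over a link when only a share of it is available."""
    if not 0 < available_share <= 1:
        raise ValueError("available_share must be in (0, 1]")
    usable_bits_per_second = link_mbps * 1_000_000 * available_share
    return payload_bytes * 8 / usable_bits_per_second


payload = 1_000_000  # hypothetical 1 MB response
print(transfer_seconds(payload, 100))        # dedicated 100 Mbps link
print(transfer_seconds(payload, 100, 0.25))  # storage traffic leaves 25%
print(transfer_seconds(payload, 1000))       # dedicated 1 Gbps link
```

Under these assumptions the same payload takes about 0.08 seconds on a dedicated 100 Mbps link and about 0.32 seconds when only a quarter of the link is available. With a time window measured in seconds and several internal hops per request, such delays add up quickly.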

4. Lack of caching

Data that rarely changed was repeatedly requested:

airline information
routes
aircraft types
airport data

Even local systems were overloaded with unnecessary repeated queries.
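
For this kind of reference data, a simple time-based cache is often enough. The sketch below is one possible approach, not the implementation used in this case. The class TTLCache, the loader load_airport and the one-hour lifetime are hypothetical.

```python
import time


class TTLCache:
    """Cache values for a fixed time so rarely changing data is fetched once."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, loader):
        """Return the cached value, calling loader(key) only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


def load_airport(code):
    # Hypothetical stand-in for a slow lookup against a backend system.
    return {"code": code, "name": f"Airport {code}"}


airports = TTLCache(ttl_seconds=3600)
print(airports.get("AMS", load_airport))  # first call invokes the loader
print(airports.get("AMS", load_airport))  # second call is served from memory
```

The lifetime should follow how often the data really changes. Airport and aircraft data can usually tolerate a long lifetime, while prices and availability cannot be cached this way.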

The fix

The solution was not a single change, but a set of targeted improvements:

correct routing to ensure the MPLS connection was actually used
separate storage traffic from application traffic
address internal network limitations
introduce caching for static and low-change data

The result

After these changes:

response times decreased by several seconds
requests were processed within the required time window
results were included by metasearch engines
ticket sales increased significantly

No additional servers were required.

The lesson

Performance problems are often not caused by insufficient capacity.

They are caused by:

misconfigured infrastructure
shared resources under contention
unnecessary repeated work
hidden inefficiencies across system layers

Adding more servers can increase cost without solving the problem.

Understanding how the system behaves is more important than increasing its size.

Closing thought

If your system struggles to meet performance targets despite scaling efforts, the issue may not be how much infrastructure you have.

It may be how efficiently it is used.

A structured assessment can help identify these inefficiencies and turn them into measurable improvements.

A real-life experience from Harold Snippe

Harold Snippe is a consultant in infrastructure reliability, Linux engineering and operational security, focused on cross-system production issues, operational risk reduction and infrastructure troubleshooting.
