Performance Bottlenecks in a Travel Platform

A real-world case of identifying and resolving performance bottlenecks in a distributed travel platform under load.

More servers, same problem

In performance-critical systems, the first instinct is often to add capacity.

More servers, more processing power, more throughput.

But sometimes, performance problems are not caused by a lack of resources.

They are caused by inefficiencies that scale with the system.

---

The situation

A travel platform relied heavily on metasearch engines such as Kayak and Skyscanner.

To be included in search results, responses needed to be delivered within a strict time window.

If results arrived too late, they were simply ignored.
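A strict time window works like a hard deadline: anything not ready when it expires is worth nothing. The sketch below is a minimal illustration, not the platform's actual code. It uses Python's `asyncio` to query several sources concurrently and returns only what completes in time. The source functions and the 0.2-second deadline are hypothetical; the document does not state the real window.

```python
import asyncio
from typing import Any, Callable, Coroutine

Source = Callable[[], Coroutine[Any, Any, list[str]]]

async def gather_within_deadline(sources: dict[str, Source], deadline_s: float) -> dict[str, list[str]]:
    """Query every source concurrently and keep only answers that arrive in time."""
    tasks = {asyncio.create_task(fetch()): name for name, fetch in sources.items()}
    done, pending = await asyncio.wait(tasks, timeout=deadline_s)
    for task in pending:
        task.cancel()  # a late answer would be ignored by the metasearch engine anyway
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellations finish
    return {
        tasks[task]: task.result()
        for task in done
        if task.exception() is None  # drop sources that failed outright
    }

# Hypothetical sources standing in for real downstream lookups.
async def fast_source() -> list[str]:
    await asyncio.sleep(0.01)
    return ["offer-1"]

async def slow_source() -> list[str]:
    await asyncio.sleep(1.0)
    return ["offer-2"]

results = asyncio.run(
    gather_within_deadline({"fast": fast_source, "slow": slow_source}, deadline_s=0.2)
)
print(results)  # {'fast': ['offer-1']}
```

Cancelling late work matters under load. A request that can no longer meet the deadline still consumes CPU, connections and bandwidth if it keeps running.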

This meant:

Despite significant infrastructure investments, performance remained insufficient.

The system was unable to consistently deliver results within the required time frame.

---

What had already been done

The organization had already taken several steps:

Yet the result remained the same.

On average, each server generated only a few successful transactions per day during peak periods.

---

What was actually happening

The issue was not a single bottleneck.

It was a combination of inefficiencies across the system.
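Because the deadline applies end to end, several modest delays can add up to a miss even when no single stage looks like the culprit. A simple latency budget makes this visible. The stage names, millisecond figures and 1,000 ms deadline below are all hypothetical. They are not measurements from this platform.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    latency_ms: float

def budget_report(stages: list[Stage], deadline_ms: float) -> tuple[float, float]:
    """Return (total latency, remaining headroom); negative headroom means a missed deadline."""
    total = sum(stage.latency_ms for stage in stages)
    return total, deadline_ms - total

# Hypothetical figures: every stage fits inside the deadline on its own.
stages = [
    Stage("network path to aggregator", 350),
    Stage("internal service calls", 400),
    Stage("repeated reference-data lookups", 500),
    Stage("pricing logic", 300),
]
total, headroom = budget_report(stages, deadline_ms=1000)
slowest = max(stages, key=lambda stage: stage.latency_ms)
print(total, headroom, slowest.name)  # 1550 -550 repeated reference-data lookups
```

In this example, removing the slowest stage alone would still leave the request at 1,050 ms, 50 ms over budget. No single fix is enough.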

1. Network routing issue

An MPLS connection had been provisioned for faster communication with the aggregator.

However:

> Due to the routing or access-control configuration, traffic was still going over the public internet.

The intended optimization existed but was never actually used.
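A misconfiguration like this can be partly checked from the application host. One approach, sketched below under stated assumptions, is to ask the operating system which local address it would use to reach the aggregator. Connecting a UDP socket performs a route lookup without sending any packets. This assumes the MPLS link terminates on its own interface with a known subnet. In that case, a source address outside that subnet means traffic is leaving another way. The aggregator address and MPLS subnet in the comments are hypothetical. The check also reflects only the host's routing table. Routing or access-control decisions made further upstream need a path trace such as `traceroute` to confirm.

```python
import ipaddress
import socket

def source_address_for(dest_ip: str, port: int = 443) -> str:
    """Return the local address the OS would use to reach dest_ip.

    Connecting a UDP socket sends no packets; it only performs a route lookup.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((dest_ip, port))
        return sock.getsockname()[0]

def uses_expected_link(dest_ip: str, link_subnet: str) -> bool:
    """True if traffic to dest_ip would leave from an address inside link_subnet."""
    src = ipaddress.ip_address(source_address_for(dest_ip))
    return src in ipaddress.ip_network(link_subnet)

# Hypothetical values: aggregator endpoint and the subnet of the MPLS interface.
# if not uses_expected_link("10.20.30.40", "10.99.0.0/24"):
#     print("traffic to the aggregator is not using the MPLS link")
```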

---

2. Internal network congestion

The internal network was handling both:

This created contention and reduced effective throughput.
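Contention matters because every response competes for the same link. A rough back-of-the-envelope model helps here. The time to push a payload through a link is its size in bits divided by the bandwidth actually available to it. The 1.25 MB payload, 100 Mbps link and 50% share below are hypothetical. The model also ignores latency, TCP behavior and protocol overhead, so real transfers would be slower.

```python
def transfer_time_ms(payload_bytes: int, link_mbps: float, share: float) -> float:
    """Serialization time for a payload when only `share` of the link is available to it.

    Rough model: ignores propagation delay, TCP slow start, and protocol overhead.
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    available_bps = link_mbps * 1_000_000 * share
    return payload_bytes * 8 * 1000 / available_bps

# Hypothetical 1.25 MB response on a 100 Mbps link.
print(transfer_time_ms(1_250_000, 100, 1.0))  # 100.0
print(transfer_time_ms(1_250_000, 100, 0.5))  # 200.0
```

Halving the available bandwidth doubles the transfer time, and that extra time comes straight out of the response window.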

---

3. Limited network capacity

The network infrastructure was built for 100 Mbps.

Combined with shared usage and lack of segmentation, this led to:

---

4. Lack of caching

Data that rarely changed was repeatedly requested:

Even local systems were overloaded with unnecessary repeated queries.
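Rarely changing data is a good fit for a time-to-live (TTL) cache. The cache serves its stored copy until that copy is older than a chosen TTL, then reloads it once. The sketch below is a minimal version: single-process and not thread-safe. The one-hour TTL, the `load_airports` loader and its data are hypothetical stand-ins for whatever reference data the platform kept re-requesting. The injectable clock only exists to make expiry easy to demonstrate.

```python
import time
from collections.abc import Callable, Hashable
from typing import Generic, TypeVar

V = TypeVar("V")

class TTLCache(Generic[V]):
    """Keep slowly changing data for ttl_s seconds before reloading it (not thread-safe)."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_s
        self._clock = clock
        self._store: dict[Hashable, tuple[float, V]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], V]) -> V:
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh entry: no backend query
        value = loader()  # miss or expired: query the backend once
        self._store[key] = (now, value)
        return value

calls = 0

def load_airports() -> list[str]:
    """Hypothetical loader standing in for an expensive backend query."""
    global calls
    calls += 1
    return ["AMS", "LHR", "JFK"]

fake_now = [0.0]
cache = TTLCache(ttl_s=3600, clock=lambda: fake_now[0])
for _ in range(1000):
    cache.get_or_load("airports", load_airports)
print(calls)  # 1
fake_now[0] = 3601.0  # one hour and one second later
cache.get_or_load("airports", load_airports)
print(calls)  # 2
```

The TTL is the trade-off to tune. A longer TTL means fewer backend queries but staler data. In a multi-server deployment, a shared cache would avoid each server loading its own copy.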

---

The fix

The solution was not a single change, but a set of targeted improvements:

---

The result

After these changes:

No additional servers were required.

---

The lesson

Performance problems are often not caused by insufficient capacity.

They are caused by:

Adding more servers can increase cost without solving the problem.

Understanding how the system behaves is more important than increasing its size.

---

Closing thought

If your system struggles to meet performance targets despite scaling efforts, the issue may not be how much infrastructure you have.

It may be how efficiently it is used.

A structured assessment can help identify these inefficiencies and turn them into measurable improvements.

Need help turning infrastructure risk into a practical plan?

I help teams prioritize remediation, harden platforms, and reduce risk without adding operational chaos.
