More servers, same problem
In performance-critical systems, the first instinct is often to add capacity.
More servers, more processing power, more throughput.
But sometimes, performance problems are not caused by lack of resources.
They are caused by inefficiencies that scale with the system.
---
The situation
A travel platform relied heavily on metasearch engines such as Kayak and Skyscanner.
To be included in search results, responses needed to be delivered within a strict time window.
If results arrived too late, they were simply ignored.
This meant:
- no visibility to customers
- no opportunity to sell
- ongoing cost without return
Despite significant infrastructure investments, performance remained insufficient.
The system was unable to consistently deliver results within the required time frame.
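Because a late response counts for nothing, the useful metric here is the share of responses that beat the deadline, not the average response time. A minimal sketch of that measurement follows. The function name, the sample latencies, and the 2000 ms deadline are illustrative assumptions, not figures from the platform.

```python
from typing import Sequence


def deadline_hit_rate(latencies_ms: Sequence[float], deadline_ms: float) -> float:
    """Return the fraction of responses delivered within the deadline."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    on_time = sum(1 for latency in latencies_ms if latency <= deadline_ms)
    return on_time / len(latencies_ms)


# Hypothetical samples: two responses beat an assumed 2000 ms deadline, two do not.
samples_ms = [800, 1200, 2500, 4000]
hit_rate = deadline_hit_rate(samples_ms, deadline_ms=2000)
```

Tracking this ratio before and after each change shows whether an optimization actually moves responses inside the window.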
---
What had already been done
The organization had already taken several steps:
- provisioned additional servers
- implemented an MPLS connection to the aggregator (Amadeus)
- scaled infrastructure to handle more requests
Yet the result remained the same.
Even during peak periods, each server averaged only a few successful transactions per day.
---
What was actually happening
The issue was not a single bottleneck.
It was a combination of inefficiencies across the system.
1. Network routing issue
An MPLS connection had been provisioned for faster communication with the aggregator.
However:
> Due to a routing or access control misconfiguration, traffic to the aggregator was still going over the public internet.
The intended optimization existed, but was not actually used.
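One way to catch this kind of problem is to check which route a destination address actually matches. The sketch below applies longest-prefix matching to a small routing table using Python's standard `ipaddress` module. The prefixes, interface names, and addresses are hypothetical documentation ranges, not the platform's real configuration.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, egress interface).
ROUTES = [
    ("0.0.0.0/0", "internet"),    # default route over the public internet
    ("198.51.100.0/24", "mpls"),  # prefix assumed to be reachable via MPLS
]


def egress_for(destination: str, routes=ROUTES) -> str:
    """Return the egress interface chosen by longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), iface)
        for prefix, iface in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        raise LookupError(f"no route to {destination}")
    # The most specific matching prefix (largest prefix length) wins.
    _, iface = max(matches, key=lambda match: match[0].prefixlen)
    return iface


def uses_expected_path(destination: str, expected: str, routes=ROUTES) -> bool:
    """Check whether traffic to a destination leaves via the expected interface."""
    return egress_for(destination, routes) == expected
```

With this table, `uses_expected_path("203.0.113.10", "mpls")` returns `False`: if the aggregator's address is not covered by the MPLS prefix, traffic silently falls back to the default route. On a live system the same question is usually answered with a traceroute or by inspecting the router's forwarding table.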
---
2. Internal network congestion
The internal network was handling both:
- application traffic
- storage traffic (iSCSI)
This created contention and reduced effective throughput.
---
3. Limited network capacity
The network infrastructure was built for 100 Mbps.
Combined with shared usage and lack of segmentation, this led to:
- congestion
- collisions
- unpredictable latency
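A rough calculation shows why a shared 100 Mbps link matters when every response has a deadline. The sketch below estimates the minimum time needed to put a payload on the wire. The payload size and the share of bandwidth left over after storage traffic are assumed values, and the estimate ignores protocol overhead, queuing, and retransmissions, so real figures would be worse.

```python
def transfer_time_ms(payload_bytes: int, link_mbps: float, share: float = 1.0) -> float:
    """Lower-bound time in milliseconds to transmit a payload over a link.

    share is the fraction of link bandwidth available to application
    traffic (0 < share <= 1); the remainder is assumed to be consumed
    by other traffic such as iSCSI.
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in the range (0, 1]")
    usable_bits_per_second = link_mbps * 1_000_000 * share
    return payload_bytes * 8 / usable_bits_per_second * 1000


# Assumed 500 kB response payload.
dedicated = transfer_time_ms(500_000, link_mbps=100)              # about 40 ms
contended = transfer_time_ms(500_000, link_mbps=100, share=0.25)  # about 160 ms
gigabit = transfer_time_ms(500_000, link_mbps=1000)               # about 4 ms
```

A request that involves several internal round trips pays this cost on each hop, which is how contention on a slow shared network can add up to a meaningful part of the response window.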
---
4. Lack of caching
Data that rarely changed was repeatedly requested:
- airline information
- routes
- aircraft types
- airport data
Even local systems were overloaded with unnecessary repeated queries.
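For reference data like this, a small time-based cache is often enough. The following is a minimal sketch of a read-through cache with a time-to-live. The class name, the loader function, and the TTL value are assumptions for illustration, and the sketch is not thread-safe.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Read-through cache that reloads an entry once its TTL has expired."""

    def __init__(
        self,
        loader: Callable[[Hashable], V],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps key -> (expiry time, cached value).
        self._entries: Dict[Hashable, Tuple[float, V]] = {}

    def get(self, key: Hashable) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no backend query
        value = self._loader(key)  # missing or expired: query the backend once
        self._entries[key] = (now + self._ttl, value)
        return value


# Hypothetical loader standing in for a database or aggregator lookup.
def load_airport(code: Hashable) -> dict:
    return {"code": code}


airports = TTLCache(load_airport, ttl_seconds=3600)
airports.get("AMS")  # first call runs the loader
airports.get("AMS")  # second call is served from memory
```

The TTL trades freshness for load: data such as airport or aircraft details can tolerate hours, while anything price-related would need a much shorter lifetime or no caching at all.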
---
The fix
The solution was not a single change, but a set of targeted improvements:
- correct routing to ensure the MPLS connection was actually used
- separate storage traffic from application traffic
- address the capacity and segmentation limits of the internal 100 Mbps network
- introduce caching for static and low-change data
---
The result
After these changes:
- response times decreased by several seconds
- requests were processed within the required time window
- results were included by metasearch engines
- ticket sales increased significantly
No additional servers were required.
---
The lesson
Performance problems are often not caused by insufficient capacity.
They are caused by:
- misconfigured infrastructure
- shared resources under contention
- unnecessary repeated work
- hidden inefficiencies across system layers
Adding more servers can increase cost without solving the problem.
Understanding how the system behaves is more important than increasing its size.
---
Closing thought
If your system struggles to meet performance targets despite scaling efforts, the issue may not be how much infrastructure you have.
It may be how efficiently it is used.
A structured assessment can help identify these inefficiencies and turn them into measurable improvements.