The Bus Factor Is Not a Theory
In engineering, the “bus factor” is often treated as an abstract thought experiment: how many people would need to disappear before the system becomes unmaintainable?
In reality, it’s rarely about accidents. It’s about dependency—and sometimes, control.
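To make that question concrete, here is a minimal Python sketch that treats the bus factor as a property of an ownership map. It assumes a system becomes unmaintainable as soon as any one critical component has nobody left who can maintain it. Under that assumption, the bus factor is the size of the smallest maintainer set, because orphaning a component requires losing every person who can maintain it. The function name `bus_factor` and the sample components and people are hypothetical, used only for illustration.

```python
from typing import Dict, List, Set, Tuple


def bus_factor(maintainers: Dict[str, Set[str]]) -> Tuple[int, List[str]]:
    """Return the bus factor and the components that determine it.

    Assumption (illustrative): the system is unmaintainable once any single
    component has no remaining maintainer. The cheapest way to reach that
    state is to remove everyone from the smallest maintainer set.
    """
    if not maintainers:
        raise ValueError("no components to evaluate")
    # Size of the smallest maintainer set across all components.
    smallest = min(len(people) for people in maintainers.values())
    # Every component sharing that minimum is an equally weak point.
    weakest = sorted(
        component
        for component, people in maintainers.items()
        if len(people) == smallest
    )
    return smallest, weakest


# Hypothetical ownership map: component -> people able to maintain it.
ownership = {
    "freebsd-hosts": {"alice"},
    "billing-service": {"alice", "bob"},
    "content-api": {"bob", "carol", "dave"},
}

factor, weakest = bus_factor(ownership)
print(factor, weakest)  # 1 ['freebsd-hosts']
```

The number itself matters less than the list it returns: the weakest components are where documentation, packaging, and shared ownership pay off first.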
When Knowledge Becomes Leverage
I once worked with a mobile entertainment company where a single engineer effectively held the infrastructure hostage.
Not intentionally, at least not at first.
He was the only person with deep expertise in a niche operating system: FreeBSD. Over time, all critical services became tightly coupled to that knowledge. There was no documentation. No reproducibility. No shared ownership.
Eventually, the situation evolved into something more problematic:
- Decisions couldn’t be challenged
- Changes couldn’t be reviewed
- Risk couldn’t be mitigated
- And most importantly—nothing could move forward without him
At that point, the issue was no longer technical.
It was organizational risk.
The Illusion of “Irreplaceable”
Organizations sometimes mistake scarcity for value.
“If only one person understands this, they must be essential.”
In reality, that’s not resilience. That’s fragility.
True engineering maturity looks very different:
- Systems can be rebuilt from scratch
- Knowledge is distributed
- Dependencies are minimized
- Platforms are chosen for sustainability, not personal preference
When those principles are absent, you don’t have expertise—you have a bottleneck.
Fixing the Problem Without Confrontation
Direct confrontation rarely works in these situations.
Instead, we reframed the problem:
> “We need to document and ensure this environment can be rebuilt by someone else.”
This was not controversial. It sounded responsible. Necessary, even.
But it created a pathway to change.
From FreeBSD to Reproducible Systems
The strategy was simple:
- Abstract the business logic away from the OS
- Migrate services to Linux, where experienced engineers are far easier to find
- Package applications into RPMs, making deployments consistent and repeatable
- Introduce rebuildability as a requirement, not an afterthought
There was no CI/CD pipeline at the time. But even without it, packaging alone dramatically improved:
- Consistency
- Recoverability
- Onboarding time
- Operational confidence
Most importantly, it removed the single point of failure.
What Actually Changed
After the transition:
- The system could be rebuilt without tribal knowledge
- Multiple engineers could operate and maintain it
- Decisions became transparent
- Risk decreased significantly
And interestingly, the original engineer’s role became healthier too.
Freed from the burden of being the only one who “knew everything,” he could share the load, and collaboration improved.
The Real Lesson
The bus factor is not about people.
It’s about systems that depend on people instead of processes.
If your infrastructure:
- Cannot be rebuilt from scratch
- Requires “that one person” to fix things
- Lacks documentation or packaging
- Relies on niche, hard-to-replace expertise
Then you don’t have a technical challenge.
You have a business continuity problem.
Final Thought
Resilient systems are not defined by uptime alone.
They are defined by how easily they can be understood, replaced, and recovered.
Because sooner or later, every organization faces the same question:
> “What happens if that person is not available tomorrow?”
If the answer is uncertainty, you already know where to start.