Modern IT landscapes do not break because people are careless.
They break because the systems they operate grow in complexity faster than teams can understand them.
Over the years, across insurance, finance, logistics and public-sector environments, the same patterns appear again and again. Different organizations, different architectures, different tools — and yet the pain is nearly identical.
This article outlines the most common pain points that shape the daily reality of operations teams. Not the superficial ones, but the structural forces that make modern infrastructure hard to understand, hard to predict, and sometimes hard to trust.
It is the foundation for the rest of this series.
1. Drift Everywhere
Drift is not an exception. Drift is the baseline. It accumulates from everyday operations:
- Hotfixes applied under pressure
- Manual fixes during incidents
- Side effects from patches
- New defaults after an update
- Differing histories between servers
- Different privileges or data sets per environment
Once two machines diverge even slightly, they continue drifting apart.
This happens silently and continuously, and most teams only notice it when a deployment behaves differently “for no obvious reason”.
Drift is not a bug.
It is the natural evolution of long-lived systems.
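A minimal sketch of how such divergence can be surfaced: capture a configuration snapshot per machine and diff them. The hostnames, keys, and values below are invented for illustration, not taken from any real tool.

```python
# Minimal drift check: diff two captured config snapshots.
# Hostnames and keys are hypothetical examples.

def diff_configs(a: dict, b: dict) -> dict:
    """Return keys whose values differ, or that exist on only one side."""
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

web01 = {"tls": "1.2", "timeout": 30, "hotfix_2024_11": True}
web02 = {"tls": "1.3", "timeout": 30}

drift = diff_configs(web01, web02)
# drift == {'tls': ('1.2', '1.3'), 'hotfix_2024_11': (True, None)} (key order may vary)
```

Two machines that were identical at install time now disagree on TLS version, and one carries a hotfix the other never received. Nothing alerted on either change.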
2. Defaults Nobody Sees
Defaults sound harmless, but in real environments they shape more behavior than explicit configuration.
A default:
- may be different per OS version
- may change with a patch
- may come from a library
- may be inherited from a global config
- may only apply if a value is missing
- may be deprecated but still active
- may be undocumented entirely
Defaults are a hidden configuration layer — and often the most influential one.
Ignoring defaults means ignoring part of the runtime architecture.
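The "applies only if a value is missing" behavior can be sketched with two hypothetical library versions that ship different built-in defaults. The version numbers and values are invented for illustration.

```python
# Illustrative only: a default applies exactly when the explicit value is
# missing, so upgrading a library (which ships its own DEFAULTS) silently
# changes behavior even though no config file changed.
from collections import ChainMap

DEFAULTS_V1 = {"timeout": 60, "retries": 3}   # hypothetical: shipped with v1
DEFAULTS_V2 = {"timeout": 30, "retries": 3}   # hypothetical: v2 lowered the timeout

explicit = {"retries": 5}                      # only what the operator wrote down

effective_v1 = ChainMap(explicit, DEFAULTS_V1)
effective_v2 = ChainMap(explicit, DEFAULTS_V2)

print(effective_v1["timeout"])  # 60
print(effective_v2["timeout"])  # 30 -- no file changed, yet behavior did
```

The explicit `retries` value survives the upgrade; the unset `timeout` quietly does not.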
3. Configuration Layering That Nobody Fully Tracks
Most people think configuration lives in the file they edit.
But in reality, configuration is distributed across layers:
- local configs
- global configs
- inherited configs
- machine settings
- environment variables
- fallback logic
- secrets stores
- registry
- application-level runtime state
Each layer merges into the next.
Precedence rules are often undocumented or only understood by one senior engineer.
And the effective configuration — the one that actually runs — may not appear in any file.
This layering is powerful, but also where most surprises come from.
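One way to picture the merge, assuming a precedence order of environment variables over local config over global config over built-in defaults. The layer names, file paths, and the `APP_` prefix are illustrative, not taken from any particular tool.

```python
# Sketch of layered precedence (all names hypothetical): later layers win.
# The effective config exists in no single file -- it only exists merged.
import os

builtin_defaults = {"log_level": "WARN", "timeout": 30}
global_config    = {"log_level": "INFO"}               # e.g. /etc/app/config
local_config     = {"timeout": 10}                     # e.g. ./app.conf
env_overrides    = {k[4:].lower(): v for k, v in os.environ.items()
                    if k.startswith("APP_")}           # e.g. APP_TIMEOUT=5

effective = {**builtin_defaults, **global_config, **local_config, **env_overrides}
```

Note that editing the "obvious" file (`./app.conf`) cannot change `log_level` here, and an exported environment variable silently beats everything on disk.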
4. Documentation Lag (CMDB, Wikis, Tickets)
Documentation is always written in the past tense — it reflects what someone believed was true when they wrote it.
By the time it is consumed:
- new changes were applied
- a shortcut was taken
- a workaround stayed in place
- a hotfix was forgotten
- a version bumped a default
- a dependency changed its behavior
But the drift is not only technical.
Organizational factors accelerate the problem:
- responsibility changes
- team boundaries shift
- ownership becomes unclear
- documentation tasks lose priority
- processes grow faster than people can maintain them
Even when documentation exists, it rarely reflects reality.
This raises a deeper question: what is the real intent of documentation?
In theory, documentation should describe the system as it is.
In practice, most documentation describes the system as people believed it was, intended it to be, or want it to be.
The longer the distance between documentation and runtime behavior, the more documentation becomes opinion rather than fact.
5. Too Many Tools, Not Enough Understanding
Most organizations react to complexity with tools:
- more monitoring
- more dashboards
- more logs
- more alerts
- more pipelines
- more scanners
And yet the problem remains:
Tools observe symptoms, not causes.
A dashboard shows that the application slowed down.
It does not show that a default timeout changed.
Or that one machine drifted.
Or that a config layer overrode another.
Or that an implicit fallback kicked in after a dependency updated.
A dashboard cannot visualize what no tool captures in the first place.
6. The Human Factor: Tribal Knowledge
Companies rely heavily on unwritten knowledge:
- “This server is special because…”
- “We never upgrade that one component…”
- “Production uses a different default…”
- “That setting must never be touched…”
This knowledge is often:
- undocumented
- incomplete
- outdated
- lost when key people leave
- contradicted by reality
Operational understanding is fragmented, and very few people can explain the system end to end.
Not because they lack skill — but because the system has grown beyond individual comprehension.
7. Slow Feedback Loops
Many problems in infrastructure last longer than they should because feedback loops are slow:
- A change is applied
- Nobody knows the immediate effect
- The real effect becomes visible weeks later
- Symptoms appear in a different area
- Teams assume the cause is elsewhere
Complex systems don’t break directly.
They break through delayed interactions.
A weak feedback loop allows small errors to accumulate silently.
8. Regulatory Pressure and Process Saturation
In regulated environments, processes accumulate over time:
- mandatory approvals
- mandatory documentation
- mandatory change tickets
- mandatory checks
Every item is reasonable in isolation.
But together they create:
- process bottlenecks
- reduced speed
- reduced visibility
- unmaintained documentation
- people working “around” the process
Complexity moves faster than processes can adapt.
9. The Illusion of Stable Systems
Teams often assume that systems remain in the state they were installed or deployed in.
This is never fully true.
Long-lived systems accumulate:
- decisions
- defaults
- drift
- patches
- dependencies
- exceptions
- workarounds
However — and this is important — the degree of drift correlates inversely with the degree of automation.
The more a system relies on:
- repeatable pipelines
- declarative configuration
- consistent provisioning
- automated reconciliation
- automated rollout and rollback
- enforced defaults
- standardized base images
…the less room there is for unintentional divergence between environments.
Automation does not eliminate drift.
But it constrains it.
Manual processes, emergency fixes and local variations introduce entropy.
Automation reduces the surface area where entropy can enter.
Fully automated systems drift slowly.
Semi-automated systems drift continuously.
Manually operated systems drift immediately.
Understanding this relationship is key:
Drift is not a mystery — it is the natural side effect of how changes enter a system.
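The constraining effect of automation can be sketched as a toy reconciliation loop: divergence can still appear between runs, but each run pulls actual state back toward the declared state. The package names and version strings below are invented for illustration.

```python
# Toy reconciliation: compare declared state with observed state and emit
# the actions needed to converge. All names are hypothetical.
desired = {"nginx": "1.24", "openssl": "3.0"}
actual  = {"nginx": "1.22", "openssl": "3.0", "debug_tool": "manual-install"}

def reconcile(desired: dict, actual: dict) -> dict:
    """Return the actions needed to converge actual state onto desired state."""
    actions = {}
    for key, value in desired.items():
        if actual.get(key) != value:
            actions[key] = f"set {key} -> {value}"
    for key in actual.keys() - desired.keys():
        actions[key] = f"remove {key}"   # unmanaged additions are drift too
    return actions

print(reconcile(desired, actual))
# e.g. {'nginx': 'set nginx -> 1.24', 'debug_tool': 'remove debug_tool'}
```

Run continuously, the loop bounds how far drift can wander before it is corrected; run never, every manual change becomes permanent divergence.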
Closing
The pain landscape of modern infrastructure is not caused by individual mistakes.
It is caused by the natural behavior of complex systems: drift, defaults, layering, undocumented interactions, and constantly shifting realities.
This complexity cannot be eliminated.
But it can be understood — if we look at the underlying mechanics instead of just the symptoms.
This series continues by exploring those mechanics one by one, starting with how systems behave differently than intended.
Follow-Up Questions
These questions emerged during writing and will be addressed in later articles:
- How does the operating model influence drift? For example: do Windows Server Core systems drift less than GUI servers because GUI comfort encourages manual changes?
- How can effective configuration be extracted reliably across all layers (file, registry, default, runtime)?
- How can we measure drift objectively and compare environments?
- What telemetry sources reflect real system behavior?
- How does automation change the geometry of drift over time?
Get in touch
Email me: starttalking@sh-soft.de