
In the modern dev stack, we have mastered the art of the deploy.
We have CI/CD pipelines that ship code in minutes and observability dashboards that track every millisecond of latency. Yet, when a P0 incident strikes, the most common phrase in Slack isn’t a solution; it’s "I can’t reproduce this locally."
This is the Reproduction Gap.
Most engineering teams are world-class at building and monitoring, but they are remarkably fragile at recreating runtime behaviour.
Without an identical environment, debugging becomes a manual forensic task where the variables change every time a developer attempts a fix.
Solving this requires more than just better logs; it requires an architecture where production reproduction is a standard, automated skill rather than a senior-level manual chore.
When a developer says they can’t reproduce a bug, they aren't complaining about a lack of skill. They are pointing to a structural failure of environment parity.
According to our engineering teams, the "Repro Gap" is usually caused by drift in a handful of specific variables between the local environment and production.
The result is an investigative gap where 80% of triage time is spent simply trying to make the bug happen again.
To close this gap, teams are moving toward instant environment cloning to automate the plumbing and move directly to the resolution.
The inability to reproduce a bug instantly creates an investigative gap that can last days. For subtle or user-specific issues, reproduction becomes nearly impossible without a detailed "interview" with the user to figure out exactly which variables need to be replicated.
Because reproduction is manual and fragile, most teams default to "debugging in production": they push a fix and hope the live environment likes it. This leads to a cycle of junk data accumulating in production that can't safely be deleted, as developers run several iterations of a "test fix" against live databases.
Every manual database export/import cycle to reset a test environment can eat several minutes per iteration, effectively killing the developer's "flow state." (You can explore how instant environment cloning automates this plumbing to move directly to the fix.)
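To make that iteration cost concrete, here is a schematic sketch of the manual reset loop; the file names and commands are stand-ins (in practice each reset is a multi-minute `pg_dump`/`pg_restore` or `mysqldump` round-trip against real data, which is where the minutes go):

```shell
# Schematic stand-in for the manual export/import reset cycle.
# A flat file stands in for the database so the shape of the loop is visible.
echo "prod-like snapshot" > prod_snapshot.sql   # one-time "export"

reset_env() {
  cp prod_snapshot.sql local_db.sql             # "import": restore local state
}

for attempt in 1 2 3; do
  reset_env                                     # every fix attempt pays this cost again
  echo "attempt $attempt: environment reset, re-running the repro"
done
```

The point of the sketch: the reset is paid on every iteration of the fix, not once, so even a few minutes per cycle compounds across a debugging session.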
Why don't teams just spin up a fresh environment for every bug?
If you aren't using a containerized, automated environment, provisioning the required services is non-trivial; you’re manually installing software and duplicating configs. Even in advanced K8s setups, cloning production data quickly is a manual chore that often relies on slow, custom scripts.
But "close enough" is what creates the Incident Hangover.
When you can’t reproduce a bug in isolation, you work slowly because you’re afraid of the "Safety Paradox": the fear that an experimental fix might accidentally trigger a production email or corrupt a shared database.
True speed comes from the confidence that your environment is a 100% isolated, disposable clone of the production "crime scene."
For more info, learn how to move from "hope-based" security to automated, versioned truth in the YAML configuration overview.
Reproduction shouldn't be a senior-level "magic trick." It should be a standard, automated part of your workflow.
Declare your infrastructure in .upsun/config.yaml to ensure your dev, staging, and production environments are replicas. To move even faster, you can standardize these setups with our debugging template packs.
Why is reproduction harder than deployment?
Deployment is a one-way street: you are pushing code to a known state. Reproduction is "reverse engineering": you are trying to recreate a complex, stateful moment in time. Without automated cloning, you are forced to rebuild that state manually every time.
Does Upsun help with "Heisenbugs"?
Yes. Because Upsun clones the entire service mesh and configuration alongside the code, the environmental variables that cause Heisenbugs are captured in the clone. The bug has nowhere to hide.
How do we handle the security of production data during reproduction?
Upsun uses automated hooks to scrub sensitive data and neutralize emails during the branching process. You get the realism of production data without the security risk of "debugging in production."
What happens if a fix works in the clone but fails in production?
On Upsun, this is highly unlikely by design. Since the clone and the production environment use the same .upsun/config.yaml and infrastructure-as-code definitions, the runtime behavior is identical.
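As an illustration of that parity claim, a single .upsun/config.yaml describes the runtime, services, and routes for every environment. The fragment below is a simplified sketch: the application name, runtime, and service versions are placeholders, and the exact schema is defined in the Upsun configuration docs.

```yaml
# Simplified sketch of a .upsun/config.yaml; every environment, from a
# bug-repro branch to production, is built from this same definition.
applications:
  app:
    type: "nodejs:20"        # placeholder runtime and version
    relationships:
      database: "db:mysql"
services:
  db:
    type: "mariadb:10.11"    # placeholder service version
routes:
  "https://{default}/":
    type: upstream
    upstream: "app:http"
```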
Can junior developers use this workflow?
Absolutely. By automating the reproduction setup, you lower the barrier to entry for triage. A junior dev can spin up a production clone via a Git branch and start investigating without needing a senior engineer to configure the environment for them.
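The branch-based flow described above can be sketched with ordinary Git commands. The branch name is made up for illustration, and the step where Upsun builds the clone from the pushed branch is shown as a comment rather than executed:

```shell
# Hypothetical triage flow: a bug-fix branch becomes an isolated clone.
# (Throwaway local repo for the demo; in a real project you would branch
# your application repository.)
git init -q repro-demo && cd repro-demo
git -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "baseline"
git switch -q -c fix/login-500   # hypothetical branch name
# git push origin fix/login-500  # on Upsun, pushing the branch would
                                 # trigger the environment clone
git branch --show-current        # prints: fix/login-500
```

Because the environment comes from the branch, tearing it down is as cheap as deleting the branch, which is what makes the clone genuinely disposable.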