Automated database sanitization (also called data masking) is the process of neutralizing personally identifiable information (PII) when production data is replicated to development environments. Upsun automates this through the .upsun/config.yaml file, which runs sanitization scripts inside ephemeral preview environments. This Upsun-native workflow lets developers test against realistic data distributions without exposing sensitive customer information, which helps maintain compliance with GDPR, HIPAA, and SOC 2.
TL;DR
Key takeaway: Manual database dumps are the primary cause of "compliance lag" and security vulnerabilities in dev workflows.
For years, teams relied on scheduled pg_dump or mysqldump exports that were sanitized on separate staging servers. Upsun replaces this obsolete "snapshot" approach by building sanitization into the environment lifecycle.
Key takeaway: Upsun utilizes copy-on-write file systems to allow for instant database branching followed by immediate, automated PII scrubbing.
By integrating the sanitization logic directly into the environment lifecycle (triggered via hooks in Upsun’s unified configuration file .upsun/config.yaml), the scrubbing becomes a mandatory gate. The logic follows a three-step "Branch-Mask-Serve" protocol:
… deploy or post-install script) executes a sanitization suite.
… user_id 123 remains consistent across all tables).
Key takeaway: Ephemeral environments reduce the audit surface by ensuring sensitive data only exists during the active development lifecycle.
| Compliance Factor | Legacy Staging (Persistent) | Upsun Previews (Ephemeral) |
| --- | --- | --- |
| Data Retention | Permanent (Risky) | Temporary (Destroyed on Merge) |
| Sanitization | Manual/Periodic | Automated/Per-Branch |
| PII Exposure | High (Entire Team) | Low (Isolated to Developer) |
By using this method with Instant Data-Complete Preview Environments, Upsun allows developers to work with a "fresh" and "safe" mirror of production. This eliminates the need for developers to ever request access to raw production data for debugging.
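As a minimal sketch of the masking step in the Branch-Mask-Serve protocol, the Python below pseudonymizes emails with a keyed hash so that the same address maps to the same fake value in every table, while keys such as user_id are left untouched. The table names (users, orders), the column names, the MASK_KEY value, and the use of SQLite as a stand-in for the cloned database are all assumptions made for illustration; this is not Upsun's own sanitization suite.

```python
import hashlib
import hmac
import sqlite3
from typing import Optional

# Hypothetical secret; in practice it would be read from an environment variable.
MASK_KEY = b"example-masking-key"


def pseudonymize_email(email: Optional[str]) -> Optional[str]:
    """Map an email to a stable fake address: same input, same output."""
    if email is None:
        return None
    digest = hmac.new(MASK_KEY, email.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.invalid"


def sanitize(conn: sqlite3.Connection) -> None:
    """Mask PII columns in place; primary and foreign keys are not modified."""
    conn.create_function("mask_email", 1, pseudonymize_email)
    with conn:  # commits on success, rolls back if an UPDATE fails
        conn.execute("UPDATE users SET email = mask_email(email), full_name = 'User ' || id")
        conn.execute("UPDATE orders SET contact_email = mask_email(contact_email)")


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for a cloned production database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, full_name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, contact_email TEXT);
        INSERT INTO users VALUES (123, 'Jane.Doe@corp.com', 'Jane Doe');
        INSERT INTO orders VALUES (1, 123, 'jane.doe@corp.com');
        """
    )
    return conn
```

A script along these lines would typically be invoked from a hook once the database has been cloned. Because the hash is keyed and deterministic, joins on masked columns still line up, and rows that reference user_id 123 continue to do so.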
How do you sanitize PII in complex JSONB or NoSQL fields?
Modern sanitization scripts use regex-based pattern matching to identify and replace values inside semi-structured data. By invoking these scripts from hooks in Upsun’s unified configuration file .upsun/config.yaml (for example, the deploy hook, which runs once the database service is reachable), you ensure that even as your schema evolves, the sanitization logic stays versioned with your code.
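The regex-based approach can be sketched as follows, assuming the JSONB or document payload has been read into Python as a JSON string. The two patterns, the replacement values, and the function names are hypothetical and deliberately simple; real PII detection usually needs patterns tuned to the data.

```python
import json
import re

# Illustrative patterns only; they will miss some PII and may match non-PII.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_text(value: str) -> str:
    """Replace email- and phone-like substrings inside a single string."""
    value = EMAIL_RE.sub("masked@example.invalid", value)
    return PHONE_RE.sub("000-000-0000", value)


def mask_json(node):
    """Walk nested dicts and lists, masking every string value found."""
    if isinstance(node, dict):
        return {key: mask_json(child) for key, child in node.items()}
    if isinstance(node, list):
        return [mask_json(child) for child in node]
    if isinstance(node, str):
        return mask_text(node)
    return node  # numbers, booleans, and null pass through unchanged


def mask_json_document(raw: str) -> str:
    """Parse a JSON document, mask it, and serialize it again."""
    return json.dumps(mask_json(json.loads(raw)))
```

Walking the parsed structure rather than running the regex over the raw JSON text keeps the output valid JSON and means new nested fields are covered without a schema change. Note that the phone pattern will also match other long digit runs, such as order numbers stored as strings.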
Does automated sanitization slow down environment creation?
With a copy-on-write system, the cloning itself is instant. The only delay is the time your SQL update scripts take to run. For most applications, this adds less than 60 seconds to provisioning time, which is a small price for keeping preview environments aligned with GDPR requirements.
Is it better to use synthetic data or sanitized production data?
While synthetic data is safest, it often fails to catch edge cases caused by complex real-world relationships. Sanitized production data is the "Gold Standard" because it preserves the distribution and scale of your data without the risk.