
What is instant data cloning? The game changer for developing cloud-based applications

DevOps, data, data cloning, preview environments
13 June 2024
Augustin Delaporte
VP, Product

What sets Upsun apart is its unparalleled ability to instantly clone data from a running production application.

This groundbreaking feature equips developers with exact replicas of their production environments within minutes, revolutionizing the way they build, test, and deploy software. 

In this blog post, we'll explore what data cloning is, why this capability is a game-changer, and how it significantly enhances the development process.

The developer's challenge: how to maintain accurate and efficient development environments

To test changes reliably, developers need environments that mirror production setups. However, creating these exact replicas is complex and time-consuming. As a result, developers end up with environments lacking the full functionality of the production environment, reducing their efficacy. 

Without an accurate copy of the production database, testing is incomplete and unreliable, leading to more work when issues emerge later.

Slow environment setup

Traditional methods of creating accurate development environments require manually configuring services, databases, and dependencies, taking hours or even days. This delays project timelines and reduces productivity. Modern methods often only create environments for static or stateless applications, missing critical components like databases, message queues, and files.

Inconsistent data across environments

Data inconsistencies between development and production environments pose significant challenges. Developers often work with outdated datasets that don't reflect the production state, leading to bugs, faulty tests, and features that fail in production. This wastes time and resources as developers scramble to fix issues.

High costs of maintaining multiple environments

Maintaining multiple environments is both time-consuming and costly, requiring substantial hardware and software resources. This adds a significant financial burden to organizations.

The risk of downtime

In addition to setup and data consistency challenges, there's the risk of system downtime. Quick recovery from system failures or data loss is crucial. Traditional backup and recovery methods are often slow and unreliable, leading to prolonged downtime and severely impacting business operations.

In summary, the developer's dilemma is multi-faceted, and each challenge significantly affects the efficiency and effectiveness of the development process.

The solution: instant data cloning of production environments

Upsun revolutionizes this process by offering the unique capability to instantly clone an entire production environment, including all critical data and services. 

This feature is a game-changer, providing developers with exact replicas of production environments within minutes.

Why data cloning for development environments matters

Cloning data from a production environment is crucial for quality and speed in software development. Here’s why:

Realistic testing conditions

Cloning the data ensures that the development environment is an exact replica of production, capturing all user interactions, edge cases, and data variations. This realism helps developers identify and address issues that would only surface under real-world conditions, leading to more reliable testing and fewer surprises during deployment.

Isolated development 

Each developer can work in their own isolated environment, eliminating conflicts and enabling parallel development. This isolation increases productivity and ensures that one developer’s changes do not impact others.

Rapid iterations

Quickly spinning up development environments means that developers can test, receive feedback, and refine their code more swiftly, accelerating the development cycle.

Efficient debugging

Many bugs are data-dependent and might not appear with simplified or synthetic test data. Using real production data helps uncover such bugs early in the development cycle, reducing the risk of issues slipping into production. Because the development environment holds the same data as production, issues are also easier to reproduce and fix, which improves overall code quality.

Performance testing

Real data volumes allow for accurate load, stress, and volume testing. Developers can measure and optimize the application’s performance under conditions that closely mirror actual production loads, leading to better resource management and user experience.

Data integrity and security

Ensuring data integrity during cloning means that the application’s behavior remains consistent across different environments. Additionally, Upsun offers mechanisms to scramble, encrypt, or anonymize sensitive data, as well as fine-grained access policies, maintaining security while allowing comprehensive testing and mitigating risk.
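As an illustration of the anonymization idea, here is a minimal sketch of one common approach: replacing email addresses in a cloned database with deterministic placeholders, so uniqueness and joins keep working while real addresses never reach developers. This is not Upsun's built-in mechanism. The `users` table, its `email` column, `anonymize_emails`, and the salt are all hypothetical, and SQLite stands in for whatever database the environment actually runs.

```python
import hashlib
import sqlite3


def anonymize_emails(conn: sqlite3.Connection, salt: str) -> int:
    """Replace every email in the hypothetical `users` table with a
    deterministic placeholder; returns the number of rows rewritten."""
    rows = conn.execute("SELECT id, email FROM users").fetchall()
    for user_id, email in rows:
        # Same input and salt always give the same placeholder, so
        # duplicates and lookups behave as they did before scrambling.
        digest = hashlib.sha256((salt + email).encode("utf-8")).hexdigest()[:12]
        conn.execute(
            "UPDATE users SET email = ? WHERE id = ?",
            (f"user-{digest}@example.invalid", user_id),
        )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Stand-in for a freshly cloned development database.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
    db.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [("alice@customer.com",), ("bob@customer.com",)],
    )
    print(anonymize_emails(db, salt="dev-only-salt"))  # 2
    print(db.execute("SELECT email FROM users ORDER BY id").fetchall())
```

A script like this would only ever run against non-production environments, right after the clone is created.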

Stakeholder collaboration

Preview environments with real, live data can be shared with stakeholders, expediting approvals, speeding up QA processes, and keeping projects on schedule. This real-time feedback loop enhances communication and alignment across teams.

Efficient disaster recovery

Instant data cloning also improves disaster recovery: the same snapshot mechanism allows quick restoration from backups.

Increased confidence in deployment

Knowing that the code has been tested with real data in an environment identical to production boosts confidence in the deployment process. This reduces the chances of post-deployment issues and downtime, ensuring smoother releases.

How it works: copy-on-write with Ceph

Upsun's instant data cloning relies on a smart copy-on-write mechanism built on Ceph RBD (RADOS Block Device) images, which use RADOS capabilities including snapshotting, replication, and strong consistency.

Here's a breakdown of how it works for Upsun's main operations:

Clone a production environment

Every time a production environment is cloned (called branching in Upsun, following Git terminology), a snapshot of its disk is taken. This snapshot only copies the metadata at that point in time, which makes the process highly efficient and independent of data size: Upsun can clone a 1 TB MySQL or PostgreSQL database instantly.

This snapshot serves as the foundation for the development environment.

From this snapshot, only the changes (writes) are subsequently stored on the development environment's disk. This means that the base environment remains unaltered, while any changes made in the development environment are recorded independently. This method ensures a fast, consistent, and accurate replication of the production environment without unnecessary duplication of data.
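To make the mechanism concrete, here is a toy in-memory model in Python. It is not Ceph's implementation or API; `Volume`, `clone`, and the integer block numbers are hypothetical names for illustration. A snapshot copies only the block map (references to existing data), and a clone stores only the blocks it writes, while reads of untouched blocks fall through to the shared snapshot.

```python
from types import MappingProxyType
from typing import Dict, Mapping, Optional


class Volume:
    """Toy copy-on-write block device (not Ceph's implementation)."""

    def __init__(self, base: Optional[Mapping[int, bytes]] = None) -> None:
        # Read-only block map shared with the parent snapshot.
        self.base: Mapping[int, bytes] = base if base is not None else MappingProxyType({})
        # Blocks written to this volume after it was created.
        self.overlay: Dict[int, bytes] = {}

    def read(self, block: int) -> Optional[bytes]:
        # Our own writes win; otherwise fall through to the shared snapshot.
        if block in self.overlay:
            return self.overlay[block]
        return self.base.get(block)

    def write(self, block: int, data: bytes) -> None:
        # Copy-on-write: the shared base is never modified.
        self.overlay[block] = data

    def snapshot(self) -> Mapping[int, bytes]:
        # Copies the block *map* (references to existing bytes objects),
        # not the block contents, then freezes it.
        merged = {**self.base, **self.overlay}
        return MappingProxyType(merged)


def clone(parent: Volume) -> Volume:
    """Branch a new environment from a point-in-time snapshot of `parent`."""
    return Volume(base=parent.snapshot())


if __name__ == "__main__":
    prod = Volume()
    prod.write(0, b"orders table v1")
    prod.write(1, b"users table v1")

    dev = clone(prod)
    dev.write(0, b"orders table, migrated")  # only this block is stored for dev

    print(prod.read(0))      # b'orders table v1'  (production unaffected)
    print(dev.read(0))       # b'orders table, migrated'
    print(dev.read(1))       # b'users table v1'   (shared with the snapshot)
    print(len(dev.overlay))  # 1
```

The clone's storage cost is its overlay, so it grows with what the development environment changes rather than with the size of the production dataset.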

Take and restore a backup

The same approach applies to backups. When creating a backup, Upsun takes a Ceph snapshot and then streams the snapshot's contents to an external bucket (such as S3). Because the snapshot freezes the disk at a single point in time, the backup can be restored with confidence to that exact state, guaranteeing full data integrity and recoverability.
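The snapshot-then-stream pattern can be sketched as follows. This is a simplified model under stated assumptions: a Python dict stands in for the S3-style bucket, `take_snapshot`, `upload`, `restore`, and `checksum` are hypothetical helpers, and the checksum is added here only to show how a restore could be verified; the original text does not describe one. The key point is that production keeps writing after the snapshot, yet the backup still reflects the snapshot moment.

```python
import hashlib
from types import MappingProxyType
from typing import Dict, Mapping


def take_snapshot(disk: Dict[int, bytes]) -> Mapping[int, bytes]:
    """Freeze the block map at this instant (references only, no data copy)."""
    return MappingProxyType(dict(disk))


def checksum(blocks: Mapping[int, bytes]) -> str:
    """Order-independent fingerprint of a set of blocks."""
    digest = hashlib.sha256()
    for block in sorted(blocks):
        digest.update(block.to_bytes(8, "big"))
        digest.update(len(blocks[block]).to_bytes(8, "big"))
        digest.update(blocks[block])
    return digest.hexdigest()


def upload(snap: Mapping[int, bytes], bucket: Dict[str, bytes], name: str) -> str:
    """Stream each snapshot block to the bucket; return the recorded checksum."""
    for block in sorted(snap):
        bucket[f"{name}/{block:08d}"] = snap[block]
    return checksum(snap)


def restore(bucket: Dict[str, bytes], name: str) -> Dict[int, bytes]:
    """Rebuild a disk from the objects stored under `name/`."""
    prefix = f"{name}/"
    return {
        int(key[len(prefix):]): data
        for key, data in bucket.items()
        if key.startswith(prefix)
    }


if __name__ == "__main__":
    disk = {0: b"orders v1", 1: b"users v1"}
    bucket: Dict[str, bytes] = {}  # stand-in for an S3-style object store

    snap = take_snapshot(disk)
    disk[0] = b"orders v2"         # production keeps writing meanwhile
    recorded = upload(snap, bucket, "nightly")

    restored = restore(bucket, "nightly")
    print(restored[0])                     # b'orders v1'
    print(checksum(restored) == recorded)  # True
```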

Refresh a development environment

When an existing development environment needs to be updated with production data (called synchronization in Upsun), a new snapshot of the production environment is taken and used as the development environment's new base. This brings the development environment back in line with the latest production data each time it is synchronized.
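A refresh can be modeled as swapping the base snapshot. In this minimal sketch, `DevVolume` and `refresh` are hypothetical names, and it assumes the development environment's own writes are discarded on refresh, since they were made against the old base; the text above does not spell out what happens to them.

```python
from types import MappingProxyType
from typing import Dict, Mapping, Optional


class DevVolume:
    """Toy development disk: a production snapshot plus local writes."""

    def __init__(self, base: Mapping[int, bytes]) -> None:
        self.base = base
        self.overlay: Dict[int, bytes] = {}

    def read(self, block: int) -> Optional[bytes]:
        # Local writes first, then the production snapshot.
        return self.overlay.get(block, self.base.get(block))

    def write(self, block: int, data: bytes) -> None:
        self.overlay[block] = data

    def refresh(self, production: Mapping[int, bytes]) -> None:
        # Take a new point-in-time snapshot of production (map copy only)...
        self.base = MappingProxyType(dict(production))
        # ...and drop local writes, which were made against the old base
        # (an assumption of this sketch).
        self.overlay.clear()


if __name__ == "__main__":
    production = {0: b"catalog v1"}
    dev = DevVolume(MappingProxyType(dict(production)))
    dev.write(1, b"test fixture")

    production[0] = b"catalog v2"  # production moves on
    print(dev.read(0))             # b'catalog v1' (stale)

    dev.refresh(production)
    print(dev.read(0))             # b'catalog v2'
    print(dev.read(1))             # None (local write discarded)
```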

The efficiency of snapshots

The beauty of this system lies in its efficiency. Snapshots are quick to make because they only involve copying metadata, and the copy-on-write mechanism ensures that only changes are stored, not entire datasets. This allows Upsun to offer incredibly fast cloning, backup, restore, and synchronization operations, effectively eliminating the delays and resource consumption associated with traditional methods such as exporting datasets or relying on external providers.
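A quick back-of-the-envelope comparison shows why storing only changes matters. The numbers here are illustrative assumptions, not Upsun measurements: a 1 TiB dataset, a development session that rewrites 2 GiB in 4 KiB writes, and an assumed 4 MiB block granularity, meaning a write anywhere in a block makes the clone keep its own copy of that whole block. Snapshot metadata is left out of the estimate.

```python
from typing import Iterable

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

# Assumed granularity for this estimate.
BLOCK_SIZE = 4 * MIB


def full_copy_bytes(dataset_bytes: int) -> int:
    """An export/import duplicates the entire dataset."""
    return dataset_bytes


def cow_clone_bytes(write_offsets: Iterable[int], block_size: int = BLOCK_SIZE) -> int:
    """A copy-on-write clone stores only the distinct blocks it has written."""
    touched = {offset // block_size for offset in write_offsets}
    return len(touched) * block_size


if __name__ == "__main__":
    dataset = 1 * TIB
    writes = range(0, 2 * GIB, 4096)  # 2 GiB rewritten in 4 KiB writes

    full = full_copy_bytes(dataset)
    cow = cow_clone_bytes(writes)
    print(cow // GIB)           # 2
    print(f"{cow / full:.2%}")  # 0.20%
```

Under these assumptions the clone stores about 0.2% of what a full copy would, and that cost tracks the volume of changes rather than the dataset size.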

In summary, Upsun’s use of copy-on-write with Ceph not only accelerates the creation and management of development environments but also ensures data accuracy and integrity. This advanced methodology is another factor that sets Upsun apart in the field of modern application development.

Automatic, instant data cloning for development environments

Upsun was designed from the ground up with instant data cloning as its core feature, providing an unmatched competitive advantage in the realm of software development. 

Instead of relying on cumbersome processes like data export and import that can take hours, or being limited to cloning static applications, Upsun delivers fast, reliable, and fully functional replicas of entire production environments in just minutes. 

Additionally, because Upsun provides databases and messaging services (such as PostgreSQL, Redis, and Kafka) itself instead of relying on third-party providers, it avoids the extra costs associated with bandwidth, data transfer, writes, and storage. This means development teams can focus on what they do best (developing, testing, and deploying high-quality software) without worrying about time delays, data inconsistencies, or spiraling costs.

In summary, Upsun’s instant data cloning solves the long-standing challenges of setting up and maintaining accurate and efficient development environments, making it an invaluable asset for any development team. With Upsun, you get faster setup times, more reliable testing conditions, lower costs, and an overall smoother development experience. 
