How to sanitize preview environment data

gdpr, data, activity scripts, automation
17 May 2024
Florent Huck
DevRel Engineer

Being GDPR-compliant across all your projects is a daily challenge, especially if you’re managing sensitive user data. Upsun adopts a GDPR-everywhere approach with high levels of built-in security and compliance as standard, but there are ways to further secure your data on our PaaS when it comes to preview environments.

Each time you create a new Git branch on a project on Upsun, the corresponding environment inherits the data (assets and database) from its parent. This means that potentially sensitive data from your production website could be exposed to the preview environment. 

So, how do you navigate this and ensure your application remains compliant? Two words: data sanitization, the deliberate and permanent erasure of sensitive data from a storage device so that the data cannot be recovered. In this article, I will share the methods of data sanitization that you can implement for preview environments to ensure that your data remains safe at every stage of development.

Some necessary resources before we start 

We have some prerequisites to ensure that you can follow the solutions and steps detailed in this article—please make sure you have installed the following: 

Methods for application data sanitization

For the purpose of this article, we are going to focus on preview environment data sanitization on Symfony. However, the methods detailed apply to all frameworks.

If you want more details on how to set up Symfony Demo applications on Upsun, take a look at our Up(sun) and running with Symfony Demo guide.

Throughout the article, we’re going to walk through five methods for sanitizing Symfony preview environment data on Upsun, so you can weigh them up and choose the best one for you. However, make sure that you complete the “create a command” step before proceeding with any method.

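If you already know the method you would prefer to use, go straight to the relevant section below:

  1. Manually sanitize your data
  2. Use environment inheritance
  3. Use a hook
  4. Use runtime operations and activity scripts
  5. Use shell scripts to sanitize development environments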

First things first, create a command to sanitize your data

Please note: if you’re hosting a stack other than Symfony, please adapt the command to sanitize your database in your stack and push it to your production branch. This is the only Symfony-specific step in this article.

To carry out any of the five data sanitization methods listed above, we need a callable to sanitize our environments. There are two possible ways to do so:

  1. Using an SQL script to update or fake all sensitive data
  2. Using a Symfony command to do it, perhaps using the FakerPHP library
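
For comparison, the first option could look roughly like the sketch below. It only prints an SQL statement and does not run it. The table name symfony_demo_user, the columns id, username and email, and the use of CONCAT() (available in MySQL and PostgreSQL) are assumptions to adapt to your own schema and database engine. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: prints an SQL statement that overwrites username and email.
# ASSUMPTIONS: a table "symfony_demo_user" with columns id, username, email,
# and a database whose SQL dialect supports CONCAT().

# build_sanitize_sql: print the UPDATE statement for a given table.
#   $1: Table name (hypothetical default: symfony_demo_user).
build_sanitize_sql() {
  table="${1:-symfony_demo_user}"
  printf "UPDATE %s SET username = CONCAT('user_', id), email = CONCAT('user_', id, '@example.com');\n" "$table"
}

# Example: build_sanitize_sql > sanitize.sql
```

Deriving the fake values from the row id keeps usernames and emails unique without needing any random data.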

Since we are using a Symfony Demo application, we will use the second option. Do the following, from the main Git branch:

symfony composer require --dev fakerphp/faker
git add composer.json composer.lock && git commit -m "composer require --dev fakerphp bundle" 

Then open your code in your favorite IDE and create a new Symfony command, in a src/Command/SanitizeDataCommand.php file, with the following:

<?php
/* src/Command/SanitizeDataCommand.php */

namespace App\Command;

use App\Entity\User;
use App\Repository\UserRepository;
use Doctrine\ORM\EntityManagerInterface;
use Faker;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;

#[AsCommand(
   name: 'app:sanitize-data',
   description: 'Sanitize user data (username and email).',
   aliases: ['app:sanitize']
)]
class SanitizeDataCommand extends Command
{
   private SymfonyStyle $io;

   public function __construct(private UserRepository $userRepository, private EntityManagerInterface $entityManager)
   {
       parent::__construct();
   }

   protected function configure(): void
   {
       $this
           ->setDescription('This command allows you to sanitize user data (username and email).');
   }

   protected function initialize(InputInterface $input, OutputInterface $output): void
   {
       $this->io = new SymfonyStyle($input, $output);
   }

   protected function execute(InputInterface $input, OutputInterface $output): int
   {
       $users = $this->userRepository->findAll();
       $this->io->progressStart(count($users));

       // initialize faker once, outside the loop
       $faker = Faker\Factory::create();

       $this->entityManager->getConnection()->beginTransaction(); // suspend auto-commit
       try {
           /** @var User $user */
           foreach ($users as $user) {
               $this->io->progressAdvance();

               $this->io->text('faking user '.$user->getUsername());
               // fake user info
               $user->setUsername(uniqid($faker->userName()));
               $user->setEmail($faker->email());
               // please adapt to your needs
           }   

           $this->entityManager->flush();
           $this->entityManager->getConnection()->commit();
           $this->io->progressFinish();
       } catch (\Exception $e) {
           $this->entityManager->getConnection()->rollBack();
           throw $e;
       }

       return Command::SUCCESS;
   }
}

This command app:sanitize-data uses the UserRepository and fakes username and email from the default User Symfony entity. Please adapt to your needs. Then push your code to the main branch:

git add src/Command/SanitizeDataCommand.php && git commit -m "sanitize data command" 
symfony deploy

1) Manually sanitize your data

Now that your source code contains a Symfony command to sanitize your data, we will use it manually in a new preview environment. Start by creating a new staging branch and wait for the process to finish, like so:

symfony branch staging --type=staging

Then, execute your newly-created Symfony command on your Upsun staging environment, as seen below:

symfony ssh -- php bin/console -e dev app:sanitize-data

Et voilà, your preview environment data is sanitized!

2) Use environment inheritance

In this section, we will create a preview environment, sanitize its data, and then make all new environments inherit from that preview environment.

As mentioned at the top of this article, each time you create a new Git branch on Upsun, the created environment will inherit data from the parent environment. However, it’s possible to change the default data inheritance and set it later to synchronize data from a new parent—the preview environment we will create. 

The Symfony CLI lets you create a branch without cloning its parent, using the --no-clone-parent option, and then set its parent to staging (a.k.a. preview) so that the new branch inherits the preview environment’s data. First follow the instructions in step 1 to sanitize the staging environment data manually, so that any future branches inherit sanitized, GDPR-compliant data.

symfony checkout main
symfony branch dev --no-clone-parent
symfony env:info -e dev parent staging
symfony sync -e dev data

And that’s it, your new dev environment is now created with sanitized data from your preview environment.
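
If you create such branches often, the four commands above can be wrapped in a small helper. This is a minimal sketch, not part of the Symfony CLI: the function name branch_from_sanitized and the DRY_RUN variable are hypothetical, and it assumes a sanitized staging environment already exists.

```shell
#!/bin/sh
# Sketch only: wraps the four commands above in one function.
# ASSUMPTIONS: the function name and the DRY_RUN variable are hypothetical;
# a sanitized "staging" environment already exists.

# branch_from_sanitized: create a branch that inherits data from a sanitized parent.
#   $1: New branch name (e.g. dev).
#   $2: Sanitized parent environment (defaults to staging).
branch_from_sanitized() {
  branch="$1"
  parent="${2:-staging}"
  # With DRY_RUN=1 the commands are only printed, not executed.
  run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

  run symfony checkout main &&
  run symfony branch "$branch" --no-clone-parent &&
  run symfony env:info -e "$branch" parent "$parent" &&
  run symfony sync -e "$branch" data
}

# Example: DRY_RUN=1 branch_from_sanitized dev
```

Chaining with && stops at the first failing command, so data is never synchronized into a branch whose parent was not switched to staging.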

3) Use a hook

Rather than relying on inheritance, it may be desirable to sanitize certain data on each deployment. In this case, we can move our script call to the hooks section of the configuration. 

The type of hook you choose is up to you—deploy or post_deploy hooks—but here are a few things to keep in mind:

  • A long-running script within the deploy hook will extend the deployment time of an application.
  • A long-running script within the post_deploy hook could make non-compliant/critical data momentarily public while the sanitization is taking place.
  • Redeploys: if sanitization is something you’d like to be able to manually trigger with a redeploy, sanitizing will take place on each redeploy only if placed in the post_deploy hook.

To execute a Symfony command during the post_deploy hook, add the following in your .upsun/config.yaml:

applications:
  app:
    hooks:
      build: ...
      deploy: ...

      post_deploy: |
        if [ "$PLATFORM_ENVIRONMENT_TYPE" != production ]; then
          # The sanitization of the database should happen here (since it's non-production)
          php bin/console -e dev app:sanitize-data
        fi

Then push your code to the main branch:

git checkout main && git add .upsun/config.yaml && git commit -m "add sanitize data command to post_deploy hook" 
symfony deploy
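
The production guard used in the hook above can also live in a small reusable function, which makes it easier to test outside a deployment. This is a minimal sketch: the function name sanitize_if_preview and the SANITIZE_CMD override are hypothetical, and treating an unset environment type as an error is a design choice of this sketch rather than Upsun behavior.

```shell
#!/bin/sh
# Sketch only: the same production guard as the hook, as a reusable function.
# ASSUMPTIONS: the function name and the SANITIZE_CMD override are hypothetical.

# sanitize_if_preview: run the sanitize command unless the environment is production.
sanitize_if_preview() {
  # Refuse to guess: an unset environment type is treated as unsafe.
  if [ -z "${PLATFORM_ENVIRONMENT_TYPE:-}" ]; then
    echo "PLATFORM_ENVIRONMENT_TYPE is not set, skipping." >&2
    return 1
  fi
  if [ "$PLATFORM_ENVIRONMENT_TYPE" = production ]; then
    echo "Production environment, skipping sanitization."
    return 0
  fi
  # SANITIZE_CMD lets you swap the command for another stack or for testing.
  ${SANITIZE_CMD:-php bin/console -e dev app:sanitize-data}
}

# Example: PLATFORM_ENVIRONMENT_TYPE=staging SANITIZE_CMD="echo sanitized" sanitize_if_preview
```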

4) Use runtime operations and activity scripts

There is another option that allows you to create a custom trigger that is run in response to certain activities that take place on the project. This matters because, when synchronizing an environment with its parent, we could sync back non-anonymized data from the parent (e.g., if synchronizing from the production environment), so the sanitization has to run again after each of these activities.

The two components that will make this work are:

  • A runtime operation: will allow you to trigger one-off commands or scripts on your project. Similar to crons, they run in the application container but not on a specific schedule.
  • An activity script: a piece of JavaScript code that will be run in response to certain activities taking place at the project, environment, or even organization level.

So we will add an integration (activity script) that responds to certain events by executing a runtime operation to sanitize data on the fly. See the “Add an integration of an activity script” section below.

Please note: if you set a post_deploy hook from the previous step, please comment it out as it would not be needed anymore after using this runtime operation.

How to create a runtime operation

To configure a runtime operation, we need to add a new operations key under our application in our .upsun/config.yaml file with the following:

applications:
  app: 
    operations:
      sanitize:
        role: admin
        commands:
          start: |
            if [ "$PLATFORM_ENVIRONMENT_TYPE" != production ]; then
              # The sanitization of the database should happen here (since it's non-production)
              php bin/console -e dev app:sanitize-data
            fi

Then push your file to the main branch and deploy. 

git checkout main 
git add .upsun/config.yaml && git commit -m "add runtime operation to sanitize data"
symfony deploy

And if you want to test this runtime operation manually, you can use the following:

symfony operation:run sanitize --app=app

How to create an activity script

Upsun supports custom scripts that can fire in response to any activity. This script is executed outside of the environment context and so, we need to re-create this context for the activity script to be executed with the necessary rights. To do so, create a new file src/runtime/sanitize.js with the following: 

// src/runtime/sanitize.js
let app_container = "app";
let runtime_operation_name = "sanitize";

if (!variables.api_token) {
    console.log("Variable API Token is not defined!");
    console.log("Please define an environment variable with your API Token using command: ");
    console.log("upsun project:curl /integrations/<INTEGRATION_ID>/variables -X POST -d '{\"name\": \"api_token\", \"value\": \"<API_TOKEN>\", \"is_sensitive\": true, \"is_json\": false}' ");
} else {
    console.log("OAuth2 API Token defined");
    let resp = fetch('https://auth.api.platform.sh/oauth2/token', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/x-www-form-urlencoded'
        },
        body: "client_id=platform-api-user&grant_type=api_token&api_token=" + variables.api_token
    });

    if (!resp.ok) {
        // stop here: without an access token, the operation call below cannot succeed
        throw new Error("Failed to get an OAuth2 token, status code was " + resp.status);
    }
    console.log("OAuth2 API TOKEN ok");

    let access_token = resp.json().access_token;

    // get current branch from activity object
    let branch;
    switch (activity.type) {
        case 'environment.synchronize':
            branch = activity.parameters.into;
            break;
        case 'environment.branch':
        case 'environment.activate':
            branch = activity.parameters.environment;
            break;
    }

    // run runtime operation runtime_operation_name on current/targeted environment
    resp = fetch("https://api.upsun.com/api/projects/" + activity.project + "/environments/" + branch + "/deployments/current/operations",
        {
            headers: {
                "Authorization": "Bearer " + access_token
            },
            method: "POST",
            body: JSON.stringify({"service": app_container, "operation": runtime_operation_name}),
        });

    if (!resp.ok) {
        console.log("Failed to invoke the runtime operation, status code was " + resp.status);
    } else {
        console.log(runtime_operation_name + " launched");
    }
}

This activity script reads an API Token from a variable named api_token, exchanges it for an OAuth2 access token, and then uses the Upsun API to execute the previously defined runtime operation on the targeted environment. The variable belongs to the integration of our activity script, so we first need to add that integration and then add the API Token variable to it.

Then push your file to the main branch and deploy, like so: 

symfony checkout main 
git add src/runtime/sanitize.js
git commit -m "add activity script"
symfony deploy

Add an integration of an activity script

Three Upsun events should trigger this runtime operation: 

  • When creating a new branch (environment.branch), 
  • When synchronization of data between environments occurs (environment.synchronize), 
  • When activating a preview environment (environment.activate). 

To implement these triggers, use this command in your terminal to add an activity script integration.

symfony integration:add --type script --file ./src/runtime/sanitize.js --events environment.branch,environment.synchronize,environment.activate --states complete --environments \*

Please note: a complete list of possible events is available in the Activity script type definition. Any of those activity types can be passed as a comma-separated list in the --events=event1,event2,... option.

Add an API Token environment variable

First, get the previous integration ID using the following command: 

symfony integration:list

Then, create a new API Token from the Console, keep the value to hand, and insert it into this terminal command:

symfony project:curl /integrations/<INTEGRATION_ID>/variables -X POST -d '{"name": "api_token", "value": "<API_TOKEN>", "is_sensitive": true, "is_json": false}'

Please note: replace <INTEGRATION_ID> and <API_TOKEN> with the corresponding values previously created.
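
To avoid typing the token inline, the JSON body can be built from a shell variable. This is a minimal sketch: the function name api_token_payload and the API_TOKEN variable are hypothetical, and it assumes the token contains no double quotes or backslashes, since the value is inserted into the JSON as-is.

```shell
#!/bin/sh
# Sketch only: build the JSON body from a shell variable.
# ASSUMPTIONS: the function name is hypothetical; the token contains no
# double quotes or backslashes (it is inserted into the JSON unescaped).

# api_token_payload: print the JSON body for the integration variable.
#   $1: API token value.
api_token_payload() {
  printf '{"name": "api_token", "value": "%s", "is_sensitive": true, "is_json": false}\n' "$1"
}

# Example, with API_TOKEN set in your shell beforehand:
#   symfony project:curl /integrations/<INTEGRATION_ID>/variables -X POST -d "$(api_token_payload "$API_TOKEN")"
```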

You can verify that the variable has been created with this command:

symfony project:curl /integrations/<INTEGRATION_ID>/variables

Time to test

To test if everything has worked, use the Console or the CLI to create a new branch from main, trigger a sync, or deactivate and reactivate your preview environment. Each time, you should see two activities:

  • Activity triggered
  • A runtime operation activity

Run into a problem? Debug it

If you encounter a problem and want to debug the activity script integration, you need to use the following command:

symfony integration:activity:log <INTEGRATION_ID>

When adding the integration of your activity script, the corresponding script is added in memory on the Upsun side. This means that each time you update your script, you need to update the cached version of the file, using the following command:

symfony integration:update <INTEGRATION_ID> --file ./src/runtime/sanitize.js

Please note: of course, to keep your source code up-to-date, you would need to commit this file:

git add src/runtime/sanitize.js
git commit -m "add activity script"
symfony deploy # optional

5) How to use shell scripts to sanitize development environments

It’s possible to use a shell script to automate the data sanitization of all of your environments, except production, for all your projects within an organization. To use this shell script, please ensure that the source code of every environment in every project of your organization contains the Symfony command to sanitize data before working through the following steps.

The first step is to create a file named fleet_sanitizer.sh with the following code:  

if [ -n "$ZSH_VERSION" ]; then emulate -L ksh; fi
######################################################
# fleet sanitization demo script, using the CLI.
# 
# Enables the following workflow on a given project and sanitizes preview environments (staging, new-feature and auto-updates):
# .
# └── main
#     ├── staging
#     |   └── new-feature
#     └── auto-updates
#
# Usage
# 1. source this script: `. fleet_sanitizer.sh` or `source fleet_sanitizer.sh` depending on your shell
# 2. define ORGANIZATION var: ORGANIZATION=<organizationIdentifier>
# 3. run `sanitize_organization_data $ORGANIZATION`
######################################################

# Utility functions.

# list_org_projects: Print list of projects operation will be applied to before starting.
#   $1: Organization, as it appears in console.upsun.com.
list_org_projects() {
  symfony project:list -o $1 --columns="ID, Title"
}

# get_org_projects: Retrieve an array of project IDs for a given organization.
#   Note: Makes array variable PROJECTS available to subsequent scripts.
#   $1: Organization, as it appears in console.upsun.com.
get_org_projects() {
  PROJECTS_LIST=$(symfony project:list -o $1 --pipe)
  PROJECTS=($PROJECTS_LIST)
}

# get_project_envs: Retrieve an array of envs IDs for a project.
#   Note: Makes array variable ENVS available to subsequent scripts.
#   $1: ProjectId, as it appears in console.upsun.com.
get_project_envs() {
  ENV_LIST=$(symfony environment:list -p $1 --pipe)
  ENVS=($ENV_LIST)
}

# list_project_envs: Print list of envs operation will be applied to before starting.
#   $1: ProjectId, as it appears in console.upsun.com.
list_project_envs() {
  symfony environment:list -p $1
}

# add_env_var: Add environment level environment variable.
#   $1: Variable name.
#   $2: Variable value.
#   $3: Target project ID.
#   $4: Target environment ID.
add_env_var() {
  VAR_STATUS=$(symfony project:curl -p $3 /environments/$4/variables/env:$1 | jq '.status')
  if [ "$VAR_STATUS" != "null" ]; then
    symfony variable:create --name $1 --value "$2" --prefix env: --project $3 --environment $4 --level environment --json false --sensitive false --visible-build true --visible-runtime true --enabled true --inheritable true -q
  else
    printf "\nVariable $1 already exists. Skipping."
  fi
}

# Main functions.
sanitize_organization_data() {
  list_org_projects $1
  get_org_projects $1
  for PROJECT in "${PROJECTS[@]}"; do
    printf "\n### Project $PROJECT."
    # get environments list
    list_project_envs $PROJECT
    get_project_envs $PROJECT
    for ENVIRONMENT in "${ENVS[@]}"; do
      # reset variables from the previous iteration (unset -f would only remove functions)
      unset ENV_CHECK
      ENV_CHECK=$(symfony project:curl -p $PROJECT /environments/$ENVIRONMENT | jq -r '.status')
      unset ENV_TYPE
      ENV_TYPE=$(symfony project:curl -p $PROJECT /environments/$ENVIRONMENT | jq -r '.type')

      if [ "$ENV_CHECK" = active ] && [ "$ENV_TYPE" != production ]; then
        unset DATA_SANITIZED
        DATA_SANITIZED=$(symfony variable:get -p $PROJECT -e $ENVIRONMENT env:DATA_SANITIZED --property=value)
        if [ "$DATA_SANITIZED" != true ]; then
          printf "\nEnvironment $ENVIRONMENT exists and is not sanitized yet. Sanitizing data."
          printf "\n"
          # do sanitization here
          symfony ssh -p $PROJECT -e $ENVIRONMENT -- php bin/console app:sanitize-data
          printf "\nSanitizing data is finished, redeploying"
          add_env_var DATA_SANITIZED true $PROJECT $ENVIRONMENT
        else
          printf "\nEnvironment $ENVIRONMENT exists and does not need to be sanitized. skipping."
        fi
      elif [ "$ENV_TYPE" = production ]; then
        printf "\nEnvironment $ENVIRONMENT is a production environment, skipping."
      else
        printf "\nEnvironment $ENVIRONMENT is not active (status: $ENV_CHECK), skipping."
      fi
    done
  done
}

Please note: in this script, each time we sanitize an environment, we set environment variable DATA_SANITIZED to be sure that the next time we run this script, it will not sanitize the environment repeatedly.
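
The skip-or-sanitize decision described above can be isolated into a single function, which makes it easy to check without calling the CLI. This is a minimal sketch of the same three conditions used in the script. The function name needs_sanitizing is hypothetical.

```shell
#!/bin/sh
# Sketch only: the skip/sanitize decision from the script above, isolated.
# ASSUMPTION: the function name needs_sanitizing is hypothetical.

# needs_sanitizing: succeed (exit 0) only when an environment should be sanitized.
#   $1: Environment status (e.g. active).
#   $2: Environment type (e.g. production, staging, development).
#   $3: Current value of the DATA_SANITIZED variable (may be empty).
needs_sanitizing() {
  [ "$1" = active ] || return 1        # inactive environments have no running app
  [ "$2" != production ] || return 1   # never touch production data
  [ "$3" != true ] || return 1         # already sanitized on a previous run
  return 0
}

# Example: needs_sanitizing active staging "" && echo "sanitize"
```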

Then run the script. You may need to adapt it to the machine you run it on, but it should look something like this:

. fleet_sanitizer.sh  # or source fleet_sanitizer.sh
ORGANIZATION=<organizationIdentifier>
sanitize_organization_data $ORGANIZATION

Tip: you will find the organization identifier for a specific project by clicking on your name, and then on settings in the top right corner of the screen.

And just like that, your data is sanitized and you're well on your way to GDPR compliance! 

If you have any further questions about our security and compliance capabilities or encounter any issues with the methods and/or steps above, reach out to our support team who’ll be happy to help. 

Stay up-to-date on all the latest from us over on our social media and community channels. Catch us over on Dev.to, Reddit, and Discord.

FAQ

What should I do if I encounter issues with data sanitization scripts in Upsun?
If you encounter issues with data sanitization scripts in Upsun, start by checking the logs and outputs for any error messages or indications of what might be going wrong. Ensure that all dependencies and permissions are correctly set up. You can also use the Upsun support resources, including documentation and community forums, to find solutions to common issues. If the problem persists, reach out to Upsun's support team for assistance, providing detailed information about the issue and any error messages received.
