GDPR compliant Magento 2 database dump

On May 25th, 2018, a data privacy law known as GDPR came into effect. It affects the way companies collect and handle user data. In this article we will show you how to handle personal user data when creating database dumps in order to avoid potential GDPR penalties.

What is GDPR?

In short, the General Data Protection Regulation (GDPR) is a set of rules that regulates how the personal data of EU citizens must be managed, giving those citizens more control over it. Organizations have to make sure that personal data is gathered legally, managed strictly and respected. Only the data that is actually needed should be collected and processed.

You can find out what Inchoo did to prepare for GDPR in our blog post by Toni Anicic.

Personal data in a Magento 2 project

Developing a Magento 2 project often requires data from a live Magento 2 website. Usually this means creating a copy of the website's database structure and data, also known as a database dump. This database dump might include tables with personal user data such as names, addresses, emails, orders, invoices, etc. Having access to personal user data when it's not needed is considered bad practice, as the data might get lost, stolen or become available to people it was never intended for. Handling user data this way can lead to significant GDPR penalties. Luckily, there is a way to avoid this by using a CLI tool called n98-magerun2.

Netz98 magerun CLI tools

N98-magerun2 provides some handy tools for working with Magento 2 from the command line. One of them is the database dump tool.

Installation

There are multiple ways to install n98-magerun2 in a Magento 2 project. We can download and install the phar file and make it executable or we can install the tool using Composer.

Install the phar file

To download the latest stable n98-magerun2.phar file, run this command in your Magento 2 project:

wget https://files.magerun.net/n98-magerun2.phar

Or, if you prefer to use curl:

curl -O https://files.magerun.net/n98-magerun2.phar

Verify the download by comparing the SHA256 checksum with the one on the website:

shasum -a256 n98-magerun2.phar
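The comparison can also be scripted instead of done by eye. The following is a minimal sketch, assuming the expected checksum has been copied by hand from the download page. The function name verify_sha256 is made up for this example and is not part of n98-magerun2.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED
# Returns 0 when the SHA256 checksum of FILE equals EXPECTED, non-zero otherwise.
# verify_sha256 is a hypothetical helper, not an n98-magerun2 command.
verify_sha256() {
  local file="$1" expected actual
  # Normalize the expected value to lowercase hex
  expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
  if command -v sha256sum >/dev/null 2>&1; then
    actual="$(sha256sum "$file" | awk '{print $1}')"
  else
    actual="$(shasum -a 256 "$file" | awk '{print $1}')"
  fi
  [ "$actual" = "$expected" ]
}

# Usage (replace the placeholder with the checksum shown on the website):
# verify_sha256 n98-magerun2.phar "<checksum from the website>" \
#   && echo "checksum OK" || echo "checksum MISMATCH"
```

If the function reports a mismatch, the safest reaction is to delete the file and download it again.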

We can make the phar file executable:

chmod +x ./n98-magerun2.phar

It can now be called by using the PHP CLI interpreter:

php n98-magerun2.phar {command}

Install with Composer

To install n98-magerun2 using Composer, inside a Magento 2 project run:

composer require n98/magerun2

If there is an error, try:

composer require --no-update n98/magerun2
composer update

N98-magerun2 commands are executed from the vendor/bin/ folder. To verify the installation, run:

./vendor/bin/n98-magerun2 --version

Database dump

The db:dump command is used to dump the project database. It uses mysqldump.

./vendor/bin/n98-magerun2 db:dump

This command will create a file containing the database structure and all of the data.

Stripped database dump

The db:dump command has a --strip option that can be used to strip the data from specific tables, so that only their structure ends up in the dump.

./vendor/bin/n98-magerun2 db:dump [--strip]

Tables whose data we want to leave out can be passed to the --strip option. The table names should be separated by spaces. Wildcards like * and ? can be used in the table names to strip multiple tables.

./vendor/bin/n98-magerun2 db:dump --strip="customer_address* sales_invoice_*"
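To get a feel for which tables a pattern would catch, the wildcards can be tried against a few table names. The sketch below uses the shell's own glob matching as a stand-in. It assumes that * matches any run of characters and ? exactly one character, and it is not n98-magerun2's actual matching code. The function name matches_strip_pattern is made up for this example.

```shell
#!/usr/bin/env bash
# matches_strip_pattern TABLE PATTERNS
# PATTERNS is a space separated list such as "customer_address* sales_invoice_*".
# Returns 0 if TABLE matches at least one pattern, 1 otherwise.
# Hypothetical helper that only imitates the wildcard behaviour with shell globs.
matches_strip_pattern() {
  local table="$1" pattern
  local -a patterns
  # read -a splits on whitespace without expanding * against local files
  read -r -a patterns <<< "$2"
  for pattern in "${patterns[@]}"; do
    # An unquoted right-hand side of == inside [[ ]] is treated as a glob pattern
    if [[ "$table" == $pattern ]]; then
      return 0
    fi
  done
  return 1
}

for table in customer_address_entity sales_invoice_item sales_invoice catalog_product_entity; do
  if matches_strip_pattern "$table" "customer_address* sales_invoice_*"; then
    echo "$table: stripped"
  else
    echo "$table: kept"
  fi
done
```

Under these assumptions sales_invoice itself is reported as kept, because sales_invoice_* requires the trailing underscore. To cover both, the plain table name has to be listed next to the wildcard pattern.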

Table groups

Predefined table groups, whose names start with @, can also be used in the --strip option. Each group contains a list of tables whose data will be left out of the dump when the group is passed to --strip.

./vendor/bin/n98-magerun2 db:dump --strip="@stripped"

The table groups are predefined in the config.yaml file either in the vendor/n98/magerun2/ folder or in the n98-magerun2.phar package.

Available table groups:

@customers
	customer_address*
	customer_entity*
	customer_grid_flat
	customer_log
	customer_visitor
	newsletter_subscriber
	product_alert*
	vault_payment_token*
	wishlist*
 
@search
	catalogsearch_*
 
@sessions
	core_session
 
@log
	log_url
	log_url_info
	log_visitor
	log_visitor_info
	log_visitor_online
	report_event
	report_compared_product_index
	report_viewed_*
 
@quotes
	quote
	quote_*
 
@sales
	sales_order
	sales_order_address
	sales_order_aggregated_created
	sales_order_aggregated_updated
	sales_order_grid
	sales_order_item
	sales_order_payment
	sales_order_status_history
	sales_order_tax
	sales_order_tax_item
	sales_invoice
	sales_invoice_*
	sales_invoiced_*
	sales_shipment
	sales_shipment_*
	sales_shipping_*
	sales_creditmemo
	sales_creditmemo_*
	sales_recurring_*
	sales_refunded_*
	sales_payment_*
	enterprise_sales_*
	enterprise_customer_sales_*
	sales_bestsellers_*
	paypal_billing_agreement*
	paypal_payment_transaction
	paypal_settlement_report*
 
@admin
	admin*
	authorization*
 
@trade
	@customers
	@sales
	@quotes
 
@stripped
	@log
	@sessions
 
@development
	@admin
	@trade
	@stripped
	@search

The @development table group includes other predefined table groups that contain sensitive user data such as logs, sessions, trade data, admin users, orders, invoices, credit memos, quotes, etc. This table group should be used when personal user data is not needed.
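Because groups can contain other groups, it can help to see how a nested group unfolds into plain table patterns. The sketch below illustrates the idea for @stripped, using the group contents from the list above. How n98-magerun2 resolves groups internally is not covered here, so the function names group_members and expand_group are made up for this example, and only a few groups are included to keep it short.

```shell
#!/usr/bin/env bash
# group_members GROUP
# Prints the direct members of a table group, copied from the list above.
# Hypothetical helper; only a subset of the groups is included.
group_members() {
  case "$1" in
    @log)      echo "log_url log_url_info log_visitor log_visitor_info log_visitor_online report_event report_compared_product_index report_viewed_*" ;;
    @sessions) echo "core_session" ;;
    @search)   echo "catalogsearch_*" ;;
    @admin)    echo "admin* authorization*" ;;
    @stripped) echo "@log @sessions" ;;
    *)         echo "unknown group: $1" >&2; return 1 ;;
  esac
}

# expand_group GROUP
# Recursively resolves nested @group references and prints one table pattern per line.
expand_group() {
  local member members_line
  local -a members
  members_line="$(group_members "$1")" || return 1
  # read -a splits on whitespace without expanding wildcards
  read -r -a members <<< "$members_line"
  for member in "${members[@]}"; do
    if [[ "$member" == @* ]]; then
      expand_group "$member" || return 1
    else
      echo "$member"
    fi
  done
}

expand_group @stripped
```

For @stripped this prints the eight @log patterns followed by core_session. Adding the remaining groups to group_members would let the same function unfold @development.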

The @development table group should take care of all customer data for a default Magento 2 project, but in many cases, a Magento 2 project will contain modules that create their own data tables. For customer data contained in tables not defined by the default Magento 2 project installation, custom table groups should be defined.

Custom table groups

In addition to the predefined table groups, custom table groups can also be defined. To define a custom table group, create an n98-magerun2.yaml file inside the app/etc/ folder of the Magento 2 project. The file should contain the following lines:

# app/etc/n98-magerun2.yaml
# ...
commands:
  N98\Magento\Command\Database\DumpCommand:
    table-groups:
      - id: table_group_name
        description: table group description
        tables: space separated list of tables
# ...

This will create a new table group @table_group_name that can be used in the --strip option to exclude the data specified inside this group.
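As a concrete illustration, the sketch below writes such a file for an imaginary module. The group id acme_reviews, the acme_review tables and the function name write_table_group_config are all hypothetical. The sketch also assumes that a custom group can be combined with a predefined one in a single space separated --strip value.

```shell
#!/usr/bin/env bash
# write_table_group_config DIR
# Writes a sample n98-magerun2.yaml into DIR (app/etc in a real project).
# Refuses to overwrite an existing file.
# The group id and table names are hypothetical examples.
write_table_group_config() {
  local dir="$1" file
  file="$dir/n98-magerun2.yaml"
  if [ -e "$file" ]; then
    echo "refusing to overwrite $file" >&2
    return 1
  fi
  mkdir -p "$dir" || return 1
  # The quoted heredoc delimiter keeps the backslashes in the class name literal
  cat > "$file" <<'YAML'
commands:
  N98\Magento\Command\Database\DumpCommand:
    table-groups:
      - id: acme_reviews
        description: Customer data stored by the hypothetical Acme reviews module
        tables: acme_review acme_review_*
YAML
}

# Usage, from the Magento 2 project root:
# write_table_group_config app/etc
# ./vendor/bin/n98-magerun2 db:dump --strip="@development @acme_reviews"
```

Keeping the file in the project's app/etc/ folder means the group definition can be versioned with the project, so every developer strips the same tables.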

This way we can strip all of the personal user data that we do not need, making sure that the database dump is GDPR compliant and that the personal user data is never available if not needed.
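Before a dump is shared, it may be worth checking that the stripped tables really contain no rows. The sketch below lists every table that still has INSERT statements in the dump file. It assumes an uncompressed SQL dump in the usual mysqldump format, where rows are written as INSERT INTO `table_name` statements. The function name tables_with_data is made up for this example.

```shell
#!/usr/bin/env bash
# tables_with_data DUMP_FILE
# Prints the name of each table that has at least one INSERT statement in the dump.
# Hypothetical helper; assumes mysqldump-style "INSERT INTO `table`" lines.
tables_with_data() {
  grep -o '^INSERT INTO `[^`]*`' "$1" \
    | sed 's/^INSERT INTO `//; s/`$//' \
    | sort -u
}

# Usage: any output here means customer, sales or quote data is still in the dump
# tables_with_data dump.sql | grep -E '^(customer_|sales_|quote)'
```

A table that appears in this list still carries data, so it either needs to be added to the --strip value or covered by a custom table group.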