E-mail anonymization in Firebird

29 09 2011

I worked recently with Firebird database. I want to share my experience in one task I had to do: anonymization of e-mails.

To anonymize all e-mails in a table run in Firebird one of the following SQL expressions depending on what output is desired.

Let’s assume we have two tables MEMBRE and SOCIETE (in French: member and company). Both have an email column. MEMBRE has NOT NULL and UNIQUE clauses on that column. SOCIETE is not restricted.

In my case I wanted to be able to see if e-mails were actually sent to the anonymized addresses. That’s why I change domain names to @yopmail.com. See www.yopmail.com for more details about the service.

Scenario 1

update membre set email=substring(email from 1 for position('@' in email)) || 'yopmail.com';

update societe set email=substring(email from 1 for position('@' in email)) || 'yopmail.com' where email is not null and email like '%@%';

Will do the following change: john.doe@company.com => john.doe@yopmail.com

The simplest solution but may cause a data violation error if the same username is used in different domains, e.g. john.doe@company.com and john.doe@societe.fr

Scenario 2

create sequence anonymizator;

update membre set email='mem' || next value for anonymizator || '@yopmail.com';

update societe set email='soc' || next value for anonymizator || '@yopmail.com' where email is not null and email like '%@%';

drop sequence anonymizator;

Will do the following change: john.doe@company.com => mem1@yopmail.com

Requires a sequence which is created before and deleted after the operation. ‘1’ in the result address will be increased for subsequent rows. Completely erases any information of previous e-mail value so it really anonymizes data.

Scenario 3

update membre set email=substring(email from 1 for position('@' in email)-1) || '_' || substring(email from position('@' in email)+1) || '@yopmail.com';

update societe set email=substring(email from 1 for position('@' in email)-1) || '_' || substring(email from position('@' in email)+1) || '@yopmail.com' where email is not null and email like '%@%';

Will do the following change: john.doe@company.com => john.doe_company.com@yopmail.com

The most sophisticated but keeps information of initial e-mail value. So doesn’t really anonymize but successfully prevents sending e-mail messages to affected addresses.

Note: Anonymizing e-mails may increase its length and therefore may cause an error during execution. When the database replies

arithmetic exception, numeric overflow, or string truncation

it means that a new value probably exceeds the length of field being changed. The table can be changed with this SQL:

alter table membre alter column email type varchar(100);