Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / MySQL

Anonymization of Patients' Data in MySQL

5.00/5 (2 votes)
28 Sep 2015CPOL2 min read 16.3K  
How to easily anonymize (or pseudonymize, depending on the depth of performing) patients' proper data in clear text in MySQL

Introduction

When it comes to testing large databases in MySQL containing personalized patient data in a readable clear text, one quickly must understand that there are legal limits set to working with non-pseudonymized or non-anonymized data irrespective of the existence of contracts with the data's keeper. Therefore, it is mandatory to at least pseudonymize but better anonymize these data.

Background

Certain countries (e.g. Germany, Austria, Switzerland, Scandinavia, etc.) do not permit the handling of personalized patient data by external contractors (as opposed to e.g. internal house administrators) under the premise that these data are not anonymized (or pseudonymized) in those parts which are (more or less and thus reconstructibly) human readable.

In these cases, it remains obligatory to anonymize (or pseudonymize) these data. Complex technical methodologies applicable not to every kind of database or other data handling unit stand against simple procedures on an SQL query level which do the job technically both adequately and are quickly realizable.

Whereas with some SQL software servers (e.g. ORACLE, PostgreSQL), there exists the so-called OVERLAY method ...

SQL
-- SELECT OVERLAY(Surname PLACING '*' FROM 2 FOR 15), FirstName FROM dat_patient_copy;

..., in many other database servers, you need to handle this coding scenario in a different way.

For MySQL 5, here is one way to handle this issue.

Using the Code

The following example assumes the existence of following database-associated items:

  • Database name: adipositas_anonym
  • Table name: "dat_patient"
  • SQL query procedure: UPDATE
  • Assumption: Keep the first letter of the respective string to be anonymized, replace everything thereafter with 15 '*' (the most abstract form of anonymization here would be to replace all characters by 'x', the fewer replacements are carried out, the more of a pseudonymization takes place per definitionem).
SQL
USE adipositas_anonym;

UPDATE dat_patient
	SET Surname = CONCAT(SUBSTRING(Surname,1,1), '*****************'),
 	FirstName = CONCAT(SUBSTRING(FirstName,1,1), '*****************'),
	CreatorPerson = 'Dr N. N.', -- global replacement for demo reasons
	CreatorHospital = 'Demo Hospital', -- global replacement for demo reasons
	CreatorDepartment = 'Demo Department', -- global replacement for demo reasons
	CreatorDatum = now(), -- that you know different timely versions of anonymizations
	BirthName = CONCAT(SUBSTRING(BirthName,1,1), '*****************'),
	PriorName = CONCAT(SUBSTRING(PriorName,1,1), '*****************'),
	Street = CONCAT(SUBSTRING(Street,1,1), '*****************'),
	Number = CONCAT(SUBSTRING(Number,1,1), '*****************'),
	PostBox = CONCAT(SUBSTRING(PostBox,1,1), '*****************'),
	Zip = CONCAT(SUBSTRING(Zip,1,1), '*****************'),
	Location = CONCAT(SUBSTRING(Location,1,1), '*****************'),
	TelNr01 = CONCAT(SUBSTRING(TelNr01,1,1), '*****************'),
	TelNr02 = CONCAT(SUBSTRING(TelNr02,1,1), '*****************'),
	Mobile = CONCAT(SUBSTRING(Mobile,1,1), '*****************'),
	email = CONCAT(SUBSTRING(email,1,1), '*****************'),
	KISPVS = CONCAT(SUBSTRING(KISPVS,1,1), '*****************')
	WHERE 1 = 1;

This surely is not a very cunning or even intellectual way to handle such an issue. But it works well, is quickly performed and helps you in an instant if you have to prove to have done things.

Points of Interest

This piece of SQL query code can easily be integrated into e.g. a PHP code which scrutinizes whether the parts of the database you want to check is really anonymized.

History

  • v1.0

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)