In this article, we will understand what Dynamic Data masking is and how it helps to eliminate all the chances of unmasked data getting revealed.
Introduction
At the time of writing this article, it’s been more than 2 months since the most secure and powerful SQL Server yet released for the public. It has many features which you can use to make your data and application more secure, more maintained and easily retrieved as per your requirements. It has introduced both client side security such as Always Encrypted as well as server side security such as Data masking, etc. In this post, we’ll examine how Dynamic Data Masking can help secure our data.
Earlier, we were masking the sensitive data by applying logic at application end or by replacing characters at SQL Server end and then returning the masked data to the application. In the former case, the risk was at:
- When application fails to mask data
- Sensitive data transferred over network
- All the users in SQL Server with
SELECT
permission still can access the unmasked data
In the later case, the first two risks are eliminated but still there is a chance that the sensitive data can be revealed by the SQL Server user through which the application retrieves data.
When application fails to mask data Sensitive data transferred over network - All the users in SQL Server with
SELECT
permission still can access the unmasked data
So, let’s quickly jump into the next section to understand what Dynamic Data masking is and how it helps to eliminate all these chances of unmasked data getting revealed.
Data Masking
If you search for the definition of Data Masking over the web, you’ll end up with varied descriptions. The simplest and straight forward definition can be:
“Data masking is the process of hiding original data with random characters or data.”
--wikipedia
If we look further, there are various data masking techniques available such as:
- Shuffling: Shuffling order of the characters in the value.
Example: 12345 -----> 35312 - Nulling: Substituting characters in the value with null (hash) symbol.
Example: 12345 -----> ###45 - Substitution: Substituting value with another value from a substitution table.
Example: Suvendu Giri -----> John Ptak - Masking Out: Masks the complete data or a selective part of the data.
Example: suvendu@mydomain.com -----> suvendu@xxxx.xxx
etc.
Considering many techniques of data masking, they can be categorized into three types:
- Static Data Masking: Original data in production changed to masked data
- On-the-fly Data masking: Copies data from one source to another source and masking done on the later
- Dynamic data masking: Masking done dynamically at runtime
Dynamic Data Masking
“Dynamic data masking limits sensitive data exposure by masking it to non-privileged users. Dynamic data masking helps prevent unauthorized access to sensitive data by enabling customers to designate how much of the sensitive data to reveal with minimal impact on the application layer. It’s a data protection feature that hides the sensitive data in the result set of a query over designated database fields, while the data in the database is not changed.”
--- MSDN
So, dynamic data masking alters the result for non-privileged users while streaming and not with data in the production database.
Data Masking in MySQL
As far I know, till the recent version of MySQL, there is no inbuilt mechanism to provide dynamic data masking capabilities. The possible options for MySQL users seems to be:
- Use 3rd party tools
- Use custom scripts/functions
Examples:
SELECT '****' AS PhoneNumber
FROM `Employees`
SELECT REPEAT('*', CHAR_LENGTH(PhoneNumber) - 6) AS PhoneNumber
FROM `Employees`
SELECT CONCAT(SUBSTR(PhoneNumber, 1, 4), _
REPEAT('*', CHAR_LENGTH(PhoneNumber) - 4)) AS PhoneNumber
FROM `Employees`
Data Masking in PostgreSQL
Same as MySQL. It seems they are still lacking this feature. However, we can make use of some custom scripts or user defined functions.
Azure SQL Database too supports the dynamic data masking similar to SQL Server and I have heard from my friends that Oracle provides some kind of mechanism for data masking, although I haven’t ever used it.
Dynamic Data Masking in SQL Server 2016
There are four masking functions available in SQL Server 2016 to provide separate ways to mask your data when they are returned by a query. These are:
Default
When you want all the characters to be masked, you can use Default masking function. The characters for masking and number of masked characters to be returned by a query is determined by the following logic:
- XXXX – four Xs if data length is more than or equal to four
- 0 in case of numeric datatype
- 0 in case of binary data type
- 01/01/1900 for datetime datatypes
Syntax:
MASKED WITH (FUNCTION = 'default()')
Creating table:
CREATE TABLE [dbo].[Employee](
[SecretCode] VARCHAR(10) MASKED WITH (FUNCTION = 'default()'),
Modifying column:
ALTER TABLE [dbo].[Employees]
ALTER COLUMN [SecretCode] ADD MASKED WITH (FUNCTION = 'default()');
Example:
Let’s check these functionalities using Table
variable.
DECLARE @Employees AS TABLE
(
SecretCode VARCHAR(10) MASKED WITH (FUNCTION = 'default()'),
NetSalary MONEY MASKED WITH (FUNCTION = 'default()'),
DateOfBirth DATETIME MASKED WITH (FUNCTION = 'default()')
)
INSERT INTO @Employees
SELECT 'ABCDEFGHIJ', 80000,'1990-05-21'
SELECT * FROM @Employees
Result:
SecretCode NetSalary DateOfBirth
xxxx 0.00 1900-01-01 00:00:00.000
NOTE: You need to check this with a non-admin user as admin users are granted to see all the data. So, if you will execute this using an admin user, then you’ll see no difference. Let’s say your non-admin user is ‘demouser
’, then add the following line at the top of the query.
EXECUTE AS USER='DemoUser'
Email
It partially masks the characters of an email id. The interesting thing to notice is, it masks characters as well as the length of the email id making it impossible to predict the email id from the few displayed characters.
Syntax:
MASKED WITH (FUNCTION = 'email()')
Example:
Let’s modify the same example to add an emailed column as follows:
DECLARE @Employees AS TABLE
(
SecretCode VARCHAR(10) MASKED WITH (FUNCTION = 'default()'),
NetSalary MONEY MASKED WITH (FUNCTION = 'default()'),
DateOfBirth DATETIME MASKED WITH (FUNCTION = 'default()'),
EmailID VARCHAR(50) MASKED WITH (FUNCTION = 'email()')
)
INSERT INTO @Employees
SELECT 'ABCDEFGHIJ', 80000,'1990-05-21', 'suvendugiri@mydomain.in'
SELECT * FROM @Employees
Result:
SecretCode NetSalary DateOfBirth EmailID
xxxx 0.00 1900-01-01 00:00:00.000 sXXX@XXXX.com
Partial
It masks the data based on custom inputs to the function. It masks the data partially. Three parameters need to be passed to this masked function such as:
Prefix
: Number of characters that has to be shown from the start Padding
: Characters that you need to show in between prefix and suffix Suffix
: Number of characters that has to be shown from the end
Syntax:
MASKED WITH (FUNCTION = 'partial(prefix,padding,suffix)')
Example:
Further adding a new column to the same example as:
DECLARE @Employees AS TABLE
(
SecretCode VARCHAR(10) MASKED WITH (FUNCTION = 'default()'),
NetSalary MONEY MASKED WITH (FUNCTION = 'default()'),
DateOfBirth DATETIME MASKED WITH (FUNCTION = 'default()'),
EmailID VARCHAR(50) MASKED WITH (FUNCTION = 'email()'),
FullName VARCHAR(50) MASKED WITH (FUNCTION = 'partial(2,"***",2)')
)
INSERT INTO @Employees
SELECT 'ABCDEFGHIJ', 80000,'1990-05-21', 'suvendugiri@mydomain.in','Suvendu Shekhar Giri'
SELECT * FROM @Employees
Result:
SecretCode NetSalary DateOfBirth EmailID FullName
xxxx 0.00 1900-01-01 00:00:00.000 sXXX@XXXX.com Su***ri
Random
Masks any numeric value with a random value based on the provided upper and lower boundary values as parameters to the masked function.
Syntax:
MASKED WITH (FUNCTION = ‘random(lower_bound,upper_bound)
Example:
DECLARE @Employees AS TABLE
(
SecretCode VARCHAR(10) MASKED WITH (FUNCTION = 'default()'),
NetSalary MONEY MASKED WITH (FUNCTION = 'default()'),
DateOfBirth DATETIME MASKED WITH (FUNCTION = 'default()'),
EmailID VARCHAR(50) MASKED WITH (FUNCTION = 'email()'),
FullName VARCHAR(50) MASKED WITH (FUNCTION = 'partial(2,"***",2)'),
Age INT MASKED WITH (FUNCTION = 'random(90,100)')
)
INSERT INTO @Employees
SELECT 'ABCDEFGHIJ', 80000,'1990-05-21', 'suvendugiri@mydomain.in','Suvendu Shekhar Giri',32
SELECT * FROM @Employees
Result:
SecretCode NetSalary DateOfBirth EmailID FullName Age
xxxx 0.00 1900-01-01 00:00:00.000 sXXX@XXXX.com Su***ri 93
If you want a user to see the values stored in database same as admin, you can grant access to the user using the following command:
GRANT UNMASK TO DemoUser
You can again apply the restriction to see the data masked by following command:
REVOKE UNMASK TO DemoUser
To remove the masked function from a column, you can drop it similar to other constraints like:
ALTER TABLE Employees
ALTER COLUMN FullName DROP MASKED
Conclusion
Dynamic Data Masking in SQL Server 2016 is really a promising feature and will help in avoiding writing so many custom scripts to mask the sensitive data. The role/access based viewing data makes perfect sense and is very useful. Currently, the access for a user can be given on a database. It would be great if there can be a mechanism to give this access based on table or schema.
Please leave your feedback, comments or suggestions in the comments section below.
Thanks for reading. :)
History
- 12th August, 2016: First published