Introduction
For legal reasons it is very important to encrypt table columns containing sensitive data, like SSNs. SQL Server 2005 allows you to encrypt data using different algorithms that employ symmetric and asymmetric keys. Alternatively, you can also use password-based encryption, where the password must be supplied by the client to encrypt/decrypt data. However, everything has a price, so we want to know how expensive this is.
With and without encryption
Let's prepare some test data, 100K "customers:"
SET NOCOUNT ON
go
create table Customers_Data (
name varchar(128) not null,
SSN varchar(10) not null)
go
declare @cnt int set @cnt=100000
while @cnt>0
begin
set @cnt=@cnt-1
insert into Customers_Data (name,SSN)
select 'Cust_'+convert(varchar,@cnt),
'010'+convert(varchar,@cnt)
end
go
Our artificially-generated fake SSNs are all unique. They have a format "010xxxxxx," but they are shorter in some cases. At first, let's perform our experiments without any encryption:
create table Customers (
id int identity primary key,
name varchar(128) not null,
SSN varchar(10) not null)
go
In our first experiment, we copy raw Customers_Data
into the target table and measure the elapsed time:
declare @t1 datetime, @cnt int
truncate table Customers
set @t1=getdate()
insert into Customers (name,SSN)
select * from Customers_Data
select datediff(ms,@t1,getdate())
go
In the second experiment, we find one record by the given SSN. This is done using a table scan comparing all SSNs. This test is repeated multiple times to make it more precise:
declare @t1 datetime, @name varchar(128), @cnt int
set @t1=getdate()
set @cnt=30
while @cnt>0 begin
set @cnt=@cnt-1
select @name=max(name) from Customers
with(index(0)) – forced table scan
where SSN='010'+convert(varchar,@cnt)
end
select datediff(ms,@t1,getdate())/30.
go
Keys
To encrypt data, we must create encryption keys. Here is an example:
create master key encryption by password = 'p@sswOrd'
go
create asymmetric key AsymKey With Algorithm = RSA_1024
go
Now if a database is stolen, encrypted data cannot be recovered because the master key is not backed up within a user database. When data is encrypted, the result is stored as varbinary(128)
. Encrypted data cannot be too long.
create table encr_Customers (
id int identity primary key,
name varchar(128) not null,
SSN varbinary(128))
go
Transparency and decryption
We want to make our encryption transparent to other SQL code and the front end. Our view customers must behave exactly like old table customers, except that truncate table
cannot be used.
create view Customers
as
select
id, name,
convert(varchar(10),
DecryptByAsymKey(AsymKey_ID('AsymKey'), SSN)) as SSN
from encr_Customers
go
So, when we read data we decrypt column SSN on the fly. How can we encrypt data when it is inserted into this view? We can use an instead of
trigger:
create trigger IT1 on Customers
instead of insert
as
begin
insert into encr_Customers (name,SSN)
select name,
EncryptByAsymKey(AsymKey_ID('AsymKey'),SSN)
from inserted
end
GO
We should create a similar trigger for the updates if the encrypted column is involved in the updates. Our view works exactly like a table from before, except that you cannot say truncate table Customers
; you should use truncate table encr_Customers
instead. We cannot make a password-based encryption transparent, as we would have to hardcode a password into a view. This would, of course, ruin the very idea of encryption.
Comparing keys and algorithms
Now we can compare different keys and different algorithms of encryption:
Key type
| Algorithm/length
| Table scan
| Test insert
|
Without Encryption
| | 2
| 93
|
AsymKey
| 1024
| 102656
| 2623
|
AsymKey
| 512
| 21716
| 1486
|
AsymKey
| 2048
| 747766
| 6563
|
SymKey
| DES
| 173
| 623
|
SymKey
| TRIPLE_DES
| 200
| 580
|
SymKey
| RC2
| 173
| 606
|
SymKey
| RC4
| 250
| 576
|
SymKey
| DESX
| 203
| 516
|
SymKey
| AES_128
| 173
| 736
|
SymKey
| AES_192
| 173
| 576
|
SymKey
| AES_256
| 186
| 610
|
Note that asymmetric keys are much longer to encrypt. This is especially the case when decrypting data. Now comes the funny part. Say we want to find a customer by SSN:
select * from Customers where SSN=@value
Usually, SSN is indexed and the record is found immediately. However, if you tried it in our encrypted view, you would see that SQL Server uses a table scan even if an index is created on encr_Customers.SSN. Why? It appears as though the statement above is equivalent to (in simplified code):
select * from encr_Customers where decrypt(SSN)=@value
If we rewrite this as...
select * from encr_Customers where SSN=encrypt(@value)
.. then it would work much faster and use an index, but it would not find our record! This happens because all encryption functions are non-deterministic every time they generate another value (values truncated):
select EncryptByAsymKey(AsymKey_ID('AsymKey'), 'This is my example')
select EncryptByAsymKey(AsymKey_ID('AsymKey'), 'This is my example')
select EncryptByAsymKey(AsymKey_ID('AsymKey'), 'This is my example')
(values truncated)
0x9FD1DDA8A6ACED67C0C24CDD566CAD655E7B7E3D76197896…
0xE18FFFB9EC75CD2093089A5DDB83220A244346F77AA548BF…
0x98D563BA855573A442A278B2565D9216192AD5BC7B664637…
For that reason, unfortunately, all indices on encrypted data are absolutely useless.
History
- 11 June, 2007 -- Original version posted