Introduction
In order to avoid surprises while investigating performance issues because of bad database design at client side environment, we should have a review process of baseline database of high end products. Just to avoid confusion, baseline database is a product database that is specifically tailored for each client implementation. To give you a fairly visible idea of what we are saying here is that if a product database misses key performance index, then this index will get missed on every new implementation later on. These review processes make sure that over implementation quality standards have been met.
This tip can act as a baseline script which when executed can highlight violations against defined standard. So the rules are important. Let’s see how we can create rules using SQL Server/Oracle meta views.
Rules
Rules are simply defined standards that must be followed while designing databases. For example, most of the applications support audit reports which require each table to have certain audit fields like createdOn
, CreatedBy
, UpdatedOn
, UpdatedBy
. These fields highlighted here are meant for understanding purposes only, user of course can use their own fields for mandatory rules.
We will discuss the following rules for explanation purposes. The idea can be extended to any sort of rules one may have.
- Audit Field (
CREATED_ON
, CREATED_BY
, UPDATED_ON
, UPDATED_BY
) must be present in all tables - Audit Fields must not mark Nullable
Created_ON
/UPDATED_ON
should have a default date constraint CREATED_BY
/UPDATED_BY
field must be of type nvarchar
of size 30
- Same column names data type and size should match
- All tables must have a Primary Key Constraint
- Data Type mismatch Identified over SAME COLUMN name
Creating Tests
SQL can be used as a rule engine for validation verification purposes because SQL Server sys views contain huge information about everything SQL Server has. Hence, we have written Tests cases using same sys views for validating individual rules. Let's see each rule one by one.
Audit Fields Must Be Present in All Tables
Finding tables that do not have a certain column is easy, all we have to do is to look at sys.tables
/columns views and apply the appropriate filters to get the desired result. The below approach is not an ideal approach but it gets the job done!
select distinct 'Audit Field Not Found' as reason, t.name, 'UPDATED_BY' as fieldname
from sys.tables t
left outer join
(
select distinct t.name
from sys.tables t
inner join sys.columns c on c.object_id = t.object_id
where c.name = 'UPDATED_BY'
) tmp on t.name = tmp.name
where tmp.name is null
union all
select 'Audit Field Not Found' as reason, t.name, 'CREATED_BY' as fieldname
from sys.tables t
left outer join
(
select distinct t.name
from sys.tables t
inner join sys.columns c on c.object_id = t.object_id
where c.name = 'CREATED_BY'
) tmp on t.name = tmp.name
where tmp.name is null
union all
select 'Audit Field Not Found' as reason, t.name, 'CREATED_ON' as fieldname
from sys.tables t
left outer join
(
select distinct t.name
from sys.tables t
inner join sys.columns c on c.object_id = t.object_id
where c.name = 'CREATED_ON'
) tmp on t.name = tmp.name
where tmp.name is null
union all
select 'Audit Field Not Found' as reason, t.name, 'UPDATED_ON' as fieldname
from sys.tables t
left outer join
(
select distinct t.name
from sys.tables t
inner join sys.columns c on c.object_id = t.object_id
where c.name = 'UPDATED_ON'
) tmp on t.name = tmp.name
where tmp.name is null
select distinct 'Audit Field Not Found' as reason, t.table_name, 'updated_by' as fieldname
from user_tables t
left outer join user_tab_columns c on t.table_name = c.table_name and c.column_name in _
('UPDATED_BY')
where c.column_name is null
union all
select distinct 'Audit Field Not Found' as reason, t.table_name, 'created_by' as fieldname
from user_tables t
left outer join user_tab_columns c on t.table_name = c.table_name and c.column_name in _
('CREATED_BY')
where c.column_name is null
union all
select distinct 'Audit Field Not Found' as reason, t.table_name, 'created_on' as fieldname
from user_tables t
left outer join user_tab_columns c on t.table_name = c.table_name and c.column_name in _
('CREATED_ON')
where c.column_name is null
union all
select distinct 'Audit Field Not Found' as reason, t.table_name, 'updated_on' as fieldname
from user_tables t
left outer join user_tab_columns c on t.table_name = c.table_name and c.column_name in _
('UPDATED_ON')
where c.column_name is null
All Audit Fields Must Not Be Marked as Nullable
Audit fields are mandatory by nature hence can't be Nullable
. The below query identifies all tables that are violating this rule.
SELECT distinct 'Nullable field found' as Reason, T.NAME, c.name, c.is_nullable
FROM sys.tables t
INNER JOIN sys.columns c ON c.object_id = t.object_id
INNER JOIN sys.types ty on c.system_type_id = ty.system_type_id
WHERE c.name in ('CREATED_ON', 'UPDATED_ON', 'CREATED_BY', 'UPDATED_BY')
and c.is_nullable = 1
order by 1
SELECT distinct 'Nullable field found' as Reason, T.TABLE_NAME, c.COLUMN_NAME, c.NULLABLE
FROM USER_TABLES T
INNER JOIN USER_TAB_COLUMNS C ON T.TABLE_NAME = C.TABLE_NAME AND C.NULLABLE = 'Y'
WHERE C.COLUMN_NAME IN ('CREATED_ON', 'UPDATED_ON', 'CREATED_BY', 'UPDATED_BY')
ORDER BY T.TABLE_NAME
CREATED/UPDATED ON Fields Should Have getdate() as Default Constraint<o:p>
Since CREATED
/UPDATED ON
fields are mandatory fields, an ideal way to provide a default value is to have a default getdate()
constraint. We can identify all such tables that do not have default constraint over Created
/Updated On
fields using the below query.
SELECT 'Default Constraint Not Found' as Reason, T.NAME, c.name, c.is_nullable
FROM sys.tables t
INNER JOIN sys.columns c ON c.object_id = t.object_id
INNER JOIN sys.types ty on c.system_type_id = ty.system_type_id
WHERE c.name in ('CREATED_ON', 'UPDATED_ON')
and c.default_object_id = 0
order by 1
SELECT 'Default Constraint Not Found' AS REASON, T.TABLE_NAME, C.COLUMN_NAME
FROM USER_TABLES T
LEFT OUTER JOIN user_tab_columns c ON T.TABLE_NAME = C.TABLE_NAME AND C.DATA_DEFAULT is not null
WHERE C.COLUMN_NAME IN ('CREATED_ON', 'UPDATED_ON')
AND C.TABLE_NAME IS NULL
ORDER BY T.TABLE_NAME
CREATED/UPDATED BY Field Must be of Type NVARCHAR
At any given time, whenever one sees a production DB, s/he will be surprised to see audit fields have different data types like in rare cases you will find created
/updated by
fields of type, char
, nchar
, varchar
, etc. The below query will identify all such rare cases.
select distinct 'DataType difference Found' as reason, t.name, c.name, ty.name
from sys.tables t
inner join sys.columns c on c.object_id = t.object_id
inner join sys.types ty on c.system_type_id = ty.user_type_id
where c.name in ('CREATED_BY', 'UPDATED_BY')
and ty.name != 'nvarchar'
order by 1
select distinct 'DataType difference Found' as reason, t.table_name, c.column_name, c.data_type
from user_tables t
left outer join user_tab_columns c on t.table_name = c.table_name
where c.column_name in ('CREATED_BY', 'UPDATED_BY')
and c.data_type != 'VARCHAR2'
order by t.table_name;
CREATED/UPDATED BY Field Length Should be 30 Characters<o:p>
Similarly, there will be cases where Created
/Updated by
fields have variable sizes, some are of length 50, some have 70 and in rare cases 10. We used the below query to identify all such instances where size is different.
select distinct 'DataType size difference Found' as reason, t.name, c.name, ty.name, c.max_length
from sys.tables t
inner join sys.columns c on c.object_id = t.object_id
inner join sys.types ty on c.system_type_id = ty.user_type_id
where c.name in ('CREATED_BY', 'UPDATED_BY')
and ty.name = 'nvarchar'
and c.max_length != 60
order by 1
select distinct 'DataType size difference Found' as reason, t.table_name, _
c.column_name, c.data_type, c.data_length
from user_tables t
left outer join user_tab_columns c on t.table_name = c.table_name
where c.column_name in ('CREATED_BY', 'UPDATED_BY')
and c.data_type = 'VARCHAR2'
and c.data_length != 250
order by t.table_name;
Primary Key Not Found
Experience says that at times, developers miss creation of Primary Keys as well. Identifying tables missing primary keys are a bit tricky however one can use the below SQL to identify such tables.
select 'Primary Key Not Found' as reason, t.name
from sys.tables t
left outer join
(
select t.name
from sys.indexes i
inner join sys.index_columns ic on i.object_id = ic.object_id and i.index_id = ic.index_id
inner join sys.columns c on ic.object_id = c.object_id and c.column_id = ic.column_id
inner join sys.tables t on c.object_id = t.object_id
where i.is_primary_key = 1
) t2 on t.name = t2.name
where t2.name is null
order by 2
select 'Primary Key Not Found' as reason, t.table_name
from user_tables t
inner join user_constraints c on t.table_name = c.table_name and c.constraint_type = 'P'
where c.table_name is null
order by t.table_name
Identity Primary Key Found
There are products that discourage identity based Primary Keys. The below query can be used to identify identity based primary keys.
select 'Identity Primary Key Found' as reason,t.name,
c.name as column_name,
c.is_identity
from sys.indexes i
inner join sys.index_columns ic on i.object_id = ic.object_id and i.index_id = ic.index_id
inner join sys.columns c on ic.object_id = c.object_id and c.column_id = ic.column_id
inner join sys.tables t on c.object_id = t.object_id
where c.is_identity = 1
order by 2
Data Type Mismatch Identified Over SAME COLUMN Name
This is the most common design problem, that is developer forgets to enforce foreign key constraint over child tables. Identifying such columns is tricky; to ease this problem, we assumed that same column names in two or more tables should be linked together. Below query highlights/suggests all such columns using the above assumptions that should have foreign key constraints.
declare @tbl table
(
pk_column_name varchar(100),
pk_table_name varchar(100),
pk_user_type_id varchar(100),
pk_type_length int,
fk_column_name varchar(100),
fk_table_name varchar(100),
fk_user_type_id varchar(100),
fk_type_length int
)
insert into @tbl (pk_column_name, pk_table_name, pk_user_type_id, pk_type_length, _
fk_column_name, fk_table_name, fk_user_type_id, fk_type_length)
select tk.pk_column_name, tk.table_name, tk.user_type_id, _
tk.max_length, c1.name, t.name, c1.user_type_id, c1.max_length
from sys.columns c1
inner join
(
select c.name as pk_column_name,
t.name as table_name,
c.user_type_id,
c.max_length
from sys.indexes i
inner join sys.index_columns ic on i.object_id = ic.object_id and i.index_id = ic.index_id
inner join sys.columns c on ic.object_id = c.object_id and c.column_id = ic.column_id
inner join sys.tables t on c.object_id = t.object_id
where i.is_primary_key = 1
and c.name not in ('CREATED_ON', _
'UPDATED_ON', 'CREATED_BY', 'UPDATED_BY')
and t.name not like 'QRTZ_%'
) tk on c1.name = tk.pk_column_name
inner join sys.tables t on c1.object_id = t.object_id
where t.name != tk.table_name
delete from @tbl
where fk_column_name in
(select fk_column_name
from @tbl
group by fk_column_name
having count(*) = 1
)
select 'Data Type mismatch Identified over SAME COLUMN name',
t.pk_column_name,
t.pk_table_name,
ty1.name,
t.pk_type_length,
t.fk_column_name,
t.fk_table_name,
ty2.name,
t.fk_type_length
from @tbl t
inner join sys.types ty1 on t.pk_user_type_id = ty1.user_type_id
inner join sys.types ty2 on t.fk_user_type_id = ty2.user_type_id
where t.pk_user_type_id != fk_user_type_id
order by 2
select 'Data Type mismatch Identified over SAME COLUMN name' as reason, _
p.pk_column_name, p.table_name, tc.column_name as fk_column_name, _
tc.table_name as fk_table_name, p.data_type, p.data_length
from user_tab_columns tc
inner join
(
select distinct ucc.column_name as pk_column_name,
t.table_name as table_name,
c.data_type,
c.data_length
from user_tables t
inner join user_tab_columns c on t.table_name = c.table_name
inner join user_constraints uc on t.table_name = uc.table_name
inner join user_cons_columns ucc on uc.constraint_name = ucc.constraint_name
where uc.constraint_type = 'P'
and ucc.column_name not in ('CREATED_ON', 'UPDATED_ON', 'CREATED_BY', 'UPDATED_BY')
and ucc.column_name not like 'QRTZ_%'
) p on tc.column_name = p.pk_column_name
where tc.table_name != p.table_name;
Points of Interest
These test cases help us a lot in not only standardizing our database design, but also help in performance optimization in some cases of our products. Adding new test cases is also very easy depending upon business cases.
History
- Version 1.0: First version
- Version 1.1: Added Oracle support