Verifying Database Design Issues Against Predefined Quality Rules Using Schema Views

_Asif_

5.00/5 (8 votes)

6 Dec 2016CPOL4 min read

22.4K

100

This article helps in identifying database design issues like certain fields missing, nullable fields, Primary key not found issues in production environment using SQL Server schema views.

Introduction

In order to avoid surprises while investigating performance issues because of bad database design at client side environment, we should have a review process of baseline database of high end products. Just to avoid confusion, baseline database is a product database that is specifically tailored for each client implementation. To give you a fairly visible idea of what we are saying here is that if a product database misses key performance index, then this index will get missed on every new implementation later on. These review processes make sure that over implementation quality standards have been met.

This tip can act as a baseline script which when executed can highlight violations against defined standard. So the rules are important. Let’s see how we can create rules using SQL Server/Oracle meta views.

Rules

Rules are simply defined standards that must be followed while designing databases. For example, most of the applications support audit reports which require each table to have certain audit fields like createdOn, CreatedBy, UpdatedOn, UpdatedBy. These fields highlighted here are meant for understanding purposes only, user of course can use their own fields for mandatory rules.

We will discuss the following rules for explanation purposes. The idea can be extended to any sort of rules one may have.

Audit Field (CREATED_ON, CREATED_BY, UPDATED_ON, UPDATED_BY) must be present in all tables
Audit Fields must not mark Nullable
Created_ON/UPDATED_ON should have a default date constraint
CREATED_BY/UPDATED_BY field must be of type nvarchar of size 30
Same column names data type and size should match
All tables must have a Primary Key Constraint
Data Type mismatch Identified over SAME COLUMN name

Creating Tests

SQL can be used as a rule engine for validation verification purposes because SQL Server sys views contain huge information about everything SQL Server has. Hence, we have written Tests cases using same sys views for validating individual rules. Let's see each rule one by one.

Audit Fields Must Be Present in All Tables

Finding tables that do not have a certain column is easy, all we have to do is to look at sys.tables/columns views and apply the appropriate filters to get the desired result. The below approach is not an ideal approach but it gets the job done!

select  distinct 'Audit Field Not Found' as reason, t.name, 'UPDATED_BY' as fieldname
from    sys.tables t
        left outer join 
        (
            select distinct t.name
            from sys.tables  t
                inner join sys.columns c on c.object_id = t.object_id
            where c.name = 'UPDATED_BY'
        ) tmp on t.name = tmp.name
where   tmp.name is null        
union all
select  'Audit Field Not Found' as reason, t.name, 'CREATED_BY' as fieldname
from    sys.tables t
        left outer join 
        (
            select distinct t.name
            from sys.tables  t
                inner join sys.columns c on c.object_id = t.object_id
            where c.name = 'CREATED_BY'
        ) tmp on t.name = tmp.name
where   tmp.name is null
union all
select  'Audit Field Not Found' as reason, t.name, 'CREATED_ON' as fieldname
from    sys.tables t
        left outer join 
        (
            select distinct t.name
            from sys.tables  t
                inner join sys.columns c on c.object_id = t.object_id
            where c.name = 'CREATED_ON'
        ) tmp on t.name = tmp.name
where   tmp.name is null    
union all
select  'Audit Field Not Found' as reason, t.name, 'UPDATED_ON' as fieldname
from    sys.tables t
        left outer join 
        (
            select distinct t.name
            from sys.tables  t
                inner join sys.columns c on c.object_id = t.object_id
            where c.name = 'UPDATED_ON'
        ) tmp on t.name = tmp.name
where   tmp.name is null

Oracle

select  distinct 'Audit Field Not Found' as reason, t.table_name, 'updated_by' as fieldname
from    user_tables t
        left outer join user_tab_columns c on t.table_name = c.table_name and c.column_name in _
        ('UPDATED_BY')
where   c.column_name is null
union all
select  distinct 'Audit Field Not Found' as reason, t.table_name, 'created_by' as fieldname
from	user_tables t
		left outer join user_tab_columns c on t.table_name = c.table_name and c.column_name in _
        ('CREATED_BY')
where	c.column_name is null
union all
select  distinct 'Audit Field Not Found' as reason, t.table_name, 'created_on' as fieldname
from	user_tables t
		left outer join user_tab_columns c on t.table_name = c.table_name and c.column_name in _
        ('CREATED_ON')
where	c.column_name is null
union	all
select  distinct 'Audit Field Not Found' as reason, t.table_name, 'updated_on' as fieldname
from	user_tables t
		left outer join user_tab_columns c on t.table_name = c.table_name and c.column_name in _
        ('UPDATED_ON')
where	c.column_name is null

All Audit Fields Must Not Be Marked as Nullable

Audit fields are mandatory by nature hence can't be Nullable. The below query identifies all tables that are violating this rule.

SELECT  distinct 'Nullable field found' as Reason, T.NAME, c.name, c.is_nullable
FROM sys.tables  t
    INNER JOIN sys.columns c ON c.object_id = t.object_id
    INNER JOIN sys.types ty on c.system_type_id = ty.system_type_id
WHERE c.name in ('CREATED_ON', 'UPDATED_ON', 'CREATED_BY', 'UPDATED_BY')
and   c.is_nullable = 1
order by 1

Oracle

SELECT  distinct 'Nullable field found' as Reason, T.TABLE_NAME, c.COLUMN_NAME, c.NULLABLE
FROM USER_TABLES  T
	INNER JOIN USER_TAB_COLUMNS C ON T.TABLE_NAME = C.TABLE_NAME AND C.NULLABLE = 'Y'
WHERE C.COLUMN_NAME IN ('CREATED_ON', 'UPDATED_ON', 'CREATED_BY', 'UPDATED_BY')
ORDER BY T.TABLE_NAME

CREATED/UPDATED ON Fields Should Have getdate() as Default Constraint<o:p>

Since CREATED/UPDATED ON fields are mandatory fields, an ideal way to provide a default value is to have a default getdate() constraint. We can identify all such tables that do not have default constraint over Created/Updated On fields using the below query.

SELECT  'Default Constraint Not Found' as Reason, T.NAME, c.name, c.is_nullable
FROM sys.tables  t
    INNER JOIN sys.columns c ON c.object_id = t.object_id
    INNER JOIN sys.types ty on c.system_type_id = ty.system_type_id
WHERE c.name in ('CREATED_ON', 'UPDATED_ON')
and   c.default_object_id = 0
order by 1

Oracle

SELECT  'Default Constraint Not Found' AS REASON, T.TABLE_NAME, C.COLUMN_NAME
FROM USER_TABLES T
     LEFT OUTER JOIN user_tab_columns c ON T.TABLE_NAME = C.TABLE_NAME AND C.DATA_DEFAULT is not null
WHERE C.COLUMN_NAME IN ('CREATED_ON', 'UPDATED_ON')
AND C.TABLE_NAME IS NULL
ORDER BY T.TABLE_NAME

CREATED/UPDATED BY Field Must be of Type NVARCHAR

At any given time, whenever one sees a production DB, s/he will be surprised to see audit fields have different data types like in rare cases you will find created/updated by fields of type, char, nchar, varchar, etc. The below query will identify all such rare cases.

select  distinct 'DataType difference Found' as reason, t.name, c.name, ty.name
from	sys.tables  t
		inner join sys.columns c on c.object_id = t.object_id
		inner join sys.types ty on c.system_type_id = ty.user_type_id
where	c.name in ('CREATED_BY', 'UPDATED_BY')
and     ty.name != 'nvarchar'
order by 1

Oracle

select  distinct 'DataType difference Found' as reason, t.table_name, c.column_name, c.data_type
from	user_tables t
		left outer join user_tab_columns c on t.table_name = c.table_name 
where	c.column_name in ('CREATED_BY', 'UPDATED_BY')
and		c.data_type != 'VARCHAR2'
order by t.table_name;

CREATED/UPDATED BY Field Length Should be 30 Characters<o:p>

Similarly, there will be cases where Created/Updated by fields have variable sizes, some are of length 50, some have 70 and in rare cases 10. We used the below query to identify all such instances where size is different.

select  distinct 'DataType size difference Found' as reason, t.name, c.name, ty.name, c.max_length
from	sys.tables  t
		inner join sys.columns c on c.object_id = t.object_id
		inner join sys.types ty on c.system_type_id = ty.user_type_id
where	c.name in ('CREATED_BY', 'UPDATED_BY')
and     ty.name = 'nvarchar'
and     c.max_length != 60 
order by 1

Oracle

select  distinct 'DataType size difference Found' as reason,  t.table_name, _
                  c.column_name, c.data_type, c.data_length
from	user_tables t
		left outer join user_tab_columns c on t.table_name = c.table_name 
where	c.column_name in  ('CREATED_BY', 'UPDATED_BY')
and		c.data_type = 'VARCHAR2'
and     c.data_length != 250 
order by t.table_name;

Primary Key Not Found

Experience says that at times, developers miss creation of Primary Keys as well. Identifying tables missing primary keys are a bit tricky however one can use the below SQL to identify such tables.

select 'Primary Key Not Found' as reason, t.name
from    sys.tables t
        left outer join 
        (
            select t.name
            from sys.indexes i
             inner join sys.index_columns ic  on i.object_id = ic.object_id and i.index_id = ic.index_id
             inner join sys.columns c on ic.object_id = c.object_id and c.column_id = ic.column_id
             inner join sys.tables t on  c.object_id = t.object_id
            where i.is_primary_key = 1
        ) t2 on t.name = t2.name
where    t2.name is null
order by 2

Oracle

select 'Primary Key Not Found' as reason, t.table_name
from user_tables  t
	inner join user_constraints c on t.table_name = c.table_name and c.constraint_type = 'P'
where	c.table_name is null
order by t.table_name

Identity Primary Key Found

There are products that discourage identity based Primary Keys. The below query can be used to identify identity based primary keys.

SQL

select 'Identity Primary Key Found' as reason,t.name,
		c.name as column_name,
		c.is_identity
from	sys.indexes i
		inner join sys.index_columns ic  on i.object_id = ic.object_id and i.index_id = ic.index_id
		inner join sys.columns c on ic.object_id = c.object_id and c.column_id = ic.column_id
		inner join sys.tables t on  c.object_id = t.object_id
where	c.is_identity = 1
order by 2

Data Type Mismatch Identified Over SAME COLUMN Name

This is the most common design problem, that is developer forgets to enforce foreign key constraint over child tables. Identifying such columns is tricky; to ease this problem, we assumed that same column names in two or more tables should be linked together. Below query highlights/suggests all such columns using the above assumptions that should have foreign key constraints.

declare @tbl table
(
    pk_column_name varchar(100),
    pk_table_name  varchar(100),
    pk_user_type_id varchar(100),
    pk_type_length int,
    fk_column_name varchar(100),
    fk_table_name varchar(100),
    fk_user_type_id varchar(100),
    fk_type_length int
)

insert into @tbl (pk_column_name, pk_table_name, pk_user_type_id, pk_type_length, _
                  fk_column_name, fk_table_name, fk_user_type_id, fk_type_length)
select  tk.pk_column_name, tk.table_name, tk.user_type_id, _
tk.max_length, c1.name, t.name, c1.user_type_id, c1.max_length
from    sys.columns c1
        inner join 
        (
            select  c.name as pk_column_name,
                    t.name as table_name,
                    c.user_type_id,
                    c.max_length
            from    sys.indexes i
            inner join sys.index_columns ic  on i.object_id = ic.object_id and i.index_id = ic.index_id
            inner join sys.columns c on ic.object_id = c.object_id and c.column_id = ic.column_id
            inner join sys.tables t on  c.object_id = t.object_id            
            where    i.is_primary_key = 1
            and        c.name not in ('CREATED_ON', _
            		'UPDATED_ON', 'CREATED_BY', 'UPDATED_BY')
            and        t.name not like 'QRTZ_%'
        )  tk on c1.name = tk.pk_column_name 
        inner join sys.tables t on  c1.object_id = t.object_id    
where t.name != tk.table_name        

delete from @tbl
where  fk_column_name in 
(select fk_column_name
from    @tbl
group by fk_column_name
having  count(*) = 1
)

select 'Data Type mismatch Identified over SAME COLUMN name', 
        t.pk_column_name, 
        t.pk_table_name, 
        ty1.name, 
        t.pk_type_length,
        t.fk_column_name, 
        t.fk_table_name, 
        ty2.name, 
        t.fk_type_length
from    @tbl t
        inner join sys.types ty1 on t.pk_user_type_id = ty1.user_type_id
        inner join sys.types ty2 on t.fk_user_type_id = ty2.user_type_id
where   t.pk_user_type_id != fk_user_type_id
order by 2

Oracle

select 'Data Type mismatch Identified over SAME COLUMN name' as reason, _
        p.pk_column_name, p.table_name, tc.column_name as fk_column_name, _
        tc.table_name as fk_table_name, p.data_type, p.data_length
 from user_tab_columns tc 
      inner join 
       (
      select  distinct ucc.column_name as pk_column_name,
              t.table_name as table_name,
              c.data_type,
              c.data_length
      from    user_tables t
      inner join user_tab_columns c on t.table_name = c.table_name
      inner join user_constraints uc on t.table_name = uc.table_name
      inner join user_cons_columns ucc on uc.constraint_name = ucc.constraint_name          
      where    uc.constraint_type = 'P'
      and        ucc.column_name not in   ('CREATED_ON', 'UPDATED_ON', 'CREATED_BY', 'UPDATED_BY')
      and        ucc.column_name not like 'QRTZ_%'
      ) p on tc.column_name = p.pk_column_name
      where tc.table_name != p.table_name;

Points of Interest

These test cases help us a lot in not only standardizing our database design, but also help in performance optimization in some cases of our products. Adding new test cases is also very easy depending upon business cases.

History

Version 1.0: First version
Version 1.1: Added Oracle support

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)