Introduction
Online transaction processing (OLTP) database contains huge transaction records. Sometimes, it has millions and billions of rows. These data are stored in relational database. When business requires analysis of these transactions, it takes huge amount of time to process these data. Business user wants to analyse these data in small time period. They don't want to have complexity in schema too. Sometimes, business user wants to analyse data from other source too. These other source data could be from different business sources. So database designer comes with a solution to implement Datawarehouse system for business data. This system is separate from OLTP database. So, analysis process becomes easier than before.
Background
Datawarehouse system consist of Fact table, Dimension table and Auxiliary table. Datawarehouse system can have multiple OLTP system as data source. It can be CRM system, ERP system, CMS system or Flat file data. Tables in Datawarehouse system are organised in specific schema. This schema can be either star schema or snowflake schema. We are going to cover each topic of datawarehouse system in the same article.
Using the Code
CREATE TABLE SalesRegion (
SalesRegionId INTEGER NOT NULL,
RegionName VARCHAR(50) NOT NULL,
CONSTRAINT SalesRegion_pk PRIMARY KEY (SalesRegionId)
);
CREATE TABLE DimSalesMan (
SalesRegionId INTEGER NOT NULL,
SalesManId INTEGER NOT NULL,
SallesmanName VARCHAR(50) NOT NULL,
CONSTRAINT DimSalesMan_pk PRIMARY KEY (SalesRegionId, SalesManId)
);
CREATE TABLE DimDate (
DateId INTEGER NOT NULL,
DateValue DATE NOT NULL,
MonthValue INTEGER NOT NULL,
YearValue INTEGER NOT NULL,
DateQuarter INTEGER NOT NULL,
DateHalfYear INTEGER NOT NULL,
CONSTRAINT DimDate_pk PRIMARY KEY (DateId)
);
CREATE TABLE Currency (
CurrencyId INTEGER NOT NULL,
CurrencyName VARCHAR(50) NOT NULL,
CurrencyFxRate DECIMAL(18) NOT NULL,
CONSTRAINT Currency_pk PRIMARY KEY (CurrencyId)
);
CREATE TABLE DimProduct (
ProductId INTEGER NOT NULL,
CurrencyID INTEGER NOT NULL,
ProductName VARCHAR(50) NOT NULL,
ProductCost NUMERIC(18,2) NOT NULL,
CONSTRAINT DimProduct_pk PRIMARY KEY (ProductId, CurrencyID)
);
CREATE TABLE FactSales (
SalesRegionId INTEGER NOT NULL,
SalesManId INTEGER NOT NULL,
ProductId INTEGER NOT NULL,
CurrencyID INTEGER NOT NULL,
DateId INTEGER NOT NULL,
SalesId INTEGER NOT NULL,
SalesDescription VARCHAR(100) NOT NULL,
CONSTRAINT FactSales_pk PRIMARY KEY _
(SalesRegionId, SalesManId, ProductId, CurrencyID, DateId)
);
ALTER TABLE DimSalesMan ADD CONSTRAINT SalesRegion_DimSalesMan_fk
FOREIGN KEY (SalesRegionId)
REFERENCES SalesRegion (SalesRegionId)
ALTER TABLE FactSales ADD CONSTRAINT FactSales_DimSalesMan_fk
FOREIGN KEY (SalesRegionId, SalesManId)
REFERENCES DimSalesMan (SalesRegionId, SalesManId)
ALTER TABLE FactSales ADD CONSTRAINT FactSales_DimDate_fk
FOREIGN KEY (DateId)
REFERENCES DimDate (DateId)
ALTER TABLE DimProduct ADD CONSTRAINT Currency_DimProduct_fk
FOREIGN KEY (CurrencyID)
REFERENCES Currency (CurrencyId)
ALTER TABLE FactSales ADD CONSTRAINT FactSales_DimProduct_fk
FOREIGN KEY (ProductId, CurrencyID)
REFERENCES DimProduct (ProductId, CurrencyID)
Fact Table
Table which contains fact information is known as fact table. This information can be measured in analysis such as Total Sales Amount, Total Order Count, Total Product Sales, etc. are different measures of business. This measure information is stored in Fact table. Normally, Fact table contains transaction of business. e.g. FactSales table is fact table. It contains business transaction.
Dimension Table
Table which contains dimension for analysis of data is known as Dimension table. It can be Date Dimension, Product Dimension, Region Dimension, Currency Dimension, etc. Analysis of Fact table are process by Dimension table. One can analyse business transaction on the basis of period wise, region wise, product wise. Data in dimension table are normally fixed nature. This data normally does not modify or modify in some condition when updated set is required in business. It may be modified in a month or in a year, e.g. DimDate
, DimProduct
, DimRegion
are dimension tables.
Attribute of Dimension Table
Attribute of dimension table could be product name, product cost, product color, product category, product weight, etc.
Natural Key
The default key assigned to attribute is known as Natural key, e.g. ProductCode
is natural key for any product.
Surrogate Key
Numeric value used as key which is used to join dimension table with fact table is known as surrogate key, e.g., ProductId
is surrogate key for product dimension table. It joins with FactSales
fact table to get product sales information.
Database Schema
Datawarehouse
is a relational database where Fact
and Dimension
tables are related to each other. Before defining datawarehouse sytem, its structure must be defined. All tables in datawarehouse are structured in specific schema. It can be Star schema or Snowflake schema.
Star Schema
Database schema in which dimension tables are directly connected with fact table is known as Star Schema. It is the most popular schema used in designing datawarehouse system. It is easy to implement and relate dimension tables with fact tables.
Snowflake Schema
Database schema in which dimension tables are connected with fact table but dimension table is itself related with another dimension table to get data is known as Snowflake Schema. Snowflake schema makes datawarehouse designing complex. If possible, we should avoid snowflake schema using view where multiple dimension tables are joined together and used as simple dimension table.
Junk Dimension
Some values in data warehouse system are much less in number of rows. We cannot include them as dimension tables and also cannot avoid in system such as Global setting value for datawarehouse system. We can put all require global setting in separate setting table but it is not considered as dimension table. It is termed as Junk Dimension.
Degenerate Dimension
Some tables in datawarehouse system contains attribute of fact table. These attributes are used in analysis of business such as Transaction number in fact table is maintained as degenerate dimension.
Slowly Changing Dimension
It is process managing changes of data in dimension table. When data in dimension table is modified, it is required to maintain change history of data. It gives an accurate picture to business when analysis depends on different duration of time, e.g., Global business sales depends on currency rate. Each country has its different currency and the exchange rate gets updated from time to time. If some sales occurred in Jan 2014 with XXX currency exchange rate and it is modified in July 2014, the correct analysis is only possible when we can evaluate sales value exactly based on that duration exchange rate. If we calculate sales amount on current exchange rate value, then sales figure will not be accurate.
SCD type1
In SCD type1, dimension table overwrites the existing value of dimension attribute with new value. So, no change history is maintained for any changed attribute value. The dimension table will contain only the last updated value, e.g. in dimcustomer table, the customer address can be maintained as SCD type1 so that it will hold only the current address of customer.
SCD type2
In SCD type2, complete track of change history is maintained. So for every change in attribute value, insert new record in dimension table. History is tracked through date range given with record, e.g. DimCurrencyFxRate
should be of type SCD type2 because currency exchange rate is valid from specific date to certain date range only.
SCD type3
In SCD type3, only last few change history is maintained. The number of changes tracking is based on business requirement. So such dimension table contains multiple columns for the same attribute, e.g., DimProduct
table can contain three column names for Productname
naming Productname1
, Productname2
, Productname3
. This will help when business team asks to change the name of any product. Productname3
will contain the latest change name, so on Productname2
and Productname1
can contain previous 2 names of product.
Bridge Table or Factless Fact Table
It is an intermediate table which allows to join dimension table with another dimension table. It does not contain transaction detail, rather it contains key attribute with other columns.
Snapshot Fact Table
Table which contains aggregate of transaction of Fact table is known as snapshot fact table. When business transaction is huge, entry in fact table will be huge. This will make analysis process time consuming so that fact table transactions are aggregated and stored in separate snapshot table.
Transaction Fact Table
Table which contains complete details of transaction is known as transaction fact table. Attribute in transaction fact table contains each details of transaction, e.g., transaction number, productid, transaction date, transaction user, etc.
Points of Interest
This article contains information on terminologies used in datawarehouse. This will help user to understand concepts before designing new datawarehouse system.
History
- 22nd July, 2014: Initial version