The readers of this article are going to get some professional life tips related the subject matter. This article also highlights the importance of choosing the correct use of join in your SQL queries to meet business objectives.
About Cartesian Product
It would not be much fun for beginners if we just start SQL examples of Cartesian Product without first developing some understanding of it.
Let us explore it first with the help of some simple definitions and examples.
What is Cartesian Product?
A Cartesian Product is a product of two sets of elements where each element in one set is multiplied by every other element of the other set.
In simple words, if we have the following sets:
A={1,2}, B={3,4}
Their Cartesian Product will be as follows:
C=AXB={(1,3),(1,4),(2,3),(2,4)}
As you can see, the element 1
in set A
is multiplied with all the elements 3,4
in set B
to become (1,3)
and (1,4)
and similarly the second element 2
is multiplied with all the elements in set B
to from (2,3)
and (2,4)
.
What is Cross Product?
Cartesian Product is also known as Cross Product as the multiplication is applied across all the elements in one set with all the elements of the other set.
Difference between Cartesian Product and Cross Join
A Cross Join is actually the SQL name for Cartesian Product where the elements can be thought of columns and sets can be thought of rows.
Example of Cartesian Product
If we have two different tables in a SQL database with each having 3 rows, we will get 9 rows in total after applying the Cartesian Product.
Applying Cartesian Product (Cross Join) in SQL
Let us go through a couple of examples to understand how it works and why it is needed in SQL.
This article assumes that readers are familiar with basic relational database concepts and comfortable to write and run SQL scripts on a locally or remoted installed SQL instance.
Handling Scattered Examination Data
How would you handle a professional life scenario where you are handed over a disconnected or rather scattered data of examination bundled into a database to be refined further in order to make it more meaningful and understandable?
Well, perhaps you don’t know the solution until you analyze the data to choose the best suited option and that is what we are trying to understand.
Business Requirements
An Examination services provider company maintains the data in a SQL database but they source the data from different places in such a way that you have the following two tables:
Now there is a table (Student
) that contains the information about the registered students while the type of exam each of the students has to take once registered is stored in another table called Exam Type.
The problem is that these two tables are completely unlinked and you have to come up with a solution that each individual student should be allocated all the available exam types without manually entering the data.
Proposed Solution
The proposed solution is to get Cartesian Product of the two unlinked tables such that they become associated with each other the way the business wants it to see.
Setup Sample Exam Database
Let us setup a sample database called ExamCart
by running the following script against master database:
USE MASTER
GO
CREATE DATABASE ExamCart
GO
USE ExamCart
CREATE TABLE dbo.ExamType
(
ExamTypeId INT IDENTITY(1,1),
Name VARCHAR(50)NOT NULL,
Detail VARCHAR(200)NULL
CONSTRAINT PK_ExamType_ExamTypeId PRIMARY KEY (ExamTypeId)
)
GO
SET IDENTITY_INSERT dbo.ExamType ON
GO
INSERT INTO dbo.ExamType
(
ExamTypeId
,Name
,Detail
)
VALUES
(
1
,'Mid-Term Exam'
,'This is Mid-Term Exam'
),
(
2
,'Final Term Exam'
,'This is Final Term Exam'
),
(
3
,'First Term Exam'
,'This is First Term Exam'
);
GO
SET IDENTITY_INSERT dbo.ExamType OFF
GO
Now just to impersonate the professional life scenario soon after some time, the Student
table about the registered students was added to the database.
So, please add a Student
table by running the script against ExamCart
database this time as follows:
USE ExamCart
DROP TABLE if EXISTS dbo.Student
CREATE TABLE dbo.Student
(
StudentId INT IDENTITY(1,1),
Name VARCHAR(50)NOT NULL,
Detail VARCHAR(200)NULL
CONSTRAINT PK_Student_StudentId PRIMARY KEY (StudentId)
)
GO
SET IDENTITY_INSERT dbo.Student ON
GO
INSERT INTO dbo.Student
(
StudentId
,Name
,Detail
)
VALUES
(
1
,'Atif'
,'This is Atif'
),
(
2
,'Sam'
,'This is Sam'
),
(
3
,'Sarah'
,'This is Sarah'
);
GO
SET IDENTITY_INSERT dbo.Student OFF
GO
Quick Check
Let us take a quick look at the student
table by running the following script against the sample database:
SELECT
s.StudentId
,s.Name
,s.Detail
FROM dbo.Student s
The output is as follows:
Please note that I am using dbForge Studio for SQL Server to write and run the T-SQL scripts, so the output look (display) may be different but query results remain unaffected (are not going to be different) if you are using any other tool like SQL Server Management Studio (SSMS).
Similarly, let us see the contents of ExamType
table as follows:
SELECT et.ExamTypeId, et.Name, et.Detail FROM dbo.ExamType et
The output is as follows:
Applying the Cartesian Product of Both Tables
Now in order to assign each examination type to each student, we need a Cartesian Product of both tables which can be successfully achieved by the following query:
SELECT s.StudentId,s.Name,s.Detail,et.ExamTypeId,et.Name AS Exam_Name, et.Detail AS Exam_Detail
FROM dbo.Student s,dbo.ExamType et
The results are as follows:
Applying Cartesian Product Using Cross Join
The same results can also be achieved using Cross Join
keyword:
SELECT
et.ExamTypeId,et.Name AS Exam_Name,et.Detail AS Exam_Detail,
s.StudentId,s.Name,s.Detail
FROM dbo.Student s CROSS JOIN dbo.ExamType et
order BY et.ExamTypeId
The results are as follows:
Creating a Cartesian Product StudentExam Table
A better way to store your results physically is to create a table on the fly that contains the result of the product of the two tables.
However, we need to create a distinct id column to get it qualified for a primary key later on as per standard practice of creating tables in a SQL database.
To create a unique id, we are going to use ROW_NUMBER()
function that assigns a new id starting from 1.
This is achieved as follows:
SELECT
ROW_NUMBER() OVER (ORDER BY et.Name) as StudentExamId,
et.ExamTypeId,et.Name AS Exam_Name,et.Detail AS Exam_Detail,
s.StudentId,s.Name,s.Detail INTO StudentExam
FROM dbo.Student s CROSS JOIN dbo.ExamType et
Let us expand the ExamCart
databases node to locate the newly created StudentExam
table as a result of Cartesian Product (Cross Join):
Professional Life Tip
Please remember that Cross Join or Cartesian Product of two large tables (having many rows) could become a very resource intensive operation which means your SQL Server may be needing a lot of power to perform this operation and that can cause performance issues and in order to resolve the performance issue, you may have to scale up (add more to) your Server resources (which is often easily possible in cloud environment) but that may also come up with some more cost.
However, it is a safe operation that does the job it is supposed to do and can also be used to denormalize the data (a standard practice) in a typical data warehouse business intelligence solution.
Congratulations! You have successfully learned about Cartesian Product in SQL along with the steps to implement it by working out your solution.
History
- 6th July, 2021: Initial version