Introduction
The purpose of this article is to show the difference between And
clause with On
and Where
clause when using with left join in SQL Server.
Background
I have been working on SQL Server databases for more than 5 years and I was not aware of the difference between the On
clause and Where
clause when used with left join. Once I asked this question to our DBA (who has 10+ years of experience) and he said that there is some difference but he was not be able to explain the difference. Then I started exploring this topic myself and got some interesting observations of these two clauses when used with left join. So today I decided to share my knowledge so that everyone could benefit from my work.
Using the code
Today I will explain the difference between On
clause and Where
clause when used with left join in SQL Server. When the On
clause is used in an outer join, the outer table will have no effect on this On
clause and all rows from the outer table will be returned and the On
clause determines which rows of the subordinate table joins to the outer table. Rows of the outer table that do not meet the condition specified in the On
clause in the join are extended with null values for subordinate columns (columns of the subordinate table), whereas the Where
clause filters the rows that actually were returned to the final output.
It’s difficult to absorb this from the above definition, so let’s try to understand this difference with the help of an example. Suppose we have two tables Departments (deptId, deptName) and Employees (empID, DeptID, EmpName, Salary) and deptiD is the foreign key of the department table. An employee can have only one department where as a department can have many employees.
We have this sample data in the Department table:
DeptId DeptName
----------- ---------------
1 HR
2 Payroll
3 Admin
4 Marketing
5 HR & Accounts
(5 rows)
And here is the sample data for the Employees table:
EmpId EmpName DeptId EmpSalary
----------- --------------- ----------- ---------------------
1 John 1 5000.00
2 Albert 1 4500.00
3 Crain 2 6000.00
4 Micheal 2 5000.00
5 David NULL 34.00
6 Kelly NULL 457.00
7 Rudy 1 879.00
8 Smith 2 7878.00
9 Karsen 5 878.00
10 Stringer 5 345.00
11 Cheryl NULL NULL
(11 rows)
Case 1
Suppose we are asked to show all the employees and their relevant departments, then we would write a query like this:
select * from employees e
<p>left join departments d on e.deptid = d.deptid</p>
(query1)
and the result of this query is:
EmpId EmpName DeptId EmpSalary DeptId DeptName
----------- --------------- ----------- --------------------- ----------- ---------------
1 John 1 5000.00 1 HR
2 Albert 1 4500.00 1 HR
3 Crain 2 6000.00 2 Payroll
4 Micheal 2 5000.00 2 Payroll
5 David NULL 34.00 NULL NULL
6 Kelly NULL 457.00 NULL NULL
7 Rudy 1 879.00 1 HR
8 Smith 2 7878.00 2 Payroll
9 Karsen 5 878.00 5 HR & Accounts
10 Stringer 5 345.00 5 HR & Accounts
11 Cheryl NULL NULL NULL NULL
(11 rows)
(Table 1)
Here you can see that all rows from the Employees table are listed along with their department, if they have a department; otherwise, deptid and deptname are null.
Case 2
Let’s say we are asked to show the list of all employees and the department name of only those employees who have "HR" or "HR & Accounts" department; then we would write a query like this:
select * from employees e
left join departments d on e.deptid = d.deptid
and ( d.deptname = 'HR' or d.deptname = 'HR & Accounts')
(query 2)
and the result of this query is:
EmpId EmpName DeptId EmpSalary DeptId DeptName
----------- --------------- ----------- --------------------- ----------- ---------------
1 John 1 5000.00 1 HR
2 Albert 1 4500.00 1 HR
3 Crain 2 6000.00 NULL NULL
4 Micheal 2 5000.00 NULL NULL
5 David NULL 34.00 NULL NULL
6 Kelly NULL 457.00 NULL NULL
7 Rudy 1 879.00 1 HR
8 Smith 2 7878.00 NULL NULL
9 Karsen 5 878.00 5 HR & Accounts
10 Stringer 5 345.00 5 HR & Accounts
11 Cheryl NULL NULL NULL NULL
(11 rows)
Table (2)
You can notice here that only the same number of rows are returned as in query 1, but here the department of only those employees are listed whose department name is "HR" or "HR & Accounts". As you can see, "Crain"," Micheal", and "Smith" have department "Payroll" (see table 1), but it’s not listed in the above result set because here we applied the filter in the "On
clause"; hence only those rows become part of the join that has department name "HR" or "HR & Employee", and as a result, all other employees who don’t have department name "HR" and "HR & Accoutns" show their department name as null. This is an example of the On
clause in left join.
Case 3
What if we are asked to show only those employees who have their department name "HR" or "HR & Accounts"? We would write a query like this:
select * from employees e
left join departments d on e.deptid = d.deptid
where ( d.deptname = 'HR' or d.deptname = 'HR & Accounts')
(query 3)
and the result of this query is:
EmpId EmpName DeptId EmpSalary DeptId DeptName
----------- --------------- ----------- --------------------- ----------- ---------------
1 John 1 5000.00 1 HR
2 Albert 1 4500.00 1 HR
7 Rudy 1 879.00 1 HR
9 Karsen 5 878.00 5 HR & Accounts
10 Stringer 5 345.00 5 HR & Accounts
(5 rows)
Table (3)
You can see here that only those employees are listed who have their department "HR" or "HR & Accounts", so what we did here is the same query as query 1. You can check the result of query 1 in table 1, and then we just applied a filter in the Where
clause to filter the rows and to return only those employees who have their department name as "HR" or "HR & Accounts". This is an example of Where
clause in left join.
Now you have seen all three cases and can see how these differ from each other. This was an example of a (1:1) one to one table as one employee can have only one department. Let's execute the above 3 cases with (1:M), a one to many table relationship where one department can have multiple employees (or more than one employee can be in one department).
Case 4
Let’s suppose our requirement is to get all departments and the employees listed in these departments, then we would write a query like:
select * from departments d
left join employees e on e.deptId = d.deptId
(query 4)
It will return all rows from the Department table and only those rows from the Employees table that have a department. The result will look like:
DeptId DeptName EmpId EmpName DeptId EmpSalary
----------- --------------- ----------- --------------- ----------- ---------------------
1 HR 1 John 1 5000.00
1 HR 2 Albert 1 4500.00
1 HR 7 Rudy 1 879.00
2 Payroll 3 Crain 2 6000.00
2 Payroll 4 Micheal 2 5000.00
2 Payroll 8 Smith 2 7878.00
3 Admin NULL NULL NULL NULL
4 Marketing NULL NULL NULL NULL
5 HR & Accounts 9 Karsen 5 878.00
5 HR & Accounts 10 Stringer 5 345.00
(10 rows)
(Table 4)
You can see in the result that all departments are listed, even those that don’t have any employees, as "Admin" and "Marketing".
Case 5
But what if we are asked to show all departments and only those employees who have department name "HR or "HR & Accounting". Then we would write a query like:
select * from departments d
left join employees e on e.deptId = d.deptId and
( d.deptname = 'HR' or d.deptname = 'HR & Accounts')
(query 5)
It will show the following records:
DeptId DeptName EmpId EmpName DeptId EmpSalary
----------- --------------- ----------- --------------- ----------- ---------------------
1 HR 1 John 1 5000.00
1 HR 2 Albert 1 4500.00
1 HR 7 Rudy 1 879.00
2 Payroll NULL NULL NULL NULL
3 Admin NULL NULL NULL NULL
4 Marketing NULL NULL NULL NULL
5 HR & Accounts 9 Karsen 5 878.00
5 HR & Accounts 10 Stringer 5 345.00
(8 rows)
Table (5)
You can see in the result, this is the On
clause in left join; it just return all rows from the Department table and only those rows from the Employee table that have department "HR" or "HR & Accounts". We have employees in the Payroll department (Table 4 as a result of query 4) but it’s not listed in the above result set as we just put an extra condition in the "On
clause" that the department name should be "HR" or "HR & Accounts". That’s why only those rows were part of this join which have department name "HR" and "HR & Accounts". Hence all rows from the Department table were returned and only those matching rows were returned from the Employees table who have their department as "HR" or "HR & Accounts".
Case 6
What if we are asked to show only the "HR" and "HR & Accounts" departments along with their relevant employees? We would then write a query like this:
select * from departments d
left join employees e on e.deptId = d.deptId
where ( d.deptname = 'HR' or d.deptname = 'HR & Accounts')
(query 6)
and the result of the above query would be:
DeptId DeptName EmpId EmpName DeptId EmpSalary
----------- --------------- ----------- --------------- ----------- ---------------------
1 HR 1 John 1 5000.00
1 HR 2 Albert 1 4500.00
1 HR 7 Rudy 1 879.00
5 HR & Accounts 9 Karsen 5 878.00
5 HR & Accounts 10 Stringer 5 345.00
(5 rows)
(Table 6)
So what happened here? We just did a left join as we did in query 4, and you can see the result in table 4. And then we apply a filter on the result set to return only those rows that have deptNmae as "HR" or "HR&Accounts". This is a Where
clause in left join. Now you can see how it is different from the result of query 5 where we put this condition in the On
clause.
There is no difference between the Where
clause and On
clause when used with inner join.
I have shown you six different cases to explain the concept. I hope you enjoyed reading the article. I have attached a script that can be used to create the tables and sample data and then you can practice all by yourself. I would be waiting for your feedback.
References