Introduction
Welcome to this, my first article in many years. I am intending on doing a series of articles on SQL, one a week. I know I am probably not writing anything that’s not already on the site, but my goal is to provide a series of articles that cover things in an organised way, almost like a book that people can work through. I suspect a lot of developers work with databases, and know a little bit of SQL, but really don’t understand how to really get the most out of it. Hopefully I can help change that.
Background
This first article is going to start with basics. I assume you know how to write a ‘where’ clause, and I’m going to start by talking about the different ways of connecting tables to select data between them. If you didn’t download the code for this article, please do so now. It contains all the SQL you’ll want to run in Management Studio while reading through this.
Download it now. I’ll wait….
OK, back ? The first thing you’ll see is code that looks like this:
with listall(id) as
(
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10
),
listodd(id) as
(
select 1 union all
select 3 union all
select 5 union all
select 7 union all
select 9 union all
select 11
),
listeven(id) as
(
select 2 union all
select 4 union all
select 6 union all
select 8 union all
select 10
)
Don’t worry if you have no idea what this is ( these are Common Table Expressions, the subject of the second article ). Suffice it to say, this creates three in memory tables, one of the sequence 1-10, one of only odd numbers to 11 and one of only even, to 10.
INNER JOIN
I assume that everyone knows what this does:
select * from listall
It returns the contents of the listall table ( 1-10). Now, what does this do:
select * from listall la inner join listodd lo on lo.id = la.id
It returns the odd sequence to 9, because an inner join will return only rowsets where the two tables can be connected. This is useful where you have, for example, an order table with a client id, and a client table where that id gives you all the client’s details. Now, what does this do ?
select * from listeven le inner join listodd lo on lo.id = le.id
It returns nothing, because the odd and even rows have no rows in common.
OUTER JOIN
Now, what does this do ?
select * from listall lo left outer join listeven le on lo.id = le.id
It returns all our rows from the table on the LEFT of the select, even if the join does not match. Where the join does not match, a null is returned. This can be useful where you are using the join to work out if something exists, you can use a case statement to set a value according to if a match was found, like so:
select lo.id, isEven = case when le.id is null then 0 else 1 end from listall lo left outer join listeven le on lo.id = le.id
The other thing you can do is use COALESCE or ISNULL to set a default value, like so:
select lo.id, isnull(le.id, -1) from listall la left outer join listeven le on la.id = le.id
As you can see, you can do a lot more than just return raw data, in SQL.
Now, what does this do:
select * from listeven le left outer join listall la on la.id = le.id
It returns only the even numbers. The join is on the LEFT, and the left most table only contains even numbers. To get the same result, if you need this table order, you do this:
select * from listeven le right outer join listall la on la.id = le.id
If you need nulls from both sides, you can do this:
select * from listeven le full outer join listodd lo on lo.id = le.id
Note if you try to order this using orderby, you will find that nulls get ordered first. To get this in order again, you’d do this:
select coalesce(le.id, lo.id) as idList, le.id, lo.id from listeven le full outer join listodd lo on lo.id = le.id order by idlist
As you may be aware, coalesce, like isnull, will return the second value if the first one is null. The difference is that coalesce ONLY does this, and ISNULL can evaluate an expression as the first argument.
One thing to add, the word ‘OUTER’ is optional, you can just specify ‘left join’, ‘right join’ or ‘full join’.
CROSS JOIN
This is probably the one people know the least. What do you expect this to do ?
select * from listeven le cross join listodd lo
If you run it, you will get 30 results. There are 5 numbers in listeven and 6 in listodd. 5 times 6 is 30 results. Cross join returns every row in table 1, combined with every row in table 2. This is a very powerful technique, when used correctly. There will likely be an article down the track on cool things you can do with cross joins, and I am certain that cross joins will feature in next week’s article. As you can see, because it just returns EVERY possible combination, there is no column to join on, in fact, if the two tables have columns in common, both values are returned, and you will have two columns with the same names, unless you alias them like this:
select le.id, lo.id as id2 from listeven le cross join listodd lo
You are not obliged to return ALL the data cross joined, you can specify columns and alias them, just like any other select.
A short hand for a cross join, is a comma, as in:
select * from listeven, listodd
However, while you may see this used in online examples, it has not been 'valid' SQL for over 20 years, and no database implementation is required to support it, so you should not use it in your own code.
INTERSECT
Although they are not really joins, I am going to cover two other things here. The first is ‘intersect’. It works like this:
select * from listall
intersect
select * from listodd
In this case, listodd has one value not in listall ( 11), so the result is the contents of listodd, minus the 11. Sometimes you will have two result sets, and you will want to know what values are in common between them. INTERSECT allows you to do this. Of course, this can be done by intersecting two select statements that contain as many joins and other operations as you wish. Select * is acceptable syntax, but it’s required that both tables have the same column names and data types. Be careful, sometimes you will only care about an intersection of one column ( usually a primary key ), in which case, it probably makes sense to have CTEs that return the primary key values only, and one that returns all the data, so you can end up with code like this ( this one is not in the file, as it won’t run )
select * from myClientList where clientid in ( select clientid from list1 intersect select clientid from list2)
This would of course be at the bottom of statements defining three possibly complex CTEs. Hopefully you can see how CTEs increase readability and maintainability of SQL code.
EXCEPT
I recently had the task of transforming a large data set from one database/format to another. We had a ton of complex business rules, so I ended up with a view that grabbed all the data from the old DB, a lot of views to process different data types, then a view at the top to aggregate all those values, and turn them in to one list again. It was very complex, but at the end, we wanted to make sure we had not missed any records. This is exactly the sort of situation where the ‘EXCEPT’ operation is invaluable.
select * from listall
except
select * from listodd
This returns everything in the first table that is not in the second ( but NOT vice versa ). So, while the number ‘11’ is in listodd and not in listall, it is not returned by this operation. In my case, this returned all of the documents in my base list, that were not already in the top level list, and which had therefore not fulfilled any business rules and would otherwise have been lost. Given the nature of our data transformations, my query was a little more complex ( it was a union of three selects ), and I had to do an EXCEPT on my primary key only, then select all my data on the basis of that operation. If ANY row differs, the data will be returned as being different, so if you want to find only what records are not there ( regardless of if they have been transformed in any way ), then an operation on just the key is what you need.
UNION and UNION ALL
I nearly forgot to cover these. What if, given our data set, we wanted a sequence from 1 to 11 ? This will do that:
select * from listall
union
select * from listodd
order by id
It returns the combination of values from both tables, removing duplicates. If you want to see all values, including duplicates, do this:
select * from listall
union all
select * from listodd
order by id
I have seen some truly awful SQL code based on unions. Remember, union means, run these separate SQL queries, then merge them. If you can instead create one query, it’s bound to be faster ( but UNION itself is VERY fast, what I mean is, if you run two queries that are complex, you pay the cost of both queries, instead of just running one ). However, UNION was invaluable in the task I was describing earlier, because I had a base view, and different views for different business rules, which then needed to be pulled together again. It WAS slow, but it was the best possible tool for the task at hand.
Several people suggested I add a visual representation of the different join types. There is already an excellent article on CP that does this, you can see it here. I see no reason to duplicate it, when it is better for people seeking more information to read that article, as well as mine. I will add, the 'excluding joins' the author shows are perfectly valid, for the purposes of the techniques I've shown, I'd use the 'EXCEPT' keyword to get the same result. I am not sure which is faster.
As you can see, there’s a lot more to SQL than just the select statement and inner joins. Next week I’ll set out to explain how to get the most out of CTEs. If you have any thing you’d like me to specifically cover, or even any general SQL questions, please ask. I promise I’ll answer every question, even if I sometimes have to say that I’m not sure ( but I’ll try to find out ).
Part two ( which covers SELECT statements ) can now be read here.