Introduction
As a software developer, I am often called upon to write SQL queries to retrieve data from the database. Since being the DBA is part of my role, I have found it quite natural to do so. However, I have come across a number of developers who do not understand how to write an efficient T-SQL command. This article is designed with the developers in mind but it will be equally useful for database developers who want to brush up on their skills. This particular article focuses on the basics of how to retrieve data from a database.
Purpose
Throughout this article I will cover the basics of how to access data from a SQL database as well as my suggestions for how to best use each of these tools. Most of these techniques should work on any modern version of Microsoft SQL. I will do my best to point out when commands are only available in certain versions of SQL. All queries have been designed to run against the AdventureWorks database. Microsoft provides this sample database for free on their website.
Please note that it is outside the scope of this document to really discuss in depth how these queries should be called (from code directly or through stored procedures). I would say that, when in doubt, use a stored procedure.
T-SQL Test Platform
Included with this article is a very simple test platform that I developed to help you see how these commands would run. With this, you can play with the different options and mix and match the different commands in a safe manner. The tool is not meant to be production-ready code. It is simply a way to quickly execute these commands without needing to open a SQL command window. I’ve included the EXE for those who don’t want to load the project and build it. This project also comes loaded with all of the samples that I give you in this article.
Standard Commands
The first thing you learn when being taught how to access data in a database are the following commands: SELECT, FROM, WHERE, ORDER BY. These command are the foundation on which most queries are built. SELECT tells the system what columns you want, FROM says where to get the columns, WHERE filters the results, and ORDER BY puts the results into the desired order. Here is a basic example of this type of query:
SELECT Name, ProductNumber, Color
FROM Production.Product
WHERE Color = 'Black'
ORDER BY Name ASC, ProductNumber ASC
Name ProductNumber Color
Chainring CR-7833 Black
Full-Finger Gloves, L GL-F110-L Black
Full-Finger Gloves, M GL-F110-M Black
Full-Finger Gloves, S GL-F110-S Black
Half-Finger Gloves, L GL-H102-L Black
…
(93 row(s) affected)
Best Practice: The ORDER BY command can order in ascending or descending order. This is specified by an ASC or DESC after the column name. If you do not specify anything, the column is sorted ascending. However, it is good practice to write this out instead of leaving it implied.
Best Practice: While SELECT * will give you every column, avoid this method if at all possible. By specifying which columns you want, you will reduce the bandwidth needed to transfer the results to the client.
Key Point: You do not need to have a column listed in the SELECT statement in order to use it in the WHERE clause. It simply needs to be a column in one of the referenced items in the FROM statement.
Key Point: Query sections get evaluated in the following order: FROM, WHERE, SELECT, ORDER BY. This means that if you create an alias (see below) in your SELECT statement, the WHERE statement will not be able to reference it (since it hasn’t been evaluated yet) but the ORDER By statement can use it.
Key Point: Query results are not ordered unless specifically given an order. It may seem like they are always returned in the same order without an ORDER BY statement but this is not reliable. If you want your records returned in a specific order, you must specify it.
Filtering Results with WHERE
We already touched on how to use a WHERE statement above but I wanted to go into a bit more depth on what you can do and, more importantly what you should and should not do using a WHERE statement. The WHERE statement at its best utilizes the table’s index in order to quickly limit what data is returned in the result set. For example, this query takes advantage of a nonclustered unique index on the Name column:
SELECT Name,StandardCost
FROM Production.Product
WHERE Name = 'HL Bottom Bracket'
Name StandardCost
HL Bottom Bracket 53.9416
(1 row(s) affected)
What if, however, we wanted every bottom bracket in our database, not just the HL Bottom Bracket? In that case, we could use the LIKE keyword instead of the equals in our WHERE statement and use the percent (%) sign to represent a multi-character wildcard. That query would look like this:
SELECT Name,StandardCost
FROM Production.Product
WHERE Name LIKE '% Bottom Bracket'
Name StandardCost
HL Bottom Bracket 53.9416
LL Bottom Bracket 23.9716
ML Bottom Bracket 44.9506
(3 row(s) affected)
That gives us the results we wanted but at a fairly significant cost. The first query used an Index Seek (very efficient) while this second query uses an Index Scan (inefficient). The reason for this is because we put the wildcard at the beginning of the statement. Sometimes you cannot help this. For example, in this query it would be very difficult to find all of the items with the text “Bottom Bracket" in their text. Moving the wildcard anywhere else in the text will allow the system to do an Index Seek again.
Best Practice: When you have multiple statements in the WHERE section, use parenthesis to ensure the proper order of execution. For example, (Name=’Wrench’ AND Color=’Blue’) OR Color=’Black’
is different from Name=’Wrench’ AND (Color=’Blue’ OR Color=’Black’)
.
Best Practice: As we learned above, avoid putting a wildcard character at the beginning of the search text. This will dramatically improve the performance of your query.
Best Practice: Avoid performing functions on the left side of a WHERE statement. For example, the following would be functionally equivalent to the query we did above but it would actually perform even worse than our existing poorly performing query:
SELECT Name,StandardCost
FROM Production.Product
WHERE SUBSTRING(Name,4,14) = 'Bottom Bracket'
Name StandardCost
LL Bottom Bracket 23.9716
ML Bottom Bracket 44.9506
HL Bottom Bracket 53.9416
(3 row(s) affected)
Limiting Results with TOP
If you only want a certain number of records to be returned, you can use the TOP command to specify either the number of records or the percentage of records to return. Note that this does not filter the results, it just grabs the specified number (or percentage) of records off of the top of the query and returns them. This command can be useful in a number of ways. One way this can be useful is if you want to return only the top five items from your inventory based upon their cost. You would simply order the query by cost in descending order so that the most expensive product is on top and then you would select only the top five records. Here is an example of doing this:
SELECT TOP 5 Name,StandardCost
FROM Production.Product
ORDER BY StandardCost DESC
Name StandardCost
Road-150 Red, 62 2171.2942
Road-150 Red, 44 2171.2942
Road-150 Red, 48 2171.2942
Road-150 Red, 52 2171.2942
Road-150 Red, 56 2171.2942
(5 row(s) affected)
If you wanted to return the top five percent of your products based upon price, you would simply modify the above statement to begin SELECT TOP 5 PERCENT
…
If you want to get a random set of rows returned, use the TOP statement in your SELECT and put a ORDER BY newid()
at the end like so (I ran it twice to show you that it is random):
SELECT TOP 3 Name,StandardCost
FROM Production.Product
ORDER BY NEWID()
Name StandardCost
ML Fork 77.9176
Mountain-500 Silver, 40 308.2179
HL Touring Frame - Yellow, 50 601.7437
(3 row(s) affected)
Name StandardCost
Mountain End Caps 0.00
LL Road Frame - Red, 52 187.1571
Mountain-200 Black, 38 1251.9813
(3 row(s) affected)
Key Point: The method I showed you of finding the most expensive items does not take into account items with the same price. In my example, all five items that return have the same Standard Cost. If I only wanted the top three, which ones would it give me? The answer is the first three it found. This order might change without notice unless you had specified a second column to order by as well.
Linking two (or more) tables or queries together with JOINs
Linking multiple tables or queries together using a JOIN statement is a large topic on its own. While I could cover it in depth, there is already a great article on CodeProject that covers the issue very well. Here is the link: Visual_SQL_Joins.aspx. It even has pictures.
I will just add my comments and ideas on how to best implement joins. The one thing you will notice especially here but also throughout the rest of this article is that I advocate being explicit in whatever you do. Don’t assume the next person to come along will know all of the implicit rules. Besides, it usually makes the query more readable to write things out fully.
Best Practice: Using just the keyword JOIN implies an INNER JOIN. However, write it out. Don’t make the next person to come along remember if it is INNER or OUTER.
Best Practice: You can specify how the two tables are joined in the FROM statement or in the WHERE statement. For the sake of clarity, specify this information in the FROM statement. When you put it in the WHERE statement, it confuses the join information with the filtering information.
Best Practice: The keyword OUTER may be omitted when you specify the type of OUTER JOIN (LEFT, RIGHT, FULL). This may be more of a preference since it becomes obvious that it is an OUTER join but include the keyword OUTER.
Renaming items using the AS keyword
Sometimes a table has a column name that isn’t what you would want it named. It might be cryptic (GL00101 – this is an actual column name in the Microsoft Dynamics Great Plains database), too long (TheTotalNumberOfProducts), too generic (Name), or just not to your liking. Whatever the reason might be, you can rename the column by simply adding the AS keyword followed by the new name (as long as the new name does not conflict with an existing column in your query). Here is an example:
SELECT TOP 3 Name AS Product_Name, StandardCost
FROM Production.Product
WHERE Name LIKE 'Mountain%'
ORDER BY Product_Name ASC
Product_Name StandardCost
Mountain Bike Socks, L 3.3963
Mountain Bike Socks, M 3.3963
Mountain Bottle Cage 3.7363
(3 row(s) affected)
I limited the results to three since this is a demonstration query. Notice that I renamed the Name column to be Product_Name so that it was easier to read. It would make even more sense if we had a query that contained a customer’s name as well as the product name.
I intentionally made this query a bit tricky just to show off what we have already discussed. Notice that the WHERE clause references the Name column while the ORDER BY clause references the Product_Name column. Remember this is because the WHERE clause is evaluated before the SELECT while the ORDER BY is evaluated after the SELECT. If we wanted to use the name Product_Name in our WHERE clause, we could do so like this:
SELECT TOP 3 *
FROM (
SELECT Name AS Product_Name, StandardCost
FROM Production.Product
) AS Prod
WHERE Product_Name LIKE 'Mountain%'
ORDER BY Product_Name ASC
Product_Name StandardCost
Mountain Bike Socks, L 3.3963
Mountain Bike Socks, M 3.3963
Mountain Bottle Cage 3.7363
(3 row(s) affected)
Basically we need to create the query (without limits) inside of parenthesis and use that as the table in the FROM statement. Notice that we named the entire nested query as Prod. You need to name the query something so that you can reference it. The end result of all of this is the same except that we could use the column name Product_Name in our WHERE clause. When doing this, be careful to place items in the right query. For example, I put my “TOP 3" limiter in the outer query instead of the inner query because otherwise I would have gotten bad results (we need to apply our WHERE to the entire table, not just three records).
Finding ranges of data using BETWEEN and IN
We can use the typical operators like >, <, and = in our WHERE clause to find the data we want. However, there are a couple of extra commands that we can put in our toolbox as well. The first command we will look at is the BETWEEN operator. Here is an example of how to use this operator:
SELECT StandardCost
FROM Production.Product
WHERE StandardCost BETWEEN 1 AND 2
StandardCost
1.8663
1.8663
1.4923
1.8663
(4 row(s) affected)
The syntax is easy enough to read. This method is syntactically equivalent to the following statement:
SELECT StandardCost
FROM Production.Product
WHERE StandardCost >= 1
AND StandardCost <=2
StandardCost
1.8663
1.8663
1.4923
1.8663
(4 row(s) affected)
If you look at the execution plan, these two statements actually evaluate to the same plan which means there is no performance gain or loss by using the BETWEEN operator. The biggest thing we gain is a simpler statement. Note that the BETWEEN operator includes the lower and upper ends so if there were an item that sold for 2 dollars it would be included in this list. The same would be true for an item that cost one dollar.
The next operator we want to look at is the IN operator. This operator tells the system to find an exact match to one of the items in the list. The list can have two or more items in it. Here is an example of how to use the IN operator:
SELECT StandardCost
FROM Production.Product
WHERE StandardCost IN (1.8663, 44.9506, 1.4923)
StandardCost
1.8663
1.8663
1.4923
1.8663
44.9506
(5 row(s) affected)
If we were going to write this out using the standard operators, this is how it would look:
SELECT StandardCost
FROM Production.Product
WHERE StandardCost = 1.8663
OR StandardCost = 44.9506
OR StandardCost = 1.4923
StandardCost
1.8663
1.8663
1.4923
1.8663
44.9506
(5 row(s) affected)
While the execution plan does not write out the translation of this query, it executes the same plan for both of these queries. Again, this means that the IN operator does not cause any performance gains or losses. It is simply an easier way to write out our query.
Advanced Topics
I have intentionally not covered some of the advanced topics including grouping of results, temporary tables, parameters and other important T-SQL topics. I intend to cover these and more in my next article. SQL has a lot to offer developers. While it would be great if you could read a good SQL book or two completely in order to get a full picture of how to build SQL statements properly, I intent to give you the condensed version in these articles.
Conclusion
In this article, we have learned how to use the basic commands and operators in T-SQL statements. With this knowledge you can build simple commands to gather data from one or more tables in Microsoft SQL. Along the way we discussed some of the best practices that should be followed as well as a few gotchas that we should be aware of when building our queries. I hope you have found at least something that could be of benefit to you from this article. I appreciate constructive feedback and look forward to your thoughts on how this article could be approved.
History
- January 11, 2011: Initial version