Topics to be Covered
-
SQL Basics
- Introduction
- What is a database?
- What does Relational mean?
- A small example
- SQL Statement
- Naming Things
- Creating Things
-
Everything about SELECT Statement
- Introduction
FROM
Clause - Limiting Result Set
WHERE
Clause WHERE
Clause AND OR
MORE
Operators BETWEEN
LIKE
IN
IS
and ISNOT
SQL Basics
Introduction
Before I begin, this article is intended for each and every beginner, may it be a Software Developer or a Quality Analyst, this is a must know for every software engineer. Some might say we have Database administrator, then why should we, but I feel being a developer, one should have all round knowledge of what's going on and handle everything.
Here in this article, I will be explain each and every concept in detail to get started with SQL (Structured Query Language).
First of all, the question going on your mind now would be what is SQL?
As mentioned above, SQL stands for Structured Query Language. It is a special-purpose programming language. Its purpose is to manipulate relational databases. Relational databases are one which have detailed information about all the tables in a formally organized manner. The best part here is it contains both the data-definition syntax as well as data-manipulation syntax.
What is a Database?
Now the diagram describes what it is. Database is simply a container that helps organize data in a format or constructive manner. In a database, data is stored in a rational way. Now, let's take an example that would make is help understand it better. For instance, our contact list in cell phones. If we think a bit into it, then we realize the contact list is a database with data in it in an organized manner. Then we sort by the letters search the names from the list. Thus, we perform query execution on the list to retrieve the records we want. This is the flexibility a simple database provides us. It makes it easier to:
- Query Data
- Update Data
- Insert New Data
- Delete Data
There are basically three types of databases available:
We will be discussing only on the Relational Database as that is commonly used. I have provided the links to the other types of databases for reference.
What is Relational?
The dictionary meaning to this is “Things that relate each other in a way”. Here, the meaning is keep its consistency. Relational model is the database model on which SQL is based. It is thus a way to describe data and the relationship between those data entities.
In a relational database, data is stored in a construct called Table
. Now what is a Table
?
- A table is a collection of columns that define the table properties with specific datatype associated, with specific size, thus restricting the size of each record to be inserted into the table.
- Every column can be made required or not required (using the check box available for
Null
values). - Rows can be retrieved by asking questions about the data. Like “You would need all the rows of data that have a column (surname) value as having value ‘
Ray
'” - An essential part of a table is “
Key
”. Every table must have a key associated with any column to uniquely identify a table. Usually, a table is associated with a key called Primary Key. (We will be discussing about keys in the upcoming modules). - If more than one table uses the same primary key (one table-Primary key and other named foreign key), then the tables can be merged or joined to give different set of records set.
A Small Example
Before starting with the, I hope the readers have a view of the SSMS (SQL Server Management Studio). Here is how the SSMS looks like:
Now let's ask questions to ourselves. Why do we at all need database with multiple tables, why not a single table! let's check with an example why?
Suppose we have a User
table which would have columns named UserID
, FName
, LName
, Email
& PhNo
. Now suppose we have only this table and there is a requirement in the future that a user may have more than one Phone number or more than one Email. Just think how we would approch this, as we cannot store multiple values into a column. Thus, does not this give us an indication to have another table
! Just see the difference below:
In the above figures, you can see the difference clearly, using the first table will cause real problem and a lot of redundant data/records in the table which anyway violates the Normalization rules (Normalization Rules). Thus, creating the second table which would be having a foreign key relationship with the User
table, can now store as many phone numbers as a user can have.
SQL Statement
- A SQL statement is framed from the combination of actionable set of valid words out of which some are SSMS defined and others are user defined.
- A valid SQL statement would be if a statement/query ends with a semicolon (
;
), it is not like that it's mandatory to add a semicolon at the end. But it is a part of the ANSI standard, so it is advisable to use the semicolon at the end of the statement. - SQL is not case-sensitive. Usually, the queries/statements are written in Caps(Capitals). Sometimes the user-defined functions are in lower case. But obviously, the standard needs to be followed for better maintenance and to maintain consistency.
- In SQL statement, we can put comments. For single line comments, “
–
” is used as a predecessor for the query as single line comment where as “/**/
”, is used for the query as multiple line comments. - SQL statements start with commands. Comparing with the English language, the commands here are similar to the verbs. For example,
SELECT
(Command/keyword in SQL)
In generic terms, SELECT
would go like below:
SELECT VALUES FROM TABLE NAME;
After the command comes the part that is highly dependent on the commands that is to define the set of result/records we would expect from the query like a list of all values of the columns (generally, the wildcard symbol “*
” asterix is used to select all columns in table). Then is the FROM
clause which is used to fetch records / column values from which table
(as it explains itself).
Naming Things
This might be a bit controversial, as there is nothing called the best way to do. So we follow what is followed the most, or follow the conventions already being used. This would differ from person to person. But the most important factor to keep in mind is to have consistency through out. So, that when we see the database, the names of the tables define themselves. Some of the rules I am listing below:
Creating Things
There is a whole set of SQL commands which are required to create and modify the structures/records in the database.
CREATE DATABASE DemoDatabase;
USE DemoDatabase;
CREATE Table DemoDatabase.DemoTable(…);
Here, I have only shown the structures, we will be discussing more of that ahead.
Datatypes are very vital when we will be creating the tables in a database. Each column in a database has restriction on the datatype of the data that can be stored in the table.
Let's have a look at the datatypes:
Data Type | Value/Space |
CHARACTER | Can hold ‘N’ number of characters which are specified implicitly, to be exact statically that is the number of characters to be accepted in the value is set at the time of column creation |
VARYING CHARACTER | Can hold ‘N’ characters which is set dynamically, that is if MAX is set, then it would accept any number of characters sent from the server into the column. |
BINARY | Stores Hexadecimal data, fixed length with a length of n bytes, where n varies from 1 through 8,000. Used for images and other sort of blobs. |
VAR BINARY | As the name suggests, it accepts value of varying length. When size is not specified, then default value is taken as 1. |
SMALLINT | Its length is 2 bytes and ranges from -2^15 (-32,768) to 2^15-1 (32,767) |
INT | Its length is 4 bytes and ranges from -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647) |
BIGINT | Its length is 8 bytes and ranges from -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807) |
TINYINT | Its length is 1 byte and ranges from 0-255. Can be used for values with Yes/No or the enumeration types which are limited to fewer values |
BOOLEAN | Stores TRUE FALSE values. When NOT NULLABLE , then default value is False , else it also accepts NULL values. |
DATE | Stores only Date values, only YEAR MONTH & DAY values in the format YYYY–MM–DD |
TIME | Stores only time values, only HOUR MINUTE & SECONDS values of a day in the format HH–MM–SS |
TIMESTAMP | Stores both date & time. This is when date is combined with time of a day with the seconds. Format is just concatenating both date format and time format |
These are some of the datatypes, I have listed out. The important thing here to keep in mind is, while creating the table, we never actually keep in mind. Thus the datatype
plays a very important role if we consider the future. Suppose we use the number of users or the userid in a table datatype as smallint
or only int
even, then sometimes in future may be the number of users increases to a large extent and overlaps the range of the datatypes. Then here the problem arises. You might have heard about the youtube crash for the Gangnam song. The reason behind it was this ... the issue with the datatype for the number of views. It increased to such an extent that it crashed.
Relational Database Management System is based on the relational model. In simple terms, it is a type of database management which stores data in related tables.
Everything About SELECT Statement
Introduction
We have already discussed about the SSMS where we will be demonstrating our queries. After you read through this part of the article, you would be able to understand the details of using the SELECT
statement, asking different types of questions to the select
query.
First of all, we store the data in a database to reuse it at some later point of time. Querying data is all about asking questions while retrieving data. Understanding the combination of English and the SQL helps us postulate the queries very easily.
Let's create a database first. Look below at the query.
CREATE DATABASE QUERY
CREATE DATABASE DemoDatabase;
The above query using the master creates the database named “DemoDatabase
”. You can also directly create database using the SSMS UI. See the below images:
When in the first image as you see, the New Database is clicked, another window pops up as shown in the second image. Add the database name there and it creates the database for you with a maxsize of unlimited.
Then we create a Table for the storage of records.
CREATING A TABLE
USE DemoDatabase
CREATE TABLE [dbo].[User](
[UserId] [int] IDENTITY(1,1) NOT NULL,
[UserName] [nvarchar](50) NULL,
[Password] [nvarchar](50) NULL,
[UserEmail] [nvarchar](50) NULL,
CONSTRAINT [PK_User] PRIMARY KEY CLUSTERED
(
[UserId]
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, _
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Now the above query creates a table for you, but we need to understand the commands used in the query as some seem understandable whereas some are not.
PAD_INDEX
: Specifies the index padding. The default is OFF
. More information. STATISTICS_NORECOMPUTE
: Creation & Recreation of index & auto update of the columns statistics. More info. ALLOW_ROW_LOCKS
: Enhances performances by locking the entire table rather than locking individual row. By default, SQL sets to ON
. More info. ALLOW_PAGE_LOCKS
: Set to TRUE
/ON
when there are chances of high concurrency in order to improve performance. By default, SQL sets to ON
. More info.
Thus, we have created a database which now has a table named User
. Now for your information, we can also create table using the SSMS UI as we have done for the database.
We open up the database tree structure and find Tables as a child tree under it.
We right click on that and select New Table
, we get a different UI in the query window now asking for the column names and the datatypes also to set the Nullable
/NonNullable
with a check box, then lets view the images:
Add the Primary key column and set it as Primary Key like below:
Then we need to set the column as an identity column which will be auto-incremented by 1 everytime a new row is added. The developer need not bother about incrementing or setting the primary key. Set it as below:
Then, we add all the columns with the datatypes mentioned above and save the table with a proper table name. Now, here the checkbox for the Nullable column plays a vital role. Think and specify the value here. Finally, the UI looks like below:
Thus, now we have created a table also. Now it is time to add records/rows into it. We will be writing queries to insert the values into the table. Let's see how:
INSERT ROWS INTO TABLE
INSERT INTO dbo.[User] (UserName,Password,UserEmail)
VALUES ('Suraj','abc123','abc@abc.com');
Here, as you can see, we have added only the three column values as the primary key userId
is auto-incremented, i.e., an Identity column. Thus, this adds the values. For a faster approach, usually developers use the SSMS UI here also to add the records.
We right click on the specific table into which we will be adding records, then click on Edit Top 200 (default value) items.
Then, we see in a new query window the table columns with the rows of records already added and a default empty row where we will be adding the values like below:
Here, we add the values in the row containing NULL
values. The Primary key column is non editable as it will increment itself. The other columns are populated with the values to set the table records.
Now let's get back to our topic of discussion Select
:
1. Select All Records From the Table
SELECT * FROM dbo.[User]
This query is very simple and easy to understand. Here, the FROM
clause is used which is easy when we retrieve data from a single table. As mentioned above ‘*
’ relates to all columns in a table. Thus, the above query returns all the records of all the columns.
Another point here is if we select some columns out of all columns, then we should/must qualify the column names in the Select
list of the query. We can bring in the concept of ‘aliasing’ in our next query which selects some columns in the table.
2. Select Fewer Columns from the Table
SELECT users.UserName as Name, users.Password as PWD FROM dbo.[User] as users
The above query returns only the User name column as Name
and the Password column as PWD
in the result set. This is the power of alias, it changes the column head names. Below is the sample result set:
Similarly, you can use shorter Table alias names and use it throughout your query for convenience and select as many and the specific columns you need from the table.
There are various ways for limiting the result sets we get. By default, all the rows that match the From
clause query are retrieved in the result set. There are two different ways to limit the result set of the Select
statement.
One is to add conditions to the Query after From
clause.
Another is to add the DISTINCT
qualifier to the Select
statement. Distinct
is a result set qualifier. Also TOP
(number) can be used to limit the result set.
We will see the second type here and move to the condition type consequently.
3. Select Distinct Rows
SELECT DISTINCT users.UserName as Name FROM dbo.[User] as users;
This query returns all the rows that are distinct based on the names values.
4. Select Some Rows from TOP
SELECT TOP 500 users.UserName as Name FROM dbo.[User] as users;
This would return only the top 500 rows no matter how many in total we get as result set.
Next, we discuss about the conditions type:
WHERE Clause
This is used after the FROM
clause and can be thought of a search from the records using the conditions based on which we would be doing the result search. Searching with lot more precision. The body of the where
clause will be one or more expressions with return value as True
/False
. If the response is true
based on the conditions for a specific row coming under the check, then the row gets added to the result set which is retrieved finally from the query.
A simple example here would be Get
the list of records/users whose name is ‘Suraj
’.
5. Select Users with UserName as ‘Suraj'
SELECT * FROM dbo.[User] as users
WHERE users.UserName = 'Suraj' ;
This selects all the users with the username
as Suraj
, usually this is made as Firstname
as Suraj
. Thus the result set would look like below:
To master the WHERE
clause in SQL, we need to master how all the operators available behave. Let's have a look at the operators available with the WHERE
clause in SQL and how they would behave:
Datatype | Behaviour |
Equals ‘=’ | Returns true if both the conditions on left and right hand side matches |
Not Equals to ‘<>’ | Returns true if both the conditions on left and right hand side do not match |
Greater than ‘>’ | Returns true if the value on the left hand side is larger than the right hand side value |
Less than ‘<‘ | Returns true if the value on the left hand side is smaller than the right hand side value |
Greater than or equal to ‘>=’ | Returns true if the value on the left hand side is larger than or same as the right hand side value |
Less than or equal to ‘<=’ | Returns true if the value on the left hand side is smaller than or same as the right hand side value |
Now let's have a look at the queries and how they behave when operators are applied. Here, I will be applying operators on the string
values.
6. Select the users with UserName greater than ‘Suraj'
SELECT * FROM dbo.[User] as users
WHERE users.UserName > 'Suraj' ;
The result set here would return the name starting with alphabets more than ‘S’, but remember the check is for each letter until the greater than matches. Result set:
Thus the name starting with ‘T
’is displayed in the result.
Similarly, less than and the other operators would work out.
7. Select the users with UserName less than ‘Suraj'
SELECT * FROM dbo.[User] as users
WHERE users.UserName < 'Suraj' ;
8. Select the users with UserName Not Equal to ‘Suraj’
SELECT * FROM dbo.[User] as users
WHERE users.UserName <> 'Suraj' ;
Then let's peep into the AND OR
operators. As we all know, each boolean expression can be combined with other boolean expressions. These are combined by using the AND OR
operator.
AND
means that both the boolean expressions combined by using AND
, both must evaluate to True
for the row to satisfy to be a part of the result set. Like for example, list out the users who have first name as Suraj
and email as a specific value. Here, as per the simple English language, and
is to be used here to evaluate and produce the result set.
9. Select the users with UserName as ‘Suraj’ & UserEmail as ‘test@test.com'
SELECT * FROM dbo.[User] as users
WHERE users.UserName ='Suraj' AND users.UserEmail = 'test@test.com';
The result set goes as below:
Thus here in the result set, we can check that both the expressions on the sides of the AND
operator turns out to be true
for both the rows in the result set.
OR
keyword means that any of the expressions on the either side of OR
operator can be true
, then the row becomes a part of the result set. That is when any of the expression satisfies then the row is a part. Just keep in mind if the first expression turns out to be true
, then the second expression is not evaluated. If first expression turns to be false
, then the second expression is evaluated to be checked.
10. Select the users with UserName as ‘Suraj’ or UserEmail as ‘xyz@xyz.com'
SELECT * FROM dbo.[User] as users
WHERE users.UserName ='Suraj' OR users.UserEmail = 'xyz@xyz.com';
The result set is as below:
As we see first when the first expression, i.e., UserName
is Suraj
is a part of the result set and also the rows with UserEmail
as ‘xyz@xyz.com
’ .
BETWEEN
operator acts on any column and taking two values checking the range. If a row value lies in the range specified in the BETWEEN
condition, then the expression evaluates to True
.
11. Select the users with Age Lying Between Specific Value
SELECT * FROM dbo.[User] as users
WHERE users.Age BETWEEN 25 AND 50 ;
The result set is as below:
Thus, the result set includes only those rows where the Age
lies between 25
and 50
, others are neglected. Between in inclusive of the values being checked.
LIKE
operator is a special one which is used with string
s. When we give LIKE
operator to the string
s, then it looks and searches for those values of the string
which match the pattern specified in the LIKE
condition. For LIKE ‘%’
is used which is otherwise called wild card symbol. The %
can be used anywhere in the string
. Like for example, who all are the users whose names start with ‘S
’.
12. Select the users with the UserName starting with letter ‘S'
SELECT * FROM dbo.[User] as users
WHERE users.UserName LIKE 'S%' ;
This gives the result with all the username
s starting with S
. Thus we write Like ‘S%’
wild card symbol specified after the letter means starting with. Specifying wild card symbol at the beginning like ‘%S
’ means all usernames ending with S
.
SELECT * FROM dbo.[User] as users
WHERE users.UserName LIKE '%C' ;
Thus this is the result set with the records where the column value UserName
contains ‘C
’.
IN
operator is another special operator which requires column and a list of specified/valued values. Values can be of any datatype. If a row column value matches any of the specified values in the list used for the IN
operator, then the row is added to the result set.
SELECT * FROM dbo.[User] as users
WHERE users.UserName IN ('Suraj' , 'test') ;
This gives the result set having all Users
with user name Suraj
and test. Thus, the IN
operator seems to overlap with the BETWEEN
operator, but they are different in some scenarios.
IS
operator is less used and a special keyword, which is used to check for NULL
values. An interesting thing to keep in mind is that the NULL
values are not assignable to the ‘=
’ operator & its complement is ISNOT
.
Conclusion
Thus, here I have tried to specify the basics in SQL. As I mentioned, I have tried here, so any queries, suggestions & concerns are most welcome. Let us please discuss and clear out doubts and keep sharing.
References
- MSDN
- Introduction to SQL by John Flanders
- SQL authority by Pinal Dave
- W3 schools