Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

Everything About SQL- Get Started

4.79/5 (13 votes)
16 Mar 2015CPOL19 min read 27.8K  
This post covers everything about SQL - getting started

Topics to be Covered

  • SQL Basics

    • Introduction
    • What is a database?
    • What does Relational mean?
    • A small example
    • SQL Statement
    • Naming Things
    • Creating Things
  • Everything about SELECT Statement

    • Introduction
    • FROM Clause
    • Limiting Result Set
    • WHERE Clause
    • WHERE Clause
    • AND OR
    • MORE Operators
    • BETWEEN
    • LIKE
    • IN
    • IS and ISNOT

SQL Basics

Introduction

Before I begin, this article is intended for each and every beginner, may it be a Software Developer or a Quality Analyst, this is a must know for every software engineer. Some might say we have Database administrator, then why should we, but I feel being a developer, one should have all round knowledge of what's going on and handle everything.

Here in this article, I will be explain each and every concept in detail to get started with SQL (Structured Query Language).

First of all, the question thinkzoo going on your mind now would be what is SQL?

As mentioned above, SQL stands for Structured Query Language. It is a special-purpose programming language. Its purpose is to manipulate relational databases. Relational databases are one which have detailed information about all the tables in a formally organized manner. The best part here is it contains both the data-definition syntax as well as data-manipulation syntax.

What is a Database?

Untitled

Now the diagram describes what it is. :D Database is simply a container that helps organize data in a format or constructive manner. In a database, data is stored in a rational way. Now, let's take an example that would make is help understand it better. For instance, our contact list in cell phones. If we think a bit into it, then we realize the contact list is a database with data in it in an organized manner. Then we sort by the letters search the names from the list. Thus, we perform query execution on the list to retrieve the records we want. This is the flexibility a simple database provides us. It makes it easier to:

  • Query Data
  • Update Data
  • Insert New Data
  • Delete Data

There are basically three types of databases available:

We will be discussing only on the Relational Database as that is commonly used. I have provided the links to the other types of databases for reference.

What is Relational?

The dictionary meaning to this is “Things that relate each other in a way”. Here, the meaning is keep its consistency. Relational model is the database model on which SQL is based. It is thus a way to describe data and the relationship between those data entities.

In a relational database, data is stored in a construct called Table. Now what is a Table?

  • A table is a collection of columns that define the table properties with specific datatype associated, with specific size, thus restricting the size of each record to be inserted into the table.
  • Every column can be made required or not required (using the check box available for Null values).
  • Rows can be retrieved by asking questions about the data. Like “You would need all the rows of data that have a column (surname) value as having value ‘Ray'”
  • An essential part of a table is “Key”. Every table must have a key associated with any column to uniquely identify a table. Usually, a table is associated with a key called Primary Key. (We will be discussing about keys in the upcoming modules).
  • If more than one table uses the same primary key (one table-Primary key and other named foreign key), then the tables can be merged or joined to give different set of records set.

A Small Example

Before starting with the, I hope the readers have a view of the SSMS (SQL Server Management Studio). Here is how the SSMS looks like:

ssms

Now let's ask questions to ourselves. Why do we at all need database with multiple tables, why not a single table! let's check with an example why?

Suppose we have a User table which would have columns named UserID, FName, LName, Email & PhNo. Now suppose we have only this table and there is a requirement in the future that a user may have more than one Phone number or more than one Email. Just think how we would approch this, as we cannot store multiple values into a column. Thus, does not this give us an indication to have another table! Just see the difference below:

table1

taberel

In the above figures, you can see the difference clearly, using the first table will cause real problem and a lot of redundant data/records in the table which anyway violates the Normalization rules (Normalization Rules). Thus, creating the second table which would be having a foreign key relationship with the User table, can now store as many phone numbers as a user can have.

SQL Statement

  • A SQL statement is framed from the combination of actionable set of valid words out of which some are SSMS defined and others are user defined.
  • A valid SQL statement would be if a statement/query ends with a semicolon (;), it is not like that it's mandatory to add a semicolon at the end. But it is a part of the ANSI standard, so it is advisable to use the semicolon at the end of the statement.
  • SQL is not case-sensitive. Usually, the queries/statements are written in Caps(Capitals). Sometimes the user-defined functions are in lower case. But obviously, the standard needs to be followed for better maintenance and to maintain consistency.
  • In SQL statement, we can put comments. For single line comments, “” is used as a predecessor for the query as single line comment where as “/**/”, is used for the query as multiple line comments.
  • SQL statements start with commands. Comparing with the English language, the commands here are similar to the verbs. For example, SELECT(Command/keyword in SQL)
    In generic terms, SELECT would go like below:
    SQL
    SELECT VALUES FROM TABLE NAME;

    After the command comes the part that is highly dependent on the commands that is to define the set of result/records we would expect from the query like a list of all values of the columns (generally, the wildcard symbol “*” asterix is used to select all columns in table). Then is the FROM clause which is used to fetch records / column values from which table (as it explains itself).

Naming Things

This might be a bit controversial, as there is nothing called the best way to do. So we follow what is followed the most, or follow the conventions already being used. This would differ from person to person. But the most important factor to keep in mind is to have consistency through out. So, that when we see the database, the names of the tables define themselves. Some of the rules I am listing below:

  • Table name should be singular. A table name describes what the values would be in a row. It is not a hard bound rule, but a rule. :)
  • Column names in a table should not repeat themselves and also the same applies at the database level. The repetition is only allowed when there is a relationship between tables.
  • Names are “scoped” in SQL. What this means is a database would have a name. Tables inside database should have a name. Thus while accessing the table, it should be accessed with its “full” name which would include the database name, separated by periods.
    Like TableName:
    DatabaseName.Table<br />
    	ColumnName
    : Table.Column

Creating Things

There is a whole set of SQL commands which are required to create and modify the structures/records in the database.

SQL
CREATE DATABASE DemoDatabase;
USE DemoDatabase;
CREATE Table DemoDatabase.DemoTable(…);

Here, I have only shown the structures, we will be discussing more of that ahead.
Datatypes are very vital when we will be creating the tables in a database. Each column in a database has restriction on the datatype of the data that can be stored in the table.
Let's have a look at the datatypes:

Data Type Value/Space
CHARACTER Can hold ‘N’ number of characters which are specified implicitly, to be exact statically that is the number of characters to be accepted in the value is set at the time of column creation
VARYING CHARACTER Can hold ‘N’ characters which is set dynamically, that is if MAX is set, then it would accept any number of characters sent from the server into the column.
BINARY Stores Hexadecimal data, fixed length with a length of n bytes, where n varies from 1 through 8,000. Used for images and other sort of blobs.
VAR BINARY As the name suggests, it accepts value of varying length. When size is not specified, then default value is taken as 1.
SMALLINT Its length is 2 bytes and ranges from -2^15 (-32,768) to 2^15-1 (32,767)
INT Its length is 4 bytes and ranges from -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647)
BIGINT Its length is 8 bytes and ranges from -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807)
TINYINT Its length is 1 byte and ranges from 0-255. Can be used for values with Yes/No or the enumeration types which are limited to fewer values
BOOLEAN Stores TRUE FALSE values. When NOT NULLABLE, then default value is False, else it also accepts NULL values.
DATE Stores only Date values, only YEAR MONTH & DAY values in the format YYYY–MM–DD
TIME Stores only time values, only HOUR MINUTE & SECONDS values of a day in the format HH–MM–SS
TIMESTAMP Stores both date & time. This is when date is combined with time of a day with the seconds. Format is just concatenating both date format and time format

These are some of the datatypes, I have listed out. The important thing here to keep in mind is, while creating the table, we never actually keep in mind. Thus the datatype plays a very important role if we consider the future. Suppose we use the number of users or the userid in a table datatype as smallint or only int even, then sometimes in future may be the number of users increases to a large extent and overlaps the range of the datatypes. Then here the problem arises. You might have heard about the youtube crash for the Gangnam song. The reason behind it was this ... the issue with the datatype for the number of views. It increased to such an extent that it crashed.

Relational Database Management System is based on the relational model. In simple terms, it is a type of database management which stores data in related tables.

Everything About SELECT Statement

Introduction

We have already discussed about the SSMS where we will be demonstrating our queries. After you read through this part of the article, you would be able to understand the details of using the SELECT statement, asking different types of questions to the select query.

First of all, we store the data in a database to reuse it at some later point of time. Querying data is all about asking questions while retrieving data. Understanding the combination of English and the SQL helps us postulate the queries very easily.

Let's create a database first. Look below at the query.

CREATE DATABASE QUERY

SQL
CREATE DATABASE DemoDatabase;

The above query using the master creates the database named “DemoDatabase”. You can also directly create database using the SSMS UI. See the below images:

db1 db2

When in the first image as you see, the New Database is clicked, another window pops up as shown in the second image. Add the database name there and it creates the database for you with a maxsize of unlimited.

Then we create a Table for the storage of records.

CREATING A TABLE

SQL
USE DemoDatabase
CREATE TABLE [dbo].[User](
	[UserId] [int] IDENTITY(1,1) NOT NULL,
	[UserName] [nvarchar](50) NULL,
	[Password] [nvarchar](50) NULL,
	[UserEmail] [nvarchar](50) NULL,
 CONSTRAINT [PK_User] PRIMARY KEY CLUSTERED 
(
	[UserId] 
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, _
       ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Now the above query creates a table for you, but we need to understand the commands used in the query as some seem understandable whereas some are not.

  • PAD_INDEX: Specifies the index padding. The default is OFF. More information.
  • STATISTICS_NORECOMPUTE: Creation & Recreation of index & auto update of the columns statistics. More info.
  • ALLOW_ROW_LOCKS: Enhances performances by locking the entire table rather than locking individual row. By default, SQL sets to ON. More info.
  • ALLOW_PAGE_LOCKS: Set to TRUE/ON when there are chances of high concurrency in order to improve performance. By default, SQL sets to ON. More info.

Thus, we have created a database which now has a table named User. Now for your information, we can also create table using the SSMS UI as we have done for the database.

We open up the database tree structure and find Tables as a child tree under it.

We right click on that and select New Table, we get a different UI in the query window now asking for the column names and the datatypes also to set the Nullable/NonNullable with a check box, then lets view the images:

Add the Primary key column and set it as Primary Key like below:

setprimary

Then we need to set the column as an identity column which will be auto-incremented by 1 everytime a new row is added. The developer need not bother about incrementing or setting the primary key. Set it as below:

setprimary2

Then, we add all the columns with the datatypes mentioned above and save the table with a proper table name. Now, here the checkbox for the Nullable column plays a vital role. Think and specify the value here. Finally, the UI looks like below:

setprimary3

Thus, now we have created a table also. Now it is time to add records/rows into it. We will be writing queries to insert the values into the table. Let's see how:

INSERT ROWS INTO TABLE

SQL
INSERT INTO dbo.[User] (UserName,Password,UserEmail)
VALUES ('Suraj','abc123','abc@abc.com');

Here, as you can see, we have added only the three column values as the primary key userId is auto-incremented, i.e., an Identity column. Thus, this adds the values. For a faster approach, usually developers use the SSMS UI here also to add the records.

We right click on the specific table into which we will be adding records, then click on Edit Top 200 (default value) items.

Then, we see in a new query window the table columns with the rows of records already added and a default empty row where we will be adding the values like below:

edittable

Here, we add the values in the row containing NULL values. The Primary key column is non editable as it will increment itself. The other columns are populated with the values to set the table records.

Now let's get back to our topic of discussion Select:

1. Select All Records From the Table
SQL
SELECT * FROM dbo.[User]

This query is very simple and easy to understand. Here, the FROM clause is used which is easy when we retrieve data from a single table. As mentioned above ‘*’ relates to all columns in a table. Thus, the above query returns all the records of all the columns.

Another point here is if we select some columns out of all columns, then we should/must qualify the column names in the Select list of the query. We can bring in the concept of ‘aliasing’ in our next query which selects some columns in the table.

2. Select Fewer Columns from the Table
SQL
SELECT users.UserName as Name, users.Password as PWD FROM dbo.[User] as users

The above query returns only the User name column as Name and the Password column as PWD in the result set. This is the power of alias, it changes the column head names. Below is the sample result set:

result

Similarly, you can use shorter Table alias names and use it throughout your query for convenience and select as many and the specific columns you need from the table.

There are various ways for limiting the result sets we get. By default, all the rows that match the From clause query are retrieved in the result set. There are two different ways to limit the result set of the Select statement.

One is to add conditions to the Query after From clause.

Another is to add the DISTINCT qualifier to the Select statement. Distinct is a result set qualifier. Also TOP (number) can be used to limit the result set.

We will see the second type here and move to the condition type consequently.

3. Select Distinct Rows
SQL
SELECT DISTINCT users.UserName as Name FROM dbo.[User] as users;

This query returns all the rows that are distinct based on the names values.

4. Select Some Rows from TOP
SQL
SELECT TOP 500 users.UserName as Name FROM dbo.[User] as users;

This would return only the top 500 rows no matter how many in total we get as result set.

Next, we discuss about the conditions type:

WHERE Clause

This is used after the FROM clause and can be thought of a search from the records using the conditions based on which we would be doing the result search. Searching with lot more precision. The body of the where clause will be one or more expressions with return value as True/False. If the response is true based on the conditions for a specific row coming under the check, then the row gets added to the result set which is retrieved finally from the query.

A simple example here would be Get the list of records/users whose name is ‘Suraj’.

5. Select Users with UserName as ‘Suraj'
SQL
SELECT * FROM dbo.[User] as users
WHERE users.UserName = 'Suraj' ;

This selects all the users with the username as Suraj, usually this is made as Firstname as Suraj. Thus the result set would look like below:

where

To master the WHERE clause in SQL, we need to master how all the operators available behave. Let's have a look at the operators available with the WHERE clause in SQL and how they would behave:

Datatype Behaviour
Equals ‘=’ Returns true if both the conditions on left and right hand side matches
Not Equals to ‘<>’ Returns true if both the conditions on left and right hand side do not match
Greater than ‘>’ Returns true if the value on the left hand side is larger than the right hand side value
Less than ‘<‘ Returns true if the value on the left hand side is smaller than the right hand side value
Greater than or equal to ‘>=’ Returns true if the value on the left hand side is larger than or same as the right hand side value
Less than or equal to ‘<=’ Returns true if the value on the left hand side is smaller than or same as the right hand side value

Now let's have a look at the queries and how they behave when operators are applied. Here, I will be applying operators on the string values.

6. Select the users with UserName greater than ‘Suraj'
SQL
SELECT * FROM dbo.[User] as users
WHERE users.UserName > 'Suraj' ;

The result set here would return the name starting with alphabets more than ‘S’, but remember the check is for each letter until the greater than matches. Result set:

where2

Thus the name starting with ‘T’is displayed in the result.

Similarly, less than and the other operators would work out.

7. Select the users with UserName less than ‘Suraj'
SQL
SELECT * FROM dbo.[User] as users
WHERE users.UserName < 'Suraj' ;

where3

8. Select the users with UserName Not Equal to ‘Suraj’
SQL
SELECT * FROM dbo.[User] as users
WHERE users.UserName <> 'Suraj' ;

where4

Then let's peep into the AND OR operators. As we all know, each boolean expression can be combined with other boolean expressions. These are combined by using the AND OR operator.

AND means that both the boolean expressions combined by using AND, both must evaluate to True for the row to satisfy to be a part of the result set. Like for example, list out the users who have first name as Suraj and email as a specific value. Here, as per the simple English language, and is to be used here to evaluate and produce the result set.

9. Select the users with UserName as ‘Suraj’ & UserEmail as ‘test@test.com'
SQL
SELECT * FROM dbo.[User] as users
WHERE users.UserName ='Suraj' AND users.UserEmail = 'test@test.com';

The result set goes as below:

result2

Thus here in the result set, we can check that both the expressions on the sides of the AND operator turns out to be true for both the rows in the result set.

OR keyword means that any of the expressions on the either side of OR operator can be true, then the row becomes a part of the result set. That is when any of the expression satisfies then the row is a part. Just keep in mind if the first expression turns out to be true, then the second expression is not evaluated. If first expression turns to be false, then the second expression is evaluated to be checked.

10. Select the users with UserName as ‘Suraj’ or UserEmail as ‘xyz@xyz.com'
SQL
SELECT * FROM dbo.[User] as users
WHERE users.UserName ='Suraj' OR users.UserEmail = 'xyz@xyz.com';

The result set is as below:

result3

As we see first when the first expression, i.e., UserName is Suraj is a part of the result set and also the rows with UserEmail as ‘xyz@xyz.com’ .

BETWEEN operator acts on any column and taking two values checking the range. If a row value lies in the range specified in the BETWEEN condition, then the expression evaluates to True.

11. Select the users with Age Lying Between Specific Value
SQL
SELECT * FROM dbo.[User] as users
WHERE users.Age BETWEEN 25 AND 50 ;

The result set is as below:

result4

Thus, the result set includes only those rows where the Age lies between 25 and 50, others are neglected. Between in inclusive of the values being checked.

LIKE operator is a special one which is used with strings. When we give LIKE operator to the strings, then it looks and searches for those values of the string which match the pattern specified in the LIKE condition. For LIKE ‘%’ is used which is otherwise called wild card symbol. The % can be used anywhere in the string. Like for example, who all are the users whose names start with ‘S’.

12. Select the users with the UserName starting with letter ‘S'
SQL
SELECT * FROM dbo.[User] as users
WHERE users.UserName LIKE 'S%' ;

result5

This gives the result with all the usernames starting with S. Thus we write Like ‘S%’ wild card symbol specified after the letter means starting with. Specifying wild card symbol at the beginning like ‘%S’ means all usernames ending with S.

SQL
SELECT * FROM dbo.[User] as users
WHERE users.UserName LIKE '%C' ;

result6

Thus this is the result set with the records where the column value UserName contains ‘C’.

IN operator is another special operator which requires column and a list of specified/valued values. Values can be of any datatype. If a row column value matches any of the specified values in the list used for the IN operator, then the row is added to the result set.

SQL
SELECT * FROM dbo.[User] as users
WHERE users.UserName IN ('Suraj' , 'test') ;

This gives the result set having all Users with user name Suraj and test. Thus, the IN operator seems to overlap with the BETWEEN operator, but they are different in some scenarios.

IS operator is less used and a special keyword, which is used to check for NULL values. An interesting thing to keep in mind is that the NULL values are not assignable to the ‘=’ operator & its complement is ISNOT.

Conclusion

Thus, here I have tried to specify the basics in SQL. As I mentioned, I have tried here, so any queries, suggestions & concerns are most welcome. Let us please discuss and clear out doubts and keep sharing.

References

  • MSDN
  • Introduction to SQL by John Flanders
  • SQL authority by Pinal Dave
  • W3 schools

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)