Introduction
SQL Server Views can be a great helpful tool for database developers, allowing us to encapsulate many complicated queries into one single query and help other developers in working with our database more easily. But if we just create a regular view, its mission is just help to make the use of a complicated SELECT
query become faster and more readable and understandable. There is no performance improvement applied on that kind of view, as the view becomes a shortcut to retrieve data. Fortunately, now we can create an indexed view, which will help us to increase query performance, and save the cost. There are many benefits if we understand it well and manipulate it in the right way. But there is also no free-lunch, so what should we be concerned about? Where and when to use indexed views?
In this very basic article, I'm going to show you what is an indexed view and how to use it to improve query performance in SQL Server.
What Is a View?
View is an object in SQL Server that allows us to group a complex and complicated SELECT
clause, with many sub clauses, like JOIN
operator or aggregate function like SUM COUNT AVG
,... into a single query. For a very common database for managing customers invoices, assuming we have a query with many JOIN
clauses to list all order detail from customer
s who live in Danang
city.
SELECT a.Id, a.FirstName, a.LastName, d.ProductId, d.ProductName
FROM Customers a JOIN Orders b
ON a.CustomerId = b.CustomerId
JOIN OrderDetails c
ON b.OrderId = c.OrderId
JOIN Products d
ON c.ProductId = d.ProductId
WHERE a.City = 'Danang'
So wherever we want a list of order details from customers who live in Danang city, we have to run the query above. It's sometime not a short but very very long and complex query, and if the database engineer gives the complex query to a backend developer, he would get confused. In this case, the database engineer will create a view name vOrderDetail_DanangCity
as below:
CREATE VIEW dbo.vOrderDetail_DanangCity AS
SELECT a.Id, a.FirstName, a.LastName, d.ProductId, d.ProductName
FROM Customers a JOIN Orders b
ON a.CustomerId = b.CustomerId
JOIN OrderDetails c
ON b.OrderId = c.OrderId
JOIN Products d
ON c.ProductId = d.ProductId
WHERE a.City = 'Danang'
So now the database engineer provides a view to backend developer, named vOrderDetail_DanangCity
. The developer only needs to know what the view does, and doesn't care about how it does it. Anywhere he needs to use the view, he just:
SELECT * FROM dbo.vOrderDetail_DanangCity
Cool, now he can get the result using only 1-line query. But what happens inside the view? In this case, creating a view is just a shortcut, and when the view is called, the Database Management System (DBMS) still has to re-run the query above to get the result. It means many join
clauses, sub clauses, and aggregate functions... would be executed again.
Indexed Views
Indexed Views (or Materialized Views in some books), is a view that has unique, clustered index on it. Let's create an indexed view to understand it clearly.
Assuming I have a simple database with the following relationship:
One student
can join many course
s. One course
can have many student
s joined. A student
joining a course will have an examination and have a score. In the database, there are 2000 rows in Course
s, 5000 rows in Student
s, and 10,000,000 rows in Student_Course
. You can download this demo database backup file (bak) at the end of this article.
Assuming we have to list all courses that have at least 1 student
joined, and the average score of that course
.
SELECT crs.CourseName, AVG(std_crs.Score) AS Average
FROM dbo.Courses crs JOIN Student_Course std_crs
ON crs.Id = std_crs.CourseId
GROUP BY crs.CourseName
It takes 4 seconds to execute this query. We will use this query to create a view:
CREATE VIEW dbo.AverageScrores
WITH SCHEMABINDING
AS
SELECT crs.CourseName, SUM(std_crs.Score) AS TotalScore, COUNT_BIG(*) AS NumberOfRecords
FROM dbo.Courses crs JOIN dbo.Student_Course std_crs
ON crs.Id = std_crs.CourseId
GROUP BY crs.CourseName
Note that we will use SUM
and COUNT_BIG(*)
for this case. Creating indexed views in SQL Server has some restrictions and requirements, and we have to deal with it, in this case by turning AVG
to SUM
and COUNT
, and in the future, we will calculate AVG = SUM / COUNT
. We will not discuss these requirements here.
The WITH SCHEMABINDING
is a mandatory requirement to create indexed view. It helps to protect the column appear in the view, you cannot alter or drop column from the base table. If you want, first you have to drop the view, alter or delete column, then re-create another view.
Finally, create a unique, clustered index on that view. We can create index on one or more columns.
CREATE UNIQUE CLUSTERED INDEX CIX_AverageScrores
ON dbo.AverageScrores(CourseName);
And now, we have an indexed view. When we create a Unique index on a view, we are "materializing" it.
- If the view doesn't have any unique index, it is just a shortcut. When the view is called, DBMS will re-run the query to get the result. It means DBMS will run many
join
clauses, many complicated aggregate functions,... all over again. - If the view has an unique index, it is "materialized". One copy of this view is created. This materialized view is used to store the result of the query we used to create the view. When the view is called, instead of re-running the query with many complex clauses, DBMS will read from the materialized view.
Let's run the query below:
SELECT CourseName, TotalScore / NumberOfRecords AS Average FROM dbo.AverageScrores WITH (NOEXPAND)
We are now using the indexed view, by pointing out that the query should use WITH (NOEXPAND)
. WITH (NOEXPAND)
will let the DBMS know that it is an indexed view and we want to use the result stored before. If you forget to add WITH (NOEXPAND)
, the DBMS would re-run the query.
The result appears after 00:00:00 seconds. Immediately.
Compare Indexed Views and Views
Run these 2 queries: The first query is the query we used to create the view. Second query is the indexed views.
SELECT crs.CourseName, SUM(std_crs.Score) / COUNT_BIG(*) AS Average
FROM dbo.Courses crs JOIN dbo.Student_Course std_crs
ON crs.Id = std_crs.CourseId
GROUP BY crs.CourseName
SELECT CourseName, TotalScore / NumberOfRecords AS Average FROM dbo.AverageScrores WITH (NOEXPAND)
We can see the result is the same, there is no difference.
But the time of execution is totally different. When we use indexed view, it's 00:00:00
. When we use the based query, it's 4 seconds. This difference is because indexed views WITH NOEXPAND
doesn't have to calculate the result again. It didn't join table and calculate aggregate function. It just reads the result stored in the view and shows it to us.
When to Use
Using indexed view seems to be a good choice. But is that a free-lunch?
No, it's not. In fact, it's an expensive lunch if we use them abusively. Whenever the based tables (the tables join in FROM
clause when we create the view) has changed, DBMS not only has to update the table, re-index the view but also re-calculate the value stored. In this case, when some student
joins any course
, the DBMS has to re-calculate SUM
and COUNT
of that course, which is stored in materialized view. It also means the more complex the query is, the more complicated the view maintenance takes.
View maintenance is a big problem. What is the most effective way to maintain the result stored fastest without re-running the original query? Because of view maintenance problem, there are many restrictions and requirements for creating indexed view, for example, query with OUTER JOIN
is not allowed.
So we know Indexed view will:
Make SELECT clause run faster.
Make INSERT, UPDATE, DELETE or some manipulating event become slower.
So use them wisely. Just apply indexed view on a query that we use very often, but the based tables are not updated frequently. In this case, the performance and effectiveness reach the maximum. Otherwise, be careful. You can make the system run slower than you expected if the view is not used very often, but the based tables have been updating time by time.
You can download the demo database at the link below (I use SQL Server 2014):
History
- 23rd March, 2019: Initial post