Background
I work for a small web company that over the last year or two has experienced a large amount of growth, both in terms of staff and web site traffic. As part of that growth we have been forced to spend a large amount of time focusing on our database. And since we lack a DBA, that stewardship has fallen to me.
We now serve between 1 and 2 million page views daily, and over the last year it has come to my attention that the biggest load on our database in terms of CPU and memory was not so much due to poorly performing queries (although there have been some) but rather poorly formatted queries.
I have noticed as I read articles and also as I look at the code run by commercial applications (more on this later) that ad hoc queries are used quite a bit. This is all despite the numerous warnings everywhere regarding SQL injection attacks as well as the performance benefits of using stored procedures.
For the purpose of this article I am defining (rather loosely) ad hoc queries as SQL commands built as strings by an SQL client application and submitted to SQL Server. This is rather all-encompassing and includes pretty much everything but stored procedure calls. Poorly written stored procedures can be a drain on CPU and memory for reasons other than poorly tuned SQL statements, but those reasons are beyond the scope of this article. I intend to focus on how failing to use parameters in SQL statements can be a huge drain on resources and can cripple even the most powerful hardware.
For clarification, the context of this article is that of a large number of users executing similar, single statements concurrently, as opposed to a single batch of statements within a single session (connection).
SQL Server Internals
Like any programming language, every query submitted to SQL Server must be parsed and compiled before it can be run. At a high level, this involves checking syntax as well as verifying the existence of the objects used in the query. Once this is complete, SQL Server has a structure which it understands and can use to retrieve the data requested. For efficiency, SQL Server stores this structure in memory (the procedure cache) for reuse. The next time this query gets submitted, SQL Server can skip the parse and compile steps.
We can see this in action in the following code sample:
SET STATISTICS TIME ON
SET STATISTICS IO OFF
USE AdventureWorks
go
DBCC FREEPROCCACHE
go
SELECT C.Title, C.FirstName, C.LastName, C.EmailAddress, C.Phone, E.*
FROM Person.Contact C
INNER JOIN HumanResources.Employee E ON E.ContactID = C.ContactID
WHERE C.LastName = 'Miller'
WARNING - The use of the command DBCC FREEPROCCACHE is not recommended in production systems. Doing so may cause SQL Server to perform poorly until the procedure cache can be rebuilt. Its purpose in this article is to demonstrate how SQL Server uses the procedure cache to store and reuse ad hoc query statements.
If you view the "Messages" pane in SQL Server Management Studio (SSMS) you will find the following:
SQL Server parse and compile time:
CPU time = 78 ms, elapsed time = 106 ms.
(1 row(s) affected)
Among other things, just before the rows affected message, you will see the parse and compile time. These times will vary, and this is obviously not a complex query, so don't expect to see anything significant. I am running SQL Server 2005 x64 on Windows XP x64 with an AMD Athlon 64 X2 4200+ and 2GB of RAM. If you run only the SELECT statement again, what you will most likely see is this:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 9 ms.
(1 row(s) affected)
The drop in CPU and elapsed time is due to the fact that the second time the statement was run (without clearing the procedure cache) the compiled plan was retrieved from cache. To see this in action, create the following trace:
-- Create a trace that writes to C:\ProcedureCache.trc (max file size 5 MB)
declare @rc int
declare @TraceID int
declare @maxfilesize bigint
set @maxfilesize = 5
exec @rc = sp_trace_create @TraceID output, 0, N'C:\ProcedureCache',
    @maxfilesize, NULL
if (@rc != 0) goto error

-- Capture the SP:CacheHit (event 38) and SP:CacheInsert (event 35) events;
-- the third argument to sp_trace_setevent is the column to include
-- (1 = TextData, 21 = EventSubClass, 10 = ApplicationName,
--  35 = DatabaseName, 12 = SPID)
declare @on bit
set @on = 1
exec sp_trace_setevent @TraceID, 38, 1, @on
exec sp_trace_setevent @TraceID, 38, 21, @on
exec sp_trace_setevent @TraceID, 38, 10, @on
exec sp_trace_setevent @TraceID, 38, 35, @on
exec sp_trace_setevent @TraceID, 38, 12, @on
exec sp_trace_setevent @TraceID, 35, 1, @on
exec sp_trace_setevent @TraceID, 35, 10, @on
exec sp_trace_setevent @TraceID, 35, 35, @on
exec sp_trace_setevent @TraceID, 35, 12, @on

-- Only capture events raised against the AdventureWorks database
declare @intfilter int
declare @bigintfilter bigint
exec sp_trace_setfilter @TraceID, 35, 1, 6, N'AdventureWorks'

-- Start the trace and return its ID
exec sp_trace_setstatus @TraceID, 1
select TraceID=@TraceID
goto finish

error:
select ErrorCode=@rc

finish:
go
When you run the script, make sure to copy the TraceID result so you can stop the trace when you're done by using the following command:
exec sp_trace_setstatus {PUT TRACEID HERE}, 0
After creating the trace, follow the first two steps again. Then open your trace file (C:\ProcedureCache.trc if you are using the defaults from this script) and you will see several trace events. Among them you will see an SP:CacheInsert event for the SELECT statement, with an entry further down for an SP:CacheHit event for the same SELECT statement. These events demonstrate exactly what I just described above.
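If you would rather not run a trace, SQL Server 2005 also exposes the contents of the procedure cache through dynamic management views. The following query is only a minimal sketch (assuming the AdventureWorks example above; the LIKE filter is simply a convenient way to find the statement) showing the cached plan for our SELECT and how many times it has been reused:
-- List cached plans whose text references the example query,
-- along with how many times each plan has been reused (usecounts)
SELECT cp.usecounts, cp.cacheobjtype, cp.objtype, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%Person.Contact%'
  AND st.text NOT LIKE '%dm_exec_cached_plans%' -- exclude this query itself
ORDER BY cp.usecounts DESC
After the first execution of the SELECT you should see a usecounts of 1; after the second it should increase to 2, matching the SP:CacheInsert and SP:CacheHit events in the trace.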
Ad Hoc SQL
Ad Hoc SQL is very flexible and often more familiar to developers than writing stored procedures because it allows them to simply construct a string (usually via concatenation) and submit it to the database. Ad hoc SQL can make writing Query By Form (QBF) processes very simple, as user input directs which tables, joins and parameters are needed to satisfy the request. The resulting SQL statement might look like this:
SELECT C.Title, C.FirstName, C.LastName, C.EmailAddress, C.Phone, E.*
FROM Person.Contact C
INNER JOIN HumanResources.Employee E ON E.ContactID = C.ContactID
WHERE C.LastName = 'Smith'
If you run this statement in SSMS and look at the parse and compile time, you'll see the statement was compiled again. Also, if you're still running a trace, you'll see the statement was added to the procedure cache instead of being reused. This shows that unless the query (including the values in the WHERE clause) is exactly the same, SQL Server must compile the statement again. On a small database with little traffic this will mostly go unnoticed. But on either a larger system or one with a significant amount of traffic, your procedure cache will bloat. A side effect of this is that the data in your buffer cache will be pushed out, resulting in more data being read directly from disk instead of from memory. This will cause a serious I/O bottleneck and most likely cripple your system.
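You can watch this bloat happen with the same dynamic management views. The following is a rough sketch (assuming SQL Server 2005 or later) that counts the ad hoc plans which have only ever been used once and shows how much of the cache they occupy:
-- Count single-use ad hoc plans and the cache space they consume
SELECT COUNT(*) AS single_use_plans,
       SUM(cp.size_in_bytes) / 1024 AS cache_kb
FROM sys.dm_exec_cached_plans AS cp
WHERE cp.objtype = 'Adhoc'
  AND cp.usecounts = 1
On a system that relies heavily on ad hoc SQL you will typically see both numbers climb steadily.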
Several years ago we purchased an ASP.NET application for ad management. The database is less than 20 MB in size, but because of the traffic (100 - 200 batches/sec) it was able to bring a 3GHz Xeon with 2 GB of RAM to its knees on a server with no other databases installed. When we investigated the issue we found that SQL Server was using all 2 GB of memory and the CPU was stuck at 100%.
Below is an example of how our query statement above might look from a .Net client:
string eName = "Smith";
DataSet data = new DataSet();
using (SqlConnection conn = new SqlConnection(
    "Application Name=AdHocClient;Server=(local);" +
    "Database=AdventureWorks;Trusted_Connection=Yes;"))
{
    // The user-supplied value is embedded directly in the statement text,
    // so every distinct value produces a distinct statement to compile.
    StringBuilder sql = new StringBuilder();
    sql.Append("SELECT C.Title, C.FirstName, C.LastName, C.EmailAddress,");
    sql.Append("C.Phone, E.*");
    sql.Append(" FROM Person.Contact C");
    sql.Append(" INNER JOIN HumanResources.Employee E ");
    sql.Append(" ON E.ContactID = C.ContactID");
    sql.AppendFormat(" WHERE C.LastName = '{0}'", eName);

    SqlCommand cmd = new SqlCommand();
    cmd.Connection = conn;
    cmd.CommandText = sql.ToString();
    cmd.CommandType = CommandType.Text;

    try
    {
        using (SqlDataAdapter da = new SqlDataAdapter(cmd))
        {
            // Fill opens and closes the connection as needed
            da.Fill(data);
        }
    }
    catch (SqlException ex)
    {
        // error handling omitted for brevity
    }
}
The variable eName represents a value retrieved from user input. Each time the process is run with different input, the statement is compiled and cached. This causes the condition I just described, eventually overwhelming the database server. So how do we resolve the issue with the least amount of change required?
Parameterized SQL
Stored procedures are the recommended method for database access from a client application - for security and performance reasons. But if you have already written your application and don't have the time to move all your data access to stored procedures, the alternative is to parameterize your SQL statements. When you define parameters, your statements stay the same no matter what value is being passed. So let's take the above process and rewrite it as follows:
string eName = "Smith";
DataSet data = new DataSet();
using (SqlConnection conn = new SqlConnection(
    "Application Name=ParameterizedClient;Server=(local);" +
    "Database=AdventureWorks;Trusted_Connection=Yes;"))
{
    // The statement text never changes; the user-supplied value travels
    // separately as the @LastName parameter.
    StringBuilder sql = new StringBuilder();
    sql.Append("SELECT C.Title, C.FirstName, C.LastName, C.EmailAddress, ");
    sql.Append("C.Phone, E.*");
    sql.Append(" FROM Person.Contact C");
    sql.Append(" INNER JOIN HumanResources.Employee E ");
    sql.Append(" ON E.ContactID = C.ContactID");
    sql.Append(" WHERE C.LastName = @LastName");

    SqlCommand cmd = new SqlCommand();
    cmd.Connection = conn;
    cmd.CommandText = sql.ToString();
    cmd.CommandType = CommandType.Text;
    cmd.Parameters.Add("@LastName", SqlDbType.VarChar, 50);
    cmd.Parameters["@LastName"].Value = eName;

    try
    {
        using (SqlDataAdapter da = new SqlDataAdapter(cmd))
        {
            da.Fill(data);
        }
    }
    catch (SqlException ex)
    {
        // error handling omitted for brevity
    }
}
Now when your client application runs this process, SQL Server will always reuse the compiled plan in the procedure cache. This prevents the cache from bloating and allows more of your system memory to be used for the buffer cache, which will also improve your system performance and application scalability.
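You can verify the reuse with the dynamic management views shown earlier (again only a sketch, assuming SQL Server 2005 or later). Because ADO.NET sends parameterized command text via sp_executesql, the statement shows up once in the cache as a 'Prepared' plan whose usecounts climbs with every execution:
-- The parameterized statement is cached once; usecounts increases
-- each time the client runs it, whatever @LastName value is passed
SELECT cp.usecounts, cp.objtype, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE cp.objtype = 'Prepared'
  AND st.text LIKE '%@LastName%'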
SqlCommand.Prepare()
A member asked whether it would be more efficient to use the Prepare() method of the SqlCommand object. In the context of this article the answer is no. The Prepare() method calls sp_prepexec, an undocumented stored procedure. Essentially, it creates a temporary stored procedure for the current session which is destroyed when the connection is closed. This allows the query to be called as if it were a stored procedure until the connection is closed. The benefit of using Prepare() is that it reduces network traffic, because the entire command text as defined in the client application is sent across the network only once. The first time the command is executed after calling the Prepare() method, it looks something like this:
declare @p1 int
set @p1=10001
exec sp_prepexec @p1 output,N'@LastName varchar(50)',N'SELECT C.Title, C.FirstName, C.LastName,
C.EmailAddress, C.Phone, E.*
FROM Person.Contact C
INNER JOIN HumanResources.Employee E ON E.ContactID = C.ContactID
WHERE C.LastName = @LastName
',@LastName='Smith'
select @p1
Notice how it declares @p1 and returns it as a scalar value along with the results of the command. The SqlCommand object can then send the following command on subsequent requests:
exec sp_execute 10001,@LastName='Miller'
The reason it is not more efficient to use the Prepare() method in this scenario is that, unless the same connection object is used and remains open, the request sent to SQL Server will always be the call to sp_prepexec. This adds the overhead of creating a temporary stored procedure along with a handle to that stored procedure, but you don't get the benefit of the reduced traffic. This overhead is unlikely to have a large impact on SQL Server, but I recommend omitting the call to Prepare() unless it's needed.
The Prepare() method shines when the SqlCommand object is executed repeatedly during the same session (usually within a loop). In that scenario it was much faster than not calling Prepare(). I will not be providing any benchmark comparisons here; I leave that to you, the reader. But in the sample project I have included a method which will allow you to do some of your own testing.
Sample Project
The sample project included with this article takes a stored procedure from the AdventureWorks database and passes its entire text to a command object as ad hoc SQL, to simulate a more complex query with a single parameter. There are two versions of the query, an ad hoc version and a parameterized version. Each one is run 10,000 times with a randomized integer as the parameter.
I recommend running perfmon while the sample project is running. I used the counters Process.% Processor Time - sqlserver, SQL Server:SQL Statistics.Batch Requests/Sec and SQL Server:Plan Cache - Cache Pages. Here were my results:
| | Run Time | CPU | Req/Sec | Cache Pages - Start | Cache Pages - End |
|---|---|---|---|---|---|
| Ad Hoc | 115 sec | 95% | 109 | 600 | 43684 |
| Parameterized | 10 sec | 59% | 1153 | 50 | 260 |
As you can see, without parameters the process takes 10 times longer to run, the CPU runs almost twice as high, and the number of 8 KB pages used by the procedure cache is roughly 200 times greater. The last counter (8 KB pages) will continue to get worse the longer the application runs, because the parameterized query adds only one version of the plan to the cache, while the ad hoc version adds a new plan for every unique value passed to it.
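If you prefer to watch the plan cache from inside SQL Server rather than perfmon, the same counter is exposed through sys.dm_os_performance_counters. This is only a sketch; the exact object_name prefix depends on whether you are running a default or named instance:
-- Read the Plan Cache 'Cache Pages' counter (in 8 KB pages) without perfmon
SELECT instance_name, cntr_value AS cache_pages
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Plan Cache%'
  AND counter_name = 'Cache Pages'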
Conclusion
While stored procedures are the recommended method for client data access, it isn't always practical to rewrite everything if your code is already using ad hoc SQL. However, you can get almost the same performance benefit from using parameters that you get from stored procedures. So if you haven't done so already, there really is no excuse for not parameterizing your queries. Your application will scale better and be more stable.
Revision History
1/10/2007
- Added clarification of article's context
- Added SqlCommand.Prepare() section
- Added timers to sample project
- Removed use of concatenation from sample project and sample code in article
- Replaced SqlDataAdapter.Fill() with SqlCommand.ExecuteReader() in sample project
- Added methods to sample project to demonstrate using SqlCommand.Prepare()