Introduction
Earlier in the year, the software company I work for released RightCalc, a solution designed to enable insurance companies to centrally manage their prices.
Prior to launch, we needed to be sure that the system could handle the anticipated traffic whilst still responding to the majority of requests in an acceptable time. Failure to do so could result in frustrated customers and lost sales for the insurance companies. This requirement brought about the development of ServiceMon.
Background
Before running any performance tests, it's crucial to understand what you are trying to achieve by building up a realistic picture of how the service will be used.
In the case of RightCalc, we were fortunate to get a consensus on performance targets
early on in the development of the system. These are the benchmarks we agreed:
1. Typical load testing
The system must handle a typical load of 3 requests per second while continuing to deliver respectable response times of 50ms or less for 99% or more of requests. We anticipated this would be sufficient for the first few months after the system was launched.
2. Soak testing
Ensure the system is capable of running for a prolonged period of 1 year, with a sustained load of 1 request/sec, without any manual intervention. The system should maintain a steady response rate throughout the duration of the test.
3. Spike testing
Ensure the system can handle brief, intense traffic spikes without causing a loss
of data or a dramatic performance hit. The system must handle occasional, intense
loads of 100 reqs/sec for 30 seconds whilst still
responding to 99% of requests within 100ms.
Introducing ServiceMon
We developed ServiceMon
specifically for the RightCalc project to help monitor critical web pages and web
services and to apply load and gather statistics that would prove the system was
capable of handling the three scenarios above.
When we were unable to find a suitable monitoring solution, we decided to build one that worked on Windows, was extensible, and was simple to configure. We needed an easily configurable solution that would let us develop ad hoc tests to investigate new avenues of research suggested by the results, while still being able to manage a standard, continuously running monitoring suite that could be version controlled, in a similar manner to source code, using standard tools such as Subversion.
Last week Kaleida made the decision to release
ServiceMon to the community as open-source, under the
GPL license, in the hope that other developers, QA and ops staff could
benefit from the tool.
ServiceMon works by executing
a very simple script containing operations like this:
http-get "http://www.google.com" must-contain "<title>Google</title>"
The script can contain any number of operations, but they all follow the structure of request followed by response-handler. The request part of the operation, http-get, is executed on the tick of a timer (which occurs every second by default) and then waits for a response to be received. When the response arrives it is given to the response-handler, in this case must-contain, which checks that the HTML contains the specified phrase.
Whilst an operation is waiting to receive a response, script execution continues and subsequent operations carry on being executed on each new timer tick.
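To make that execution model concrete, here is a minimal C# sketch (not ServiceMon's actual code) of a timer-driven loop: each tick starts the next request without waiting for earlier responses, and each response is checked by its handler as it arrives.

using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Illustration only - not ServiceMon's implementation.
class TimerDrivenRunner
{
    static readonly HttpClient client = new HttpClient();

    static void Main()
    {
        // Fire one operation per tick; outstanding requests keep running in the background.
        using (var timer = new Timer(_ => RunOperationAsync(), null,
                                     TimeSpan.Zero, TimeSpan.FromSeconds(1)))
        {
            Console.ReadLine(); // keep the process alive while the timer ticks
        }
    }

    static async Task RunOperationAsync()
    {
        var stopwatch = Stopwatch.StartNew();
        string body = await client.GetStringAsync("http://www.google.com");
        stopwatch.Stop();

        // The response-handler part: check the body and record success or failure.
        bool ok = body.Contains("<title>Google</title>");
        Console.WriteLine("{0}ms {1}", stopwatch.ElapsedMilliseconds, ok ? "OK" : "FAIL");
    }
}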
If the request or response-handler is not successful, a failure
is recorded and the screen changes to reflect this:
There are a number of built-in request types which are useful for common testing scenarios, such as monitoring whether a web page is available or pinging a server.
To produce performance statistics for a proprietary SOAP web service, we'll need
to write a custom request type. We'll discuss how to do this later (don't worry,
it's very straightforward!) but first we'll take a closer look at the types of statistics
ServiceMon produces.
Response Time Statistics
Producing a response time distribution graph is an excellent method of building up a picture
of your web service's performance characteristics.
First, we make a large number of requests over a prolonged period while the server
is under a stable load. Then, having recorded the time each request took, we find
the appropriate "bucket" and add one to its item count.
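The bucketing itself is trivial; a short C# illustration (not part of ServiceMon) might look like this, assuming the response times have already been recorded in milliseconds:

using System;
using System.Collections.Generic;

// Illustration only: count recorded response times into 10ms-wide buckets.
class HistogramExample
{
    static void Main()
    {
        // Example timings in milliseconds; in practice these come from the test run.
        int[] responseTimesMs = { 23, 27, 35, 18, 42, 24, 29, 31 };
        int bucketWidthMs = 10;

        // Key is the lower edge of each bucket (0, 10, 20, ...); value is the item count.
        var buckets = new SortedDictionary<int, int>();
        foreach (int ms in responseTimesMs)
        {
            int bucketStart = (ms / bucketWidthMs) * bucketWidthMs;
            int count;
            buckets.TryGetValue(bucketStart, out count);
            buckets[bucketStart] = count + 1;
        }

        foreach (var bucket in buckets)
            Console.WriteLine("{0}-{1}ms: {2}", bucket.Key, bucket.Key + bucketWidthMs, bucket.Value);
    }
}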
Imagine we've made 100 requests and recorded their time taken in "buckets"
10ms wide. We may see results like this:
Response time (ms) | Count
-------------------|------
0-10               |     0
11-20              |     2
21-30              |    49
31-40              |    22
41-50              |    12
51-60              |     8
61-70              |     4
71-80              |     2
81-90              |     1
91-100             |     0
When these values are plotted, we'll often see this
characteristic curve:
The position of the peak shows the response time most often experienced; the further
to the left, the better.
The shape and scale are also important. A thin peak is ideal and shows that the web service delivers consistent response times - most visitors will have a similar wait. A large spread and an extended right-hand tail are sub-optimal, and further investigation is needed to determine why the response time is so variable.
The graph should ideally contain a single peak. The presence of a
secondary peak should be investigated as this may indicate a frequently
occurring background task affecting the processing of a significant proportion of
requests, such as a backup job or re-indexing task.
For some purposes, such as an SLA, it is useful to describe the curve quantitatively by specifying a number of "nines":
- 90% of requests take less than 60ms
- 99% of requests take less than 80ms
- 99.9% of requests take less than 90ms
- 99.99% of requests take less than 90ms
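There is no single definition of these figures, but a simple nearest-rank calculation over the recorded timings is usually good enough. Here is a C# sketch (not part of ServiceMon) of that approach:

using System;

// Illustration only: derive "nines" figures from recorded response times.
class NinesExample
{
    static void Main()
    {
        // Example timings in milliseconds; in practice use the full set of recorded responses.
        double[] timesMs = { 12, 18, 22, 25, 31, 44, 58, 61, 79, 140 };
        Array.Sort(timesMs);

        foreach (double percentile in new[] { 90.0, 99.0, 99.9, 99.99 })
        {
            // Nearest-rank method: the value at or below which 'percentile' per cent of samples fall.
            int index = (int)Math.Ceiling(percentile / 100.0 * timesMs.Length) - 1;
            Console.WriteLine("{0}% of requests took {1}ms or less", percentile, timesMs[index]);
        }
    }
}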
For websites, and web services indirectly consumed by humans, your target response times should consider how humans perceive time. Research shows that a response received within 100ms feels instant, one within 1,000ms (1 second) is tolerable, but anything longer than 10,000ms (10 seconds) is enough for the user to lose interest and do something else.
Viewing Statistics with ServiceMon
Now that we've looked at performance measures of a web site or web service, we'll
use ServiceMon to produce
these statistics.
For this example, we'll create a script with one operation - a simple HTTP GET request:
One word of warning: Don't run aggressive performance tests against
web sites or web services that you don't own. It is impolite to bombard
someone else's web server with thousands of requests - in many countries it is illegal
- and can be considered a
denial-of-service attack. At the very least, your IP address will probably
be throttled or blocked.
To begin monitoring, press the "Start" button on ServiceMon. The screen
will automatically change to the "10 foot" status display which is designed
to be viewed at a distance:
The green background and smiley face show that monitoring is active and no errors
have been detected. We can see more detail by viewing the "Responses"
tab which shows each individual response received. However, what is of real interest
is the response time distribution graph viewable through the "Statistics"
tab. This graph is updated every second and, after running for a while, will look
something like this:
This graph, which is now plotted on a logarithmic x-axis to reveal the detail of the hump, conforms to the expected shape. It shows us at a glance that the majority of requests were sent, processed, and received within approximately 10ms. This can be confirmed by looking at the "nines" values in the boxes to the right: 99% of requests were processed in 15ms; 99.9% in 25ms. Notice how the figure for 99.99% is significantly higher than the others at 191ms? This demonstrates the importance of obtaining a large enough sample size. This test run made only around 2,800 requests and, due to this low number, it only took one slow response to push the 99.99% figure skywards. We really need many tens of thousands of requests before we can obtain trustworthy figures.
This test was performed with no background load on the server and a test load of
2 requests a second (1 per 500ms). To build up a more comprehensive picture of the
performance profile we'll need to repeat the test with different background loads
and different test loads. It is possible to use
ServiceMon for both jobs by creating two scripts which are run in different
instances of the utility. One instance will be used to apply a background load and
then one or more extra instances of ServiceMon will apply the test load and produce
the actual statistics. We used this approach to good effect when testing RightCalc
prior to launch and continue to use it to assess the live service for performance
problems.
Creating a Custom ServiceMon Request Type
So far we've looked at how to build up a performance profile for a website, using
the built-in http-get
request type. But how do we test a SOAP web service
that requires more than just a URL to invoke?
A SOAP message will have a header, which may contain authentication tokens and routing information, and a body, which contains the request data. Every SOAP service and web method has its own specific requirements, which makes it difficult for a tool like ServiceMon to support them in a generic fashion.
Fortunately, ServiceMon is built in an extensible way which allows new request types and response-handlers to be easily plugged into its framework and then used in the same manner as the built-in operations.
We'll start by creating a new ServiceMon request which calls an example web service provided by W3Schools that simply converts temperatures between Fahrenheit and Celsius.
First we need to create a new Visual Studio Class Library solution:
Then, add a reference to the ServiceMon framework (this will be in the same location
as ServiceMon.exe
, usually at C:\Program Files (x86)\Kaleida\ServiceMon
):
Kaleida.ServiceMonitor.Framework.dll
Finally, create a new public class called GetCelsius
which derives
from PreparedRequest
:
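The exact members of PreparedRequest are documented with the framework; the minimal sketch below assumes it exposes an overridable GetResponse() method that returns the response as a string, so the precise signature may differ in your version.

using Kaleida.ServiceMonitor.Framework;  // namespace assumed to match the framework DLL

public class GetCelsius : PreparedRequest
{
    // A stub for now - we'll call the real web service later.
    // (Assumes PreparedRequest declares an overridable GetResponse() returning a string;
    // check the framework's documentation for the exact signature.)
    public override string GetResponse()
    {
        return "my response";
    }
}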
That's all we have to do to create a new request type. Obviously, it doesn't call
the web service yet, but there's enough here for us to test it in ServiceMon.
After building the DLL, copy it to the Operations folder that is located in the same place as your ServiceMon.exe (usually C:\Program Files (x86)\Kaleida\ServiceMon\Operations).
Restart ServiceMon and, to verify that the DLL was successfully discovered, view
the Help tab in ServiceMon:
The new request type will be listed with the name get-celsius, which is derived from the name of the class. Notice how it doesn't take any parameters yet or have a description. We'll get to that later.
You can now use this request as you would any of the built-in requests. Create a
new script and add this line:
get-celsius log-response
When you click the Start button and view the Responses tab you'll see something like this:
03-Oct-2012 13:05:24.329 0ms [no description present] and log response my response
Next we'll go back to our Visual Studio solution to finish things off.
A get-celsius
request isn't particularly useful unless a Fahrenheit
value can be specified. The way we do this is by adding a public constructor to
our class with a string parameter:
public class GetCelsius : PreparedRequest
{
    // The Fahrenheit value supplied in the script, stored for use by GetResponse.
    private string fahrenheit;

    public GetCelsius(string fahrenheit)
    {
        this.fahrenheit = fahrenheit;
    }
Now we need to add a reference to the web service and complete the implementation
of GetResponse
.
The web service proxy is built using the "Add Service Reference..." option
on the Project menu:
This is the code to add to GetResponse
to actually call the web service
with our Fahrenheit value:
// Build a basic (unsecured) HTTP binding and point it at the W3Schools TempConvert endpoint.
var binding = new BasicHttpBinding(BasicHttpSecurityMode.None);
var endpoint = new EndpointAddress("http://www.w3schools.com/webservices/tempconvert.asmx");

// Call the generated proxy with our Fahrenheit value and return the Celsius result.
var soapClient = new TempConvertSoapClient(binding, endpoint);
return soapClient.FahrenheitToCelsius(fahrenheit);
Finally, we'll override the Description
property. Here's the finished
class:
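To give an idea of how the pieces fit together, the finished class might look something like the sketch below. As before, the override of Description assumes PreparedRequest exposes an overridable string property of that name, and the description text shown is only illustrative.

using System.ServiceModel;
using Kaleida.ServiceMonitor.Framework;  // namespace assumed to match the framework DLL
// (plus the namespace of the generated TempConvert service reference)

public class GetCelsius : PreparedRequest
{
    // The Fahrenheit value supplied in the script.
    private string fahrenheit;

    public GetCelsius(string fahrenheit)
    {
        this.fahrenheit = fahrenheit;
    }

    // Shown in the Responses and Help tabs; assumed to be an overridable string property.
    public override string Description
    {
        get { return "get-celsius " + fahrenheit; }
    }

    public override string GetResponse()
    {
        // Call the W3Schools TempConvert service and return the Celsius result.
        var binding = new BasicHttpBinding(BasicHttpSecurityMode.None);
        var endpoint = new EndpointAddress("http://www.w3schools.com/webservices/tempconvert.asmx");
        var soapClient = new TempConvertSoapClient(binding, endpoint);
        return soapClient.FahrenheitToCelsius(fahrenheit);
    }
}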
And that's it! Copy the new DLL into the Operations folder and change the script
so that the Fahrenheit value is specified:
get-celsius "98.6" must-equal "37"
Click Start and ServiceMon will begin monitoring the tempconvert
web
service, alerting you the moment it fails to return the expected result.
To monitor RightCalc we have a number of
custom request types like this, one for each web service. Our scripts constantly
monitor the web pages and web services on the live system and the different staging
platforms to give us instant notification of any live problems, or potential regressions
in the development pipeline. Our system has been running for over six months and
ServiceMon has proved to be an invaluable tool.
We hope you find ServiceMon useful. Please get in touch if you have any questions or would like to help build new extensions.
The project's home page, latest downloads, and all documentation can be found here:
ServiceMon Home Page
History
First Draft: 6th September 2012
Updated to reflect changes in newer versions of ServiceMon: 17th September 2012
Changed title and updated screenshots taken from v1.0: 3rd October 2012
Changed license to GPLv3 and included ServiceMon v1.0 source code with article. Removed links to company: 8th October 2012