WCF Throughput and Scalability => Which and When

Nitin Singh India

13 Aug 2013

A service should always be ready to process incoming requests and cope with the expected load, without overloading the server machine's resources.

Scalability describes how many service instances are created to meet demand; throughput describes how each individual instance handles client requests.

WCF offers three complementary mechanisms to control scalability and throughput: Instancing, Concurrency and Throttling. They are specified at the service level as service behavior properties.

Instancing: Determines how long the application context lives within the infrastructure. The context holds the application state and the client's session (if applicable).
There are three types of instancing behavior available, set through InstanceContextMode:

PerCall: A service instance is set up and torn down with every call to the service. Use this mode when service initialization is simple and cheap, or when calls are infrequent and there is no need to maintain state on the server. It is the fastest option, needs the least management and is highly scalable.
PerSession: This is the default option. The instance is set up on the first call and preserved until the client closes the session, an unhandled fault occurs or the idle timeout is reached. It is useful for repeated calls, such as calls made in a loop, or where service initialization is somewhat heavy.
Single: Only one service instance is ever active, and all requests from all clients are serviced by it. Use this mode where the service needs stateful operations and the operations themselves provide the required concurrent access protection. An unhandled exception triggered by any one client brings the service down for all connected clients, and it then needs to be restarted.
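
The difference between the three modes is easiest to see with a service that keeps a field. The following is a minimal sketch, not taken from any real project: the ICounterService contract and the CounterService class are hypothetical names invented for illustration, and it assumes a reference to System.ServiceModel.

```csharp
using System.ServiceModel;

// Hypothetical contract, used only for illustration.
[ServiceContract]
public interface ICounterService
{
    [OperationContract]
    int Increment();
}

// PerCall: WCF creates a fresh CounterService object for every request,
// so _count starts at 0 on each call and Increment() always returns 1.
[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
public class CounterService : ICounterService
{
    private int _count;

    public int Increment()
    {
        return ++_count;
    }
}
```

Switching the attribute to InstanceContextMode.PerSession would make the counter grow per client session (given a binding that supports sessions), and InstanceContextMode.Single would share one counter across all clients.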

Concurrency Mode

Concurrency issues arise when multiple threads attempt to access the same resource, which often leads to blocking. If multiple clients call the same service, multiple concurrent threads are requested.

The concurrency mode determines how many individual requests can access the service at any given time. In the Single and Reentrant modes, the locking required to protect the service instance is provided by the WCF infrastructure out of the box. The mode is set in code through the ConcurrencyMode property of the ServiceBehavior attribute on the service class.

There are three types of concurrent behavior available:

Single: This is the default configuration. Only a single request has access to the service object at any given time. The infrastructure takes a lock on the service instance itself to protect every operation against concurrent access. All other requests are queued while the current request is being processed, subject to the client's sendTimeout or the service's sessionTimeout (if applicable).
Multiple: Multiple threads are created for the service to handle multiple clients. This allows great throughput, but thread synchronization must be handled by the service code, as the threads can compete for shared resources.
Reentrant: This is used to enable callback contracts that are not one-way. Without it, the call from the service out to the client creates a deadlock when the callback's reply comes back to the service. Only a single request thread has access to the service object at a time, but that thread can leave the service temporarily to let another thread in, or can call the client through a callback and re-enter with the callback's response without a deadlock.
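
With Multiple concurrency, protecting shared state becomes the service author's job. The sketch below shows one common way to do that with a plain lock; the IHitCounter contract and the HitCounter class are hypothetical names, and a singleton instance is assumed so that all callers really do share the same field.

```csharp
using System.ServiceModel;

// Hypothetical contract, used only for illustration.
[ServiceContract]
public interface IHitCounter
{
    [OperationContract]
    long RecordHit();
}

// One shared instance, entered by many request threads at once.
[ServiceBehavior(
    InstanceContextMode = InstanceContextMode.Single,
    ConcurrencyMode = ConcurrencyMode.Multiple)]
public class HitCounter : IHitCounter
{
    private readonly object _sync = new object();
    private long _hits;

    public long RecordHit()
    {
        // WCF takes no lock for us in Multiple mode,
        // so the shared counter is guarded explicitly.
        lock (_sync)
        {
            _hits++;
            return _hits;
        }
    }
}
```

With ConcurrencyMode.Single the lock would be redundant, because the infrastructure already admits only one request at a time into the instance.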

The following diagram illustrates how instancing and concurrency work together.

In this diagram, the client has created two sessions, with 3 and 2 concurrent request threads respectively. The configuration is the default one.

Other combinations are given below:

All calls get their own service instance. There is no blocking, and this is the fastest mode of operation.

Each client gets its own service instance, but because all of that client's requests go to the same instance, they are queued behind the one being processed. Calls from other clients are unaffected, as they are served by separate instances.

If service initialization is too heavy, the service operates as a singleton, with all requests from all clients served from a single instance. The service attends to only one call at a time, and the other calls are queued.

For callbacks, if the messaging mode is one-way, the callbacks return on their own channel. For request-reply mode, however, the service needs Reentrant concurrency. As illustrated in the given image, with Single concurrency the callback's response gets blocked at the server; with Reentrant, it enters successfully because the main thread temporarily exits, allows the callback and then re-enters the service.

Callbacks which are non-OneWay
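
A request-reply callback can be sketched as follows. The IProgressCallback and IJobService contracts and the JobService class are hypothetical names chosen for illustration, and the sketch assumes the endpoint uses a duplex-capable binding such as NetTcpBinding.

```csharp
using System.ServiceModel;

// Hypothetical callback contract. Confirm is request-reply
// (IsOneWay is not set), so the service waits for the client's answer.
public interface IProgressCallback
{
    [OperationContract]
    bool Confirm(string message);
}

// Hypothetical duplex service contract.
[ServiceContract(CallbackContract = typeof(IProgressCallback))]
public interface IJobService
{
    [OperationContract]
    void RunJob(string jobName);
}

// Reentrant lets the callback's reply re-enter the instance
// while RunJob is still waiting inside it.
[ServiceBehavior(ConcurrencyMode = ConcurrencyMode.Reentrant)]
public class JobService : IJobService
{
    public void RunJob(string jobName)
    {
        IProgressCallback callback =
            OperationContext.Current.GetCallbackChannel<IProgressCallback>();

        // Outgoing request-reply call to the client.
        bool proceed = callback.Confirm("Start " + jobName + "?");
        if (!proceed)
        {
            return;
        }

        // ... the actual work would go here ...
    }
}
```

Because the instance lock is released during the outgoing call, any state held in fields may have changed by the time Confirm returns, so it is worth re-checking such state after the callback.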

Next comes throttling. Throttling specifies the maximum limits up to which a service can scale, based on the available resources. Server resources should never be pushed beyond their limits; if such levels are reached, the capacity needs to be expanded or the architecture revisited.

There are three throttling parameters, which can be set through the "serviceThrottling" element of a service behavior.

MaxConcurrentCalls: Limits the number of calls being processed at the same time. It matters most for PerSession and Single instancing combined with Multiple concurrency, where one instance can process several calls at once. With Single and Reentrant concurrency, each instance already handles only one call at a time.

With IIS or WAS hosting, ASP.NET handles the processing of requests and forwards the .svc requests to the WCF thread. If the call is one-way, the ASP.NET thread is released and WCF threads are allocated according to the throttle settings. If the call is request-reply, the ASP.NET thread is blocked while the WCF thread does its job.

MaxConcurrentInstances: For PerCall, this should be equal to or greater than the number of concurrent calls. For PerSession, it should be equal to or greater than the maximum number of active session instances. It is not applicable to Single instancing, as only one instance is ever created.

MaxConcurrentSessions: Limits all types of sessions (application, transport, reliable and secure). Sessions have a much longer lifetime than requests, so if a session is created for each small request, the sessions will tie up resources and other users will be left out.

Different session types are supported by specific bindings, so the binding configuration also needs to be considered when deciding the session type and count.

XML

<serviceBehaviors>
  <behavior name="uatBehavior">
    <!-- Upper limits on concurrent calls, instances and sessions -->
    <serviceThrottling
        maxConcurrentCalls="16"
        maxConcurrentInstances="2147483647"
        maxConcurrentSessions="10" />
  </behavior>
</serviceBehaviors>
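
For a self-hosted service, the same limits can also be applied in code before the host is opened. This is only a sketch under stated assumptions: the IPingService contract, the PingService class and the net.tcp address are hypothetical and chosen for illustration, while the throttle values mirror the configuration above.

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Description;

// Hypothetical contract and service, used only for illustration.
[ServiceContract]
public interface IPingService
{
    [OperationContract]
    string Ping();
}

public class PingService : IPingService
{
    public string Ping() { return "pong"; }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical base address.
        var baseAddress = new Uri("net.tcp://localhost:9000/ping");

        using (var host = new ServiceHost(typeof(PingService), baseAddress))
        {
            host.AddServiceEndpoint(typeof(IPingService), new NetTcpBinding(), "");

            // Reuse the throttling behavior if one is already present,
            // otherwise add a new one.
            var throttle = host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
            if (throttle == null)
            {
                throttle = new ServiceThrottlingBehavior();
                host.Description.Behaviors.Add(throttle);
            }

            // Same values as the XML configuration above.
            throttle.MaxConcurrentCalls = 16;
            throttle.MaxConcurrentInstances = int.MaxValue;
            throttle.MaxConcurrentSessions = 10;

            host.Open();
            Console.WriteLine("Service running. Press Enter to stop.");
            Console.ReadLine();
        }
    }
}
```

Keeping the limits in configuration, as in the XML above, is usually more convenient, since they can then be tuned per environment without recompiling.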

Happy servicing the world. :)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)