At 9 AM, during peak traffic for your business, you get an emergency call: the website you built is down. It’s not responding to any requests. Some people can see some pages after a long wait, but most can’t. So you think it must be a slow query, or that the database needs some tuning. You do the regular checks, looking at CPU and disk usage on the database server, and find nothing wrong there. Then you suspect the web servers are running slow, so you check CPU and disk usage on them. You find no problem there either. Both the web servers and the database servers show very low CPU and disk usage. Then you suspect the network. So you try a large file copy from a web server to the database server and vice versa. Nope, the file copies perfectly fine; the network has no problem. You also quickly check RAM usage on all the servers and find it perfectly fine. As a last resort, you run some diagnostics on the load balancer, firewall, and switches, but find everything in good shape. Yet your website is down. Looking at the performance counters on the web server, you see a lot of requests getting queued, along with very high request execution time and request wait time.
So you do an IIS restart. Your website comes back online for a couple of minutes and then goes down again. After restarting several times, you realize it’s not an infrastructure issue. You have a scalability issue in your code. All the things you read about scalability, thought were fairy tales, and assumed would never happen to you are now happening right in front of you. You realize you should have made your services async.
However, just converting your sync services to async does not solve the scalability problem. WCF has a bug that prevents it from serving requests as fast as you would like. The thread pool it uses to handle async calls cannot start threads as fast as requests come in; it adds only one new thread to the pool every 500 ms. As a result, you get a slow ramp-up of threads.
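To get a feel for how slow that ramp-up is, here is a rough back-of-the-envelope model in Python. It is only a sketch under assumptions of my own: the pool starts with a fixed number of threads (8 is a made-up figure, not something WCF guarantees), every request blocks long enough that no thread is freed during the burst, and exactly one thread is added every 500 ms. The function names are hypothetical and are not part of any WCF or .NET API.

```python
def seconds_to_reach(target_threads, initial_threads, injection_interval_s=0.5):
    """Time until the pool has grown to target_threads.

    Assumes one new thread per injection interval and that no
    existing thread becomes free in the meantime (worst case).
    """
    missing = max(0, target_threads - initial_threads)
    return missing * injection_interval_s


def ramp_up(initial_threads, steps, injection_interval_s=0.5):
    """Return (elapsed_seconds, thread_count) samples for the first `steps` injections."""
    return [(step * injection_interval_s, initial_threads + step)
            for step in range(steps + 1)]


if __name__ == "__main__":
    # Hypothetical burst: 100 concurrent requests hit a pool that starts with 8 threads.
    print(seconds_to_reach(target_threads=100, initial_threads=8))
    for elapsed, threads in ramp_up(initial_threads=8, steps=4):
        print(f"t={elapsed:.1f}s threads={threads}")
```

Under these assumptions, the pool needs (100 - 8) × 0.5 = 46 seconds before it has one thread per request in the burst, and for most of that time the requests simply sit in the queue. Real behavior depends on how quickly requests complete and free their threads, so treat the number as an illustration of the trend rather than a measurement.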
Read my article to learn the details of how WCF handles async services and how to fix this bug, so that your async services are truly async and scale under heavy load.
Don’t forget to vote.