In this article
- Introduction
- A Teeny-Weeny Intro to Clustering and Load Balancing
- My Idea for Dynamic Software Load Balancing
- Architecture and Implementation
- Architecture Outlook
- Assemblies & Types
- Collaboration
- Some Implementation Details
- Load Balancing in Action - Balancing a Web Farm
- Building, Configuring and Deploying the Solution
- Configuration
- Deployment
- Some thoughts about MC++ and C#
- Managed C++ to C# Translation
- C#'s readonly fields vs MC++ non-static const members
- "Bugs suck. Period."
- TODO(s)
- Conclusion
- A (final) word about C#
- Disclaimer
"Success is the ability to go from one failure to another with no loss of
enthusiasm."
Winston Churchill
Introduction
<blog date="2002-12-05">
Yay! I passed 70-320 today and I'm now MCAD.NET. Expect the next article to
cover XML Web Services, Remoting, or Serviced Components:)
</blog>
This article is about Load Balancing. Neither "Unleashed", nor
"Defined" -- "Implemented":) I'm not going to discuss in detail what load balancing
is, its different types, or the variety of load balancing algorithms. I'm not
going to talk about proprietary software like WLBS, MSCS, COM+ Load Balancing or
Application Center either. What I am going to do in this article is present
a custom .NET Dynamic Software Load Balancing solution that I've implemented in
less than a week, and the issues I had to resolve to make it work. Though the
source code is only about 4 KLOC, by the end of this article you'll see that
the solution is good enough to balance the load of the web servers in a web
farm. Enjoy reading...
Everyone can read this article
...but not everybody will understand everything. To read and understand the
article, you're expected to know what load balancing is in general, but even if
you don't, I'll explain it shortly -- so keep reading. And to read the code, you
should have some experience with multithreading and network programming (TCP,
UDP and multicasting) and a basic knowledge of .NET Remoting. Contrary to what
C# developers think, you don't need to know Managed C++ to read the code. When you're
writing managed-only code, C# and MC++ source code look almost the same, with
very few differences, so I have even included a section
for C# developers which explains how to convert (most of) the MC++ code to C#.
A final warning before you focus on the article -- I'm not a professional writer,
I'm just a dev, so don't expect too much from me (this is my 3rd article). If you
feel that you don't understand something, that's probably because I'm not a native
English speaker (I'm Bulgarian), so I haven't been able to express what I was
thinking. If you find a grammatical nonsense, or even a typo, report it
to me as a bug and I'll be more than glad to "fix" it. And thanks for
bearing with this paragraph!
A Teeny-Weeny Intro to Clustering and Load Balancing
For those who don't have a clue what Load Balancing means, I'm about to give a short
explanation of clustering and load balancing. Very short indeed, because I
lack the time to write more about it, and because I don't want to waste the
space of the article with arid text. You're reading an article at
www.CodeProject.com, not at www.ArticleProject.com:) The enlightened may
skip the following paragraph, and I encourage the rest to
read it.
Mission-critical applications must run 24x7, and networks need to be able to
scale performance to handle large volumes of client requests without unwanted
delays. A "server cluster" is a group of independent servers managed as a single
system for higher availability, easier manageability, and greater scalability.
It consists of two or more servers connected by a network, and a cluster
management software, such as WLBS, MSCS or Application Center. The software
provides services such as failure detection, recovery, load balancing, and the
ability to manage the servers as a single system. Load balancing is a technique
that allows the performance of a server-based program, such as a Web server, to
be scaled by distributing its client requests across multiple servers within a
cluster of computers. Load balancing is used to enhance scalability, which
boosts throughput while keeping response times low.
I should warn you that I haven't implemented a complete clustering software,
but only the load balancing part of it, so don't expect anything more than
that. Now that you have an idea what load balancing is, I'm sure you don't yet
know what my idea for its implementation is. So keep reading...
My Idea for Dynamic Software Load Balancing
How do we know that a machine is busy? When we feel that our machine is
getting very slow, we launch the Task Manager and look for a hung instance of
iexplore.exe:) Seriously, we look at the CPU utilization. If it is low, then the
memory must be low, and the disk must be thrashing. If we suspect anything else to be the
reason, we run the System Monitor and add some performance counters to look at.
Well, this works if you're around the machine and if you have one or two machines
to monitor. When you have more machines, you'll have to hire a person, and buy
him 20-dioptre glasses to stare at all the machines' System Monitor consoles and
go crazy in about a week :). But even if you could monitor your machines constantly,
you couldn't distribute their workload manually, could you? Well, you could use some
expensive software to balance their load, but I assure you that you can do it
yourself, and that's what this article is all about. Just as you are able to
"see" the performance counters, you can also collect their values
programmatically. And I think that if we combine some of them in a certain way,
and do some calculations, they can give us a value that can be used to
determine the machine's load. Let's check if that's possible!
Let's monitor the \\Processor\% Processor Time\_Total
and
\\Processor\% User Time\_Total
performance counters. You can
monitor them by launching Task Manager and looking at the CPU utilization in
the "Performance" tab. (The red curve shows the % Processor Time, and
the green one -- the % User Time.) Stop or pause all CPU-intensive applications
(WinAMP, MediaPlayer, etc.) and start monitoring the CPU utilization. You'll
notice that the counter values stay almost constant, right? Now, close Task
Manager, wait about 5 seconds and start it again. You should notice a big peak
in the CPU utilization. In several seconds, the peak vanishes. Now, if we were
reporting performance counter values instantly (as we get each counter sample),
one could think that our machine was extremely busy (almost 100%) at that moment,
right? That's why we're not going to report instant values; instead, we will collect
several samples of the counter's values and report their average. That would
be fair enough, don't you think? No?! Neither do I, I was just checking you:)
What about available memory, I/O, etc.? Because the CPU utilization alone is not enough
for a realistic calculation of the machine's workload, we should monitor more
than one counter at a time, right? And because, let's say, the current number of
ASP.NET sessions is less important than the CPU utilization, we will give each
counter a weight. Now the machine load will be calculated as the sum of the
weighted averages of all monitored performance counters. You should already be
guessing my idea for dynamic software load balancing. However, a picture is worth
a thousand words, and an ASCII one is worth two thousand:) Here is a real sample, and
the machine load calculation algorithm. In the example below, the machine load
is calculated by monitoring 4 performance counters, each configured to collect
its next sample value at equal intervals, and all counters collect the same
number of samples (this would be your usual case):
+-----------+ +-----------+ +-----------+ +-----------+
|% Proc Time| |% User Time| |ASP Req.Ex.| |% Disk Time|
+-----------+ +-----------+ +-----------+ +-----------+
|Weight 0.4| |Weight 0.3| |Weight 0.2| |Weight 0.5|
+-----------+ +-----------+ +-----------+ +-----------+
| 16| | 55| | 11| | 15|
| 22| | 20| | 3| | 7|
| 8| | 32| | 44| | 4|
| 11| | 15| | 16| | 21|
| 18| | 38| | 21| | 3|
+-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+
| Sum | 75| | Sum | 160| | Sum | 95| | Sum | 50|
+-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+
| Avg | 15| | Avg | 32| | Avg | 19| | Avg | 10|
+-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+
| WA | 6.0| | WA | 9.6| | WA | 3.8| | WA | 5.0|
+-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+
Legend:
- Sum
- the sum of all counter samples
- Avg
- the average of all counter samples (Sum/Count)
- WA
- the weighted average of all counter samples (Sum/Count * Weight)
- % Proc Time
- (Processor\% Processor Time\_Total), the percentage of elapsed time
that the processor spends executing a non-Idle thread. It is calculated by measuring
the time that the idle thread is active in the sample interval, and subtracting that
time from the interval duration. (Each processor has an idle thread that consumes cycles
when no other threads are ready to run.) This counter is the primary indicator of
processor activity, and displays the average percentage of busy time observed during
the sample interval
- % User Time
- (Processor\% User Time\_Total) is the percentage of elapsed time the
processor spends in user mode. User mode is a restricted processing mode designed
for applications, environment subsystems, and integral subsystems. The alternative,
privileged mode, is designed for operating system components and allows direct access
to hardware and all memory. The operating system switches application threads to
privileged mode to access operating system services. This counter displays the average
busy time as a percentage of the sample time
- ASP Req.Ex.
- (ASP.NET Applications\Requests Executing\__Total__) is the number of
requests currently executing
- % Disk Time
- (Logical Disk\% Disk Time\_Total) is the percentage of elapsed time
that the selected disk drive was busy servicing read or write requests
Sum (% Proc Time) = 16 + 22 + 8 + 11 + 18 = 75
Average (% Proc Time) = 75 / 5 = 15
Weighted Average (% Proc Time) = 15 * 0.4 = 6.0
...
MachineLoad = Sum (WeightedAverage (EachCounter))
MachineLoad = 6.0 + 9.6 + 3.8 + 5.0 = 24.4
Architecture and Implementation
I wondered about half a day how to explain the architecture to you. Not that it
is so complex, but because it would take too much space in the article, and I
wanted to show you some code, not a technical specification or even a DSS. So I
wondered whether to explain the architecture using a "top-to-bottom"
or "bottom-to-top" approach, or should I think out something else?
Finally, as most of you have already guessed, I decided to explain it in my own
mixed way:) First, you should learn of which assemblies is the solution comprised
of, and then you could read about their collaboration, the types they contain
and so on... And even before that, I recommend you to read and understand two
terms, I've used throughout the article (and the source code's comments).
- Machine Load
- the overall workload (utilization) of a machine - in our case,
this is the sum of the weighted averages of all performance counters (monitored
for load balancing); if you've skipped the section
"My Idea for Dynamic Software Load Balancing", you may want to
go back and read it
- Fastest machine
- the machine with the least current load
Architecture Outlook
First, I'd like to apologize about the "diagrams". There are only two
software products I can work with that can draw the diagrams I needed in this article. I can't afford
the first (and my company is not willing to pay for it either:), and the second bedeviled
me so much that I dropped from the article one UML static structure diagram, a UML
deployment diagram and a couple of activity diagrams (and they were nearly complete).
I won't tell you the name of the product, because I like the company that
developed it very much. Just accept my apologies, and the pseudo-ASCII art, which replaced the
original diagrams. Sorry:)
The load balancing software comes in three parts: a server that reports the
load of the machine it is running on; a server that collects such loads, no
matter which machine they come from; and a library which asks the collecting
server which is the least loaded (fastest) machine. The server that reports the
machine's load is called "Machine Load Reporting Server" (MLRS), and
the server that collects machine loads is called "Machine Load Monitoring
Server" (MLMS). The library is named "Load Balancing Library" (LBL).
You can deploy these three parts of the software as you like. For example, you could install
all of them on all machines.
The MLRS server on each machine joins a special multicast group, designated for
the purpose of load balancing, and sends messages containing the
machine's load to the group's multicast IP address. Because all MLMS servers join
the same group at startup, they all receive each machine load, so if you run both
MLRS and MLMS servers on all machines, they will know each other's load. So what?
We have the machine loads, but what do we do with them? Well, all MLMS servers
store the machine loads in a special data structure, which lets them quickly
retrieve the least machine load at any time. So all machines now know which is
the fastest one. Who cares? We haven't really used that information to balance
any load, right? How do we query the MLMS servers which is the fastest machine? The
answer is that each MLMS registers a special singleton object with the .NET
Remoting runtime, so the LBL can create (or get) an instance of that object, and
ask it for the least loaded machine. The problem is that LBL cannot ask
all machines about this simultaneously (yet, but I'm thinking about this issue), so
it should choose one machine (of course, it could be the machine it is running
on) and will hand that load to the client application that needs the information
to perform whatever load balancing activity is suitable. As you will later see,
I've used LBL in a web application to distribute the workload between all web
servers in a web farm. Below is a "diagram" which depicts in general
the collaboration between the servers and the library:
+-----+ ______ +-----+
| A | __/ \__ | B |
+-----+ __/ \__ +-----+
+-->| LMS |<--/ Multicast \-->| LMS |<--+
| | | / \ | | |
| | LRS |-->\__ Group __/ | | |
| | | \__ __/ | | |
|<--| LBL | ^ \______/ | LBL |---+
| +-----+ | +-----+
| | +-----+
| | | C |
| | +-----+
| | | |
| | | |
| +--| LRS |
| Remoting | |
+--------------------| LBL |
+-----+
MLMS, MLRS and LBL Communication
Note: You should see the strange figure between the machines as a cloud,
i.e. it represents a LAN :) And one more thing -- if you don't understand what
multicasting is, don't worry, it is explained later in the
Collaboration section.
Now look at the "diagram" again. Let me remind you that when a machine
joins a multicast group, it receives all messages sent to that group, including
the messages that the machine itself has sent. Machine A receives its own load, and
the load reported by C. Machine B receives the loads of A and C (it does not
report its own load, because there's no MLRS server installed on it). Machine C does
not receive anything, because it has no MLMS server installed. Because
machine C's LBL should connect (via Remoting) to an MLMS server, and C has no
such server installed, it could connect to machine A or B and query the remoted
object for the fastest machine. On the "diagram" above, the LBL of A
and C communicate with the remoted object on machine A, while the LBL of B
communicates with the remoted object on its own machine. As you will later see in
the Configuration section, there are very few
things that are hardcoded in the solution's source code, so don't worry -- you
will be able to tune almost everything.
Assemblies & Types
The solution consists of 8 assemblies, but only three of them are of some
interest to us now: MLMS, MLRS, and LBL, located respectively in two console
applications (MachineLoadMonitoringServer
.exe
and
MachineLoadReportingServer
.exe
) and one dynamic link library
(LoadBalancingLibrary.dll
). Surprisingly, MLMS and MLRS do not
contain any types. However, they use several types to get their job done. You may
wonder why I have designed them that way. Why didn't I just implement both
servers directly in the executables? Well, the answer is quite simple and
reflects both my strengths and weaknesses as a developer. If you have the time
to read about it, go ahead, otherwise click here to skip
the slight detour.
GUI programming is what I hate (though I've written a bunch of GUI apps). For
me, it is mundane work, more suitable for a designer than for a developer. I
love to build complex "things". Server-side applications are my
favorite ones. Multi-threaded, asynchronous programming -- that's the
"stuff" I love. Applications that nobody "sees" except
for a few administrators, who configure and/or control them using some sort of
administration console. If these applications work as expected, the end-user will
almost never know s/he is using them (e.g. in most cases, a user browsing a web
site does not realize that an IIS or Apache server is processing her requests and
is serving the content). Now, I've written several Windows C++ services in the
past, and I've written some .NET Windows services recently, so I could easily
convert MLMS and MLRS to one of these. On the other hand, I love console (CUI)
applications so much, and I like seeing hundreds of tracing messages on the
console, so I left MLMS and MLRS in their CUI form for two reasons. The first
reason is that you can quickly see what's wrong when something goes wrong (and
it will, at least once:), and the second one is that I haven't debugged .NET
Windows services (and because I have debugged C++ Windows services, I can assure
you that it's no "piece of cake"). Nevertheless, one can easily
convert both CUI applications into Windows services in less than half an hour. I
haven't implemented the server classes in the executables, to make it easier for
the guy who would convert them into Windows services. S/he'll need to write just
4 lines of code in the Windows Service class to get the job done:
- declare the server member variable:
LoadXxxServer __gc* server;
- instantiate and start it in the overridden
OnStart
method:
server = new LoadXxxServer ();
server->Start ();
- stop it in the overridden
OnStop
method:
server->Stop ();
Xxx
is either Monitoring
or Reporting
.
I'm sure you now understand why I have implemented the servers' code in
separate classes in separate libraries, and not directly in the executables.
I mentioned above that the solution consists of 8 assemblies, but as you
remember, 2 of them (the CUIs) do not contain any types, and one of them is LBL,
so what are the other 5? MLMS and MLRS use respectively the types contained in
the libraries LoadMonitoringLibrary (LML) and LoadReportingLibrary (LRL). On the
other hand, they and LBL use common types, shared in an assembly named
SharedLibrary (SL). So the assemblies are now MLMS + MLRS + LML + LRL + LBL +
SL = 6. The 7th is a simple (not interesting) CUI application I used to test the
load balancing, so I'll skip it. The last assembly is the web application that
demonstrates the load balancing in action. Below is a list of the four most
important assemblies that contain the types and logic for the implementation of
the load balancing solution.
SharedLibrary
(SL) - contains common and helper types, used by LML,
LRL and/or LBL. A list of the types (explained further) follows:
ServerStatus
- enumeration, used by LML and LRL's
LoadXxxServer classes
WorkerDoneEventHandler
- delegate, ditto
Configurator
- utility class (I'll discuss later), ditto
CounterInfo
- "struct" class, used by LRL and SL
ILoadBalancer
- interface, implemented in LML and used by LBL
IpHelper
- utility class, used by LML and LRL
MachineLoad
- "struct" class (with MarshalByValue
semantics for the needs of the Remoting runtime), used by LML, LRL and LBL
Tracer
- utility class, which most classes in LML and LRL
inherit in order to trace in the console in a consistent manner
NOTE:
CounterInfo
is not exactly what C++ developers call a "struct" class,
because it does a lot of work behind the scenes. Its implementation is non-
trivial and includes topics like timers, synchronization, and performance
counters monitoring; look at the Some Implementation Details
section for more information about it.
LoadMonitoringLibrary
(LML) - contains the LoadMonitoringServer
(LMS) class,
used directly by MLMS, as well as all classes, used internally in the LMS
class. List of LML's types (explained further) follows:
LoadMonitoringServer
- (LMS) class, the MLMS core
MachineLoadsCollection
- a simulation of a priority queue that stores the
machines' loads in a sorted manner, so it could quickly return the least
loaded machine (its implementation is more interesting than its name)
LoadMapping
- "struct" class, used internally by MachineLoadsCollection
CollectorWorker
- utility class, its only (public) method is the worker
thread that accepts and collects machine load reports
ReporterWorker
- utility class, its only (public) method is the worker
thread that accepts LBL requests and reports machine loads
WorkerTcpState
- "struct" class, used internally by the CollectorWorker
WorkerUdpState
- "struct" class, used internally by the ReporterWorker
ServerLoadBalancer
- a special Remoting-enabled (MarshalByRefObject) class,
which is registered for remoting as a Singleton, and activated on the server
side by LBL to service its requests
NOTE: I used the ReporterWorker
to implement the first version of LBL in some
faster, more lame way, but I've dropped it later; now, LMS registers a Singleton
object for the LBL requests; however, LMS is still using (the fully functional)
ReporterWorker
class, so one could build another kind of LBL that connects to an
MLMS and asks for the least loaded machine using a simple TCP socket (I'm sorry
that I've overwritten the old LBL library).
LoadReportingLibrary
(LRL) - contains the LoadReportingServer
(LRS) class, used
directly by MLRS, as well as all classes, used internally in the LRS class.
List of LRL's types (explained further) follows:
LoadReportingServer
- class, the MLRS core
ReportingWorker
- utility class, its only (public) method is the worker
thread that starts the monitoring of the performance counters and periodically
reports to one or more MLMS the local machine's load
LoadBalancingLibrary
(LBL) - contains just one class, ClientLoadBalancer, which
is instantiated by client applications; the class contains only one (public)
method, "surprisingly" named GetLeastMachineLoad, which returns the least
loaded machine:) LBL connects to LML's ServerLoadBalancer singleton object via
singleton object via
Remoting. For more details, read the following section.
Collaboration
In order to understand how the objects "talk" to each other within an assembly
and between assemblies (and on different machines), you should understand some
technical terms. Because they amount to about a page, and maybe most of you do
know what they mean, here's what I'll do: I'll give you a list of the terms, and
if you know them, click here to read about the collaboration,
otherwise, keep reading... The terms are: delegate, worker, TCP, UDP, (IP) Multicasting,
and Remoting.
- Delegate
- a secure, type-safe way to call a method of a class indirectly, using
a "reference" to that method; very similar to and at the same time quite
different from C/C++ function pointers (callbacks);
- Worker
- utility class, usually with just one method, which is started as a
separate thread; the class holds the data (is the state), needed by the thread
to do its job;
- TCP
- a connection-based, stream-oriented delivery protocol with end-to-end
error detection and correction. Connection-based means that a communication
session between hosts is established before exchanging data. A host is any
device on a TCP/IP network identified by a logical IP address. TCP provides
reliable data delivery and ease of use. Specifically, TCP notifies the sender of
packet delivery, guarantees that packets are delivered in the same order in
which they were sent, retransmits lost packets, and ensures that data packets
are not duplicated;
- UDP
- a connectionless, unreliable transport protocol. Connectionless means that
a communication session between hosts is not established before exchanging data.
UDP is often used for one-to-many communications that use broadcast or multicast
IP datagrams. The UDP connectionless datagram delivery service is unreliable
because it does not guarantee data packet delivery and no notification is sent
if a packet is not delivered. Also, UDP does not guarantee that packets are
delivered in the same order in which they were sent. Because delivery of UDP
datagrams is not guaranteed, applications using UDP must supply their own
mechanisms for reliability, if needed. Although UDP appears to have some
limitations, it is useful in certain situations. For example, Winsock IP
multicasting is implemented with UDP datagram type sockets. UDP is very
efficient because of low overhead. Microsoft networking uses UDP for logon,
browsing, and name resolution;
- Multicasting
- technology that allows data to be sent from one host and then
replicated to many others without creating a network traffic nightmare. This
technology was developed as an alternative to broadcasting, which can negatively
impact network bandwidth if used extensively. Multicast data is replicated to a
network only if processes running on workstations in that network are interested
in that data. Not all protocols support the notion of multicasting -- on Win32
platforms, only two protocols are capable of supporting multicast traffic: IP
and ATM;
- IP Multicasting
- IP multicasting relies on a special group of addresses known
as multicast addresses. It is this group address that names a given group. For
example, if five machines all want to communicate with one another via IP
multicast, they all join the same group address. Once they are joined, any data
sent by one machine is replicated to every member of the group, including the
machine that sent the data. A multicast IP address is a class D IP address in
the range 224.0.0.0 through 239.255.255.255
- Remoting
- the process of communication between different operating system
processes, regardless of whether they are on the same computer. The .NET
remoting system is an architecture designed to simplify communication between
objects living in different application domains, whether on the same computer or
not, and between different contexts, whether in the same application domain or
not.
I'll start from the inside out, i.e. I'll first explain how the various classes
communicate with each other within the assemblies, and then I'll explain how the
assemblies collaborate with one another.
In-Assembly collaboration (a thread synchronization how-to:)
When the LoadReportingServer
and LoadMonitoringServer
classes are instantiated,
and their Start
methods are called, they launch respectively one or two
threads to do their job asynchronously (and to be able to respond to "Stop"
commands, of course). Well, if starting a thread is very easy, controlling it is
not that easy. For example, when the servers should stop, they should notify the
threads that they are about to stop, so the threads could finish their job and
exit appropriately. On the other hand, when the servers launch the threads, they
should be notified when the threads are about to enter their thread loops and
have executed their initialization code. In the next couple of paragraphs I'll
explain how I've solved these synchronization issues, and if you know a cooler
way, let me know (with the message board below). In the paragraphs below, I'll
refer to the instances of the LoadReportingServer
and LoadMonitoringServer
classes as "(the) server".
When the Start
method is executed, the LMS object creates a worker class
instance, passing it a reference to itself (this), a reference to a delegate and
some other useful variables that are not interesting for this section. The
server object then creates an AutoResetEvent
object in an unsignalled state. Then
the LMS object starts a new thread, passing for the ThreadStart
delegate the
address of a method in the worker class. (I call a worker class' method, launched
as a thread a worker thread.) After the thread has been started, the server object
blocks, waiting (infinitely) for the event object to be signalled. Now, when the
thread's initialization code completes, it calls back the server via the server-
supplied delegate, passing a boolean parameter showing whether its initialization
code executed successfully or something went wrong. The target method of the
delegate in the server class sets (puts in signalled state) the AutoResetEvent
object and records in a private boolean member the result of the thread
initialization. Setting the event object unblocks the server: it now knows that
the thread's startup code has completed, and also knows the result of the thread's
initialization. If the thread did not manage to initialize successfully, it has
already exited, and the server just stops. If the thread initialized successfully,
it enters its thread loop and waits for the server to inform it when it should
exit the loop (i.e. the server is stopping). One could argue that this
"worker thread-to-main thread" synchronization looks too complicated
and he might be right. If we only needed to know that the thread has finished the
initialization code (and don't care if it initialized successfully) we could
directly pass the worker a reference to the AutoResetEvent
object,
and the thread would then set it to a signalled state, but you saw that we need
to know whether the thread has initialized successfully or not.
Now that was the more complex part. The only issue we have to solve now is how
to stop the thread, i.e. make it exit its thread loop. Well, that's what I call
a piece of cake. If you remember, the server has passed a reference to itself
(this
) to the worker. The server has a Status
property,
which is an enumeration, describing the state of the server (Starting, Started,
Stopping, Stopped). Because the thread has a reference to the server, in its
thread loop it checks (by invoking the Status
property) whether the
server is not about to stop (Status == ServerStatus::Stopping
). If
the server is stopping, so is the thread, i.e. the thread exits silently and
everything's OK. So when the server is requested to stop, it modifies its private
member variable status
to Stopping
and Join
s
the thread (waits for the thread to exit) for a configured interval of time. If
the thread exits in the specified amount of time, the server changes its status
to Stopped
and we're done. However, a thread may time out while
processing a request, so the server then aborts the thread by calling the
thread's Abort
method. I've written the thread loops in
try...catch...finally
blocks and in their catch
clause,
the threads check whether they die a violent death:), i.e. a
ThreadAbortException
was raised by the server. The thread then
executes its cleanup code and exits. (And I thought that was easier to explain:)
So much for how the server classes talk to the worker classes (main thread to
worker threads). The rest of the objects in the assemblies communicate using
references to each other or via delegates. Now comes the part that explains how
the assemblies "talk" to each other, i.e. how the MLRS sends its
machine's load to the MLMS, and how LBL gets the minimum machine load from MLMS.
Cross-machine assembly collaboration
I'll first "talk" about how the MLRS reports the machine load to MLMS. To save some
space in the article (some of your bandwidth, and some typing for me:), I'll
refer to the LoadReportingServer
class as LRS and to the LoadMonitoringServer
class as LMS. Do not confuse them with the server applications, having an "M"
prefix.
LMS starts two worker threads. One for collecting machine loads, and one for
reporting the minimum load to interested clients. The former is named
CollectorWorker
, and the latter -- ReporterWorker
.
I've mentioned somewhere above that the ReporterWorker
is not so
interesting, so I'll talk only about the CollectorWorker
. In the
paragraphs below, I'll call it simply a collector. When the collector thread is
started, it creates a UDP socket, binds it locally and adds it to a multicast
group. That's the collector's initialization code. It then enters a thread loop,
periodically polling the socket for arrived requests. When a request comes, the
collector reads the incoming data, parses it, validates it, and if it is a valid
machine load report, it enters the load in the machine loads
collection of the LMS class. That's pretty much everything you need to know
about how MLMS accepts machine loads from MLRS.
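To make the "parses it, validates it" step concrete, here's a small sketch in standard C++. The article doesn't show the wire format, so the "name;load" text datagram below is purely my assumption, not what MLRS actually sends:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical wire format: "machine-name;load". The collector's job is
// the same either way: parse, validate, and only then admit the report.
struct LoadReport {
    std::string name;
    double load;
};

bool TryParseReport(const std::string& datagram, LoadReport& out) {
    std::string::size_type sep = datagram.find(';');
    if (sep == std::string::npos || sep == 0 || sep + 1 >= datagram.size())
        return false;                       // missing name or load part
    std::string name = datagram.substr(0, sep);
    std::string loadPart = datagram.substr(sep + 1);
    char* end = nullptr;
    double load = std::strtod(loadPart.c_str(), &end);
    if (end == loadPart.c_str() || *end != '\0')
        return false;                       // load is not a clean number
    if (load < 0.0)
        return false;                       // a machine load can't be negative
    out.name = name;
    out.load = load;
    return true;
}
```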
LRS starts one thread for reporting the current machine load. The worker's name
is ReportingWorker, and I'll refer to it as the reporter. The
initialization code of this thread is to start monitoring the performance
counters, create a UDP socket and make it a member of the same multicast group
that MLMS's collector object has joined. In its thread loop, the reporter waits
a predefined amount of time, then gets the current machine load and sends it to
the multicast endpoint. A network device, called a "switch" then
dispatches the load to all machines that have joined the multicast group, i.e.
all MLMS collectors will receive the load, including the MLMS that runs on the
MLRS machine (if a MLMS has been installed and running there).
Here comes the most interesting part -- how LBL queries which is the machine
with the least load (the fastest machine). Well, it is quite simple and requires
only basic knowledge about .NET Remoting. If you don't understand Remoting, but
you do understand DCOM, assume that .NET Remoting compared to DCOM is what C++
is compared to C. You'll be quite close and at the same time quite far from what
Remoting really is, but you'll get the idea. (In fact, I've read several books
on DCOM, and some of them referred to it as "COM Remoting Infrastructure").
When MLMS starts, it registers a class named ServerLoadBalancer
with
the Remoting runtime as a singleton (an object that is instantiated just once,
and further requests for its creation end up getting a reference to the same,
previously instantiated object). When a request to get the fastest machine comes
(GetLeastMachineLoad
method gets called) the singleton asks the
MachineLoadsCollection
object to return its least load, and then
hands it to the client object that made the remoted call.
Below is a story you would like to hear about remoted objects that need to have
parameter-less constructors. If you'd like to skip the story,
click here, otherwise enjoy...
Now that all of you know that an object may be registered for remoting, probably
not many of you know that you do not have easy control over the object's
instantiation. Which means that you don't create an instance of the singleton
object and register it with the Remoting runtime, but rather the Remoting
runtime creates that object when it receives the first request for the object's
creation. Now, all server-activated objects must have a parameter-less
constructor, and the singleton is not an exception. But we want to pass our
ServerLoadBalancer
class a reference to the machine loads collection.
I see only two ways to do that -- the first one is to register the object with
the Remoting runtime, create an instance of it via Remoting and call an
"internal" method Initialize
, passing the machine loads
collection to it. At first that sounded like a good idea and I did it just like
that. Then I launched the client testing application first, and the server after
it. Can you guess what happened? The client managed to create the singleton
first, and it was not initialized -- boom!!! Not what we expected, right? So I
thought a bit about how to find a workaround. Luckily, it occurred to me how to hack
this problem. I decided to make a static member of the
LoadMonitoringServer
class, which would hold the machine loads
collection. At the beginning it would be a null
reference, then
when the server starts, I would set it to the server's machine loads collection.
Now when our "parameter-less constructed" singleton object is
instantiated for the first time by the Remoting runtime, it would get the
machine loads via the LoadMonitoringServer::StaticMachineLoads
member variable and the whole problem has disappeared. I had to only mark the
static member variable as (private public
) so it is visible only
within the assembly. I know my approach is a hack, and if you know a better
pattern that solves my problem, I'll be happy to learn it.
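Here's the hack in miniature as standard C++ (the names only loosely mirror the real classes): the server parks the shared state in a static member before any client can activate the singleton, and the parameter-less constructor picks it up from there:

```cpp
#include <string>
#include <vector>

// Stand-in for the MachineLoadsCollection class.
struct MachineLoads {
    std::vector<double> loads;
};

// Stand-in for LoadMonitoringServer with its static holder.
struct LoadMonitoringServer {
    static MachineLoads* StaticMachineLoads;    // set when the server starts
};
MachineLoads* LoadMonitoringServer::StaticMachineLoads = nullptr;

// The "parameter-less constructed" singleton: it cannot take the
// collection as a constructor argument, so it reads the static member.
class ServerLoadBalancer {
public:
    ServerLoadBalancer()
        : loads(LoadMonitoringServer::StaticMachineLoads) {}

    bool IsInitialized() const { return loads != nullptr; }

private:
    MachineLoads* loads;
};
```

The order is the whole point: as long as the server assigns the static member before registering the type, even a client that activates the singleton first gets an initialized object.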
Here's another interesting issue. How does the client (LBL) compile against the
remoted ServerLoadBalancer
class? Should it have a reference
(#using "...dll") to LML or what? Well, there is a solution to this
problem, and I haven't invented it, though I'd have liked to:) I mentioned before,
that the SharedLibrary
has some shared types, used by LBL, LMS and
LRS. No, it's not what you're thinking! I couldn't have put the
ServerLoadBalancer
class there even if I wanted to, because it
requires the MachineLoadsCollection
class, and the latter is located
in LML. What I consider an elegant solution, (and what I did) is defining an
interface in the SharedLibrary
, which I implemented in the
ServerLoadBalancer
class in LML. LBL tries to create the
ServerLoadBalancer
via Remoting, but it does not explicitly try to
create a ServerLoadBalancer
instance, but an instance, implementing
the ILoadBalancer
interface. That's how it works. LBL
creates/activates the singleton on the LMS side via Remoting and calls its
GetLeastMachineLoad
method to determine the fastest machine.
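The shared-interface trick, boiled down to standard C++ (stub values, and a plain factory standing in for the Remoting activation):

```cpp
#include <memory>
#include <string>

struct MachineLoad {
    std::string name;
    double load;
};

// Lives in the "SharedLibrary": the only type the client compiles against.
struct ILoadBalancer {
    virtual ~ILoadBalancer() = default;
    virtual MachineLoad GetLeastMachineLoad() = 0;
};

// Lives in "LML": the client never names this class.
class ServerLoadBalancer : public ILoadBalancer {
public:
    MachineLoad GetLeastMachineLoad() override {
        return MachineLoad{"FASTEST01", 12.5};  // stub value for the sketch
    }
};

// Stand-in for activating the remote singleton through the interface.
std::unique_ptr<ILoadBalancer> ActivateBalancer() {
    return std::make_unique<ServerLoadBalancer>();
}
```

The client holds only an ILoadBalancer pointer, which is exactly how LBL stays decoupled from LML.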
Some Implementation Details
Below is a list of helper classes that are cool, reusable, or worth mentioning.
I'll try to explain their cool sides, but you should definitely peek at the
source code to see them:)
Configurator
I like the .NET configuration classes very much, and I hate reinventing the
wheel, but this class is a specific configuration class for this solution and is
cooler than the .NET configuration classes in at least one respect. What makes the class
cooler is that it can notify certain objects when the configuration changes,
i.e. when the underlying configuration file has been modified with some text editor.
So I've built my own configurator class, which uses the FileSystemWatcher class
to sniff for writes in the configuration file, and when the file changes, the
configurator object re-loads the file, and raises an event to all subscribers
that need to know about the change. These subscribers are only two, and they are
the Load Monitoring and Reporting servers. When they receive the event, they
restart themselves, so they can reflect the latest changes immediately.
CounterInfo
I used to call this class a "struct" one. I wasn't fair to it :), as it is one
of the most important classes in the solution. It wraps a PerformanceCounter
object in it, retrieves some sample values, and stores them in a cyclic queue.
What is a cyclic queue? Well, I guess there's no such animal :) but as I have
"invented" it, let me explain what it is. It is a simple queue that allows only a
finite number of elements. When the queue "overflows", it pops the oldest
element and pushes the new element into the queue. Here's an example
of storing the numbers from 1 to 7 in a 5-element cyclic queue:
Pass Queue Running Total (Sum)
---- ----- -------------------
[] = 0
1 [1] = 0 + 1 = 1
2 [2 1] = 1 + 2 = 3
3 [3 2 1] = 3 + 3 = 6
4 [4 3 2 1] = 6 + 4 = 10
5 [5 4 3 2 1] = 10 + 5 = 15
6 [6 5 4 3 2] = 15 - 1 + 6 = 20
7 [7 6 5 4 3] = 20 - 2 + 7 = 25
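The table above translates into very little code. Here's a sketch of such a cyclic queue in standard C++ (not the actual CounterInfo code), keeping a running sum so the average never needs to re-scan the samples:

```cpp
#include <cstddef>
#include <deque>

// A bounded queue of counter samples with a running sum, as in the table:
// when it overflows, the oldest sample leaves the sum and the new one enters.
class CyclicQueue {
public:
    explicit CyclicQueue(std::size_t capacity) : capacity(capacity) {}

    void Push(double sample) {
        if (samples.size() == capacity) {       // "overflow": pop the oldest
            sum -= samples.front();
            samples.pop_front();
        }
        samples.push_back(sample);
        sum += sample;
    }

    // The article's rule: no machine reports until its queues have filled once.
    bool Full() const { return samples.size() == capacity; }
    double Sum() const { return sum; }
    double Average() const { return Full() ? sum / capacity : 0.0; }

private:
    std::size_t capacity;
    std::deque<double> samples;
    double sum = 0.0;
};
```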
Why do I need the cyclic queue? To have a limited state of each monitored
performance counter, of course. If pass 5 was the state of the counter 3 seconds
ago, then its average was 15/5 = 3, and if now we are at pass 7, the counter's
average is 25/5 = 5. Sounds realistic, doesn't it? So we use the cyclic queue
to store the transitory counter samples and know the average for the past N
samples, measured over the past M seconds. You can see how easily the running
sum is calculated. Now the only thing a counter should do to tell its average is
to divide the running sum by the number of sample values it has collected. You know
that the machine load is the sum of the weighted averages of all monitored performance
counters for the given machine. But you might ask, what happens in the following
situation:
We have two machines: A and B. Both are measuring just one counter, their CPU
utilization. A Machine Load Monitoring Server is running on a third machine C,
and a load balancing client is on a fourth machine, D. A and B's Load Reporting
Servers have just started. Their CounterInfo classes have recorded respectively
50 and 100 (because the administrator on machine B has just launched IE:). A and
B are configured to report each second, but they should report the weighted
averages of 5 sample values. 1 second elapses, but A and B have each collected
only 1 sample value. Now D asks C which is the least loaded machine. Which one
should be reported? A or B? The answer is simple: neither. No machine is allowed to
report its load unless it has collected the necessary number of sample values
for all performance counters. That means that until A and B have filled up
their cyclic queues for the very first time, they block and don't return their
weighted average to the caller (the LRS's reporter worker).
MachineLoadsCollection
This class is probably more tricky than interesting. Generally, it is used to
store the loads that one or more LRSes report to the LMS. That's the class's dumb
side. One cool side of the class is that it stores the loads in 3 different data
structures to simulate one, that is missing in the .NET BCL - a priority queue
that can store multiple elements with the same key, or in STL terms, something
like
std::priority_queue <std::vector <X *>, ... >
I know that C++ die-hards know it by heart, but for the rest of the audience:
std::priority_queue
is a template container adaptor class
that provides a restriction of functionality limiting access to the top element
of some underlying container type, which is always the largest or of the highest
priority. New elements can be added to the priority_queue
and the
top element of the priority_queue
can be inspected or removed. I took
the definition from MSDN, but I'd like to correct it a little bit: you should
read "which is always the largest or of the highest priority" as
"which is always what the less
functor returns as largest or of
the highest priority". At the beginning, I thought to use the
priority_queue
template class, and put there
"gcroot"
-ed references, but then I thought that it would
be more confusing and difficult than helping me, and you, the reader. Do you know
what the "gcroot"
template does? No? Nevermind then:) In
.NET BCL classes, we have something which is very similar to a priority queue -- that's the
SortedList
class in System::Collections
. Because it
can store any Object
-based instances, we could put ArrayList
references in it to simulate a priority queue that stores multiple elements with
the same key. There's also a Hashtable
to help us solve certain
problems, but we'll get to it in a minute. Meanwhile, keep reading to understand
why I need these data structures in the first place.
Machine loads do not enter the machine loads collection by name; instead, they are
added to the loads collection with the key being the machine's load. That's why
before each machine reports its load, it converts the latter to an unsigned long
and then transmits it over the wire to LMS. This helps restrict the number of
distinct stored loads, e.g. if machine A has a load of 20.1 and machine B has a
load of 20.2, then the collection considers the loads equal. When LMS "gets" the
load, it adds it in the SortedList
, i.e. if we have three
machines -- "A", "B", and "C" with loads 40, 20
and 30, then the SortedList
looks like:
[B:20][C:30][A:40]
If anyone asks for the fastest machine, we always return the 1st positional
element in the sorted list, (because it is sorted in ascending order).
Well, I'd like it to be so simple, but it isn't. What happens when a 4th
machine, "D", reports a load of 20? You should have guessed by now why I need to
store an ArrayList for each load, so here it is in action -- it stores the loads
of machines B and D:
[D:20]
[B:20][C:30][A:40]
Now, if anyone asks for the fastest machine, we will return the first element of
the ArrayList that is stored in the first element of the SortedList, right? It
is machine "B".
But then what happens when machine "B" reports another load, equal to 40? Shall
we leave the first reported load? Of course not! Otherwise, we will return "B"
as the fastest machine, whereas "D" would be the one with the least load. So we
should remove machine "B"'s older load from the first ArrayList
and insert its
new load, wherever is appropriate. Here's the data structure then:
[D:20][C:30][A:40]
Now how did you find machine "B"'s older load in order to remove it?
Eh? I guess with your eyes. Here's where we need that Hashtable
I mentioned
above. It is a mapping between a machine's older load and the list it resides in
currently. So when we add a machine load, we first check whether the machine has
reported a load before, and if it did, we find the ArrayList
, where
the old load was placed, remove it from the list, and add the new load to a new
list, right? Wrong. We have one more thing to do, but first let me show you the
problem, and you'll guess what else I've done to make the collection work as expected.
Imagine that machine "D" reports a new load -- 45. Now you'll say that
the data now looks like the one below:
[C:30][A:40][D:45]
You wish it looked like this! That's because I made a mistake when I was
trying to visualize the first loads. Actually, the previous loads collection
looked like this:
^
|
M
A
C . . . .
H . . . .
I . . . .
N . . . .
E . . B .
S D C A .
LOAD 20 30 40 . . . -->
So you now will agree, that the collection actually looks like this:
^
|
M
A
C . . . .
H . . . .
I . . . .
N . . . .
E . . B .
S . C A D
LOAD 20 30 40 45 . . -->
Yes, the first list is empty, and when a request to find the least loaded
machine comes and you try to pop up the first element of the
ArrayList
for load 20 (which is the least load), you'll get
IndexOutOfRangeException
, as I got it a couple of times before I
debugged to understand what was happening. So when we remove an old load from
an ArrayList
, we should check whether it has been orphaned (is now empty),
and if this is the case, we should remove the ArrayList
from the
SortedList
as well.
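Before the MC++ code, here's the same bookkeeping sketched in standard C++: a sorted map of load -> machines plays the role of the SortedList of ArrayLists, and a second map plays the Hashtable that finds a machine's older load (simplified: no locking, no Grim Reaper):

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

class MachineLoads {
public:
    void Add(const std::string& name, unsigned long load) {
        auto old = current.find(name);
        if (old != current.end()) {             // remove the stale load first
            std::vector<std::string>& list = byLoad[old->second];
            for (auto it = list.begin(); it != list.end(); ++it)
                if (*it == name) { list.erase(it); break; }
            if (list.empty())                   // don't leave an orphaned list!
                byLoad.erase(old->second);
        }
        byLoad[load].push_back(name);
        current[name] = load;
    }

    // Least-loaded machine: front of the first (smallest-key) list.
    std::string Fastest() const {
        if (byLoad.empty()) return "";
        return byLoad.begin()->second.front();
    }

private:
    std::map<unsigned long, std::vector<std::string>> byLoad;
    std::unordered_map<std::string, unsigned long> current;
};
```

Running the article's own A/B/C/D scenario through this sketch reproduces every step, including the orphaned-list case when D leaves load 20.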
Here's the code for the Add
method:
void MachineLoadsCollection::Add (MachineLoad __gc* machineLoad)
{
DEBUG_ASSERT (0 != machineLoad);
if (0 == machineLoad)
return;
String __gc* name = machineLoad->Name;
double load = machineLoad->Load;
Object __gc* boxedLoad = __box (load);
rwLock->AcquireWriterLock (Timeout::Infinite);
ArrayList __gc* loadList = 0;
if (!loads->ContainsKey (boxedLoad))
{
loadList = new ArrayList ();
loads->Add (boxedLoad, loadList);
}
else
{
loadList = static_cast<ArrayList __gc*> (loads->get_Item (boxedLoad));
}
if (!mappings->ContainsKey (name))
{
loadList->Add (machineLoad);
mappings->Add (name, new LoadMapping (machineLoad, loadList));
}
else
{
LoadMapping __gc* mappedLoad =
static_cast<LoadMapping __gc*> (mappings->get_Item (name));
MachineLoad __gc* oldLoad = mappedLoad->Load;
ArrayList __gc* oldList = mappedLoad->LoadList;
mappings->Remove (name);
int index = oldList->IndexOf (oldLoad);
oldList->RemoveAt (index);
loadList->Add (machineLoad);
mappings->Add (name, new LoadMapping (machineLoad, loadList));
if (oldList->Count == 0)
loads->Remove (__box (oldLoad->Load));
}
rwLock->ReleaseWriterLock ();
}
Now, for the curious, here's the get_MinimumLoad
property's code:
MachineLoad __gc* MachineLoadsCollection::get_MinimumLoad ()
{
MachineLoad __gc* load = 0;
rwLock->AcquireReaderLock (Timeout::Infinite);
if (loads->Count > 0)
{
ArrayList __gc* minLoadedMachines =
static_cast<ArrayList __gc*> (loads->GetByIndex (0));
load = static_cast<MachineLoad __gc*> (minLoadedMachines->get_Item (0));
}
rwLock->ReleaseReaderLock ();
return (load);
}
Well, that's pretty much it about how the MachineLoadsCollection class works in
order to store the machine loads and return the least loaded machine. Now we
will see what else is cool about this class. I called it the Grim Reaper, and
that's what it is -- a method, named GrimReaper
(GR), that runs
asynchronously (using a Timer
class) and kills dead machines!:) Seriously,
GR knows the interval at which each machine, once it has reported a load, should
report it again. If a machine fails to report its load in a timely manner, it is
removed from the MachineLoadsCollection container. In this way, we guarantee
that a machine that is now dead (or disconnected from the network) will not
be returned as the fastest machine, at least not before it reports again (it is
brought back into the load balancing then). However, in only about 30 lines of
code, I managed to make two mistakes in the GR code. The first one was very
lame -- I was trying to remove an element from a hash table while I was iterating
over its elements, but the second was a real bitch! However, I found it quite
quickly, because I love console applications:) I was outputting a star (*) when
GR was executing, and a caret (^) when it was killing a machine. I then observed
that even if the (only) machine was regularly reporting its load, at some point,
GR was killing it! I was staring at the console for at least 3 minutes. The GR
code was simple, and I thought that there's no chance to make a mistake there. I
was wrong. It occurred to me that I wasn't considering the fact that the GR code
takes some time to execute. It was running fast enough, but it was taking some
interval of time. Well, during that time, GR was locking the machine loads
collection. And while the collection was locked, the collector worker was blocked,
waiting for the collection to be unlocked so it could enter the newly received
load there. So when the collection was finally unlocked at the end of the GR
code, the collector entered the machine's load. You can guess what happens when
the GR is configured to run in shorter intervals and the machines report in
longer intervals. GR locks, and locks, and locks, while the collector blocks, and
blocks, and blocks, until a machine's report is delayed by the GR itself. However,
because GR is oblivious to the outer world, it thinks that the machine is dead, so
it removes the machine from the load balancing until the next time it reports a
brand new load. My solution for this issue? I have it in my head, but I'll
implement it in the next version of the article, because I really ran out of
time. (I couldn't post the article for November's contest, because I
couldn't finish this text in time. It seems that writing text in plain English is
more difficult than writing Managed C++ code, and I don't want to miss December's
contest too:)
If anyone is interested, here is Grim Reaper's code:
void MachineLoadsCollection::GrimReaper (Object __gc* state)
{
MachineLoadsCollection __gc* mlc = static_cast<MachineLoadsCollection __gc*> (state);
mlc->grimReaper->Change (Timeout::Infinite, Timeout::Infinite);
if (!mlc->keepGrimReaperAlive)
return;
ReaderWriterLock __gc* rwLock = mlc->rwLock;
SortedList __gc* loads = mlc->loads;
Hashtable __gc* mappings = mlc->mappings;
int reportTimeout = mlc->reportTimeout;
rwLock->AcquireWriterLock (Timeout::Infinite);
StringCollection __gc* deadMachines = new StringCollection ();
DateTime dtNow = DateTime::Now;
IDictionaryEnumerator __gc* dic = mappings->GetEnumerator ();
while (dic->MoveNext ())
{
LoadMapping __gc* map = static_cast<LoadMapping __gc*> (dic->Value);
TimeSpan tsDifference = dtNow.Subtract (map->LastReport);
double difference = tsDifference.TotalMilliseconds;
if (difference > (double) reportTimeout)
{
String __gc* name = map->Load->Name;
MachineLoad __gc* oldLoad = map->Load;
ArrayList __gc* oldList = map->LoadList;
deadMachines->Add (name);
int index = oldList->IndexOf (oldLoad);
oldList->RemoveAt (index);
if (oldList->Count == 0)
loads->Remove (__box (oldLoad->Load));
}
}
for (int i=0; i<deadMachines->Count; i++)
mappings->Remove (deadMachines->get_Item (i));
deadMachines->Clear ();
rwLock->ReleaseWriterLock ();
mlc->grimReaper->Change (reportTimeout, reportTimeout);
}
Load Balancing in Action - Balancing a Web Farm
I've built a super simple .NET Web application (in C#) that uses LBL to perform
load balancing in a web farm. Though the application is very small, it is
interesting and deserves some space in this article, so here we go. First, I've
written a class that wraps the load balancing class ClientLoadBalancer
from LBL, named it Helper
, and implemented it as a singleton so the
Global
class of the web application and the web page classes could
see one instance of it. Then I used it in the Session_OnStart
method
of the Global
class to redirect every new session's first HTTP
request to the most available machine. Furthermore, in the sample web page, I've
used it again to dynamically build URLs for further processing, replacing the
local host again with the fastest machine. Now one may argue (and he might be
right) that a user can spend a lot of time reading that page, so when he
eventually clicks on the "faster" link, the previously faster machine
might not be the fastest one at that time. Just don't forget that hitting another
machine's web application will cause its Session_OnStart to fire
again, so either way the user will be redirected to the fastest machine. Now, if
you don't get what I'm talking about, that's because I haven't shown any code
yet. So here it is:
protected void Session_Start (object sender, EventArgs e)
{
string fastestMachineName = Helper.Instance.GetFastestMachineName ();
string thisMachineName = Environment.MachineName;
if (String.Compare (thisMachineName, fastestMachineName, false) != 0)
{
string fasterUrl = Helper.Instance.ReplaceHostInUrl (
Request.Url.ToString (),
fastestMachineName);
Response.Redirect (fasterUrl);
}
}
And here's the code in the sample web page:
private void OnPageLoad (object sender, EventArgs e)
{
string fastestMachineName = Helper.Instance.GetFastestMachineName ();
link.Text = String.Format (
"Next request will be processed by machine '{0}'",
fastestMachineName);
link.NavigateUrl = Helper.Instance.ReplaceHostInUrl (
Request.Url.ToString (),
fastestMachineName);
}
If you think that I hardcoded the settings in the Helper
class,
you are wrong. First, I hate hardcoded or magic values in my code (though you
may see some in an article like this). Second, I was testing the solution on
my colleagues' computers, so writing several lines of code in advance helped
me avoid the otherwise inevitable recompilations. I just deployed the
web application there. Here's the trivial C# code of the Helper
class (note that I have hardcoded the key names in the Web.config file ;-)
class Helper
{
private Helper ()
{
loadBalancer = null;
try
{
NameValueCollection settings = ConfigurationSettings.AppSettings;
string machine = Environment.MachineName;
int port = 14000;
RemotingProtocol protocol = RemotingProtocol.TCP;
string machineName = settings ["LoadBalancingMachine"];
if (machineName != null)
machine = machineName;
string machinePort = settings ["LoadBalancingPort"];
if (machinePort != null)
{
try
{
port = int.Parse (machinePort);
}
catch (FormatException)
{
}
}
string machineProto = settings ["LoadBalancingProtocol"];
if (machineProto != null)
{
try
{
protocol = (RemotingProtocol) Enum.Parse (
typeof (RemotingProtocol),
machineProto,
true);
}
catch (ArgumentException)
{
}
}
loadBalancer = new ClientLoadBalancer (
machine,
protocol,
port);
}
catch (Exception e)
{
if (e is OutOfMemoryException || e is ExecutionEngineException)
throw;
}
}
public string GetFastestMachineName ()
{
string fastestMachineName = Environment.MachineName;
if (loadBalancer != null)
{
MachineLoad load = loadBalancer.GetLeastMachineLoad ();
if (load != null)
fastestMachineName = load.Name;
}
return (fastestMachineName);
}
public string ReplaceHostInUrl (string url, string newHost)
{
Uri uri = new Uri (url);
bool hasUserInfo = uri.UserInfo.Length > 0;
string credentials = hasUserInfo ? uri.UserInfo : "";
string newUrl = String.Format (
"{0}{1}{2}{3}:{4}{5}",
uri.Scheme,
Uri.SchemeDelimiter,
credentials,
newHost,
uri.Port,
uri.PathAndQuery);
return (newUrl);
}
public static Helper Instance
{
get { return (instance); }
}
private ClientLoadBalancer loadBalancer;
private static Helper instance = new Helper ();
}
If you wonder what the servers look like when running, and what a great
look and feel I've designed for the web application, here's a screenshot to
disappoint you:)
Building, Configuring and Deploying the Solution
There's a little trick you need to do, in order to load the solution file.
Open your IIS administration console (Start/Run... type inetmgr
) and
create a new virtual directory LoadBalancingWebTest
. When you're asked
about the folder, choose X:\Path\To\SolutionFolder\LoadBalancingWebTest
.
You can now open the solution file (SoftwareLoadBalancing.sln
) with no
problems. Load it in Visual Studio .NET, build the SharedLibrary
project
first, as the others depend on it, then build LML and LRS, and then the whole solution.
Note that the setup projects won't build automatically so you should select and build
them manually.
Note:
When you compile the solution, you will get 15 warnings. All of them state:
warning C4935: assembly access specifier modified from 'xxx', where
xxx could be private or public. I don't know
how to make the compiler stop complaining about this. There are no other warnings
at level 4. Sorry if these embarrass you.
That's it if you have VS.NET. If you don't, you can compile only the web application,
as it is written in C# and can be compiled with the free C# compiler that comes
with the .NET Framework. Otherwise, buy a copy of VS.NET, and become a CodeProject (and
Microsoft) supporter :) BTW, I just realized that I should write my next
articles in C#, so the "poor" guys like me can have some fun too. I'm sorry
guys! I promise to use C# in most of the next articles I attempt to write.
Configuration
If you look at the Common.h
header, located in the SharedFiles
folder in the solution, you'll notice that I've copied and pasted the meaning of
all configuration keys from that file. However, because I know you won't look at
it until you've liked the article (and it's high time to do so, as it is coming to
its end:), here's the explanation of the XML configuration file, and various
macros in Common.h
.
What's so common in Common.h?
This header file is used by almost all projects in the solution. It has several
(helpful) macros I'm about to discuss, so if you're in the mood to read about them,
go ahead. Otherwise, click here to read only about the
XML configuration file.
First, I'm going to discuss the .NET member access modifiers. There are 5 of
them, though you may use only four of them, unless you are writing IL code.
Existing languages refer to them in a different way, so I'll give you a
comparison table of their names and some explanations.
.NET term | C# keyword | MC++ keywords    | Explanation
----------|------------|------------------|------------
private   | private    | private private  | the member is visible only in the
class it is defined in, and is not visible from other assemblies; note the
double use of keywords in MC++ -- the first one specifies whether the member is
visible from other assemblies, and the second whether it is visible from other
classes within the same assembly
public    | public     | public public    | visible from all assemblies and classes
family    | protected  | public protected | visible from other assemblies, but
can be used only from derived classes
assembly  | internal   | private public   | visible from all classes within the
assembly, but not visible to external assemblies
Because I like the C# keywords most, I #defined and used throughout
the code four macros to avoid typing the double keywords in MC++:
#define PUBLIC public public
#define PRIVATE private private
#define PROTECTED public protected
#define INTERNAL private public
Here comes the more interesting "stuff". You have three options for
communication between a load reporting and monitoring servers: via UDP +
multicasting, UDP-only, or TCP. BTW, if I was writing the article in C#, you
wouldn't have them. Really! C# is so lame in preprocessing, and the compiler
writers were so wrong that they did not include some real preprocessing
capabilities in the compiler, that I have no words! Nevertheless, I wrote the
article in MC++, so I have the cool #define
directives I needed so
badly, when I started to write the communication code of the classes. There are
two macros you can play with, to make the solution use one communication protocol
or another, and/or disable/enable multicasting. Here are their definitions:
#define USING_UDP 1
#define USING_MULTICASTS 1
Now, a C# guru:) will argue that I could still write the protocol-independent
code with several pairs of #ifdef
and #endif
directives.
To tell you the truth, I'm not a fan of this coding style. I'd rather define a
generic macro in such an #if
block, and use it everywhere I need it.
So that's what I did. I've written macros that create TCP or UDP sockets, connect
to remote endpoints, and send and receive data via UDP and TCP. Then I wrote
several generic macros that follow the pattern below:
#if defined(USING_UDP)
# define SOCKET_CREATE(sock) SOCKET_CREATE_UDP(sock)
#else
# define SOCKET_CREATE(sock) SOCKET_CREATE_TCP(sock)
#endif
You get the idea, right? No #ifdef
s inside the real code.
I just write SOCKET_CREATE (socket);
and the preprocessor
generates the code to create the appropriate socket. Here's another good macro,
I use for exception handling, but before that I'll give you some rules (you
probably know) about .NET exception handling:
-
Catch only the exceptions you can handle, and no more. This means that if you
expect the method you're calling to throw
ArgumentNullException
and/or
ArgumentOutOfRangeException
, you should write two catch clauses and
catch only these exceptions.
- Another rule is to never "swallow" an exception you caught, but
cannot handle. You must re-throw it, so the caller of your method knows why it
failed.
- This one relates to the 2nd rule: there are 2 exceptions you can do nothing
about but report them to the user and die: these are the
OutOfMemoryException
,
and ExecutionEngineException
. I don't know which one is worse --
probably the latter, though if you're out of memory, there's almost nothing you
can do about it.
Because I'm not writing production code here, I allowed myself to catch
(in most of the source code) all possible exceptions, when I don't need to handle
them except to know that something went bad. So I catch the base class
Exception
. This violates all the rules I've written above, but I
wrote some code to fit the second and third ones -- if I catch an
OutOfMemoryException
or ExecutionEngineException
, I
re-throw it immediately. Here's the macro I call, after I catch the generic
Exception
class:
#define TRACE_EXCEPTION_AND_RETHROW_IF_NEEDED(e) \
System::Type __gc* exType = e->GetType (); \
if (exType == __typeof (OutOfMemoryException) || \
exType == __typeof (ExecutionEngineException)) \
throw; \
Console::WriteLine ( \
S"\n{0}\n{1} ({2}/{3}): {4}\n{0}", \
new String (L'-', 79), \
new String ((char *) __FUNCTION__), \
new String ((char *) __FILE__), \
__box (__LINE__), \
e->Message);
And finally, a word about assertions. C has the assert macro, VB had the
Debug.Assert method, and .NET has a static Assert method in the Debug class
too. One of the overloads of the method takes a boolean expression and a
string describing the test. C's assert is smarter: it just needs an
expression, and it builds the string containing the expression automatically
by stringizing it. Now, I really hate the fact that C# lacks some real
preprocessing features. However, MC++ (thank God!) was not slaughtered by the
compiler writers (long live legacy code support), so here's my .NET version of
C's assert macro:
#define DEBUG_ASSERT(x) Debug::Assert (x, S#x)
If I were writing the code for this article in C#, I would have had to type
Debug.Assert (null != objRef, "null != objRef");
everywhere I needed to assert. In MC++, I just write
DEBUG_ASSERT (0 != objRef);
and it is automatically expanded into
Debug::Assert (0 != objRef, S"0 != objRef");
Not to mention the __LINE__, __FILE__ and __FUNCTION__ macros I could use in
the DEBUG_ASSERT macro! Now let's all scream loudly together: "C# sucks!":)
Tweaking the configuration file
I know you're all smart guys (otherwise what the heck are you doing on
CodeProject?:), and smart guys don't need lengthy explanations -- all they need
is to take a look at an example. So here it is: the XML configuration file used
by both the Machine Load Monitoring and Reporting Servers. The explanation of
all the elements is given below the file:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <LoadReportingServer>
    <IpAddress>127.0.0.1</IpAddress>
    <Port>12000</Port>
    <ReportingInterval>2000</ReportingInterval>
  </LoadReportingServer>
  <LoadMonitoringServer>
    <IpAddress>127.0.0.1</IpAddress>
    <CollectorPort>12000</CollectorPort>
    <CollectorBacklog>40</CollectorBacklog>
    <ReporterPort>13000</ReporterPort>
    <ReporterBacklog>40</ReporterBacklog>
    <MachineReportTimeout>4000</MachineReportTimeout>
    <RemotingProtocol>tcp</RemotingProtocol>
    <RemotingChannelPort>14000</RemotingChannelPort>
    <PerformanceCounters>
      <counter alias="cpu"
               category="Processor"
               name="% Processor Time"
               instance="_Total"
               load-weight="0.3"
               interval="500"
               maximum-measures="5" />
      <!-- ... -->
    </PerformanceCounters>
  </LoadMonitoringServer>
</configuration>
Even though you're smart, I know that some of you have questions, which I am
about to answer. First, I'm going to explain the purpose of all the elements
and their attributes, and I'll cover some weird settings, so read on...
(To save some space, I'll refer to the element LoadReportingServer as LRS, and
I'll write LMS instead of LoadMonitoringServer.)
- LRS/IpAddress: When you're using UDP + multicasting (the default), the
IpAddress is the IP address of the multicast group that MLMS and MLRS join in
order to communicate. If you're not using multicasting, but are still using
UDP or TCP, this element specifies the IP address (or the host name) of the
MLMS server the MLRS servers report to. Note that because you don't use
multicasting, there's no way for the MLRS servers to "multicast" their machine
loads to all MLMS servers. In any case, this element's text should be equal to
LMS/IpAddress.
- LRS/Port: Whether you're using UDP + multicasting, UDP only or TCP, that's
the port to which MLRS servers send machine loads, and on which MLMS servers
receive them.
- LRS/ReportingInterval: MLRS servers report machine loads to MLMS ones. The
ReportingInterval specifies the interval (in milliseconds) at which an MLRS
server should report its load to one or more MLMS servers. If you paid
attention to the Some Implementation Details section, I said that even if the
interval has elapsed, a machine may not report its load, because it has not
yet gathered the raw data it needs to calculate its load. See the counter
element's interval attribute for more information.
- LMS/IpAddress: In the UDP + multicasting scenario, that's the multicast
group's IP address, as in the LRS/IpAddress element. When you're using UDP or
TCP only, this address is ignored.
- LMS/CollectorPort: The port on which MLMS servers accept TCP connections, or
receive data when using UDP.
- LMS/CollectorBacklog: This element specifies the maximum number of sockets
an MLMS server will use when configured for TCP communication.
- LMS/ReporterPort: If you haven't been reading the article carefully, you're
probably wondering what this element specifies. Well, in my first design, I
didn't think Remoting would serve me so well in building the Load Balancing
Library (LBL). I wrote a mini TCP server, which accepted TCP requests and
returned the least loaded machine. Because LBL had to connect to an MLMS
server and ask which is the fastest machine, you can imagine that I wrote
several overloads of the GetLeastLoadedMachine method, accepting timeouts and
default machines in case there are no available machines at all. The moment I
finished the LBL client, I decided that the design was too lame, so I rewrote
the LBL library from scratch (yeah, shit happens:), using Remoting. Now, I
regret to tell you that I've overwritten the original library's source files.
However, I left the TCP server completely working -- it lives as the
ReporterWorker class, in the ReporterWorker.h/.cpp files in the
LoadMonitoringLibrary project. If you want to write an alternative LBL
library, be my guest -- just write some code to connect to the LMS reporter
worker and it will report the fastest machine's load immediately. Note that
the worker accepts TCP sockets, so you should always connect to it using TCP.
- LMS/ReporterBacklog: It's not difficult to figure out that this is the
backlog of the TCP server I was talking about above.
- LMS/MachineReportTimeout: Now that's an interesting setting. The
MachineReportTimeout is the longest interval (in milliseconds) at which a
machine should report its successive load in order to stay in the load
balancing. This means that if a machine reported 5 seconds ago, and the
timeout interval is set to 3 seconds, the machine is removed from the load
balancing. If it later reports, it is back in business. I think this is a bit
lame, because one would like to configure each machine to report at a
different interval, but I don't have time (now) to fix this, so you should
learn to live with this "feature". One way to work around my "lameness" is to
give this setting a large enough value. Be warned, though, that if a machine
is down, you won't be able to remove it from the load balancing until this
interval elapses -- so don't give it too big a value.
- LMS/RemotingProtocol: Originally, I intended to use Remoting only over TCP.
I thought that HTTP would be too slow (it is one level above TCP in the OSI
stack). Then, after I recalled how complex Remoting was, I realized that the
HTTP protocol is blazingly fast compared to the Remoting machinery itself. So
I decided to give you an option as to which protocol to use. Currently, the
solution supports only the TCP and HTTP protocols, but you can easily extend
it to use any protocol you wish. This setting accepts a string, which is
either "tcp" or "http" (without the quotes, of course).
- LMS/RemotingChannelPort: That's the port MLMS uses to register and activate
the load balancing object with the Remoting runtime.
- LMS/PerformanceCounters: This element contains a collection of performance
counters, used to calculate the machine's load. Given below are the attributes
of the counter XML element, used to describe a CounterInfo object, which I
wrote about somewhere above.
- counter/alias: Though currently not used, this attribute specifies an alias
for the otherwise too long performance counter path. See the TODO(s) section
for the reason I've put this attribute in.
- counter/category: The general category of the counter, e.g. Processor,
Memory, etc.
- counter/name: The specific counter in the category, e.g. % Processor Time,
Page reads/sec, etc.
- counter/instance: If there are two or more instances of the counter, the
instance attribute specifies the exact instance. For example, if you have two
CPUs, then the first CPU's instance is "0", the second one is "1", and the
aggregate over both is "_Total".
- counter/load-weight: The weight that balances the counter's values. E.g. you
can give more weight to the values of Processor\% Processor Time\_Total than
to Processor\% User Time\_Total ones. You get the idea.
- counter/interval: The interval (in milliseconds) at which a performance
counter is asked to return its next sample value.
- counter/maximum-measures: The size of the cyclic queue (I talked about it
above) that stores the transient state of a performance counter. In other
words, the element specifies how many counter values should be collected in
order to get a decent weighted average (WA). The counter does not report its
WA until it has collected at least maximum-measures sample values. If the
CounterInfo class is asked to return its WA before it has collected the
necessary number of sample values, it blocks and waits until it has.
... and the other configuration file:)
What is "the other configuration file"? Well, it is the Web.config file in the
sample load-balanced web application. It has 3 vital keys defined in the
appSettings section: the machine on which MLMS runs, and the Remoting port and
protocol under which that machine has registered its remoted object.
<appSettings>
<add key="LoadBalancingMachine" value="..." />
<add key="LoadBalancingPort" value="..." />
<add key="LoadBalancingProtocol" value="..." />
</appSettings>
You can figure out what the keys mean, as you have seen the code in the Helper
class of the web application. The last key accepts a string, which can be
either "TCP" or "HTTP" and nothing else.
Deployment
There are 7 ways to deploy the solution onto a single machine. That's right --
seven. To shorten the article and lengthen my life, I'll refer to the Machine
Load Monitoring Server as LMS, to the Machine Load Reporting Server as LRS,
and to the Load Balancing Library as LBL. Here are the variations:
- LMS, LRS, LBL
- LMS, LRS
- LMS, LBL
- LMS
- LRS, LBL
- LRS
- LBL
It is you who decide what to install and where. But it is I who developed the
setup projects, so you have to pay some attention to what I'm about to tell
you. There are 4 setups. The first one is for the sample load-balanced web
application. The second one is for the server part of the solution, i.e. the
Machine Load Monitoring and Reporting Servers. They're bundled in one single
setup, but it's your call which one you run once you've installed them. The
3rd setup contains only the load balancing library, and the 4th one contains
the entire source code of the solution, including the source for the setups.
Here is a simple scenario to test whether the code works (you should have set
up a multicast group on your LAN, or ask an admin to do that). We'll use 2
machines -- A and B. On machine A, build the SharedLibrary project first, then
build the whole solution (you may skip the setup projects). Modify the XML
configuration file for MLMS and MLRS, and run the servers. Then deploy the web
application, modify its Web.config file and launch it.
Click the link on the web page. It should work, and the load balancing should
redirect you to the same machine (A). Now deploy only MLRS and the web
application to machine B. Modify the configuration files, but this time, in
Web.config, set the LoadBalancingMachine key to "A". You've just told B's LBL
to use machine A's remoted load balancing object. Run MLRS on machine B. It
should start reporting B's load to A's MLMS. Now do some CPU-intensive
operation on machine A (on pre-WinXP systems, right-click the Desktop and drag
your mouse behind the Task Bar; this should give you about 100% CPU
utilization). A's web application should now redirect you to the web app on
machine B. Now stop B's MLRS server and launch B's web application. It should
redirect you to A's. I guess that's it. Enjoy playing around with all the
possible deployment scenarios:)
Some thoughts about MC++ and C#
Managed C++ to C# translation
There's nothing easier than converting pure managed C++ code to C#. Just press
Ctrl-H in your mind and replace the following sequences (this will work only
for my source files, as other developers may not use whitespace the same way I
do):
MC++                 C#
----                 ----
::                   .
->                   .
__gc*                (nothing -- just remove it)
__gc                 (nothing -- just remove it)
__sealed             sealed
__value              struct
using namespace      using
: public             :
S"                   "
__box (x)            x
While the replacements above will translate 85% of the code, there are several
things you should do manually:
- You have to translate all preprocessor directives, e.g. remove the header
guards (#if !defined (...) ... #define ... #endif), and manually replace the
macros with the code they are supposed to generate.
- You have to convert all C++ casts to C# ones, i.e.
static_cast<SomeType __gc*> (expression) to ((SomeType) expression) or
(expression as SomeType).
- You have to put the appropriate access modifier keyword on every member of a
class, i.e. you should change:
PUBLIC:
... Method1 (...) {...}
... Variable1;
PRIVATE:
... Method3 (...) {...}
to
public ... Method1 (...) {...}
public ... Variable1;
private ... Method3 (...) {...}
- You have to combine the header and the implementation files into a single
C# source file.
C#'s readonly fields vs MC++ non-static const members
It is really frustrating that MC++ does not have an equivalent of C#'s
readonly fields (not properties). In C# one could write the following
class:
public class PerfCounter
{
public PerfCounter (String fullPath, int sampleInterval)
{
Debug.Assert (null != fullPath);
if (null == fullPath)
throw (new ArgumentNullException ("fullPath"));
Debug.Assert (sampleInterval > 0);
if (sampleInterval <= 0)
throw (new ArgumentOutOfRangeException ("sampleInterval"));
FullPath = fullPath;
SampleInterval = sampleInterval;
}
public readonly String FullPath;
public readonly int SampleInterval;
}
You see that the C# programmer doesn't have to implement read-only properties,
because the readonly fields are good enough. In Managed C++, you can simulate
readonly fields by writing the following class:
public __gc class PerfCounter
{
public:
PerfCounter (String __gc* fullPath, int sampleInterval) :
FullPath (fullPath),
SampleInterval (sampleInterval)
{
Debug::Assert (0 != fullPath);
if (0 == fullPath)
throw (new ArgumentNullException (S"fullPath"));
Debug::Assert (sampleInterval > 0);
if (sampleInterval <= 0)
throw (new ArgumentOutOfRangeException (S"sampleInterval"));
}
public:
const String __gc* FullPath;
const int SampleInterval;
};
So far, so good. You're probably wondering why I am complaining about MC++.
It looks like the MC++ version is even cooler than the C# one. Well, the example
class was too simple. Now, imagine that when you find an invalid parameter, you
should change it to a default value, like in the C# class below:
public class PerfCounter
{
public PerfCounter (String fullPath, int sampleInterval)
{
Debug.Assert (null != fullPath);
if (null == fullPath)
throw (new ArgumentNullException ("fullPath"));
Debug.Assert (sampleInterval > 0);
if (sampleInterval <= 0)
sampleInterval = DefaultSampleInterval;
FullPath = fullPath;
SampleInterval = sampleInterval;
}
public readonly String FullPath;
public readonly int SampleInterval;
private const int DefaultSampleInterval = 1000;
}
Now, the corresponding MC++ code will not compile, and you'll see why below:
public __gc class CrashingPerfCounter
{
public:
CrashingPerfCounter (String __gc* fullPath, int sampleInterval) :
FullPath (fullPath),
SampleInterval (sampleInterval)
{
Debug::Assert (0 != fullPath);
if (0 == fullPath)
throw (new ArgumentNullException (S"fullPath"));
Debug::Assert (sampleInterval > 0);
if (sampleInterval <= 0)
SampleInterval = DefaultSampleInterval;
}
public:
const String __gc* FullPath;
const int SampleInterval;
private:
static const int DefaultSampleInterval = 1000;
};
Now, one may argue that we could initialize the const member SampleInterval
in the constructor's initialization list like this:
SampleInterval (sampleInterval > 0 ? sampleInterval : DefaultSampleInterval)
and they would be right. However, if we need to connect to a database first in
order to do the check, or we need to perform several checks on the parameter,
I can't figure out how to do this in the initialization list. Do you? That's
why MC++ sucks compared to C# for readonly fields. Now the programmer is
forced to make the const fields non-const and private, and to write code
implementing read-only properties, like this:
public __gc class LamePerfCounter
{
public:
LamePerfCounter (String __gc* fullPath, int sampleInterval)
{
Debug::Assert (0 != fullPath);
if (0 == fullPath)
throw (new ArgumentNullException (S"fullPath"));
Debug::Assert (sampleInterval > 0);
if (sampleInterval <= 0)
sampleInterval = DefaultSampleInterval;
this->fullPath = fullPath;
this->sampleInterval = sampleInterval;
}
__property String __gc* get_FullPath ()
{
return (fullPath);
}
__property int get_SampleInterval ()
{
return (sampleInterval);
}
private:
String __gc* fullPath;
int sampleInterval;
static const int DefaultSampleInterval = 1000;
};
"Bugs suck. Period."
John Robbins
"I trust that I and my colleagues will use my code correctly. To avoid bugs,
however, I verify everything. I verify the data that others pass into my code, I
verify my code's internal manipulations, I verify every assumption I make in my
code, I verify data my code passes to others, and I verify data coming back from
calls my code makes. If there's something to verify, I verify it. This obsessive
verification is nothing personal against my coworkers, and I don't have any
psychological problems (to speak of). It's just that I know where the bugs come
from; I also know that you can't let anything by without checking it if you want
to catch your bugs as early as you can."
John Robbins
I do what John preaches in his book, and you should do it too. Trust me, but
verify my code. I think I have debugged my code thoroughly, and I haven't met
bugs in it since the day before yesterday (when I started to write the article).
However, if you see one of those nasty creatures, let me know. My e-mail is
stoyan_damov[at]hotmail.com.
Though I think I don't have bugs (statistically, I should have 8 bugs in the
4 KLOC), I'd love to share an amazing Microsoft bug with you. It cost me quite
some time to find, and unfortunately, I was not able to reproduce it after it
disappeared (yes! it disappeared) later. I've wrapped all of my classes in the
namespace SoftwareLoadBalancing. So far so good. Now, I have several shared
classes in the SharedLibrary assembly. The Load Monitoring Library uses one of
these classes to do its job, so it is #using the SharedLibrary.
I was able to build LML several times, and then suddenly the linker complained
that it could not find the shared class I was using in the namespace
SoftwareLoadBalancing. I'll name that class X to save myself some typing. I
closed the solution, went to the Debug folder of the shared library, deleted
everything, deleted all files in the common Bin folder and tried again. Same
result! I let the linker grumble for three more tries and then launched the
ILDasm tool. When I looked at SharedLibrary.dll, I found that the class X was
"wrapped" twice in the namespace SoftwareLoadBalancing, i.e. it was now
SoftwareLoadBalancing::SoftwareLoadBalancing::X. Because I wanted to do some
tests and had no time to deal with the bug, I tried to alias the namespace in
LML like this:
using namespace SLB = SoftwareLoadBalancing;
Then, I tried to access the X class, using the following construct:
SLB::SLB::X __gc* x = new SLB::SLB::X ();
Maybe I don't understand C++ namespace aliasing very well, or maybe the
documentation doesn't explain it, but what happened this time was that the
linker complained again that it could not find the
SoftwareLoadBalancing::SLB::X class!!! The compiler "replaced" SLB with
SoftwareLoadBalancing only once. Needless to say, I was quite embarrassed. Not
only had the compiler wrapped my class in two namespaces, it was not helping
me work around the problem!:) Do you know what I did then? I aliased the
namespace in a way the linker or compiler should understand:
using namespace SLB = SoftwareLoadBalancing::SoftwareLoadBalancing;
Then, I tried to instantiate the X class like this:
SLB::X __gc* x = new SLB::X ();
I'm sure you don't know what happened then, because I was hiding a simple fact
from you: I was rebuilding each time. Now can you guess what happened? The
linker complained again that it could not find class X in the namespace
SoftwareLoadBalancing::SoftwareLoadBalancing. WTF?! I was furious! I went
crazy! I launched ILDasm once again, and looked at the SharedLibrary. The
class was properly wrapped once in the namespace SoftwareLoadBalancing. Now, I
don't know if this is a bug in the compiler, in the linker, or in my mind.
What I do know is that next time I have such a problem, I won't chase
nonexistent bugs in my source files, but will launch my beloved ILDasm and see
whether I'm doing something wrong, or Microsoft are trying to drive me crazy:)
TODO(s)
(So what the heck have you done, when there are so many TODOs?!)
Conclusion
Thank you for reading the article! It was the longest article I've written in
my entire life (I started writing articles several months ago:). I'm really
impressed by how patient you are! Now that I've thanked you, I should also say
"Thanks!" to Microsoft, which brought us the marvelous .NET technology. If
.NET did not exist, I doubt I would have written such an article, and even if
I had, it wouldn't have included the C++ source code. The .NET Framework makes
programming so easy! It just forces you to write all day long:) I feel like
I'm not programming, but rather prototyping. It is easier than VB once was.
Really!
Now let's see what you've learned (or just read) from the article:
- what load balancing is in general
- my idea for dynamic software load balancing, the architecture
and some of the implementation details
- some multithreading issues and how to solve them
- network programming basics, including TCP, UDP and multicasting
- some (I hope) helpful tips and workarounds
- that if COM is love, then .NET is PASSION
- that I'm a pro-Microsoft guy:)
Aaaaaaaaaaaah, I forgot to tell you! Please do not post messages in the
message board below teaching me not to use __gc* when I could just type *. I
just love the __gc keyword, that's it:)
Below are two books, read once upon a time, that served me well in writing
this article's source code. You'll be surprised that they are not .NET books.
I'm not joking -- there are maybe over 250 .NET books, and I've read 10 or so;
that's why I can't recommend you any .NET book, really. It wouldn't be fair to
say "Book X is the best on topic Y" when I haven't read at least half of the
.NET books out there to give you an (authoritative) recommendation. The books
below are not just "Must-Have" and "Must-Read":) ones. They are priceless for
the Windows developer. Stop reading this article, and go buy them now! :)
Programming Server-Side Applications for Microsoft Windows 2000 (ISBN 0-7356-0753-2)
by Jeffrey Richter, Jason D. Clark
"We developers know that writing error-tolerant code is what we should do, but
frequently we view the required attention to detail as tedious and so omit it.
We've become complacent, thinking that the operating system will 'just take care
of us.' Many developers out there actually believe that memory is endless, and
that leaking various resources is OK because they know that the operating
system will clean up everything automatically when the process dies. Certainly
many applications are implemented in this way, and the results are not
devastating because the applications tend to run for short periods of time and
then are restarted. However, services run forever, and omitting the proper
error-recovery and resource-cleanup code is catastrophic!"
Debugging Applications (ISBN 0-7356-0886-5), by John Robbins
"Bugs suck. Period. Bugs are the reason you endure death-march projects with
missed deadlines, late nights, and grouchy coworkers. Bugs can truly make your
life miserable because if enough of them creep in to your software, customers
will stop using your product and you could lose your job. Bugs are serious
business... As I was writing this book, NASA lost a Mars space probe because of
a bug that snuck in during the requirements and design phase. With computers
controlling more and more mission-critical systems, medical devices, and
superexpensive hardware, bugs can no longer be laughed at or viewed as something
that just happens as a part of development."
And the best text on Managed Extensions for C++ .NET (besides the
specification and the migration guide) is not a book, but a Microsoft Official
Curriculum (MOC) course: "Programming with Managed Extensions for Microsoft
Visual C++ .NET" (2558). You should definitely attend this course if you're
planning to do any .NET development using Managed C++.
A (final) word about C#
Many of you are probably wondering why I implemented this solution in Managed
C++ and not in C#, since I'm writing only managed code. I know that most .NET
developers are on the C# bandwagon. So am I: I program in C# all day long --
that's my job. However, I love C++ so much that I prefer to write in Managed
C++. There are probably hundreds of reasons I prefer MC++ to C#, and IJW (and
unmanaged code in general) is probably the last on the list. C# has nothing to
do with C or C++, except for some slight similarities in the syntax, no matter
how hard Microsoft tries to convince us of the opposite. It is way closer to
Java than to C/C++. Do I sound extreme? Well, a C/C++ die-hard colleague of
mine (Boby -- http://606u.dir.bg/) forced himself to learn and use VB.NET in
order not to forget C++ while he was developing a .NET application. Now who's
extreme?:) Microsoft is pushing us very hard to forget C++, so that they and
some open-source C++ die-hards will be the only ones who use it:) Haven't you
noticed? As of today, the ATL list generates a couple of unique posts each
day, compared to at least 10-20 several months ago, before Microsoft suddenly
decided to drop it -- and put it back within a week, when catcalled by the ATL
community. And what does Microsoft say about this, eh? COM is not dead! COM is
here to stay! ATL lives (somewhere in time). Blah, blah:) I adore .NET, but I
don't want to have to search for "The C Programming Language" and "The C++
Programming Language" in the dusty bookshelves in 3 or 4 years, when .NET is
replaced by some even-cooler technology. Long live C++!:) Anyway, I was
tempted to implement the solution in C#, because the monthly prizes for it
were better-looking:)
Disclaimer
This software comes "AS IS" with all faults and with no warranties whatsoever.
If you find the source code or the article useful, wish me a Merry Christmas:)
Wishing you the merriest Christmas, and looking forward to seeing you next year:)
Stoyan