The background
Not long ago, I had to design and implement a kind of load balancer for an existing application. Since the platform was .NET, the natural choice was .NET Remoting.
I had written several small client-server applications with remoting before, so I was familiar with the mechanism and its various techniques. After some planning, I started coding. The servers (nodes) hosting the application had to be remotely controllable; at a minimum, it had to be possible to start and stop them.
When a server node started, it registered a channel on a specific port and registered a type (NodeManager) as a Singleton WellKnownServiceType. The whole node was controlled through this singleton object. When its Start method was called, it created a new appdomain, registered another channel there, and registered the application type as a SingleCall type (the application type was required to be SingleCall). The node then entered the running state, and client requests were serviced in the new appdomain. Everything went fine until I started implementing the Stop method of the NodeManager class.
By stopping, I mean stopping the service without terminating the process: clients can no longer invoke methods on the application objects, but the node can still be controlled, and thus restarted, through the NodeManager object, which remains available.
So, on stopping, the node had to unload the new appdomain hosting the application objects. Before unloading, however, I wanted it to wait until the requests already being processed had finished properly, while rejecting any new requests to the application objects.
There is one more object, not covered yet, that lives in the separate appdomain: the DomainManager. When the NodeManager is told to start the service, it creates the new appdomain (the service domain), creates a new DomainManager object in it, and calls its Start method. This Start method registers the remoting channel on a specific port and registers the application types as SingleCall. At this point, the application objects are reachable by clients and serve requests in the service appdomain. When we tell the NodeManager to stop, it calls a Stop method on the DomainManager, which should somehow make the application objects unreachable without interrupting the requests currently being processed. Unloading the service appdomain immediately is therefore not an option, because a ThreadAbortException is thrown on all threads running in an appdomain that is being unloaded, and the current requests would be terminated.
Incidentally, having one DomainManager object per appdomain is handy whenever you deal with several appdomains, but note that DomainManager must be remotable (derive from MarshalByRefObject), because it receives cross-appdomain calls.
Below, I cover a few techniques that can be useful if you are implementing a .NET Remoting service and want it to be stoppable without terminating the host process.
Closing the listening port
My first idea was simply to unregister the channel in the service appdomain, so that the TCP port stops listening and clients can no longer connect, and to count the requests currently being processed (detailed later). When the counter reached zero, the service appdomain could be unloaded, stopping the server without interrupting any request in progress.
The first thing I noticed was that, even after the channel was unregistered, some clients could still invoke methods on the application type.
This is because remoting caches connections and closes them only after a timeout elapses. If the server is under heavy load, clients invoke methods on the application objects so frequently that the sockets never time out, so stopping the server can take a long time, and the requirement that new requests be rejected is not met.
Socket cache timeout
We have no control over this socket cache, so we cannot force those connections to close immediately. I read somewhere that in .NET 2.0 the timeout period will be adjustable, and that setting it to zero will close each connection right after a remote method call finishes. Apart from the fact that my server had to run on .NET 1.1, this approach would likely cost significant performance: every remote method invocation would have to establish a new TCP connection to the server, adding unnecessary latency and reducing overall throughput.
Unregistering registered remote types?
Another solution would be to unregister the registered remote types as well as the channel. That would stop the service: clients would get the usual RemotingException ("Requested Service not found") even over a cached connection to the server.
Unfortunately, .NET provides no way to unregister a registered remoting type. The only way to get rid of it is to unload the hosting appdomain, which we want to avoid for the reason mentioned above: before unloading the domain, we first want to wait for the requests currently being processed to finish.
RemotingServices.Disconnect(MarshalByRefObject o)
However, there is a method, RemotingServices.Disconnect, that can come in handy. This static method takes a single MarshalByRefObject parameter and disconnects that published object. So if you hold a reference to an object that serves remote invocations and call RemotingServices.Disconnect on it, the object is no longer reachable, and clients get an exception.
This is a good solution as long as you have the instance of the published remote type. You can use it with Client Activated Objects, with any marshaled object (a remotable object reference returned to the client, or an object published with the RemotingServices.Marshal method), and, with a little work, with WellKnownServiceType Singletons, but not with SingleCall types.
With a SingleCall type, each request is served by a new instance, so you never hold a reference to an instance, and therefore you cannot disconnect this kind of remote type without unloading the hosting appdomain.
For every other kind of published object, RemotingServices.Disconnect is the best way to stop your service without unloading the domain; with it, you don't even need to unregister the listening channel.
However, you have to store references to the instances of the published types that serve the requests. So, for example, do not use RegisterWellKnownServiceType to create a Singleton; use RemotingServices.Marshal() instead:
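using System.Runtime.Remoting;
MyService obj = new MyService();
RemotingServices.Marshal(obj, "myService.bin"); // publish; we keep the reference
RemotingServices.Disconnect(obj);               // on Stop: unreachable from now on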
For other marshaled stateful objects, you can use the following approach.
Create a remotable factory class and publish it the same way as above. The factory's Create method creates an instance of the requested service type, adds it to a collection, and returns it.
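using System;
using System.Collections;
using System.Runtime.Remoting;

public class AppFactory : MarshalByRefObject, IAppFactory
{
    // Every instance handed out to clients, so they can be disconnected on Stop.
    public static readonly ArrayList Collection = ArrayList.Synchronized(new ArrayList());

    public IMyService Create()
    {
        MyService obj = new MyService();
        Collection.Add(obj);
        return obj;
    }
}

// On Start: publish the factory.
AppFactory appfactory = new AppFactory();
RemotingServices.Marshal(appfactory, "AppFactory.bin");

// On Stop: disconnect the factory first, so no new instances are created,
// then every instance it has handed out.
RemotingServices.Disconnect(appfactory);
foreach (MarshalByRefObject obj in Collection)
{
    RemotingServices.Disconnect(obj);
}
Collection.Clear();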
The only problem is that if the server runs for a long time and many MyService objects are created in a short period, memory consumption keeps growing, especially if MyService instances are large. Soon your server will start throwing OutOfMemoryExceptions. This happens because your collection keeps holding references even after an object's lease has expired and the lease manager has disconnected it, so it can no longer be reached from outside. Another remoting service helps here: TrackingServices. With TrackingServices, you can be notified when an object is disconnected, so you can remove it from the collection and let the garbage collector reclaim it. I have never used Client Activated Objects, but I think this approach works with them as well.
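A minimal sketch of that bookkeeping, assuming the factory registers each instance it creates: the hypothetical PublishedObjectRegistry class below drops an instance when DisconnectedObject is called. On .NET Framework, the same class would also implement System.Runtime.Remoting.Services.ITrackingHandler (whose other members, MarshaledObject and UnmarshaledObject, can stay empty) and be registered with TrackingServices.RegisterTrackingHandler; that part is omitted so the sketch compiles without the remoting assemblies.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry of the service instances handed out by a factory.
// On .NET Framework it would also implement ITrackingHandler (not shown here).
public class PublishedObjectRegistry
{
    private readonly object sync = new object();
    private readonly HashSet<object> objects = new HashSet<object>();

    public int Count
    {
        get { lock (sync) { return objects.Count; } }
    }

    // Called by the factory's Create method for every new instance.
    public void Add(object obj)
    {
        lock (sync) { objects.Add(obj); }
    }

    // Same shape as ITrackingHandler.DisconnectedObject: invoked when an object
    // is disconnected (for example, after its lease expires). Dropping the
    // reference lets the garbage collector reclaim the object.
    public void DisconnectedObject(object obj)
    {
        lock (sync) { objects.Remove(obj); }
    }

    // Copies and clears the remaining objects, e.g. so Stop can call
    // RemotingServices.Disconnect on each of them outside the lock.
    public object[] TakeAll()
    {
        lock (sync)
        {
            object[] all = new object[objects.Count];
            objects.CopyTo(all);
            objects.Clear();
            return all;
        }
    }
}
```

Because TakeAll releases the lock before the caller disconnects the returned objects, any DisconnectedObject callbacks those Disconnect calls may trigger simply find nothing left to remove and cannot deadlock against Stop.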
Introducing Dynamic Sinks
I found the dynamic sink, which can be plugged into the sink chain, quite useful. Once a dynamic sink is installed in an appdomain, you are notified whenever a remote method call starts and finishes. This can be used to count how many remote calls are currently being processed in the appdomain and, once the server is stopped, to reject calls by throwing an exception.
Counting inbound calls
All you need is a class with a counter, methods to increment and decrement it, and a method that waits until the counter reaches zero. A ManualResetEvent object can be used for the waiting. In the Enter method, you increment the counter and set the event to the non-signaled state (by calling Reset). In the Leave method, you decrement the counter and, if it has reached zero, set the event to the signaled state (by calling Set). In the Wait method, you simply call the event's WaitOne() method, which blocks the calling thread until the event is signaled, that is, until the counter reaches zero and no inbound calls are being processed. The event should be created in the signaled state, so that Wait returns immediately if no call has arrived at all. Note that this object is called from different threads, so its methods must be synchronized, for example with a lock.
There is one object of this class per appdomain; its Enter method is called from the sink's ProcessMessageStart method, and its Leave method from ProcessMessageFinish.
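A minimal sketch of the counter described above. The class name InboundCallCounter and the timeout overload of Wait are illustrative assumptions; the Enter/Leave/Wait structure and the ManualResetEvent follow the text. It derives from MarshalByRefObject on the assumption that, as in the NodeManager's Stop method, Wait is called from another appdomain.

```csharp
using System;
using System.Threading;

// Counts the remote calls currently being processed in an appdomain and lets
// a caller block until that number drops to zero.
public class InboundCallCounter : MarshalByRefObject
{
    private readonly object sync = new object();
    // Signaled while no calls are in progress; initially there are none.
    private readonly ManualResetEvent idle = new ManualResetEvent(true);
    private int count;

    public int Count
    {
        get { lock (sync) { return count; } }
    }

    // Called from the sink's ProcessMessageStart.
    public void Enter()
    {
        lock (sync)
        {
            count++;
            idle.Reset(); // at least one call is in progress
        }
    }

    // Called from the sink's ProcessMessageFinish.
    public void Leave()
    {
        lock (sync)
        {
            if (count == 0)
                throw new InvalidOperationException("Leave called without a matching Enter.");
            count--;
            if (count == 0)
                idle.Set(); // the last in-flight call has finished
        }
    }

    // Blocks until no calls are in progress.
    public void Wait()
    {
        idle.WaitOne();
    }

    // Same, but gives up after the timeout; returns true if the counter reached zero.
    public bool Wait(TimeSpan timeout)
    {
        return idle.WaitOne(timeout);
    }
}
```

One lock guards both the counter and the event transitions, so a Leave that drops the count to zero and a concurrent Enter cannot leave the event in a state that disagrees with the counter.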
Reject calls if server stopped
When stopping the server, you simply set a boolean flag. In the ProcessMessageStart method, you first check this flag and, if it is set, throw an exception. From that point on, all remote method calls are rejected.
using System.Runtime.Remoting;
using System.Runtime.Remoting.Contexts;
using System.Runtime.Remoting.Messaging;

public class InboundCallCounterDynamicSink : IDynamicMessageSink
{
    // Called before a remote call is dispatched to its target object.
    public void ProcessMessageStart(IMessage reqMsg,
        bool bClientSide, bool bAsync)
    {
        // Only inbound (server-side) calls are of interest.
        if (bClientSide)
            return;
        // Count or reject only calls to the published application types.
        if (!IsApplicationCall((IMethodMessage)reqMsg))
            return;
        if (ServiceDomainManager.Current.IsStopped)
            throw new NodeStoppedException("Requested Service is stopped.");
        ServiceDomainManager.Current.CallCounter.Enter();
    }

    // Called after the call has been processed.
    public void ProcessMessageFinish(IMessage reqMsg,
        bool bClientSide, bool bAsync)
    {
        if (bClientSide)
            return;
        if (IsApplicationCall((IMethodMessage)reqMsg))
            ServiceDomainManager.Current.CallCounter.Leave();
    }

    // True if the message targets one of the registered well-known
    // (application) types, whose URIs arrive as "/" + ObjectUri.
    private static bool IsApplicationCall(IMethodMessage message)
    {
        WellKnownServiceTypeEntry[] services =
            RemotingConfiguration.GetRegisteredWellKnownServiceTypes();
        for (int i = 0; i < services.Length; i++)
        {
            if (("/" + services[i].ObjectUri) == message.Uri)
                return true;
        }
        return false;
    }
}
The snippet above shows what such a sink might look like.
The sink has to check which object an inbound call targets, because the mechanism should affect only calls to the application objects (your published services), that is, the remote calls coming from clients. Other cross-appdomain calls, such as the NodeManager calling the DomainManager, must not be counted or rejected. That is why the sink filters on the target first: if the call targets an application object, it increments the counter, or throws NodeStoppedException if the server is stopped.
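Because the set of registered application types normally does not change while the service is running, the filter could also be computed once, for example in the DomainManager's Start method right after the types are registered, instead of calling GetRegisteredWellKnownServiceTypes for every message. A sketch of that variation, assuming the same "/" + ObjectUri comparison the sink uses; ApplicationUriFilter is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: remembers the object URIs of the application types
// once and answers whether an inbound message targets one of them.
public class ApplicationUriFilter
{
    private readonly HashSet<string> uris = new HashSet<string>(StringComparer.Ordinal);

    public ApplicationUriFilter(IEnumerable<string> objectUris)
    {
        foreach (string objectUri in objectUris)
            uris.Add("/" + objectUri); // the "/" + ObjectUri form the sink compares against
    }

    // The set is never modified after construction, so concurrent calls
    // from several request threads are safe.
    public bool IsApplicationCall(string messageUri)
    {
        return messageUri != null && uris.Contains(messageUri);
    }
}
```

In the sink, the object URIs would come from the ObjectUri property of the entries returned by RemotingConfiguration.GetRegisteredWellKnownServiceTypes(), and the check would become filter.IsApplicationCall(message.Uri).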
The best place to install the dynamic sink is the DomainManager's Start method, before registering the channel and the application types. This technique solves the problem of stopping SingleCall application types.
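One subtlety in the sink: checking IsStopped and calling Enter are two separate steps. If Stop runs between them, the wait for in-flight calls can return while a call that passed the check is just about to be counted, and that call is then cut off when the domain unloads. A minimal sketch of one way to close this gap, assuming the stopped flag and the counter live in the same object (the ServiceGate name is hypothetical), updates both under one lock:

```csharp
using System;
using System.Threading;

// Hypothetical gate combining the "stopped" flag and the in-flight counter,
// so no call can slip in between the stop check and the increment.
public class ServiceGate : MarshalByRefObject
{
    private readonly object sync = new object();
    private readonly ManualResetEvent idle = new ManualResetEvent(true);
    private bool stopped;
    private int inFlight;

    public bool IsStopped
    {
        get { lock (sync) { return stopped; } }
    }

    // Called from ProcessMessageStart. Returns false if the service is stopped;
    // the sink would then throw its NodeStoppedException.
    public bool TryEnter()
    {
        lock (sync)
        {
            if (stopped)
                return false;
            inFlight++;
            idle.Reset();
            return true;
        }
    }

    // Called from ProcessMessageFinish, only for calls admitted by TryEnter.
    public void Leave()
    {
        lock (sync)
        {
            if (inFlight == 0)
                throw new InvalidOperationException("Leave called without a matching TryEnter.");
            inFlight--;
            if (inFlight == 0)
                idle.Set();
        }
    }

    // Rejects all further calls, then waits for the admitted ones to finish.
    // Returns true if they all completed within the timeout.
    public bool Stop(TimeSpan timeout)
    {
        lock (sync)
        {
            stopped = true;
        }
        return idle.WaitOne(timeout);
    }
}
```

With this variation, ProcessMessageStart would call TryEnter and throw NodeStoppedException when it returns false, and the DomainManager's Stop would call the gate's Stop instead of setting a flag and waiting separately.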
Finally, the NodeManager's Stop method might look something like this:
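public void Stop()
{
    // Reject new application calls from now on.
    domainManager.Stop();
    // Wait for the calls already in progress to finish.
    domainManager.CallCounter.Wait();
    // Get the hosting domain before disposing the manager that lives in it.
    AppDomain serviceDomain = domainManager.GetHostingAppDomain();
    domainManager.Dispose();
    AppDomain.Unload(serviceDomain);
}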
Socket.Listen
There is one more thing that I want to share.
Several times I noticed that if I start my service together with a test client that stresses it by invoking the application methods, then stop the service and immediately start it again, the service cannot bind the TCP port again. Searching the web, I found many people complaining about the same thing; some wrote that the port becomes available again after about two minutes. I don't know why this happens, but it makes my server unreliable, since I can't stop and start it at will, and I didn't find a solution.
My workaround was a dynamic port allocation manager (PAM) class that manages a range of port numbers. When the DomainManager registers the channel, it first asks the PAM for a port number; if that port turns out to be unusable, it asks for another one and tries again. When allocating a port, the PAM sets its state to reserved; when a port proves unusable, the PAM sets its state to unusable; and a maintenance method of the PAM, called periodically by a timer, puts the unusable ports back into the available state.
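A minimal sketch of the PAM described above. The class, enum, and method names are hypothetical, and Release (returning a port when its channel is unregistered) is an assumption the text does not mention; the actual attempt to register the channel on the returned port is left to the caller.

```csharp
using System;
using System.Collections.Generic;

public enum PortState { Available, Reserved, Unusable }

// Hypothetical port allocation manager: hands out ports from a fixed range,
// lets the caller mark a port unusable when binding fails, and periodically
// returns unusable ports to the available pool.
public class PortAllocationManager
{
    private readonly object sync = new object();
    private readonly Dictionary<int, PortState> states = new Dictionary<int, PortState>();
    private readonly int firstPort, lastPort;

    public PortAllocationManager(int firstPort, int lastPort)
    {
        if (firstPort > lastPort)
            throw new ArgumentException("Empty port range.");
        this.firstPort = firstPort;
        this.lastPort = lastPort;
        for (int port = firstPort; port <= lastPort; port++)
            states[port] = PortState.Available;
    }

    // Reserves and returns the lowest available port, or -1 if none is left.
    public int Allocate()
    {
        lock (sync)
        {
            for (int port = firstPort; port <= lastPort; port++)
            {
                if (states[port] == PortState.Available)
                {
                    states[port] = PortState.Reserved;
                    return port;
                }
            }
            return -1;
        }
    }

    // Called when registering the channel on the port failed.
    public void MarkUnusable(int port) { Set(port, PortState.Unusable); }

    // Assumed addition: gives a port back when its channel is unregistered.
    public void Release(int port) { Set(port, PortState.Available); }

    // Maintenance step, meant to be called periodically (e.g. from a timer):
    // puts unusable ports back into the available pool for another try.
    public void Maintain()
    {
        lock (sync)
        {
            for (int port = firstPort; port <= lastPort; port++)
                if (states[port] == PortState.Unusable)
                    states[port] = PortState.Available;
        }
    }

    public PortState GetState(int port)
    {
        lock (sync) { return states[port]; }
    }

    private void Set(int port, PortState state)
    {
        lock (sync)
        {
            if (!states.ContainsKey(port))
                throw new ArgumentOutOfRangeException(nameof(port));
            states[port] = state;
        }
    }
}
```

In the DomainManager's Start method, the loop would be: call Allocate, try to register the channel on that port, and on failure call MarkUnusable and allocate again until Allocate returns -1. A System.Threading.Timer callback can call Maintain periodically.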
Once the channel has been created on a port, clients must know that port number to reach the application objects, which can be awkward in many situations. The simplest solution is to let clients ask the NodeManager for the port number or the entire URI of the service, since the NodeManager is always running (in the default appdomain) and always listens on the same well-known port.
Conclusion
When using Singleton or marshaled objects, use RemotingServices.Disconnect to disconnect the objects; you still need a dynamic sink, though, if you want to count the requests currently being processed and wait for them to finish before unloading the service appdomain.
For SingleCall types, where RemotingServices.Disconnect doesn't help, use the dynamic sink technique described above and throw an exception in the sink's ProcessMessageStart method to reject client requests.
Thank you.