Failover Socket Client

Bart Meirens


20 Aug 2007


A socket client which fails over to a different host when the connection drops

Download source - 13.75 KB

Introduction

While a vast number of example socket applications exist, none of them seemed to address the needs I wanted to explore. Because community sites like The Code Project have helped me find the right direction in the past, I thought this article would be my chance to share the insights I gathered while developing this prototype.

Outline

A brief brainstorming session with a colleague produced a list of useful features. For our journey, let's assume the following business requirements exist for the socket client:

Probe hosts and use the first available host encountered
Failover to a different host if the connection drops
Send the messages in the exact same order they are handed to it
Backup any unsent messages if none of the hosts are responding and send them when a host is available again, and
Be configurable using a settings file

Implementation 101

Before describing any of the steps taken to get to the feature set, it is worth noting that the socket client has been split into two parts. One part is the "front end", being the interface (in this case a console application); the other part is the message service.

The message service is responsible for sending the messages and probing the hosts and is implemented as a singleton, since we want:

Only one queue with all messages in the order they are handed over
Only one client connecting to the host
Only one backup file, and
Only one log file

The Configuration File

Of all the requirements we could come up with, this one was the easiest to implement. All features needed to be configurable. XML is used for the configuration file because of its ease of use, its support for recurring elements, the possibility to alter the contents in a simple text editor, and my personal preference for this format. To ensure compliance with the expected format in an editor capable of verifying it, I also designed an XML schema. If you want more information about XML schemas, the W3Schools tutorial is an excellent place to learn about them.

This is a generic configuration file:


<?xml version="1.0" encoding="UTF-8"?>
<configuration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation=
        "http://www.webappexpert.com/schemas/socketconfiguration.xsd">
    <hosts>
        <host>
            <IP>127.0.0.1</IP>
            <port>1234</port>
        </host>
        <host>
            <IP>127.0.0.1</IP>
            <port>13000</port>
        </host>
    </hosts>
    <backupFile>D:\temp\messages.txt</backupFile>
    <logPath>D:\temp\</logPath>
    <interval>300000</interval>
</configuration>

As you can see, we can specify more than one host, set the locations of the backup file and the log directory, and, last but not least, set the probing interval in milliseconds (300000 ms is five minutes).
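As an illustration of how such a file maps onto settings, here is a minimal sketch that reads the same elements with XPath queries. The names SocketSettings and ConfigSketch are hypothetical and not part of the downloadable source, and the sketch assumes every element is present exactly as in the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical container for the values found in the configuration file.
public class SocketSettings
{
    // Each entry is an (IP, port) pair, kept in file order.
    public readonly List<KeyValuePair<string, int>> Hosts =
        new List<KeyValuePair<string, int>>();
    public string BackupFile;
    public string LogPath;
    public long Interval; // milliseconds
}

public static class ConfigSketch
{
    // Parses configuration XML passed as a string (assumption: all elements exist).
    public static SocketSettings Parse(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        SocketSettings settings = new SocketSettings();
        foreach (XmlNode host in doc.SelectNodes("/configuration/hosts/host"))
        {
            string ip = host.SelectSingleNode("IP").InnerText;
            int port = int.Parse(host.SelectSingleNode("port").InnerText);
            settings.Hosts.Add(new KeyValuePair<string, int>(ip, port));
        }
        settings.BackupFile = doc.SelectSingleNode("/configuration/backupFile").InnerText;
        settings.LogPath = doc.SelectSingleNode("/configuration/logPath").InnerText;
        settings.Interval = long.Parse(doc.SelectSingleNode("/configuration/interval").InnerText);
        return settings;
    }
}
```

Because the hosts are stored in file order, the first host in the file is also the first one to be probed.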

Probing Hosts

Probing hosts can be achieved in various ways. The scenarios I explored were:

Attempt a TCP connection using the IP-port pair, and
Attempt an ICMP ping to the IP address, and - if successful - attempt a TCP connection using the IP-port pair

Using the first method, we would only attempt to connect to the host. If there were a failure on the network path to that host, we would have to wait for the TCP connection attempt to time out, and precious time could be lost. The alternative method checks the network path before attempting a TCP connection and is usually faster, because an ICMP reply tends to arrive quickly. The reply can indicate success, but also "destination host unreachable". In either case, it typically arrives within milliseconds, from the host itself or from a router on the network path. Because the second method saves us some time and detects network problems earlier, this method has been implemented.

Probing active hosts is a two-step process as said before. First, a ping message is sent to the host. When a reply is received, a TCP connection is attempted using the specified port. If this is successful as well, the host is assumed active until either a message fails to be sent or the probing interval has elapsed and the process starts from scratch.

Since the release of the .NET Framework 2.0, a Ping class (System.Net.NetworkInformation.Ping) is available out of the box, whereas in previous versions of the Framework you had to implement this yourself. Choosing the 2.0 Framework saved us some time getting the pings to work.
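The two-step probe can be sketched in isolation as follows. This is only an illustration under a few assumptions: the class name HostProbe and its method names are hypothetical, the ping timeout is chosen by the caller, and a failed ping or a refused connection is simply reported as "not available".

```csharp
using System;
using System.Net.NetworkInformation;
using System.Net.Sockets;

public static class HostProbe
{
    // Step 1: send an ICMP echo and wait at most timeoutMs milliseconds.
    public static bool RespondsToPing(string ip, int timeoutMs)
    {
        try
        {
            using (Ping ping = new Ping())
            {
                return ping.Send(ip, timeoutMs).Status == IPStatus.Success;
            }
        }
        catch (PingException)
        {
            // The echo could not be sent at all; treat as unreachable.
            return false;
        }
    }

    // Step 2: try to open (and immediately release) a TCP connection.
    public static bool AcceptsTcp(string ip, int port)
    {
        try
        {
            using (TcpClient client = new TcpClient())
            {
                client.Connect(ip, port);
                return client.Connected;
            }
        }
        catch (SocketException)
        {
            // Connection refused or timed out.
            return false;
        }
    }

    // The TCP attempt is only made when the ping succeeded.
    public static bool IsAvailable(string ip, int port, int pingTimeoutMs)
    {
        return RespondsToPing(ip, pingTimeoutMs) && AcceptsTcp(ip, port);
    }
}
```

Note that a host which blocks ICMP would never be considered available by this kind of probe, even if its TCP port is open.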

Message Order

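By using a queue, we can ensure the messages are sent in the order they were received. While one message is being sent, another one can be added to the queue, so the feeding of messages does not have to stop while the client is sending. Keep in mind that the generic Queue class is not thread-safe by itself, so access from the feeding thread and the sending thread should be synchronized.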
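One possible way to guard the queue is a small wrapper that takes a lock around every operation. This is a sketch and not the code from the download; the class name SynchronizedMessageQueue is hypothetical.

```csharp
using System.Collections.Generic;

// First-in, first-out message queue that can be shared between
// the thread handing over messages and the thread sending them.
public class SynchronizedMessageQueue
{
    private readonly Queue<string> _items = new Queue<string>();
    private readonly object _sync = new object();

    public void Enqueue(string message)
    {
        lock (_sync)
        {
            _items.Enqueue(message);
        }
    }

    // Returns false instead of throwing when the queue is empty.
    public bool TryDequeue(out string message)
    {
        lock (_sync)
        {
            if (_items.Count == 0)
            {
                message = null;
                return false;
            }
            message = _items.Dequeue();
            return true;
        }
    }

    public int Count
    {
        get { lock (_sync) { return _items.Count; } }
    }
}
```

Checking for emptiness and dequeuing inside the same lock avoids the gap between a separate Count check and a Dequeue call.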

Failover

Whenever the client cannot send a message to its active host, a failover is initiated. The client looks for another host using the exact same steps described in the section "Probing hosts". While in failed state, the socket client will save the messages in the queue and any incoming messages to the backup file, thus ensuring persistence over socket client application shutdowns. After finding an active host, the backed-up messages are loaded in the queue and sent.

Logging

While logging is not listed as a feature, we all know good software should log its actions for troubleshooting purposes.

Logging has been implemented in two layers:

Logging to a file using the date as name, and
Logging to the consuming application

Logging to file happens for all actions the socket client takes (connection-specific and message-specific actions). Only log entries for message actions are forwarded to the consuming application, using a public event. The consuming application can choose to implement an event handler for this event.
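The two layers can be pictured with the following sketch. The names LogEntryEventArgs, LogPublisher and MessageLogged are hypothetical stand-ins chosen for this illustration; the actual service uses its own MessageEventArgs type, shown later in the walkthrough.

```csharp
using System;

// Hypothetical event payload carrying the text of one log entry.
public class LogEntryEventArgs : EventArgs
{
    public readonly string Text;

    public LogEntryEventArgs(string text)
    {
        Text = text;
    }
}

public class LogPublisher
{
    // Consumers may subscribe to this event, but do not have to.
    public event EventHandler<LogEntryEventArgs> MessageLogged;

    public void Log(string text, bool isMessageAction)
    {
        // Layer 1: every entry would be written to the log file here
        // (omitted in this sketch).

        // Layer 2: only message actions are forwarded to the consumer.
        if (!isMessageAction)
        {
            return;
        }
        EventHandler<LogEntryEventArgs> handler = MessageLogged;
        if (handler != null)
        {
            handler(this, new LogEntryEventArgs(text));
        }
    }
}
```

Copying the event to a local variable before the null check keeps the call safe if the last subscriber detaches at the same moment.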

The log file path is the path specified in the configuration file, with the current date appended in yyyyMMdd format (e.g. D:\temp\20070820.log).
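A compact way to build such a path is sketched below. LogPathSketch is a hypothetical helper, and using Path.Combine with the invariant culture is my own choice for the illustration rather than what the service does.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class LogPathSketch
{
    // Combines the configured directory with a date-based file name.
    // Path.Combine copes with directories that do or do not end in a separator.
    public static string Build(string logDirectory, DateTime date)
    {
        // "MM" is the month; a lowercase "mm" would give the minutes instead.
        string fileName =
            date.ToString("yyyyMMdd", CultureInfo.InvariantCulture) + ".log";
        return Path.Combine(logDirectory, fileName);
    }
}
```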

Code Walkthrough

I thought it would be more useful to present the code in the order in which it is executed, rather than spreading it over the sections above that explain why it was implemented this way.

Even though the singleton pattern is well-known throughout the industry, I include the code here to have an entry point.

private static MessageService _instance = new MessageService();

public static MessageService Instance
{
    get { return _instance; }
}

private MessageService()
{
    //load configuration
    ParseConfig();

    //check failed messages on start
    FileInfo fi = new FileInfo(_filepath);
    if (fi.Exists)
        _hasFailures = true;

    //look for a host
    _activeHost = FindHost();

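    //set timer for periodic connectivity check
    //keep the timer in a field (assumed here: private Timer _timer),
    //a local variable could be garbage collected, which stops the callbacks
    TimerCallback tcb = new TimerCallback(TimerElapsed);
    _timer = new Timer(tcb, null, _checktime, _checktime);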
}

First, the configuration is loaded using the following method:

private void ParseConfig()
{
    //load configuration file
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.Load("configuration.xml");

    //parse elements
    XmlElement root = xmlDoc.DocumentElement;
    foreach (XmlElement element in root.ChildNodes)
    {
        if (element.Name == "hosts")
        {
            foreach (XmlElement subElement in element.ChildNodes)
            {
                if (subElement.Name == "host")
                {
                    if (subElement.ChildNodes.Count == 2)
                    {
                        XmlNode IP = subElement.ChildNodes[0];
                        XmlNode port = subElement.ChildNodes[1];
                        Host myHost = new Host(IP.InnerText, int.Parse(port.InnerText));
                        _hosts.Add(myHost);
                    }
                }
            }
        }
        if (element.Name == "backupFile")
        {
            _filepath = element.InnerText;
        }
        if (element.Name == "logPath")
        {
            _logpath = ((element.InnerText.EndsWith(@"\") ?
                       element.InnerText.Substring(0, element.InnerText.Length - 1) :
                       element.InnerText) + @"\" +
                       DateTime.Today.ToString("yyyyMMdd") + ".log");
        }
        if (element.Name == "interval")
        {
            _checktime = long.Parse(element.InnerText);
        }
    }
    //Logging deleted for brevity
}

Once the configurable variables are loaded, a check has to be done to ensure no unsent messages were left when the application was closed. Since we always back up to the same file, and delete the file once the messages have been loaded from it, the mere presence of the file indicates a previous problem. The FindHost method will take care of sending any unsent messages. How do we find a host? Like this (logging entries have been deleted for brevity):

private Host FindHost()
{
    TcpClient client = new TcpClient();
    Ping pingSender = new Ping();
    for (int i = 0; i < _hosts.Count; i++)
    {
        PingReply reply = pingSender.Send(_hosts[i].IP);
        if (reply.Status == IPStatus.Success)
        {
            try
            {
                client.Connect(IPAddress.Parse(_hosts[i].IP), _hosts[i].Port);
                if (client.Connected)
                {
                    client.Client.Shutdown(SocketShutdown.Both);
                    client.Client.Disconnect(false);
                    //reset blocking messages to be sent
                    if (_hasFailures)
                    {
                        _hasFailures = false;
                        //set the host first: the worker started below reads _activeHost
                        _activeHost = _hosts[i];
                        SendSavedMessages();
                    }
                    return _hosts[i];
                }
            }
            catch
            {
                //logging connection problem here
            }
        }
        else
        {
            //logging ping problem
        }
    }
    //if no active host returned in loop, default to error mode
    _hasFailures = true;
    return null;
}

Quite simple, you think? Not really. We have to ensure the connection is closed as well. A TCP connection goes through three distinct phases:

The SYN: The client sends a request to the host to establish a connection to a given port, the host sends back a SYN ACK (an acknowledgement), and the client confirms it with an ACK of its own
The connection itself, where data is sent between the client and the host, and
The FIN: The client sends a packet saying all data it wanted to send was sent. The host acknowledges it and sends its own FIN, after which the connection is terminated.

The most straightforward way to close the connection would be TcpClient.Close(); but this would only dispose the TcpClient instance without closing the underlying connection. Doing so, no FIN would be sent to the host and another connection attempt would fail as the host still has an active connection. To force the TCP connection to close, we have to call TcpClient.Client.Shutdown(SocketShutdown how) to disable sending and/or receiving on the socket and TcpClient.Client.Disconnect(bool reuseSocket) to close the socket connection. The boolean value indicates whether the socket connection should be re-usable.
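The shutdown sequence can be wrapped in a small helper, sketched here under my own assumptions: ConnectionCloser is a hypothetical name, and the helper additionally calls Close at the end to release the client's resources once the shutdown has been requested.

```csharp
using System;
using System.Net.Sockets;

public static class ConnectionCloser
{
    // Announces the end of the conversation first, then releases the client.
    public static void CloseGracefully(TcpClient client)
    {
        try
        {
            if (client.Connected)
            {
                // Disable sending and receiving; this signals the peer
                // that no more data will follow.
                client.Client.Shutdown(SocketShutdown.Both);
            }
        }
        catch (SocketException)
        {
            // The connection was already broken; nothing left to shut down.
        }
        finally
        {
            client.Close();
        }
    }
}
```

On the server side, a pending read on the stream then returns zero bytes, which is the usual sign that the client has finished.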

On to loading the saved messages then?

void SendSavedMessages()
{
    //get messages from file and attempt to send
    UTF8Encoding myEnc = new UTF8Encoding();
    FileStream fs = new FileStream(_filepath, FileMode.Open, FileAccess.ReadWrite);
    byte[] theBytes = new byte[fs.Length];
    fs.Read(theBytes, 0, theBytes.Length);
    fs.Flush();
    fs.Close();
    FileInfo fi = new FileInfo(_filepath);
    fi.Delete();

    //copy file contents, omit the UTF-8 byte order mark (BOM)
    string fileString = myEnc.GetString(theBytes).Substring(1);
    string[] messages = fileString.Split(Environment.NewLine.ToCharArray());
    //fill the queue
    for (int i = 0; i < messages.Length; i++)
    {
        if(!string.IsNullOrEmpty(messages[i]))
            _messages.Enqueue(messages[i]);
    }

    //start the worker
    if (_backgroundWorker == null)
    {
        CreateWorker();
    }
}

We are saving the messages in UTF-8 format so that we're not limiting the character set used for the messages. The Framework has a lot of encoding schemes under the hood, but GetString decodes the raw bytes exactly as they are and does not strip the byte order mark (BOM) at the beginning of the UTF-8 encoded file. Therefore, it has been omitted using myEnc.GetString(theBytes).Substring(1);
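As an alternative illustration, the sketch below lets the Framework's text readers deal with the byte order mark: File.ReadAllLines detects and skips it, so no manual Substring call is needed. BackupFileSketch and its method names are hypothetical, and this is a variation on the approach above rather than the code shipped with the article.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class BackupFileSketch
{
    // Appends one message per line, UTF-8 encoded.
    public static void Append(string path, IEnumerable<string> messages)
    {
        StringBuilder all = new StringBuilder();
        foreach (string message in messages)
        {
            all.Append(message).Append(Environment.NewLine);
        }
        File.AppendAllText(path, all.ToString(), Encoding.UTF8);
    }

    // Reads the saved messages back in order and removes the file,
    // so that its mere presence keeps signalling unsent messages.
    public static List<string> LoadAndDelete(string path)
    {
        List<string> messages = new List<string>();
        foreach (string line in File.ReadAllLines(path, Encoding.UTF8))
        {
            if (!string.IsNullOrEmpty(line))
            {
                messages.Add(line);
            }
        }
        File.Delete(path);
        return messages;
    }
}
```

Like the original, this sketch assumes a message never contains a line break, since the line break is what separates messages in the file.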

Our background worker will wait for a job to come in. If it does not exist when a message is handed over or loaded from the backup file, it has to be created.

private void CreateWorker()
{
    _backgroundWorker = new BackgroundWorker();
    _backgroundWorker.DoWork += new DoWorkEventHandler(OnDoWork);

    _backgroundWorker.RunWorkerAsync();
}

The background worker's work method is invoked when a job is received:

void OnDoWork(object sender, DoWorkEventArgs e)
{
    while (_messages.Count > 0)
    {
        TcpClient client = new TcpClient();
        try
        {
            client.Connect(IPAddress.Parse(_activeHost.IP), _activeHost.Port);

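            //peek first so the message stays queued (and gets backed up) if sending fails
            string message = _messages.Peek();

            byte[] buffer = Encoding.Unicode.GetBytes(message);

            client.Client.Send(buffer);

            //sent successfully, now remove it from the queue
            _messages.Dequeue();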

            Log(new MessageEventArgs
                 (message, _messages.Count, MessageStatus.Success, null));
        }
        catch (SocketException ex)
        {
            _hasFailures = true;
            string[] messages = _messages.ToArray();
            Log(new MessageEventArgs
                 (null, _messages.Count, MessageStatus.Failure, ex.Message));

            _messages.Clear();
            //save array to file
            StringBuilder allMessages = new StringBuilder();
            foreach (string message in messages)
            {
                allMessages.Append(message + Environment.NewLine);
            }
            File.AppendAllText(_filepath, allMessages.ToString(), Encoding.UTF8);
            //failover
            _activeHost = FindHost();
        }
        finally
        {
            if (client.Connected)
            {
                client.Client.Shutdown(SocketShutdown.Both);
                client.Client.Disconnect(false);
            }
        }
    }
    _backgroundWorker = null;
}

A connection is established to the active host and the message at the head of the queue is sent. If all goes well, the connection is closed before moving on to the next item in the queue. If a problem is encountered, the catch block kicks in, saving the queued messages to the backup file and initiating the failover.

You might ask yourself why the connection is closed after each attempt. The reason is the server: we have to make sure each message is treated as a separate one. The test server closes the connection once no more data is available in the stream, so that it can listen for new connection requests on the specified port. Sending several messages over one existing connection therefore gave unpredictable results: in testing we could sometimes send five messages before the connection was reset and all further messages failed, and sometimes only one and a half, or three, or ten. Because we want to be certain messages arrive, each one is sent over its own TCP connection.

And for completeness, the timer callback:

private void TimerElapsed(object info)
{
    _activeHost = FindHost();
}

After the timer has elapsed, this method is called and the hosts are probed again to find an active host. This may be the same one as before, but it could also be a host we prefer. Because the hosts are probed in the order specified in the configuration file, we can list host A, in the same subnet as the client, before host B in a different subnet. If host A fails, host B is used, but as soon as host A comes up again it becomes the active host once more, since it is the preferred one with the shorter network path.

The service has one public method: Send. This is where the front-end hands off a message it wants to send.

public void Send(string message)
{
    if (!_hasFailures)
    {
        _messages.Enqueue(message);

        if (_backgroundWorker == null)
        {
            CreateWorker();
        }
    }
    else
    {
        //save to file
        File.AppendAllText(_filepath, message + Environment.NewLine, Encoding.UTF8);
        Log(new MessageEventArgs
            (null, 0, MessageStatus.Failure, "Communications error"));
    }
}

Conclusion

With this implementation, I think I succeeded in creating a client which implements the required features. If you think there are shorter ways to achieve this, have any suggestions to optimize the code, or find a bug, please leave a comment.

History

08-20-2007: Initial version
08-22-2007: Refactored the server code as per _NightOwl_'s remark

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)