
IOCPNet - Ultimate IOCP

17 Jan 2006
An easy-to-use, high-performance server class that supports large data transfers using the I/O Completion Port.

Introduction

There are many articles on IOCP (Input/Output Completion Port), but they are not easy to understand. The IOCP technique itself has some arcane aspects, and there is no standard documentation with enough explanation or code samples. So I decided to build a high-performance IOCP sample (OIOCPNet) and write a document that covers how IOCP operates and its key issues.

Objectives

I focused on:

  1. More than 65,000 concurrent connections (close to 65,535, the largest port number that fits in an unsigned short).
  2. The ability to transfer data packets larger than a thousand bytes over the network.
  3. A simple interface for users of the OIOCPNet class.

Key ideas to achieve the objectives

IOCP

Yeah, the first thing is IOCP. Why should we use it? With the well-known select function (and FD_SET, FD_ZERO, ...), we cannot avoid looping over the sockets to find which ones have events, that is, received data or room to send. Also, when we develop a game server or a chat server, the socket serves as the identifier of a user, so to find the user's data on the server we loop over a list or look it up in a hash table keyed by the socket number. Such loops slow the server down seriously once there are tens of thousands of users. With IOCP we need no such loops: the kernel detects socket events, and when a socket is associated with the completion port we can register a completion key, such as a pointer to the user's data, that comes back with every completed operation. In short, IOCP lets us avoid loops and reach the user data on the server side faster.
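To make the completion-key idea concrete, here is a toy, portable model in standard C++. It is not the Win32 API: ToyCompletionQueue, UserSession and handleOne are hypothetical names. The queue only imitates what CreateIoCompletionPort (which registers the key) and GetQueuedCompletionStatus (which hands it back) do for real sockets. The point is that the worker receives a pointer to the user's data directly instead of searching for it.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>

// Hypothetical per-connection state. With IOCP, a pointer like this is the
// completion key registered when the socket is associated with the port.
struct UserSession {
    std::string name;
    std::uint64_t bytesReceived = 0;
};

// One dequeued completion: which session (key) and how many bytes.
struct Completion {
    UserSession* key;
    std::uint32_t bytesTransferred;
};

// Toy stand-in for a completion port (NOT the Win32 API): producers post
// completions, a worker blocks until one is available.
class ToyCompletionQueue {
public:
    void post(UserSession* key, std::uint32_t bytes) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(Completion{key, bytes});
        }
        ready_.notify_one();
    }

    Completion get() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        Completion c = queue_.front();
        queue_.pop_front();
        return c;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Completion> queue_;
};

// Worker step: the key leads straight to the session; no search over sockets.
void handleOne(ToyCompletionQueue& port) {
    Completion c = port.get();
    assert(c.key != nullptr);
    c.key->bytesReceived += c.bytesTransferred;
}
```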

AcceptEx

With accept (or WSAAccept), we get the WSAENOBUFS (10055) error when more than about 30,000 connections arrive almost at once (the exact figure depends on system resources). The reason is that the system cannot prepare the resources for new socket structures as fast as the connections are made. So we need a way to create socket resources before we use them, and AcceptEx is the answer. Its main advantage is just this: the accepting socket is prepared before the connection arrives! AcceptEx has other features as well, such as receiving the first block of data together with the connection and returning the local and remote addresses in the same buffer, but they are awkward to use (see the MSDN Library).
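A minimal, portable sketch of the "prepare before use" idea, assuming nothing about Winsock: AcceptSlotPool and AcceptSlot are hypothetical stand-ins for pre-created sockets on which AcceptEx has been posted. Every slot exists from startup. An incoming connection only switches a slot's state, and a closed connection re-arms the same slot rather than freeing it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class SlotState { PendingAccept, Connected };

// Hypothetical stand-in for a pre-created socket plus its buffers.
struct AcceptSlot {
    std::size_t index;
    SlotState state = SlotState::PendingAccept;
};

// All slots are created up front and wait in the pending-accept state,
// so nothing is allocated when a connection actually arrives.
class AcceptSlotPool {
public:
    explicit AcceptSlotPool(std::size_t count) : slots_(count) {
        for (std::size_t i = 0; i < count; ++i) {
            slots_[i].index = i;
            pending_.push_back(&slots_[i]);
        }
    }

    // A connection completed on one of the pending slots.
    AcceptSlot* onConnect() {
        if (pending_.empty()) return nullptr;  // every pre-created slot is in use
        AcceptSlot* slot = pending_.back();
        pending_.pop_back();
        slot->state = SlotState::Connected;
        return slot;
    }

    // The peer closed: re-arm the same slot instead of releasing it.
    void onClose(AcceptSlot* slot) {
        assert(slot->state == SlotState::Connected);
        slot->state = SlotState::PendingAccept;
        pending_.push_back(slot);
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::vector<AcceptSlot> slots_;     // never resized, so pointers stay valid
    std::vector<AcceptSlot*> pending_;  // slots waiting for a connection
};
```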

Static memory

Using static (pre-allocated) memory is natural, even essential, in server-side applications. The buffers used to receive and send packets should come from memory allocated in advance rather than being allocated for each operation. In OIOCPNet, I use my own class (OPreAllocator) to get the pre-allocated memory area.
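OPreAllocator's source is not shown in this article, so the following is only a sketch of the general technique, a fixed-size block pool with a free list. FixedBlockPool and its methods are hypothetical names. All memory is reserved once at startup, and allocate/release only move pointers on and off the free list under a lock.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical fixed-size block pool: one allocation at startup, then
// O(1) allocate/release from a free list. It never grows.
class FixedBlockPool {
public:
    FixedBlockPool(std::size_t blockSize, std::size_t blockCount)
        : blockSize_(blockSize), storage_(blockSize * blockCount) {
        assert(blockSize > 0);
        freeList_.reserve(blockCount);
        for (std::size_t i = 0; i < blockCount; ++i)
            freeList_.push_back(storage_.data() + i * blockSize);
    }

    // Returns nullptr when the pool is exhausted.
    unsigned char* allocate() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (freeList_.empty()) return nullptr;
        unsigned char* block = freeList_.back();
        freeList_.pop_back();
        return block;
    }

    void release(unsigned char* block) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(owns(block));
        freeList_.push_back(block);
    }

    std::size_t available() {
        std::lock_guard<std::mutex> lock(mutex_);
        return freeList_.size();
    }

private:
    // True if p is the start of one of this pool's blocks.
    bool owns(const unsigned char* p) const {
        const unsigned char* begin = storage_.data();
        const unsigned char* end = begin + storage_.size();
        std::less<const unsigned char*> before;
        if (before(p, begin) || !before(p, end)) return false;
        return static_cast<std::size_t>(p - begin) % blockSize_ == 0;
    }

    std::size_t blockSize_;
    std::vector<unsigned char> storage_;    // the pre-allocated area
    std::vector<unsigned char*> freeList_;  // blocks not currently in use
    std::mutex mutex_;
};
```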

Sliced data chunk

Have you ever been in a situation where you sent a large data packet (more than a thousand bytes) with one function call (like WriteFile, WSASend or send), and the receiver didn't get the data packet as you had sent it? If so, you may have run into the limits of network hardware (routers, hubs, and so on) and their buffers, that is, the MTU (Maximum Transmission Unit). Every IPv4 host must be able to accept datagrams of up to 576 bytes, so it is safer to slice a large packet into smaller packets below that size. In OIOCPNet, I have defined the unit data block size as BUFFER_UNIT_SIZE (512 bytes). If you need a bigger one, you can change it.
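The slicing step itself is simple. Here is a minimal sketch, assuming a byte vector as the message type. sliceIntoUnits is a hypothetical helper, not OIOCPNet's code; it cuts a message into pieces of at most BUFFER_UNIT_SIZE bytes, each of which would then go out in its own send call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same value as OIOCPNet's unit block size; change it if you need bigger chunks.
constexpr std::size_t BUFFER_UNIT_SIZE = 512;

// Split one message into chunks of at most unitSize bytes, preserving order.
std::vector<std::vector<std::uint8_t>> sliceIntoUnits(
        const std::vector<std::uint8_t>& data,
        std::size_t unitSize = BUFFER_UNIT_SIZE) {
    assert(unitSize > 0);
    std::vector<std::vector<std::uint8_t>> chunks;
    for (std::size_t offset = 0; offset < data.size(); offset += unitSize) {
        std::size_t len = std::min(unitSize, data.size() - offset);
        auto first = data.begin() + static_cast<std::ptrdiff_t>(offset);
        chunks.emplace_back(first, first + static_cast<std::ptrdiff_t>(len));
    }
    return chunks;
}
```

For example, a 1,300-byte message becomes three chunks of 512, 512 and 276 bytes.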

Don't spawn many threads

If your server logic performs blocking I/O operations, it may be better to spawn many threads, because extra threads pay off mainly when threads spend time waiting on I/O. But don't forget: the more threads, the more CPU effort goes into thread scheduling. If more than 10,000 threads are running, the operating system and the processes can no longer run normally, because the CPU spends nearly all its capacity deciding which thread runs next (scheduling and context switching). For reference, OIOCPNet uses two threads per CPU (an experimentally chosen value) and never spawns more.
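A sketch of the "fixed number of threads" rule in portable C++. workerThreadCount applies the two-threads-per-CPU figure, and runFixedPool (a hypothetical helper) starts that many workers once and lets them share the work, instead of creating a thread per client. std::thread::hardware_concurrency may return 0 when the CPU count is unknown, so the sketch falls back to one CPU.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Two worker threads per CPU, decided once at startup.
unsigned workerThreadCount(unsigned threadsPerCpu = 2) {
    unsigned cpus = std::thread::hardware_concurrency();
    if (cpus == 0) cpus = 1;  // CPU count unknown: assume one
    return cpus * threadsPerCpu;
}

// Start the fixed pool once; the workers pull jobs from a shared counter
// (a stand-in for dequeuing completions) until none are left.
unsigned runFixedPool(unsigned jobs) {
    std::atomic<unsigned> next{0};
    std::atomic<unsigned> done{0};
    std::vector<std::thread> workers;
    const unsigned n = workerThreadCount();
    for (unsigned i = 0; i < n; ++i) {
        workers.emplace_back([&] {
            while (next.fetch_add(1) < jobs)
                done.fetch_add(1);
        });
    }
    for (auto& t : workers) t.join();
    return done.load();
}
```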

OIOCPNet - the Key

OIOCPNet is the class that applies the ideas above. It operates in the following steps:

  1. OIOCPNet prepares its resources like pre-allocated memory area, completion port, other handles and so on.
  2. OIOCPNet makes a listening socket.
  3. OIOCPNet pre-creates sockets and its own buffered sockets, and then puts them into the accepting state with AcceptEx. The target is 65,000 sockets, but MAX_ACCEPTABLE_SOCKET_NUM in IOCPNet.h is set to 30,000 for operating systems other than Windows Server 2003; change it to suit your needs.
  4. When a user tries to connect to the server, OIOCPNet accepts it.
  5. When data packets arrive on a socket, OIOCPNet copies them into its pre-allocated reading slots and then posts an event for the server logic.
  6. When the server logic writes data packets, OIOCPNet copies them into its pre-allocated writing blocks and then calls PostQueuedCompletionStatus so that a worker thread sends them.
  7. When a user closes the connection, OIOCPNet closes the socket but does not release the memory of its buffered socket; it just reassigns it.

The following picture shows the entire mechanism of OIOCPNet. It is very simple:

[Figure: the overall mechanism of OIOCPNet]

Key points when writing the code

LPOVERLAPPED parameter

GetQueuedCompletionStatus and PostQueuedCompletionStatus report only the number of bytes transferred, the completion key, and the OVERLAPPED pointer; none of them tells the worker thread which kind of I/O operation has completed. OIOCPNet needs to classify the type of I/O operation and carry a little additional information. So I use the LPOVERLAPPED parameter of GetQueuedCompletionStatus and PostQueuedCompletionStatus as my custom parameter, much like the thread parameter (LPVOID lpParameter, the fourth parameter) of CreateThread. OVERLAPPEDExt extends the OVERLAPPED structure with this information. Because the OVERLAPPED member comes first, the pointer returned by GetQueuedCompletionStatus can be cast back to OVERLAPPEDExt *. See the definition below:

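```cpp
struct OVERLAPPEDExt
{
  OVERLAPPED OL;                        // must be first: an LPOVERLAPPED is cast back to OVERLAPPEDExt *
  int IOType;                           // kind of I/O that completed (IO_TYPE_WRITE, ...)
  OBufferedSocket *pBuffSock;           // connection this operation belongs to
  OTemporaryWriteData *pTempWriteData;  // send buffer to free when a write completes
}; // OVERLAPPEDExt
```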
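Why must OL be the first member? The worker thread receives only the OVERLAPPED pointer and casts it back. The following portable sketch replaces OVERLAPPED with a hypothetical FakeOverlapped so it compiles anywhere; the IO_TYPE_READ value is also made up for illustration. The static_asserts check the layout assumption that makes the cast valid. On Windows, the CONTAINING_RECORD macro does the same job without requiring the member to come first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the Win32 OVERLAPPED structure.
struct FakeOverlapped {
    std::uintptr_t internal[4];
};

// Hypothetical I/O type codes (OIOCPNet defines its own IO_TYPE_* values).
enum IoType { IO_TYPE_READ = 1, IO_TYPE_WRITE = 2 };

struct OverlappedExt {
    FakeOverlapped OL;  // first member, so &ext.OL and &ext share an address
    int IOType;
    void* context;      // e.g. the connection or the send buffer
};

static_assert(std::is_standard_layout<OverlappedExt>::value,
              "the cast below relies on a standard-layout type");
static_assert(offsetof(OverlappedExt, OL) == 0, "OL must sit at offset 0");

// What a worker does with the pointer handed back by the completion queue.
int ioTypeOf(FakeOverlapped* pOVL) {
    assert(pOVL != nullptr);
    return reinterpret_cast<OverlappedExt*>(pOVL)->IOType;
}
```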

Lifetime of variables used by asynchronous functions

In OIOCPNet, WSASend and WSARecv are called in overlapped (asynchronous) mode, so take care of the lifetime of the variables passed to them.

```cpp
// pTempWriteData will be freed when the send I/O completes.
pTempWriteData = (OTemporaryWriteData *)
  m_SMMTempWriteData.Allocate(sizeof (OTemporaryWriteData));

...

// The size of pData 
// (the second parameter of GetBlockNeedsExternalLock)
// never exceeds BUFFER_UNIT_SIZE.
m_pWriteBlock->GetBlockNeedsExternalLock
  (&pBuffSockToWrite, pTempWriteData->Data, 
  &ReadSizeToWrite, &DoesItHaveMoreSequence);

...

try
{
  // Overlapped send: WSASend may return before the data is sent, so
  // pTempWriteData (buffer, WSABUF and OVERLAPPEDExt) must outlive the call.
  ResSend = WSASend(pTempWriteData->Socket, 
    &pTempWriteData->DataBuf, 1, 
    &WrittenSizeUseless, Flag, 
    (LPOVERLAPPED)&pTempWriteData->OLExt, 0);
}
// ... (catch block omitted)
```

In the snippet above, pTempWriteData is allocated for use by WSASend. WSASend returns immediately, but pTempWriteData (which holds the data buffer, the WSABUF and the OVERLAPPEDExt) must stay alive until the OS has actually finished the send operation. When the completion is dequeued in the worker thread, release pTempWriteData like this:

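```cpp
// In the worker thread, after GetQueuedCompletionStatus has returned pOVL:
if (0 != pOVL)
{
  if ((IO_TYPE_WRITE_LAST == 
    ((OVERLAPPEDExt *)pOVL)->IOType 
    || IO_TYPE_WRITE == 
    ((OVERLAPPEDExt *)pOVL)->IOType))
  {
    // The send has finished at the OS level, so the buffer can be returned.
    if (0 != ((OVERLAPPEDExt *)pOVL)->pTempWriteData)
    {
      m_SMMTempWriteData.Free(
        ((OVERLAPPEDExt *)pOVL)->pTempWriteData);
    }
    continue;
  }
}
```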
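The same ownership rule can be sketched portably. Here FakeAsyncSocket (hypothetical) merely records send requests and completes them later, standing in for the OS, and WriteContext plays the role of OTemporaryWriteData. Ownership leaves the caller through release() when the send is issued and is reclaimed only in the completion handler. That is exactly the window during which the buffer must stay alive.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <string>

// Hypothetical per-send context: owns the bytes the "OS" reads while in flight.
struct WriteContext {
    std::string data;
};

// Simulated asynchronous socket: sendAsync only records the request and
// returns at once; completeOne finishes the oldest request later.
class FakeAsyncSocket {
public:
    void sendAsync(WriteContext* ctx) { inFlight_.push_back(ctx); }

    WriteContext* completeOne() {
        if (inFlight_.empty()) return nullptr;
        WriteContext* ctx = inFlight_.front();
        inFlight_.pop_front();
        sentBytes_ += ctx->data.size();  // the buffer must still be valid here
        return ctx;
    }

    std::size_t sentBytes() const { return sentBytes_; }
    std::size_t pending() const { return inFlight_.size(); }

private:
    std::deque<WriteContext*> inFlight_;
    std::size_t sentBytes_ = 0;
};

void queueSend(FakeAsyncSocket& sock, const std::string& payload) {
    // Heap-allocate: a local variable would die before the send completes.
    auto ctx = std::make_unique<WriteContext>();
    ctx->data = payload;
    sock.sendAsync(ctx.release());  // ownership passes to the in-flight operation
}

void onSendCompleted(FakeAsyncSocket& sock) {
    // Take ownership back only once the operation has finished.
    std::unique_ptr<WriteContext> ctx(sock.completeOne());
    assert(ctx != nullptr);
}  // ctx is freed here, like m_SMMTempWriteData.Free in the snippet above
```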

Uniqueness of sockets

A SOCKET value is unique only while the socket is open. The OS assigns socket values arbitrarily, so the value of a socket that has just been closed can be reassigned to the very next new connection. The following sequence is therefore possible:

  1. A new connection is assigned the socket number 3947 (as an example).
  2. The server logic reads data packets from that socket.
  3. The user suddenly closes the connection, and the server logic does not yet know about it.
  4. A different connection is assigned the same socket number 3947 (the resurrection of that socket number).
  5. The server logic writes data packets to socket 3947 without any error, but the data packets may be sent to a different user.

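To prevent this troublesome situation, OIOCPNet manages its own connection number, SocketUnique, a member of OBufferedSocket. The server logic receives SocketUnique from GetSocketEventData and passes it back to WriteData together with pBuffSock.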
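How OIOCPNet generates SocketUnique is not shown in this article, so the following is a sketch under that caveat. A counter hands out a new ID for every accepted connection, and writeData refuses to write when the caller's ID no longer matches the live connection. BufferedSocket, UniqueIdSource, onAccepted, onClosed and writeData are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical analogue of OBufferedSocket.
struct BufferedSocket {
    std::uint64_t handle = 0;  // OS socket value: may be reused after close
    std::uint32_t unique = 0;  // our own ID: 0 means "no live connection"
    std::vector<std::string> outbox;
};

// Hands out IDs from a counter, skipping 0 when it wraps around.
class UniqueIdSource {
public:
    std::uint32_t next() {
        if (++counter_ == 0) ++counter_;
        return counter_;
    }
private:
    std::uint32_t counter_ = 0;
};

void onAccepted(BufferedSocket& s, std::uint64_t handle, UniqueIdSource& ids) {
    assert(s.unique == 0);  // the slot must be free before it is reused
    s.handle = handle;
    s.unique = ids.next();
}

void onClosed(BufferedSocket& s) { s.unique = 0; }

// Writes only if the caller's ID still names the live connection.
bool writeData(BufferedSocket& s, std::uint32_t expectedUnique,
               const std::string& data) {
    if (s.unique == 0 || s.unique != expectedUnique) return false;
    s.outbox.push_back(data);
    return true;
}
```

Replaying the scenario above: after socket 3947 is closed and reassigned, a write carrying the old ID is rejected instead of reaching the new user.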

How to use OIOCPNet

Usage

The usage of OIOCPNet is simple. See the following code snippet:

```cpp
int _tmain(int argc, _TCHAR* argv[])
{
  ...

  WSAStartup(MAKEWORD(2,2), &WSAData);

  pIOCPNet = new OIOCPNet(&EL);
  pIOCPNet->Start(TEST_IP, TEST_PORT);

  hThread = CreateThread(0, 0, LogicThread, 
    pIOCPNet, 0, 0);

  ...

  // Tell the logic thread to stop, then wait for it to finish.
  InterlockedExchange((long *)&g_dRunning, 0);
  WaitForSingleObject(hThread, INFINITE);

  ...

  pIOCPNet->Stop();
  delete pIOCPNet;

  WSACleanup();

  return 0;
} // _tmain()

DWORD WINAPI LogicThread(void *pParam)
{
  ...

  // Atomic read of g_dRunning: a compare-exchange of 0 with 0 never
  // modifies the flag (exchanging it with its own value could write a
  // stale 1 back after _tmain has cleared it).
  while (1 == InterlockedCompareExchange((long *)&g_dRunning, 0, 0))
  {
    iRes = pIOCPNet->GetSocketEventData(WAIT_TIMEOUT_TEST,
      &EventType, &SocketUnique, &pReadData, 
      &ReadSize, &pBuffSock, &pSlot, &pCustData);
    if ...
    else if (RET_SOCKET_CLOSED == iRes)
    {
      // release pCustData.
      continue;
    }

    // Process main logic.
    MainLogic(pIOCPNet, SocketUnique, pBuffSock, 
      pReadData, ReadSize);

    // pReadData must not be used after this call.
    pIOCPNet->ReleaseSocketEvent(pSlot);
  }

  return 0;
} // LogicThread()

void MainLogic(OIOCPNet *pIOCPNet, DWORD SocketUnique,
  OBufferedSocket *pBuffSock, BYTE *pReadData, DWORD ReadSize)
{
  pIOCPNet->WriteData(SocketUnique, pBuffSock, 
    pReadData, ReadSize); // echo.
} // MainLogic()
```

Start takes the IP address and port number and prepares the necessary resources. In the logic thread, GetSocketEventData gets incoming data packets and WriteData sends data packets. After using the data, pass pSlot, the slot that holds the buffer pReadData points to, to ReleaseSocketEvent. Finally, when the main logic ends, call Stop on the OIOCPNet object to release its resources. That's all.
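The slot hand-off between the I/O side and the logic thread can be pictured with a small, portable sketch. EventQueue, ReadSlot, push, get and release are hypothetical names that only loosely correspond to the I/O threads, GetSocketEventData and ReleaseSocketEvent; the real class also reports event types and custom data. A fixed number of slots is created up front, and a slot returns to the free list only when the logic thread releases it, which is why pReadData must not be used after ReleaseSocketEvent.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical reading slot: one received packet waiting for the logic thread.
struct ReadSlot {
    std::uint32_t socketUnique = 0;
    std::vector<std::uint8_t> data;
};

class EventQueue {
public:
    explicit EventQueue(std::size_t slotCount) : slots_(slotCount) {
        for (auto& s : slots_) free_.push_back(&s);
    }

    // I/O side: copy received bytes into a free slot and wake the logic thread.
    bool push(std::uint32_t socketUnique, const std::uint8_t* p, std::size_t n) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_.empty()) return false;  // all slots busy: caller must back off
        ReadSlot* slot = free_.back();
        free_.pop_back();
        slot->socketUnique = socketUnique;
        slot->data.assign(p, p + n);
        events_.push_back(slot);
        cv_.notify_one();
        return true;
    }

    // Logic side: wait up to 'timeout' for the next event; nullptr on timeout.
    ReadSlot* get(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!cv_.wait_for(lock, timeout, [this] { return !events_.empty(); }))
            return nullptr;
        ReadSlot* slot = events_.front();
        events_.pop_front();
        return slot;
    }

    // Logic side: the slot's data must not be touched after this call.
    void release(ReadSlot* slot) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(slot);
    }

private:
    std::vector<ReadSlot> slots_;   // created once, never resized
    std::vector<ReadSlot*> free_;   // slots available to the I/O side
    std::deque<ReadSlot*> events_;  // filled slots waiting for the logic thread
    std::mutex mutex_;
    std::condition_variable cv_;
};
```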

Take care when reading and writing on the client side

OIOCPNet slices a large data packet into smaller packets and adds 4 bytes of packet-length information to the original data packet. The slicing and reassembly are hidden behind GetSocketEventData and WriteData, so the server logic need not care about them. On the client side, however, you should use TCPWrite and TCPRead (see TCPFunc.h and TCPFunc.cpp in the NetTestClient project) so that the client application speaks the same format when it connects to the server.
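TCPFunc.cpp is not reproduced here, so the exact header layout (byte order, whether the length counts the header itself) is an assumption in this sketch: a 4-byte little-endian payload length followed by the payload. frame and Reassembler are hypothetical names. The reassembler accepts bytes in whatever pieces a TCP receive returns and emits only complete messages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Prefix the payload with its length as 4 little-endian bytes (assumed layout).
std::vector<std::uint8_t> frame(const std::vector<std::uint8_t>& payload) {
    std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    std::vector<std::uint8_t> out;
    out.reserve(4 + payload.size());
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>(len >> (8 * i)));
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Rebuilds messages from bytes that arrive in arbitrary pieces.
class Reassembler {
public:
    // Feed received bytes; every completed message is appended to 'out'.
    void feed(const std::uint8_t* p, std::size_t n,
              std::vector<std::vector<std::uint8_t>>& out) {
        buffer_.insert(buffer_.end(), p, p + n);
        for (;;) {
            if (buffer_.size() < 4) return;  // header not complete yet
            std::uint32_t len = 0;
            for (int i = 0; i < 4; ++i)
                len |= static_cast<std::uint32_t>(buffer_[i]) << (8 * i);
            if (buffer_.size() - 4 < len) return;  // payload not complete yet
            out.emplace_back(buffer_.begin() + 4, buffer_.begin() + 4 + len);
            buffer_.erase(buffer_.begin(), buffer_.begin() + 4 + len);
        }
    }

private:
    std::vector<std::uint8_t> buffer_;  // bytes received but not yet consumed
};
```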

Test

My report

I compiled OIOCPNet in the .NET 1.1 environment (it also builds with VC++ 6.0 if you comment out #include "stdafx.h"). I ran the server (IOCPNetTest) on Windows Server 2003 Enterprise Edition and the test clients (NetTestClient) on several machines. The specifications and performance results:

  • Test server OS: Windows Server 2003 Enterprise Edition
  • Test server CPU: Intel 2.8 GHz (x 2)
  • Test server RAM: 2 GB
  • Test clients: Windows XP (3 to 5 machines, with varying numbers of threads)
  • Result: about 15% to 20% CPU usage with 65,000 established TCP connections

Other tips

If a client cannot open more than roughly 2,000 to 5,000 connections to the server, check the registry:

  1. Run regedit
  2. Open 'HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters'
  3. Add 'MaxUserPort' as a DWORD value and set it (the maximum value is 65534 in decimal).

If your test client needs more than about 2,000 threads, reduce the thread stack size with the linker option '/STACK:reserve[,commit]' or the stack-size parameter of CreateThread (by default each thread reserves 1 MB of stack, so a 32-bit process runs out of address space at around 2,000 threads). Before you run the test server and test client, set TEST_IP and TEST_SERVER_IP to the IP address of your server. To see the number of connections, use Performance Monitor or 'netstat -s' at a command prompt.

History

  • August, 2005
    • IOCPNet first version.
    • Fixed a bug during the ending process.
    • Added a new demo and source code that use the Windows thread pool, in response to requests for a sample that uses BindIoCompletionCallback.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
