Introduction
There are many articles on IOCP (Input/Output Completion Port), but they are not easy to understand, because the IOCP technique itself has some arcane aspects and there is no standard documentation with enough explanation or code samples. So I decided to write a high-performance IOCP sample (OIOCPNet) and a document that explains how IOCP operates and the key issues around it.
Objectives
I focused on:
- More than 65,000 concurrent connections (65,535, the maximum value of an unsigned short, is the largest port number in IP version 4).
- The ability to transfer more than a thousand bytes through the network.
- An easy interface for users of the OIOCPNet class.
Key ideas to achieve the objectives
IOCP
Yeah, the first thing is IOCP. Well, why should we use IOCP? If we use the well-known select function (with FD_SET, FD_ZERO, ...), we can't avoid looping over the sockets to detect socket events, that is, to find out which sockets have received or sent data packets. And when we develop a game server or a chat server, a socket serves as the ID of a user's action, so to find the user data on the server we use a search loop or a hash table keyed by the socket number. These loops seriously slow the server down once the number of users exceeds tens of thousands. With IOCP we don't need them, because IOCP detects socket events at the kernel level and provides a mechanism to associate a socket directly with a user data pointer (the completion key given when the socket is bound to the completion port). In short, with IOCP we can avoid the loops and get to the user data on the server side faster.
AcceptEx
With accept (or WSAAccept) we get the WSAENOBUFS (10055) error when the number of (almost) concurrent connections exceeds about 30,000 (the exact number depends on the system resources). The error occurs because the system can't prepare the resources for socket structures as fast as the connections arrive. So we need a way to create socket resources before we use them, and AcceptEx is the answer. The main advantage of AcceptEx is just this: preparing sockets before use! The other features of AcceptEx are fiddly and hard to understand. (See the MSDN Library.)
Static memory
Using static (pre-allocated) memory in server-side applications is natural and crucial: allocating from the heap for every packet costs time and fragments memory, so the buffers used to receive or send packets should come from static memory. In OIOCPNet, I use my own class (OPreAllocator) to get the pre-allocated memory area.
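The idea of pre-allocated memory can be illustrated with a small fixed-size block pool. This is only a sketch of the general technique, not the real OPreAllocator, whose implementation is not shown in this article; the class name FixedBlockPool and all of its members are hypothetical. It is not thread-safe, so a real server would have to add locking around Allocate and Free.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size block pool (NOT the real OPreAllocator).
// All memory is reserved once in the constructor; Allocate and Free
// only move pointers on and off a free list, so no heap call happens
// while packets are being processed.
class FixedBlockPool
{
public:
    FixedBlockPool(std::size_t blockSize, std::size_t blockCount)
        : m_blockSize(blockSize),
          m_storage(blockSize * blockCount)
    {
        assert(blockSize > 0 && blockCount > 0);
        m_free.reserve(blockCount);
        // Push in reverse order so that the first Allocate returns block 0.
        for (std::size_t i = blockCount; i > 0; --i)
            m_free.push_back(m_storage.data() + (i - 1) * blockSize);
    }

    // The free list stores pointers into m_storage, so copying is disabled.
    FixedBlockPool(const FixedBlockPool &) = delete;
    FixedBlockPool &operator=(const FixedBlockPool &) = delete;

    // Returns one block, or nullptr when the pool is exhausted.
    unsigned char *Allocate()
    {
        if (m_free.empty())
            return nullptr;
        unsigned char *p = m_free.back();
        m_free.pop_back();
        return p;
    }

    // Returns a block previously handed out by Allocate.
    void Free(unsigned char *p)
    {
        assert(p != nullptr);
        m_free.push_back(p);
    }

    std::size_t BlockSize() const { return m_blockSize; }
    std::size_t FreeCount() const { return m_free.size(); }

private:
    std::size_t m_blockSize;
    std::vector<unsigned char> m_storage;  // one big allocation, made once
    std::vector<unsigned char *> m_free;   // blocks currently available
};
```

A pool created as FixedBlockPool pool(512, 2) hands out two 512-byte blocks, returns nullptr on the third request, and hands a block out again once it has been freed.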
Sliced data chunk
Have you ever had to send a large data packet (more than a thousand bytes) with one function call (like WriteFile, WSASend or send), only to find that the receiver didn't get the data packet you had sent? If so, you may have run into the buffer limit of the network hardware (routers, hubs, and so on): the MTU (Maximum Transmission Unit). Every IPv4 host must be able to accept a datagram of 576 bytes, so I treat that as the smallest safe size and slice a large packet into many smaller packets below it. In OIOCPNet, I have defined the unit data block size as BUFFER_UNIT_SIZE (512 bytes). If you need a bigger one, you can change it.
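As a rough illustration of the slicing idea, the sketch below cuts a buffer into pieces of at most BUFFER_UNIT_SIZE bytes. Only the constant name and its value of 512 come from this article; the helper SliceIntoChunks is hypothetical and is not how OIOCPNet is actually implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Value taken from the article; change it if you need bigger blocks.
const std::size_t BUFFER_UNIT_SIZE = 512;

typedef std::vector<unsigned char> ByteBlock;

// Hypothetical helper: cuts 'data' into consecutive blocks of at most
// 'unitSize' bytes. Only the last block may be shorter than 'unitSize'.
std::vector<ByteBlock> SliceIntoChunks(const ByteBlock &data,
                                       std::size_t unitSize = BUFFER_UNIT_SIZE)
{
    assert(unitSize > 0);
    std::vector<ByteBlock> chunks;
    for (std::size_t offset = 0; offset < data.size(); offset += unitSize)
    {
        const std::size_t len = std::min(unitSize, data.size() - offset);
        const std::ptrdiff_t first = static_cast<std::ptrdiff_t>(offset);
        const std::ptrdiff_t last = static_cast<std::ptrdiff_t>(offset + len);
        chunks.emplace_back(data.begin() + first, data.begin() + last);
    }
    return chunks;
}
```

With the default unit size, a 1,300-byte buffer becomes three blocks of 512, 512 and 276 bytes, and an empty buffer yields no blocks at all.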
Don't spawn many threads
If your server logic performs some kind of IO operation, it may be better to spawn more threads, because extra threads only pay off while other threads are waiting on IO. But don't forget: the more threads there are, the more effort the CPU spends on thread scheduling. If more than 10,000 threads are running, the operating system and its processes can't stay in their normal running state, because the CPU pours all its capacity into deciding which thread runs next (scheduling, or context switching). For reference, OIOCPNet has two threads per CPU (a value found by experiment) and doesn't spawn any more.
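The "two threads per CPU" rule can be written down in a few lines. This is a portable sketch using std::thread::hardware_concurrency, which is an assumption for illustration; the article does not show how OIOCPNet counts CPUs, and the names THREADS_PER_CPU, WorkerThreadCount and DefaultWorkerThreadCount are hypothetical.

```cpp
#include <cassert>
#include <thread>

// The article's experimental value: two worker threads per CPU.
const unsigned THREADS_PER_CPU = 2;

// Hypothetical helper: size of a fixed worker pool for 'cpuCount' CPUs.
unsigned WorkerThreadCount(unsigned cpuCount)
{
    // hardware_concurrency() may return 0 when the count is unknown,
    // so fall back to a single CPU in that case.
    if (cpuCount == 0)
        cpuCount = 1;
    return cpuCount * THREADS_PER_CPU;
}

// Worker count for the machine this code runs on.
unsigned DefaultWorkerThreadCount()
{
    return WorkerThreadCount(std::thread::hardware_concurrency());
}
```

On the dual-CPU test server described later in this article, WorkerThreadCount(2) gives four worker threads, no matter how many connections are open.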
OIOCPNet - the Key
OIOCPNet is the class that applies the above ideas. Its operation steps are the following:
- OIOCPNet prepares its resources, such as the pre-allocated memory area, the completion port, other handles and so on.
- OIOCPNet creates a listening socket.
- OIOCPNet pre-generates sockets and its own buffered sockets, and then puts them into the acceptable state by using AcceptEx. (The target is 65,000 sockets, but I defined MAX_ACCEPTABLE_SOCKET_NUM as 30,000 in IOCPNet.h for operating systems other than Windows 2003; change it depending on your needs.)
- When a user tries to connect to the server, OIOCPNet accepts the connection.
- When a socket reads data packets, OIOCPNet puts them into its pre-allocated reading slots and then posts an event for the server logic to consume.
- When the server logic writes data packets, OIOCPNet puts them into its pre-allocated writing blocks and then calls PostQueuedCompletionStatus so that a worker thread sends the data packets.
- When a user closes the connection, OIOCPNet closes the socket, but it doesn't release the memory of the buffered socket; it just re-assigns it.
The following picture shows the entire mechanism of OIOCPNet. It is very simple:
Key points when writing the code
LPOVERLAPPED parameter
GetQueuedCompletionStatus and PostQueuedCompletionStatus lack a parameter that describes the result of the IO operation. Besides the default parameters of GetQueuedCompletionStatus (or PostQueuedCompletionStatus), OIOCPNet needs more parameters to classify the type of IO operation and to carry a little additional information. So I used the LPOVERLAPPED parameter of GetQueuedCompletionStatus and PostQueuedCompletionStatus as my custom parameter, much like the thread parameter (LPVOID lpParameter, the fourth parameter) of CreateThread. OVERLAPPEDExt is an extended version of the OVERLAPPED structure that carries this extra information. See the definition code below:
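struct OVERLAPPEDExt
{
    OVERLAPPED OL;                        // must stay first so an LPOVERLAPPED can be cast back
    int IOType;                           // kind of IO operation (e.g. IO_TYPE_WRITE)
    OBufferedSocket *pBuffSock;           // buffered socket the operation belongs to
    OTemporaryWriteData *pTempWriteData;  // write buffer to free when the send completes
};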
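The trick works because the OVERLAPPED member comes first, so a pointer to it is also a pointer to the whole extended structure. The sketch below shows just that cast. To keep it compilable outside Windows it uses a stand-in type, FakeOverlapped, instead of the real OVERLAPPED, and every name in it (FakeOverlapped, OverlappedExtSketch, the SKETCH_IO_* values, IoTypeFromCompletion) is hypothetical rather than taken from OIOCPNet.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the Windows OVERLAPPED structure (assumption: only used
// so that this sketch compiles without <windows.h>).
struct FakeOverlapped
{
    unsigned long Internal;
    unsigned long InternalHigh;
    void *hEvent;
};

// Hypothetical IO type tags; the real values in OIOCPNet are not shown.
enum SketchIoType { SKETCH_IO_READ = 1, SKETCH_IO_WRITE = 2 };

// Extended structure: the overlapped part MUST be the first member.
struct OverlappedExtSketch
{
    FakeOverlapped OL;
    int IOType;
    void *pContext;  // any per-operation data
};

// Fails to compile if someone moves OL away from offset 0.
static_assert(offsetof(OverlappedExtSketch, OL) == 0,
              "OL must be the first member");

// What a worker thread does with the pointer it gets back from the
// completion port: cast it to the extended type and read the extras.
int IoTypeFromCompletion(FakeOverlapped *pOVL)
{
    assert(pOVL != nullptr);
    return reinterpret_cast<OverlappedExtSketch *>(pOVL)->IOType;
}
```

The caller fills in an OverlappedExtSketch, passes the address of its OL member to the asynchronous call, and recovers the IO type from that same address when the completion arrives.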
Lifetime of a variable used by an asynchronous function
In OIOCPNet, WSASend and WSARecv operate asynchronously. So take care of the lifetime of the variables passed to these asynchronous functions.
// Take the write buffer from the pre-allocated pool; it must outlive the WSASend call.
pTempWriteData = (OTemporaryWriteData *)
    m_SMMTempWriteData.Allocate(sizeof (OTemporaryWriteData));
...
m_pWriteBlock->GetBlockNeedsExternalLock
    (&pBuffSockToWrite, pTempWriteData->Data,
     &ReadSizeToWrite, &DoesItHaveMoreSequence);
...
try
{
    // Overlapped send: the call returns at once, the OS finishes the send later.
    ResSend = WSASend(pTempWriteData->Socket,
        &pTempWriteData->DataBuf, 1,
        &WrittenSizeUseless, Flag,
        (LPOVERLAPPED)&pTempWriteData->OLExt, 0);
}
In the above code snippet, pTempWriteData is allocated for use by WSASend. WSASend returns immediately, but pTempWriteData must stay alive until the real sending operation at the OS level is over. When the sending operation is over, release pTempWriteData like this:
if (0 != pOVL)
{
    if ((IO_TYPE_WRITE_LAST ==
            ((OVERLAPPEDExt *)pOVL)->IOType
        || IO_TYPE_WRITE ==
            ((OVERLAPPEDExt *)pOVL)->IOType))
    {
        if (0 != ((OVERLAPPEDExt *)pOVL)->pTempWriteData)
        {
            m_SMMTempWriteData.Free(
                ((OVERLAPPEDExt *)pOVL)->pTempWriteData);
        }
        continue;
    }
}
The uniqueness of socket
A SOCKET number is unique at any given moment. But the OS assigns socket numbers arbitrarily, so the number of the most recently closed socket can be re-assigned to the very next socket that connects. So the following could happen:
- A socket is assigned the socket number 3947 (as an example) for a new connection.
- The server logic reads data packets using the socket.
- The socket is closed suddenly because the user disconnects, while the server logic doesn't know about it.
- A different socket is assigned the same socket number 3947 (the resurrection of that socket number).
- The server logic writes data packets to the socket and meets no problem in doing so. But as a result the data packets might be sent to a different user.
To prevent this troublesome situation, OIOCPNet manages its own socket number, SocketUnique, a member of OBufferedSocket.
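One common way to get such a number is a counter that is bumped on every new connection and checked before every write. The sketch below shows that approach under stated assumptions: only the member name SocketUnique comes from this article, while BufferedSocketSketch, SocketTableSketch and their methods are hypothetical, and OIOCPNet may well generate and check its value differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, reduced stand-in for OBufferedSocket.
struct BufferedSocketSketch
{
    int RawSocket;               // OS socket number, may be reused by the OS
    std::uint32_t SocketUnique;  // our own number, new for every connection
    bool InUse;
};

// Hypothetical table of reusable buffered sockets.
class SocketTableSketch
{
public:
    explicit SocketTableSketch(std::size_t slots)
        : m_slots(slots), m_nextUnique(1)
    {
        for (std::size_t i = 0; i < m_slots.size(); ++i)
        {
            m_slots[i].RawSocket = -1;
            m_slots[i].SocketUnique = 0;
            m_slots[i].InUse = false;
        }
    }

    // Called when a connection is accepted on 'slot'.
    std::uint32_t Open(std::size_t slot, int rawSocket)
    {
        assert(slot < m_slots.size());
        if (m_nextUnique == 0)  // skip 0 after a wrap-around
            m_nextUnique = 1;
        m_slots[slot].RawSocket = rawSocket;
        m_slots[slot].SocketUnique = m_nextUnique++;
        m_slots[slot].InUse = true;
        return m_slots[slot].SocketUnique;
    }

    // Called when the connection on 'slot' is closed.
    void Close(std::size_t slot)
    {
        assert(slot < m_slots.size());
        m_slots[slot].InUse = false;
    }

    // True only if 'unique' still names the connection living on 'slot'.
    bool IsSameConnection(std::size_t slot, std::uint32_t unique) const
    {
        assert(slot < m_slots.size());
        return m_slots[slot].InUse && m_slots[slot].SocketUnique == unique;
    }

private:
    std::vector<BufferedSocketSketch> m_slots;
    std::uint32_t m_nextUnique;
};
```

If the server logic remembers the value returned by Open and checks IsSameConnection before writing, a write aimed at the old user of socket number 3947 is rejected even after the OS has handed 3947 to somebody else.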
How to use OIOCPNet
Usage
The usage of OIOCPNet is simple. See the following code snippet:
int _tmain(int argc, _TCHAR* argv[])
{
    ...
    WSAStartup(MAKEWORD(2,2), &WSAData);
    pIOCPNet = new OIOCPNet(&EL);
    pIOCPNet->Start(TEST_IP, TEST_PORT);
    hThread = CreateThread(0, 0, LogicThread,
        pIOCPNet, 0, 0);
    ...
    InterlockedExchange((long *)&g_dRunning, 0);
    WaitForSingleObject(hThread, INFINITE);
    ...
    pIOCPNet->Stop();
    delete pIOCPNet;
    WSACleanup();
    return 0;
}

DWORD WINAPI LogicThread(void *pParam)
{
    ...
    while (1 == InterlockedExchange((long *)&g_dRunning,
        g_dRunning))
    {
        iRes = pIOCPNet->GetSocketEventData(WAIT_TIMEOUT_TEST,
            &EventType, &SocketUnique, &pReadData,
            &ReadSize, &pBuffSock, &pSlot, &pCustData);
        if ...
        else if (RET_SOCKET_CLOSED == iRes)
        {
            continue;
        }
        MainLogic(pIOCPNet, SocketUnique, pBuffSock,
            pReadData, ReadSize);
        pIOCPNet->ReleaseSocketEvent(pSlot);
    }
    return 0;
}

void MainLogic(OIOCPNet *pIOCPNet, DWORD SocketUnique,
    OBufferedSocket *pBuffSock, BYTE *pReadData, DWORD ReadSize)
{
    pIOCPNet->WriteData(SocketUnique, pBuffSock,
        pReadData, ReadSize);
}
We set the IP address and port number with Start, which prepares the necessary resources. In the logic thread we get data packets with GetSocketEventData and send data packets with WriteData. After using the data, call ReleaseSocketEvent to release pSlot, which holds the pointer (pReadData) to the data packet. Finally, when the main logic ends, call Stop on the OIOCPNet object, which releases its resources. That's all.
Take care of read and write at client side
OIOCPNet slices a large data packet into smaller packets and adds 4 bytes of packet length information to the original data packet. On the server, the slicing and assembling is hidden behind GetSocketEventData and WriteData of OIOCPNet, so we need not care about it. On the client side, however, you should use TCPWrite and TCPRead (see TCPFunc.h and TCPFunc.cpp in the NetTestClient project) to communicate with OIOCPNet when your client application connects to the server.
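To show why the client has to cooperate, here is a minimal sketch of length-prefixed framing: the sender puts a 4-byte length in front of the payload, and the receiver collects bytes until a whole message has arrived, however the stream was cut up on the way. The exact wire format of OIOCPNet is not described in this article, so the little-endian byte order and the names FrameMessage and FrameReader are assumptions made for illustration; for real clients use TCPWrite and TCPRead from TCPFunc.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Hypothetical sender side: 4-byte length (little-endian, an assumption)
// followed by the payload.
Bytes FrameMessage(const Bytes &payload)
{
    assert(payload.size() <= 0xFFFFFFFFu);
    const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    Bytes framed;
    framed.reserve(4 + payload.size());
    for (int i = 0; i < 4; ++i)
        framed.push_back(static_cast<unsigned char>((len >> (8 * i)) & 0xFF));
    framed.insert(framed.end(), payload.begin(), payload.end());
    return framed;
}

// Hypothetical receiver side: accumulates whatever each read delivers
// and hands out a message only when all of its bytes are present.
class FrameReader
{
public:
    // Append the bytes returned by one read call.
    void Feed(const unsigned char *data, std::size_t size)
    {
        m_buffer.insert(m_buffer.end(), data, data + size);
    }

    // Extracts the next complete message into 'out'.
    // Returns false if more bytes are needed first.
    bool NextMessage(Bytes &out)
    {
        if (m_buffer.size() < 4)
            return false;
        std::uint32_t len = 0;
        for (int i = 0; i < 4; ++i)
            len |= static_cast<std::uint32_t>(m_buffer[i]) << (8 * i);
        if (m_buffer.size() - 4 < len)
            return false;
        const std::ptrdiff_t end = static_cast<std::ptrdiff_t>(4 + len);
        out.assign(m_buffer.begin() + 4, m_buffer.begin() + end);
        m_buffer.erase(m_buffer.begin(), m_buffer.begin() + end);
        return true;
    }

private:
    Bytes m_buffer;  // bytes received but not yet consumed
};
```

A 1,300-byte payload becomes 1,304 bytes on the wire. If the reader is fed only the first 512 of them, NextMessage returns false; after the rest has been fed, it returns the original payload in one piece.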
Test
My report
I compiled OIOCPNet in the .NET 1.1 environment (it also builds with VC++ 6.0 if you comment out #include "stdafx.h"). I ran the server (IOCPNetTest) on Windows 2003 Enterprise Edition and the test clients (NetTestClient) on several machines. The specification and performance result:
- Test Server - OS: Windows 2003 Enterprise Edition
- Test Server - CPU: Intel 2.8GHz (x 2)
- Test Server - RAM: 2GB
- Test Client: Windows XP (3~5 machines used, changing thread number)
- Result: about 15% to 20% CPU usage (with 65,000 established TCP connections)
Other tips
When a client can't open more than 5,000 (or as few as about 2,000) connections to the server, check the registry. The steps are:
- Run regedit
- Open 'HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters'
- Add 'MaxUserPort' as a DWORD value and set it (the maximum value is 65534 in decimal).
If you need to raise the thread count of your test client beyond roughly 2,000, reduce the thread stack size of the client application, using the linker option '/STACK:BYTE' or the stack size parameter of CreateThread. Before you run the test server and test client, set TEST_IP and TEST_SERVER_IP to the IP address of your server. To see the number of connections, use Performance Monitor or 'netstat -s' at the command prompt.
History
- August, 2005
- IOCPNet first version.
- Fixed a bug during the ending process.
- Added a new demo and src, using the Windows thread pool. (There have been some requests for a sample that uses BindIoCompletionCallback.)