Introduction
What if you have one or more modules, running independently in a system, that need to collaborate with each other? They will have to exchange data, organized as events, messages or packets of some sort. By sending and receiving this data, the two (or more) applications can process requests or identify the state their counterpart is in (for example, waiting for a command to arrive) and react accordingly. IPC (Inter-Process Communication) is one of the basic services that an operating system provides to the applications running on top of it.
IPC is already fully documented and presented in lots of articles around the Web. While this helps application developers a lot, the mechanisms are also known to surveillance applications and malware of every sort, which scan for the provided APIs/handles (or perform hooking) in order to detect and identify IPC elements. What if you want to use something less common? What if you need to communicate between a bunch of processes/modules without triggering immediate detection (by firewall applications, for example)?
In this small article I propose an alternative way to use an existing mechanism to perform IPC between various processes. The software does not rely on system calls to exchange the data, nor on any IPC mechanism officially recognized by Windows. This method should provide a reasonably hidden way to exchange data, because it does not create any system/named object that can be detected by using the OS APIs.
The offered prototype works on Windows 10, in both 32- and 64-bit versions, and could ideally be ported to other OSes, since it relies on a basic mechanism that is usually already included in the OS itself.
Background
The reader should be comfortable with the C programming language. Knowledge of the Windows subsystems, in particular memory and synchronization, is advised. I will keep the level of technical detail in the article low, leaving it to the reader to investigate further.
IPC in Windows
Windows already provides lots of ways to allow communication between processes that belong to different address spaces. Since memory is virtualized in modern OSes, each process lives in its own "universe", whose addresses can overlap with those of another one. Under the hood, the OS memory management subsystem performs all the necessary tricks and translations to map the same virtual address to a different physical region of memory, so that a process does not overwrite areas that are under the control of another process.
This means that the address 0x00007c00 can exist for two processes at the same time (and will contain different code/data). The two applications do not realize that the OS is mapping this address to a different physical location for each of them, and they do not (and should not) care, after all.
IPC allows data to cross such "universes" and pass from one side to the other. This operation is normally not possible, because a process does not see the universe (address space) of another process and needs someone with an extended view of the memory (the OS) to perform the operation for it.
Existing IPC mechanisms in Windows are:
- Clipboard
- COM
- Data Copy
- DDE
- File Mapping
- Mailslots
- Pipes
- RPC
- Windows Sockets
and every one of these methods needs a system-provided facility in order to perform the communication.
So how can we organize communication between two or more processes (and threads too) while avoiding the existing methods and avoiding calls to the system/subsystem? Remember, the application can't see outside its "observable universe".
DLLs
DLLs are Dynamic-Link Libraries, which provide common functionality that can be invoked by multiple processes at the same time. MSDN gives a good description of what they are and how they can be used. DLLs basically provide a way to export a module once and have it used by multiple other elements (usually applications and services) at the same time.
This was done to avoid replicating the same code in memory for every application that uses that functionality. For example, by default, an application automatically loads ntdll.dll and kernel32.dll at a specific address within its address space. This behavior cannot be changed without modifying the Windows Loader, and the Windows Loader cannot be easily modified (because it's a really delicate part of the OS), at least not by me. The DLL mechanism allows just a single copy of such routines to be kept in memory; otherwise they would end up consuming an enormous amount of memory.
Now, starting from this assumption, one could easily think that defining data inside a DLL gives a common pool to the applications that share that library, but this is not true. Data sections are, in fact, parts of the library that are private to every process that imports it. You don't want another process to mess with the state of your state machines while you are running your services, and data variables usually contain elements that are bound to one particular application only.
A dead end? Not at all!
A magic shared section
A shared section is a special marker for memory pages which indicates that they exist in multiple address spaces at the same time. Shared sections are those parts of a process address space that contain the same elements as one or more other address spaces: the virtual locations of those pages are mapped to the same physical pages (meaning that if the memory changes for one process, it changes for everyone else). This may recall the File Mapping IPC mechanism offered by the Windows API (and the technology behind it is the same), but how you achieve it is completely different (in fact, no CreateFileMapping/MapViewOfFile is invoked here).
What the AIPC DLL does is mark a particular set of variables to live within a section marked as shared. In Visual Studio and MSBuild, this is done using the following syntax:
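/* One IPC slot: a lock plus a circular buffer. */
typedef struct __AIPCEntry {
    volatile long lock;          /* AIPC_LOCKED or AIPC_UNLOCKED */
    int head;                    /* circular buffer index */
    int tail;                    /* circular buffer index */
    BYTE pool[AIPC_POOL_SIZE];   /* the data area itself */
} AIPCEntry;

/* Initialized variables defined between the two data_seg pragmas are */
/* placed in the ".aipcs" section instead of the default data section. */
#pragma data_seg(".aipcs")
AIPCEntry AIPCEntries[AIPC_NOF_POOLS] = { 0 };
/* R = readable, W = writable, S = shared by every process loading the DLL. */
#pragma comment(linker, "/section:.aipcs,RWS")
#pragma data_seg()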
The AIPCEntry structure (defined at the start of the source file) organizes the memory inside this region. Each entry of this array holds some indexes (to handle reads/writes with a circular buffer policy) and the data area itself (the pool variable). Notice that the area is statically allocated, so no calls to memory protection or memory allocation routines are made (this is a deliberate feature).
When the Windows Loader maps the DLL (through the LoadLibrary routine or through a reference to it in the application Import Table), it magically arranges everything for us within our address space. Again, no further system calls are necessary here, since everything is already taken into account by the system application loader.
Since it's shared, the section is now accessible by every application that imports the library.
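The code that uses head and tail is not shown in this article, so what follows is only a minimal sketch of a circular buffer policy, not the actual AIPC implementation. It assumes that head is the read index and tail is the write index, that one byte of the pool is always left free so that head == tail means "empty", and that a write which does not fit is rejected as a whole. Ring, POOL_SIZE, ring_used, ring_write and ring_read are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 16   /* hypothetical; stands in for AIPC_POOL_SIZE */

typedef struct {
    int head;                        /* assumed: index of the next byte to read  */
    int tail;                        /* assumed: index of the next byte to write */
    unsigned char pool[POOL_SIZE];
} Ring;

/* Number of bytes currently stored in the ring. */
int ring_used(const Ring *r)
{
    assert(r != NULL);
    return (r->tail - r->head + POOL_SIZE) % POOL_SIZE;
}

/* Appends len bytes, or nothing at all if they do not fit.
   Returns the number of bytes written (len or 0). */
int ring_write(Ring *r, const void *data, int len)
{
    const unsigned char *src = data;
    int i;

    /* One byte always stays free, so the capacity is POOL_SIZE - 1. */
    if (len < 0 || len > POOL_SIZE - 1 - ring_used(r))
        return 0;
    for (i = 0; i < len; i++) {
        r->pool[r->tail] = src[i];
        r->tail = (r->tail + 1) % POOL_SIZE;   /* wrap around at the end */
    }
    return len;
}

/* Consumes up to max bytes. Returns the number of bytes read. */
int ring_read(Ring *r, void *out, int max)
{
    unsigned char *dst = out;
    int n = ring_used(r);
    int i;

    if (max < 0)
        return 0;
    if (max < n)
        n = max;
    for (i = 0; i < n; i++) {
        dst[i] = r->pool[r->head];
        r->head = (r->head + 1) % POOL_SIZE;   /* wrap around at the end */
    }
    return n;
}
```

In this sketch a zero-initialized Ring is an empty buffer, which matches the { 0 } initializer used for the shared array. The functions are not thread-safe on their own; they are meant to be called only while the slot lock is held.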
Critical regions and concurrency
The problem is not solved just by exposing memory to applications. They have to coordinate access to these resources, or the data will end up being corrupted. This usually happens if two processes try to write at the same time, or if one is preempted (suspended) during the operation, leaving half of the data written and half still to write.
Luckily enough, Windows provides some primitive APIs that help you organize critical sections: the Interlocked* operations. Interlocked procedures provide a way to access shared variables safely. This is guaranteed to hold between concurrent Interlocked operations.
This means that, as long as you access a variable only through Interlocked calls, each operation will be atomic and will not mess up the state of the variable. This is more than enough for us to build a basic locking mechanism to access our resources!
AIPCRead and AIPCWrite, the only two procedures exported by the AIPC DLL, use the lock variable defined in the shared data region to synchronize access to the buffer used by the IPC mechanism. Any attempt to modify the data is guarded by:
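/* Try to swap the lock from AIPC_UNLOCKED to AIPC_LOCKED. The call returns */
/* the previous value, so AIPC_LOCKED means someone else holds the lock. */
while (InterlockedCompareExchange(
        &AIPCEntries[id].lock,
        AIPC_LOCKED,
        AIPC_UNLOCKED) == AIPC_LOCKED) {
    if (flags & AIPC_FLAGS_NOBLOCK) {
        return -ERROR_RETRY;    /* non-blocking: give up immediately */
    }
    Sleep(0);                   /* yield the rest of the time slice, then retry */
}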
This loop waits, by default, for the lock to be released before moving on. Note that if you pass 1 (the value of the AIPC_FLAGS_NOBLOCK macro) in the flags of the AIPC* routines, you are requesting a NOBLOCK operation, and execution returns immediately with the negative error code -ERROR_RETRY if the lock is held by someone else. Use non-blocking operations if you do not want your thread to give up its time slice while it waits.
At the end of every operation, the lock for the entry is released, giving other processes the ability to acquire the lock and check/modify the data:
InterlockedExchange(&AIPCEntries[id].lock, AIPC_UNLOCKED);
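For readers who want to experiment with the same acquire/release pattern outside Windows, here is a rough, portable analogue written with C11 atomics instead of the Interlocked* calls. It is a sketch under assumptions, not the AIPC code: slot_lock, slot_unlock, SLOT_LOCKED, SLOT_UNLOCKED and FLAG_NOBLOCK are hypothetical names, the error value is a plain -1 rather than -ERROR_RETRY, and the blocking path simply spins where the original calls Sleep(0).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SLOT_UNLOCKED 0
#define SLOT_LOCKED   1
#define FLAG_NOBLOCK  1   /* hypothetical; mirrors AIPC_FLAGS_NOBLOCK */

/* Returns 0 once the lock is held, or -1 if it is busy and
   FLAG_NOBLOCK was requested. */
int slot_lock(atomic_long *lock, int flags)
{
    long expected;

    assert(lock != NULL);
    for (;;) {
        expected = SLOT_UNLOCKED;
        /* Atomically: if *lock equals expected, store SLOT_LOCKED and
           report success; otherwise leave *lock untouched. */
        if (atomic_compare_exchange_strong(lock, &expected, SLOT_LOCKED))
            return 0;
        if (flags & FLAG_NOBLOCK)
            return -1;
        /* Busy: spin and retry. The original yields with Sleep(0) here. */
    }
}

/* Releases the lock so that another thread or process can take it. */
void slot_unlock(atomic_long *lock)
{
    assert(lock != NULL);
    atomic_store(lock, SLOT_UNLOCKED);
}
```

A caller would wrap every access to the buffer in a slot_lock/slot_unlock pair and, in non-blocking mode, treat the -1 result as "try again later".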
Testing the mechanism
AIPCTest is a small application written to test the functionality offered by the DLL. As you can see, the program is nothing serious and the tests it can perform are limited, but they are indicative of correct behavior. You can customize how the application works by modifying the code, for example by removing the wait before new data is written to the shared area, which is normally set to 1 second.
So comment this out to make the application run as fast as possible (but with blocking calls):
#define WAIT
If you want to test the non-blocking mechanism, set the FLAGS pre-processor definition to 1, like this:
#define FLAGS 1
The first test uses unmodified code, just to show that the IPC mechanism is working. Build the project and navigate to the Bin\Debug\Win32 folder (take into account that Debug and Win32 depend on how you decided to build the project). Shift + right-click and launch two PowerShell instances in the same folder in order to perform the experiment.
On the first console run the command:
.\AIPCTest.exe 1 0 1 10 0
You will see the instance running with id 1 (first argument) and waiting for something to happen. It will read from IPC slot 0 and write to IPC slot 1 for a total of 10 times, and will not start with a write operation (the last argument, 0).
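To make the meaning of the five positional arguments explicit, here is a hypothetical sketch of how they could be parsed. The AIPCTest source is not reproduced in this article, so TestArgs, parse_field and parse_test_args are invented names and the real utility may handle its command line differently.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical container for: <id> <read slot> <write slot> <count> <start> */
typedef struct {
    int id;                 /* instance id printed in the trace     */
    int read_slot;          /* IPC slot to read from                */
    int write_slot;         /* IPC slot to write to                 */
    int count;              /* number of exchanges to perform       */
    int start_with_write;   /* non-zero: begin with a write         */
} TestArgs;

/* Parses a non-negative decimal integer. Returns 0 on success, -1 otherwise. */
int parse_field(const char *text, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno != 0)
        return -1;                      /* empty, trailing junk or overflow */
    if (value < 0 || value > INT_MAX)
        return -1;
    *out = (int)value;
    return 0;
}

/* Fills args from argv. Returns 0 on success, -1 on a malformed command line. */
int parse_test_args(int argc, char **argv, TestArgs *args)
{
    assert(args != NULL);
    if (argc != 6)                      /* program name + five arguments */
        return -1;
    if (parse_field(argv[1], &args->id) != 0 ||
        parse_field(argv[2], &args->read_slot) != 0 ||
        parse_field(argv[3], &args->write_slot) != 0 ||
        parse_field(argv[4], &args->count) != 0 ||
        parse_field(argv[5], &args->start_with_write) != 0)
        return -1;
    return 0;
}
```

With this layout, the command line `1 0 1 10 0` reads as: id 1, read slot 0, write slot 1, 10 exchanges, do not start with a write.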
On the second console, run the following command:
.\AIPCTest.exe 2 1 0 10 1
This instance has id 2, reads from slot 1 and writes to slot 0 for a total of 10 times, and starts the data exchange with a write, triggering the waiting instance 1. What you end up with is the following, on PS 1:
PS E:\Projects\AIPC\Bin\Debug\Win32> .\AIPCTest.exe 1 0 1 10 0
Running instance 1...
Hello from 2...
Hello from 2...
Hello from 2...
Hello from 2...
Hello from 2...
Hello from 2...
Hello from 2...
Hello from 2...
Hello from 2...
Hello from 2...
While on the second instance the trace will be:
PS E:\Projects\AIPC\Bin\Debug\Win32> .\AIPCTest.exe 2 1 0 10 1
Running instance 2...
Hello from 1...
Hello from 1...
Hello from 1...
Hello from 1...
Hello from 1...
Hello from 1...
Hello from 1...
Hello from 1...
Hello from 1...
Hello from 1...
We did it!
Two separate processes have successfully exchanged data with each other without having to use any existing IPC mechanism or system call of any sort.
For the next test, you can disable the waiting behavior by commenting out the WAIT pre-processor definition and repeating the test. You will see a very quick trace which is exactly the same as the last one. Alternatively, you can run both instances on the same read/write slot, to simulate a setup with more concurrency.
The commands for this test are (output included):
.\AIPCTest.exe 1 1 1 10 0
Running instance 1...
Hello from 2...
Hello from 1...
Hello from 1...
Hello from 1...
Hello from 2...
Hello from 2...
Hello from 1...
Hello from 1...
Hello from 1...
Hello from 1...
and (again, output included):
.\AIPCTest.exe 2 1 1 10 1
Running instance 2...
Hello from 2...
Hello from 1...
Hello from 2...
Hello from 1...
Hello from 2...
Hello from 2...
Hello from 2...
Hello from 2...
Hello from 2...
Hello from 2...
Again, if you want to put in place an even more complex setup, you can increase the number of processes that fight for access to the IPC resources, like the following (you will need more consoles to carry out the test):
Console 1:
.\AIPCTest.exe 1 1 1 50 0
Console 2:
.\AIPCTest.exe 2 1 1 50 0
Console 3:
.\AIPCTest.exe 3 1 1 50 0
Console 4:
.\AIPCTest.exe 4 1 1 50 1
The output will depend on how your OS schedules the processes, but the important thing to notice is that the output is never corrupted, since it's protected by those primordial (but working) locks.
Conclusions
This article has presented a way to perform IPC in a discreet way, which does not rely on any existing official IPC mechanism. The mechanism is autonomous, since it does not need any support from the OS (except for the initial load of the DLL), and it guarantees the safety of the exchanged data against concurrent access.
It makes no assumptions about the given data. You write to a slot, loading it with data, and read from it by consuming that data. The protocol and synchronization used are up to the applications that exchange the information. Since generic bytes are exchanged, an application can also write encrypted messages, protecting them from external applications that could sniff the shared section and peek at the data.
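Since a slot carries only raw bytes, the applications have to agree on where one message ends and the next begins. One simple option is a length prefix. The sketch below is an assumption about what such an application-level protocol might look like, not something AIPC provides: frame_pack and frame_unpack are hypothetical names, and the two-byte little-endian header is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_HEADER 2   /* two-byte little-endian payload length */

/* Writes [len_lo][len_hi][payload...] into out.
   Returns the total frame size, or 0 if the frame does not fit. */
size_t frame_pack(unsigned char *out, size_t cap,
                  const void *payload, size_t len)
{
    assert(out != NULL);
    if (len > 0xFFFF || cap < FRAME_HEADER + len)
        return 0;
    out[0] = (unsigned char)(len & 0xFF);
    out[1] = (unsigned char)(len >> 8);
    if (len > 0)
        memcpy(out + FRAME_HEADER, payload, len);
    return FRAME_HEADER + len;
}

/* Checks that buf holds one complete frame. On success returns the payload
   length and points *payload inside buf; returns -1 if more bytes are needed. */
long frame_unpack(const unsigned char *buf, size_t avail,
                  const unsigned char **payload)
{
    size_t len;

    assert(buf != NULL && payload != NULL);
    if (avail < FRAME_HEADER)
        return -1;
    len = (size_t)buf[0] | ((size_t)buf[1] << 8);
    if (avail < FRAME_HEADER + len)
        return -1;
    *payload = buf + FRAME_HEADER;
    return (long)len;
}
```

A writer would pack a frame and push the resulting bytes into a slot; a reader would accumulate bytes from the slot until frame_unpack stops returning -1.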
Security considerations
I did not really want to bring security aspects into this approach, since the article is focused just on the IPC mechanism, but since I got poked by some comments (thank you Randor for the feedback), here we go...
The mechanism is safe, but not secure; this means that it protects against concurrency issues but not against malicious use. Anyone who loads the DLL can access the shared section (well, that was the purpose) and gain access to the memory area that contains the data. Since the API is tiny, the effort needed to reverse engineer the code is negligible. Once you find the address where the data lives, you can peek at its content or even tamper with and corrupt it.
I did not design this API to be secure against malicious access, but there are some points which can provide some additional security to the use of this approach:
- Consider the mechanism unsafe, and consider the fact that anyone can take the lock and never give it back; just as with any network communication, you don't want your application to crash just because you don't have a network connection.
- Provide additional security by encrypting the data that flows through the shared buffers; someone will still be able to peek at it, but is unlikely to be able to decipher it.
- To provide additional consistency, you can also attach a digest (such as a SHA hash) to detect whether some data has been corrupted in the meantime (for example by an external application which randomly changes buffer bytes to provoke a crash in your IPC-enabled utility).
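As an illustration of the last point, the sketch below attaches a 32-bit FNV-1a checksum to a message and verifies it on the receiving side. This is a deliberately simplified stand-in: FNV-1a is not a cryptographic hash, so it only catches random corruption, and anyone who can rewrite the buffer can also recompute it. Against deliberate tampering a keyed digest (an HMAC, for example) from a real cryptographic library would be needed. fnv1a32 and message_intact are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: XOR each byte into the hash, then multiply by the prime. */
uint32_t fnv1a32(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t hash = 2166136261u;        /* FNV offset basis */
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++) {
        hash ^= p[i];
        hash *= 16777619u;              /* FNV prime */
    }
    return hash;
}

/* Returns 1 if the checksum sent along with the message still matches. */
int message_intact(const void *data, size_t len, uint32_t expected)
{
    return fnv1a32(data, len) == expected;
}
```

The writer would compute the checksum before pushing the message into a slot and send it next to the payload; the reader recomputes it and discards the message if message_intact returns 0.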
If you want additional security, you can opt for the official Windows APIs, but this was not the scope of this article. In the first chapter I gave you a list of common and debugged mechanisms that can be used; take a look at those.
I want to highlight that most of the concerns which apply to this mechanism also apply to the other official IPC mechanisms. For example, you can always peek at named pipes, or mess with internal communication (for example by intercepting local socket packets or injecting some). What changes is that you have the OS providing you with a secure environment that can be used to minimize the security problems.
If you decide to use this approach in your application, you must know what you are doing. It can be a powerful tool, but also a terrible headache (like any other approach, in my opinion), so it is up to you to take the right pill: red or blue?
History
12 Dec 2017 - Initial release of the article