Introduction
I have been a .NET developer for a couple of years, and this is my first CodeProject article (to be honest, my first article ever!). As it happens, it covers areas that have nothing to do with the .NET Framework. Over the last couple of years I have worked mostly with .NET, so I consider myself somewhat experienced in that field. I also had a job that required working with Delphi, so I am not totally foreign to unmanaged languages. My last job had a strong requirement for C++ (besides .NET), which I needed to learn quickly. That was a year ago. Since then I have dedicated most of my spare time to learning C++. It was a hard road, but I believe the time paid off, because I now have a much better understanding of what is going on "under the hood" when I write code.
Nevertheless, I still find it hard to work with C++ after getting used to the convenience that the .NET Framework offers. I found it especially annoying that everyday programming problems, like working with file systems on the major operating systems, don't have easy-to-find, well-covered solutions. So I decided to do something useful: exercise my newly acquired C++ skills on a more challenging task and give something back to the community I consulted every five minutes for answers and explanations when I started this journey. I hope you'll find it useful too.
Using the code
In this article you'll find a download link for a Qt Creator solution that can be compiled with GCC 4.6.3 - 4.7.2 and MinGW 4.6.2, as well as complete, compiled command line applications without any additional dependencies for Linux (x64) and Windows (x86). My solution tries to extend what boost::filesystem has to offer, so you'll need the Boost libraries (source versions 1.47 - 1.52) compiled with the -std=c++0x or -std=c++11 (GCC) flag. Just as an illustration for C++ newcomers like me, I compiled Boost with static linkage to avoid any additional dependencies, like so:
./b2 cxxflags="-std=c++0x" toolset=gcc link=static runtime-link=static threading=multi --libdir="<path to your boost libraries folder>" --includedir="<path to your boost include folder>"
You don't really need static linking; that was just my choice at the time. Keep in mind, though, that C++11 support is a requirement. The solution available for download is a completely usable command line/terminal application that illustrates the functionality I have exposed through the main FileSystemController class.
To use it in your own project, you won't need anything from the main.cpp, COMMAND_PROCESSOR.hpp, and CustomLocale.hpp files, so you can safely exclude them. The file COMMAND_PROCESSOR.hpp also contains usage examples for the FileSystemController class, which was one of the main objectives of the code provided for download. The bottom line is that I tried to create backend support for a standard file browser with all the basic functionality: Copy, Move, Delete, Move to Trash/Recycle Bin, watching changes on a specified location, and search. I hope it covers most of what you'll need when managing files on Windows or Linux.
I used Qt Creator, which seems to be a great tool even for non-Qt C++ projects. It is free to use and, most importantly, projects can be used across different platforms with almost no adjustments. To compile the project yourself you will need to set the include/lib paths for Boost in the Qt Creator .pro file. You may encounter a few problems compiling some Boost libraries with MinGW on Windows when the -std=c++0x flag is included. In that case, compile the problematic libraries separately with the "--with-<library>" bjam option, leaving out the C++11 flag, and use those builds instead; at least that is what I did, and I haven't had any problems so far. In the provided .pro file there are two commented QMAKE_CXXFLAGS sets, which you also need for compilation on different platforms (GCC on Linux or MinGW on Windows). To select the platform-specific code for compilation, open the "BaseDecl.h" header and comment/uncomment the appropriate #define statement that indicates the target platform (#define WINDOWS or #define LINUX).
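If editing the header by hand for every platform becomes tedious, the choice could in principle be derived from macros the compiler already predefines. The following is only a sketch under that assumption: WINDOWS and LINUX are the names BaseDecl.h uses, while targetPlatform() is a hypothetical helper added here just to make the result visible.

```cpp
// Sketch: pick the platform macro from compiler-predefined macros unless one
// was already chosen by hand. WINDOWS and LINUX are the names used in BaseDecl.h.
#if !defined(WINDOWS) && !defined(LINUX)
    #if defined(_WIN32)          // defined by MinGW and VC++ on Windows
        #define WINDOWS
    #elif defined(__linux__)     // defined by GCC on Linux
        #define LINUX
    #else
        #error "Unsupported platform: define WINDOWS or LINUX manually"
    #endif
#endif

// Hypothetical helper, only here to make the selection visible at run time.
inline const char* targetPlatform()
{
#if defined(WINDOWS)
    return "Windows";
#else
    return "Linux";
#endif
}
```

A manual #define placed before this fragment still takes precedence, so the existing workflow keeps working.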
Addition: I adapted the Qt project to compile with VS2012 by creating a VS solution with modified code, and added a download link for it. The overall structure of the program and the FileSystemController dependencies stay the same; only a couple of Linux-specific files were removed, since they aren't needed in this case. If portability is imperative, you can still use the previously presented code, which compiles on both Windows and Linux with the GCC or MinGW compilers. Keep in mind that to make the provided VS project work you still need the Boost libraries compiled with VC++, and you need to modify Additional Include Directories, Additional Library Directories, and Additional Dependencies (if the Boost naming scheme changed during compilation) in the project properties.
Finally, I should mention that I have tested the FileSystemController class (and the FSHelper application) on Windows versions from XP SP3 to Windows 8 (excluding Vista, but I see no reason why it wouldn't work there) and on multiple Linux distributions: openSUSE 12.2 (kernel 3.4), Fedora 17 (kernel 3.6), Ubuntu 12.04 (kernel 3.2), and Linux Mint (kernel 3.2).
About the article content
Although I have provided a more or less complete solution for standard file system functionality, I'll cover just two features which, to my understanding, have been almost neglected so far: watching file system changes, and sending files to and recovering them from the Trash on Linux. I'll illustrate what I discovered while writing and testing this portable code in a two-part article.
Implementing FileSystemWatcher .NET-like solution on Linux
The only place I found a complete, usable, out-of-the-box solution to this problem was the Qt framework, which is great but introduces dependencies you would probably want to avoid most of the time when working with C++. I managed to get by with platform-specific API calls, the C++ standard library, and the Boost libraries, which are widely used and, to my understanding, considered a natural extension of the standard library. When I started coding FileSystemWatcher, it wasn't hard to come up with the first working samples on Linux, since the documentation is quite extensive and detailed. You can easily find a number of code samples that show how to read file system events with read() function calls. That part is easy. All we need is a buffer to store the results of the read() call and two variables: one for the actual length of the buffer content and one for tracking the current position while reading the buffer, since one read can return multiple events:
ssize_t len = 0;
ssize_t i = 0;
bool stopLoop = false;
char buff[BUFF_LEN] = {0};
len = read(fileDescriptor, buff, BUFF_LEN);
// walk every event in the buffer, unless a handler asks us to stop
while ((i < len) && !stopLoop)
{
    struct inotify_event* event = (struct inotify_event*)&buff[i];
    ...
    i += EVENT_SIZE + event->len;
}
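To make the layout of that buffer more tangible, here is a minimal sketch of the same walk written as a standalone function. parseEvents() is a hypothetical helper, not part of FileSystemController; it assumes only the documented inotify record layout (a fixed header followed by len bytes of NUL-padded name) and copies the header fields out with memcpy so that the buffer does not need any particular alignment.

```cpp
#include <sys/inotify.h>
#include <sys/types.h>
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits the bytes returned by one read() on an inotify
// descriptor into (mask, name) pairs.
std::vector<std::pair<uint32_t, std::string>> parseEvents(const char* buff, ssize_t len)
{
    std::vector<std::pair<uint32_t, std::string>> events;
    const ssize_t headerSize = sizeof(inotify_event);
    ssize_t i = 0;
    while (i + headerSize <= len)
    {
        uint32_t mask = 0;
        uint32_t nameLen = 0;
        std::memcpy(&mask, buff + i + offsetof(inotify_event, mask), sizeof(mask));
        std::memcpy(&nameLen, buff + i + offsetof(inotify_event, len), sizeof(nameLen));

        const ssize_t recordSize = headerSize + static_cast<ssize_t>(nameLen);
        if (i + recordSize > len)
            break;                      // truncated record, stop parsing

        // nameLen includes NUL padding, so cut the name at the first NUL
        const char* name = buff + i + headerSize;
        events.emplace_back(mask, std::string(name, std::find(name, name + nameLen, '\0')));
        i += recordSize;
    }
    return events;
}
```

An event that refers to the watched location itself carries no name, so its string simply comes back empty.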
The buffer we provided to the read() function can come back holding multiple events at once, so we need to parse each of them as a struct inotify_event. The problem is that read() is a blocking call, and I can't imagine who would want such a limited solution with no easy way to interrupt it, so I tried to come up with something more acceptable. Through further research I stumbled upon the select() system call, which should provide what we need here: a way to get notified that something happened in the file system, so that we call the blocking read() function only in that case. Once I thought I had finally shaped my solution, I was unpleasantly surprised that it didn't work as expected on my Ubuntu 12.04 x64 machine. After a trial-and-error process and more research I found something I finally considered acceptable. In the main class of the solution provided for download (FileSystemController), you can see that I'm using a dedicated thread for listening to file system events and that I instantiate this class only when file system monitoring is needed (all other functionality is provided through static function calls). This is just a convenience to save you the trouble of implementing your own cleanup after you're done with event listening. The main part of the Linux implementation is the function process_events(), which runs on a dedicated thread:
void FileSystemWatcher::process_events()
{
    m_wathcerThreadActive = true;
    // create the inotify instance
    int fileDescriptor = inotify_init();
    if(fileDescriptor < 0)
    {
        boost::unique_lock<boost::mutex> lock(m_ResultsLock);
        m_results.AddResult(OperationResult(false, TO_STRING_TYPE("FileSytemWatcher::process_events"),
                                            TO_STRING_TYPE("FILE DESCRIPTOR ERROR")));
        return;
    }
    // events to watch: a basic subset or everything
    auto watchEvents = m_changesToWatch == ChangeKind::BasicChanges
                           ? IN_CREATE | IN_MOVED_FROM | IN_MOVED_TO | IN_DELETE | IN_DELETE_SELF | IN_MOVE_SELF
                           : IN_ALL_EVENTS;
    int watchDescriptor = inotify_add_watch(fileDescriptor, m_locationPath.c_str(), watchEvents);
    if(watchDescriptor < 0)
    {
        boost::unique_lock<boost::mutex> lock(m_ResultsLock);
        m_results.AddResult(OperationResult(false, TO_STRING_TYPE("FileSytemWatcher::process_events"),
                                            TO_STRING_TYPE("FILE WATCH DESCRIPTOR ERROR")));
        close(fileDescriptor);   // do not leak the inotify instance
        return;
    }
    bool stopLoop = false;
    struct timespec timeout;
    timeout.tv_sec = 0;
    timeout.tv_nsec = WAIT_FOR_EACH_EVENT_DURATION;
    struct pollfd pfd;
    pfd.events = POLLIN;
    pfd.fd = fileDescriptor;
    while(!m_stopThread)
    {
        if(stopLoop)
            break;
        // stop when the watched location is gone
        if(!boost::filesystem::exists(m_locationPath))
            break;
        ssize_t len = 0;
        ssize_t i = 0;
        char buff[BUFF_LEN] = {0};
        // wait until there is something to read or the timeout expires
        int result = ppoll(&pfd, 1, &timeout, NULL);
        if(result == -1)
        {
            boost::unique_lock<boost::mutex> lock(m_ResultsLock);
            m_results.AddResult(OperationResult(false, TO_STRING_TYPE("FileSytemWatcher::process_events"),
                                                TO_STRING_TYPE("ERROR IN EVENT PROCESSING")));
            continue;
        }
        if(result > 0)
        {
            len = read(fileDescriptor, buff, BUFF_LEN);
            // one read can return several events; stop early if a handler asks for it
            while((i < len) && !stopLoop)
            {
                struct inotify_event* event = (struct inotify_event*)&buff[i];
                stopLoop = fireEvent(event);
                i += EVENT_SIZE + event->len;
            }
        }
    }
    string empty;
    m_changed(_EDATA(EventKind::watcherSignOut, empty));
    inotify_rm_watch(fileDescriptor, watchDescriptor);
    close(fileDescriptor);
    m_pause = false;
    m_stopThread = false;
    m_wathcerThreadActive = false;
    m_finalizer.notify_one();
}
Basically, the function starts by setting a flag which signals that the listening thread has started (m_wathcerThreadActive) and then tries to obtain a file descriptor for a new inotify instance, which we will later associate with the file system location we want to monitor. This is done with the Linux system call inotify_init():
int fileDescriptor = inotify_init();
if(fileDescriptor < 0)
{
    // report the error and return
}
This is one of the critical parts of the function: on failure the call returns a value less than zero. It usually succeeds, but on rare occasions it failed for me without my clearly understanding why; errno holds the reason, and one documented cause is EMFILE, returned when the per-user limit on inotify instances or the per-process limit on open file descriptors has been reached. Because of that we should check the returned value and stop further execution if the call was unsuccessful. After that we need to obtain a watch descriptor by associating the file descriptor we just obtained with the actual file system location we want to monitor. We also need to specify the kinds of events we want to watch for. These events are identified by predefined bit values such as IN_DELETE, IN_CREATE, and others, and their combination defines the event mask. This call is also critical for further event processing, and we cannot continue if it returns an error. The most frequent reason for failure here is insufficient read permission for the specified location:
int watchDescriptor = inotify_add_watch(fileDescriptor, m_locationPath.c_str(), watchEvents);
if(watchDescriptor < 0)
{
    // report the error and return
}
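Since both calls report their reason for failing through errno, it can help to capture that text at the point of failure. The sketch below shows one way to do it; WatchHandle, openWatch() and closeWatch() are hypothetical names invented for this illustration and are not part of the downloadable code.

```cpp
#include <sys/inotify.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical value type for this sketch.
struct WatchHandle
{
    int fd;             // inotify instance, -1 on failure
    int wd;             // watch descriptor, -1 on failure
    std::string error;  // empty on success
    bool ok() const { return fd >= 0 && wd >= 0; }
};

// Creates an inotify instance and adds one watch; on failure the handle
// carries the errno text of the call that failed.
WatchHandle openWatch(const std::string& path, uint32_t mask)
{
    WatchHandle h = { -1, -1, std::string() };
    h.fd = inotify_init();
    if (h.fd < 0)
    {
        h.error = std::string("inotify_init: ") + std::strerror(errno);
        return h;
    }
    h.wd = inotify_add_watch(h.fd, path.c_str(), mask);
    if (h.wd < 0)
    {
        h.error = std::string("inotify_add_watch: ") + std::strerror(errno);
        close(h.fd);    // release the instance, otherwise it leaks
        h.fd = -1;
    }
    return h;
}

// Removes the watch and closes the instance; safe to call on a failed handle.
void closeWatch(WatchHandle& h)
{
    if (h.fd >= 0)
    {
        if (h.wd >= 0)
            inotify_rm_watch(h.fd, h.wd);
        close(h.fd);
    }
    h.fd = -1;
    h.wd = -1;
}
```

The error text could then be stored in the same place the full listing stores its fixed "FILE DESCRIPTOR ERROR" strings.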
The next thing to decide is how to handle the blocking read() call in a non-blocking manner. You can use the select() / pselect() functions to check whether there is something for read() to read, or you can take the approach I chose and use a poll() / ppoll() call for the same purpose. All of these accept a timeout value, so the listening loop can continue and check whether watching needs to be stopped. I must warn you that the first approach I'm going to discuss (select() / pselect()) didn't work well on all the Linux distros I tested; to be precise, it didn't detect events on Ubuntu 12.04 x64 with kernel version 3.2.0.35. The second approach (poll() / ppoll()) should be safe to use from kernel version 2.6.16 onwards (according to the ppoll() documentation), but I didn't have the opportunity to confirm that by testing on older kernels.
Using the select() system call
You can see parts of this approach as commented-out code in my process_events() implementation. If you had the same problem to solve as I did, you could find hints on various forums pointing to a solution similar to the following one. First you need to create three file descriptor sets that will be passed to the select() function. The Linux man page states that this function can monitor multiple file descriptors, waiting until at least one of them becomes "ready", which in our case means ready for a read() call that will not block:
fd_set read_fds, write_fds, except_fds;
As the names of the file descriptor sets imply, the read set is used to watch for descriptors that have something to read without blocking, while write_fds and except_fds monitor whether we can write without blocking or whether an exceptional condition occurred. I was a bit surprised to find that I needed to provide the write_fds and except_fds sets as well as read_fds to get a successful read notification from the select() call.
After that we start a loop with a couple of signaling flags that are used to stop further event processing when needed. Basically, the listening loop should stop if we decide to stop monitoring the specified location or if the location no longer exists as such. This is why, when handling an event notification, we have to act on the events identified by the IN_DELETE_SELF and IN_MOVE_SELF bits, which mean that the location we are watching was permanently deleted, moved to Trash, moved somewhere else on the file system, or renamed (a rename usually brings a pair of events identified by IN_MOVED_FROM and IN_MOVED_TO). For details, look at my fireEvent() function implementation.
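The stop decision itself comes down to a bit test on the event mask. The following is a minimal sketch and not the actual fireEvent() code: watchedLocationGone() is a hypothetical name, and checking IN_UNMOUNT and IN_IGNORED in addition to the two bits discussed above is my own assumption about what else should end the loop (inotify documents both as meaning the watch is no longer usable).

```cpp
#include <sys/inotify.h>
#include <cstdint>

// Hypothetical helper: true when the event says the watched location itself
// is gone or the watch has been dropped, so the listening loop should end.
bool watchedLocationGone(uint32_t mask)
{
    const uint32_t stopBits = IN_DELETE_SELF   // watched item was deleted
                            | IN_MOVE_SELF     // watched item was moved or renamed
                            | IN_UNMOUNT       // its file system was unmounted (assumption: also stop)
                            | IN_IGNORED;      // the kernel removed the watch (assumption: also stop)
    return (mask & stopBits) != 0;
}
```

Events about items inside the watched location, such as IN_CREATE or the IN_MOVED_FROM / IN_MOVED_TO pair, leave the result false and the loop keeps running.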
A call to select() also requires the file descriptor(s) to be added to each file descriptor set. My code sample (the commented-out parts of the process_events() function) does this inside the loop, because select() modifies the file descriptor sets in place: on return they contain only the descriptors that are ready.
One more thing is needed to call select(): a timeout period during which select() blocks and listens for possible events.
While searching forums for answers, I saw that a lot of people make the same mistake when implementing this timeout. If you choose select(), be aware that on Linux it modifies the interval specified in struct timeval to reflect how much time was left until the timeout. Because of this we need to reinitialize the value on each pass through the loop. On the other hand, if you choose pselect(), which uses struct timespec with a higher time resolution (nanoseconds), you don't have to worry about this, because pselect() does not change the timeout:
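while(!m_stopThread)
{
    ...
    // select() overwrites the sets, so rebuild them on every pass
    FD_ZERO(&read_fds);
    FD_ZERO(&write_fds);
    FD_ZERO(&except_fds);
    FD_SET(fileDescriptor, &read_fds);
    FD_SET(fileDescriptor, &write_fds);
    FD_SET(fileDescriptor, &except_fds);
    // select() also overwrites the timeout, so reinitialize it as well
    struct timeval timeout;
    timeout.tv_sec = 0;
    timeout.tv_usec = WAIT_FOR_EACH_EVENT_DURATION;  // microseconds here, must stay below 1000000
    // the first argument is the highest descriptor in any set, plus one
    auto result = select(fileDescriptor + 1, &read_fds, &write_fds, &except_fds, &timeout);
    ...
}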
Finally, we check the outcome of the select() call. It returns -1 on error and 0 on timeout; the value we are most interested in is a number greater than 0, which indicates that a descriptor is ready, and for the read set that means the read() call would not block.
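Putting the pieces of this section together, a select()-based wait might look like the sketch below. It is not the code from the download: waitReadableSelect() and readIfReady() are hypothetical names, only the read set is used (the man page allows the other two sets to be NULL), and the first argument is the descriptor plus one, as the man page requires. Whether this variant behaves better on the Ubuntu configuration mentioned earlier is something I have not verified.

```cpp
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>

// Returns >0 if fd is readable, 0 on timeout, -1 on error (errno is set).
int waitReadableSelect(int fd, long timeoutMs)
{
    assert(fd >= 0 && fd < FD_SETSIZE);     // select() cannot handle larger descriptors

    fd_set readFds;                         // rebuilt on every call: select() overwrites it
    FD_ZERO(&readFds);
    FD_SET(fd, &readFds);

    struct timeval timeout;                 // rebuilt on every call for the same reason
    timeout.tv_sec = timeoutMs / 1000;
    timeout.tv_usec = (timeoutMs % 1000) * 1000;

    // first argument: highest descriptor in any set, plus one
    return select(fd + 1, &readFds, nullptr, nullptr, &timeout);
}

// Calls read() only when it will not block.
// Returns 0 on timeout, -1 on error, otherwise whatever read() returns
// (note that read() itself returns 0 at end-of-file on ordinary descriptors).
ssize_t readIfReady(int fd, char* buff, size_t size, long timeoutMs)
{
    int ready = waitReadableSelect(fd, timeoutMs);
    if (ready <= 0)
        return ready;
    return read(fd, buff, size);
}
```

In a watcher loop, readIfReady() would be called once per pass with the inotify descriptor, and a return value of 0 simply means "nothing happened, check the stop flag and try again".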
Using the poll() / ppoll() API call
As I mentioned previously, I ran into a problem with the select() approach on one of my Linux test configurations. After digging through the Linux documentation I discovered that the same thing can be achieved with the poll() / ppoll() function calls. This approach proved much more reliable, and I didn't have any problems detecting changes this way. The difference between poll() and ppoll() is that the latter prevents certain race conditions between the arrival of signals and the underlying poll() call. That matters when the same code uses signal handlers to respond to specific system signals as well as a poll() call. At least that is how I understood it, since, to be honest, I don't have any experience with Linux signals. Since I don't use signal handlers, the main reason I went with ppoll() over poll() at the time was finer timeout control (nanosecond vs. millisecond resolution). That argument lost its weight over time, because I ended up with an interval of 0.1 s, which is easily expressed in milliseconds. I didn't even set the sigmask argument of ppoll(); I left it as NULL, which according to the documentation makes it equivalent to calling poll() (apart from the timeout precision), so I presume you could use a poll() call instead without any problems.
These system calls accept an array of struct pollfd, which has an fd field for the file descriptor and an events field where we set what we want to monitor. According to the documentation, the POLLIN value is the right one for notifications that there is something to read on the specified file descriptor. In my FileSystemWatcher I monitor changes for only one location (one file descriptor), so all I needed from poll() / ppoll() was a reliable notification. I didn't use the revents field of struct pollfd to find out which events occurred, because I wanted to extract the actual path of the affected file system item, and that is easily obtained from the event information returned by the read() system call:
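std::string getNotificationFileName(const inotify_event *event, const std::string& path)
{
    // event->len counts the trailing NUL padding, so rely on the terminator instead;
    // the name is only present when len > 0
    std::string fName = (event->len > 0) ? std::string(event->name) : std::string();
    // add a separator only if the watched path does not already end with one
    if(path.empty() || path[path.length() - 1] != '/')
        fName = path + "/" + fName;
    else
        fName = path + fName;
    return fName;
}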
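Coming back to the wait itself: since the text above suggests that plain poll() would do the same job as ppoll() with a NULL sigmask, here is a minimal sketch of that variant. WaitResult and waitForInput() are hypothetical names for this illustration. Unlike my implementation, the sketch does look at revents, but only to tell a usable descriptor from a broken one.

```cpp
#include <poll.h>
#include <cerrno>

enum class WaitResult { Ready, Timeout, Error };

// Waits up to timeoutMs milliseconds for fd to become readable.
WaitResult waitForInput(int fd, int timeoutMs)
{
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;    // "there is data to read"
    pfd.revents = 0;

    int result = poll(&pfd, 1, timeoutMs);
    if (result == 0)
        return WaitResult::Timeout;
    if (result < 0)
        // an interrupted wait is harmless: treat it like a timeout and let the caller loop
        return (errno == EINTR) ? WaitResult::Timeout : WaitResult::Error;
    if (pfd.revents & (POLLERR | POLLNVAL))
        return WaitResult::Error;           // descriptor is broken or not open
    return WaitResult::Ready;               // read() will not block now
}
```

With the 0.1 s interval mentioned above, the call in the watcher loop would be waitForInput(fileDescriptor, 100), followed by read() only when the result is WaitResult::Ready.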
If you look at my complete implementation of FileSystemWatcher in C++, you will see that I'm using Boost signals from the boost::signals2 namespace, which should be safe to use across different threads, to send the actual event paired with the full path of the file or folder that was affected in the watched location. The only thing I would suggest is that you implement a proper consumer pattern inside the registered handler, so that events signaled this way can be processed sequentially as they are received; I did not do this, for the sake of simplicity.
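One possible shape for such a consumer is a small thread-safe queue: the signal handler only pushes, and a separate thread pops and processes events one at a time. The sketch below uses the C++11 standard library rather than Boost; FsEvent and EventQueue are hypothetical types made up for this illustration, and the real event data type in the download is different.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical event record for this sketch.
struct FsEvent
{
    int kind;
    std::string path;
};

class EventQueue
{
public:
    // Called from the registered handler (watcher thread); returns quickly.
    void push(FsEvent e)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_events.push(std::move(e));
        }
        m_ready.notify_one();
    }

    // Called when watching stops; wakes up a consumer blocked in pop().
    void close()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_closed = true;
        }
        m_ready.notify_all();
    }

    // Called from the consumer thread. Blocks until an event is available.
    // Returns false once the queue is closed and fully drained.
    bool pop(FsEvent& out)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_ready.wait(lock, [this] { return m_closed || !m_events.empty(); });
        if (m_events.empty())
            return false;
        out = std::move(m_events.front());
        m_events.pop();
        return true;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_ready;
    std::queue<FsEvent> m_events;
    bool m_closed = false;
};
```

The consumer thread would then run a loop of the form while (queue.pop(e)) { handle e }, which keeps the processing order identical to the order in which the watcher signaled the events.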
Brief overview of Windows FileSystemWatcher in C++
Although this implementation deserves a much more detailed explanation than just a "brief" one, I'm not going to give it, because I adapted something that was already well described in another great article here, so almost all the credit goes to the person who wrote it. I will just point out a couple of steps that can be problematic, or at least were problematic for me. Watching is done on a dedicated thread with a loop, just like on Linux, with the difference that a second thread is involved, which belongs to the asynchronous call to the ReadDirectoryChangesW() WinAPI function. This second thread should be used to continue watching, by making another call to ReadDirectoryChangesW() inside the callback function, and it should signal the actual event data over Boost signals (boost/signals2). The first thread is needed to accept a potential request to stop watching and to check whether the location we are watching still exists. I wasn't able to get a notification when the watched location itself is deleted, moved, or renamed, which is why I decided to check periodically in the loop whether it still exists.
If you want to check my implementation, look for it in the FileSystemWatcher_Windows.cpp file, which is part of the Qt Creator project available for download.
Conclusion of Part 1
Next time, I am going to discuss a way to implement Trash functionality on Linux and how to recover files from it using C++ and Boost. This seems to be a widely neglected area without many explanations to follow. I personally haven't been able to find any solution to this problem so far. I will also briefly touch on the problems I encountered while implementing the same functionality on Windows, targeting the Recycle Bin.