Abstract
Several of my recent consulting projects dealt with composite applications,
specifically desktop composite applications. A composite application consists
of a host (shell) and a number of plugins, often developed by different teams
of programmers.
In this scenario it is usually desirable to isolate the host from plugin failures.
I often used AppDomains for this purpose. Eventually, I came to the conclusion
that AppDomains are not very good isolators, for two main reasons:
- Error handling is very difficult to do right.
- Unloading plugins is not guaranteed.
Thus, meeting even basic robustness requirements for the application is difficult or even impossible.
This does not mean that AppDomains are useless. They still provide a convenient
partitioning mechanism, especially if one team controls all moving parts. The shortcomings outlined in this
article may or may not be important for a particular project. If the project is able to tolerate a certain
degree of failure, AppDomains may still be a viable isolation solution for it.
The Idea Behind AppDomains
In a nutshell, AppDomains were invented for efficient isolation of third-party code (plugins, components, web applications).
A host process, such as an ASP.NET server, needs to load plugins (web applications) securely and
efficiently. Of course, Win32 processes already provide such isolation, but they were deemed too
heavyweight for the job, as described in this blog entry by Chris Brumme from Microsoft.
The main difference between an AppDomain and a process is that processes have
their own threads, and AppDomains don't. To visualize that, let's compare
threads to cars. While you drive your Mercedes Thread in the AppDomain of USA, you can see only American data.
You can drive it to the AppDomain of Canada,
but the moment you cross the border, you are cut off from American data and can now see
only Canadian data. Your Mercedes Thread is not pinned to a particular AppDomain.
However, no matter what road it takes, it cannot leave the process
of North America (we ignore the Panama land bridge for the sake of argument).
Similarly, someone driving a Mercedes Thread in the process of Europe may cross
from the AppDomain of France to the AppDomain of Spain, but they
can never reach North America and access American or Canadian data.
Isolation Requirements
To run a reliable, secure, and efficient host, our isolation mechanism should have the following properties:
1. We must be able to load and execute plugins, with restricted security if necessary.
2. Plugins should not be able to corrupt host data.
3. If a plugin fails, the host must be able to detect this and unload the failing plugin.
4. It must be possible to unload plugins on demand.
5. Unloading a plugin should clean up any resources allocated for that plugin. If it does not,
the host process will accumulate waste and will eventually fail.
Operating system processes satisfy all these requirements. Achieving restricted security
for a child process may be tricky, but it is typically possible.
Unfortunately, AppDomains do not fare very well against these requirements. They do an excellent job
with #1 and #2: one can easily restrict the security of the plugin,
and host data is protected. However, we run into major difficulties with #3 and #4.
The sad reality is that
- There is no way to reliably detect a failure in an AppDomain.
And, even if we could,
- There is no way to reliably unload a failing AppDomain.
Also, there are some issues with #5. Per Chris Brumme, there is a small memory leak on each AppDomain unload.
More importantly, there is no way to unload
any domain-neutral assemblies: once loaded into the process, they are there to stay. This, however, looks minor compared
to the problems we have with exception handling.
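For completeness, here is a rough sketch of the part AppDomains do handle well, the restricted security of requirement #1. It assumes the .NET Framework (sandboxed AppDomains are not available on .NET Core and later). The PluginSandbox class and the pluginDirectory parameter are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Security;
using System.Security.Permissions;

public static class PluginSandbox
{
    // pluginDirectory is a hypothetical absolute path to the folder
    // that holds the plugin's assemblies.
    public static AppDomain Create(string friendlyName, string pluginDirectory)
    {
        // Start from an empty grant set and add only what the plugin needs.
        var permissions = new PermissionSet(PermissionState.None);
        permissions.AddPermission(
            new SecurityPermission(SecurityPermissionFlag.Execution));
        permissions.AddPermission(new FileIOPermission(
            FileIOPermissionAccess.Read | FileIOPermissionAccess.PathDiscovery,
            pluginDirectory));

        var setup = new AppDomainSetup { ApplicationBase = pluginDirectory };

        // No evidence and no fully trusted assemblies are supplied, so code
        // loaded into this domain runs with the grant set above.
        return AppDomain.CreateDomain(friendlyName, null, setup, permissions);
    }
}
```

The grant set shown (execute, plus read access to the plugin's own folder) is only one possible choice; a real host would tailor it to what its plugins legitimately need.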
Legacy vs. Default Exception Handling
Default Exception Handling
By default, an unhandled exception in any thread terminates
the application unconditionally. This is bad news for runtime hosts. If a plugin creates a thread and that thread causes
an unhandled exception, the whole host process dies. We can do last-ditch error handling in an AppDomain.UnhandledException
handler, but termination of the process cannot be stopped.
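As a minimal sketch of such last-ditch handling, the following helper subscribes to AppDomain.UnhandledException and records what it can before the process goes down. The LastChanceLogging class and its Install method are hypothetical names, not part of the framework.

```csharp
using System;
using System.IO;

public static class LastChanceLogging
{
    // Subscribes a handler in the current AppDomain. Under the default
    // policy the handler can only observe the failure, not prevent
    // termination of the process.
    public static void Install(TextWriter log)
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
        {
            // ExceptionObject is typed as object, not Exception.
            log.WriteLine("Unhandled exception: {0}", e.ExceptionObject);
            log.WriteLine("Runtime is terminating: {0}", e.IsTerminating);
            log.Flush();
        };
    }
}
```

A host would typically call LastChanceLogging.Install(Console.Error), or pass a file-backed writer, once at startup.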
In WPF and Windows Forms applications, UI threads can be protected from unhandled exceptions, because they have a
built-in try/catch block supplied by the UI framework. However, worker threads lack such protection.
In a desktop application it is considered best practice to perform long operations on a worker thread. So, the
scenario where a plugin spawns a worker thread and that thread causes an unhandled exception is very real and possible.
This makes the default exception handling policy a bad choice for a host-plugin architecture.
Legacy Exception Handling
Fortunately, default exception handling is not the only option. Prior to .NET 2.0, unhandled
exceptions in worker threads did not automatically kill the process. To revert to this legacy behavior,
we can add the following snippet to the application configuration:
<configuration>
  <runtime>
    <legacyUnhandledExceptionPolicy enabled="1"/>
  </runtime>
</configuration>
Unfortunately, this still does not buy us full protection from plugin failures - read on.
Exception! Whose Fault Is That?
To effectively unload the crashing plugin we must first detect which plugin has crashed.
Frankly, even with legacy exception handling this is virtually impossible.
When an unhandled exception occurs, the framework raises the AppDomain.UnhandledException
event. Each AppDomain may have its own UnhandledException handler. In a typical scenario,
UnhandledException will first be raised in the failing AppDomain and then again
in the main AppDomain. This works reasonably well if the exception type is [Serializable].
But if it's not, by the time the flow of execution reaches the main AppDomain, things become muddy:
- The original exception is replaced with SerializationException.
- Information about the AppDomain that caused the exception is lost.
- A parasitic SerializationException will be thrown in the main AppDomain.
SerializationException contains surprisingly little information about what happened.
At this point it is not distinguishable from a genuine unhandled SerializationException
that could have occurred in the host itself.
The original idea of the AppDomain.UnhandledException design was perhaps to allow the main AppDomain
to process all unhandled exceptions regardless of origin. In practice that goal was not achieved.
It is also worth noting that most user-defined exception classes will not be marked as [Serializable],
simply because application programmers don't see a need to do that.
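For reference, this is roughly what a user-defined exception needs in order to cross an AppDomain boundary intact. PluginException is a hypothetical type used only for illustration. The host has no way to force plugin authors to follow this pattern.

```csharp
using System;
using System.Runtime.Serialization;

// The attribute alone is not enough: the protected constructor is what
// the runtime calls when it rebuilds the exception in another AppDomain.
[Serializable]
public class PluginException : Exception
{
    public PluginException() { }

    public PluginException(string message) : base(message) { }

    public PluginException(string message, Exception inner)
        : base(message, inner) { }

    // Deserialization constructor.
    protected PluginException(SerializationInfo info, StreamingContext context)
        : base(info, context) { }
}
```

An exception class that adds its own fields would also have to override GetObjectData to write them and read them back in the deserialization constructor.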
The host may try to pass exception information from the plugin's AppDomain using some custom
method. For example, the UnhandledException handler in the plugin's AppDomain can explicitly call
a centralized exception monitor object located in the main AppDomain, passing it only
serializable objects such as the plugin's AppDomain name and the exception string. This scheme, however,
would still be prone to failure, because the plugin's AppDomain may be in an unknown state after
an unhandled exception, and successful communication with the host's exception monitor cannot be guaranteed.
A mechanism supported by the framework is required for reliable operation, but such a mechanism does not exist.
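The custom scheme described above might look roughly like the sketch below, assuming the .NET Framework. ExceptionMonitor, PluginBootstrap, and PluginHost are hypothetical names invented for this illustration. As noted, this is best effort only: the call into the monitor may itself fail if the plugin's AppDomain is badly damaged.

```csharp
using System;

// Lives in the main AppDomain; plugin domains reach it through a proxy.
public sealed class ExceptionMonitor : MarshalByRefObject
{
    public string LastReport { get; private set; }

    // Only strings cross the boundary, so nothing can fail to serialize.
    public void Report(string domainName, string exceptionText)
    {
        LastReport = domainName + ": " + exceptionText;
        Console.Error.WriteLine("Unhandled exception in {0}", LastReport);
    }

    // Returning null keeps the remoting lease from expiring.
    public override object InitializeLifetimeService() { return null; }
}

// Instantiated inside the plugin's AppDomain by the host.
public sealed class PluginBootstrap : MarshalByRefObject
{
    private ExceptionMonitor _monitor;

    public void Attach(ExceptionMonitor monitor)
    {
        _monitor = monitor;
        AppDomain.CurrentDomain.UnhandledException += OnUnhandledException;
    }

    private void OnUnhandledException(object sender, UnhandledExceptionEventArgs e)
    {
        try
        {
            _monitor.Report(AppDomain.CurrentDomain.FriendlyName,
                            e.ExceptionObject.ToString());
        }
        catch (Exception)
        {
            // The domain may be too damaged to reach the host.
        }
    }
}

public static class PluginHost
{
    public static AppDomain CreatePluginDomain(string name, ExceptionMonitor monitor)
    {
        AppDomain domain = AppDomain.CreateDomain(name);
        var bootstrap = (PluginBootstrap)domain.CreateInstanceAndUnwrap(
            typeof(PluginBootstrap).Assembly.FullName,
            typeof(PluginBootstrap).FullName);
        bootstrap.Attach(monitor);
        return domain;
    }
}
```

Passing the exception as a string trades detail for safety: the host loses the exception object, but the report no longer depends on the exception type being serializable.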
Unloading a Failing Plugin
Even if we managed to figure out which plugin is causing trouble, this is not the end of the story.
There is no way to gracefully unload a plugin that is in an unknown state.
If a plugin is executing native code that cannot be interrupted (e.g., file I/O), it will not
be unloaded at all. AppDomain.Unload() will fail with an exception similar to this:
System.CannotUnloadAppDomainException: Error while unloading appdomain. (Exception from HRESULT: 0x80131015)
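A host can at least turn this failure into a return value instead of letting it propagate. The sketch below shows one way to do that. PluginUnloader and TryUnload are hypothetical names, and what to do when the unload fails is left to the host.

```csharp
using System;

public static class PluginUnloader
{
    // Returns true if the domain is gone, false if it could not be unloaded.
    public static bool TryUnload(AppDomain pluginDomain)
    {
        try
        {
            AppDomain.Unload(pluginDomain);
            return true;
        }
        catch (CannotUnloadAppDomainException ex)
        {
            // For example, a thread is stuck in native code that cannot
            // be interrupted. The plugin stays loaded.
            Console.Error.WriteLine("Unload failed: {0}", ex.Message);
            return false;
        }
        catch (AppDomainUnloadedException)
        {
            // The domain was already unloaded earlier.
            return true;
        }
    }
}
```

A false result leaves the host with a plugin it can neither trust nor remove, which is exactly the problem described here.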
If a plugin is running background threads, they will be aborted with ThreadAbortException.
In default exception handling mode this exception will then be quietly swallowed by the framework. However,
in legacy exception handling mode it will raise AppDomain.UnhandledException in the main
AppDomain with AppDomainUnloadedException.
Again, AppDomainUnloadedException carries surprisingly little information. In particular,
it does not say which AppDomain was unloaded. Therefore, it is impossible to figure out
whether this is an expected exception from dying background threads of a plugin that is being unloaded, or
some other peculiar error.
ASP.NET uses AppDomains. How Does It Survive?
Experiments show that ASP.NET takes a hands-off approach to reliability. Each application pool runs
a worker process (w3wp.exe). Each web application in the pool runs in an AppDomain.
When an application causes an exception on a worker thread, the whole process dies, taking down all
the other, perfectly good applications with it. If those applications were processing web
requests, these requests will be remembered. ASP.NET will then create a new worker process and pass
it the cached requests (if any) for handling.
This approach works relatively well, mostly because the Web is stateless. Any state passed between
requests, such as cookies, is small and well-defined. The demise and resurrection of the ASP.NET worker
process remains invisible to the user or the application programmer, unless they take special steps
to detect it.
Obviously, such a hands-off approach is not viable for a desktop application: restarting the whole
application and losing unsaved data when a single plugin fails would not be welcomed by users.
Conclusion
AppDomains provide a certain degree of isolation between parts of the application,
but this isolation is limited. A number of design decisions and features of the .NET Framework
make proper error handling very difficult. Exceptions pop up in unexpected places, and exception
objects carry very little context with them.
Unloading plugins is not guaranteed. This is hardly the framework designers' fault: Windows
threads were not designed to be gracefully interruptible, but this gives little consolation
to the application authors.
Depending on the requirements, AppDomains can still be very useful, especially
if efficiency is more important than absolute reliability, as in the case of ASP.NET.
However, for a truly isolated application one may want to consider using processes
instead of AppDomains, like in Baktun Shell.
Unfortunately, this is not a panacea either: multi-process desktop applications are not
mainstream, and many unexpected pitfalls may arise, especially when using third-party libraries.
For better or for worse, such is the nature of software development: there is no easy way out,
it is all about tradeoffs.