
ZipFS: Using ZIP files as virtual directories or read-only resource containers in ASP.NET

JaiQ


13 May 2008

A library for using a ZIP archive (with images, script files, resources and so on) as if it were a virtual directory in an ASP.NET application.

Download ZipFS - 111.59 KB

Introduction

In your ASP.NET application, you may have a set of static directories containing many image, script, and resource files. Extensive JavaScript libraries with plugin support (like TinyMCE) in particular may have hundreds of files of their own. You probably never modify any of these files, unless you upgrade to a newer version of those libraries.

When you create a new project that uses the same set of libraries, you have to copy the whole set of directories and files to the new project folder (at least if you want simple x-copy deployment), and you end up with thousands of duplicate files polluting your hard drive. If, like me, you use a version control system such as SVN (a must-have for me), your job is easier on the repository side, because copying a directory just creates a link in SVN. That does not save you from having a new local copy of every file in your working copy, though.

Here is my proposed solution: why not keep these libraries (or other resources) in ZIP files of their own, since most of them are already distributed as ZIP files, and use each ZIP file as if it were a virtual directory on the ASP.NET server? (Note: "virtual directory" here is not related to IIS virtual directories.)

Background

First of all, why ZIP?

ZIP is a simple, standard format that can be manipulated with almost any archiving software, or with Windows itself. Using a non-standard format (like a custom virtual file system) would make preparing and modifying the required files harder.
Almost every JavaScript library is distributed as a ZIP archive (and you can easily create one yourself if it is not).
Thanks to Ionic's small and powerful .NET ZIP library (DotNetZip, http://www.codeplex.com/DotNetZip), it is very easy to read ZIP file contents in ASP.NET applications without any further dependencies (a single DLL of about 35 KB is enough).

Note: Up to version 1.3, the ZipFile object loaded the whole ZIP file into memory when an archive was opened. As of version 1.4 it uses streams and loads only the file header entries, which is faster and far more memory efficient.

Our requirements for the completed system are:

ZIP files should be seen as virtual directories, and no configuration should be needed on the IIS side (as we might not have access to the IIS configuration).
Accessing a file inside a ZIP archive should be almost as fast as accessing a physical file directly, particularly when the file is stored with no compression. (You might want to use the no-compression option when performance and server load are a concern.)
Even though the DotNetZip library loads only the file headers of a ZIP file (not the file data itself), it takes some time to locate and read this information. We should avoid reopening the ZIP file and reloading the same headers each time a file inside it is requested. So we should open the ZIP file on the first request, load the headers, and reuse that cached information in subsequent requests.
As we are going to cache ZIP file headers, we need a way to recognize when the ZIP file itself is modified (for example, when a new one is uploaded) and update our cache automatically. Comparing the last modified time and the size of the ZIP file should suffice.
As long as the ZIP file itself doesn't change, we should help the client browser cache the files, and avoid load on our server to the extent possible.
It should be as easy as possible to reference files in ZIP archives. In this solution, a URL such as "http://www.site.com/zip.axd/js/tiny_mce.zip/license.txt" refers to "license.txt" inside the "http://www.site.com/js/tiny_mce.zip" archive (where zip.axd is a virtual page handled by the IHttpHandler implementation we are going to develop).
Libraries like TinyMCE load their additional files from subfolders below the location of their main .js file, and the solution shouldn't cause them to stop working. For example, TinyMCE's directory structure on disk looks like this:

\js\tiny_mce\
    tiny_mce.js
    simple\
        editor_template.js
    advanced\
        editor_template.js
    plugins\
        advhdr\
            editor_plugin.js
    ...        
When TinyMCE's main file, tiny_mce.js, is loaded, it loads advanced\editor_template.js and plugins\advhdr\editor_plugin.js by itself. If we had used a query string based scheme for our virtual file system, like "http://www.site.com/zip.axd?zip=/js/tiny_mce.zip&file=tiny_mce.js", the browser would try to load editor_template.js from a location like "http://www.site.com/advanced/editor_template.js". It would think it had loaded tiny_mce.js from our application's root directory (as the zip.axd virtual page seems to reside in the root), and the request would simply fail.

So we will rely on a useful feature of ASP.NET named PathInfo. For a requested URL like "http://www.site.com/zip.axd/js/tiny_mce.zip/tiny_mce.js", the handler page URL is still "http://www.site.com/zip.axd" and Request.PathInfo returns "/js/tiny_mce.zip/tiny_mce.js". Taking the part up to and including the ".zip" extension, we can say that the ZIP archive we are going to work with is "/js/tiny_mce.zip" and the requested file is "tiny_mce.js" in the archive root. This time the browser thinks that tiny_mce.js resides in "http://www.site.com/zip.axd/js/tiny_mce.zip/", so when it requests "advanced/editor_template.js" it will use "http://www.site.com/zip.axd/js/tiny_mce.zip/advanced/editor_template.js", and our virtual page (http://www.site.com/zip.axd) will happily handle that request too.
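The splitting step described above can be sketched roughly as follows. This is a minimal illustration, not the code from Poligon.ZipFS: the class name ZipPathSketch, the method TrySplit and the extension parameter are hypothetical names chosen for this example.

```csharp
using System;

// Hypothetical helper, not part of Poligon.ZipFS.
public static class ZipPathSketch
{
    // Splits a PathInfo value such as "/js/tiny_mce.zip/advanced/editor_template.js"
    // into the virtual path of the archive and the entry path inside the archive.
    // Returns false when no "<extension>/" segment followed by an entry name is found.
    public static bool TrySplit(string pathInfo, string extension,
        out string zipPath, out string entryPath)
    {
        zipPath = null;
        entryPath = null;
        if (string.IsNullOrEmpty(pathInfo) || string.IsNullOrEmpty(extension))
            return false;

        // Search for the extension followed by a slash, so that a folder
        // named "archive.zipped" is not mistaken for a ZIP file.
        int index = pathInfo.IndexOf(extension + "/", StringComparison.OrdinalIgnoreCase);
        if (index < 0)
            return false;

        int end = index + extension.Length;
        string entry = pathInfo.Substring(end + 1);
        if (entry.Length == 0)
            return false; // the URL points at the archive itself, not at a file in it

        zipPath = pathInfo.Substring(0, end);
        entryPath = entry;
        return true;
    }
}
```

Under these assumptions, TrySplit(Request.PathInfo, ".zip", out zipPath, out entryPath) would give the handler both halves of the path. Passing the extension as a parameter also makes it easy to switch to a different extension later.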

One last requirement may sound confusing, but it is very useful. I may want to modify a few files in the archive, or replace them completely, without modifying the original ZIP file every time. I want to keep my modifications in a special folder and have the handler load them automatically instead of the files in the ZIP archive. For example, suppose you changed only tiny_mce.js in the root folder of the TinyMCE distribution. Instead of re-archiving it in "tiny_mce.zip", you create a folder named "tiny_mce" and put your modified "tiny_mce.js" there. When the "tiny_mce.js" file in "tiny_mce.zip" is requested, our handler first looks for a folder called "tiny_mce" (the archive name without the ".zip" extension, in the same folder as the ".zip" file) and searches for "tiny_mce.js" there. If the file exists (which is true in our sample), the handler simply returns its contents and ignores the one in the archive. If it doesn't exist, the handler falls back to loading the file from the ZIP archive. This makes it easy to track what you changed in a base library, and it simplifies patching.
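The override lookup could be sketched along these lines. The names OverrideSketch, GetOverridePath and TryOpenOverride are hypothetical and only illustrate the idea; the actual handler may do this differently.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Poligon.ZipFS.
public static class OverrideSketch
{
    // Maps an entry path inside the archive to the matching file in the
    // override folder, which sits next to the archive and has the same
    // name without the extension. On Windows, "C:\site\js\tiny_mce.zip"
    // and "advanced/editor_template.js" give
    // "C:\site\js\tiny_mce\advanced\editor_template.js".
    public static string GetOverridePath(string zipPhysicalPath, string entryPath)
    {
        string folder = Path.Combine(
            Path.GetDirectoryName(zipPhysicalPath),
            Path.GetFileNameWithoutExtension(zipPhysicalPath));
        return Path.Combine(folder, entryPath.Replace('/', Path.DirectorySeparatorChar));
    }

    // Opens the override file when it exists. When this returns false,
    // the caller would fall back to extracting the entry from the archive.
    public static bool TryOpenOverride(string zipPhysicalPath, string entryPath,
        out Stream stream)
    {
        string path = GetOverridePath(zipPhysicalPath, entryPath);
        stream = File.Exists(path) ? File.OpenRead(path) : null;
        return stream != null;
    }
}
```

This sketch does not reject entry paths containing ".." segments. Code that serves real requests would want to check that the resulting path stays inside the override folder.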

Using the code

Our solution consists solely of an IHttpHandler implementation (Poligon.ZipFS.ZipFSHandler), which handles requests whose URLs start with "~/zip.axd". This handler resides in the "Poligon.ZipFS.dll" file, which can be found in the sample archive. Place it in your web site's BIN folder along with Ionic.Utils.Zip.dll, and add the following lines to your web.config file under the "system.web\httpHandlers" section:

<configuration>
  <system.web>
    <httpHandlers>
      <add verb="GET,HEAD" path="zip.axd" validate="false" 
         type="Poligon.ZipFS.ZipFSHandler"/>
    </httpHandlers>
  </system.web>
</configuration>

If you are using the ASP.NET development server instead of IIS, this may not work as expected, because the development server doesn't behave like IIS for URLs with path information. To fix that, add the following lines to the "system.web\httpModules" section:

<configuration>
  <system.web>
    <httpModules>
      <add name="InternalServerFix" type="Poligon.ZipFS.InternalServerFix" />
    </httpModules>
  </system.web>
</configuration>

Please note that this second change is required only when using the ASP.NET development server. You don't need it when working with IIS 6 (I didn't test with other IIS versions).

To demonstrate how it works, I created a sample web site with a single page in it (default.aspx). This page contains a link to "index.html" in "beatiful.zip", which resides in the same folder. "beatiful.zip" contains a simple web site template with some html, css, gif, and jpg files. When you click the link, the Beautiful Day web site template is served directly from the ZIP archive.

Let's have a brief look at classes in Poligon.ZipFS library...

ZipFileCache represents a single ZIP archive whose file header entries are cached in memory. It automatically detects changes in the underlying ZIP archive (by means of its last modification date and size) and reloads the header information when needed. It also keeps a dictionary of [file name --> zip file entry] pairs for fast access to ZIP file entries by file name. You create an instance by passing the absolute path of a ZIP file to its constructor. ZipFileCache provides only one notable public method:

public bool ExtractStream(string filePath, Stream stream)

Given the name of a file in the ZIP archive, it first checks whether the archive has been modified since the headers were last loaded and reloads them if it has. It then extracts the file contents to the given stream. If no file by that name is found in the archive, it returns false.
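The "has the archive changed?" check can be sketched as below. ZipStampSketch and NeedsReload are hypothetical names for illustration. The library performs its own check inside ZipFileCache, which may differ in detail.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Poligon.ZipFS.
public sealed class ZipStampSketch
{
    private DateTime lastWriteUtc;
    private long length;
    private bool loaded;

    // Returns true when the headers must be loaded or reloaded, that is,
    // on the first call or when the file's last write time or size differs
    // from the values recorded earlier. The new values are then recorded.
    // Not synchronized: callers are expected to hold a suitable lock.
    public bool NeedsReload(string zipPath)
    {
        FileInfo info = new FileInfo(zipPath);
        if (loaded && info.LastWriteTimeUtc == lastWriteUtc && info.Length == length)
            return false;

        lastWriteUtc = info.LastWriteTimeUtc;
        length = info.Length;
        loaded = true;
        return true;
    }
}
```

Comparing both values catches the common case of an archive being replaced. A replacement with an identical size and timestamp would go unnoticed, which seems an acceptable trade-off for static resources.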

ZipFileCache keeps the ZIP file open but locks it against writing only during extraction. The ZIP archive can be safely replaced when no extraction is in progress.

ZipFileCache is also thread safe, as it synchronizes access to the cache. For synchronization I used OneManyResourceLock by Jeffrey Richter (Wintellect), which is simply a lock that allows multiple readers or a single writer. It is more efficient than an exclusive lock (such as the C# lock keyword) when more than one reader can access a resource at the same time. In our case, multiple threads can read the archive at the same time, but when the archive is modified (which is rare) and needs to be reloaded, other threads must wait until reloading is done.
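To illustrate the multiple-readers, single-writer pattern, here is a minimal sketch that uses ReaderWriterLockSlim from the .NET Framework (3.5 and later) in place of OneManyResourceLock. It is a substitute chosen for illustration, not the lock the library uses, and HeaderCacheSketch and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical cache, not part of Poligon.ZipFS. TEntry stands in for
// whatever object describes a ZIP file entry.
public sealed class HeaderCacheSketch<TEntry>
{
    private readonly ReaderWriterLockSlim gate = new ReaderWriterLockSlim();
    private Dictionary<string, TEntry> entries = new Dictionary<string, TEntry>();

    // Many threads may hold the read lock at the same time.
    public bool TryGetEntry(string fileName, out TEntry entry)
    {
        gate.EnterReadLock();
        try
        {
            return entries.TryGetValue(fileName, out entry);
        }
        finally
        {
            gate.ExitReadLock();
        }
    }

    // The write lock is exclusive: it waits for active readers to finish
    // and blocks new readers until the entries have been swapped.
    public void Reload(Dictionary<string, TEntry> freshEntries)
    {
        gate.EnterWriteLock();
        try
        {
            entries = freshEntries;
        }
        finally
        {
            gate.ExitWriteLock();
        }
    }
}
```

Since reloads are rare, readers almost never wait, which is the advantage over an exclusive lock that the paragraph above describes.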

Please note that filePath must be specified using forward slashes (not backslashes), like "folder/subfolder/file.txt".

ZipFSCache is simply a static collection of ZipFileCache objects, one per ZIP file, with thread-safe access to the collection. It creates a ZipFileCache instance the first time a ZIP file is accessed through its ExtractStream method:

public static bool ExtractStream(string zipFilePath, 
    string filePath, Stream outputStream)

It takes one more parameter than ZipFileCache.ExtractStream: the first parameter, zipFilePath, is the full path to the ZIP archive file (this time using backslashes).

ZipFSHandler is our IHttpHandler implementation, which handles requests starting with "~/zip.axd". It uses ZipFSCache to extract files from ZIP archives, and it may instead send content directly from physical files if they are found in a special location (the folder named after the archive without the ".zip" extension, as explained before).
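The requirement to help browsers cache files is not spelled out in detail here. One common way to meet it is a conditional GET based on the archive's last modification time. The sketch below shows only the comparison step and assumes that approach. ConditionalGetSketch and IsNotModified are hypothetical names, and the actual handler may use a different mechanism.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Poligon.ZipFS.
public static class ConditionalGetSketch
{
    // Returns true when the date sent by the browser in its
    // If-Modified-Since header is not older than the archive's last
    // write time, meaning the browser's copy can be reused.
    public static bool IsNotModified(string ifModifiedSince, DateTime zipLastWriteUtc)
    {
        DateTime since;
        if (string.IsNullOrEmpty(ifModifiedSince) ||
            !DateTime.TryParseExact(ifModifiedSince, "R", CultureInfo.InvariantCulture,
                DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
                out since))
        {
            return false; // missing or unparsable header: send the full response
        }

        // HTTP dates have one-second resolution, so drop the fraction of a
        // second from the file time before comparing.
        long wholeSeconds = zipLastWriteUtc.Ticks
            - (zipLastWriteUtc.Ticks % TimeSpan.TicksPerSecond);
        return wholeSeconds <= since.Ticks;
    }
}
```

A handler following this approach would answer with status 304 and no body when the method returns true. Otherwise it would send the file along with a Last-Modified header, for example through Response.Cache.SetLastModified.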

As the library code is well documented, you can inspect it to understand how it works.

Points of Interest

ZipFSHandler does its work nicely and simply, but it may be simpler than it should be. Sending the contents of every ZIP file without checking whether the user is allowed to access it can be a security risk. If you have ZIP files that you don't want everybody to access, you have to secure them somehow or add some configuration options to ZipFSHandler. One simple thing I would suggest is to rename the ZIP files that you want to make public so that they have a .zipfs extension, and modify ZipFSHandler to work with that extension instead of every ".zip" file.

It also keeps ZIP files open and never closes them. I think that when thousands of users request files from ZIP archives, reopening an archive for each request would be slower and use more resources than keeping one file handle open per ZIP file. You may choose to close the files after each extraction instead.

History

14.05.2008 - First version released on CodeProject.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
