Introduction
Every time I do a web project, I find myself reinventing the wheel when it comes to caching web content. Sometimes I try versioning by release number, adding a ?v=01.04.23 to each URL, but then a release fails because somebody forgot to update it. Or I try versioning by date, ?v=20120101, figuring that downloading once a day isn't too bad and that it ensures customers see any changes (if rollouts happen at night), but then development is a pain and the servers still get hit hard every morning.
What if there were some way to uniquely identify any file in just four bytes, so that the browser only downloads it again when it has actually changed? Well, PKZIP has been doing this for decades, calculating a CRC32 checksum for each file in the .zip archive. And since PKZIP and its variants can zip a lot of files really fast, calculating a CRC32 must be pretty quick.
Background
If you are not familiar with using dummy query strings to version web content, look here for a quick introduction.
If you are not familiar with CRC32, you can read up on it here.
Using the Code
Because CRC32 is a small hash of the file in question, adding, removing, or changing even a single byte changes the checksum. That means this method reliably downloads the file when it has changed and serves it from cache when it has not. It may not be the fastest method of all time (although it is really fast), but it is foolproof, because nobody can defeat it by forgetting to bump a version number. Files served this way stay cached, even for years, until the file changes, at which point all your users immediately get the new version.
So how do we use this magical new method? Just add , true to your Url.Content calls. (You are using Url.Content to add your .js and .css files to your MVC projects, right?) So <%=Url.Content("~/Scripts/MyScript.js")%> becomes <%=Url.Content("~/Scripts/MyScript.js", true)%>, and in Razor, @Url.Content("~/Scripts/MyScript.js") becomes @Url.Content("~/Scripts/MyScript.js", true). That's it!
Now, instead of http://www.example.com/Scripts/MyScript.js, your file will be downloaded as http://www.example.com/Scripts/MyScript.js?crc=2D1BA10F.
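For example, if you reference the script from a Razor view (the surrounding markup here is only illustrative), the tag and its rendered output look like this:

    <script src="@Url.Content("~/Scripts/MyScript.js", true)" type="text/javascript"></script>

    <!-- Rendered HTML sent to the browser (using the example checksum above): -->
    <script src="/Scripts/MyScript.js?crc=2D1BA10F" type="text/javascript"></script>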
So, first, let's look at the code in question:
using System.IO;
using System.Web;

namespace WebCRC
{
    public static class CRC
    {
        // Overload of Url.Content: pass true as the second argument to append
        // a ?crc= checksum to the generated URL.
        public static string Content(this System.Web.Mvc.UrlHelper Url, string Path, bool CRC)
        {
            if (!CRC)
                return Url.Content(Path);

            // Map the virtual path to the physical file on the server.
            string serverPath = HttpContext.Current.Server.MapPath(Path);

            // Read the file and compute its CRC32 checksum as a hex string.
            // (The CRC32 helper class ships in WebCRC.dll and is not listed here.)
            byte[] fileContents = File.ReadAllBytes(serverPath);
            string result = CRC32.Compute(fileContents).ToString("X");

            return Url.Content(Path) + "?crc=" + result;
        }
    }
}
- First, you can see that we are overloading Url.Content because of this System.Web.Mvc.UrlHelper Url. This gives us an extension method, a method that acts as if it came with the class.
- Next, we get the location of our file on the server using HttpContext.Current.Server.MapPath(Path).
- Then we use File.ReadAllBytes(serverPath) to read the file into memory. JavaScript and CSS files are typically very small, so there is no buffering required here.
- Finally, we compute the CRC32 checksum with CRC32.Compute(fileContents) (a possible implementation is sketched after this list) and convert it to a hex string with .ToString("X").
- And last, but not least, we append ?crc={checksum} to the URL.
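The CRC32 class itself lives inside WebCRC.dll and is not listed in the article. In case you are curious what such a helper might look like, here is a standard table-driven CRC-32 implementation (a sketch only, not necessarily the author's code):

    using System;

    namespace WebCRC
    {
        // Sketch of a CRC32.Compute helper. The constant 0xEDB88320 is the reflected
        // form of the standard IEEE CRC-32 polynomial, the same one PKZIP uses.
        public static class CRC32
        {
            private static readonly uint[] Table = BuildTable();

            private static uint[] BuildTable()
            {
                var table = new uint[256];
                for (uint i = 0; i < 256; i++)
                {
                    uint crc = i;
                    for (int bit = 0; bit < 8; bit++)
                        crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
                    table[i] = crc;
                }
                return table;
            }

            public static uint Compute(byte[] data)
            {
                uint crc = 0xFFFFFFFF;
                foreach (byte b in data)
                    crc = (crc >> 8) ^ Table[(crc ^ b) & 0xFF];
                return ~crc; // final XOR
            }
        }
    }

Because the method name and signature match the CRC32.Compute(fileContents) call above, a helper like this drops in where the real WebCRC.dll class would be.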
So what are the steps for using this feature? It couldn't be easier.
- Reference the WebCRC.dll file in your project's References section by doing an Add Reference...
- On the web page (.aspx or .cshtml file) where you want to add the CRC references, add <%@Import Namespace="WebCRC" %> for ASPX files or @using WebCRC for Razor.
- Add , true to your Url.Content calls.
It's that simple!
Doesn't this kill your server by computing these checksums every single time you generate an HTML page?
No, not really. Since your web server is already serving these files all the time, they are already sitting in the operating system's file cache anyway. The CRC algorithm has been around for a long time and is very fast on modern hardware: the whole process takes only about 2 ms per file reference on my desktop machine, and probably closer to 1 ms on a decent server. And since most pages take at least 1.5 s (1,500 ms) to render, we can afford 10-20 ms per page to make sure we don't spend another couple of hundred milliseconds re-downloading unchanged files, or field support calls telling users to clear their cache because we forgot to bump a version number and the page isn't working.
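If you ever do need to shave off those last couple of milliseconds, one option, not part of WebCRC.dll and with hypothetical names throughout, is to cache the computed checksum per path and recompute it only when the file's last-write time changes:

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Web;

    namespace WebCRC
    {
        // Hypothetical helper: caches the CRC hex string per virtual path so the
        // file is read and hashed only when its last-write time changes.
        public static class CachedCrc
        {
            private static readonly ConcurrentDictionary<string, Tuple<DateTime, string>> Cache =
                new ConcurrentDictionary<string, Tuple<DateTime, string>>();

            public static string Checksum(string virtualPath)
            {
                string serverPath = HttpContext.Current.Server.MapPath(virtualPath);
                DateTime stamp = File.GetLastWriteTimeUtc(serverPath);

                Tuple<DateTime, string> cached;
                if (Cache.TryGetValue(virtualPath, out cached) && cached.Item1 == stamp)
                    return cached.Item2;

                string crc = CRC32.Compute(File.ReadAllBytes(serverPath)).ToString("X");
                Cache[virtualPath] = Tuple.Create(stamp, crc);
                return crc;
            }
        }
    }

With a cache like this, each file is hashed once per application start (or per file change) instead of once per page render.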
Aren't you worried about CRC collisions, where a new file generates the same hash as the old file?
No. For a new version of the file to collide, it would have to produce exactly the same 32-bit checksum as the old one, which has odds of roughly 1 in 4 billion for any given change. It certainly isn't going to happen just because the same file had one new function added.
Is this limited to JavaScript and CSS files?
No. You can use it for any type of file you like, such as images. But test it first if you plan to use it on large files, because reading and hashing a big file on every page render can start to get expensive.
Will this work with minified files and gzip?
Of course. As long as the same minifier is used and produces the same minified output each time a rollout is done, the CRC will come out the same each time. And if the source file changes, the minified version changes as well, so it gets a new CRC.
Gzip is handled by the server AFTER the page is rendered, so it has no effect on this process.
History
- 16th July, 2012: Initial upload