Introduction
Every time I do a web project, I find myself reinventing the wheel when it comes to caching web content. Sometimes I try versioning by release number, adding a ?v=01.04.23 to each URL, but then a release fails because somebody forgot to update it. Or I try versioning by date, ?v=20120101, figuring that downloading once a day isn't too bad and that it ensures customers see any changes (if rollouts happen at night), but then development is a pain and the servers still get hit hard every morning.
What if there were some way to uniquely identify any file in just four bytes, so that the browser only downloads it again when it has actually changed? Well, PKZIP has been doing this for decades, calculating a CRC32 checksum for each file in the .zip archive. And since PKZIP and its variants can zip a lot of files really fast, calculating a CRC32 must be pretty quick.
Background
If you are not familiar with using dummy query strings to version web content, look here for a quick introduction.
If you are not familiar with CRC32, you can read up on it here.
Using the Code
Because CRC32 is a small hash of the file in question, adding, removing, or changing even a single byte changes the checksum. That means this method reliably downloads the file when it has changed and serves it from cache when it has not. It may not be the fastest method of all time (although it is really fast), but it is foolproof, because nobody can defeat it by forgetting to bump a version number. Files served this way stay cached, even for years, until the file changes, at which point all your users immediately get the new version.
So how do we use this magical new method? Just add , true to your Url.Content calls. (You are using Url.Content to add your .js and .css files to your MVC projects, right?) So <%=Url.Content("~/Scripts/MyScript.js")%> becomes <%=Url.Content("~/Scripts/MyScript.js", true)%>, and in Razor, @Url.Content("~/Scripts/MyScript.js") becomes @Url.Content("~/Scripts/MyScript.js", true). That's it!
Now, instead of http://www.example.com/Scripts/MyScript.js, your file will be downloaded as http://www.example.com/Scripts/MyScript.js?crc=2D1BA10F.
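For example, if you reference the script from a Razor view (the surrounding markup here is only illustrative), the tag and its rendered output look like this:

    <script src="@Url.Content("~/Scripts/MyScript.js", true)" type="text/javascript"></script>

    <!-- Rendered HTML sent to the browser (using the example checksum above): -->
    <script src="/Scripts/MyScript.js?crc=2D1BA10F" type="text/javascript"></script>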
So, first, let's look at the code in question:
using System.IO;
using System.Web;

namespace WebCRC
{
    public static class CRC
    {
        // Overload of Url.Content: pass true as the second argument to append
        // a ?crc= checksum to the generated URL.
        public static string Content(this System.Web.Mvc.UrlHelper Url, string Path, bool CRC)
        {
            if (!CRC)
                return Url.Content(Path);

            // Map the virtual path to the physical file on the server.
            string serverPath = HttpContext.Current.Server.MapPath(Path);

            // Read the file and compute its CRC32 checksum as a hex string.
            // (The CRC32 helper class ships in WebCRC.dll and is not listed here.)
            byte[] fileContents = File.ReadAllBytes(serverPath);
            string result = CRC32.Compute(fileContents).ToString("X");

            return Url.Content(Path) + "?crc=" + result;
        }
    }
}
- First, you can see that we are overloading Url.Content because of this System.Web.Mvc.UrlHelper Url. This gives us an extension method, a method that acts as if it came with the class.
- Next, we get the location of our file on the server using HttpContext.Current.Server.MapPath(Path).
- Then we use File.ReadAllBytes(serverPath) to read the file into memory. JavaScript and CSS files are typically very small, so there is no buffering required here.
- Finally, we compute the CRC32 checksum with CRC32.Compute(fileContents) (a possible implementation is sketched after this list) and convert it to a hex string with .ToString("X").
- And last, but not least, we append ?crc={checksum} to the URL.
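The CRC32 class itself lives inside WebCRC.dll and is not listed in the article. In case you are curious what such a helper might look like, here is a standard table-driven CRC-32 implementation (a sketch only, not necessarily the author's code):

    using System;

    namespace WebCRC
    {
        // Sketch of a CRC32.Compute helper. The constant 0xEDB88320 is the reflected
        // form of the standard IEEE CRC-32 polynomial, the same one PKZIP uses.
        public static class CRC32
        {
            private static readonly uint[] Table = BuildTable();

            private static uint[] BuildTable()
            {
                var table = new uint[256];
                for (uint i = 0; i < 256; i++)
                {
                    uint crc = i;
                    for (int bit = 0; bit < 8; bit++)
                        crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
                    table[i] = crc;
                }
                return table;
            }

            public static uint Compute(byte[] data)
            {
                uint crc = 0xFFFFFFFF;
                foreach (byte b in data)
                    crc = (crc >> 8) ^ Table[(crc ^ b) & 0xFF];
                return ~crc; // final XOR
            }
        }
    }

Because the method name and signature match the CRC32.Compute(fileContents) call above, a helper like this drops in where the real WebCRC.dll class would be.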
So what are the steps for using this feature? It couldn't be easier.
- Reference the WebCRC.dll file in your project's References section by doing an Add Reference...
- On the web page (.aspx or .cshtml file) where you want to add the CRC references, add <%@Import Namespace="WebCRC" %> for ASPX files or @using WebCRC for Razor.
- Add , true to your Url.Content calls.
It's that simple!
Doesn't this kill your server by computing these checksums every single time you generate an HTML page?
No, not really. Since your web server is already serving these files all the time, they are already sitting in the operating system's file cache anyway. The CRC algorithm has been around for a long time and is very fast on modern hardware: the whole process takes only about 2 ms per file reference on my desktop machine, and probably closer to 1 ms on a decent server. And since most pages take at least 1.5 s (1,500 ms) to render, we can afford 10-20 ms per page to make sure we don't spend another couple of hundred milliseconds re-downloading unchanged files, or field support calls telling users to clear their cache because we forgot to bump a version number and the page isn't working.
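If you ever do need to shave off those last couple of milliseconds, one option, not part of WebCRC.dll and with hypothetical names throughout, is to cache the computed checksum per path and recompute it only when the file's last-write time changes:

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Web;

    namespace WebCRC
    {
        // Hypothetical helper: caches the CRC hex string per virtual path so the
        // file is read and hashed only when its last-write time changes.
        public static class CachedCrc
        {
            private static readonly ConcurrentDictionary<string, Tuple<DateTime, string>> Cache =
                new ConcurrentDictionary<string, Tuple<DateTime, string>>();

            public static string Checksum(string virtualPath)
            {
                string serverPath = HttpContext.Current.Server.MapPath(virtualPath);
                DateTime stamp = File.GetLastWriteTimeUtc(serverPath);

                Tuple<DateTime, string> cached;
                if (Cache.TryGetValue(virtualPath, out cached) && cached.Item1 == stamp)
                    return cached.Item2;

                string crc = CRC32.Compute(File.ReadAllBytes(serverPath)).ToString("X");
                Cache[virtualPath] = Tuple.Create(stamp, crc);
                return crc;
            }
        }
    }

With a cache like this, each file is hashed once per application start (or per file change) instead of once per page render.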
Aren't you worried about CRC collisions, where a new file generates the same hash as the old file?
No. For a new version of the file to collide, it would have to produce exactly the same 32-bit checksum as the old one, which has odds of roughly 1 in 4 billion for any given change. It certainly isn't going to happen just because the same file had one new function added.
Is this limited to JavaScript and CSS files?
No. You can use it for any type of file you like, such as images. But test it first if you plan to use it on large files, because reading and hashing a big file on every page render can start to get expensive.
Will this work with minified files and gzip?
Of course. As long as the same minifier is used and produces the same minified output each time a rollout is done, the CRC will come out the same each time. And if the source file changes, the minified version changes as well, so it gets a new CRC.
Gzip is handled by the server AFTER the page is rendered, so it has no effect on this process.
History
- 16th July, 2012: Initial upload