Introduction
This article demonstrates how to use HttpResponse.Filter to easily reduce the output size of your website.
Background
Recently, we redesigned the web site for Layton City. Because the redesign made it much easier for citizens to find what they were looking for, our hits per day nearly tripled overnight. Unfortunately, so did our bandwidth. We're currently serving almost 60Mb a day of just HTML. That doesn't include images or Adobe® Reader® documents. So priority #1 became reducing our bandwidth without reducing usability or having to rewrite the majority of our pages.
One downside of using some of the ASP.NET controls is that they insert lots of whitespace characters so that developers can easily see where problems are. While that is desirable during debugging, there is no means of turning that functionality off when you have released your site.
After finding an article on HttpResponse.Filter in the Longhorn SDK, we decided to use HttpResponse.Filter to intercept our outgoing HTML and squish it.
Using the code
Add the WhitespaceFilter class to your project, and add the following line of code to the Application_BeginRequest handler in your Global.asax file:
Sub Application_BeginRequest(ByVal sender As Object, ByVal e As EventArgs)
Response.Filter = New WhitespaceFilter(Response.Filter)
End Sub
The above code attaches the filter to every page in your application. Alternatively, if you only want to compress individual pages, you can add the same line to the Page_Load event handler of each of those pages.
Whitespace.vb
Comments are inline. Some of the odd-looking lines are there to compress specific portions of our website. (Updated 1/23/2004)
Imports System.IO
Imports System.Text.RegularExpressions
Public Class WhitespaceFilter
Inherits Stream
Private _sink As Stream
Private _position As Long
Public Sub New(ByVal sink As Stream)
_sink = sink
End Sub
#Region " Code that will most likely never change from filter to filter. "
Public Overrides ReadOnly Property CanRead() As Boolean
Get
Return True
End Get
End Property
Public Overrides ReadOnly Property CanSeek() As Boolean
Get
Return True
End Get
End Property
Public Overrides ReadOnly Property CanWrite() As Boolean
Get
Return True
End Get
End Property
Public Overrides ReadOnly Property Length() As Long
Get
Return 0
End Get
End Property
Public Overrides Property Position() As Long
Get
Return _position
End Get
Set(ByVal Value As Long)
_position = Value
End Set
End Property
Public Overrides Function Seek(ByVal offset As Long, _
ByVal direction As System.IO.SeekOrigin) As Long
Return _sink.Seek(offset, direction)
End Function
Public Overrides Sub SetLength(ByVal length As Long)
_sink.SetLength(length)
End Sub
Public Overrides Sub Close()
_sink.Close()
End Sub
Public Overrides Sub Flush()
_sink.Flush()
End Sub
Public Overrides Function Read(ByVal MyBuffer() As Byte, _
ByVal offset As Integer, ByVal count As Integer) As Integer
Return _sink.Read(MyBuffer, offset, count)
End Function
#End Region
Public Overrides Sub Write(ByVal MyBuffer() As Byte, _
ByVal offset As Integer, ByVal count As Integer)
' VB arrays are declared by upper bound, so count - 1 gives exactly count bytes.
Dim data(count - 1) As Byte
Buffer.BlockCopy(MyBuffer, offset, data, 0, count)
Dim s As String = System.Text.Encoding.UTF8.GetString(data)
s = s.Replace(ControlChars.Cr, _
Chr(255)).Replace(ControlChars.Lf, _
"").Replace(ControlChars.Tab, "")
s = s.Replace(";" & Chr(255), ";" & ControlChars.Cr)
s = s.Replace(Chr(255), " ")
' Collapse runs of spaces: replace two spaces with one until none remain.
Do
s = s.Replace("  ", " ")
Loop Until s.IndexOf("  ") = -1
s = s.Replace("<!-- Page Content Goes Above Here -->", "")
s = s.Replace("<!-- Page Content Goes Below Here -->", "")
s = s.Replace("<!-- Do not get rid of this on data pages -->", "")
s = s.Replace(" <!DOCTYPE", "<!DOCTYPE")
s = s.Replace("<li> ", _
"<li>").Replace("</td> ", _
"</td>").Replace("</tr> ", _
"</tr>").Replace("</ul> ", _
"</ul>").Replace("</table> ", _
"</table>").Replace("</li> ", "</li>")
s = s.Replace("<LI> ", _
"<LI>").Replace("</TD> ", _
"</TD>").Replace("</TR> ", _
"</TR>").Replace("</UL> ", _
"</UL>").Replace("</TABLE> ", _
"</TABLE>").Replace("</LI> ", "</LI>")
s = s.Replace("<td> ", _
"<td>").Replace("<tr> ", _
"<tr>")
s = s.Replace("<TD> ", _
"<TD>").Replace("<TR> ", _
"<TR>")
s = s.Replace("<P> ", "<P>").Replace("<p> ", "<p>")
s = s.Replace("</P> ", "</P>").Replace("</p> ", "</p>")
s = s.Replace("style=""display:inline""> ", _
"style=""display:inline"">")
s = s.Replace(" <H", "<H").Replace(" <h", _
"<h").Replace(" </H", _
"</H").Replace(" </h", "</h")
s = s.Replace("<UL> ", "<UL>").Replace("<ul> ", "<ul>")
s = s.Replace(" <TABLE", _
"<TABLE").Replace(" <table", _
"<table")
s = s.Replace(" <li>", _
"<li>").Replace(" <LI>", "<LI>")
s = s.Replace(" <br>", _
"<br>").Replace(" <BR>", _
"<BR>").Replace("<br> ", _
"<br>").Replace("<BR> ", "<BR>")
s = s.Replace(" <ul>", "<ul>").Replace(" <UL>", "<UL>")
s = s.Replace("<STRONG>", "<B>").Replace("<strong>", "<b>")
s = s.Replace("</STRONG>", "</B>").Replace("</strong>", "</b>")
s = s.Replace("&brkbar;", "|")
s = s.Replace("&brvbar;", "|")
s = s.Replace("&shy;", "-")
s = s.Replace("&nbsp;", Chr(160))
s = s.Replace("&sbquo;", "'")
s = s.Replace("&bdquo;", """")
s = s.Replace("&lsquo;", "'")
s = s.Replace("&rsquo;", "'")
s = s.Replace("�", "'")
s = s.Replace("&ldquo;", """")
s = s.Replace("&rdquo;", """")
s = s.Replace("�", """")
s = s.Replace("&ndash;", "-")
s = s.Replace("&endash;", "-")
s = s.Replace("<!--", "<!--" & ControlChars.Cr)
s = s.Replace("}", "}" & ControlChars.Cr)
' The replacements above can leave new runs of spaces, so collapse them again.
Do
s = s.Replace("  ", " ")
Loop Until s.IndexOf("  ") = -1
Dim outdata() As Byte = System.Text.Encoding.UTF8.GetBytes(s)
_sink.Write(outdata, 0, outdata.GetLength(0))
End Sub
End Class
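To see what the opening steps of Write do to a fragment of markup, the following console sketch applies the same newline and space handling to a plain string. It is only an illustration: Squish, SquishDemo, and the sample markup are hypothetical names that do not appear in the class above, and ChrW(255) stands in for Chr(255) so the placeholder character does not depend on the system code page.

```vbnet
Imports System

Module SquishDemo
    ' Hypothetical helper: the newline and space handling from Write,
    ' pulled out so it can be run on a plain string.
    Function Squish(ByVal html As String) As String
        Dim marker As String = ChrW(255)   ' placeholder for carriage returns
        Dim s As String = html.Replace(vbCr, marker).Replace(vbLf, "").Replace(vbTab, "")
        ' Keep a line break after ";" so script statements stay on separate lines.
        s = s.Replace(";" & marker, ";" & vbCr)
        ' Every other line break becomes a single space.
        s = s.Replace(marker, " ")
        ' Collapse runs of spaces down to one.
        Do While s.IndexOf("  ", StringComparison.Ordinal) <> -1
            s = s.Replace("  ", " ")
        Loop
        Return s
    End Function

    Sub Main()
        Dim html As String = "<ul>" & vbCrLf & vbTab & "<li>One</li>" & vbCrLf & "</ul>"
        Dim result As String = Squish(html)
        Console.WriteLine(result)
        Console.WriteLine("{0} -> {1} characters", html.Length, result.Length)
    End Sub
End Module
```

The tag-specific Replace calls in the real filter then remove most of the single spaces that this step leaves between tags.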
Points of Interest
Occasionally, you will find that you have one or more pages that you do not want to compress. For example, the pages may use pre-formatted text or the pages may emit binary data instead of HTML.
In that case, you would want to filter the filter, so to speak. On our site, we have one page that we don't compress, so our Application_BeginRequest looks a little bit like this:
Sub Application_BeginRequest(ByVal sender As Object, ByVal e As EventArgs)
If Request.Url.PathAndQuery.ToLower.IndexOf("makethumbnail") = -1 Then
Response.Filter = New WhitespaceFilter(Response.Filter)
End If
End Sub
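If more than one page has to be excluded, the check can be moved into a small helper. The sketch below is one possible shape for it, not what our site runs: FilterRules, ShouldFilter, and the "download" entry are hypothetical, and only "makethumbnail" comes from the handler above.

```vbnet
Imports System

Module FilterRules
    ' Hypothetical list of path fragments whose responses must not be filtered.
    Private ReadOnly SkipFragments() As String = {"makethumbnail", "download"}

    ' Returns False when the requested path contains any excluded fragment.
    Function ShouldFilter(ByVal pathAndQuery As String) As Boolean
        For Each fragment As String In SkipFragments
            If pathAndQuery.IndexOf(fragment, StringComparison.OrdinalIgnoreCase) <> -1 Then
                Return False
            End If
        Next
        Return True
    End Function

    Sub Main()
        Console.WriteLine(ShouldFilter("/MakeThumbnail.aspx?id=4"))
        Console.WriteLine(ShouldFilter("/default.aspx"))
    End Sub
End Module
```

In Application_BeginRequest, the test would then read If FilterRules.ShouldFilter(Request.Url.PathAndQuery) Then, followed by the same Response.Filter assignment as before.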
Using this class will increase the processing time for each page. In our case, the reduction in bandwidth (7% on our main page, as much as 30% on some of our more complex pages) was worth the increased workload on the server. Admittedly, all of the string operations are very inefficient, because each Replace call that changes the text allocates a new string. A rewrite to use StringBuilder is in the works. The only downside to StringBuilder is that you can't run regular expressions against it. Because the current version uses Strings, I do not recommend it if the HTML on your pages averages more than about 80,000 bytes, due to the behavior of the .NET Framework's garbage collector. Objects of 85,000 bytes or more are allocated on the Large Object Heap, which is collected only during full (generation 2) collections, so large temporary strings can stay in memory much longer than small ones.
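As a rough idea of what the rewrite could look like, the sketch below collapses runs of spaces in a single pass with StringBuilder, next to a one-line regular expression that does the same on a String. Both functions are hypothetical and cover only the space-collapsing loop, not the tag-specific replacements. The syntax assumes Visual Basic 2005 or later.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module SinglePassDemo
    ' Hypothetical single-pass version of the "replace two spaces with one" loop.
    Function CollapseSpaces(ByVal input As String) As String
        Dim sb As New StringBuilder(input.Length)
        Dim lastWasSpace As Boolean = False
        For Each c As Char In input
            Dim isSpace As Boolean = (c = " "c)
            ' Skip a space that directly follows another space.
            If Not (isSpace AndAlso lastWasSpace) Then
                sb.Append(c)
            End If
            lastWasSpace = isSpace
        Next
        Return sb.ToString()
    End Function

    ' The same result with a regular expression: two or more spaces become one.
    Function CollapseSpacesRegex(ByVal input As String) As String
        Return Regex.Replace(input, " {2,}", " ")
    End Function

    Sub Main()
        Dim sample As String = "<td>   a    b  </td>"
        Console.WriteLine(CollapseSpaces(sample))
        Console.WriteLine(CollapseSpacesRegex(sample))
    End Sub
End Module
```

Either version builds one result string per call instead of one per loop iteration, which is the main saving over the repeated Replace calls.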
If you are using a server operating system, you can also enable HTTP compression in IIS to reduce your bandwidth usage even further. When a client indicates that it accepts compressed responses (most HTTP/1.1 browsers do, through the Accept-Encoding request header), the server compresses the response body with gzip or deflate, which is similar to ZIP, before sending it out to the client.
To enable HTTP compression on Windows 2000, open the Internet Service Manager, right-click on your server, and pick "Properties". Select the "Service" tab, then check "Compress Application Files" and "Compress Static Files".
As far as I can tell, HTTP compression is automatically enabled on IIS 5.1 in Windows XP.
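To get a feel for how much this kind of compression can save on top of whitespace removal, the following console sketch gzips a block of repetitive HTML in memory and prints both sizes. It is an illustration only: GzipDemo, GzipSize, and the sample rows are hypothetical, and it relies on System.IO.Compression.GZipStream, which was added in .NET Framework 2.0 and is not what IIS itself runs.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module GzipDemo
    ' Hypothetical helper: returns the gzipped size, in bytes, of a UTF-8 string.
    Function GzipSize(ByVal text As String) As Integer
        Dim raw() As Byte = Encoding.UTF8.GetBytes(text)
        Using ms As New MemoryStream()
            ' The third argument (leaveOpen) keeps ms usable after gz is disposed.
            Using gz As New GZipStream(ms, CompressionMode.Compress, True)
                gz.Write(raw, 0, raw.Length)
            End Using
            Return CInt(ms.Length)
        End Using
    End Function

    Sub Main()
        ' Build some repetitive markup, similar to a generated table.
        Dim sb As New StringBuilder()
        For i As Integer = 1 To 200
            sb.Append("<tr><td>Row ").Append(i).Append("</td></tr>").Append(vbCrLf)
        Next
        Dim html As String = sb.ToString()
        Console.WriteLine("Raw bytes:     {0}", Encoding.UTF8.GetByteCount(html))
        Console.WriteLine("Gzipped bytes: {0}", GzipSize(html))
    End Sub
End Module
```

Repetitive markup like this compresses very well, which is why server-side compression and the whitespace filter complement each other rather than compete.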
History
- v1.1, 1/23/2004 - Bug-fix release.
- v1.0, 1/21/2004 - Initial submission.