Introduction
No doubt you have seen many web pages on which the results of a keyword search highlight the keyword in yellow, making it easy for the reader to see the keyword in the context in which it was found. There are, of course, many ways to approach this task.
This article discusses:
- Implementation of the (mostly) undocumented HttpResponse.Filter property
- Implementation of a simple search box to highlight a word or phrase on a page
- Use of Regex.Replace with a MatchEvaluator delegate
Background
This week when I approached the implementation of keyword highlighting, I considered a few possible ways:
- Client-side DOM manipulation with JavaScript
- Search and replace on the text to which I have programmatic access
- An ASP.NET HTTP Module or HTTP Handler, compiled as a standalone assembly and installed in Web.config
- Manipulating the output stream, similar to output buffering in PHP
I decided to pursue the last method because it could operate independently of the page's code (unlike search and replace on text I have programmatic access to), wouldn't require processor-intensive client-side scripting (unlike DOM manipulation), and wouldn't require any server-side configuration (unlike an HTTP module or handler).
The example site consists of a web page that displays the text from Charles Dickens' Great Expectations. In the upper-right corner of the page floats a search box into which you can enter a word or phrase. It also presents some options, such as case-sensitive searching, whole-word searching, and searching using regular expressions instead of literal text.
When a word or phrase is entered into the search box and the button is clicked, the page is shown again with every occurrence of the search term highlighted throughout the document.
Terminology
For the sake of clarity, I'll refer to the search term or keywords as the needle. Likewise, I'll refer to the text that is being searched as the haystack. This nomenclature is also used throughout the code for consistency.
Using the Code
Earlier in the article, I promised to add highlighting to a page with one line of code. Here is the code in context:
protected void Page_Load(object sender, EventArgs e)
{
    Content.Text = Properties.Resources.Great_Expectations__by_Charles_Dickens;

    if (IsPostBack)
    {
        // Install the filter; it highlights the needle as the page is written out.
        Response.Filter = new HighlightFilter(Response, Needle.Text)
        {
            IsHtml5 = false,
            MatchCase = MatchCase.Checked,
            MatchWholeWords = MatchWholeWords.Checked,
            UseRegex = UseRegularExpressions.Checked
        };

        // Clear the search box for the next query.
        Needle.Text = string.Empty;
    }
}
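As you can see, when the Web Form is posted back, the needle is retrieved from Needle.Text. In the code-behind, we construct a HighlightFilter, passing it the HttpResponse object and the needle, and assign it to Response.Filter. The search box is then cleared by setting Needle.Text to an empty string.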
I have also set some of the properties of HighlightFilter using an object initializer. Most of them should be self-explanatory, such as MatchCase, MatchWholeWords, and UseRegex.
When the IsHtml5 property is true, instances of the needle are wrapped in the HTML5 <mark> element, which was designed for exactly this purpose. If it is false, a div with its class set to "highlight" is used instead. For greater control, you can explicitly set the OpenTag and CloseTag properties. For ultimate control, you can subscribe to the Highlighting event and modify the supplied Haystack using the supplied Needle, or even subclass HighlightFilter entirely.
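The precedence between IsHtml5 and the explicit tags can be pictured with a small sketch. The property names come from the article, but the HighlightTags class, its fallback logic, and the assumed default of IsHtml5 = true are illustrative guesses rather than HighlightFilter's actual code: an explicitly assigned OpenTag or CloseTag wins, and otherwise the tag is derived from IsHtml5.

```csharp
using System;

// Hypothetical sketch: how OpenTag/CloseTag might fall back to IsHtml5.
// Property names follow the article; the logic and defaults are assumptions.
public class HighlightTags
{
    private string _openTag;   // null until explicitly assigned
    private string _closeTag;  // null until explicitly assigned

    // Assumed default: prefer the HTML5 <mark> element.
    public bool IsHtml5 { get; set; } = true;

    // An explicitly assigned tag wins; otherwise derive it from IsHtml5.
    public string OpenTag
    {
        get => _openTag ?? (IsHtml5 ? "<mark>" : "<div class=\"highlight\">");
        set => _openTag = value;
    }

    public string CloseTag
    {
        get => _closeTag ?? (IsHtml5 ? "</mark>" : "</div>");
        set => _closeTag = value;
    }
}
```

Keeping the explicit values in nullable backing fields means that toggling IsHtml5 later still changes the defaults, but never overrides tags the caller has set.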
Of course, this kind of post-processing need not be limited to highlighting. Using the Filter class, one could subscribe to the Filtering event to modify the output stream, or subclass Filter and override the protected OnFilter method. There are numerous applications, including:
- obfuscation
- minification
- altering the output of sealed classes
- translation (e.g. RSS → HTML)
- insertion of common code (e.g. reverse master page)
If you find other uses, please share with a comment.
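As an illustration of the minification idea, the following is the kind of string-to-string transformation an OnFilter override or a Filtering handler might apply to the buffered output. The WhitespaceMinifier class is hypothetical and deliberately naive: it collapses whitespace between adjacent tags, which would also alter the contents of <pre> or <textarea> elements and can remove meaningful spaces between inline elements.

```csharp
using System.Text.RegularExpressions;

// Hypothetical filter logic: collapse whitespace that sits between two tags.
// Naive by design; a real minifier must leave <pre>/<textarea> content alone.
public static class WhitespaceMinifier
{
    // A '>' followed by one or more whitespace characters and then a '<'.
    private static readonly Regex BetweenTags = new Regex(@">\s+<", RegexOptions.Compiled);

    public static string Minify(string html) => BetweenTags.Replace(html, "><");
}
```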
How It Works
I needed some way to intercept the output stream, Page.Response.OutputStream.
A bit of searching led me to the Filter property of the HttpResponse class. Its documentation leaves quite a bit to the imagination: the property is assigned a Stream that filters writes, and the example refers to a magical (i.e. undocumented) UpperCaseFilterStream that takes the property itself as a parameter to its constructor, and ta-da! Hmm… (Had I bothered to find and unpack Samples.AspNet.CS.Controls, maybe I would have solved this one.)
I created the Filter class, which takes the HttpResponse object as a parameter to its constructor. The class inherits from Stream, but its implementation of the abstract members simply delegates to the corresponding methods and properties of the HttpResponse object's OutputStream, with the exception of Write(byte[] buffer, int offset, int count). The overridden Write method decodes the buffer to a string using the response's ContentEncoding, applies a filter, re-encodes the result, and writes it out to the OutputStream.
The Filter class by itself doesn't do anything useful, but it is easy to extend. To make it filter something, one either subclasses it and overrides OnFilter, or instantiates it and subscribes to the Filtering event, which passes a FilterEventArgs object containing the buffered string to be manipulated.
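Here is a minimal sketch of such a filtering stream. It differs from the article's Filter in one stated assumption: it wraps an arbitrary Stream plus an Encoding rather than taking the HttpResponse, which keeps it runnable outside ASP.NET. The names OnFilter, Filtering, and FilterEventArgs mirror the ones above, but their exact signatures here are my guesses. Like any filter that decodes each Write call on its own, it will miss a match, or garble a multibyte character, that straddles two writes.

```csharp
using System;
using System.IO;
using System.Text;

// Event data carrying one decoded chunk; handlers may replace Text.
public class FilterEventArgs : EventArgs
{
    public FilterEventArgs(string text) { Text = text; }
    public string Text { get; set; }
}

// Sketch of a write-filtering stream. Assumption: it wraps any Stream plus an
// Encoding, instead of an HttpResponse and its OutputStream/ContentEncoding.
public class Filter : Stream
{
    private readonly Stream _inner;
    private readonly Encoding _encoding;

    public Filter(Stream inner, Encoding encoding)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _encoding = encoding ?? throw new ArgumentNullException(nameof(encoding));
    }

    public event EventHandler<FilterEventArgs> Filtering;

    // Default behaviour raises Filtering; subclasses may override instead.
    protected virtual string OnFilter(string text)
    {
        var args = new FilterEventArgs(text);
        Filtering?.Invoke(this, args);
        return args.Text;
    }

    // Decode the chunk, filter it, re-encode it, and forward it downstream.
    public override void Write(byte[] buffer, int offset, int count)
    {
        string text = _encoding.GetString(buffer, offset, count);
        byte[] output = _encoding.GetBytes(OnFilter(text));
        _inner.Write(output, 0, output.Length);
    }

    // All other members delegate to the wrapped stream.
    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => _inner.CanSeek;
    public override bool CanWrite => _inner.CanWrite;
    public override long Length => _inner.Length;
    public override long Position
    {
        get => _inner.Position;
        set => _inner.Position = value;
    }
    public override void Flush() => _inner.Flush();
    public override int Read(byte[] buffer, int offset, int count) => _inner.Read(buffer, offset, count);
    public override long Seek(long offset, SeekOrigin origin) => _inner.Seek(offset, origin);
    public override void SetLength(long value) => _inner.SetLength(value);

    // Closing the filter closes the stream it wraps.
    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

In a Web Forms page, a filter built this way could be installed with Response.Filter = new Filter(Response.Filter, Response.ContentEncoding);, chaining onto whatever filter is already in place, in the same way as the documentation's UpperCaseFilterStream example.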
For example, to implement needle highlighting, HighlightFilter inherits from Filter, overriding OnFilter and adding some properties and the Highlighting event.
The new OnFilter method uses Regex.Replace to replace instances of the needle in the haystack. It does this using the overload that takes a MatchEvaluator, a delegate that is called for each match found. This suits the task well: if MatchWholeWords is true, the characters that bound the needle are written back unchanged, and the case of each match is preserved (using String.Replace, by contrast, would replace the casing of every match with that of the needle).
If UseRegex is false, the needle is simply escaped with Regex.Escape rather than searched for and replaced by some other means.
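The core of the highlighting step can be sketched as a stand-alone method. The Highlighter class and its parameter list are hypothetical; in the article this logic lives in HighlightFilter's OnFilter override. One deliberate difference: for whole-word matching, this sketch uses zero-width \b anchors, so the bounding characters are never consumed and need not be written back. Like the filter itself, it runs over raw markup, so a needle that also occurs inside a tag or attribute would be wrapped there too.

```csharp
using System.Text.RegularExpressions;

// Hypothetical stand-alone version of the highlighting step.
public static class Highlighter
{
    public static string Highlight(
        string haystack, string needle,
        bool matchCase, bool matchWholeWords, bool useRegex,
        string openTag = "<mark>", string closeTag = "</mark>")
    {
        if (string.IsNullOrEmpty(needle)) return haystack;

        // Literal searches escape the needle so '.', '(' and friends match themselves.
        string pattern = useRegex ? needle : Regex.Escape(needle);

        // \b is zero-width, so surrounding characters are left in place.
        // The group keeps alternations like "a|b" inside the anchors.
        if (matchWholeWords) pattern = @"\b(?:" + pattern + @")\b";

        RegexOptions options = matchCase ? RegexOptions.None : RegexOptions.IgnoreCase;

        // The MatchEvaluator runs once per match and re-emits the text exactly
        // as found, so "Pip" and "pip" each keep their original casing.
        return Regex.Replace(haystack, pattern, m => openTag + m.Value + closeTag, options);
    }
}
```

With UseRegex enabled, an invalid user-supplied pattern makes Regex.Replace throw an ArgumentException, so a real page would want to catch it and report the problem to the user.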
I was initially concerned that using Regex for replacement with a MatchEvaluator would be prohibitively slow, but replacing common words in Great Expectations (just over one megabyte) takes a few milliseconds on my Core i7-2600K, and hopefully not much longer on a typical web server. Interestingly, enabling "Match Whole Word" increases this to several seconds.
Points of Interest
In my first attempt, I derived a new class from MemoryStream and assigned it to the Filter property. I overrode the Write method and manipulated the data by wrapping instances of the keyword in a new element to which a CSS style could be assigned.
Inspection of the contents of the stream showed that it worked quite nicely, and the class called base.Write to complete the task, but this resulted in zero bytes being sent to the client. In hindsight, that is expected: base.Write only appends to the MemoryStream's own in-memory buffer, and nothing forwards those bytes to the client. Instead, I used my class to wrap the output stream, passing each filtered write through to it.
Acknowledgements
Thank you to Project Gutenberg for the free distribution of Great Expectations and over 36,000 other works, and of course to Charles Dickens (1812-1870) himself.
History
- October 31, 2011: Version 1.0.0.x
- January 3, 2013: Modified title to better describe the nature of the topic