Introduction
When you look at the Controls collection of the Page object, you quickly realize that all the interesting content ends up in a LiteralControl. There is no comfortable way to insert or change this text. I therefore wrote a few classes that take a string apart into objects. These objects can be changed safely and then used to generate a new string for a LiteralControl.
I ran across this problem when writing a page template class. You can take the literal code out of the .aspx file, but then the designer does not seem to work very well. So I prefer to keep using the designer and change the header literal in the page template.
Background
Parsing HTML is not really a fun thing, so I made a few restrictions.
- The parser does not really understand HTML; it only knows text, tags and attributes. It does not care what their names and values are.
- Badly formed input will result in poor output. (E.g. the <meta> tag is often not closed, so the only way to place a following tag is as a child tag. The source text must therefore be changed to something like <meta ... />.)
- Since you can insert plain text into the resulting object tree, you can easily ruin the output. (E.g. inserted text like "<junk" will be rendered as "<junk" and not as "&lt;junk".) Remember, the brain is in front of the screen :-)
Using the code
The main class is Fragments. The constructor of Fragments takes a string, which is parsed into objects. Fragments is a collection of (guess what?) Fragment objects. Actually, Fragment is the superclass of FragmentText (representing simple plain text), FragmentTag (representing a tag <tagname attr="value" ... >), FragmentComment (for a comment <!-- ... -->) and FragmentDoctype (e.g. <!DOCTYPE HTML ... >).
The objects can be changed, added or removed as in any collection. Objects of type FragmentTag have a property Nodes representing the sub tags. Since we parse a fragment, there can be unmatched tags (i.e. only opening or only closing tags). Therefore FragmentTag has a property Type, which states whether there is an opening and/or a closing tag. The value OpenCloseShort stands for tags of the kind <br/>. Obviously these tags cannot have Nodes.
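To make the idea of unmatched tags more concrete, here is a minimal, self-contained sketch of how a tag type could drive the rendering. It is not the library's code: SketchTag, SketchTagType and every enum member except OpenCloseShort are hypothetical names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-ins; only the name OpenCloseShort comes from the article.
public enum SketchTagType { Open, Close, OpenClose, OpenCloseShort }

public class SketchTag
{
    public string Name;
    public SketchTagType Type;
    // Children: nested SketchTag objects or plain strings.
    public List<object> Nodes = new List<object>();

    public SketchTag(string name, SketchTagType type)
    {
        Name = name;
        Type = type;
    }

    public override string ToString()
    {
        switch (Type)
        {
            case SketchTagType.OpenCloseShort:
                return "<" + Name + "/>";   // short form, children are ignored
            case SketchTagType.Close:
                return "</" + Name + ">";   // unmatched closing tag
        }

        // Open and OpenClose both emit the opening tag and the children.
        var sb = new StringBuilder("<" + Name + ">");
        foreach (object node in Nodes)
            sb.Append(node);                // calls ToString() on each child

        // Only a matched pair gets its closing tag written.
        if (Type == SketchTagType.OpenClose)
            sb.Append("</" + Name + ">");

        return sb.ToString();
    }
}
```

With this sketch, a tag of type Open renders without a closing tag, which is what a fragment cut out of the middle of a page would need.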
Finally, the ToString() method transforms the Fragments collection back into a plain HTML string.
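    // Parse the source string into a collection of Fragment objects.
    Fragments fragments = new Fragments( someString );
    foreach ( Fragment fragment in fragments )
    {
        if ( fragment is FragmentTag )
        {
            // Tags can hold child nodes, so append some plain text.
            FragmentTag tag = (FragmentTag)fragment;
            tag.Nodes.Add( new FragmentText( "plain text" ) );
        }
        if ( fragment is FragmentText )
        {
            // ...
        }
    }
    // Turn the (modified) object tree back into HTML.
    string s = fragments.ToString();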
You can also start with an empty Fragments object, insert everything into it and generate the output.
There is a small sample program with the sources, which I use for testing. It demonstrates most of the usage.
Points of interest
I use the Regex class to split the input into pieces. The pattern is rather unreadable, but the basic structure is pattern1|pattern2|pattern3|... It took me some time to understand that each match will contain exactly one of the patterns. Therefore I gave each pattern an exclusive name and made some sub groups for parameters or names. Also note that the next match will not necessarily begin exactly behind the last match; the regex only continues searching from there. So we have to keep track ourselves of whether all input has been parsed.
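The following is a minimal sketch of that technique, not the actual pattern used in the sources. The pattern, the group names (comment, close, open, text, closeName, openName) and the TokenizerSketch class are all assumptions made up for illustration. It shows one named group per alternative, and a position counter that detects input the regex silently skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenizerSketch
{
    // Hypothetical, heavily simplified pattern (NOT the article's expression).
    // Every alternative has its own exclusive group name.
    private static readonly Regex Pattern = new Regex(
        @"(?<comment><!--.*?-->)" +
        @"|(?<close></(?<closeName>\w+)\s*>)" +
        @"|(?<open><(?<openName>\w+)[^<>]*>)" +
        @"|(?<text>[^<]+)",
        RegexOptions.Singleline);

    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        int position = 0; // end of the last piece we have accounted for

        for (Match m = Pattern.Match(input); m.Success; m = m.NextMatch())
        {
            // The regex skips whatever it cannot match, so a gap between
            // the previous match and this one has to be handled by us.
            if (m.Index > position)
                tokens.Add("unparsed:" + input.Substring(position, m.Index - position));

            // Exactly one of the top-level named groups succeeded.
            if (m.Groups["comment"].Success)
                tokens.Add("comment:" + m.Value);
            else if (m.Groups["close"].Success)
                tokens.Add("close:" + m.Groups["closeName"].Value);
            else if (m.Groups["open"].Success)
                tokens.Add("open:" + m.Groups["openName"].Value);
            else
                tokens.Add("text:" + m.Value);

            position = m.Index + m.Length;
        }

        // Anything left after the last match was never parsed either.
        if (position < input.Length)
            tokens.Add("unparsed:" + input.Substring(position));

        return tokens;
    }
}
```

For the input a<<b>c this sketch yields text:a, unparsed:<, open:b and text:c. The stray < matches none of the alternatives, so without the position check it would simply disappear from the output.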
History
- Version 1.0 - first release
- Version 1.1 - bug fixes (exception inside exception, parsing of nested quotes)