How to Automatically Close Un-closed HTML Tags using C# for ASP.NET Web Applications

Milad Ashrafi

7 Oct 2014

We need this script for database-driven ASP.NET websites that display HTML content stored in the database on their post pages.

Introduction

Sometimes we want to show a summary of a full HTML article or post, just its first few lines, on the main page. If we cut the HTML string in the middle as if it were a regular string, we end up with many open tags that are never closed, and the browser cannot find the correct closing tags for them. For example, if the summary contains an unclosed <div>, we must close it. Otherwise that <div> will be closed by the next </div> outside the post area, and the following posts will be nested inside it, which breaks the page layout.
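Cutting the string with a plain Substring can also land in the middle of a tag and leave a fragment such as `<span sty` at the end, which no tag regex will recognize. A minimal sketch of a safer cut, assuming a hypothetical helper named `TruncateHtml` (in a hypothetical `HtmlSummary` class) and a character-based length limit, drops any unfinished tag at the end of the excerpt before the tags are closed:

```csharp
public static class HtmlSummary
{
    // Hypothetical helper: cut html to at most maxLength characters without
    // leaving half a tag (such as "<span sty") at the end of the excerpt.
    public static string TruncateHtml(string html, int maxLength)
    {
        if (html.Length <= maxLength)
        {
            return html;
        }

        string cut = html.Substring(0, maxLength);

        // If the last '<' comes after the last '>', the cut landed inside
        // a tag, so remove that unfinished tag entirely.
        int lastOpen = cut.LastIndexOf('<');
        int lastClose = cut.LastIndexOf('>');
        if (lastOpen > lastClose)
        {
            cut = cut.Substring(0, lastOpen);
        }
        return cut;
    }
}
```

For example, cutting `<div><span style="color:red;">Hello</span></div>` at 10 characters returns `<div>` rather than `<div><span`. Character entities such as `&amp;` can be split the same way; this sketch does not handle them.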

Background

In this simple script, I use two regular expressions to extract and compare tags: one for start tags and one for end tags. I then reverse the start tag list, so that the most recently opened (innermost) tag comes first. The table below illustrates this:

Order	Start tags (document order)	Start tags (reversed)	End tags found in the cut HTML	End tags required
1	`<html>`	`<p>`	`</p>`	`</p>`
2	`<div>`	`<input>`	`</input>`	`</input>`
3	`<span style="color:red;">`	`<form>`	`</form>`	`</form>`
4	`<form>`	`<span style="color:red;">`	NO END TAG	`</span>`
5	`<input>`	`<div>`	NO END TAG	`</div>`
6	`<p>`	`<html>`	NO END TAG	`</html>`
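Reversing the start tag list is what a stack does naturally: the tag opened last must be closed first. As an alternative sketch (a swapped-in technique, not the author's code), the hypothetical `CloseOpenTags` method below walks all tags in document order with one regex, pushes each start tag, pops back to the matching start tag on each end tag, and finally closes whatever is left on the stack. Because it pairs end tags with start tags by name and position, it also copes with sibling elements such as `<div><b>x</b><i>y`. The void element list and the simple tag regex are assumptions; comments, `<script>` contents and attribute values containing `>` are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class HtmlTagStack
{
    // Void elements never take an end tag, so they are never pushed.
    private static readonly HashSet<string> VoidElements = new HashSet<string>
    {
        "area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"
    };

    // Group 1 is "/" for an end tag, group 2 is the tag name.
    private static readonly Regex TagPattern =
        new Regex(@"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^<>]*>");

    public static string CloseOpenTags(string html)
    {
        var open = new Stack<string>();
        foreach (Match tag in TagPattern.Matches(html))
        {
            string name = tag.Groups[2].Value.ToLowerInvariant();
            bool isEndTag = tag.Groups[1].Value == "/";
            bool selfClosing = tag.Value.EndsWith("/>", StringComparison.Ordinal);

            if (VoidElements.Contains(name) || selfClosing)
            {
                continue;
            }
            if (!isEndTag)
            {
                open.Push(name);
            }
            else if (open.Contains(name))
            {
                // Pop up to and including the matching start tag; anything
                // above it was left unclosed inside this element.
                while (open.Pop() != name) { }
            }
            // An end tag with no matching start tag is ignored.
        }

        // The tags left on the stack are unclosed, innermost on top,
        // i.e. the "reverse order" shown in the table.
        var result = new StringBuilder(html);
        while (open.Count > 0)
        {
            result.Append("</").Append(open.Pop()).Append('>');
        }
        return result.ToString();
    }
}
```

For the table's example, `<html><div><span style="color:red;"><form><input><p>text</p></form>` comes back with `</span></div></html>` appended, and `<div><b>x</b><i>y` comes back as `<div><b>x</b><i>y</i></div>`.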

The code is as follows:

using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// The class name is illustrative; C# needs some class to hold the method.
public static class HtmlTagHelper
{
    // Elements that need an explicit end tag. Void elements (br, hr, img,
    // input, meta, link, ...) are left out on purpose: they never take an
    // end tag, and browsers even treat a stray </br> as another <br>.
    private const string TagNames =
        "a|abbr|acronym|address|applet|article|aside|audio|b|bdi|bdo|big|" +
        "blockquote|body|button|canvas|caption|center|cite|code|colgroup|" +
        "datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|em|fieldset|" +
        "figcaption|figure|font|footer|form|frameset|h1|h2|h3|h4|h5|h6|head|" +
        "header|html|i|iframe|ins|kbd|label|legend|li|map|mark|menu|meter|" +
        "nav|noframes|noscript|object|ol|optgroup|option|output|p|pre|" +
        "progress|q|rp|rt|ruby|s|samp|script|section|select|small|span|" +
        "strike|strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|" +
        "th|thead|time|title|tr|tt|u|ul|var|video";

    // Group 1 captures the tag name. [^<>]* accepts any attributes, quoted
    // or not, but can never run past the end of the tag.
    private static readonly Regex RegexStartTag = new Regex(
        @"<(" + TagNames + @")(\s[^<>]*)?>", RegexOptions.IgnoreCase);
    private static readonly Regex RegexCloseTag = new Regex(
        @"</(" + TagNames + @")\s*>", RegexOptions.IgnoreCase);

    public static string AutoCloseHtmlTags(string inputHtml)
    {
        // Start tag names in document order, without their attributes.
        var startTagList = new List<string>();
        foreach (Match startTag in RegexStartTag.Matches(inputHtml))
        {
            startTagList.Add(startTag.Groups[1].Value);
        }
        int closeTagCount = RegexCloseTag.Matches(inputHtml).Count;

        // Reverse the list so the innermost (last opened) tag comes first.
        startTagList.Reverse();

        // As in the table, the first closeTagCount reversed tags already have
        // end tags; close every remaining one, innermost first. This assumes
        // each tag is nested in the previous one; siblings such as
        // <div><b>x</b><i>y can make it close the wrong tag.
        var resultClose = new StringBuilder();
        for (int i = closeTagCount; i < startTagList.Count; i++)
        {
            resultClose.Append("</").Append(startTagList[i]).Append('>');
        }
        return inputHtml + resultClose.ToString();
    }
}

Please let me know your ideas.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
