Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

How to Automatically Close Un-closed HTML Tags using C# for ASP.NET Web Applications

0.00/5 (No votes)
7 Oct 2014 1  
We need this script for database based ASP.NET websites for using HTML content in post pages.

Introduction

Sometimes we want to get the summary of a full HTML article/post to show some lines of that in the main page. Therefore, if we cut the HTML string from the middle like a regular string, we have so many un-closed open HTML tags. So what happens is that the browser cannot find the correct closing tags for the open tags. For example, if we have an un-closed tag like <div>, we should close it. If not <div> will be closed by the next </div> outside the post area and posts will be arranged together.

Background

In this simple script, I use two regular expressions to export and compare tags, one for the start tag and one for the end tag. Then I make a reverse order for the start tag list. See the below to imagine this:

Order

Start Tag List End Tag (false) End Tag (true)
Normal Reverse Normal Normal
1 <html> <p> </p> </p>
2 <div> <input> </input> </input>
3 <span style=”color:red;”> <form> </form> </form>
4 <form> <span style=”color:red;”> NO END TAG </span>
5 <input> <div> NO END TAG </div>
6 <p> <html> NO END TAG </html>

The code is as follows:

public static string AutoCloseHtmlTags(string inputHtml)
{
    var regexStartTag = new Regex(@"<(!--\u002E\u002E\u002E--|!DOCTYPE|a|abbr|" + 
          @"acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|big" + 
          @"|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|" + 
          @"command|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|" + 
          @"figcaption|figure|font|footer|form|frame|frameset|h1> to <h6|head|" + 
          @"header|hr|html|i|iframe|img|input|ins|kbd|keygen|label|legend|li|link|" + 
          @"map|mark|menu|meta|meter|nav|noframes|noscript|object|ol|optgroup|option|" + 
          @"output|p|param|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|" + 
          @"source|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|" + 
          @"tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video|wbr)(\s\w+.*(\u0022|'))?>");
    var startTagCollection = regexStartTag.Matches(inputHtml);
    var regexCloseTag = new Regex(@"</(!--\u002E\u002E\u002E--|!DOCTYPE|a|abbr|" + 
          @"acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|" + 
          @"big|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|" + 
          @"command|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|" + 
          @"figcaption|figure|font|footer|form|frame|frameset|h1> to <h6|head|header" + 
          @"|hr|html|i|iframe|img|input|ins|kbd|keygen|label|legend|li|link|map|mark|menu|" + 
          @"meta|meter|nav|noframes|noscript|object|ol|optgroup|option|output|p|param|pre|" + 
          @"progress|q|rp|rt|ruby|s|samp|script|section|select|small|source|span|strike|" + 
          @"strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|th|thead|" + 
          @"time|title|tr|track|tt|u|ul|var|video|wbr)>");
    var closeTagCollection = regexCloseTag.Matches(inputHtml);
    var startTagList = new List<string>();
    var closeTagList = new List<string>();
    var resultClose = "";
    foreach (Match startTag in startTagCollection)
    {
        startTagList.Add(startTag.Value);
    }
    foreach (Match closeTag in closeTagCollection)
    {
        closeTagList.Add(closeTag.Value);
    }
    startTagList.Reverse();
    for (int i = 0; i < closeTagList.Count; i++)
    {
        if (startTagList[i] != closeTagList[i])
        {
            int indexOfSpace = startTagList[i].IndexOf(
                     " ", System.StringComparison.Ordinal);
            if (startTagList[i].Contains(" "))
            {
                startTagList[i].Remove(indexOfSpace);
            }
            startTagList[i] = startTagList[i].Replace("<", "</");
            resultClose += startTagList[i] + ">";
            resultClose = resultClose.Replace(">>", ">");
        }
    }
    return inputHtml + resultClose;
} 

Please let me know about your ideas...

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here