Remove all HTML tags and display only the plain text inside (in case the XML is not well-formed)

5 Jan 2011, CPOL
NOTE: If you really want plain text, you should also decode the HTML entities in the resulting text (System.Web.HttpUtility.HtmlDecode()); otherwise you'll end up with HTML/XML character entity text such as &amp; and &#91; in your output. If you're going to output the text directly to a browser, however, you won't need to, because the browser decodes the entities itself.
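using System;
using System.Text.RegularExpressions;
using System.Web;
// . . .
// Response is only available inside an ASP.NET page (or via HttpContext.Current).
public class Foo : System.Web.UI.Page {
   public void Bar() {
      // Sample input containing a tag and an HTML entity
      string ss = "<b>Remove</b> tags &amp; HTML Entities";
      // Matches everything from '<' up to the next '>', so it works even when the markup is not well-formed XML
      Regex regex = new Regex(@"<[^>]*>");
      Response.Write(String.Format("Before: '{0}'\n", ss));
      ss = regex.Replace(ss, String.Empty);  // strip the tags
      ss = HttpUtility.HtmlDecode(ss);       // turn entities such as &amp; back into plain characters
      Response.Write(String.Format("After: '{0}'\n", ss));
   }
}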

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)