
Remove all HTML tags and display only the plain text inside (in case the XML is not well formed)

15 Feb 2012, CPOL
I think the following combination of Regex.Replace and HtmlDecode would do:

C#
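using System.Text.RegularExpressions;
using System.Web;
string html = ...;
// Remove HTML comments and tags (quoted attribute values may contain '>'), then decode entities.
string textonly = HttpUtility.HtmlDecode(
         Regex.Replace(html, @"<!--[\S\s]*?-->|<(?:"".*?""|'.*?'|[\S\s])*?>", ""));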


Is there any HTML construct that this would not strip properly?
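
As a starting point for answering that question, here is a minimal sketch that wraps the same expression in a small helper so it can be tried on a few inputs. The class and method names (`HtmlText`, `Strip`) are made up for illustration and are not part of the original tip. It also assumes `System.Web.HttpUtility` is available, which is the case on .NET Framework with a reference to System.Web and on .NET Core 2.0 or later.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Web;

// Hypothetical helper for illustration; the names are not from the original tip.
public static class HtmlText
{
    // Same pattern as in the tip: either an HTML comment, or a tag whose
    // quoted attribute values are allowed to contain '>'.
    private static readonly Regex TagOrComment = new Regex(
        @"<!--[\S\s]*?-->|<(?:"".*?""|'.*?'|[\S\s])*?>");

    public static string Strip(string html)
    {
        // Remove comments and tags, then decode entities such as &amp;.
        return HttpUtility.HtmlDecode(TagOrComment.Replace(html, ""));
    }
}
```

For example, `HtmlText.Strip("<p class=\"a>b\">Tom &amp; Jerry</p><!-- note -->")` returns `Tom & Jerry`. Two inputs that do not come out clean: `<script>var x = 1;</script>Hi` yields `var x = 1;Hi`, because only the tags are removed and the script body stays behind as text, and `1 < 2 and 3 > 2` yields `1  2`, because the bare `<` and `>` in running text are treated as one tag.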

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)