
Remove all the HTML tags and display only the plain text inside (in case the XML is not well formed)

5.00/5 (4 votes)
20 Dec 2010 | CPOL | 14.1K
Consider using the open source HTML Agility Pack library (htmlagilitypack.codeplex.com).

It lets you use XPath queries to access very specific parts of an HTML document, and the HTML does not have to be valid, well-formed XML. In addition to accessing the raw inner text of an element, you can select specific attribute values, which is useful for getting things like meta description content or image alt/title text.
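HTML Agility Pack is a .NET library, and the sketch below does not use it. As a rough, language-neutral illustration of the same idea (reading plain text and selected attribute values out of HTML that is not well-formed XML), here is a minimal version built on Python's standard-library `html.parser`. The names `TextAndAttrs` and `extract` are hypothetical, and this stand-in has no XPath support; it simply reacts to tags as the parser meets them.

```python
from html.parser import HTMLParser


class TextAndAttrs(HTMLParser):
    """Collects visible text plus a few attribute values from lenient HTML.

    Hypothetical helper for illustration; not part of HTML Agility Pack.
    """

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain text.
        super().__init__(convert_charrefs=True)
        self.text_parts = []
        self.meta_description = None
        self.image_alts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content")
        elif tag == "img" and attrs.get("alt"):
            self.image_alts.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract(html):
    """Return (plain_text, meta_description, image_alt_texts)."""
    parser = TextAndAttrs()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.meta_description, parser.image_alts


if __name__ == "__main__":
    # Unclosed <p> and <b> tags: not well-formed XML, but still parseable.
    sample = (
        '<html><head><meta name="description" content="Demo page">'
        "<style>p{color:red}</style></head>"
        '<body><p>Hello <b>world<p>Second &amp; last<img src="a.png" alt="Logo"></body>'
    )
    text, description, alts = extract(sample)
    print(text)
    print(description)
    print(alts)
```

The parser never requires closing tags to match, which is why unclosed `<p>` and `<b>` elements do not cause errors. A dedicated library such as the one recommended above would additionally let you target nodes with XPath queries rather than by tag name alone.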

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)