Regular Expression to Extract Inner Text from Anchor Tags

Bryian Tan

8 Feb 2011

How to extract the text from a hyperlink and preserve other HTML tags

This post presents a simple regular expression that extracts the text from a hyperlink while preserving the other HTML tags.

Introduction

Several days ago, someone on a forum asked how to extract the text from a hyperlink while preserving the other HTML tags. It sounded interesting, so I did some research, but I could not find a direct solution. So I decided to put together a simple regular expression for the task.

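Regular Expression: (<[aA]\b[^>]*>|</[aA]\s*>)

Explanation:

<[aA]\b[^>]*> -- matches an opening <a ...> tag, with or without attributes. [aA] matches a lowercase or uppercase "a" (a character class needs no | between its characters), and the \b word boundary keeps the pattern from also matching other tags that start with "a", such as <abbr>, <area>, <article> or <audio>.

</[aA]\s*> -- matches a closing </a> tag, allowing optional whitespace before the >.

Replacing every match with an empty string removes only the anchor tags, leaving the link text and all other markup in place.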
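The pattern can be wrapped in a small reusable helper. The sketch below is one possible way to do that, not the original code from this article; the class name AnchorTagStripper and the method StripAnchors are hypothetical. It builds the Regex once with RegexOptions.Compiled so repeated calls do not re-parse the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (name chosen for illustration): removes <a ...> and </a> tags
// but keeps the link text and any other markup, including tags nested inside the link.
public static class AnchorTagStripper
{
    // <[aA]\b[^>]*>  : opening anchor tag; \b prevents matching <abbr>, <area>, <audio>, ...
    // </[aA]\s*>     : closing anchor tag, with optional whitespace before '>'
    private static readonly Regex AnchorTag =
        new Regex(@"(<[aA]\b[^>]*>|</[aA]\s*>)", RegexOptions.Compiled);

    public static string StripAnchors(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));
        // Every matched tag is replaced with an empty string; everything else is untouched.
        return AnchorTag.Replace(html, string.Empty);
    }
}
```

For example, StripAnchors("<p><a href=\"/x\"><b>Bold</b> link</a> and <abbr title=\"t\">HTML</abbr></p>") keeps both the <b> and <abbr> tags and returns <p><b>Bold</b> link and <abbr title="t">HTML</abbr></p>. Like any regex-based treatment of HTML, this assumes reasonably well-formed markup; a > inside a quoted attribute value, for instance, would cut the [^>]* match short.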

Example 1

C#

string str1 = "<a href=\"http://www.amazon.com/dp/0596528124/\" class=\"someclass\">" +
    "Mastering Regular Expressions</a> " +
    "-- <A href=\"http://cnn.com/\">CNN</a> <div><a href=\"http://blog.ysatech.com\">" +
    "http://blog.ysatech.com</a></div>";

// Remove every opening <a ...> and closing </a> tag; all other text and markup is kept
str1 = System.Text.RegularExpressions.Regex.Replace(str1, @"(<[aA]\b[^>]*>|</[aA]\s*>)", "");

Result

Mastering Regular Expressions -- CNN <div>http://blog.ysatech.com</div>

Example 2

C#

string str2 = "<div><a href=\"http://www.ysatech.com/\" class=\"someclass\">ysatech</a></div>";

str2 = System.Text.RegularExpressions.Regex.Replace(str2, @"(<[aA]\b[^>]*>|</[aA]\s*>)", "");

Result

<div>ysatech</div>
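If the goal is instead to collect the text of each link (for example, to list them) rather than unwrap the links in place, a different regex technique can be used: match each complete <a ...>...</a> pair and capture its body in a named group. The sketch below is an illustration under that assumption, not part of the original tip; AnchorTextExtractor and ExtractLinkTexts are hypothetical names, and it relies on anchors not being nested (HTML does not allow an <a> inside another <a>).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the inner text of each <a>...</a> pair instead of unwrapping it.
public static class AnchorTextExtractor
{
    // Group "text" captures everything between an opening <a ...> tag and the nearest </a>.
    // The lazy .*? stops at the first closing tag; Singleline lets the body span line breaks.
    private static readonly Regex AnchorPair = new Regex(
        @"<a\b[^>]*>(?<text>.*?)</a\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled);

    public static List<string> ExtractLinkTexts(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));
        var texts = new List<string>();
        foreach (Match m in AnchorPair.Matches(html))
        {
            texts.Add(m.Groups["text"].Value);
        }
        return texts;
    }
}
```

For the str2 input in Example 2, this returns a single entry, ysatech. Any markup inside a link (such as <b>) is returned as part of the captured text.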


History

9th February, 2011: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)