Replace special characters with span tags using C# regular expressions.

Question

We have text like "This is a bug (solveing) ? & @ # 17066.".

We then modify the text with the logic below:

C#

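            // replace <br /> tags with new line characters
            // (the pattern as posted was empty, which would insert "\n" at every position;
            // the pattern below is an assumed reconstruction matching <br>, <br/> and <br />)
            string replaceContent = Regex.Replace(contentText, @"<br\s*/?>", "\n");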

            // add full stop to inner text of an element, if it needs one, to separate from next word when tags removed
            replaceContent = Regex.Replace(replaceContent, @"\w(?=)", @"$0\.");

            // remove all tags and leading and trailing space          
            replaceContent = Regex.Replace(replaceContent, "<.*?>", string.Empty).Trim();
            CommonMethods.WordMatchCollection = CommonMethods.WordRegularExpression.Matches(replaceContent);

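            // match words which are not in tags; the apostrophe/hyphen alternative is tried first,
            // otherwise \w+ always wins and a word like "don't" is split into "don" and "t"
            string pattern = @"(?<!<[^>]*?)\b(\w+['-]\w+|\w+)\b(?![^<]*?>)";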
            string replacement = "<span>$0</span>";

            // surround words in text with <span> tags
            contentText = Regex.Replace(contentText, pattern, replacement);

            return contentText;

The formatted text is as follows:

This is a bug (solveing) ? & @ # 17066.

We also want to add span tags around special characters such as #, $, @, etc.
How can we do this?

Thanks in advance.

Posted 27-Oct-11 3:49am

Member 3827009

Updated 27-Oct-11 4:11am

Mehdi Gholam

v2

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

BillWoodruff · Answer 1 · 2011-10-27T17:58:00

A critical piece of information needed here, imho, is whether the format and structure of the input string vary: is it arbitrary HTML, or something else?

Looking at your sample input line and the parsing code, I can't relate the two: the sample input does not appear to contain any of the HTML markup that the code is obviously replacing. Is it possible that the CP code-parse engine stripped the tags and HTML markup out of the sample input string?

What I interpret as the removal of HTML markup, followed by the parsing code adding 'span' tags, is also a bit confusing.

Please find a way to give a complete input sample, and its desired (transformed) output format.

And, may I suggest, as an experiment, that you take the most typical form of the input string, 'Split' it a few different ways, and look at the string[] each one produces, starting with a split on the space character.
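As a rough illustration of that experiment, here is a minimal sketch that splits the sample sentence from the question on the space character and prints each token. The class and method names (`SplitExperiment`, `SplitOnSpaces`) are made up for this example and are not from the original code.

```csharp
using System;

public static class SplitExperiment
{
    // Hypothetical helper: splits on single spaces and drops empty entries
    // (empty entries would appear wherever two spaces are adjacent).
    public static string[] SplitOnSpaces(string input)
    {
        return input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string sample = "This is a bug (solveing) ? & @ # 17066.";

        // Print each token in brackets so stray punctuation is easy to spot.
        foreach (string token in SplitOnSpaces(sample))
        {
            Console.WriteLine("[" + token + "]");
        }
    }
}
```

With this input the special characters ?, &, @ and # come out as tokens of their own only because they are surrounded by spaces; "(solveing)" and "17066." keep their punctuation attached, so a plain split is not enough by itself for those cases.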

Sometimes parsing and transformation are easier to handle by splitting first; at other times Regex is probably the "only game in town."
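If Regex does turn out to be the way to go, one possible approach is sketched below. It rests on an assumption the question does not confirm: that the input is plain text with no HTML tags in it. The names `SpanWrapper` and `WrapTokens` are hypothetical. The pattern treats a token as either a word (optionally with inner apostrophes or hyphens) or a single character that is neither a word character nor whitespace, and each token is HTML-encoded before being wrapped so that a character such as & remains valid inside the markup.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class SpanWrapper
{
    // Assumption: the input is plain text (no existing tags or entities).
    // First alternative: a word, allowing inner ' or - (e.g. well-known).
    // Second alternative: one character that is not a word character and not whitespace.
    private static readonly Regex TokenPattern =
        new Regex(@"\w+(?:['-]\w+)*|[^\w\s]");

    // Hypothetical helper: wraps every matched token in its own <span>.
    public static string WrapTokens(string plainText)
    {
        return TokenPattern.Replace(
            plainText,
            m => "<span>" + WebUtility.HtmlEncode(m.Value) + "</span>");
    }

    public static void Main()
    {
        // Prints: <span>a</span> <span>&amp;</span> <span>b</span>
        Console.WriteLine(WrapTokens("a & b"));
    }
}
```

Each special character gets its own span here; if runs of special characters should share one span, `[^\w\s]` could be changed to `[^\w\s]+`. Note that `\w` includes the underscore, so `_` is treated as part of a word rather than as a special character.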