We have text like "This is a bug (solveing) ? & @ # 17066.".

We then modify the text with the logic below:

C#
// replace <br /> tags with new line characters
            string replaceContent = Regex.Replace(contentText, @"", "\n");

            // add full stop to inner text of an element, if it needs one, to separate from next word when tags removed
            replaceContent = Regex.Replace(replaceContent, @"\w(?=)", @"$0\.");

            // remove all tags and leading and trailing space          
            replaceContent = Regex.Replace(replaceContent, "<.*?>", string.Empty).Trim();
            CommonMethods.WordMatchCollection = CommonMethods.WordRegularExpression.Matches(replaceContent);

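            // match words which are not in tags; the longer alternative comes first
            // so that words with an inner apostrophe or hyphen match as a whole
            string pattern = @"(?<!<[^>]*?)\b(\w+['-]\w+|\w+)\b(?![^<]*?>)";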
            string replacement = "<span>$0</span>";

            // surround words in text with <span> tags
            contentText = Regex.Replace(contentText, pattern, replacement);

            return contentText;


The formatted text is as follows:

This is a bug (solveing) ? &amp; @ # 17066.




We also want to add span tags around special characters such as #, $, @, etc.
How can we do this?

Thanks in advance.


Posted
Updated 27-Oct-11 4:11am

1 solution

A critical piece of information needed here, imho, is whether the format/structure of the input string varies: i.e., is it 'arbitrary' HTML, or something more constrained?

Looking at your sample input line and the parsing code, I can't relate the two: the sample input does not appear to contain any of the HTML markup the code is obviously replacing. Is it possible the CP code-parse engine has stripped the tags and other HTML markup out of the sample input string?

The sequence of what I interpret as removal of HTML markup, followed by the addition of 'span' tags by the parsing code, is also a bit confusing.

Please find a way to give a complete input sample, and its desired (transformed) output format.

And, may I suggest, as an experiment, you take a typical input string, 'Split' it a few different ways, and look at the string[] produced, starting with splitting on the space character.
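As a rough illustration of that experiment, here is a minimal sketch that splits the sample line from the question on spaces and prints each token. The class and method names (SplitExperiment, SplitOnSpaces) are made up for this example, and it assumes the input is plain text with no markup in it.

```csharp
using System;

public static class SplitExperiment
{
    // Hypothetical helper: split on the space character and drop empty
    // entries, so runs of several spaces do not produce empty tokens.
    public static string[] SplitOnSpaces(string input)
    {
        return input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Sample line taken from the question.
        string sample = "This is a bug (solveing) ? & @ # 17066.";
        string[] tokens = SplitOnSpaces(sample);

        // Print each token with its index to see where the boundaries fall.
        for (int i = 0; i < tokens.Length; i++)
        {
            Console.WriteLine("{0}: [{1}]", i, tokens[i]);
        }
    }
}
```

On the sample line this gives ten tokens. Punctuation stays attached to its neighbour ("(solveing)", "17066."), while the free-standing "?", "&", "@" and "#" come out as tokens of their own.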

Sometimes it can be easier to deal with parsing and transformation by first splitting, and other times Regex is probably the "only game in town."
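If Regex does turn out to be the way to go, one possible approach (a sketch only, and a different technique from the lookbehind pattern in the question) is a single pattern that consumes whole tags and leaves them untouched, and wraps every other non-whitespace token in a span. The names SpanWrapper and WrapTokens are hypothetical, and the sketch assumes well-formed markup in which '<' and '>' occur in text only as entities.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpanWrapper
{
    // Alternatives are tried left to right at each position:
    //   tag    - a whole tag such as <br />, which is copied through unchanged
    //   entity - an HTML entity such as &amp;, kept together as one token
    //   word   - a run of word characters, allowing inner ' or -
    //   symbol - any single character that is neither a word character nor whitespace
    private static readonly Regex Token = new Regex(
        @"(?<tag><[^>]*>)|(?<entity>&#?\w+;)|(?<word>\w+(?:['-]\w+)*)|(?<symbol>[^\w\s])");

    // Hypothetical helper: wrap every non-tag token in <span>...</span>.
    public static string WrapTokens(string html)
    {
        return Token.Replace(html, m =>
            m.Groups["tag"].Success ? m.Value : "<span>" + m.Value + "</span>");
    }

    public static void Main()
    {
        Console.WriteLine(WrapTokens("This is a bug (solveing) ? &amp; @ # 17066.<br />"));
    }
}
```

Because tags are matched and skipped by the same pattern, no lookbehind or lookahead is needed to decide whether a word sits inside a tag. Whether each symbol should get its own span, or runs of symbols should share one, depends on the desired output format, which the question does not yet show.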
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)