Regular Expressions

11-Nov-10 10:28

You should be a little more clear about exactly what you are trying to do with that question mark, but here are some things to keep in mind...

If you have well-formed XML, an XML parser is almost certainly the way to go. It might actually be faster than a regular expression. Unfortunately, I'm not familiar with PHP's XML parser, but you should take the time to familiarize yourself with it.
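To illustrate the parser route, here is a minimal sketch in Python rather than PHP (the thread does not show the PHP code). The feed text, the `entry_titles` helper and the `ATOM` prefix map are made up for the example; only the Atom namespace URI is real.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written Atom-style feed; not taken from the thread.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>"""

# Prefix map so the paths below can refer to the Atom namespace.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

def entry_titles(xml_text):
    """Return the title text of every <entry> directly under the root."""
    root = ET.fromstring(xml_text)
    return [entry.findtext("atom:title", default="", namespaces=ATOM)
            for entry in root.findall("atom:entry", ATOM)]

titles = entry_titles(FEED)
print(titles)
```

The parser deals with nesting, attributes and entities, which is what a hand-written regex tends to get wrong.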

Also, the question mark means "the preceding item is optional". Since the question mark is after an opening paren, there is nothing preceding it, so I'm not exactly sure what you're after there. Depending on the regular expression engine you use, an opening paren followed by a question mark introduces positive and negative lookaheads and lookbehinds, as well as named groups. Or if you put a backslash to the left of the question mark, you'll escape it so it matches a literal question mark. But I'm not really sure what you're trying to do here. For example, if you were trying to get the query string value out of a URL, you could use a named group to grab it:

http://www\.google\.com\?(?<QUERY_STRING>.*)

Notice I use the question mark twice. The first time as a literal question mark and the second time as part of a named group. Here is another example:

http://www\.google\.com(?=\?)

That is a positive lookahead that ensures the character following the "m" is a question mark. But it doesn't actually grab the question mark as part of the pattern, it only ensures that the URL will match if that question mark exists in the right location. And of course, there is this use of the question mark:

http://www\.google\.com\??

That means the last question mark is optional. And then there is one more use of question marks (lazy matching rather than greedy matching) that goes like this:

\<img\>.*?\</img\>

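I'll leave it up to you to figure out what that does if you are interested. One more thing: the less-than and greater-than signs have a special meaning inside some regular expression constructs, such as named groups and lookbehinds. Outside of those they are usually ordinary characters, and in a few engines (GNU grep, for example) \< and \> mean word boundaries, so check your engine's documentation before escaping them with a backslash.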
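The four uses of the question mark above can be tried out with a short sketch. It uses Python's `re` module, which is an assumption on my part (the question was about PHP). Python spells named groups `(?P<name>...)`, and it treats `<` and `>` as ordinary characters outside group syntax, so they are left unescaped here. The sample URL and HTML string are invented.

```python
import re

url = "http://www.google.com?q=regex"

# 1. Literal "\?" followed by a named group (Python spelling: (?P<name>...)).
named = re.search(r"http://www\.google\.com\?(?P<QUERY_STRING>.*)", url)
query = named.group("QUERY_STRING")

# 2. Positive lookahead: requires a "?" after "com" without consuming it.
lookahead = re.search(r"http://www\.google\.com(?=\?)", url)
lookahead_text = lookahead.group(0)

# 3. Optional literal question mark: matches with or without the trailing "?".
optional = re.compile(r"http://www\.google\.com\??")
with_mark = optional.fullmatch("http://www.google.com?") is not None
without_mark = optional.fullmatch("http://www.google.com") is not None

# 4. Lazy versus greedy repetition between two tags.
html = "<img>a</img><img>b</img>"
lazy = re.findall(r"<img>.*?</img>", html)    # stops at the first closing tag
greedy = re.findall(r"<img>.*</img>", html)   # runs to the last closing tag

print(query, lookahead_text, with_mark, without_mark, lazy, greedy)
```

The lazy form returns each tag pair separately, while the greedy form swallows everything from the first opening tag to the last closing tag.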

Re: xml regex (for php)

fdsfsa76f7sa6

11-Nov-10 11:08

Thanks for the detailed explanation.

After more extensive searching, I found out how to use xml_parser for a Blogspot feed. It certainly seems easier than regex.

Expression needed

Mazdak

5-Sep-10 16:22

Hi,

I'm not very good at RegEx. I want to use the Regex class to pull certain values out of a string. My main string looks like this:

<%#String.Concat ..blah blah.. %>

In the above string, I want to get all the matches that sit between double quotes ( " ... " ) in the "blah blah" part.

What should my expression look like?

Thanks for your help

Mazy

"This chancy chancy chancy world."

Cross-post

Luc Pattyn

5-Sep-10 16:54

You should post in no more than one location (a single forum, or Q&A) so everything about this topic stays together.

:|

Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.

Re: Expression needed

OriginalGriff

6-Sep-10 1:27

It's not the best in the world, but it does the job:

//  using System.Text.RegularExpressions;

/// <summary>
///  Regular expression built for C# on: Mon, Sep 6, 2010, 12:26:15 PM
///  Using Expresso Version: 3.0.3634, http://www.ultrapico.com
///  
///  A description of the regular expression:
///  
///  <%
///      <%
///  Any character that is NOT in this class: [\"], any number of repetitions
///  "
///  [InQuotes]: A named capture group. [.*]
///      Any character, any number of repetitions
///  Match a suffix but exclude it from the capture. [\"]
///      Literal "
///  Any character that is NOT in this class: [%], any number of repetitions
///  %>
///      %>
///  
///
/// </summary>
public static Regex regex = new Regex(
      "<%[^\\\"]*\"(?<InQuotes>.*)(?=\\\")[^%]*%>",
    RegexOptions.IgnoreCase
    | RegexOptions.CultureInvariant
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
    );


// This is the replacement string
public static string regexReplace = 
      "<Hello>";


//// Replace the matched text in the InputText using the replacement pattern
// string result = regex.Replace(InputText,regexReplace);

//// Split the InputText wherever the regex matches
// string[] results = regex.Split(InputText);

//// Capture the first Match, if any, in the InputText
// Match m = regex.Match(InputText);

//// Capture all Matches in the InputText
// MatchCollection ms = regex.Matches(InputText);

//// Test to see if there is a match in the InputText
// bool IsMatch = regex.IsMatch(InputText);

//// Get the names of all the named and numbered capture groups
// string[] GroupNames = regex.GetGroupNames();

//// Get the numbers of all the named and numbered capture groups
// int[] GroupNumbers = regex.GetGroupNumbers();
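
Note that the greedy `.*` in the generated pattern captures everything from the first quote to the last one as a single group. If each quoted value is wanted separately, a two-step approach is one option. The sketch below is in Python rather than C#, and the sample input is invented, because the original post elides the real expression body. It does not handle escaped quotes inside a string.

```python
import re

# Hypothetical input; the original post only shows "..blah blah..".
text = '<%#String.Concat("Hello, ", Eval("Name"), "!") %>'

# Step 1: isolate the body of the <%# ... %> block (lazy, so it stops at the first %>).
block = re.search(r"<%#(.*?)%>", text, re.DOTALL)

# Step 2: collect every run of non-quote characters enclosed in double quotes.
quoted = re.findall(r'"([^"]*)"', block.group(1)) if block else []

print(quoted)
```

The same two patterns should carry over to the .NET Regex class with `Regex.Match` and `Regex.Matches`, although that has not been tested here.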

Real men don't use instructions. They are only the manufacturers opinion on how to put the thing together.

html tag finder

moein.serpico

14-Aug-10 23:14

Hi guys,
I want a pattern to count the HTML controls of a given type in a whole page.
I tested this pattern, but there is no result:

@"\b((<\s*"+pattern+@"{1})\s(\s*\w*\s*\W*\s*\d*\s*)*/>\b)"

Here, pattern is a variable that contains a tag name such as img, input and so on.

PIEBALDconsult

15-Aug-10 8:19

Perhaps show examples of the HTML and pattern.

OriginalGriff

15-Aug-10 8:31

I'm not in the least bit surprised it doesn't work as you expected. It isn't a valid regex.
Go to www.ultraPico.com and download a copy of Expresso. It examines, designs and explains regexes. It's free, and I use it a lot. I really wish I'd written it!

AspDotNetDev

26-Sep-10 9:29

OriginalGriff wrote:
It isn't a valid regex.

Works fine for me. Doesn't produce the expected results, but it will compile and it does return results under certain scenarios.

AspDotNetDev

26-Sep-10 9:28

Take out both occurrences of "\b".

moein.serpico wrote:
{1}

That is completely useless.

moein.serpico wrote:
(\s*\w*\s*\W*\s*\d*\s*)*

And what is all that for?
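
To show why the leading "\b" gets in the way, here is a small sketch in Python (the original is C#, but word boundaries behave the same way for this case). The sample HTML and the `count_tags` helper are made up, and `[^>]*` is a simplification that would break on attribute values containing ">".

```python
import re

html = '<input type="text" /> <img src="a.png" /> <img src="b.png"/>'

def count_tags(tag, page):
    """Count self-closing <tag ... /> elements; tag is escaped before use."""
    # The \b after the tag name keeps "img" from also matching a longer name.
    pattern = r"<\s*" + re.escape(tag) + r"\b[^>]*/>"
    return len(re.findall(pattern, page, re.IGNORECASE))

# \b needs a word character on exactly one side, so "\b<" cannot match at the
# start of the text or after a space: both neighbours are non-word characters.
with_boundary = re.findall(r"\b<\s*img\b[^>]*/>", html)

img_count = count_tags("img", html)
input_count = count_tags("input", html)
print(with_boundary, img_count, input_count)
```

For anything beyond a quick count, an HTML parser is the safer choice.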

Differences between matches with same RegEx [modified]

Hiren solanki

17-Dec-10 0:40

You might find my tip useful.

Regards,
Hiren.

My Recent Article: - Way to know which control have raised PostBack
My Recent Tip/Trick: - Remove HTML Tag, get plain Text

RichardGrimmer

6-Aug-10 0:00

Guys,
Seems quiet around here, but hey-ho...hopefully I'll get a response.

OK - I have the following Regex:

^https?://([a-zA-Z0-9-_]*[/\.]{1})*([a-zA-Z]^\.)*(aspx)?

Which I've created using an online regex builder.

It's intended to match urls on either www. or local network, with or without a page (.aspx) on the end (so either www.site.com/ or www.site.com/page.aspx). The problem I have is that for all cases it seems to work correctly, until I drop it into my C# / Silverlight app (with the 2 x \ escaped as \\).

The problem is that under the regex builder, a .aspx is fine, a .aspxs is flagged as not matching - which is exactly what I need...however, in C# the aspxs is not flagged as invalid.

I think I can see why - because the s is part of the group, it's permitted - so I'm trying to essentially do an optional literal string...the ? to make it optional, the brackets to apply to the group...but C# seems to be perceiving it more as [aspx]* . If I remove the bracket, then the ? only then applies to the x on the end (as with the s in https at the start)...

Any ideas what I've missed here?

C# has already designed away most of the tedium of C++.

modified on Friday, August 6, 2010 6:33 AM

Re: Differences between matches with same RegEx

Peter_in_2780

6-Aug-10 2:22

A shot in the dark. Try $ (matches end-of-string) at the end of your regex. Maybe one environment assumes it, the other not.
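
A quick way to see the effect of the trailing $ is sketched below in Python. The pattern is a simplified stand-in for the one in the question, not the original regex, and the `accepts` helper is hypothetical. `re.search` behaves like an unanchored `Regex.IsMatch`.

```python
import re

# Simplified stand-in for the pattern in the question, not the original regex.
UNANCHORED = r"^https?://[\w.-]+/(\w+\.aspx)?"
ANCHORED = UNANCHORED + "$"

def accepts(pattern, url):
    """True if the pattern matches url, the way an unanchored search does."""
    return re.search(pattern, url) is not None

good = "http://www.site.com/page.aspx"
bad = "http://www.site.com/page.aspxs"

# Without "$" the optional group matches "page.aspx" and the trailing "s" is
# simply left over, so the bad URL is accepted.
print(accepts(UNANCHORED, bad), accepts(ANCHORED, bad), accepts(ANCHORED, good))
```

Without the end anchor, a match only has to cover a prefix of the input. That is presumably why the extra "s" went unnoticed in the C# code.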

Software rusts. Simon Stephenson, ca 1994.

Re: Differences between matches with same RegEx - RESOLVED

RichardGrimmer

8-Aug-10 22:53

By jove I think he's got it!

Good work fella...thanks a lot!

Even Expresso couldn't help me with this RegEx - Can You?

Sonhospa

30-Jul-10 7:19

Hi all,

it must be a small problem that drives me crazy here...

From a text file like the sample given below, the following regular expression extracts the 'Grp' group well, but the 'Entr' group is always empty. Maybe someone else can see where my error is? Actually, as a final result I'd also like to extract the entry's name and its value (i.e. before/after the "=") into different groups. Are there any regex experts around who might see more?

Dim regex = New Regex("\[(?<Grp>.*)\](?<Entr>.*)

is supposed to give me back the groups and entries in an ini-file like structure, no? It seems as if \[ and \] lead to something which I can't find in any reference. Btw I built the regex with Expresso.

' Sample
[Track]
Latitude=N047° 25' 53.4256"
Longitude=W122° 18' 28.7933"
Altitude=+000432.00

[Options]
Titles=False
Sound=True
Pause=False

Thank you very much in advance,
Michael

Re: Even Expresso couldn't help me with this RegEx - Can You?

Luc Pattyn

30-Jul-10 7:57

Michael,

I guess something is wrong in your post, I see only one double quote, and no MultiLine option. You probably have a conflict with the HTML monster.

:)

Partially resolved: Even Expresso couldn't help me with this RegEx - Can You?

Sonhospa

31-Jul-10 1:06

Happy day! I was able to resolve part of my regex task in Expresso.

For the record and other users: Obviously ".*" for "any character / any number of repetitions" is not good enough to capture a CRLF! Trying to capture it with \r wasn't the right thing either, it took "\r\n" to match the group together with the first line! Here's the current status of the regex:

\[(?<Grp>.*)\]\r\n(?<Keyword>\w+)\s*=\s*(?<Value>.*)\r
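
The behaviour described above can be reproduced with a small sketch. It is in Python, not VB, and the two-line excerpt is a shortened, hypothetical version of the sample file with CRLF line endings. In both Python and .NET, "." matches any character except "\n" by default, so the second group of the original pattern only picks up the carriage return and looks empty.

```python
import re

# Hypothetical two-line excerpt with Windows (CRLF) line endings.
text = "[Track]\r\nAltitude=+000432.00\r\n"

# By default "." stops at "\n", so the second group cannot reach the next line.
default = re.search(r"\[(?P<Grp>.*)\](?P<Entr>.*)", text)

# With DOTALL (RegexOptions.Singleline in .NET) "." also matches line breaks.
dotall = re.search(r"\[(?P<Grp>.*?)\](?P<Entr>.*)", text, re.DOTALL)

print(repr(default.group("Entr")), repr(dotall.group("Entr")))
```

With DOTALL the second group takes everything after the header, so the key=value lines still need a further pass.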

The ultimate question now is:
Of course, the first key-value pair after the group header isn't enough.
How can I include its repetitions in my match?

@ Luc: Thanks for the hint. I guess it was more a matter of my own attention monster that I had here... ;)

@ Admin: Please excuse my double-posting with the VB forum. This new forum obviously isn't well known yet, so I got more answers over there before being told to move my post here...

Re: Partially resolved: Even Expresso couldn't help me with this RegEx - Can You?

Luc Pattyn

31-Jul-10 2:24

Michael,

I'm not really a regex expert, but I would tackle your problem with two regex and two foreach; the first regex would locate groups, the second would parse the key=value pairs within a group. I've never seen this done with a single regex.

BTW: I don't like your \r\n stuff at all. One normally uses the symbols for start-of-line (^) and end-of-line ($), avoiding problems with matching things on the first line and last line (your input may or may not end on \r or \r\n).

FYI: I have some similar regex stuff in my article CP Vanity, which scrapes some of the CodeProject web pages. Similar meaning two regexes, two foreach loops.
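
The two-regex, two-loop idea can be sketched as follows. This is Python rather than VB, and the names `SECTION`, `ENTRY` and `parse_ini` are made up for the example. It is a rough outline under those assumptions, not the code from the article. The sample text is the one from the original question.

```python
import re

SAMPLE = """[Track]
Latitude=N047° 25' 53.4256"
Longitude=W122° 18' 28.7933"
Altitude=+000432.00

[Options]
Titles=False
Sound=True
Pause=False
"""

# First regex: a [Group] header line plus everything up to the next header or the end.
SECTION = re.compile(
    r"^\[(?P<Grp>[^\]\r\n]+)\][ \t]*\r?$(?P<Body>.*?)(?=^\[|\Z)",
    re.MULTILINE | re.DOTALL,
)
# Second regex: one key=value line inside a section body.
ENTRY = re.compile(r"^(?P<Keyword>\w+)[ \t]*=[ \t]*(?P<Value>[^\r\n]*)", re.MULTILINE)

def parse_ini(text):
    """Return {group: {keyword: value}} using two regexes and two loops."""
    result = {}
    for section in SECTION.finditer(text):
        entries = {}
        for entry in ENTRY.finditer(section.group("Body")):
            entries[entry.group("Keyword")] = entry.group("Value")
        result[section.group("Grp")] = entries
    return result

parsed = parse_ini(SAMPLE)
print(parsed["Options"])
```

Using ^ and $ with the multiline option, plus an optional \r, means the input may end with or without a final line break.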

:)

PIEBALDconsult

12-Aug-10 18:07

Don't you need to double (escape) the backslashes? This is part of why I wrote my RegexTester.

Re: Even Expresso couldn't help me with this RegEx - Can You?

Luc Pattyn

12-Aug-10 18:20

The OP started with a "Dim"; in VB string literals the backslash is not an escape character, AFAIK.

:)

PIEBALDconsult

12-Aug-10 18:23

I need sleep. This working for eight hours a day sucks, why do people do it?

Re: Even Expresso couldn't help me with this RegEx - Can You?

AspDotNetDev

23-Aug-10 12:37

(fun)?$

Re: Even Expresso couldn't help me with this RegEx - Can You?

AspDotNetDev

23-Aug-10 12:44

Making tools to help with regular expressions must be fun, because I made one too. Looks kinda like yours. :)