|
I'd like to extract data from a Blogspot feed. I've only used regex for Rainmeter, so this is what I came up with:
(?siU)</id><published>(.*)</published><updated>.*</updated><title type='text'>(.*)</title>.*<link rel='alternate' type='text/html' href='(.*)'
I assume the "(?siU)" part is wrong. What would be the correct format?
I've also heard about PHP's xml_parser, but I think regex is faster. Still, how would I extract the same data as in the (broken) regex above with xml_parser in PHP?
Thanks in advance!
|
|
|
|
|
You should be a little more clear about exactly what you are trying to do with that question mark, but here are some things to keep in mind...
If you have well-formed XML, an XML parser is almost certainly the way to go. It might actually be faster than a regular expression. Unfortunately, I'm not familiar with PHP's XML parser, but you should take the time to familiarize yourself with it.
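To make the parser route concrete, here is a minimal sketch in Python rather than PHP (the question asks about PHP, so treat this only as an illustration of the idea). It assumes the feed is Atom, which the `<published>` and `<link rel='alternate'>` elements in the question suggest. The sample feed, the URLs in it, and the function name `extract_entries` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this namespace; ElementTree spells it "{uri}tag".
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample feed, shaped like the fragments quoted in the question.
SAMPLE_FEED = """<feed xmlns='http://www.w3.org/2005/Atom'>
  <entry>
    <id>tag:blogger.com,1999:blog-1.post-1</id>
    <published>2010-08-01T10:00:00.000-07:00</published>
    <updated>2010-08-01T10:05:00.000-07:00</updated>
    <title type='text'>First post</title>
    <link rel='self' type='application/atom+xml' href='http://example.blogspot.com/feeds/1'/>
    <link rel='alternate' type='text/html' href='http://example.blogspot.com/2010/08/first-post.html'/>
  </entry>
</feed>"""


def extract_entries(feed_xml):
    """Return published date, title and alternate link for every entry."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.iter(ATOM + "entry"):
        # An entry can carry several <link> elements; pick rel='alternate'.
        link = None
        for candidate in entry.findall(ATOM + "link"):
            if candidate.get("rel") == "alternate":
                link = candidate.get("href")
                break
        entries.append({
            "published": entry.findtext(ATOM + "published"),
            "title": entry.findtext(ATOM + "title"),
            "link": link,
        })
    return entries


entries = extract_entries(SAMPLE_FEED)
```

Unlike the regex, this does not depend on the order of the elements or on the quoting style of the attributes.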
Also, the question mark means "the preceding item is optional". Since the question mark is after an opening paren, there is nothing preceding it, so I'm not exactly sure what you're after there. Depending on the regular expression engine you use, you can use a similar syntax for positive and negative lookaheads and lookbehinds, and you can use it for named groups. Or if you put a backslash to the left of the question mark, you'll escape it so it matches a literal question mark. But I'm not really sure what you're trying to do here. For example, if you were trying to get the query string value out of a URL, you could use a named group to grab it:
http://www\.google\.com\?(?<QUERY_STRING>.*)
Notice I use the question mark twice. The first time as a literal question mark and the second time as part of a named group. Here is another example:
http://www\.google\.com(?=\?)
That is a positive lookahead that ensures the character following the "m" is a question mark. But it doesn't actually grab the question mark as part of the pattern, it only ensures that the URL will match if that question mark exists in the right location. And of course, there is this use of the question mark:
http://www\.google\.com\??
That means the last question mark is optional. And then there is one more use of question marks (lazy matching rather than greedy matching) that goes like this:
\<img\>.*?\</img\>
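I'll leave it up to you to figure out what that does if you are interested. One more thing: the less-than and greater-than signs are part of the group syntax shown above, but on their own they are ordinary characters in most engines. Escaping them with a backslash is harmless in .NET and PCRE, though in some engines (GNU grep, for example) \< and \> mean word boundaries instead.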
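The uses of the question mark described above can be tried out with a short sketch. It is written in Python, so two details differ from the patterns in this post: Python spells a named group `(?P<name>...)`, and `<` and `>` are left unescaped. The last case is an addition: in PCRE-style engines a group such as `(?si)` holds inline flags, which is most likely what the `(?siU)` in the question is. Python has no `U` (ungreedy) flag, so the sketch uses a lazy `.*?` instead. The sample strings are made up.

```python
import re

url = "http://www.google.com?q=regex"

# 1. Escaped literal "?" followed by a named group.
named = re.match(r"http://www\.google\.com\?(?P<query_string>.*)", url)
query = named.group("query_string")

# 2. Positive lookahead: requires a "?" next, but does not consume it.
lookahead = re.match(r"http://www\.google\.com(?=\?)", url)
matched = lookahead.group(0)

# 3. Optional item: the trailing literal "?" may or may not be there.
optional = re.fullmatch(r"http://www\.google\.com\??", "http://www.google.com")

# 4. Lazy versus greedy matching.
html = "<img>a</img><img>b</img>"
lazy = re.findall(r"<img>.*?</img>", html)
greedy = re.findall(r"<img>.*</img>", html)

# 5. Inline flags: (?s) lets "." match newlines, (?i) ignores case.
flagged = re.search(r"(?si)<TITLE>(.*?)</TITLE>", "<title>Hello\nworld</title>")
title = flagged.group(1)
```

The lazy pattern returns each `<img>...</img>` pair separately, while the greedy one swallows both pairs in a single match.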
|
|
|
|
|
|
Hi,
I'm not very good at regex. I want to use the Regex class to match values in a string. My main string looks like this:
<%#String.Concat ..blah blah.. %>
In the above string I want to get all the matches between double quotes ( " ... " ) in the blah blah part.
What should my expression look like?
Thanks for your help
Mazy
"This chancy chancy chancy world."
|
|
|
|
|
You should post in no more than one location (a single forum or Q&A) so everything about this topic stays together.
|
|
|
|
|
It's not the best in the world, but it does the job:
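// Greedy: InQuotes runs from the first double quote to the last one,
// so this yields one capture, not one per quoted string.
public static Regex regex = new Regex(
    "<%[^\\\"]*\"(?<InQuotes>.*)(?=\\\")[^%]*%>",
    RegexOptions.IgnoreCase
    | RegexOptions.CultureInvariant
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
    );
public static string regexReplace =
    "<Hello>";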
Real men don't use instructions. They are only the manufacturers opinion on how to put the thing together.
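Since the question asks for all the quoted values, a two-step approach may be easier to follow: first isolate each `<%# ... %>` block, then collect every double-quoted string inside it. The sketch below is in Python, not the .NET Regex class, and the sample string and the function name `quoted_strings` are made up, because the original post only shows "..blah blah..". It also assumes the quoted strings contain no escaped quotes.

```python
import re


def quoted_strings(source):
    """Collect every double-quoted string found inside <%# ... %> blocks."""
    results = []
    # Step 1: find each data-binding block (lazy, so blocks stay separate).
    for block in re.finditer(r"<%#(.*?)%>", source, re.S):
        # Step 2: inside the block, grab the text between each pair of quotes.
        results.extend(re.findall(r'"([^"]*)"', block.group(1)))
    return results


# Hypothetical input.
sample = '<%#String.Concat("Hello, ", Eval("Name"), "!") %>'
found = quoted_strings(sample)
```

The same two patterns should carry over to `Regex.Matches` in .NET, which returns one match per quoted string.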
|
|
|
|
|
Hi guys,
I want a pattern to find the number of HTML controls in a whole page.
I tested this pattern but there's no result:
@"\b((<\s*"+pattern+@"{1})\s(\s*\w*\s*\W*\s*\d*\s*)*/>\b)"
where pattern is a variable that contains a tag name such as img, input and so on.
|
|
|
|
|
Perhaps show examples of the HTML and pattern.
|
|
|
|
|
I'm not in the least bit surprised it doesn't work as you expected. It isn't a valid regex.
Go to www.ultraPico.com and D/L a copy of Expresso - it examines, designs and explains regexes. It's free, and I use it a lot. I really wish I'd written it!
|
|
|
|
|
OriginalGriff wrote: It isn't a valid regex.
Works fine for me. Doesn't produce the expected results, but it will compile and it does return results under certain scenarios.
|
|
|
|
|
Take out both occurrences of "\b".
moein.serpico wrote: {1}
That is completely useless.
moein.serpico wrote: (\s*\w*\s*\W*\s*\d*\s*)*
And what is all that for?
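With the `\b` anchors, the `{1}` and the nested optional groups removed, a much simpler pattern is enough to count tags. This is a sketch in Python rather than C#, and `count_tags` is a hypothetical helper. It assumes no attribute value contains a `>` character, and it counts opening tags only.

```python
import re


def count_tags(html, tag_name):
    """Count opening tags named tag_name, ignoring case."""
    # <, optional whitespace, the tag name as a whole word, then anything up to >.
    pattern = re.compile(r"<\s*" + re.escape(tag_name) + r"\b[^>]*>", re.I)
    return len(pattern.findall(html))


# Hypothetical page fragment.
page = '<form><input type="text" /><INPUT type="submit"><img src="a.png"/><imgx></form>'
inputs = count_tags(page, "input")
images = count_tags(page, "img")
```

Here the `\b` sits after the tag name, where it stops `img` from matching `<imgx>`. For anything beyond counting, an HTML parser is likely the safer tool.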
|
|
|
|
|
You may find my TIP useful.
|
|
|
|
|
Guys,
Seems quiet around here, but hey-ho...hopefully I'll get a response.
OK - I have the following Regex:
^https?://([a-zA-Z0-9-_]*[/\.]{1})*([a-zA-Z]^\.)*(aspx)?
Which I've created using an online regex builder.
It's intended to match URLs on either www. or the local network, with or without a page (.aspx) on the end (so either www.site.com/ or www.site.com/page.aspx). The problem I have is that it seems to work correctly in all cases until I drop it into my C# / Silverlight app (with the 2 x \ escaped as \\).
Under the regex builder, .aspx is fine and .aspxs is flagged as not matching, which is exactly what I need. In C#, however, .aspxs is not flagged as invalid.
I think I can see why: because the s is part of the group, it's permitted. So I'm essentially trying to do an optional literal string, with the ? to make it optional and the brackets to apply it to the group, but C# seems to be perceiving it more as [aspx]*. If I remove the brackets, the ? then only applies to the x on the end (as with the s in https at the start)...
Any ideas what I've missed here?
C# has already designed away most of the tedium of C++.
modified on Friday, August 6, 2010 6:33 AM
|
|
|
|
|
A shot in the dark. Try $ (matches end-of-string) at the end of your regex. Maybe one environment assumes it, the other not.
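A small sketch of why the `$` matters, written in Python rather than C#. The pattern is a simplified, hypothetical stand-in for the one in the question, and `is_valid` is a made-up helper. Without `$`, the engine is satisfied once a prefix of the URL matches, so the trailing "s" is simply left over.

```python
import re

# Simplified, hypothetical stand-in for the pattern in the question.
BASE = r"^https?://([a-zA-Z0-9_-]*[/.])*(aspx)?"

unanchored = re.compile(BASE)        # may stop before the end of the string
anchored = re.compile(BASE + r"$")   # must consume the whole string


def is_valid(pattern, url):
    return pattern.search(url) is not None


bad_url = "http://www.site.com/page.aspxs"
prefix_match = is_valid(unanchored, bad_url)
full_match = is_valid(anchored, bad_url)
```

A tool that reports a match only when the whole input is consumed behaves as if the `$` were there, which would explain the difference between the regex builder and the application.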
Software rusts. Simon Stephenson, ca 1994.
|
|
|
|
|
By jove I think he's got it!
Good work fella...thanks a lot!
|
|
|
|
|
Hi all,
it must be a small problem that drives me crazy here...
From a text file like the sample given below, the following regular expression extracts the 'Grp' group well, but the 'Entr' group is always empty. Maybe someone else can see where my error is? As a final result I'd also like to extract each entry's name and its value (i.e. before/after the "=") into different groups. Are any regex experts around who can see more?
Dim regex = New Regex("\[(?<Grp>.*)\](?<Entr>.*)
is supposed to give me back the groups and entries in an ini-file like structure, no? It seems as if \[ and \] lead to something which I can't find in any reference. Btw I built the regex with Expresso.
' Sample
[Track]
Latitude=N047° 25' 53.4256"
Longitude=W122° 18' 28.7933"
Altitude=+000432.00
[Options]
Titles=False
Sound=True
Pause=False
Thank you very much in advance,
Michael
|
|
|
|
|
Michael,
I guess something is wrong in your post, I see only one double quote, and no MultiLine option. You probably have a conflict with the HTML monster.
|
|
|
|
|
Happy day ! I could resolve a part of my regex task in Expresso.
For the record and other users: Obviously ".*" for "any character / any number of repetitions" is not good enough to capture a CRLF! Trying to capture it with \r wasn't the right thing either, it took "\r\n" to match the group together with the first line! Here's the current status of the regex:
\[(?<Grp>.*)\]\r\n(?<Keyword>\w+)\s*=\s*(?<Value>.*)\r
The ultimate question would now be:
Of course the first key-value-pair after the group header isn't enough.
How can I include its repetitions into my match now?
@ Luc: Thanks for the hint. I guess it was more a matter with my own attention monster that I had here...
@ Admin: Pls excuse me double-posting with the VB forum. This new forum obviously isn't too known yet, so that I had more forwarding answers there before being told to shift my post here...
|
|
|
|
|
Michael,
I'm not really a regex expert, but I would tackle your problem with two regex and two foreach; the first regex would locate groups, the second would parse the key=value pairs within a group. I've never seen this done with a single regex.
BTW: I don't like your \r\n stuff at all. One normally uses the symbols for start-of-line (^) and end-of-line ($), avoiding problems with matching things on the first line and last line (your input may or may not end on \r or \r\n).
FYI: I have some similar regex stuff in my article CP Vanity, which scrapes some of the CodeProject web pages. Similar meaning two regexes, two foreach loops.
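The two-regex, two-loop idea can be sketched as follows, in Python rather than VB. One pattern locates the group headers, a second one parses the key=value lines between one header and the next. Both use ^ and $ in multiline mode instead of a hard-coded \r\n. The names `HEADER`, `PAIR` and `parse_ini` are made up for the example, and the optional `\r?` is an assumption to cope with CRLF input, since Python's `$` only stops before `\n`.

```python
import re

# First regex: a [Group] header on a line of its own.
HEADER = re.compile(r"^\[(?P<grp>[^\]\r\n]+)\][ \t]*\r?$", re.M)
# Second regex: a keyword=value line.
PAIR = re.compile(r"^(?P<key>\w+)[ \t]*=[ \t]*(?P<value>[^\r\n]*)\r?$", re.M)

SAMPLE = """[Track]
Latitude=N047° 25' 53.4256"
Longitude=W122° 18' 28.7933"
Altitude=+000432.00
[Options]
Titles=False
Sound=True
Pause=False
"""


def parse_ini(text):
    """Return {group: {keyword: value}} for an ini-like text."""
    sections = {}
    headers = list(HEADER.finditer(text))
    # Outer loop: one pass per group header.
    for index, header in enumerate(headers):
        # The body runs up to the next header, or to the end of the text.
        end = headers[index + 1].start() if index + 1 < len(headers) else len(text)
        body = text[header.end():end]
        # Inner loop: every key=value pair inside this group.
        sections[header.group("grp")] = {
            m.group("key"): m.group("value") for m in PAIR.finditer(body)
        }
    return sections


parsed = parse_ini(SAMPLE)
```

Splitting the work this way avoids having to express "a header followed by any number of pairs" as one pattern with repeated captures.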
|
|
|
|
|
Don't you need to double (escape) the backslashes? This is part of why I wrote my RegexTester.
|
|
|
|
|
The OP started with a "Dim"; in VB there is no escape mechanism AFAIK.
|
|
|
|
|
I need sleep. This working for eight hours a day sucks, why do people do it?
|
|
|
|
|
|
Making tools to help with regular expressions must be fun, because I made one too. Looks kinda like yours.
|
|
|
|
|
As hot as hell.
Those are pretty regular, aren't they?
L u n a t i c F r i n g e
|
|
|
|