|
Luc Pattyn wrote: 1.
we have a regular expressions forum
I knew I should have checked all the forums. However, I am using C#. Maybe someone will move this to the right forum.
Luc Pattyn wrote: 2.
you're worried about readability and yet you use regular expressions?
It's a simple numeric search. I could easily add a comment that I'm actually searching for 6-2-4, but only 2-4 is needed since the numbers begin with 6.
Luc Pattyn wrote: 3.
what do you want to happen with numbers having multiple two's and four's?
is 612423 two matches? a match and a fail?
As long as there's a 2 before a 4, it passes. The string 612345 was just an example. It could be 64444444444444422222222222123678901236789012345678903456789034567890 and it would still pass.
|
|
|
|
|
Bassam Abdul-Baki wrote: as there's a 2 before a 4, it passes
Check for the 6
Compare String.IndexOf ( 2 ) < String.LastIndexOf ( 4 ) (Check for -1 too)
Done
|
|
|
|
|
Nice! I like this, but my example only had two numbers for simplicity. I actually have three numbers. Missed it by this much.
|
|
|
|
|
Well, why didn't you say so?
http://msdn.microsoft.com/en-us/library/5xkyx09y.aspx[^]
if ( s [ 0 ] == firstdigit )
and int x = s.IndexOf ( secondsdigit ) > 0
and int y = s.IndexOf ( thirddigit , x ) > 0
and int z = s.IndexOf ( fourthdigit , y ) > 0
etc.
Perhaps in a loop.
|
|
|
|
|
D'oh! That would work. Haven't used IndexOf in quite a while. Thanks!
|
|
|
|
|
If you really want to optimize this, just step through the characters with a state machine.
This is faster than using a regular expression, which has overhead because it's designed to handle the much more complex general case.
|
|
|
|
|
Alan Balkany wrote: If you really want to optimize this, just step through the characters with a state machine.
Not familiar with it. However, I have an obscene amount of strings that I'm parsing and it's coded in C#.
|
|
|
|
|
Are you always looking for these three digits, or was that just an example?
|
|
|
|
|
The three digits were an example, but the same digits all the time. So yes and no.
|
|
|
|
|
A state machine uses a single int to mark where you are in your search. E.g.:
int state = 0;
foreach (char c in nextString)
switch (state)
{
case (0): if (c == '6') state = 1;
break;
case (1): if (c == '2') state = 2;
break;
case (2): if (c == '4') state = 3;
break;
}
if (state == 3)
return (FOUND);
else
return (NOT_FOUND);
This is way faster than using regular expressions, which must handle much more complex cases.
|
|
|
|
|
Groovy (not the language)! I'll try it out. Thanks!
|
|
|
|
|
I'm afraid the switch will turn out to be expensive, and it really is unnecessary. How about:
int len=s.Length;
int i=0;
while(i<len) {
char c=s[i++];
if (c=='6') {
while(i<len) {
c=s[i++];
if (c=='2') {
while(i<len) {
c=s[i++];
if (c=='4') return true;
}
}
}
}
}
return false;
or the more structured:
string search="624x";
int iSearch=0;
char cSearch=search[iSearch];
foreach(char c in s) {
if(c==cSearch) {
cSearch=search[++iSearch];
if (cSearch=='x') return true;
}
}
return false;
BTW: none if these fulfill the requirement, as they would accept 6424 whereas the OP wants to reject that; so the state machine needs to be a tiny bit more complex.
|
|
|
|
|
Yes, your first example is a more efficient implementation of the state machine, but I was trying to illustrate the concept with the most clarity.
My example also could be optimized by immediately returning when the third digit was found, but I wanted to show a pure example of a state machine.
|
|
|
|
|
|
Luc Pattyn wrote: BTW: none if these fulfill the requirement, as they would accept 6424 whereas the OP wants to reject that; so the state machine needs to be a tiny bit more complex.
No, I wouldn't. When I said as long as 6-2-4 appear in that order, I meant 6[0-9]*2[0-9]*4.
|
|
|
|
|
The first example you provided is quite clear, although you had a len missing in the third while loop. I'm not so sure about the second one(since I don't have a compiler at work - not my thing anymore). Never mind. I got it. Except the 'x' confused the hell out of me.
I think I like the second one better, but I need to use the one that's faster.
Thanks!
|
|
|
|
|
sorry. I fixed both mistakes now.
I don't expect you would notice a significant performance difference between both approaches.
|
|
|
|
|
No, but I am wondering between your first method and PIEBALDconsult's use of IndexOf above. Your second method actually has a more dynamic use for me which I may or may not need. Either way, it helped me learn what I've forgotten.
Thanks!
|
|
|
|
|
IndexOf is fine for short strings; it is also fine for long strings if you know the character is present. It is a waste if you need several of them and most of those return nothing.
BTW: there also is a LastIndexOf which actually scans from right to left. So looking for 2...4 could be achieved like this:
int i2=s.IndexOf('2');
if (i2>=0 && i2<s.LastIndexOf('4')) return true;
|
|
|
|
|
Thanks, I saw that. Actually, the string I'm searching for will only appear roughly one third of the time. Why would it be a problem if the character is not present? It should basically scan the entire string just like your method does. However, I'm inclined to believe that Microsoft must have optimized the IndexOf and LastIndexOf to work just as well as your first search method. It's your second method that I'm still deciding on.
|
|
|
|
|
an IndexOf-based approach, when using several calls to such methods, has the disadvantage of scanning the string several times; a state-machine based approach typically scans just once.
I'm not sure what you mean by my first and second approach; the one with cSearch is a bit neater than the nested while loops, as it takes less code and scales better when more than a few items need to be present in a specific order. As I said, the speed difference would be minimal, the only overhead it adds is advancing one position in "search" each time the state-machine's state changes (as opposed to entering another nested while loop). When in doubt, just give it a try.
|
|
|
|
|
Luc Pattyn wrote: an IndexOf-based approach, when using several calls to such methods, has the disadvantage of scanning the string several times;
PIEBALDconsult's example shows the IndexOf(Char, Int32) or LastIndexOf(Char, Int32) searching from the last known position. That's identical to what you're doing, but I would think Microsoft has it optimized.
Luc Pattyn wrote: I'm not sure what you mean by my first and second approach;
Your cSearch (while) was the first one. That one is somewhat self-explanatory, but the second one allows me to create more dynamic searches in case the values that I'm searching for change. It also means I am not fixed at N amount of while loops.
Luc Pattyn wrote: When in doubt, just give it a try.
Yeah, I intend to. I just don't have Visual Studio here and no admin rights to install the Express version. We need a Portable Visual Studio Microsoft.
|
|
|
|
|
Bassam Abdul-Baki wrote: but I would think Microsoft has it optimized
I went to look, but found out that it's implemented as a call to "somewhere" (0xFFFFFFFFFF47A100, any clues?)
So I still know nothing. It could be optimized.. but knowing MS it is likely to be a highly generic "good for all CPU's" routine and therefore rather slow. That's what they usually do.
|
|
|
|
|
Luc Pattyn wrote: IndexOf is fine for short strings; it is also fine for long strings if you know the character is present. It is a waste if you need several of them and most of those return nothing.
Huh? You do know there are overloads for IndexOf that take the starting position to scan from, right? Scan for first character, then scan for the second character starting from the position that follows the first, then scan for the third character starting from the position that follows the second character, and so on. I figure that would be very clear code that would be very performant as well.
int index = str.IndexOf(char3, str.IndexOf(char2, str.IndexOf(char1) + 1) + 1);
|
|
|
|
|
I like that. Short and succinct. However, Luc's second method is still more dynamic. I guess we could always put this one in a loop as well.
|
|
|
|