|
I saw this one a long time ago, and finally have a use for it. Thank you very much.
Chris Richardson
Programmers find all sorts of ingenious ways to screw ourselves over. - Tim Smith
|
|
|
|
|
No problem. I hope it serves you well.
-Jack
There are 10 types of people in this world, those that understand binary and those who don't.
|
|
|
|
|
I converted this into C# and bingo...
I tried to break it but couldn't.
|
|
|
|
|
You didn't try hard enough:
wildcmp(NULL,whatever)
will break it.
Hector Santos, CTO
Santronics Software, Inc.
http:/www.santronics.com
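One way to guard against that case without touching the matcher itself is a thin wrapper. This is only a sketch, assuming the matcher has the signature int(const char *, const char *). The names match_fn, checked_match and exact_match are made up for illustration, and exact_match is just a stand-in so the snippet compiles on its own.

```c
#include <stddef.h>
#include <string.h>

/* Assumed signature of the matcher (wildcmp takes a pattern and a string). */
typedef int (*match_fn)(const char *wild, const char *string);

/* Hypothetical wrapper: a NULL argument counts as "no match"
   instead of being dereferenced inside the matcher. */
int checked_match(match_fn match, const char *wild, const char *string)
{
    if (match == NULL || wild == NULL || string == NULL) {
        return 0;
    }
    return match(wild, string);
}

/* Stand-in matcher for this sketch only: plain equality, no wildcards. */
int exact_match(const char *wild, const char *string)
{
    return strcmp(wild, string) == 0;
}
```

With the real function, checked_match(wildcmp, NULL, whatever) would return 0 rather than crash. Whether NULL should mean "no match" or be reported as an error is a design choice left to the caller.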
|
|
|
|
|
The C# code would NOT break, instead the framework would throw an exception in this case. With appropriate exception handling routines, pointer checking in C# is useless overhead.
Cheers anyway,
K. C. Dorner
IBM Billing Solution
|
|
|
|
|
Exception handling in the .NET Framework is slow as hell.
We will see.
|
|
|
|
|
Where can I get the C# version?
- Bruce
BRCKCC
|
|
|
|
|
Hi !
Very useful function, it saved me at least one cigarette's lifetime.
Seriously - great code.
Stanislav.
|
|
|
|
|
Yeah...good stuff. How long did it take you?
|
|
|
|
|
Could anyone explain to me how this code works? I am having trouble figuring out what is going on in a couple of places. I would think a short explanation would help out some other people like me who don't know C. Thanks in advance!
|
|
|
|
|
Ok, I'll try, even though I think it would be a good idea for you to learn C.
The first loop steps through both strings together until it reaches a * in the wild string or the end of the string.
Whenever the characters of the two strings don't match and the character in the wild string is not a ?, the function returns 0 (FALSE), meaning no match.
(I'm not a hundred percent sure, because I don't have time to test it, but I guess this loop is there for speed reasons only.)
The second loop does the hard thing:
if (*wild == '*') {
    if (!*++wild) {
        return 1;
    }
    mp = wild;
    cp = string+1;
This if stores the positions of the two pointers when *wild is a star.
(*wild is the character at the current position of the pointer wild; that is the easy explanation, not 100% correct.)
If this * is the last character in the wild string, the function returns 1 (TRUE), meaning a match.
} else {
    wild = mp;
    string = cp++;
This part of the ifs basically does two things in one.
Firstly, it is responsible for advancing the string pointer.
Secondly, it puts the two pointers back to the stored positions after an attempt to match through to the end has gone wrong.
} else if ((*wild == *string) || (*wild == '?')) {
    wild++;
    string++;
This part does the same as the first loop, just after the first *.
while (*wild == '*') {
    wild++;
}
Well, this loop just ignores any * left at the end of the wild string.
return !*wild;
And now, that's a nice one
I like it.
After going through all the * in the last loop, the wild string can now contain either
- nothing anymore, which means *wild is '\0' (the terminating null character), or
- anything but nothing.
If *wild is '\0', all the comparisons were successful and the function can return 1.
Or more simply: it returns !*wild = !'\0' = 1.
If it is not '\0' but some other character, !*wild will be 0.
So this
return !*wild;
basically replaces
if (*wild == '\0') {
    return 1;
} else {
    return 0;
}
or something like this.
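Putting the fragments above back into one piece gives roughly the following. This is only a sketch: the second and third loops are the quoted fragments in their order of evaluation, while the declarations and the first loop are written from the description above and are assumptions, not a quote of the article's code.

```c
#include <stddef.h>

/* Returns 1 if string matches the pattern wild, 0 otherwise.
   '*' matches any run of characters (including none), '?' matches one. */
int wildcmp(const char *wild, const char *string)
{
    const char *cp = NULL;  /* where in string to resume after a failed attempt */
    const char *mp = NULL;  /* position in wild just after the last '*' seen */

    /* First loop (reconstructed from the description): compare character
       by character until the first '*' or the end of string. */
    while (*string && *wild != '*') {
        if (*wild != *string && *wild != '?') {
            return 0;
        }
        wild++;
        string++;
    }

    /* Second loop: the three branches explained above. */
    while (*string) {
        if (*wild == '*') {
            if (!*++wild) {
                return 1;           /* '*' was the last pattern character */
            }
            mp = wild;              /* remember both positions */
            cp = string + 1;
        } else if (*wild == *string || *wild == '?') {
            wild++;
            string++;
        } else {
            wild = mp;              /* mismatch: go back to just after the '*' */
            string = cp++;          /* and retry one character further on */
        }
    }

    /* Third loop: skip any '*' left over, then succeed only if the
       pattern is used up. */
    while (*wild == '*') {
        wild++;
    }
    return !*wild;
}
```

For instance, wildcmp("bl?h.*g", "blah.jpgeg") returns 1, while wildcmp("bl?h.*g", "blah.jpge") returns 0 because the pattern still has its 'g' left when the string runs out.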
An example to explain how it really works:
wild is 'bl?h.*g'
string is 'blah.jpgeg'
After the first loop, where 'b' matches 'b', 'l' matches 'l', '?' matches 'a', 'h' matches 'h' and '.' matches '.',
the positions of the two pointers wild and string look like this:
*wild         |
        'bl?h.*g'
        'blah.jpgeg'
*string       |
Now the second loop starts:
the pointer string is advanced until it points to a character that is the same as *(wild+1), the character after the *.
That means it looks for a 'g' in string, beginning from the current position.
This increment is done by the last else, described as 'firstly' above.
So it will look like this
*wild          |
        'bl?h.*g'
        'blah.jpgeg'
*string         |
Now the second part of the ifs increases both pointers, because 'g' == 'g'.
*wild is now '\0' because the g was the last character in the wild string.
*string is 'e'.
Because '\0' != 'e', it sets the pointers back to the values they had before the comparison rush.
This is again done by the last else part, described as 'secondly' above.
But the difference now is that the string pointer is one character further along than before.
It looks now like that:
*wild          |
        'bl?h.*g'
        'blah.jpgeg'
*string          |
This change compared to the first time is done by the cp++ of
} else {
    wild = mp;
    string = cp++;
where the pointer cp is incremented.
The same game starts all over again and again it doesn't succeed.
So next time it will look like this:
*wild |
'bl?h.*g'
'blah.jpgeg'
*string |
and:
*wild |
'bl?h.*g'
'blah.jpgeg'
*string |
and finally:
*wild |
'bl?h.*g'
'blah.jpgeg'
*string |
And this will end the loop, as *string will be '\0' after the next run.
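To check the walk-through by machine rather than by hand, the same loops can be made to print where both pointers are. A rough sketch: wildcmp_trace is a made-up name, the body follows the loops described above (the first loop is written from the description, so it is an assumption), and the printed numbers are offsets from the start of each string, 0 being the first character.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical tracing variant: same logic as the loops described above,
   but it prints both offsets at the start of every pass of the second loop. */
int wildcmp_trace(const char *wild, const char *string)
{
    const char *wild_start = wild;
    const char *string_start = string;
    const char *cp = NULL;
    const char *mp = NULL;

    while (*string && *wild != '*') {
        if (*wild != *string && *wild != '?') {
            return 0;
        }
        wild++;
        string++;
    }

    while (*string) {
        printf("wild=%d string=%d\n",
               (int)(wild - wild_start), (int)(string - string_start));
        if (*wild == '*') {
            if (!*++wild) {
                return 1;
            }
            mp = wild;
            cp = string + 1;
        } else if (*wild == *string || *wild == '?') {
            wild++;
            string++;
        } else {
            wild = mp;
            string = cp++;
        }
    }

    while (*wild == '*') {
        wild++;
    }
    return !*wild;
}
```

Calling wildcmp_trace("bl?h.*g", "blah.jpgeg") prints seven lines, starting with wild=5 string=5 (the * and the 'j') and ending with wild=6 string=9 (the two final 'g's), and returns 1.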
Well, it's not an easy explanation, as the problem of wildcard search is not really as easy as C++ Guru makes it out to be.
(Maybe for a guru, it's easy.)
Don't hesitate to ask, if the answer is not understandable.
And of course please correct me, if something is wrong!
Targys
|
|
|
|
|
why use local variables and so many loops? it can be much easier to match two strings. i don't understand why so many people spend hours to search the web for wildcard matching when they can write it themselves in 5 minutes time??!
the code below could be shorter but it's easier to read like this.
i didn't debug it very much but it will work, though.
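// -------------------------------------------------------------------
int wildcmp(const char* wild, const char* string)
// -------------------------------------------------------------------
{
    // same character in both: done at the end of both strings, otherwise go on.
    // a '*' in wild is excluded here so that it stays a wildcard even when
    // string holds a literal '*' at this position.
    if(*wild == *string && '*' != *wild)
        return '\0' == *string || wildcmp(++wild, ++string);
    // string is used up: only '*' may remain in wild
    if('\0' == *string)
        return '*' == *wild && wildcmp(++wild, string);
    switch(*wild)
    {
    case '?':   // matches any single character
        return wildcmp(++wild, ++string);
    case '*':   // try the rest of wild against every suffix of string
        wild++;
        if('\0' == *wild)
            return 1;
        while(*string != '\0')
            if(wildcmp(wild, string++))
                return 1;
        // no suffix matched: fall through
    default:
        return 0;
    }
}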
yours,
the c++ guru himself.
|
|
|
|
|
Works fine, can't beat!
But how the hell can you make this even shorter???
|
|
|
|
|
no problem, but it looks very ugly. but i think you cannot do it with less characters. i would be glad to find out i was wrong, so try to do it shorter!
int wildcmp(const char* w, const char* s)
{
    // '*' != *w keeps a '*' in w a wildcard even if s holds a literal '*'
    if(*w == *s && '*' != *w) return !*s || wildcmp(++w, ++s);
    if(!*s) return '*' == *w && wildcmp(++w, s);
    if('?' == *w) return wildcmp(++w, ++s);
    if('*' == *w) if(!*++w) return 1; else while(*s) if(wildcmp(w, s++)) return 1;
    return 0;
}
|
|
|
|
|
I don't know about the rest of you, but I personally prefer FAST code to SHORT code. The function you posted is considerably slower; run some benchmarks, as I have.
This code....
int main(int argc, char **argv) {
    int x;
    for (x=0; x<9999999; x++) {
        wildcmp("*t?st?n*this*t*", "testin this sh*t");
    }
}
using your function...
real 0m13.170s
user 0m13.120s
sys 0m0.020s
using my function...
real 0m6.804s
user 0m6.790s
sys 0m0.000s
Furthermore I don't see why you criticize me for using loops when you are using a recursive function.
Thanks for your reply anyhow,
Jack
|
|
|
|
|
i didn't criticize you.
i just said that wildcard matching is not a very heavy problem to solve and that i don't understand why people spend hours looking around to find code while it takes a cigarette's lifetime to do it yourself.
first of all i feel happy that my function works at all! secondly, i know that recursion produces noticeable overhead, and as i said it was not my primary objective to write a fast function. i just tried to write it in 5 minutes time.
|
|
|
|
|
|
I'm a speed freak, so this is exactly the answer I was looking for after I noticed c++ guru's comments on code complexity. TBH I thought both implementations were straightforward, but speed is more critical.
Cheers
"Two wrongs don't make a right, but three lefts do!" - Alex Barylski
|
|
|
|
|
Gotta go with c++ guru on this one.
Unless you are writing code for slow embedded processors with 1K of RAM, you are better off with the much clearer code from c++ guru. I can see at a glance how his code works. However, if there happened to be a bug in Handy's code, or I needed to upgrade the functionality, I wouldn't have a clue where to start.
Readability is extremely important when sharing code around.
|
|
|
|
|
What is unclear about my code? Do you not understand pointer arithmetic?
-Jack
|
|
|
|
|
It is just clearer to use c++ guru's style. I take a lot of code from Code Project and am very grateful for it. With most of the classes I have to make some modification or other, as I am writing a commercial application which often requires tweaks. I need wildcard matching in a number of places, but maybe a little more than * and ?. Therefore, the first thing I considered was how I could add extra tokens. It appears at a glance that c++ guru's code is easier to grasp.
I understand pointer arithmetic just fine, and I believe the claim that your code is twice as fast. It is more important for me to have bug-free code, as wildcard comparison is carried out on a very small scale in my application. Any bugs are headaches. I am firmly a C/C++ devotee and hate all the luggage that comes with .NET, VB, etc. But we can accept a small tradeoff for readability in the general case. Why give those guys ammunition for their 'C/C++ code is unreadable and only hacks use it' argument?
No offense, but I will use the guru's code because I have my reasons. It is not fair to blindly knock it, though.
Anyway, it is better that you submitted this article than no article.
cheers
|
|
|
|
|
This code looks like it would perform slow as hell. A recursive function like this would have an incredible amount of overhead. I think in this case, re-inventing the wheel will bite you in the ass. I imagine that on a longer string, this function will get slower and slower.
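How much slower can be made visible by counting calls instead of timing them. A small sketch, not a benchmark: wildcmp_counted is a recursive matcher written in the style of the one posted in this thread, and the counter and count_calls are names made up for this illustration.

```c
/* Number of calls made since the last reset (illustration only). */
static unsigned long wildcmp_calls = 0;

/* Recursive matcher in the style of the one posted in this thread,
   with one added line that counts every call. */
int wildcmp_counted(const char *w, const char *s)
{
    wildcmp_calls++;
    if (*w == *s && *w != '*') return !*s || wildcmp_counted(w + 1, s + 1);
    if (!*s) return *w == '*' && wildcmp_counted(w + 1, s);
    if (*w == '?') return wildcmp_counted(w + 1, s + 1);
    if (*w == '*') {
        if (!*++w) return 1;
        /* try the rest of the pattern against every suffix of s */
        while (*s) {
            if (wildcmp_counted(w, s++)) return 1;
        }
    }
    return 0;
}

/* Hypothetical helper: how many calls does one match attempt take? */
unsigned long count_calls(const char *wild, const char *string)
{
    wildcmp_calls = 0;
    wildcmp_counted(wild, string);
    return wildcmp_calls;
}
```

With a pattern such as "*a*a*a*b" and a string made only of 'a', every * retries every remaining suffix, so count_calls grows much faster than the length of the string. A plain pattern without * costs one call per character plus one for the terminator.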
|
|
|
|
|
Because often when you reinvent the wheel, you forget to put some spokes in!
It's not just coding time, but also testing time.
|
|
|
|
|
Well... all the other answers speak for themselves, so
I will try not to add anything to it.
Maybe your code is shorter, and maybe it works.
Nevertheless, you could have written it a bit more gently.
Like this, it seems quite a bit arrogant.
In fact, the choice of speed or space always depends on what you need.
Usually speed is more useful; sometimes you need space (like on a small Linux firewall that should fit on a floppy, or so).
Hope you don't mind that we will use Jack Handy's code
Targys
|
|
|
|
|
a bit more gentle?
if you think jack handy's code is more gentle, fine. but i think you're one of the use-and-dont-ask kind of guys then.
even if it is the faster solution (which end-user application ever requires a million wildcard matches?) the readability of jack's procedure approaches zero. if you do not recognize this, you're either in the wrong business or talking about things you don't know about.
additionally, your 'fit on a floppy' argument is also in the wrong place: my code is shorter, and so is the compiler output.
of course i don't mind if you use jack's code. but if i were to maintain the code, i'd prefer my solution.
regards
|
|
|
|