|
Luc Pattyn wrote: I don't understand what makes it so hard to search a simple text field.
7.5 million members, no support for partial-word searches in SQL Full-Text indexes, and insanely slow searches when doing LIKE scans.
As I have said before: we are working on search. Not a tweak but a complete rewrite.
cheers,
Chris Maunder
The Code Project | Co-founder
Microsoft C++ MVP
|
|
|
|
|
I understand partial-word searches are probably (or even inherently) slower; however, sometimes they work well and often they don't. That is the puzzling part. I saw no indication of a time-out; all I got was a single match within a few seconds.
[ADDED] It seems partial-word search does not work at all any more: appending an asterisk doesn't do a thing, with no error message and no change in the results??? (I just tried "Maund*" and only found "Jay Maund".) This is new; it has worked before. If you don't support it any more, please show an error message when someone enters an invalid search term. [/ADDED]
|
|
|
|
|
I haven't touched the search, Luc. It's behaving as it's always behaved. I've gone back in the SVN logs to mid 2009 and there's been no change to the way we search by member name.
cheers,
Chris Maunder
The Code Project | Co-founder
Microsoft C++ MVP
|
|
|
|
|
I'm positive it has worked for me, i.e. without a trailing asterisk it used to look for full words, and with a trailing asterisk it looked for "starting with...". That was fine, not perfect, but fine, as I usually know how a name (or part of a name) starts. In fact, you may even have pointed it out to me long ago; not sure here. I do recall having a similar conversation earlier, when I was looking for Dave K's correct name, but I can't find it right now.
Since a trailing asterisk no longer means "starts with...", and you haven't changed anything, has SQL Server changed, or am I all wrong? I find that very strange.
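The matching rule Luc describes (a plain term matches whole words only; a term with a trailing asterisk matches any word that starts with it) can be sketched in a few lines of Python. This is an in-memory emulation under my own assumptions, not how SQL Server or the site actually implements search; `matches`, `search` and the sample names are hypothetical. For what it's worth, SQL Server's CONTAINS documentation says a prefix term has to be written inside double quotes, as in CONTAINS(column, '"text*"'); without them the asterisk is not treated as a wildcard.

```python
import re


def matches(name: str, term: str) -> bool:
    """Emulate the described rule (a sketch, not SQL Server's word breaker):
    a plain term must equal a whole word of the name, while a term with a
    trailing '*' only needs to be the start of some word."""
    words = re.findall(r"\w+", name.lower())
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        return any(word.startswith(prefix) for word in words)
    return term in words


def search(names: list[str], term: str) -> list[str]:
    """Return every name that matches the term, in input order."""
    return [name for name in names if matches(name, term)]


# Hypothetical sample data, for illustration only.
members = ["Chris Maunder", "Jay Maund", "Luc Pattyn"]
print(search(members, "Maund"))   # ['Jay Maund']
print(search(members, "Maund*"))  # ['Chris Maunder', 'Jay Maund']
```

Under this rule, "Maund" finds only "Jay Maund", while "Maund*" also finds "Chris Maunder", which is the behaviour Luc expected from the asterisk.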
|
|
|
|
|
We have upgraded from SQL 2005 to SQL 2008 and then to 2008 R2. A change like this would be a major breaking change so I doubt it would be that.
The main point is that I understand the current system is suboptimal and we are currently working on fixing it.
You're one of our KPIs
cheers,
Chris Maunder
The Code Project | Co-founder
Microsoft C++ MVP
|
|
|
|
|
Chris Maunder wrote: You're one of our KPIs
I hope my expectations aren't that exceptional, and it is more me being rather explicit about them...
I just read the first Aimee.NET article, looks promising. Do you have a tentative time schedule?
It also made me wonder how come SQL Server doesn't offer a good solution all by itself. Isn't everyone wanting partial-word search (at least starting-with search)?
|
|
|
|
|
Sounds like an interesting challenge. Here is a recommendation for a custom data structure that would allow for quick partial-word matches.
Put all usernames into a tree. There are a few root nodes, one for each character that can be in a username. Each node in the tree builds a portion of a username. So, if you have "Bob", the tree would look like this:
B-+
  O-+
    B-+
      [Data Associated With Bob]
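The "Bob" tree above is a prefix tree (a trie), and on its own it already gives fast "starts with" searches: walk down one node per typed character, then collect everything below that node. A minimal Python sketch, assuming usernames are compared case-insensitively and that the data stored per name is just the name itself; `TrieNode` and `UsernameTrie` are hypothetical names, and the per-character "root nodes" hang off a single sentinel root here.

```python
class TrieNode:
    """One character position in the username tree."""

    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.data = None  # set only on the node that completes a username


class UsernameTrie:
    """Prefix tree of usernames: one node per character, data on the last node."""

    def __init__(self) -> None:
        self.root = TrieNode()  # its children are the per-character root nodes

    def insert(self, username: str, data) -> None:
        node = self.root
        for ch in username.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.data = data

    def starts_with(self, prefix: str) -> list:
        """Return the data of every username beginning with prefix."""
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []  # no username has this prefix
        # Depth-first walk of the subtree below the prefix.
        results, stack = [], [node]
        while stack:
            current = stack.pop()
            if current.data is not None:
                results.append(current.data)
            stack.extend(current.children.values())
        return results


trie = UsernameTrie()
for name in ["Bob", "Bobby", "Bub"]:
    trie.insert(name, name)
print(sorted(trie.starts_with("bo")))  # ['Bob', 'Bobby']
```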
Now, create a tree of partial names. For a username that is 3 letters long, there will be one main entry (in the above tree), and 2 entries in the partial tree:
O-+
  B-+
    [Pointer To Data Associated With Bob]
    [Pointer To Data Associated With Scrob]
B-+
  [Pointer To Data Associated With Bob]
  [Pointer To Data Associated With Scrob]
  [Pointer To Data Associated With Bub]
For a partial-word search, you would look in the partial-username tree. If somebody searched for ".ob", you'd look for a root node of "O", then find its "B" child, and collect the pointers stored there to get the usernames that end with "ob". You'd also want to search its subtrees to find names that contain "ob" followed by more letters. You'd probably want to set a lower limit, say, no fewer than 3 characters for a partial-name search. This "tree" could be implemented as part of a database. Accounting for typical username length and the number of users, the size of the data would probably be around 15,000,000 units (each unit being, say, 10 bytes).
There are probably better data structures for the more general case, but this would probably work well for short text snippets (as is the case with usernames).
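The partial-name tree is essentially a suffix trie. A rough Python sketch of the idea, under a few assumptions of my own: the class and method names are hypothetical, matching is case-insensitive, and unlike the post it puts the full name into the same tree as its suffixes, so one lookup finds a fragment at any position, including the start. As in the post, a pointer (here, the username) is stored where each suffix ends, and a search collects the pointers in the whole subtree under the matched fragment.

```python
class Node:
    def __init__(self) -> None:
        self.children: dict[str, "Node"] = {}
        self.pointers: set[str] = set()  # names whose suffix ends exactly here


class SuffixIndex:
    def __init__(self, min_query: int = 3) -> None:
        self.root = Node()
        self.min_query = min_query  # lower limit on search length, as suggested

    def add(self, username: str) -> None:
        text = username.lower()
        for start in range(len(text)):  # every suffix, including the full name
            node = self.root
            for ch in text[start:]:
                node = node.children.setdefault(ch, Node())
            node.pointers.add(username)

    def contains(self, fragment: str) -> set[str]:
        """Return every username containing fragment anywhere."""
        fragment = fragment.lower()
        if len(fragment) < self.min_query:
            raise ValueError(f"search text must be at least {self.min_query} characters")
        node = self.root
        for ch in fragment:
            node = node.children.get(ch)
            if node is None:
                return set()
        # Names ending in the fragment sit on this node; names with more
        # letters after the fragment sit further down the subtree.
        found, stack = set(), [node]
        while stack:
            current = stack.pop()
            found |= current.pointers
            stack.extend(current.children.values())
        return found


index = SuffixIndex()
short = SuffixIndex(min_query=2)
for name in ["Bob", "Scrob", "Bub", "Robert"]:
    index.add(name)
    short.add(name)
print(sorted(index.contains("rob")))  # ['Robert', 'Scrob']
print(sorted(short.contains("ob")))   # ['Bob', 'Robert', 'Scrob']
```

The cost is memory: a name of n characters inserts n suffixes, up to n(n+1)/2 character nodes before any sharing with other names, which is why it suits short texts such as usernames. The minimum query length keeps a one- or two-letter search from returning a huge slice of the member list.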
|
|
|
|
|
I'm not sure we want to reinvent the wheel
cheers,
Chris Maunder
The Code Project | Co-founder
Microsoft C++ MVP
|
|
|
|
|
That calls for:
MARKETING GIRL:
When you have been in marketing as long as I have, you’ll know that before any new product can be developed, it has to be properly researched. I mean yes, yes we’ve got to find out what people want from fire, I mean how do they relate to it, the image -
FORD:
Oh, stick it up your nose.
MARKETING GIRL:
Yes which is precisely the sort of thing we need to know, I mean do people want fire that can be fitted nasally.
CHAIRMAN:
Yes, and, and, and the wheel. What about this wheel thingy? Sounds a terribly interesting project to me.
MARKETING GIRL:
Er, yeah, well we’re having a little, er, difficulty here…
FORD:
Difficulty?! It’s the single simplest machine in the entire universe!
MARKETING GIRL:
Well alright mister wise guy, if you’re so clever you tell us what colour it should be!
|
|
|
|
|
Any color you like, as long as it's black. Or has Ford marketing evolved beyond that?
|
|
|
|
|
Then take the bloody training wheels off your search and use something that's all growd up!
|
|
|
|
|
|
Wish I had time to refactor open source projects! Awesome!
Does Lucene (now Aimee) support fast partial-word searches?
|
|
|
|
|
According to this, partial word searches in Lucene may be slow for leading wildcards. My "wheel" technique solves that.
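Why leading wildcards hurt in a term index can be shown with a sorted list of terms standing in for the term dictionary (a simplification; this is not Lucene's actual data structure or API). A prefix query such as "maund*" binary-searches to a contiguous run of terms, while a leading wildcard such as "*ob" has to test every term. The third function shows a common workaround that is different from the tree approach above: keep a second sorted list of reversed terms, so "*ob" becomes the prefix query "bo*". All function names and sample terms are hypothetical.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms: list[str], prefix: str) -> list[str]:
    """Terms starting with prefix: binary-search to the first candidate,
    then read forward while the prefix still holds (a contiguous run)."""
    start = bisect_left(sorted_terms, prefix)
    end = start
    while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
        end += 1
    return sorted_terms[start:end]


def suffix_matches_by_scan(terms: list[str], suffix: str) -> list[str]:
    """A leading wildcard such as '*ob' gives the sort order nothing to
    work with, so every term has to be tested."""
    return [term for term in terms if term.endswith(suffix)]


def suffix_matches_by_reversal(reversed_sorted: list[str], suffix: str) -> list[str]:
    """Workaround: on a sorted list of reversed terms, '*ob' becomes the
    prefix query 'bo*'; results are turned back the right way round."""
    hits = prefix_matches(reversed_sorted, suffix[::-1])
    return sorted(term[::-1] for term in hits)


terms = sorted(["bob", "bub", "maund", "maunder", "pattyn", "scrob"])
reversed_terms = sorted(term[::-1] for term in terms)
print(prefix_matches(terms, "maund"))                    # ['maund', 'maunder']
print(suffix_matches_by_scan(terms, "ob"))               # ['bob', 'scrob']
print(suffix_matches_by_reversal(reversed_terms, "ob"))  # ['bob', 'scrob']
```

The reversed list only handles "ends with"; a fragment in the middle of a word (like "*ob*") still needs something like the suffix tree described earlier in the thread.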
|
|
|
|
|
<Disclaimer> this guy is just ahead of (or rather behind*) me in the total rep points stakes </Disclaimer>, but can his rep calculation be right? He seems to have gained 23k rep points in the space of one day (or less than a week, anyhow), pretty much all of them in the "organiser" category. Either there are shenanigans going on (in which case I don't care) or you have a bug (in which case you might). Hopefully I've spotted something of interest.
* I think I overtook him with this post!
|
|
|
|
|
0 bookmark
1 goto 1
Panic, Chaos, Destruction.
My work here is done.
or "Drink. Get drunk. Fall over." - P O'H
OK, I will win to day or my name isn't Ethel Crudacre! - DD Ethel Crudacre
|
|
|
|
|
Nagy Vilmos wrote: 0 bookmark
1 goto 1
That will not yield 23K rep points unless you schedule it at a high frequency. Please check your codez.
|
|
|
|
|
0 bookmark
1 plus 5 vote 25 posts in lounge
2 plus 5 vote 25 articles
3 plus 5 vote 25 comments in programming fora
4 plus 5 vote 25 ...etc etc
10 goto 0
------------------------------------
I will never again mention that I was the poster of the One Millionth Lounge Post, nor that it was complete drivel. Dalek Dave
CCC League Table Link
CCC Link
|
|
|
|
|
Luc's implementation is more efficient. Only bookmarking isn't capped at 25 actions/day.
3x12=36
2x12=24
1x12=12
0x12=18
|
|
|
|
|
Well, it isn't my implementation; I'm still just a keyboard-and-mouse jockey. I was merely pointing out the loop wasn't constructed correctly.
|
|
|
|
|
I was going to come here and ask pretty much the same question. Something smells, and it's not my feet.
.45 ACP - because shooting twice is just silly ----- "Why don't you tie a kerosene-soaked rag around your ankles so the ants won't climb up and eat your candy ass..." - Dale Earnhardt, 1997 ----- "The staggering layers of obscenity in your statement make it a work of art on so many levels." - J. Jystad, 2001
|
|
|
|
|
In my case, it probably is my feet that smell, but this certainly smells worse.
I wasn't, now I am, then I won't be anymore.
|
|
|
|
|
Crazy info here... most likely it means nothing. I googled monkey118 and saw a name attached: "Chris Pearson". We have a few Chris Pearsons here. Maybe monkey118 is a dummy account.
|
|
|
|
|
cheers,
Chris Maunder
The Code Project | Co-founder
Microsoft C++ MVP
|
|
|
|
|
Just went to the guy's profile (it has been deleted, I guess) and I'm getting the following:
Unable to load the requested member's information.
Which is fine, but underneath it are the tabs (where the information would usually be), except that the tab names are empty, and so is the tab pane.
It just looks strange.
I have a screenshot if you want / need it
|
|
|
|