|
Yes, true, but instead they can be bothered to go to the menu bar, click the post question link, type in their question, tag it, write the body (well, sometimes they can't be bothered with that), click the post button, and then spend ages waiting for an answer... surely searching is less effort?
|
|
|
|
|
No, searching requires at least some thinking, which the worst of them do not want to do. That is why they re-ask the same question when someone only gives them links to descriptions of how to do something. They want complete, fully functional and debugged code.
Just because the code works, it doesn't mean that it is good code.
|
|
|
|
|
Very true. A bit sad, though, when you think about it...
|
|
|
|
|
I'm guilty of doing this, admittedly. Or, I'll do a search and can't find it, post the question, and a few hours later, search again.
It would be that on the second search I find what I'm looking for... Live and learn, I guess.
|
|
|
|
|
drummerboy0511 wrote: It would be that on the second search I find what I'm looking for
Maybe you have just found an answer to your own post -- googlebots are fast.
Greetings - Jacek
|
|
|
|
|
So, I recently changed jobs (4 months ago) and have found some humorous legacy code since. But nothing, nothing comes close to the database horror I found. For several days, the senior architect and my manager had been trying to figure out why our nightly and weekly stored procedures were taking so long that they timed out. As it turns out, neither has any DBA experience.
Well, today the discussion turned to hiring an outsider to audit the tables, indexes, stored procedures, etc. I volunteered to have a look myself, kindly pointing out my database experience. After 15 minutes I found the problem.
There is a table called trades; it has ~23,000,000 rows and is almost always queried by the tradeid and date columns. Not surprisingly, the primary key consists of tradeid and date. I looked at the indexes and was shocked to see a clustered index made up of open_quantity and order_type; their values are almost always 0 and 0 or 1 respectively, and they are almost never search criteria. I looked back at the primary key, and it was set to non-clustered.
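A rough sketch of the situation (Python with SQLite as a stand-in, since SQL Server is harder to script in a post; table and index names are my guesses from the description): when the only index covers columns that never appear in the WHERE clause, the common lookup by tradeid and date degenerates into a full table scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical recreation of the table described in the post.
conn.execute("""
    CREATE TABLE trades (
        tradeid INTEGER, trade_date TEXT,
        open_quantity INTEGER, order_type INTEGER
    )
""")
# The only index is on columns that are almost never search criteria.
conn.execute("CREATE INDEX ix_useless ON trades (open_quantity, order_type)")

# The common query pattern: lookup by tradeid and date.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM trades "
    "WHERE tradeid = 42 AND trade_date = '2011-03-15'"
).fetchall()
print(plan)  # the plan shows a full SCAN of trades, not an index SEARCH
```

SQLite's planner differs from SQL Server's, but the point carries over: an index whose leading columns don't match the filter can't help the query.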
This table has existed as such for years- recording and reporting positions and trends on a public facing website the entire time.
I was told I earned my pay for the week, but I'm not allowed to just go home.
Update:
We created a clustered index on the tradeid and date columns and kept the original clustered index as a non-clustered one. It solved the main problem we were having: while inserts were extremely fast, select queries would become impossibly slow.
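As a rough sketch of why the fix helps (again Python/SQLite, not actual SQL Server: a WITHOUT ROWID table physically stores rows in primary-key order, loosely analogous to a clustered index on tradeid and date; column names are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID: rows are stored ordered by the primary key,
# roughly like a SQL Server clustered index on (tradeid, trade_date).
conn.execute("""
    CREATE TABLE trades (
        tradeid INTEGER,
        trade_date TEXT,
        open_quantity INTEGER DEFAULT 0,
        order_type INTEGER DEFAULT 0,
        PRIMARY KEY (tradeid, trade_date)
    ) WITHOUT ROWID
""")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, 0, 0)",
    [(i, f"2011-03-{1 + i % 28:02d}") for i in range(1000)],
)

# The common query pattern now seeks directly on the clustering key.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM trades "
    "WHERE tradeid = ? AND trade_date = ?", (42, "2011-03-15")
).fetchall()
print(plan)  # a SEARCH on the primary key instead of a full scan
```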
"Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming "Wow! What a Ride!"
— Hunter S. Thompson
modified on Tuesday, March 22, 2011 10:30 AM
|
|
|
|
|
You should have mentioned how much you saved them by not bringing in an external auditor to do the job, and claimed that as a pay bonus!
Don't vote my posts down just because you don't understand them - if you lack the superior intelligence that I possess then simply walk away
|
|
|
|
|
We get an evenly distributed monthly bonus based on the profitability of our department. Technically, since I saved that cost, the extra profit should be reflected in the monthly bonus pool for everyone. But yeah, I feel entitled to a little more of it this time around. I will probably poke around and bring it up once word gets out the major problem was fixed.
|
|
|
|
|
Sounds like a good scheme in general.
|
|
|
|
|
Think of it this way... you get the respect of your peers.
|
|
|
|
|
I wouldn't think the clustering vs. non-clustering would be a very big problem. The open_quantity and order_type index could be useful if the goal is, for example, to process all orders which have not yet been processed. Rather than scan 23,000,000 rows, you'd only have to process the 10 unprocessed orders. Are you sure you found the problem?
|
|
|
|
|
The open_quantity is never a search criterion. Its value is 0 99.99% of the time, and we never search by open_quantity or order_type; if we did, the order_id and date would be included as well. Have you ever inserted large sets of data into a table with a clustered index, where the inserted data had to be physically stored into the middle of the table because of the index? It's quite ugly.
I'm not suggesting eliminating the clustered index, but there's no reason the data need be physically stored in order according to open quantity and order type.
|
|
|
|
|
You are right; there's not really a good reason to store that type of data using a clustered index. Still, I don't see anything you've come across that would fix the problem you described. Would inserting a value with a bad clustered index cause all the rows in the table to be moved so that the physical location on the hard drive is strictly enforced (i.e., so rows will be ordered on disk by the clustered index)? Or is SQL smart enough to insert the data without rearranging it (i.e., by using a tree-like structure)?
|
|
|
|
|
Clustered indexes physically store the rows in the order given by the index, as far as I know. It can be quite taxing on resources, so I'm not sure what logic they use for it.
|
|
|
|
|
See here. Basically, the row data is stored with the clustered index, which is stored as a B-tree, so insertion is not an expensive operation (i.e., this should not be the cause of your problem).
|
|
|
|
|
|
Thanks, that is what I suspected. Looks like the row data is stored with the clustered index. Since the indexes, including the clustered index, are stored as B-trees, there should indeed be no huge performance hit (i.e., all the data isn't going to have to be reorganized on disk when a row is inserted into the middle of the clustered index). The only time a performance hit would occur is during a reindexing of the table, and even then the per-row cost is effectively constant, so it still amortizes to O(1). Since reindexes shouldn't occur very often, that should not cause the problem described by the OP.
|
|
|
|
|
What happens is the pages fill up and then get split as records are inserted in the middle. Splitting pages is significantly slower than just appending records at the end. It's not the O(whatever) performance; it's the big-ass constant of rearranging data on a disk.
|
|
|
|
|
A page is 8KB. That's not going to take long to move around. The time to split a page also averages out in the long run to O(1). Say a page holds 50 records: it may split on a single record insertion, but the other 49 insertions would not cause a split, so the average time per insertion is O(1).
Now, if SQL sorts data within the page for each insert, that may slow things down a bit, as the entire page would essentially be rewritten each time randomly ordered data got inserted. I'm not sure exactly how it works at that low a level. Even in this case, though, SQL should be able to optimize things. For example, if records are inserted in batches, adjacent records need not cause multiple sorts within a page... SQL can figure out the position within the page and sort all the data at once before writing it to disk. And in the case that many small insertions are performed, those individual inserts shouldn't take long enough to slow other stuff down. Of course, transaction locks and such could bork things up, but that's a different issue.
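The amortized argument can be sanity-checked with a toy simulation (pure Python; a hypothetical half-split page model, not SQL Server's actual page management): inserting n random keys triggers far fewer than n splits, roughly one split per half-page of records.

```python
import random

PAGE_SIZE = 50  # records per page, standing in for rows per 8KB page

def insert_all(keys, page_size=PAGE_SIZE):
    """Toy model of B-tree leaf pages as a list of sorted pages.

    Returns the number of page splits incurred while inserting
    the keys one at a time, in the given order."""
    pages = [[]]  # each page is a sorted list of keys
    splits = 0
    for k in keys:
        # find the page whose key range this key falls into
        i = 0
        while i + 1 < len(pages) and pages[i + 1] and pages[i + 1][0] <= k:
            i += 1
        page = pages[i]
        page.append(k)
        page.sort()
        if len(page) > page_size:
            # split: the upper half of the records moves to a new page
            mid = len(page) // 2
            pages.insert(i + 1, page[mid:])
            pages[i] = page[:mid]
            splits += 1
    return splits

random.seed(1)
n = 10_000
splits = insert_all(random.sample(range(n * 10), n))
# Each split frees ~PAGE_SIZE/2 slots, so splits total at most about
# n / (PAGE_SIZE / 2) -- i.e., O(1) amortized splits per insert.
print(splits, n // (PAGE_SIZE // 2))
```

This only counts splits, of course; as the reply below notes, the real-world pain is the constant factor of moving data around on disk, which a toy model can't show.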
|
|
|
|
|
It all depends. If the entire table fits inside the memory of your SQL server and there isn't a lot of contention, then it doesn't really matter one way or the other. If you have 32 GB of memory in your server and 50 databases with about 1 TB of data, then it makes the difference between inserting 5 million records in a few minutes or in a few days.
For another server it made the difference between inserting 35,000 records a second for 22 hours straight and starting to slow down after a few hours.
|
|
|
|
|
Andy Brummer wrote: inserting 35,000 records a second for 22 hours straight
Inserting nearly 3 billion records is going to take a while no matter what you do.
Andy Brummer wrote: it makes the difference between inserting 5 million records in a few minutes or in a few days.
I don't think it would cause a 1,000x slowdown. 10x, maybe... but 1,000x? That doesn't sound right.
|
|
|
|
|
I am only speaking from experience here: inserting large amounts of data at a high rate into the middle of a table with a seemingly random clustered index can really hurt performance. I've done it in the past, at ~10-30k rows per second on a table of ~100m to 1b rows. Pushing data into the table out of order with the clustered index made queries nearly unusable, or at the very least lengthy; however, a time-series search on a table with a time-series clustered index was fine and allowed for a realtime monitoring solution.
|
|
|
|
|
Well, the clustered index you talked about is certainly not ideal and is less performant than it should be. I suppose the magnitude of the problem depends on the exact amount of data you are using and the frequency of inserts/updates/reads/deletes. I guess the "ultimate" fail here is that the others you work with didn't find the problem sooner.
|
|
|
|
|
A lot of the time it has to do with speculative reads missing and the random read/write access pushing the disks from < 10ms access times to over 100ms. SQL Server optimizes its accesses for overall throughput, and that can drive latency to ridiculous levels. As much as it doesn't sound like it should be a big deal, I can tell you from real-world experience that it is.
|
|
|
|
|
I don't know why this was downvoted; hopefully I've corrected it enough.
|
|
|
|
|