|
|
Hi All,
I currently have a problem with two tables in SQL Server 2005. The tables have grown very large and sometimes lock up when small amounts of data are requested from them. The current record count is 9731495(PrintMedia) on the one table and 4126084(PrintArticles) on the other one. Table names: PrintArticles and PrintMedia. Whats worse is that these two tables are joined on each other in sql queries. This is because the database is non-relational. So joins are used to substitute for foreign keys. The performance needs to be improved. The SQL server version is 2005 standard edition - So partitioning is not possible
Here is a breakdown of the Tables with their fields, types and Max sizes:
PrintArticles:
Analysis int NULL
width float NULL
height float NULL
ArtSize float NULL
ID (PK) int NULL
Cutting text 2147483647 -This field contains on average 3000 characters
CutID int NULL
NewbaseNo varchar 20
AnalysisApplic int NULL
AnalysisUser nvarchar 50
Translations ntext 1073741823
AnalysisApplicDate datetime NULL
AnalysisApplicTime nvarchar 5
PerTranslation text 2147483647
PerTranslated int NULL
This table also has 3 indexes: 1 clustered on the PK and 2 non clustered
PrintMedia
PrintID (PK) int NULL
CuttID int NULL
Graphic int NULL
FpageSection int NULL
Caption varchar 1000
PDFPath varchar 300
Branch varchar 30
UploadTime varchar 10
AnalysisTag varchar 500
UserID varchar 30
Modifier varchar 30
DateModified varchar 10
PubID int NULL
NewBaseArticleNo varchar 20
Edition varchar 800
IS_HardCopy int NULL
Seen_by varchar 300
DTPDate varchar 10
ColourPDF int NULL
SpokesPerson varchar 200
Mention varchar 200
NLPU int NULL
Server varchar 10
Repl text 2147483647
GroupID int NULL
ArticleID int NULL
PubDate 5 varchar 10
CreatedDate varchar 10
Publication varchar 200
SubPublication varchar 200
Headline varchar 800
SubHeadline varchar 800
Journalist varchar 500
SubJournalist varchar 500
Page varchar 20
Client varchar 50
Category varchar 200
CategoryValue varchar 200
CategoryDisplayName varchar 200
OrderID varchar 5
TagID varchar 5
Language varchar 20
Section varchar 200
CCM real NULL
SizeX real NULL
SizeY real NULL
RandValue real NULL
FrontPageCover int NULL
This table has 12 non clustered indexes
I have to restart the sql server service at least once a day to remove locks. If that fails I have to reorganize the indexes or rebuild them to get the tables working again.
The previous developers wrote applications that use adhoc queries. The normal CRUD and the tables worked fine. But scalability was not considered it seems. Will it help if I modify all the select statements in these apps to not have locks eg "select [columns] from [tablename] with (nolock) where..."?. I just need these tables to perform faster and not lock up.
Plz advise.
|
|
|
|
|
Jacobus01 wrote: Will it help if I modify all the select statements in these apps to not have locks eg "select [columns] from [tablename] with (nolock) where..."?.
It feels faster, doesn't it? The documentation on MSDN reveals why it's faster;
Do not issue shared locks and do not honor exclusive locks. When this option is in effect, it is possible to read an uncommitted transaction or a set of pages that are rolled back in the middle of a read. Dirty reads are possible. Only applies to the SELECT statement.
That means that you could be reading data that's not committed. I'd be very carefull if the database contains a lot of stored procedures and utilizes transactions a lot.
How are the indexes, and what does the profiler say?
I are Troll
|
|
|
|
|
I assume that CutID is the FK to the tables, just wondering why it is not the primary on one of the table? Totally irrelevant to your problem.
I would first put in place a nightly maintenance job that cleaned up the indexes.
As to Eddys issue with uncommitted data, I would assume that queries are filtered and therefore this dodgy data should not be an issue except where you are specifically getting the latest information. There tends to be a small window of data that is subject to change, the last few records in the table. If you give your users a caveat that it may be dodgy I see no reason (nolock) should not be used.
CAVEAT - note I have never used (nolock) so take this cautiously.
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
If a primary key is dropped from a table and some records are duplicated after dropping primary key, then is it possible to apply the primary key on the same column. If yes, then how??
NEHA GUPTA
|
|
|
|
|
No, not really. The whole point of a primary key is that it is unique for each row (be it a surrogate key or a key made up from several columns). If you have duplicate rows then, by definition, you will be breaking the rule. Remove the duplicate rows and then apply a primary key.
Why would you deliberately remove the primary key and add duplicate rows?
"If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." Red Adair.
nils illegitimus carborundum
me, me, me
|
|
|
|
|
ya how can again you set the primary key which already have two rows of same key ??? it is not possible...
suchita
|
|
|
|
|
but there is some option when we alter the table, (when we add constraint) with an option "with check" and "with nocheck". What is that option? Please clarify..
NEHA GUPTA
|
|
|
|
|
If you add a new constraint, Sql Server will test whether or not all of the records that are currently in the table pass that constraint.
Adding the NOCHECK option will skip that test.
You'd have to remove any double entries to recreate your primary key.
I are Troll
|
|
|
|
|
I'm currently in the process of creating a service records database for storing information on machine servicing.
In the un-normalised customer table I have three attributes for recording telephone contact details: LandlineNumber, MobileNumber and FaxNumber.
Ordinarily, these attributes should be decomposed through the normal forms; 1NF, 2NF and 3NF.
TELEPHONE_1NF(TelNumber, CustomerNumber, TelNumberType)
TELEPHONE_2NF(TelNumber, CustomerNumber)
TELEPHONE_TYPE_2NF(TelNumber, TelType)
TELEPHONE_3NF(TelNumber, CustomerNumber)
TELEPHONE_TYPE_3NF(TelNumber, TelType)
My question is, in the real world, do you database designers out there ACTUALLY do this, or would you leave the three different telephone attributes in the customer table and invite null entries? Or, do you use a different strategy completely?
There is only one satisfying way to boot a computer.
|
|
|
|
|
My vote is to leave the 3 telephone attributes in the Customer Record and allow for blank/Null numbers.
There is an argument for keeping things simple and in this case I would go for simplicity.
|
|
|
|
|
Thanks for the advice. My college lecturer would be spitting in his beer if he even knew I was asking the question! However, having been taught the 'correct way' I'm wondering what real database designers ACTUALLY do; whether they stick to the theory or compromise.
There is only one satisfying way to boot a computer.
|
|
|
|
|
In most business cases I would go for the denormalised structure as David suggested.
However I have seen this type on Nth level normalisation work quite well, especially if it is a commercial product and must be extensible to meet user requirements. It also requires a GREAT support team and constant retention of the IP in the development team.
Corporate products can accept greater restrictions on their boundaries. If you want 4 phone number then stick it in Outlook, bugger off.
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
I also agree with what David said
|
|
|
|
|
I would take the view that people may have more than 3 contact numbers (attributes) and by normalising out you are allowing for any number of numbers, so to speak, and building in the ability to add as many as required in the future. On the other hand if 3 numbers is set in stone (as if anything ever really is) then leave as is.
"If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." Red Adair.
nils illegitimus carborundum
me, me, me
|
|
|
|
|
Your teacher is right, 3NF would be the minimum requirement before I'd call it a well-designed database. You are right too; in the real world, databases are created when needed, and grow over time - not all of them are designed upfront.
Whenever I can, I try and normalize. Your example would make it very hard to get the complete address-details of the customer, since you'd be seeing a similar construction with ZipCodes and streets.
Digital Thunder wrote: would you leave the three different telephone attributes in the customer table and invite null entries?
You don't want to do this;
SELECT ([phone1], [phone2], [phone3]) from ..
Before you know it, customers will want to add extra information, like a short description (e.g. "this number only between 1 and 5 am") or a bool column to flag disconnected lines.
That would lead to a
SELECT ([phone1], [phone1desc], [phone2], [phone2desc], [phone3] ..
Might not sound very useful in this particular example, but I'm sure that most of us had to update a similar database at least once in their career. Imagine you have a table that all of a sudden would receive four or five more attributes
I are Troll
|
|
|
|
|
Eddy Vluggen wrote: You don't want to do this;
SELECT ([phone1], [phone2], [phone3]) from ..
Before you know it, customers will want to add extra information, like a short description (e.g. "this number only between 1 and 5 am") or a bool column to flag disconnected lines.
That would lead to a
SELECT ([phone1], [phone1desc], [phone2], [phone2desc], [phone3] ..
Hmmm, I see what your saying. So, I suppose the answer to my original question is "it depends" judging by the range of replies that the question solicited. It depends on what the database is actually designed to achieve in the first place and what are the consequences moving from a theoretically correct design to a design that has compromises which fly in the face of conventional theory.
My thanks go out to all of you that took the trouble to reply to the original question.
There is only one satisfying way to boot a computer.
|
|
|
|
|
Digital Thunder wrote: s moving from a theoretically correct design to a design that has compromises
Stick to the theoretically correct design, and make sure that you can explain why you denormalized that particular table.
Your teach has prolly given some good arguments why you should apply at least 3NF. A good reason for denormalization would be partitioning - if the table were too large, and you'd like to divide it over two databases.
IRL you'll find lots of denormalized databases, simply because there's lots of people creating databases who never heard of Codd, or decided to take a shortcut.
I are Troll
|
|
|
|
|
Eddy Vluggen wrote: Your teach has prolly given some good arguments why you should apply at least 3NF.
He certainly did. Basically, we were told, without normalising to 3NF we wouldn't be able to get the database to do what we want it to do and that it would be rather error prone and that if we wanted to make any money out of database design then we had better normalise to 3NF otherwise we would be penniless! The latter is a very convincing argument.
There is only one satisfying way to boot a computer.
|
|
|
|
|
I have learned through experience to do what your instructor describes -- things will become more complex over time. Do it correctly now.
"It's better to have it and not need it than to need it and not have it."
"A stitch in time saves nine."
At one place I worked, we needed to hold a credit card number for each account (actually, account-holders also had the option not to provide one) -- great, it (and its type; Master Card, Visa, etc.) was in the Account table. Then we had a customer who wanted to store two credit card numbers (so there would be a backup in case the other was declined ) -- OK, the DBAs added it to the Account table with another field to indicate which one had been charged most recently. Then we had another customer who also wanted us to support ACH transactions against bank accounts -- oh, crap -- the DBAs added yet more fields to the Account table. I left the company after writing a document explaining that I thought that all that data should be moved to a separate table before things got completely out of hand.
I suspect that in your case you want to generalize it to some sort of "contact method" table which includes telephone numbers, email addresses, icy cucumbers, etc.
|
|
|
|
|
I inherited a table with over 250 columns. Many no longer used, several should have been in their own table.
But the table grew just like yours (need this one thing, well now I need this one other thing).
Building normalized to start is the best path.
|
|
|
|
|
PIEBALDconsult wrote: I suspect that in your case you want to generalize it to some sort of "contact method" table which includes telephone numbers, email addresses, icy cucumbers, etc.
Not quite sure about the icy cucumbers but yes, you've hit the nail on the head, that's exactly what it's for. I actually finished college 5 years ago, and this is the first occasion that has presented itself to put into practice what I actually learned, so I'm a bit rusty at the minute and having to rely on lecture notes, assignments and Database Systems (2nd Ed.) by Begg and Connolly, which is heavy going at times to say the least.
There is only one satisfying way to boot a computer.
|
|
|
|
|
Digital Thunder wrote: heavy going at times
Yeah, the other week I again looked up (Wikipedia and others) third-normal-form and again couldn't make sense of the discussion. I absolutely know how to use third-normal-form, but the terms that are used to describe it are non-sensical (unless there is someone there actually telling you and applying it to some data).
It's kind of like being able to read, write, and speak a language fluently but not understanding linguistics. Really, only cunning linguists need to know what the term "pluperfect" means or even that it exists. And all those hours spent diagramming sentences didn't help us speak betterishly.
|
|
|
|
|
PIEBALDconsult wrote: unless there is someone there actually telling you and applying it to some data
I know EXACTLY what you mean. When the lecturer is explaining examples to drive home a point, it's quite obvious what he is actually getting at. However, I to have looked at a few examples on the web and some of them have only served to confuse me even more than I was! I did however, find a very good article that condensed a lot of detail into relatively few pages and the examples given really did a good job in explaining the detail.
There is only one satisfying way to boot a computer.
|
|
|
|
|
Digital Thunder wrote: confuse me even more
Exactly.
Digital Thunder wrote: a very good article
Got a link?
|
|
|
|