What’s more important – speed or data accuracy/quality when generating reports? There’s no single correct answer, as there’s often a trade-off between the two. In many cases, a cost/benefit analysis needs to be done to decide how best to approach a problem. Over time, there will be a number of posts related to the subject of speed vs. accuracy, though the posts will cover otherwise unrelated topics [for example, when to use the SQL Server NOLOCK hint, when to denormalize vs. normalize, how to structure queries, how to approach database backups, etc.].
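As a quick illustration of that trade-off, here’s a minimal sketch of the NOLOCK hint; the table and column names are made up for the example:

```sql
-- Hypothetical example: dbo.TestScores is a stand-in table name.
-- NOLOCK trades accuracy for speed: the query neither takes nor
-- waits on shared locks, so it won't block behind writers, but it
-- can read uncommitted "dirty" rows.
SELECT StudentId, Score
FROM dbo.TestScores WITH (NOLOCK)
WHERE TestDate >= '2010-01-01';
```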
Today I’m going to discuss nightly populated tables. At Boston Public Schools, student data changes frequently: new test scores are entered daily, student enrollment records change, attendance information is updated, etc. The data is also reported on frequently. Whether it’s an ad-hoc report developed for the Enrollment department or a test score tally available for all schools to see, numbers are constantly being crunched and data is being pulled from dozens of tables for reporting purposes.
Some of the logic behind the reporting queries is quite complex, and a report that’s run frequently can be a performance drain on the system. The solution we’ve put in place in some scenarios? Have certain reporting tables populated nightly by a SQL Server job. The benefit is speed, at the cost of accuracy: the data is up to one day out of date. In some cases that’s a deal breaker, while in others, being a day behind is no big deal.
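To make that concrete, here’s a minimal sketch of what one of these nightly refresh procedures might look like. All object names (dbo.rpt_SchoolScoreSummary, dbo.TestScores, etc.) are hypothetical stand-ins, and in practice the procedure would be scheduled through a SQL Server Agent job running in a low-traffic window:

```sql
-- Hypothetical sketch of a nightly refresh procedure; a SQL Server
-- Agent job would execute this each night.
CREATE PROCEDURE dbo.RefreshSchoolScoreSummary
AS
BEGIN
    SET NOCOUNT ON;

    -- Rebuild the reporting table from scratch off the OLTP source.
    TRUNCATE TABLE dbo.rpt_SchoolScoreSummary;

    INSERT INTO dbo.rpt_SchoolScoreSummary (SchoolId, TestId, AvgScore, StudentCount)
    SELECT ts.SchoolId,
           ts.TestId,
           AVG(ts.Score),
           COUNT(DISTINCT ts.StudentId)
    FROM dbo.TestScores AS ts
    GROUP BY ts.SchoolId, ts.TestId;
END;
```

The daytime report then becomes a trivial SELECT against dbo.rpt_SchoolScoreSummary instead of an expensive aggregation over the live tables.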
Another thing that’s helped is that a number of our reports have moved to the data warehouse system we have at BPS, which has shifted some of the most complex reports away from the primary OLTP tables.
Truth be told, some of the complex queries still run directly against our OLTP tables; the requirements of those reports demand data that’s up to date and reflects any changes made during the day. The important thing, however, is to always ask whether a particular report could be pulled fully, or at least partially, from nightly/interval-generated data. Even moving half the ‘reporting’ queries off the OLTP tables can reduce server load a good deal.
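The ‘partially’ case is worth sketching: historical rows come from the nightly table, and only today’s changes are read live. Again, the object names below are hypothetical:

```sql
-- Hypothetical hybrid query: history comes from the nightly snapshot
-- table, and only today's rows touch the live OLTP table, so the
-- expensive scan stays off the live system.
SELECT combined.SchoolId, SUM(combined.Absences) AS Absences
FROM (
    SELECT SchoolId, COUNT(*) AS Absences
    FROM dbo.rpt_DailyAttendance        -- nightly snapshot
    WHERE AttendanceDate < CAST(GETDATE() AS DATE)
      AND Status = 'A'
    GROUP BY SchoolId

    UNION ALL

    SELECT SchoolId, COUNT(*)
    FROM dbo.Attendance                 -- live OLTP table
    WHERE AttendanceDate = CAST(GETDATE() AS DATE)
      AND Status = 'A'
    GROUP BY SchoolId
) AS combined
GROUP BY combined.SchoolId;
```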
In cases where decisions are being made based on tallies/number crunching against the live OLTP tables, another alternative is to simply pull the necessary data into the business logic layer and do the number crunching there. In some cases this works, while in others the calculation really has to be done at the SQL query level; it depends on the problem, which tables need to be pulled from, etc. [If doing the calculations in the business logic means pulling millions of records from several tables, this option might not be practical.] Also, as I pointed out previously, there are drawbacks to moving logic away from the SQL Server level that need to be considered.
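The difference in query shape is easy to show. Under the ‘crunch in the business logic’ option, the query just hands back raw rows and the application does the math; the alternative keeps the aggregation on the server. The table name is again a hypothetical stand-in:

```sql
-- Option 1 (hypothetical): return the raw detail rows and let the
-- application layer do the number crunching. Fine for small result
-- sets, impractical if this is millions of rows.
SELECT StudentId, Score
FROM dbo.TestScores
WHERE TestId = 1234;  -- example value

-- Option 2: do the calculation at the SQL level, returning a single
-- summary row instead of the full detail set.
SELECT AVG(Score) AS AvgScore, COUNT(*) AS NumScores
FROM dbo.TestScores
WHERE TestId = 1234;  -- example value
```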
In a future post, I’ll revisit this topic with more specific examples of how nightly-generated tables have been beneficial at BPS.