This post discusses why OR
conditions in join
statements should be avoided and gives an example fix.
OR
conditions and join
statements – some things just don’t go well together. Similar to how OR
conditions can significantly impair queries having subqueries, they can wreak havoc on join
s as well.
Take, for example, the following [admittedly unrealistic] query. This assumes a numbers table [numbers] has been set up. [See my Cartesian join post for a query to set up a numbers table.] Assume the numbers table only has 10,000 records.
SELECT n1.num, n2.num, n3.num FROM numbers n1
INNER JOIN numbers n2 ON ( (n1.num = n2.num - 400) OR (n1.num = n2.num + 400))
INNER JOIN numbers n3 ON n2.num = n3.num - 300
ORDER BY n1.num, n2.num, n3.num
This query looks simple enough. However, even though the source table only has 10,000 rows and the result only has 18,900 records, it takes 5 seconds to complete, and Management Studio [and frankly my entire computer] freezes while it is running. This query spiked CPU usage to 100%. That’s not good – join
s and OR
conditions simply do not mix [at least on SQL Server].
I encountered this exact type of scenario at BPS. We have a DTS package for bringing over student
data from the source location to another database [with a different schema] where the school assignments are done. One of the steps took 20 minutes, and I simply lived with it the first few times it ran. But because it’s best to ask if there’s a better way, I decided to investigate the step. With a 20 minute runtime, I fully expected to see some highly complex tangled web of SQL with 30 joins and 150 lines. I couldn’t have been more wrong – it was a simple query resembling the one above [with an OR
condition in the join
]. In under 5 minutes, I’d broken apart the query into two separate steps and what previously took 20 minutes to run only took 2 seconds.
For the query above, here’s one solution [using a UNION
] for breaking apart the OR
. There are likely other ways and I invite comments for other approaches to removing the OR
condition.
SELECT n1.num, n2.num, n3.num FROM numbers n1
INNER JOIN numbers n2 ON n1.num = n2.num - 400
INNER JOIN numbers n3 ON n2.num = n3.num - 300
UNION
SELECT n1.num, n2.num, n3.num FROM numbers n1
INNER JOIN numbers n2 ON n1.num = n2.num + 400
INNER JOIN numbers n3 ON n2.num = n3.num - 300
ORDER BY n1.num, n2.num, n3.num
This query runs instantly with a numbers table having 10,000 rows. It finishes in 8 seconds for a table having 800,000 rows. If I had used the larger table with the first query, I’d probably have to reboot.