Introduction
A best table is defined as, "A table containing all the Columns given in the Input with none or the least number of other Columns." Sounds like a Physics definition..!! That's just what was defined for it as the requirement was that way.:-)
It's a Stored Procedure which takes the CSV
of Columns
and its Count
from the front end and returns the Best Table, i.e., a Table which contains all the mentioned Columns in the CSV format plus none or the least number of Other Columns. It is more clearly explained in the SQL file attached.
Background
Was developed for a Search engine:
- Where there would be 'n' number of fields which can be used for searching
- Tables are also split up very much so that table size does not gets too huge
- But because of which we have many tables here and if the Customer selects/enters some search field alone, then we need to track the table first which needs to be searched and proceed from there.
- Our solution will find the Best Table to be used and returns it.
Using the Code
It's a SQL query which you need to just execute with 2 parameters as mentioned earlier:
CSV of Columns (Fields) (@ColumnNames)
Its Count (@ColumnCount)
Pass these parameters from the UI.
EXECUTE [Proc_findbesttable] '''ID'',''Name'',''Phone''', 3
The query is below (It contains only sample inputs, the attached source has the proper procedure with parameters in it). For explanation purposes, I have mentioned the below code here.
SELECT TOP 1 t.[name] AS BestTable
FROM sys.tables t
INNER JOIN syscolumns c
ON c.id = t.object_id
AND c.[name] IN ( 'ID', 'Name' )
GROUP BY t.[name],
t.[max_column_id_used]
HAVING ( Count(t.[name]) = 2 )
ORDER BY t.[max_column_id_used] ASC
Group By
is used since a table's name is unique in a database.
All it involves is joining the sys.tables
and syscolumns
with only the specific Columns intended and this kind of joining will create a One to Many relationship (Table - Column) and the JOIN
also will link tables even with lesser columns than the given input because of the IN
there.
So the logic is this: If a particular match produces a Table-Column relation with exactly the same number of Column Count mentioned, then that's the Best Table and that's what the HAVING
does there. Note the point that this query is intended to return only the Best Table or nothing and this Query will not return anything if no table contains all the columns passed as Input.
Points of Interest
I initially decided to find out the Number of Columns in the Table manually then came across max_column_id_used
. But it has its own limitation, even if a Column is dropped of a table, the max_column_id_used
remains the same. Remember it is only incremented and never decremented as its purpose is to get how many columns were used in the Table's entire life.
Tip: So if your tables are change prone, then prepare a separate logic for getting a Table's Column Count and redefine the above query.
Note: There may be other challenges arising like what if exactly 2 tables have the same Column set. The above Query was constructed where such a scenario didn't exist. So if you encounter one, then Include another parameter called TablePurpose
and make it a column in all your tables.
(But you might be saving the same Table Description in all the rows - a kind of redundancy [Or even you might have a separate table for saving TableID & Table
purpose] ) (just a rough idea of what to do.) Use that as an additional filter to narrow down to a single result. Column datatype
also can be used if they differ, but that would be a tedious method.