The Best Table Finder (The BestCube Problem)

BharatRamV

4.08/5 (6 votes)

20 Dec 2016CPOL3 min read

15.2K

Finds the Best Table for a given set of columns

Download source code - 924 B

Introduction

A best table is defined as, "A table containing all the Columns given in the Input with none or the least number of other Columns." Sounds like a Physics definition..!! That's just what was defined for it as the requirement was that way.:-)

It's a Stored Procedure which takes the CSV of Columns and its Count from the front end and returns the Best Table, i.e., a Table which contains all the mentioned Columns in the CSV format plus none or the least number of Other Columns. It is more clearly explained in the SQL file attached.

Background

Was developed for a Search engine:

Where there would be 'n' number of fields which can be used for searching
Tables are also split up very much so that table size does not gets too huge
But because of which we have many tables here and if the Customer selects/enters some search field alone, then we need to track the table first which needs to be searched and proceed from there.
Our solution will find the Best Table to be used and returns it.

Using the Code

It's a SQL query which you need to just execute with 2 parameters as mentioned earlier:

CSV of Columns (Fields) (@ColumnNames)
Its Count (@ColumnCount)

Pass these parameters from the UI.

SQL

EXECUTE [Proc_findbesttable] '''ID'',''Name'',''Phone''', 3

The query is below (It contains only sample inputs, the attached source has the proper procedure with parameters in it). For explanation purposes, I have mentioned the below code here.

SQL

SELECT TOP 1 t.[name] AS BestTable

FROM   sys.tables t
   INNER JOIN syscolumns c
       ON c.id = t.object_id
          AND c.[name] IN ( 'ID', 'Name' )

GROUP  BY t.[name],
          t.[max_column_id_used]
HAVING ( Count(t.[name]) = 2 )
ORDER  BY t.[max_column_id_used] ASC

Group By is used since a table's name is unique in a database.

All it involves is joining the sys.tables and syscolumns with only the specific Columns intended and this kind of joining will create a One to Many relationship (Table - Column) and the JOIN also will link tables even with lesser columns than the given input because of the IN there.

So the logic is this: If a particular match produces a Table-Column relation with exactly the same number of Column Count mentioned, then that's the Best Table and that's what the HAVING does there. Note the point that this query is intended to return only the Best Table or nothing and this Query will not return anything if no table contains all the columns passed as Input.

Points of Interest

I initially decided to find out the Number of Columns in the Table manually then came across max_column_id_used. But it has its own limitation, even if a Column is dropped of a table, the max_column_id_used remains the same. Remember it is only incremented and never decremented as its purpose is to get how many columns were used in the Table's entire life.

Tip: So if your tables are change prone, then prepare a separate logic for getting a Table's Column Count and redefine the above query.

Note: There may be other challenges arising like what if exactly 2 tables have the same Column set. The above Query was constructed where such a scenario didn't exist. So if you encounter one, then Include another parameter called TablePurpose and make it a column in all your tables.

(But you might be saving the same Table Description in all the rows - a kind of redundancy [Or even you might have a separate table for saving TableID & Table purpose] ) (just a rough idea of what to do.) Use that as an additional filter to narrow down to a single result. Column datatype also can be used if they differ, but that would be a tedious method.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)