Introduction
This article presents the benefit of the practical application of combination formula arithmetic facilitating quality assurance for the design of tables and queries and subsequent execution and analysis.
Background
Consider that we have four repositories that are to contain the assets of an enterprise. An asset might be a computer identified by a GUID (globally unique identifier) or at least unique to an enterprise or some other limiting domain. Examples of such repositories would be Microsoft Active Directory, Microsoft System Center Configuration Manager, a Configuration Management Database (CMDB), an Anti-virus management asset repository, etc. If all repositories were perfectly synchronized, each repository would contain information for each asset at any point in time. Practically, that is generally not the case, and we often need to reconcile and analyze and report on, an asset that is recorded in one or more repositories against the other repositories missing (or recording) the asset. Specifically, an asset may be removed or added to one repository and because of operational or process time delays, the other repositories may not 'yet' reflect that change. Thus, the need to analyze and report on 'sameness' or 'differences' between the repositories.
Implementation
(The provided files with this article, RepositorySetCompare.sql and HeatMap.xlsx supply all mentioned items.)
Specific to this article, that is, 4 repositories, we will ultimately create a '4-up' table that shows each unique asset and each repository it resides in. The rows will contain null
or AssetName
with the repository names for columns. An asset could be in only 1, only 2, only 3, all 4, or combinations thereof. If it's not in any, then we probably don't know about the asset and that is another matter unto itself. Moreover, I assert here that any repository only has an asset listed once. If an asset is contained in any repository more than once, then that too, is another matter I do not consider within this article. (Although, the accompanying SQL prevents this situation.)
We can readily discern there are several combinations and we can eagerly list all combinations (as is often the case), or better yet, apply combination formula arithmetic to ensure we can list the appropriate distinct combinations. Not more, not less, not duplicating with different wording. Listing, exactly, the appropriate number of distinct combinations.
Since an asset can reside in all 4 or some but not others of the repositories, then what is the number of different situations? This is an application of combinations, not permutations, as we are not concerned with the order of choosing or arranging. (Permutations and formulas addressing them are useful when the order of choosing is relevant.)
We certainly would want to know if an asset is in all 4 of the repositories. That would be 1 situation. The formula allowing us to count for that situation is:
(number of repositories)factorial
divided by
(
number of repositories
less (number of repositories 'wherein we expect the asset to be recorded')factorial
times
(number of repositories 'wherein we expect the asset to be recorded')factorial
)
= 4! / (4 - 4)!4! = 24 / 24 = 1. Please recall that 0! = 1.
Next, the situations where the asset would be in 3 of the 4 repositories but not in 1 of them. How many situations would that be? The formula allowing us to count for these situations is: 4! / (4 - 1)!1! = 24 / 6 = 4. Now we know we have at least 5 (1 + 4) different situations.
Next, the situations where the asset would be in 2 of the 4 repositories but not in the other 2. How many situations would that be? The formula allowing us to count for these situations is: 4! / (4 - 2)!2! = 24/ 4 = 6. Now we have at least 11 (1 + 4 + 6) different situations.
The final set of situations is where the asset is recorded within 1 of the repositories but not in the other 3. The formula for this number of situations is: 4! / (4 - 1)!1! = 24 / 6 = 4. Now we have 15 (1 + 4 + 6 + 4) different situations.
In summary, an asset can be represented in at least 1 of 4 repositories (tables) and may exist in only 1, or only 2, or only 3, or all 4. By applying combination arithmetic, we are assured of knowing the exact number of different combinations which we can design for.
We must ensure then, that our table and query design and execution account for 15 situations or combinations. (1 + 4 + 6 + 4)
I label the repositories as A, B, C, and D to describe the detail of the combinations.
List of Combinations
Combination Set 1
- Asset is recorded in A, B, C, and D (all 4 repositories).
Combination Set 2
- Asset is recorded in A, B, and C, but NOT D.
- Asset is recorded in A, B, and D, but NOT C.
- Asset is recorded in A, C, and D, but NOT B.
- Asset is recorded in B, C, and D, but NOT A.
Combination Set 3
- Asset is recorded in A and B, but NOT in C and D.
- Asset is recorded in A and D, but NOT in B and C.
- Asset is recorded in C and D, but NOT in A and B.
- Asset is recorded in A and C, but NOT in B and D.
- Asset is recorded in B and C, but NOT in A and D.
- Asset is recorded in B and D, but NOT in A and C.
Combination Set 4
- Asset is recorded in A, but NOT in B, C, or D.
- Asset is recorded in D, but NOT in A, B, or C.
- Asset is recorded in C, but NOT in A, B, or D.
- Asset is recorded in B, but NOT in A, C, or D.
I could have just listed all different combinations, anticipating I did not leave some out, did not list some twice or more with perhaps different wording, or got it just right. Performing the combination arithmetic, I am able to determine that I have 15 total unique combinations. Not more and not less. This is an excellent way to check my design needs and execution results.
SQL to Create and Populate Tables with Minimum Data.
Create and populate the repository tables ensuring the minimum of test data based upon the combination arithmetic. In practice, these repositories would most likely contain thousands or more rows. We will generate more than the minimum later.
if OBJECT_ID('tempdb..#AssetRepository_A', 'U') is not null
drop table #AssetRepository_A
if OBJECT_ID('tempdb..#AssetRepository_B', 'U') is not null
drop table #AssetRepository_B
if OBJECT_ID('tempdb..#AssetRepository_C', 'U') is not null
drop table #AssetRepository_C
if OBJECT_ID('tempdb..#AssetRepository_D', 'U') is not null
drop table #AssetRepository_D
if OBJECT_ID('tempdb..#4up', 'U') is not null
drop table #4up
create table #AssetRepository_A
(
AssetID varchar(30)
,constraint PK_AssetRepository_A primary key (AssetID)
)
create table #AssetRepository_B
(
AssetID varchar(30)
,constraint PK_AssetRepository_B primary key (AssetID)
)
create table #AssetRepository_C
(
AssetID varchar(30)
,constraint PK_AssetRepository_C primary key (AssetID)
)
create table #AssetRepository_D
(
AssetID varchar(30)
,constraint PK_AssetRepository_D primary key (AssetID)
)
create table #4up
(
[AssetRepository_A AssetID] varchar(30)
,[AssetRepository_B AssetID] varchar(30)
,[AssetRepository_C AssetID] varchar(30)
,[AssetRepository_D AssetID] varchar(30)
)
insert into #AssetRepository_A(AssetID) values('1Yabcd')
insert into #AssetRepository_B(AssetID) values('1Yabcd')
insert into #AssetRepository_C(AssetID) values('1Yabcd')
insert into #AssetRepository_D(AssetID) values('1Yabcd')
insert into #AssetRepository_A(AssetID) values('2YabcNd')
insert into #AssetRepository_B(AssetID) values('2YabcNd')
insert into #AssetRepository_C(AssetID) values('2YabcNd')
insert into #AssetRepository_A(AssetID) values('3YabdNc')
insert into #AssetRepository_B(AssetID) values('3YabdNc')
insert into #AssetRepository_D(AssetID) values('3YabdNc')
insert into #AssetRepository_A(AssetID) values('4YacdNb')
insert into #AssetRepository_C(AssetID) values('4YacdNb')
insert into #AssetRepository_D(AssetID) values('4YacdNb')
insert into #AssetRepository_B(AssetID) values('5YbcdNa')
insert into #AssetRepository_C(AssetID) values('5YbcdNa')
insert into #AssetRepository_D(AssetID) values('5YbcdNa')
insert into #AssetRepository_A(AssetID) values('6YabNcd')
insert into #AssetRepository_B(AssetID) values('6YabNcd')
insert into #AssetRepository_A(AssetID) values('7YadNbc')
insert into #AssetRepository_D(AssetID) values('7YadNbc')
insert into #AssetRepository_C(AssetID) values('8YcdNab')
insert into #AssetRepository_D(AssetID) values('8YcdNab')
insert into #AssetRepository_A(AssetID) values('9YacNbd')
insert into #AssetRepository_C(AssetID) values('9Yacnbd')
insert into #AssetRepository_B(AssetID) values('10YbcNad')
insert into #AssetRepository_C(AssetID) values('10YbcNad')
insert into #AssetRepository_B(AssetID) values('11YbdNac')
insert into #AssetRepository_D(AssetID) values('11YbdNac')
insert into #AssetRepository_A(AssetID) values('12YaNbcd')
insert into #AssetRepository_D(AssetID) values('13YdNabc')
insert into #AssetRepository_C(AssetID) values('14YcNabd')
insert into #AssetRepository_B(AssetID) values('15YbNacd')
SQL to Populate the 4up Table
Populate the 4up table using left [outer] joins. We can benefit from the combination arithmetic and detailed list of all possible combinations to ensure the appropriate table as the left most table for any join. Very important: Inclusion of where
clause for null
. Specifically, use A as the left most table producing accurate rows for 8 of the 15 combinations; B for 4 of the 15; C for 2 of the 15; and D for 1 of the 15, covering all 15 combinations.
insert into #4up
select A.AssetID
,B.AssetID
,C.AssetID
,D.AssetID
from
#AssetRepository_A A
left join #AssetRepository_B B on A.AssetID = B.AssetID
left join #AssetRepository_C C on A.AssetID = C.AssetID
left join #AssetRepository_D D on A.AssetID = D.AssetID
insert into #4up
select A.AssetID
,B.AssetID
,C.AssetID
,D.AssetID
from
#AssetRepository_B B
left join #AssetRepository_A A on B.AssetID = A.AssetID
left join #AssetRepository_C C on B.AssetID = C.AssetID
left join #AssetRepository_D D on B.AssetID = D.AssetID
where A.AssetID is null
insert into #4up
select A.AssetID
,B.AssetID
,C.AssetID
,D.AssetID
from
#AssetRepository_C C
left join #AssetRepository_A A on C.AssetID = A.AssetID
left join #AssetRepository_B B on C.AssetID = B.AssetID
left join #AssetRepository_D D on C.AssetID = D.AssetID
where A.AssetID is null
and
B.AssetID is null
insert into #4up
select A.AssetID
,B.AssetID
,C.AssetID
,D.AssetID
from
#AssetRepository_D D
left join #AssetRepository_A A on D.AssetID = A.AssetID
left join #AssetRepository_B B on D.AssetID = B.AssetID
left join #AssetRepository_C C on D.AssetID = C.AssetID
where A.AssetID is null
and
B.AssetID is null
and
C.AssetID is null
SQL to Display the 4up Table
The test data I generated is specifically meant to clearly show all the possible combinations. This SQL delineates this data as ‘Combination Set 1’..’Combination Set 4’.
select * from #4up
select 'Combination Set 1' 'Set Name',* from #4up
where [AssetRepository_A AssetID] is not null
and
[AssetRepository_B AssetID] is not null
and
[AssetRepository_C AssetID] is not null
and
[AssetRepository_D AssetID] is not null
select 'Combination Set 2' 'Set Name', * from #4up
where ([AssetRepository_A AssetID] = [AssetRepository_B AssetID]
and
[AssetRepository_A AssetID] = [AssetRepository_C AssetID]
and
[AssetRepository_D AssetID] is null)
or
([AssetRepository_A AssetID] = [AssetRepository_B AssetID]
and
[AssetRepository_A AssetID] = [AssetRepository_D AssetID]
and
[AssetRepository_C AssetID] is null)
or
([AssetRepository_A AssetID] = [AssetRepository_C AssetID]
and
[AssetRepository_A AssetID] = [AssetRepository_D AssetID]
and
[AssetRepository_B AssetID] is null)
or
([AssetRepository_B AssetID] = [AssetRepository_C AssetID]
and
[AssetRepository_B AssetID] = [AssetRepository_D AssetID]
and
[AssetRepository_A AssetID] is null)
select 'Combination Set 3' 'Set Name', * from #4up
where ([AssetRepository_A AssetID] = [AssetRepository_B AssetID]
and
[AssetRepository_C AssetID] is null
and
[AssetRepository_D AssetID] is null)
or
([AssetRepository_A AssetID] = [AssetRepository_D AssetID]
and
[AssetRepository_B AssetID] is null
and
[AssetRepository_C AssetID] is null)
or
([AssetRepository_C AssetID] = [AssetRepository_D AssetID]
and
[AssetRepository_A AssetID] is null
and
[AssetRepository_B AssetID] is null)
or
([AssetRepository_A AssetID] = [AssetRepository_C AssetID]
and
[AssetRepository_B AssetID] is null
and
[AssetRepository_D AssetID] is null)
or
([AssetRepository_B AssetID] = [AssetRepository_C AssetID]
and
[AssetRepository_A AssetID] is null
and
[AssetRepository_D AssetID] is null)
or
([AssetRepository_B AssetID] = [AssetRepository_D AssetID]
and
[AssetRepository_A AssetID] is null
and
[AssetRepository_C AssetID] is null)
select 'Combination Set 4' 'Set Name', * from #4up
where ([AssetRepository_A AssetID] is not null
and
[AssetRepository_B AssetID] is null
and
[AssetRepository_C AssetID] is null
and
[AssetRepository_D AssetID] is null)
or
([AssetRepository_D AssetID] is not null
and
[AssetRepository_A AssetID] is null
and
[AssetRepository_B AssetID] is null
and
[assetRepository_C AssetID] is null)
or
([AssetRepository_C AssetID] is not null
and
[AssetRepository_A AssetID] is null
and
[AssetRepository_B AssetID] is null
and
[assetRepository_D AssetID] is null)
or
([AssetRepository_B AssetID] is not null
and
[AssetRepository_A AssetID] is null
and
[AssetRepository_C AssetID] is null
and
[assetRepository_D AssetID] is null)
SQL to Generate and Display 5,000 Random Asset Rows
Please note that randomly generated AssetIDs range 111 111 111 .. 999 999 999. My multiple tests have shown that for 5,000 rows, you will generally receive 5,000 rows, For a greater value of rows, you may receive less due to the explicit SQL code NOT allowing duplicate AssetIDs.
The construct for generating values within the aforementioned range, including the end points, is cast((@omega - @alpha + 1) * RAND() + @alpha as integer)
, where @alpha = 111111111
, @omega = 999999999
. This can be used in a generalized way with @alpha
specifying the lowest value of the range and @omega
specifying the highest value of the range.
Please note that the SQL to populate the 4up table is the same as before for the minimal test data.
truncate table #AssetRepository_A
truncate table #AssetRepository_B
truncate table #AssetRepository_C
truncate table #AssetRepository_D
truncate table #4up
declare @lcv int = 5000
declare @v int
declare @AssetSerialNumber int
declare @alpha int = 111111111
declare @omega int = 999999999
while (@lcv > 0)
begin
set @AssetSerialNumber = cast((@omega - @alpha + 1) * RAND() + @alpha as integer)
set @v = cast(15 * RAND() + 1 as integer)
if @v = 1
begin
if not exists (select * from #AssetRepository_A where AssetID = @AssetSerialNumber)
insert into #AssetRepository_A(AssetID) values(@AssetSerialNumber)
if not exists (select * from #AssetRepository_B where AssetID = @AssetSerialNumber)
insert into #AssetRepository_B(AssetID) values(@AssetSerialNumber)
if not exists (select * from #AssetRepository_C where AssetID = @AssetSerialNumber)
insert into #AssetRepository_C(AssetID) values(@AssetSerialNumber)
if not exists (select * from #AssetRepository_D where AssetID = @AssetSerialNumber)
insert into #AssetRepository_D(AssetID) values(@AssetSerialNumber)
end
else if @v = 2
begin
if not exists (select * from #AssetRepository_A where AssetID = @AssetSerialNumber)
insert into #AssetRepository_A(AssetID) values(@AssetSerialNumber)
if not exists (select * from #AssetRepository_B where AssetID = @AssetSerialNumber)
insert into #AssetRepository_B(AssetID) values(@AssetSerialNumber)
if not exists (select * from #AssetRepository_C where AssetID = @AssetSerialNumber)
insert into #AssetRepository_C(AssetID) values(@AssetSerialNumber)
end
else if @v = 3
begin
if not exists (select * from #AssetRepository_A where AssetID = @AssetSerialNumber)
insert into #AssetRepository_A(AssetID) values(@AssetSerialNumber)
if not exists (select * from #AssetRepository_B where AssetID = @AssetSerialNumber)
insert into #AssetRepository_B(AssetID) values(@AssetSerialNumber)
if not exists (select * from #AssetRepository_D where AssetID = @AssetSerialNumber)
insert into #AssetRepository_D(AssetID) values(@AssetSerialNumber)
end
else if @v = 4
begin
if not exists (select * from #AssetRepository_A where AssetID = @AssetSerialNumber)
insert into #AssetRepository_A(AssetID) values(@AssetSerialNumber)
if not exists (select * from #AssetRepository_C where AssetID = @AssetSerialNumber)
insert into #AssetRepository_C(AssetID) values(@AssetSerialNumber)
if not exists (select * from #AssetRepository_D where AssetID = @AssetSerialNumber)
insert into #AssetRepository_D(AssetID) values(@AssetSerialNumber)
end
else if @v = 5
begin
if not exists (select * from #AssetRepository_B where AssetID = @AssetSerialNumber)
insert into #AssetRepository_B(AssetID) values(@AssetSerialNumber)
if not exists (select * from #AssetRepository_C where AssetID = @AssetSerialNumber)
insert into #AssetRepository_C(AssetID) values(@AssetSerialNumber)
if not exists (select * from #AssetRepository_D where AssetID = @AssetSerialNumber)
insert into #AssetRepository_D(AssetID) values(@AssetSerialNumber)
end
else if @v = 6
begin
if not exists (select * from #AssetRepository_A where AssetID = @AssetSerialNumber)
insert into #AssetRepository_A(AssetID) values(@AssetSerialNumber)
if not exists (select * from #AssetRepository_B where AssetID = @AssetSerialNumber)
insert into #AssetRepository_B(AssetID) values(@AssetSerialNumber)
end
else if @v = 7
begin
if not exists (select * from #AssetRepository_A where AssetID = @AssetSerialNumber)
insert into #AssetRepository_A(AssetID) values(@AssetSerialNumber)
if not exists (select * from #AssetRepository_D where AssetID = @AssetSerialNumber)
insert into #AssetRepository_D(AssetID) values(@AssetSerialNumber)
end
else if @v = 8
begin
if not exists (select * from #AssetRepository_C where AssetID = @AssetSerialNumber)
insert into #AssetRepository_C(AssetID) values(@AssetSerialNumber)
if not exists (select * from #AssetRepository_D where AssetID = @AssetSerialNumber)
insert into #AssetRepository_D(AssetID) values(@AssetSerialNumber)
end
else if @v = 9
begin
if not exists (select * from #AssetRepository_A where AssetID = @AssetSerialNumber)
insert into #AssetRepository_A(AssetID) values(@AssetSerialNumber)
if not exists (select * from #AssetRepository_C where AssetID = @AssetSerialNumber)
insert into #AssetRepository_C(AssetID) values(@AssetSerialNumber)
end
else if @v = 10
begin
if not exists (select * from #AssetRepository_B where AssetID = @AssetSerialNumber)
insert into #AssetRepository_B(AssetID) values(@AssetSerialNumber)
if not exists (select * from #AssetRepository_C where AssetID = @AssetSerialNumber)
insert into #AssetRepository_C(AssetID) values(@AssetSerialNumber)
end
else if @v = 11
begin
if not exists (select * from #AssetRepository_B where AssetID = @AssetSerialNumber)
insert into #AssetRepository_B(AssetID) values(@AssetSerialNumber)
if not exists (select * from #AssetRepository_D where AssetID = @AssetSerialNumber)
insert into #AssetRepository_D(AssetID) values(@AssetSerialNumber)
end
else if @v = 12
begin
if not exists (select * from #AssetRepository_A where AssetID = @AssetSerialNumber)
insert into #AssetRepository_A(AssetID) values(@AssetSerialNumber)
end
else if @v = 13
begin
if not exists (select * from #AssetRepository_D where AssetID = @AssetSerialNumber)
insert into #AssetRepository_D(AssetID) values(@AssetSerialNumber)
end
else if @v = 14
begin
if not exists (select * from #AssetRepository_C where AssetID = @AssetSerialNumber)
insert into #AssetRepository_C(AssetID) values(@AssetSerialNumber)
end
else if @v = 15
begin
if not exists (select * from #AssetRepository_B where AssetID = @AssetSerialNumber)
insert into #AssetRepository_B(AssetID) values(@AssetSerialNumber)
end
set @lcv = @lcv - 1
end
insert into #4up
select A.AssetID
,B.AssetID
,C.AssetID
,D.AssetID
from
#AssetRepository_A A
left join #AssetRepository_B B on A.AssetID = B.AssetID
left join #AssetRepository_C C on A.AssetID = C.AssetID
left join #AssetRepository_D D on A.AssetID = D.AssetID
insert into #4up
select A.AssetID
,B.AssetID
,C.AssetID
,D.AssetID
from
#AssetRepository_B B
left join #AssetRepository_A A on B.AssetID = A.AssetID
left join #AssetRepository_C C on B.AssetID = C.AssetID
left join #AssetRepository_D D on B.AssetID = D.AssetID
where A.AssetID is null
insert into #4up
select A.AssetID
,B.AssetID
,C.AssetID
,D.AssetID
from
#AssetRepository_C C
left join #AssetRepository_A A on C.AssetID = A.AssetID
left join #AssetRepository_B B on C.AssetID = B.AssetID
left join #AssetRepository_D D on C.AssetID = D.AssetID
where A.AssetID is null
and
B.AssetID is null
insert into #4up
select A.AssetID
,B.AssetID
,C.AssetID
,D.AssetID
from
#AssetRepository_D D
left join #AssetRepository_A A on D.AssetID = A.AssetID
left join #AssetRepository_B B on D.AssetID = B.AssetID
left join #AssetRepository_C C on D.AssetID = C.AssetID
where A.AssetID is null
and
B.AssetID is null
and
C.AssetID is null
select * from #4up
Rudimentary Heat Map using Excel
The provided file, HeatMap.xlsx, shows a rudimentary ‘Heat Map’ for both minimal test data and for 5,000 randomly generated AssetIDs. Tools, including Microsoft Excel, Microsoft PowerBI, Tableau, MicroStrategy, SAS, SPSS, R, and others, could be used for more expansive analysis of such data.
The Excel worksheets apply conditional formatting to the 4up sets of data. Please note the ‘sorted’ tabs.
Points of Interest
This article recommends:
- Applying combination formula to ascertain the correct number of combinations to consider for design and execution
- Utilizing left [outer] joins in SQL to ensure correct combinations
- Use the SQL
RAND
function to generate random test data
History
- 11th September, 2018: Original article