Introduction
In some ways, this is what SQL should excel at doing, but for some reason it just isn’t supported. There’s always things like Full Text Search, but the overhead of maintaining a full text catalogue, and learning how to use the technology is just a bit too much sometimes and isn’t worth it for what you’re trying to achieve.
Imagine you are running an eCommerce website which sells guitar amplifiers and you need a ‘similar amplifiers’ feature that will suggest amplifiers similar to the amp the user is currently looking at based on a few key features of the amp.
Soundex
Just as a note, I’ve used the DIFFERENCE function in the snippets below. This is just a small library built into SQL Server that allows you to compare how words sound, for example “beech” and “beach” will have a match index of 4 (Best match), while “Road” and “Germany” will return 0 (Worst match). It’s worth looking up.
Simple
If you don’t care about having any knowledge of how ‘good’ of a match the suggestion is, then it’s quite easy. Be careful though, this will be unreliable and spammy but is good for a quick prototype.
INSERT INTO #tmpSelectedAmplifier
SELECT ModelName
, ManufacturerName
, Colour
, AmplifierType
, Wattage
, HasOnboardEffects
FROM Amplifier
WHERE AmplifierId = @ampId
SELECT *
FROM Amplifier amps
INNER JOIN #tmpSelectedAmplifier sAmp
ON (
DIFFERENCE(sAmp.ModelName, amps.ModelName) >= 3
OR sAmp.Colour = amps.Colour
OR sAmp.Wattage = amps.Wattage
OR sAmp.AmplifierType = amps.AmplifierType
OR sAmp.HasOnboardEffects = amps.HasOnboardEffects
)
This query is pretty nice and simple, it will match on any of the conditions specified in the INNER JOIN
. More often than not, this really isn’t the way you’d want to do this.
Note that I used a temp table and an inner join
instead of using a WHERE
clause. The only reason to do this is because it’s cleaner (At least in my opinion) because to use a WHERE
you’d have to start using variables for each field in the source record (in this case, the amplifier you’re viewing) which can just get messy and cumbersome pretty quickly.
Getting the Number of Matched Fields
The more common case is that you’ll want to have at least two or more fields to match, and will want to order by the amount of fields that do match.
This is still nice and simple, but it does definitely bloat the SQL up a bit. The general idea is just to create a temp table to hold an amplifier ID. This temp table will be inserted to with (for example) all the amps with the colour we are looking for, and then all the amps with the wattage we need and so on.. And then, once we count how many instances of each AmplifierId
appear in the temp table, we will have a rough idea of how good of a match each record is to the one the user is currently viewing.
DECLARE @ampMatches TABLE (AmplifierId UNIQUEIDENTIFIER)
INSERT INTO @ampMatches
SELECT AmplifierId
FROM Amplifier
WHERE DIFFERENCE(sAmp.ModelName, amps.ModelName) >= 3
INSERT INTO @ampMatches
SELECT AmplifierId
FROM Amplifier
WHERE sAmp.Colour = amps.Colour
INSERT INTO @ampMatches
SELECT AmplifierId
FROM Amplifier
WHERE sAmp.Wattage = amps.Wattage
INSERT INTO @ampMatches
SELECT AmplifierId
FROM Amplifier
WHERE sAmp.AmplifierType = amps.AmplifierType
INSERT INTO @ampMatches
SELECT AmplifierId
FROM Amplifier
WHERE sAmp.HasOnboardEffects = amps.HasOnboardEffects
;WITH ampsMatchCountCTE AS
(
SELECT AmplifierId
, COUNT(AmplifierId) AS MatchCount
FROM @ampMatches
GROUP BY AmplifierId
)
SELECT *
FROM Amplifier
INNER JOIN ampsMatchCountCTE ampsMatch
ON ampsMatch.AmplifierId = Amplifier.AmplifierId
WHERE ampsMatch.MatchCount >= 2
ORDER BY ampsMatch.MatchCount DESC
Just as a note, this query can be written using HAVING instead of the CTE, but you’d end up having to list all your non-aggregated fields in a GROUP BY
, which would be a bit of a nightmare..
Order of Precedence of Match Conditions
Just as a quick extension to the previous example, we can add weightings to certain columns so that we provide some extra criteria to either filter or order based on.
DECLARE @ampMatches TABLE (AmplifierId UNIQUEIDENTIFIER
, ColumnWeight TINYINT)
INSERT INTO @ampMatches
SELECT AmplifierId
, 10
FROM Amplifier
WHERE DIFFERENCE(sAmp.ModelName, amps.ModelName) >= 3
INSERT INTO @ampMatches
SELECT AmplifierId
, 5
FROM Amplifier
WHERE sAmp.Colour = amps.Colour
;WITH ampsMatchCountCTE AS
(
SELECT Amplifier
, COUNT(AmplifierId) AS MatchCount
, SUM(ColumnWeight) AS RowWeighting
FROM @ampMatches
GROUP BY AmplifierId
)
SELECT *
FROM Amplifier
INNER JOIN ampsMatchCountCTE ampsMatch
ON ampsMatch.AmplifierId = Amplifier.AmplifierId
WHERE ampsMatch.MatchCount >= 2
ORDER BY ampsMatch.RowWeighting DESC
Of course, in this example, the weightings are used purely for ordering results, and not for filtering the results set.
Links
CodeProject