In this article, I’ll show you how to use STRING_SPLIT() to join two tables.
In this article, I’ll show you how to use STRING_SPLIT()
to join two tables. I call this the impossible join.
I’ve come across this situation several times and always when dealing with data imported from two different sources. In most cases, it has to do with users inputting data inconsistently.
Let me show you.
Watch the Video
Don’t forget to check out the video. I go over the examples and provide more background for you to better understand it all!
The Impossible Join
In this example, I have two tables, Project
and ProjectStatus
. I would like to join ProjectStatus
to Project
to display the Name
, ProjectStatus
, and AccountNumber
in the same result.
If you look at ProjectStatus
, ProjectName
is a coder’s dream. The codes look great. The Project
table, on the other hand, is a hot mess. The codes are buried within the column value.
There is no way to do a direct join from one to the other. In fact, only Project 9108
successfully joins.
The Impossible Join Approach
The way we make the impossible join work is by splitting the ProjectName
into pieces. This way, we don’t have to join on “F1432 plastic
” but can look to join on the parts “F1432
” or “plastic
.”
As you can see from the example, “F1432
” is part of the good ProjectName
and will readily join between the two tables.
The main steps are:
- Split the hot mess column into pieces.
- Join into this separate piece to make a match.
- If a match is made, then join the two tables together.
So now that we have an idea how this is happening, let’s make it happen. To do so, we’ll first learn about STRING_SPLIT()
.
How STRING_SPLIT() Works
STRING_SPLIT()
is used to break up a column into separate parts. It is a special type of function which returns a table as its result. It is similar to a Table Valued Function.
You can learn more about STRING_SPLIT()
in this article I wrote, but the general idea is it takes the column you want to split, and the character, such as space that separates the values (aka words).
So STRING_SPLIT(‘F1432 plastic’, ‘ ‘)
returns a table with two rows:
In our example, let's look at what happens when you use string split in a query against Project
:
Here is the tokenized table with account numbers:
select value ProjectName, AccountNumber
from Project
cross apply string_split(ProjectName, ' ')
There are a couple of things to notice here. First, since STRING_SPLIT()
returns a table, we can use it in our query. You’ll see I’m using CROSS APPLY
to join the STRING_SPLIT()
result to the Project
. I do so I can return a result showing each ProjectName
value alongside the AccountNumber
.
We now use this result in a final join to the ProjectStatus
table to obtain our final result.
Using STRING_SPLIT() with INNER JOIN
At this point, that hard part is over. We have an intermediate result which includes the project “codes” as well as account numbers.
That last thing to do is relate this to the Project
table.
I pull this all together in the following example:
Select s.ProjectName, s.ProjectStatus, p.AccountNumber
from ProjectStatus s
left join (
select value ProjectName, AccountNumber
from Project cross apply string_split(ProjectName, ' ')
) p on s.ProjectName = p.ProjectName
Conclusion
IF you find yourself needed to join to a “code” embedded within another string, then you may find that STRING_SPLIT()
allows you to get to the “code” to make the join successful.