Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

How to do an Impossible Join with String Split

3.00/5 (3 votes)
4 Mar 2022CPOL3 min read 6.5K  
Use STRING_SPLIT() to join two tables
In this article, I’ll show you how to use STRING_SPLIT() to join two tables.

In this article, I’ll show you how to use STRING_SPLIT() to join two tables. I call this the impossible join.

I’ve come across this situation several times and always when dealing with data imported from two different sources. In most cases, it has to do with users inputting data inconsistently.

Let me show you.

Watch the Video

Don’t forget to check out the video. I go over the examples and provide more background for you to better understand it all!

The Impossible Join

In this example, I have two tables, Project and ProjectStatus. I would like to join ProjectStatus to Project to display the Name, ProjectStatus, and AccountNumber in the same result.

Image 1

If you look at ProjectStatus, ProjectName is a coder’s dream. The codes look great. The Project table, on the other hand, is a hot mess. The codes are buried within the column value.

There is no way to do a direct join from one to the other. In fact, only Project 9108 successfully joins.

Impossible Join Example

The Impossible Join Approach

The way we make the impossible join work is by splitting the ProjectName into pieces. This way, we don’t have to join on “F1432 plastic” but can look to join on the parts “F1432” or “plastic.”

As you can see from the example, “F1432” is part of the good ProjectName and will readily join between the two tables.

The main steps are:

  1. Split the hot mess column into pieces.
  2. Join into this separate piece to make a match.
  3. If a match is made, then join the two tables together.

So now that we have an idea how this is happening, let’s make it happen. To do so, we’ll first learn about STRING_SPLIT().

How STRING_SPLIT() Works

STRING_SPLIT() is used to break up a column into separate parts. It is a special type of function which returns a table as its result. It is similar to a Table Valued Function.

You can learn more about STRING_SPLIT() in this article I wrote, but the general idea is it takes the column you want to split, and the character, such as space that separates the values (aka words).

So STRING_SPLIT(‘F1432 plastic’, ‘ ‘) returns a table with two rows:

  • F1432
  • Plastic

In our example, let's look at what happens when you use string split in a query against Project:

Here is the tokenized table with account numbers:

SQL
select value ProjectName, AccountNumber
from Project
    cross apply string_split(ProjectName, ' ')

Image 3

There are a couple of things to notice here. First, since STRING_SPLIT() returns a table, we can use it in our query. You’ll see I’m using CROSS APPLY to join the STRING_SPLIT() result to the Project. I do so I can return a result showing each ProjectName value alongside the AccountNumber.

We now use this result in a final join to the ProjectStatus table to obtain our final result.

Using STRING_SPLIT() with INNER JOIN

At this point, that hard part is over. We have an intermediate result which includes the project “codes” as well as account numbers.

That last thing to do is relate this to the Project table.

I pull this all together in the following example:

SQL
Select s.ProjectName, s.ProjectStatus, p.AccountNumber
from ProjectStatus s
    left join (
        select value ProjectName, AccountNumber
        from Project cross apply string_split(ProjectName, ' ')
    ) p on s.ProjectName = p.ProjectName

Conclusion

IF you find yourself needed to join to a “code” embedded within another string, then you may find that STRING_SPLIT() allows you to get to the “code” to make the join successful.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)