Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

A Transact Sql version of C strtok() using a table valued function

5.00/5 (2 votes)
31 Aug 2014CPOL4 min read 12.8K  
Publishes a Transact-SQL function for parsing a string containing a delimited list, similar to the ANSI C languaje strtok() function.

Introduction

I should confess: I'm a C/C++ object-oriented guy that came to SQL Server late in my career. You dudes who had followed similar path knows what it means. Having met the standard template library, programming stored procedures is like returning to the times of Basic before Visual Basic, or Pascal before Turbo Pascal... But you know, it is not your decision to change an organization's culture. Just adapt and contribute with the best you can offer from your experience.

One of the first things annoyed me from Sql Server stored procedures was the total lack of support for using collections or list as parameters. I could only remember Perl... SELECT statement in principle returns a list so why I can not input a list too? Not to mention why a stored procedure can not capture the list generated by a called stored procedure...? I know in recent versions of Sql Server stored procedures support table type as parameters. I have not tested this feature yet. But for people like me who are stuck at Sql Serveer 2005 this article can result useful. Even if you are using a post-2005 sql server version you can happen on code handling such a list, or may be you like the quick and dirty approach of the lousely-typedness of a comma delimited list.

So I came to the common workaround of using a Varchar parameter to pass a comma-separated string of the stringized items.  The problem was to parse the list. First time dealing with the parsing came fast to my mind the useful strtok() function, from the good old C times. After stumbling for the second time with the problem I decided to write my own sql version of strtok()

Some ANSI C Background

strtok() is a very useful C function from the core string library. You pass as parameters the target string and the expected separator and you get the substring up to the first appareance of the separator. It is the so called "token".

C++
s = strtok("one,two,three", ',')

will return "one". The best part comes from the fact that subsequent calls to the function return the following tokens "two", "three", and finally null to signal there is no more tokens. This make strtok() suitable for being used inside a loop

C++
s = strtok("one,two,three", ',');
while (s != null)
{
    s = strtok(null, ',');
    // do something with the token
}

Note just only in the first call you specify the input string. In the subsequent calls you pass null to indicate to use the same string and look for the next token... It sounds a little weird. Actually strtok() modifies the input string and stores internally the state of the search sequence just to know where to start the search for the next token. I dare not to say it is a bad programming practice (Oh man, it is standard ANSI C library!). Just we will change the function signature for our sql version. Anyway we can not use local static variables nor pointer trickiness in our Transact SQL code.  

Our Transact SQL version

Then our trumpeted function will look like this

SQL
CREATE FUNCTION strtok(
     @sStringToParse    varchar(max)
    ,@sSeparator        char(1)
    ,@nPosition         int Output
)
RETURNS varchar(max)
It returns a substring of @sStringToParse from @nPosition up to the first appareance of @sSeparator or the end of string if no separator is found. If @nPosition is at the end or after the end of @sStringToParse it will return null. @sStringToParse  will not be modified so we can use a table column as argument without the fear of it being altered by a strtok() side effect. The @nPosition will return the position after the last separator found. For the beginning you just provide 1 as argument to start scanning from the start of string (remember in Sql string start from positio 1). All this looks very good BUT... Transact Sql functions does not accepts output parameters.
 
Ok. We can turn our function into a stored procedure like 
SQL
CREATE PROCEDURE strtok(
     @sStringToParse    varchar(max)
    ,@sSeparator        char(1)
    ,@sToken            varchar(max) Output
    ,@nPosition         int Output
)

But functions in Transact Sql are far more flexible than stored procedures. They have the benefit that they can be used in expressions, even as part of queries or other data manipulation operations. And remember you dudes, we are C/C++ guys just transcoding a C function, we can not convert it into a procedure, it sounds like a sin.

So lets explore another solution. Our problem is the following. We need to find a way to return two values from a Transact SQL function. Then why not to use a table valued function? It can returns one row with our two needed values

SQL
CREATE Function strtok (
     @sStringToParse    varchar(max)
    ,@sSeparator        char(1)
    ,@nStartPosition    int
)
Returns @tbResult Table (
     Token              Varchar(max)
    ,NextStartPosition  int
) 
As
...

Then we can use it in a loop like

SQL
Declare @sList Varchar(max)
Declare @sToken Varchar(max)
Declare @nPos Int

Set @sList = 'one, two, three'
Set @nPos = 1
Set @sToken = ''
While @sToken Is Not Null
Begin
   Select @sToken = Token,
       @nPos = NextStartPosition
   From strtok(@sList, ',', @nPos)

   -- do something with @sToken
End

Note, it is vital to recover NextStartingPosition from the strtok call to be it reinjected as the parameter @nStartPosition in the next call.

We can also parse a two-dimensional structure

SQL
Declare @sList Varchar(max)
Declare @sRowToken Varchar(max), @sCellToken Varchar(max)
Declare @nPosX Int, @nPosY Int

Set @sList = 'one, two, three; four, five, six; seven, eight, nine'

-- outer cycle
Set @sRowToken = ''
Set @nPosY = 1
While @sRowToken Is Not Null
Begin
    Select @sRowToken = Token,
        @nPosY = NextStartPosition
    From strtok(@sList, ';', @nPosY)

    Select @sRowToken
    If @sRowToken Is Null
        Break;

    -- inner cycle
    Set @sCellToken = ''
    Set @nPosX = 1
    While @sCellToken Is Not Null
    Begin
        Select @sCellToken = Token,
            @nPosX = NextStartPosition
        From strtok(@sRowToken, ',', @nPosX)

        -- do something with @sRowToken
        Select @sCellToken
    End
End

So it looks like a workable idea. Lets provide the full source code for the Transact-Sql strtok function:

SQL
CREATE Function strtok (
     @sStringToParse    Varchar(max)
    ,@sSeparator        char(1)
    ,@nStartPosition    int
)
Returns @tbResult Table (
     Token             Varchar(max)
    ,NextStartPosition int
)
AS
Begin

    Declare    
         @nNextPos          int
        ,@nLenStringToParse int
        ,@sToken            varchar(max) 

    Select @nLenStringToParse = Len(@sStringToParse)

    -- Special case: input string is null
    If @sStringToParse Is Null
    Begin
        Select
             @nNextPos  = Null
            ,@sToken    = Null
    End

    -- Special case: separador is null. Return full input string
    Else If @sSeparator Is Null
    Begin
        Select
             @nNextPos  = @nLenStringToParse + 1
            ,@sToken    = Substring(@sStringToParse, @nStartPosition, @nLenStringToParse)
    End

    -- Special case: input string is empty. 
    Else If @nLenStringToParse = 0
    Begin
        If @nStartPosition = 1
            Select
                 @nNextPos  = 2
                ,@sToken    = @sStringToParse
        Else
            Select
                 @nNextPos  = @nStartPosition + 1
                ,@sToken    = Null
    End

    -- Special case: input string ends with separator
    Else If @nStartPosition = @nLenStringToParse + 1
        And Substring(@sStringToParse, @nLenStringToParse, 1) = @sSeparator
    Begin
        Select
             @nNextPos  = @nStartPosition + 1
            ,@sToken    = ''
    End

    -- In any other case if we are at the end of the input string, return null signalling end...
    Else If @nStartPosition > @nLenStringToParse
    Begin
        Select
             @nStartPosition = Null
            ,@sToken         = Null
    End

    -- Normal cases...
    Else
    Begin
        Select @nNextPos = CharIndex(@sSeparator, @sStringToParse, @nStartPosition)

        If @nNextPos Is Null
        Begin
            Select
                 @sToken    = Null
                ,@nNextPos  = Null
        End

        -- Separator not found: return the full string
        Else If @nNextPos = 0
        Begin
            If @nStartPosition = 0
                Select @sToken = @sStringToParse
            Else
                Select @sToken = SUBSTRING( @sStringToParse, @nStartPosition, @nLenStringToParse )

            Select @nNextPos = @nLenStringToParse + 1
        End
        Else 
        Begin
            Select @sToken = SUBSTRING( @sStringToParse, @nStartPosition, @nNextPos - @nStartPosition )
            Select @nNextPos = @nNextPos + 1
        End
        
        Set @nStartPosition = @nNextPos
    End

    Insert Into @tbResult (NextStartPosition, Token) 
    Values (@nNextPos, @sToken)
    Return

End

Enjoy!

Conclussions

We have sucessfully created a Transact-SQL function for parsing a string containing a delimited list.

References

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)