Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server / SQL-Server-2012

Nested XML from SQL to JSON

4.81/5 (11 votes)
21 Oct 2014CPOL4 min read 55K  
Getting SQL Server XML nested data as a JSON string

Introduction

This article describes how to output data from SQL Server as a JSON string. It follows the path laid out by Phil Factor at https://www.simple-talk.com/sql/t-sql-programming/producing-json-documents-from-sql-server-queries-via-tsql but going for a different (and I believe simpler) approach.

Background

In SQL Server conversion from SQL to XML is straight forward, as is directly supported by SQL Server. However going to JSON is a different problem, as there is no direct JSON support, the only option left is to produce a NVARCHAR with the desired data.

Getting a simple (without nested object) JSON object from a XML node is quite easy, just throw in the Xml Data Type methods: local-name(), text(), do some type conversions and escaping, concatenate everything into a NVARCHAR and you are done. However for nested objects it gets a little trickier, as the issue of when to stop concatenating becomes determinant.

When to stop concatenating depends on where you started concatenating. If you started on the root node then you should stop when you get into a leaf node. Alternatively if you started on the leaf nodes, then you must stop when you get to the root node.

Phil’s proposal follows this later direction, it parses the XML, builds a hierarchy table, identifies leaf nodes, converts them into JSON strings, then their parents, then the parents or their parents,… and keep on until you get to the root node. This is a generic solution using a bottom-up approach. In real world situations, probably the XML parsing can be omitted as the hierarchy table can be populated more efficiently by using SQL or XPath queries.

But what about a top-down approach? The starting node (ie: root) is initially known, the leaf nodes are unknown, but identifying them it is quite simple: they are the ones without any children..

Generating our XML

We know that we need some XML data, with some levels of nesting. As we are in SQL world, we might as well generate from some tables. Let’s create a sample database for a Weather service. Our service has data collecting stations. Each station collect monthly temperature and rain stats, each stats is reviewed by one or more operators. To make it more realistic, let's add some nulls but having some stats unreviewed.

SQL
SET NOCOUNT ON

IF OBJECT_ID('STATS') IS NOT NULL DROP TABLE STATS
IF OBJECT_ID('STATIONS') IS NOT NULL DROP TABLE STATIONS
IF OBJECT_ID('OPERATORS') IS NOT NULL DROP TABLE OPERATORS
IF OBJECT_ID('REVIEWS') IS NOT NULL DROP TABLE REVIEWS

-- Create and populate table with Station
CREATE TABLE STATIONS(ID INTEGER PRIMARY KEY, CITY NVARCHAR(20), STATE CHAR(2), LAT_N REAL, LONG_W REAL);
INSERT INTO STATIONS VALUES (13, 'Phoenix', 'AZ', 33, 112);
INSERT INTO STATIONS VALUES (44, 'Denver', 'CO', 40, 105);
INSERT INTO STATIONS VALUES (66, 'Caribou', 'ME', 47, 68);

-- Create and populate table with Operators
CREATE TABLE OPERATORS(ID INTEGER PRIMARY KEY, NAME NVARCHAR(20), SURNAME NVARCHAR(20));
INSERT INTO  OPERATORS VALUES (50, 'John "The Fox"', 'Brown');
INSERT INTO  OPERATORS VALUES (51, 'Paul', 'Smith');
INSERT INTO  OPERATORS VALUES (52, 'Michael', 'Williams'); 

-- Create and populate table with normalized temperature and precipitation data
CREATE TABLE STATS (
        STATION_ID INTEGER REFERENCES STATIONS(ID),
        MONTH INTEGER CHECK (MONTH BETWEEN 1 AND 12),
        TEMP_F REAL CHECK (TEMP_F BETWEEN -80 AND 150),
        RAIN_I REAL CHECK (RAIN_I BETWEEN 0 AND 100), PRIMARY KEY (STATION_ID, MONTH));
INSERT INTO STATS VALUES (13,  1, 57.4, 0.31);
INSERT INTO STATS VALUES (13,  7, 91.7, 5.15);
INSERT INTO STATS VALUES (44,  1, 27.3, 0.18);
INSERT INTO STATS VALUES (44,  7, 74.8, 2.11);
INSERT INTO STATS VALUES (66,  1, 6.7, 2.10);
INSERT INTO STATS VALUES (66,  7, 65.8, 4.52);

-- Create and populate table with Review
CREATE TABLE REVIEWS(STATION_ID     INTEGER,STAT_MONTH  INTEGER,OPERATOR_ID INTEGER)  
insert into REVIEWS VALUES (13,1,50)
insert into REVIEWS VALUES (13,7,50)
insert into REVIEWS VALUES (44,7,51)
insert into REVIEWS VALUES (44,7,52)
insert into REVIEWS VALUES (44,7,50)
insert into REVIEWS VALUES (66,1,51)
insert into REVIEWS VALUES (66,7,51)

Let’s get acquainted with our data:

SQL
select     STATIONS.ID       as ID,
           STATIONS.CITY     as City,
           STATIONS.STATE    as State,
           STATIONS.LAT_N    as LatN,
           STATIONS.LONG_W   as LongW,
           STATS.MONTH       as Month,
           STATS.RAIN_I      as Rain,
           STATS.TEMP_F      as Temp,
       OPERATORS.NAME    as Name,
       OPERATORS.SURNAME as Surname
from       stations 
inner join stats     on stats.STATION_ID=STATIONS.ID 
left join  reviews   on reviews.STATION_ID=stations.id 
                     and reviews.STAT_MONTH=STATS.[MONTH]
left join  OPERATORS on OPERATORS.ID=reviews.OPERATOR_ID

ID  City     State LatN  LongW  Month Rain  Temp  Name            Surname
--------------------------------------------------------------------------
13  Phoenix  AZ    33    112    1     0.31  57.4  John "The Fox"  Brown
13  Phoenix  AZ    33    112    7     5.15  91.7  John "The Fox"  Brown
44  Denver   CO    40    105    1     0.18  27.3  {null}          {null}
44  Denver   CO    40    105    7     2.11  74.8  Paul            Smith
44  Denver   CO    40    105    7     2.11  74.8  Michael         Williams
44  Denver   CO    40    105    7     2.11  74.8  John "The Fox"  Brown
66  Caribou  ME    47    68     1     2.1   6.7   Paul            Smith
66  Caribou  ME    47    68     7     4.52  65.8  Paul            Smith


select stations.*,
       (select stats.*, 
               (select OPERATORS.*  
                from   OPERATORS 
                inner  join reviews on OPERATORS.ID=reviews.OPERATOR_ID 
                where  reviews.STATION_ID=STATS.STATION_ID 
                and    reviews.STAT_MONTH=STATS.MONTH 
                for xml path('operator'),type
               ) operators
        from  STATS 
        where STATS.STATION_ID=stations.ID 
        for xml path('stat'),type
       ) stats 
from   stations 
for    xml path('station'),type


<station>
  <ID>13</ID>
  <CITY>Phoenix</CITY>
  <STATE>AZ</STATE>
  <LAT_N>3.3000000e+001</LAT_N>
  <LONG_W>1.1200000e+002</LONG_W>
  <stats>
    <stat>
      <STATION_ID>13</STATION_ID>
      <MONTH>1</MONTH>
      <TEMP_F>5.7400002e+001</TEMP_F>
      <RAIN_I>3.1000000e-001</RAIN_I>
      <operators>
        <operator>
          <ID>50</ID>
          <NAME>John "The Fox"</NAME>
          <SURNAME>Brown</SURNAME>
        </operator>
      </operators>
    </stat>
    <stat>
      <STATION_ID>13</STATION_ID>
      <MONTH>7</MONTH>
      <TEMP_F>9.1699997e+001</TEMP_F>
         ...
         ...
</station>

Converting to JSON

For converting a XML node to JSON we will use the Xml Data Type methods:

  • To get the nodes of the XML, we will select the result of .nodes(Xpath_Expression), this function returns each node that matches the given Xpath_Expression as a row with a single column of type XML. These XML-rows will be "selected" and, after extracting their JSON-name and JSON-values, will make up the properties for every JSON entity.
  • To get from a XML-row a JSON property name, we will get the local name of the node as a NVARCHAR using .value('local-name(.)', 'NVARCHAR(255)').
  • To get from a XML-row a JSON value, similarly we will use .value('text()[1]','NVARCHAR(MAX)') this function returns the XML text as a NVARCHAR., which will be later formated.

And here come the new bits:

  • We can test if a node is a leaf node by testing if a node has ZERO children .value('count(*)','int')=0. If a node has no children then its contents is a simple JSON value, which can be either a string or a number (as the null value is not present in the XML). In terms of JSON presentation, the only difference is quote enclosing and character escaping. This presentation is handled by dbo.qfn_JsonEscape.
  • If a XML node has one or more children, then we get its children as a XML subtree by executing .query('*'), and then we can use the XML node to JSON function again.

That is, we are introducing recursion to follow the XML hierarchy. Each call to the function will, either:

  • Handle a childless XML subtree, which by simply using the Xml Data Type methods, and simple string concatenation can be converted into a JSON string. The procedure to do this is quite straight forward, just concatenate:
  1. The {,} symbol
  2. The {"} symbol
  3. The node local-name() or empty string if null (notice the coalesce function)
  4. The {"} symbol again
  5. The {;} symbol
  6. The result of applying dbo.qfn_JsonEscape to the node value
  • Or handle a XML subtree, which will concatenate the array chars ([ ]) and call the function using the subtree as the new argument.

Putting everything together:

SQL
CREATE FUNCTION [dbo].[qfn_XmlToJson](@XmlData xml)
RETURNS nvarchar(max)
AS
BEGIN
  declare @m nvarchar(max)
  SELECT @m='['+Stuff
  (
     (SELECT theline from
    (SELECT ','+' {'+Stuff
       (
              (SELECT ',"'+coalesce(b.c.value('local-name(.)', 'NVARCHAR(255)'),'')+'":'+
                      case when b.c.value('count(*)','int')=0 
                      then dbo.[qfn_JsonEscape](b.c.value('text()[1]','NVARCHAR(MAX)'))
                      else dbo.qfn_XmlToJson(b.c.query('*'))
                      end
                 from x.a.nodes('*') b(c)                                                                
                 for xml path(''),TYPE).value('(./text())[1]','NVARCHAR(MAX)')
               ,1,1,'')+'}'
          from @XmlData.nodes('/*') x(a)
       ) JSON(theLine)
       for xml path(''),TYPE).value('.','NVARCHAR(MAX)')
      ,1,1,'')+']'
   return @m
END

The final touch is once a whole set has been concatenated and processed the SQL stuff function deletes the first comma (,) symbol in an array.

The helper function dbo.[qfn_JsonEscape] just:

  1. Treats any non-numeric value as a string, surrounding it with double quotes, and JSON escaping control chars.

  2. Returns any numeric value received (without quoting)

SQL
CREATE FUNCTION [dbo].[qfn_JsonEscape](@value nvarchar(max) )
returns nvarchar(max)
as begin
 
 if (@value is null) return 'null'
 if (TRY_PARSE( @value as float) is not null) return @value

 set @value=replace(@value,'\','\\')
 set @value=replace(@value,'"','\"')

 return '"'+@value+'"'
end

Testing

We can test the result by executing:

SQL
select dbo.qfn_XmlToJson
(
  (
    select stations.ID,stations.CITY,stations.STATE,stations.LAT_N,stations.LONG_W ,
          (select stats.*, 
                   (select OPERATORS.*  
                    from   OPERATORS inner join reviews 
                    on     OPERATORS.ID=reviews.OPERATOR_ID
                    where  reviews.STATION_ID=STATS.STATION_ID 
                    and    reviews.STAT_MONTH=STATS.MONTH 
                    for xml path('operator'),type
                   ) operators
           from  STATS 
           where STATS.STATION_ID=stations.ID for xml path('stat'),type
          ) stats 
     from stations for xml path('stations'),type
   )
)

Which returns:

[ {"ID":13,"CITY":"Phoenix","STATE":"AZ","LAT_N":3.3000000e+001,"LONG_W":1.1200000e+002,"stats":[ {"STATION_ID":13,"MONTH":1,"TEMP_F":5.7400002e+001,"RAIN_I":3.1000000e-001,"operators":[ {"ID":50,"NAME":"John \"The Fox\"","SURNAME":"Brown"}]}, {"STATION_ID":13,"MONTH":7,"TEMP_F":9.1699997e+001,"RAIN_I":5.1500001e+000,"operators":[ {"ID":50,"NAME":"John \"The Fox\"","SURNAME":"Brown"}]}]}, {"ID":44,"CITY":"Denver","STATE":"CO","LAT_N":4.0000000e+001,"LONG_W":1.0500000e+002,"stats":[ {"STATION_ID":44,"MONTH":1,"TEMP_F":2.7299999e+001,"RAIN_I":1.8000001e-001}, {"STATION_ID":44,"MONTH":7,"TEMP_F":7.4800003e+001,"RAIN_I":2.1099999e+000,"operators":[ {"ID":51,"NAME":"Paul","SURNAME":"Smith"}, {"ID":52,"NAME":"Michael","SURNAME":"Williams"}, {"ID":50,"NAME":"John \"The Fox\"","SURNAME":"Brown"}]}]}, {"ID":66,"CITY":"Caribou","STATE":"ME","LAT_N":4.7000000e+001,"LONG_W":6.8000000e+001,"stats":[ {"STATION_ID":66,"MONTH":1,"TEMP_F":6.6999998e+000,"RAIN_I":2.0999999e+000,"operators":[ {"ID":51,"NAME":"Paul","SURNAME":"Smith"}]}, {"STATION_ID":66,"MONTH":7,"TEMP_F":6.5800003e+001,"RAIN_I":4.5200000e+000,"operators":[ {"ID":51,"NAME":"Paul","SURNAME":"Smith"}]}]}]

Which is the expected result (notice the John “The Fox” quote escaping). You can test it at http://sqlfiddle.com/#!6/77909/1

Conclusion

Using recursion in XML and JSON comes "naturally", although usually this is not the case in TSQL, it allows for straight and elegant solutions. As in any recursive-based solution there is a price to pay in terms of performance, so I do not advise using this technique for queries returning more than a thousand of records.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)