Introduction
This article describes how to output data from SQL Server as a JSON string. It follows the path laid out by Phil Factor at https://www.simple-talk.com/sql/t-sql-programming/producing-json-documents-from-sql-server-queries-via-tsql but going for a different (and I believe simpler) approach.
Background
In SQL Server conversion from SQL to XML is straight forward, as is directly supported by SQL Server. However going to JSON is a different problem, as there is no direct JSON support, the only option left is to produce a NVARCHAR with the desired data.
Getting a simple (without nested object) JSON object from a XML node is quite easy, just throw in the Xml Data Type methods: local-name()
, text()
, do some type conversions and escaping, concatenate everything into a NVARCHAR
and you are done. However for nested objects it gets a little trickier, as the issue of when to stop concatenating becomes determinant.
When to stop concatenating depends on where you started concatenating. If you started on the root node then you should stop when you get into a leaf node. Alternatively if you started on the leaf nodes, then you must stop when you get to the root node.
Phil’s proposal follows this later direction, it parses the XML, builds a hierarchy table, identifies leaf nodes, converts them into JSON strings, then their parents, then the parents or their parents,… and keep on until you get to the root node. This is a generic solution using a bottom-up approach. In real world situations, probably the XML parsing can be omitted as the hierarchy table can be populated more efficiently by using SQL or XPath queries.
But what about a top-down approach? The starting node (ie: root) is initially known, the leaf nodes are unknown, but identifying them it is quite simple: they are the ones without any children..
Generating our XML
We know that we need some XML data, with some levels of nesting. As we are in SQL world, we might as well generate from some tables. Let’s create a sample database for a Weather service. Our service has data collecting stations. Each station collect monthly temperature and rain stats, each stats is reviewed by one or more operators. To make it more realistic, let's add some nulls but having some stats unreviewed.
SET NOCOUNT ON
IF OBJECT_ID('STATS') IS NOT NULL DROP TABLE STATS
IF OBJECT_ID('STATIONS') IS NOT NULL DROP TABLE STATIONS
IF OBJECT_ID('OPERATORS') IS NOT NULL DROP TABLE OPERATORS
IF OBJECT_ID('REVIEWS') IS NOT NULL DROP TABLE REVIEWS
CREATE TABLE STATIONS(ID INTEGER PRIMARY KEY, CITY NVARCHAR(20), STATE CHAR(2), LAT_N REAL, LONG_W REAL);
INSERT INTO STATIONS VALUES (13, 'Phoenix', 'AZ', 33, 112);
INSERT INTO STATIONS VALUES (44, 'Denver', 'CO', 40, 105);
INSERT INTO STATIONS VALUES (66, 'Caribou', 'ME', 47, 68);
CREATE TABLE OPERATORS(ID INTEGER PRIMARY KEY, NAME NVARCHAR(20), SURNAME NVARCHAR(20));
INSERT INTO OPERATORS VALUES (50, 'John "The Fox"', 'Brown');
INSERT INTO OPERATORS VALUES (51, 'Paul', 'Smith');
INSERT INTO OPERATORS VALUES (52, 'Michael', 'Williams');
CREATE TABLE STATS (
STATION_ID INTEGER REFERENCES STATIONS(ID),
MONTH INTEGER CHECK (MONTH BETWEEN 1 AND 12),
TEMP_F REAL CHECK (TEMP_F BETWEEN -80 AND 150),
RAIN_I REAL CHECK (RAIN_I BETWEEN 0 AND 100), PRIMARY KEY (STATION_ID, MONTH));
INSERT INTO STATS VALUES (13, 1, 57.4, 0.31);
INSERT INTO STATS VALUES (13, 7, 91.7, 5.15);
INSERT INTO STATS VALUES (44, 1, 27.3, 0.18);
INSERT INTO STATS VALUES (44, 7, 74.8, 2.11);
INSERT INTO STATS VALUES (66, 1, 6.7, 2.10);
INSERT INTO STATS VALUES (66, 7, 65.8, 4.52);
CREATE TABLE REVIEWS(STATION_ID INTEGER,STAT_MONTH INTEGER,OPERATOR_ID INTEGER)
insert into REVIEWS VALUES (13,1,50)
insert into REVIEWS VALUES (13,7,50)
insert into REVIEWS VALUES (44,7,51)
insert into REVIEWS VALUES (44,7,52)
insert into REVIEWS VALUES (44,7,50)
insert into REVIEWS VALUES (66,1,51)
insert into REVIEWS VALUES (66,7,51)
Let’s get acquainted with our data:
select STATIONS.ID as ID,
STATIONS.CITY as City,
STATIONS.STATE as State,
STATIONS.LAT_N as LatN,
STATIONS.LONG_W as LongW,
STATS.MONTH as Month,
STATS.RAIN_I as Rain,
STATS.TEMP_F as Temp,
OPERATORS.NAME as Name,
OPERATORS.SURNAME as Surname
from stations
inner join stats on stats.STATION_ID=STATIONS.ID
left join reviews on reviews.STATION_ID=stations.id
and reviews.STAT_MONTH=STATS.[MONTH]
left join OPERATORS on OPERATORS.ID=reviews.OPERATOR_ID
ID City State LatN LongW Month Rain Temp Name Surname
13 Phoenix AZ 33 112 1 0.31 57.4 John "The Fox" Brown
13 Phoenix AZ 33 112 7 5.15 91.7 John "The Fox" Brown
44 Denver CO 40 105 1 0.18 27.3 {null} {null}
44 Denver CO 40 105 7 2.11 74.8 Paul Smith
44 Denver CO 40 105 7 2.11 74.8 Michael Williams
44 Denver CO 40 105 7 2.11 74.8 John "The Fox" Brown
66 Caribou ME 47 68 1 2.1 6.7 Paul Smith
66 Caribou ME 47 68 7 4.52 65.8 Paul Smith
select stations.*,
(select stats.*,
(select OPERATORS.*
from OPERATORS
inner join reviews on OPERATORS.ID=reviews.OPERATOR_ID
where reviews.STATION_ID=STATS.STATION_ID
and reviews.STAT_MONTH=STATS.MONTH
for xml path('operator'),type
) operators
from STATS
where STATS.STATION_ID=stations.ID
for xml path('stat'),type
) stats
from stations
for xml path('station'),type
<station>
<ID>13</ID>
<CITY>Phoenix</CITY>
<STATE>AZ</STATE>
<LAT_N>3.3000000e+001</LAT_N>
<LONG_W>1.1200000e+002</LONG_W>
<stats>
<stat>
<STATION_ID>13</STATION_ID>
<MONTH>1</MONTH>
<TEMP_F>5.7400002e+001</TEMP_F>
<RAIN_I>3.1000000e-001</RAIN_I>
<operators>
<operator>
<ID>50</ID>
<NAME>John "The Fox"</NAME>
<SURNAME>Brown</SURNAME>
</operator>
</operators>
</stat>
<stat>
<STATION_ID>13</STATION_ID>
<MONTH>7</MONTH>
<TEMP_F>9.1699997e+001</TEMP_F>
...
...
</station>
Converting to JSON
For converting a XML node to JSON we will use the Xml Data Type methods:
- To get the nodes of the XML, we will select the result of
.nodes(Xpath_Expression
), this function returns each node that matches the given Xpath_Expression as a row with a single column of type XML. These XML-rows will be "selected" and, after extracting their JSON-name and JSON-values, will make up the properties for every JSON entity.
- To get from a XML-row a JSON property name, we will get the local name of the node as a
NVARCHAR
using .value('local-name(.)', 'NVARCHAR(255)')
.
- To get from a XML-row a JSON value, similarly we will use
.value('text()[1]','NVARCHAR(MAX)')
this function returns the XML text as a NVARCHAR., which will be later formated.
And here come the new bits:
- We can test if a node is a leaf node by testing if a node has ZERO children
.value('count(*)','int')=0
. If a node has no children then its contents is a simple JSON value, which can be either a string or a number (as the null value is not present in the XML). In terms of JSON presentation, the only difference is quote enclosing and character escaping. This presentation is handled by dbo.qfn_JsonEscape
.
- If a XML node has one or more children, then we get its children as a XML subtree by executing
.query('*')
, and then we can use the XML node to JSON function again.
That is, we are introducing recursion to follow the XML hierarchy. Each call to the function will, either:
- Handle a childless XML subtree, which by simply using the Xml Data Type methods, and simple string concatenation can be converted into a JSON string. The procedure to do this is quite straight forward, just concatenate:
- The
{,}
symbol - The
{"}
symbol - The node
local-name()
or empty string if null (notice the coalesce function) - The
{"}
symbol again - The
{;}
symbol - The result of applying
dbo.qfn_JsonEscape
to the node value
- Or handle a XML subtree, which will concatenate the array chars (
[ ]
) and call the function using the subtree as the new argument.
Putting everything together:
CREATE FUNCTION [dbo].[qfn_XmlToJson](@XmlData xml)
RETURNS nvarchar(max)
AS
BEGIN
declare @m nvarchar(max)
SELECT @m='['+Stuff
(
(SELECT theline from
(SELECT ','+' {'+Stuff
(
(SELECT ',"'+coalesce(b.c.value('local-name(.)', 'NVARCHAR(255)'),'')+'":'+
case when b.c.value('count(*)','int')=0
then dbo.[qfn_JsonEscape](b.c.value('text()[1]','NVARCHAR(MAX)'))
else dbo.qfn_XmlToJson(b.c.query('*'))
end
from x.a.nodes('*') b(c)
for xml path(''),TYPE).value('(./text())[1]','NVARCHAR(MAX)')
,1,1,'')+'}'
from @XmlData.nodes('/*') x(a)
) JSON(theLine)
for xml path(''),TYPE).value('.','NVARCHAR(MAX)')
,1,1,'')+']'
return @m
END
The final touch is once a whole set has been concatenated and processed the SQL stuff
function deletes the first comma (,)
symbol in an array.
The helper function dbo.[qfn_JsonEscape] just:
-
Treats any non-numeric value as a string, surrounding it with double quotes, and JSON escaping control chars.
-
Returns any numeric value received (without quoting)
CREATE FUNCTION [dbo].[qfn_JsonEscape](@value nvarchar(max) )
returns nvarchar(max)
as begin
if (@value is null) return 'null'
if (TRY_PARSE( @value as float) is not null) return @value
set @value=replace(@value,'\','\\')
set @value=replace(@value,'"','\"')
return '"'+@value+'"'
end
Testing
We can test the result by executing:
select dbo.qfn_XmlToJson
(
(
select stations.ID,stations.CITY,stations.STATE,stations.LAT_N,stations.LONG_W ,
(select stats.*,
(select OPERATORS.*
from OPERATORS inner join reviews
on OPERATORS.ID=reviews.OPERATOR_ID
where reviews.STATION_ID=STATS.STATION_ID
and reviews.STAT_MONTH=STATS.MONTH
for xml path('operator'),type
) operators
from STATS
where STATS.STATION_ID=stations.ID for xml path('stat'),type
) stats
from stations for xml path('stations'),type
)
)
Which returns:
[ {"ID":13,"CITY":"Phoenix","STATE":"AZ","LAT_N":3.3000000e+001,"LONG_W":1.1200000e+002,"stats":[ {"STATION_ID":13,"MONTH":1,"TEMP_F":5.7400002e+001,"RAIN_I":3.1000000e-001,"operators":[ {"ID":50,"NAME":"John \"The Fox\"","SURNAME":"Brown"}]}, {"STATION_ID":13,"MONTH":7,"TEMP_F":9.1699997e+001,"RAIN_I":5.1500001e+000,"operators":[ {"ID":50,"NAME":"John \"The Fox\"","SURNAME":"Brown"}]}]}, {"ID":44,"CITY":"Denver","STATE":"CO","LAT_N":4.0000000e+001,"LONG_W":1.0500000e+002,"stats":[ {"STATION_ID":44,"MONTH":1,"TEMP_F":2.7299999e+001,"RAIN_I":1.8000001e-001}, {"STATION_ID":44,"MONTH":7,"TEMP_F":7.4800003e+001,"RAIN_I":2.1099999e+000,"operators":[ {"ID":51,"NAME":"Paul","SURNAME":"Smith"}, {"ID":52,"NAME":"Michael","SURNAME":"Williams"}, {"ID":50,"NAME":"John \"The Fox\"","SURNAME":"Brown"}]}]}, {"ID":66,"CITY":"Caribou","STATE":"ME","LAT_N":4.7000000e+001,"LONG_W":6.8000000e+001,"stats":[ {"STATION_ID":66,"MONTH":1,"TEMP_F":6.6999998e+000,"RAIN_I":2.0999999e+000,"operators":[ {"ID":51,"NAME":"Paul","SURNAME":"Smith"}]}, {"STATION_ID":66,"MONTH":7,"TEMP_F":6.5800003e+001,"RAIN_I":4.5200000e+000,"operators":[ {"ID":51,"NAME":"Paul","SURNAME":"Smith"}]}]}]
Which is the expected result (notice the John “The Fox” quote escaping). You can test it at http://sqlfiddle.com/#!6/77909/1
Conclusion
Using recursion in XML and JSON comes "naturally", although usually this is not the case in TSQL, it allows for straight and elegant solutions. As in any recursive-based solution there is a price to pay in terms of performance, so I do not advise using this technique for queries returning more than a thousand of records.