In SQL Server 2017/2019, Microsoft significantly improved the performance of the already fast JSON parser. We will learn more about it in this article.
Introduction
When Microsoft has been knocking everyone's socks off for many years in a row with odd novelties, swinging from one extreme to the other, you gradually get used to it and start awaiting each update with a certain degree of skepticism. Over time, this feeling only grows stronger, and subconsciously you no longer expect anything good.
However, sometimes everything turns out exactly the opposite way. Out of the blue, Microsoft releases perfectly working functionality that breaks all the existing stereotypes. At first, you expect the same old rake and sharp pangs of vicarious embarrassment, but minute by minute you realize that a properly working JSON parser for SQL Server was exactly what you had been missing all these years.
Such a lavish introduction has certain grounds: for a long time, support for working with JSON was one of the most requested SQL Server features on Microsoft Connect. Years passed, and this functionality was unexpectedly implemented with the release of SQL Server 2016. Looking ahead, I will say that it turned out very well, but Microsoft didn't stop there and in SQL Server 2017/2019 significantly improved the performance of the already fast JSON parser.
JSON support in SQL Server is natively available in all editions. At the same time, Microsoft didn't provide a separate data type for it, as is the case with XML. JSON data in SQL Server is stored as plain text: in Unicode (NVARCHAR/NCHAR) or ANSI (VARCHAR/CHAR) format.
DECLARE @JSON_ANSI VARCHAR(MAX) = '[{"Nąme":"Lenōvo モデ460"}]'
, @JSON_Unicode NVARCHAR(MAX) = N'[{"Nąme":"Lenōvo モデ460"}]'
SELECT DATALENGTH(@JSON_ANSI), @JSON_ANSI
UNION ALL
SELECT DATALENGTH(@JSON_Unicode), @JSON_Unicode
The main thing to remember is how much space each data type takes (2 bytes per character if we store data as Unicode, or 1 byte for ANSI strings). Also, don't forget to put the 'N' literal before Unicode constants. Otherwise, you can run into a bunch of funny situations:
--- ----------------------------
25 [{"Name":"Lenovo ??460"}]
50 [{"Nąme":"Lenōvo モデ460"}]
Everything seems simple, but it is not. Further on, we will see that the chosen data type affects not only the size but also the parsing speed.
In addition, Microsoft strongly recommends against using the deprecated NTEXT/TEXT data types. For those who still use them out of habit, let's run a small investigative experiment:
DROP TABLE IF EXISTS #varchar
DROP TABLE IF EXISTS #nvarchar
DROP TABLE IF EXISTS #ntext
GO
CREATE TABLE #varchar (x VARCHAR(MAX))
CREATE TABLE #nvarchar (x NVARCHAR(MAX))
CREATE TABLE #ntext (x NTEXT)
GO
DECLARE @json NVARCHAR(MAX) =
N'[{"Manufacturer":"Lenovo","Model":"ThinkPad E460","Availability":1}]'
SET STATISTICS IO, TIME ON
INSERT INTO #varchar
SELECT TOP(50000) @json
FROM [master].dbo.spt_values s1
CROSS JOIN [master].dbo.spt_values s2
OPTION(MAXDOP 1)
INSERT INTO #nvarchar
SELECT TOP(50000) @json
FROM [master].dbo.spt_values s1
CROSS JOIN [master].dbo.spt_values s2
OPTION(MAXDOP 1)
INSERT INTO #ntext
SELECT TOP(50000) @json
FROM [master].dbo.spt_values s1
CROSS JOIN [master].dbo.spt_values s2
OPTION(MAXDOP 1)
SET STATISTICS IO, TIME OFF
The insertion performance in the latter case will differ significantly:
varchar: CPU time = 32 ms, elapsed time = 28 ms
nvarchar: CPU time = 31 ms, elapsed time = 30 ms
ntext: CPU time = 172 ms, elapsed time = 190 ms
Also, remember that NTEXT/TEXT values are always stored on LOB pages:
SELECT obj_name = OBJECT_NAME(p.[object_id])
, a.[type_desc]
, a.total_pages
, total_mb = a.total_pages * 8 / 1024.
FROM sys.allocation_units a
JOIN sys.partitions p ON p.[partition_id] = a.container_id
WHERE p.[object_id] IN (
OBJECT_ID('#nvarchar'),
OBJECT_ID('#ntext'),
OBJECT_ID('#varchar')
)
obj_name type_desc total_pages total_mb
------------- -------------- ------------ -----------
varchar IN_ROW_DATA 516 4.031250
varchar LOB_DATA 0 0.000000
nvarchar IN_ROW_DATA 932 7.281250
nvarchar LOB_DATA 0 0.000000
ntext IN_ROW_DATA 188 1.468750
ntext LOB_DATA 1668 13.031250
For your information, starting with SQL Server 2005, the rule for choosing which pages variable-length data is stored on has changed. In general, if a value exceeds 8,060 bytes, it is placed on LOB pages; otherwise, it is stored on IN_ROW pages. It is clear that in this case SQL Server optimizes the storage of data in pages.
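To see this rule in action, here is a minimal sketch (the table name and value sizes are arbitrary): a short value stays in-row, while a value longer than 8,060 bytes lands on LOB pages:
USE tempdb
GO
DROP TABLE IF EXISTS #row_overflow
GO
CREATE TABLE #row_overflow (x NVARCHAR(MAX))
INSERT INTO #row_overflow VALUES (N'short value') -- stays in-row
INSERT INTO #row_overflow VALUES (REPLICATE(CAST(N'x' AS NVARCHAR(MAX)), 5000)) -- 10,000 bytes -> LOB
SELECT a.[type_desc], a.total_pages
FROM sys.allocation_units a
JOIN sys.partitions p ON p.[partition_id] = a.container_id
WHERE p.[object_id] = OBJECT_ID('#row_overflow')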
And the last reason not to use NTEXT/TEXT is the fact that all the JSON functions are simply not on friendly terms with the deprecated data types:
SELECT TOP(1) 1
FROM #ntext
WHERE ISJSON(x) = 1
Msg 8116, Level 16, State 1, Line 63
Argument data type ntext is invalid for argument 1 of isjson function.
Now let's see how advantageous storing JSON as NVARCHAR/VARCHAR is compared to similar data presented as XML. In addition, we will try storing XML both in its native format and as a string:
DECLARE @XML_Unicode NVARCHAR(MAX) = N'
<Manufacturer Name="Lenovo">
<Product Name="ThinkPad E460">
<Model Name="20ETS03100">
<CPU>i7-6500U</CPU>
<Memory>16</Memory>
<SSD>256</SSD>
</Model>
<Model Name="20ETS02W00">
<CPU>i5-6200U</CPU>
<Memory>8</Memory>
<HDD>1000</HDD>
</Model>
<Model Name="20ETS02V00">
<CPU>i5-6200U</CPU>
<Memory>4</Memory>
<HDD>500</HDD>
</Model>
</Product>
</Manufacturer>'
DECLARE @JSON_Unicode NVARCHAR(MAX) = N'
[
{
"Manufacturer": {
"Name": "Lenovo",
"Product": {
"Name": "ThinkPad E460",
"Model": [
{
"Name": "20ETS03100",
"CPU": "Intel Core i7-6500U",
"Memory": 16,
"SSD": "256"
},
{
"Name": "20ETS02W00",
"CPU": "Intel Core i5-6200U",
"Memory": 8,
"HDD": "1000"
},
{
"Name": "20ETS02V00",
"CPU": "Intel Core i5-6200U",
"Memory": 4,
"HDD": "500"
}
]
}
}
}
]'
DECLARE @XML_Unicode_D NVARCHAR(MAX) = N'<Manufacturer Name="Lenovo">
<Product Name="ThinkPad E460"><Model Name="20ETS03100"><CPU>i7-6500U</CPU>
<Memory>16</Memory><SSD>256</SSD></Model><Model Name="20ETS02W00">
<CPU>i5-6200U</CPU><Memory>8</Memory><HDD>1000</HDD></Model>
<Model Name="20ETS02V00"><CPU>i5-6200U</CPU><Memory>4</Memory>
<HDD>500</HDD></Model></Product></Manufacturer>'
, @JSON_Unicode_D NVARCHAR(MAX) = N'[{"Manufacturer":{"Name":"Lenovo","Product":
{"Name":"ThinkPad E460","Model":[{"Name":"20ETS03100","CPU":"Intel Core i7-6500U",
"Memory":16,"SSD":"256"},{"Name":"20ETS02W00","CPU":"Intel Core i5-6200U",
"Memory":8,"HDD":"1000"},{"Name":"20ETS02V00","CPU":"Intel Core i5-6200U",
"Memory":4,"HDD":"500"}]}}}]'
DECLARE @XML XML = @XML_Unicode
, @XML_ANSI VARCHAR(MAX) = @XML_Unicode
, @XML_D XML = @XML_Unicode_D
, @XML_ANSI_D VARCHAR(MAX) = @XML_Unicode_D
, @JSON_ANSI VARCHAR(MAX) = @JSON_Unicode
, @JSON_ANSI_D VARCHAR(MAX) = @JSON_Unicode_D
SELECT *
FROM (
VALUES ('XML Unicode', DATALENGTH(@XML_Unicode), DATALENGTH(@XML_Unicode_D))
, ('XML ANSI', DATALENGTH(@XML_ANSI), DATALENGTH(@XML_ANSI_D))
, ('XML', DATALENGTH(@XML), DATALENGTH(@XML_D))
, ('JSON Unicode', DATALENGTH(@JSON_Unicode), DATALENGTH(@JSON_Unicode_D))
, ('JSON ANSI', DATALENGTH(@JSON_ANSI), DATALENGTH(@JSON_ANSI_D))
) t(DataType, Delimiters, NoDelimiters)
Running this, we get the following results:
DataType     Delimiters  NoDelimiters
------------ ----------- --------------
XML Unicode 914 674
XML ANSI 457 337
XML 398 398
JSON Unicode 1274 604
JSON ANSI 637 302
It may seem that native XML is the best option. This is partly true, but there are nuances. XML is always stored as Unicode. Furthermore, because SQL Server uses a binary format for storing this data, everything is compressed into a kind of standardized dictionary with pointers. This is why formatting inside XML doesn't affect the final data size.
Strings are different, so I wouldn't recommend storing formatted JSON. The best option is to strip all extra characters when saving and to format the data on demand on the client side.
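SQL Server has no built-in JSON minifier, but if the document schema is known, one possible trick is to round-trip the value through OPENJSON and FOR JSON, which re-serializes it without the formatting whitespace. A sketch for a flat document with a known schema:
DECLARE @pretty NVARCHAR(MAX) = N'{
    "Name": "ThinkPad E460",
    "Memory": 16
}'
SELECT (
    SELECT [Name], Memory
    FROM OPENJSON(@pretty) WITH ([Name] NVARCHAR(50), Memory INT)
    FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
) -- {"Name":"ThinkPad E460","Memory":16}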
If you want to further reduce the size of JSON data, then we have several options at our disposal.
SQL Server 2016 introduced the new COMPRESS/DECOMPRESS functions, which add support for GZIP compression:
SELECT *
FROM (
VALUES ('XML Unicode', DATALENGTH(COMPRESS(@XML_Unicode)),
DATALENGTH(COMPRESS(@XML_Unicode_D)))
, ('XML ANSI', DATALENGTH(COMPRESS(@XML_ANSI)),
DATALENGTH(COMPRESS(@XML_ANSI_D)))
, ('JSON Unicode', DATALENGTH(COMPRESS(@JSON_Unicode)),
DATALENGTH(COMPRESS(@JSON_Unicode_D)))
, ('JSON ANSI', DATALENGTH(COMPRESS(@JSON_ANSI)),
DATALENGTH(COMPRESS(@JSON_ANSI_D)))
) t(DataType, CompressDelimiters, CompressNoDelimiters)
Results for the previous example:
DataType     CompressDelimiters   CompressNoDelimiters
------------ -------------------- --------------------
XML Unicode 244 223
XML ANSI 198 180
JSON Unicode 272 224
JSON ANSI 221 183
Everything shrinks well, but you need to keep one caveat in mind. Suppose the data initially arrived as ANSI, and then the variable's type was changed to Unicode:
DECLARE @t TABLE (val VARBINARY(MAX))
INSERT INTO @t
VALUES (COMPRESS('[{"Name":"ThinkPad E460"}]'))
, (COMPRESS(N'[{"Name":"ThinkPad E460"}]'))
SELECT val
, DECOMPRESS(val)
, CAST(DECOMPRESS(val) AS NVARCHAR(MAX))
, CAST(DECOMPRESS(val) AS VARCHAR(MAX))
FROM @t
The COMPRESS function returns different binary sequences for ANSI and Unicode input, so on subsequent reads we run into a situation where some of the data is stored as ANSI and some as Unicode. It is extremely difficult to guess later which type to cast to:
---------------------------- -------------------------------------------------------
筛丢浡≥∺桔湩偫摡䔠㘴∰嵽 [{"Name":"ThinkPad E460"}]
[{"Name":"ThinkPad E460"}] [ { " N a m e " : " T h i n k P a d E 4 6 0 " } ]
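One way to protect yourself is to agree on a single encoding before compressing, for example by always casting to NVARCHAR(MAX), so that DECOMPRESS can always be safely cast back to the same type. A small sketch:
DECLARE @t TABLE (val VARBINARY(MAX))
INSERT INTO @t
VALUES (COMPRESS(CAST('[{"Name":"ThinkPad E460"}]' AS NVARCHAR(MAX))))
     , (COMPRESS(N'[{"Name":"ThinkPad E460"}]'))
SELECT CAST(DECOMPRESS(val) AS NVARCHAR(MAX)) -- both rows now come back identical
FROM @t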
If we are building a high-load system, then using the COMPRESS function will slow down inserts:
USE tempdb
GO
DROP TABLE IF EXISTS #Compress
DROP TABLE IF EXISTS #NoCompress
GO
CREATE TABLE #NoCompress (DatabaseLogID INT PRIMARY KEY, JSON_Val NVARCHAR(MAX))
CREATE TABLE #Compress (DatabaseLogID INT PRIMARY KEY, JSON_CompressVal VARBINARY(MAX))
GO
SET STATISTICS IO, TIME ON
INSERT INTO #NoCompress
SELECT DatabaseLogID
, JSON_Val = (
SELECT PostTime, DatabaseUser, [Event], [Schema], [Object], [TSQL]
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
)
FROM AdventureWorks2014.dbo.DatabaseLog
OPTION(MAXDOP 1)
INSERT INTO #Compress
SELECT DatabaseLogID
, JSON_CompressVal = COMPRESS((
SELECT PostTime, DatabaseUser, [Event], [Schema], [Object], [TSQL]
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
))
FROM AdventureWorks2014.dbo.DatabaseLog
OPTION(MAXDOP 1)
SET STATISTICS IO, TIME OFF
Moreover, the difference is quite significant:
NoCompress: CPU time = 15 ms, elapsed time = 25 ms
Compress: CPU time = 218 ms, elapsed time = 280 ms
On the other hand, compression will reduce the size of the table:
SELECT obj_name = OBJECT_NAME(p.[object_id])
, a.[type_desc]
, a.total_pages
, total_mb = a.total_pages * 8 / 1024.
FROM sys.partitions p
JOIN sys.allocation_units a ON p.[partition_id] = a.container_id
WHERE p.[object_id] IN (
OBJECT_ID('#Compress'),
OBJECT_ID('#NoCompress')
)
obj_name type_desc total_pages total_mb
-------------- ------------- ------------ ---------
NoCompress IN_ROW_DATA 204 1.593750
NoCompress LOB_DATA 26 0.203125
Compress IN_ROW_DATA 92 0.718750
Compress LOB_DATA 0 0.000000
In addition, reading from the compressed table is greatly slowed down by the DECOMPRESS function:
SET STATISTICS IO, TIME ON
SELECT *
FROM #NoCompress
WHERE JSON_VALUE(JSON_Val, '$.Event') = 'CREATE_TABLE'
SELECT DatabaseLogID, [JSON] = CAST(DECOMPRESS(JSON_CompressVal) AS NVARCHAR(MAX))
FROM #Compress
WHERE JSON_VALUE(CAST(DECOMPRESS(JSON_CompressVal) AS NVARCHAR(MAX)), '$.Event') =
N'CREATE_TABLE'
SET STATISTICS IO, TIME OFF
Logical reads will be reduced, but execution time will remain extremely slow:
Table 'NoCompress'. Scan count 1, logical reads 187, ...
CPU time = 16 ms, elapsed time = 37 ms
Table 'Compress'. Scan count 1, logical reads 79, ...
CPU time = 109 ms, elapsed time = 212 ms
Alternatively, you can add a PERSISTED computed column:
ALTER TABLE #Compress ADD EventType_Persisted
AS CAST(JSON_VALUE(CAST(
DECOMPRESS(JSON_CompressVal) AS NVARCHAR(MAX)), '$.Event')
AS VARCHAR(200)) PERSISTED
or create a computed column and a nonclustered index based on it:
ALTER TABLE #Compress ADD EventType_NonPersisted
AS CAST(JSON_VALUE(CAST(
DECOMPRESS(JSON_CompressVal) AS NVARCHAR(MAX)), '$.Event')
AS VARCHAR(200))
CREATE INDEX ix ON #Compress (EventType_NonPersisted)
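With the computed column in place, the filter can be rewritten against it, so the query uses the index instead of calling DECOMPRESS for every row:
SET STATISTICS IO, TIME ON
SELECT DatabaseLogID
FROM #Compress
WHERE EventType_NonPersisted = 'CREATE_TABLE'
SET STATISTICS IO, TIME OFF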
Sometimes network latency has a much stronger impact on performance than anything in the examples above. Imagine that we can compress the JSON data with GZIP on the client and send it to the server:
DECLARE @json NVARCHAR(MAX) = (
SELECT t.[name]
, t.[object_id]
, [columns] = (
SELECT c.column_id, c.[name], c.system_type_id
FROM sys.all_columns c
WHERE c.[object_id] = t.[object_id]
FOR JSON AUTO
)
FROM sys.all_objects t
FOR JSON AUTO
)
SELECT InitialSize = DATALENGTH(@json) / 1048576.
, CompressSize = DATALENGTH(COMPRESS(@json)) / 1048576.
For me, this became a sort of lifesaver when I was trying to reduce network traffic on one of my projects:
InitialSize CompressSize
-------------- -------------
1.24907684 0.10125923
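On the server side, the receiving code then only needs to wrap the parameter in DECOMPRESS. A hedged sketch (the procedure and table names are made up for illustration):
CREATE OR ALTER PROCEDURE dbo.usp_SaveLog
    @payload VARBINARY(MAX) -- GZIP-compressed JSON from the client
AS
    INSERT INTO dbo.EventLog (InfoJSON)
    SELECT CAST(DECOMPRESS(@payload) AS NVARCHAR(MAX))
GO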
You can also use data compression to reduce the size of tables. Previously, compression was only available in the Enterprise edition, but with the release of SQL Server 2016 SP1, you can use this functionality even in the Express edition:
USE AdventureWorks2014
GO
DROP TABLE IF EXISTS #InitialTable
DROP TABLE IF EXISTS #None
DROP TABLE IF EXISTS #Row
DROP TABLE IF EXISTS #Page
GO
CREATE TABLE #None (ID INT, Val NVARCHAR(MAX), INDEX ix CLUSTERED (ID)
WITH (DATA_COMPRESSION = NONE))
CREATE TABLE #Row (ID INT, Val NVARCHAR(MAX), INDEX ix CLUSTERED (ID)
WITH (DATA_COMPRESSION = ROW))
CREATE TABLE #Page (ID INT, Val NVARCHAR(MAX), INDEX ix CLUSTERED (ID)
WITH (DATA_COMPRESSION = PAGE))
GO
SELECT h.SalesOrderID
, JSON_Data =
(
SELECT p.[Name]
FROM Sales.SalesOrderDetail d
JOIN Production.Product p ON d.ProductID = p.ProductID
WHERE d.SalesOrderID = h.SalesOrderID
FOR JSON AUTO
)
INTO #InitialTable
FROM Sales.SalesOrderHeader h
SET STATISTICS IO, TIME ON
INSERT INTO #None
SELECT *
FROM #InitialTable
OPTION(MAXDOP 1)
INSERT INTO #Row
SELECT *
FROM #InitialTable
OPTION(MAXDOP 1)
INSERT INTO #Page
SELECT *
FROM #InitialTable
OPTION(MAXDOP 1)
SET STATISTICS IO, TIME OFF
None: CPU time = 62 ms, elapsed time = 68 ms
Row: CPU time = 94 ms, elapsed time = 89 ms
Page: CPU time = 125 ms, elapsed time = 126 ms
PAGE compression uses algorithms that find similar chunks of data and replace them with smaller values. ROW compression truncates each value to the smallest data type that fits and also trims excess characters. For example, suppose a column is of type INT, which takes 4 bytes, but it only stores values less than 255. For such records, the type is truncated, and the data on disk takes up as much space as if it were TINYINT.
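By the way, before enabling compression on a real table, you can estimate the expected savings with a built-in procedure:
EXEC sys.sp_estimate_data_compression_savings
      @schema_name = 'Sales'
    , @object_name = 'SalesOrderHeader'
    , @index_id = NULL
    , @partition_number = NULL
    , @data_compression = 'PAGE'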
USE tempdb
GO
SELECT obj_name = OBJECT_NAME(p.[object_id])
, a.[type_desc]
, a.total_pages
, total_mb = a.total_pages * 8 / 1024.
FROM sys.partitions p
JOIN sys.allocation_units a ON p.[partition_id] = a.container_id
WHERE p.[object_id] IN (OBJECT_ID('#None'), OBJECT_ID('#Page'), OBJECT_ID('#Row'))
obj_name type_desc total_pages total_mb
---------- ------------- ------------ ---------
None IN_ROW_DATA 1156 9.031250
Row IN_ROW_DATA 1132 8.843750
Page IN_ROW_DATA 1004 7.843750
But what I like most are ColumnStore indexes, which keep improving from version to version of SQL Server. The main idea of ColumnStore is to split the data in a table into RowGroups of about 1 million rows each and, within each group, compress the data by columns. This results in significant disk space savings, fewer logical reads, and faster analytical queries. Therefore, if you need to store an archive of JSON information, you can create a clustered ColumnStore index:
USE AdventureWorks2014
GO
DROP TABLE IF EXISTS #CCI
DROP TABLE IF EXISTS #InitialTable
GO
CREATE TABLE #CCI (ID INT, Val NVARCHAR(MAX), INDEX ix CLUSTERED COLUMNSTORE)
GO
SELECT h.SalesOrderID
, JSON_Data = CAST(
(
SELECT p.[Name]
FROM Sales.SalesOrderDetail d
JOIN Production.Product p ON d.ProductID = p.ProductID
WHERE d.SalesOrderID = h.SalesOrderID
FOR JSON AUTO
)
AS VARCHAR(8000))
INTO #InitialTable
FROM Sales.SalesOrderHeader h
SET STATISTICS TIME ON
INSERT INTO #CCI
SELECT *
FROM #InitialTable
SET STATISTICS TIME OFF
In this case, the speed of insertion into the table roughly corresponds to PAGE compression. In addition, you can fine-tune the process for OLTP workloads using the COMPRESSION_DELAY option.
CCI: CPU time = 140 ms, elapsed time = 136 ms
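For example, a minimal sketch with COMPRESSION_DELAY (the 10-minute value is arbitrary): freshly inserted rows stay in the delta store for at least that long before being compressed:
ALTER INDEX ix ON #CCI SET (COMPRESSION_DELAY = 10)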
Before SQL Server 2017, ColumnStore indexes didn't support the [N]VARCHAR(MAX) data types, but with the release of the new version, we are allowed to store strings of any length in a ColumnStore index.
USE tempdb
GO
SELECT o.[name]
, s.used_page_count / 128.
FROM sys.indexes i
JOIN sys.dm_db_partition_stats s ON i.[object_id] = s.[object_id] AND i.index_id = s.index_id
JOIN sys.objects o ON i.[object_id] = o.[object_id]
WHERE i.[object_id] = OBJECT_ID('#CCI')
The gains from this are sometimes very impressive in comparison with traditional compression:
------ ---------
CCI 0.796875
Now let's take a look at how to generate JSON. If you have worked with XML in SQL Server before, everything here works by analogy. The easiest way to generate JSON is to use FOR JSON AUTO. In this case, a JSON array of objects is generated:
DROP TABLE IF EXISTS #Users
GO
CREATE TABLE #Users (
UserID INT
, UserName SYSNAME
, RegDate DATETIME
)
INSERT INTO #Users
VALUES (1, 'Paul Denton', '20170123')
, (2, 'JC Denton', NULL)
, (3, 'Maggie Cho', NULL)
SELECT *
FROM #Users
FOR JSON AUTO
[
{
"UserID":1,
"UserName":"Paul Denton",
"RegDate":"2017-01-23T00:00:00"
},
{
"UserID":2,
"UserName":"JC Denton"
},
{
"UserID":3,
"UserName":"Maggie Cho"
}
]
It's important to note that NULL values are ignored. If we want to include them in the JSON, we can use the INCLUDE_NULL_VALUES option:
SELECT UserID, RegDate
FROM #Users
FOR JSON AUTO, INCLUDE_NULL_VALUES
[
{
"UserID":1,
"RegDate":"2017-01-23T00:00:00"
},
{
"UserID":2,
"RegDate":null
},
{
"UserID":3,
"RegDate":null
}
]
If you need to get rid of the square brackets, the WITHOUT_ARRAY_WRAPPER option will help:
SELECT TOP(1) UserID, UserName
FROM #Users
FOR JSON AUTO, WITHOUT_ARRAY_WRAPPER
{
"UserID":1,
"UserName":"Paul Denton"
}
If we want to wrap the results in a root element, the ROOT option is provided for this:
SELECT UserID, UserName
FROM #Users
FOR JSON AUTO, ROOT('Users')
{
"Users":[
{
"UserID":1,
"UserName":"Paul Denton"
},
{
"UserID":2,
"UserName":"JC Denton"
},
{
"UserID":3,
"UserName":"Maggie Cho"
}
]
}
If you need to create JSON with a more complex structure, assign the desired names to properties, or group them, then you need to use the FOR JSON PATH expression:
SELECT TOP(1) UserID
, UserName AS [Detail.FullName]
, RegDate AS [Detail.RegDate]
FROM #Users
FOR JSON PATH
[
{
"UserID":1,
"Detail":{
"FullName":"Paul Denton",
"RegDate":"2017-01-23T00:00:00"
}
}
]
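Nested arrays can also be produced by combining FOR JSON queries - a correlated subquery with its own FOR JSON AUTO becomes a JSON array inside the outer document: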
SELECT t.[name]
, t.[object_id]
, [columns] = (
SELECT c.column_id, c.[name]
FROM sys.columns c
WHERE c.[object_id] = t.[object_id]
FOR JSON AUTO
)
FROM sys.tables t
FOR JSON AUTO
[
{
"name":"#Users",
"object_id":1483152329,
"columns":[
{
"column_id":1,
"name":"UserID"
},
{
"column_id":2,
"name":"UserName"
},
{
"column_id":3,
"name":"RegDate"
}
]
}
]
To check the validity of the JSON format, there is the ISJSON function, which returns 1 if the string is valid JSON, 0 if it is not, and NULL if NULL was passed:
DECLARE @json1 NVARCHAR(MAX) = N'{"id": 1}'
, @json2 NVARCHAR(MAX) = N'[1,2,3]'
, @json3 NVARCHAR(MAX) = N'1'
, @json4 NVARCHAR(MAX) = N''
, @json5 NVARCHAR(MAX) = NULL
SELECT ISJSON(@json1)
, ISJSON(@json2)
, ISJSON(@json3)
, ISJSON(@json4)
, ISJSON(@json5)
To extract a scalar value from JSON, you can use the JSON_VALUE function:
DECLARE @json NVARCHAR(MAX) = N'
{
"UserID": 1,
"UserName": "JC Denton",
"IsActive": true,
"Date": "2016-05-31T00:00:00",
"Settings": [
{
"Language": "EN"
},
{
"Skin": "FlatUI"
}
]
}'
SELECT JSON_VALUE(@json, '$.UserID')
, JSON_VALUE(@json, '$.UserName')
, JSON_VALUE(@json, '$.Settings[0].Language')
, JSON_VALUE(@json, '$.Settings[1].Skin')
, JSON_QUERY(@json, '$.Settings')
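Note that JSON_VALUE returns only scalar values. To extract an object or an array (like Settings in the example above), you need JSON_QUERY; otherwise, in lax mode, you will simply get NULL.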
For parsing tabular data, the OPENJSON table-valued function is used. It should be noted right away that it only works in databases with compatibility level 130 or higher.
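You can check the current level like this (the ALTER statement is left commented out, and the database name is just an example):
SELECT [name], [compatibility_level]
FROM sys.databases
WHERE [name] = DB_NAME()
--ALTER DATABASE AdventureWorks2014 SET COMPATIBILITY_LEVEL = 130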
There are two modes of operation of the OPENJSON function. The simplest one is without specifying a schema for the result set:
DECLARE @json NVARCHAR(MAX) = N'
{
"UserID": 1,
"UserName": "JC Denton",
"IsActive": true,
"RegDate": "2016-05-31T00:00:00"
}'
SELECT * FROM OPENJSON(@json)
In the second mode, we can describe what the returned result should look like: the column names, how many there are, and where to take their values from:
DECLARE @json NVARCHAR(MAX) = N'
[
{
"User ID": 1,
"UserName": "JC Denton",
"IsActive": true,
"Date": "2016-05-31T00:00:00",
"Settings": [
{
"Language": "EN"
},
{
"Skin": "FlatUI"
}
]
},
{
"User ID": 2,
"UserName": "Paul Denton",
"IsActive": false
}
]'
SELECT * FROM OPENJSON(@json)
SELECT * FROM OPENJSON(@json, '$[0]')
SELECT * FROM OPENJSON(@json, '$[0].Settings[0]')
SELECT *
FROM OPENJSON(@json)
WITH (
UserID INT '$."User ID"'
, UserName SYSNAME
, IsActive BIT
, RegDate DATETIME '$.Date'
, Settings NVARCHAR(MAX) AS JSON
, Skin SYSNAME '$.Settings[1].Skin'
)
If our document has a nested hierarchy, then the following example will help:
DECLARE @json NVARCHAR(MAX) = N'
[
{
"FullName": "JC Denton",
"Children": [
{ "FullName": "Mary", "Male": "0" },
{ "FullName": "Paul", "Male": "1" }
]
},
{
"FullName": "Paul Denton"
}
]'
SELECT t.FullName, c.*
FROM OPENJSON(@json)
WITH (
FullName SYSNAME
, Children NVARCHAR(MAX) AS JSON
) t
OUTER APPLY OPENJSON(Children)
WITH (
ChildrenName SYSNAME '$.FullName'
, Male TINYINT
) c
With the release of SQL Server 2016, the STRING_SPLIT function was introduced, and everyone breathed a sigh of relief that there was no longer any need to reinvent the wheel to split a string into tokens. However, there is another alternative - the OPENJSON construct we considered earlier. Let's test several options for splitting a string:
SET NOCOUNT ON
SET STATISTICS TIME OFF
DECLARE @x VARCHAR(MAX) = '1' + REPLICATE(CAST(',1' AS VARCHAR(MAX)), 1000)
SET STATISTICS TIME ON
;WITH cte AS
(
SELECT s = 1
, e = COALESCE(NULLIF(CHARINDEX(',', @x, 1), 0), LEN(@x) + 1)
, v = SUBSTRING(@x, 1, COALESCE(NULLIF(CHARINDEX(',', @x, 1), 0), LEN(@x) + 1) - 1)
UNION ALL
SELECT s = CONVERT(INT, e) + 1
, e = COALESCE(NULLIF(CHARINDEX(',', @x, e + 1), 0), LEN(@x) + 1)
, v = SUBSTRING(@x, e + 1, COALESCE(NULLIF(CHARINDEX(',', @x, e + 1), 0)
, LEN(@x) + 1)- e - 1)
FROM cte
WHERE e < LEN(@x) + 1
)
SELECT v
FROM cte
WHERE LEN(v) > 0
OPTION (MAXRECURSION 0)
SELECT t.c.value('(./text())[1]', 'INT')
FROM (
SELECT x = CONVERT(XML, '<i>' + REPLACE(@x, ',', '</i><i>') + '</i>').query('.')
) a
CROSS APPLY x.nodes('i') t(c)
SELECT *
FROM STRING_SPLIT(@x, N',')
SELECT [value]
FROM OPENJSON(N'[' + @x + N']')
SET STATISTICS TIME OFF
If you look at the results (elapsed time in milliseconds for different numbers of elements), you will notice that OPENJSON can in some cases be faster than the STRING_SPLIT function, not to mention the XML and CTE workarounds:
500k 100k 50k 1000
------------- ------- ------ ------ ------
CTE 29407 2406 1266 58
XML 6520 1084 553 259
STRING_SPLIT 4665 594 329 27
OPENJSON 2606 506 273 19
Moreover, in a high-load OLTP scenario, there is no obvious difference between OPENJSON and STRING_SPLIT (1000 iterations + 10 comma-separated values):
CTE = 4629 ms
XML = 4397 ms
STRING_SPLIT = 4011 ms
OPENJSON = 4047 ms
Starting with SQL Server 2005, it has been possible to validate XML on the database side using XML SCHEMA COLLECTION: we describe a schema for the XML and can then validate data against it. There is no such functionality explicitly for JSON, but there is a workaround.
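The workaround is a CHECK constraint based on ISJSON, optionally combined with JSON_VALUE checks for required properties. A minimal sketch:
DROP TABLE IF EXISTS #ValidatedJSON
GO
CREATE TABLE #ValidatedJSON (
    val NVARCHAR(MAX)
        CONSTRAINT CK_val_IsJSON CHECK (
            ISJSON(val) = 1
            AND JSON_VALUE(val, '$.UserID') IS NOT NULL
        )
)
INSERT INTO #ValidatedJSON VALUES (N'{"UserID": 1}') -- succeeds
--INSERT INTO #ValidatedJSON VALUES (N'not json') -- violates the constraint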
As far as I remember, there are two types of path expressions for JSON: strict and lax (the default). The difference is that when we specify non-existent or incorrect paths while parsing, a lax expression returns NULL, while a strict one raises an error:
DECLARE @json NVARCHAR(MAX) = N'
{
"UserID": 1,
"UserName": "JC Denton"
}'
SELECT JSON_VALUE(@json, '$.IsActive')
, JSON_VALUE(@json, 'lax$.IsActive')
, JSON_VALUE(@json, 'strict$.UserName')
SELECT JSON_VALUE(@json, 'strict$.IsActive')
Msg 13608, Level 16, State 2, Line 12
Property cannot be found on the specified JSON path.
For modifying data inside JSON, there is the JSON_MODIFY function. The examples are quite simple, so there is no point in describing them in detail:
DECLARE @json NVARCHAR(MAX) = N'
{
"FirstName": "JC",
"LastName": "Denton",
"Age": 20,
"Skills": ["SQL Server 2014"]
}'
SET @json = JSON_MODIFY(@json, '$.Age', CAST(JSON_VALUE(@json, '$.Age') AS INT) + 2)
SET @json = JSON_MODIFY(@json, '$.Skills[0]', 'SQL 2016')
SET @json = JSON_MODIFY(@json, 'append $.Skills', 'JSON')
SELECT * FROM OPENJSON(@json)
SELECT * FROM OPENJSON(JSON_MODIFY(@json, 'lax$.Age', NULL))
SELECT * FROM OPENJSON(JSON_MODIFY(@json, 'strict$.Age', NULL))
GO
DECLARE @json NVARCHAR(100) = N'{ "price": 105.90 }'
SET @json =
JSON_MODIFY(
JSON_MODIFY(@json, '$.Price',
CAST(JSON_VALUE(@json, '$.price') AS NUMERIC(6,2))),
'$.price', NULL)
SELECT @json
And so, we gradually arrive at the most exciting part: performance.
When parsing JSON, you need to remember one nuance: OPENJSON and JSON_VALUE return their results in Unicode unless we override this. In the AdventureWorks database, the AccountNumber column is of the VARCHAR data type:
USE AdventureWorks2014
GO
DECLARE @json NVARCHAR(MAX) = N'{ "AccountNumber": "AW00000009" }'
SET STATISTICS IO ON
SELECT CustomerID, AccountNumber
FROM Sales.Customer
WHERE AccountNumber = JSON_VALUE(@json, '$.AccountNumber')
SELECT CustomerID, AccountNumber
FROM Sales.Customer
WHERE AccountNumber = CAST(JSON_VALUE(@json, '$.AccountNumber') AS VARCHAR(10))
SET STATISTICS IO OFF
Difference in logical reads:
Table 'Customer'. Scan count 1, logical reads 37, ...
Table 'Customer'. Scan count 0, logical reads 2, ...
Because the data types of the column and of the function result don't match, SQL Server has to perform an implicit conversion based on type precedence, in this particular case to NVARCHAR. Alas, computations and conversions on an indexed column most often lead to an Index Scan. If we explicitly cast to the column's type, we get an Index Seek instead.
Now let's look at how you can index JSON objects. As I mentioned above, SQL Server 2016 didn't add a separate data type for JSON, unlike XML. Therefore, you can use any string data type to store it.
If you have experience with XML, you may remember that SQL Server offers several kinds of indexes for that format which speed up certain selections. For the string types in which JSON is supposed to be stored, such indexes simply don't exist.
Alas, JSONB was not delivered. The development team was in a hurry to ship the JSON functionality and said, literally: "If you miss the speed, we will add JSONB in the next major version." But nothing of the kind was added in SQL Server 2017/2019.
This is where computed columns come to the rescue: they can expose the specific properties of the JSON documents that you need to search by, and indexes can then be built on those columns.
USE AdventureWorks2014
GO
DROP TABLE IF EXISTS #JSON
GO
CREATE TABLE #JSON (
DatabaseLogID INT PRIMARY KEY
, InfoJSON NVARCHAR(MAX) NOT NULL
)
GO
INSERT INTO #JSON
SELECT DatabaseLogID
, InfoJSON = (
SELECT PostTime, DatabaseUser, [Event], [Schema], [Object], [TSQL]
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
)
FROM dbo.DatabaseLog
It is not very rational to parse the same data every time:
SET STATISTICS IO, TIME ON
SELECT *
FROM #JSON
WHERE JSON_VALUE(InfoJSON, '$.Schema') + '.' + JSON_VALUE(InfoJSON, '$.Object') =
'Person.Person'
SET STATISTICS IO, TIME OFF
Table 'JSON'. Scan count 1, logical reads 187, ...
CPU time = 16 ms, elapsed time = 29 ms
Therefore, creating a computed column and then including it in an index is sometimes justified:
ALTER TABLE #JSON
ADD ObjectName AS
JSON_VALUE(InfoJSON, '$.Schema') + '.' + JSON_VALUE(InfoJSON, '$.Object')
GO
CREATE INDEX IX_ObjectName ON #JSON (ObjectName)
GO
SET STATISTICS IO, TIME ON
SELECT *
FROM #JSON
WHERE JSON_VALUE(InfoJSON, '$.Schema') + '.' + JSON_VALUE(InfoJSON, '$.Object') =
'Person.Person'
SELECT *
FROM #JSON
WHERE ObjectName = 'Person.Person'
SET STATISTICS IO, TIME OFF
That said, the SQL Server optimizer is pretty smart, so you don't need to change anything in the code:
Table 'JSON'. Scan count 1, logical reads 13, ...
CPU time = 0 ms, elapsed time = 1 ms
Table 'JSON'. Scan count 1, logical reads 13, ...
CPU time = 0 ms, elapsed time = 1 ms
In addition, you can create both regular and full-text indexes if you want to search over the contents of arrays or whole parts of objects. At the same time, a full-text index doesn't have any special rules for processing JSON; it simply breaks the text into separate tokens, using double quotes, commas, and brackets as separators, since that is what the JSON structure itself consists of:
USE AdventureWorks2014
GO
DROP TABLE IF EXISTS dbo.LogJSON
GO
CREATE TABLE dbo.LogJSON (
DatabaseLogID INT
, InfoJSON NVARCHAR(MAX) NOT NULL
, CONSTRAINT pk PRIMARY KEY (DatabaseLogID)
)
GO
INSERT INTO dbo.LogJSON
SELECT DatabaseLogID
, InfoJSON = (
SELECT PostTime, DatabaseUser, [Event], ObjectName = [Schema] + '.' + [Object]
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
)
FROM dbo.DatabaseLog
GO
IF EXISTS(
SELECT *
FROM sys.fulltext_catalogs
WHERE [name] = 'JSON_FTC'
)
DROP FULLTEXT CATALOG JSON_FTC
GO
CREATE FULLTEXT CATALOG JSON_FTC WITH ACCENT_SENSITIVITY = ON AUTHORIZATION dbo
GO
IF EXISTS (
SELECT *
FROM sys.fulltext_indexes
WHERE [object_id] = OBJECT_ID(N'dbo.LogJSON')
) BEGIN
ALTER FULLTEXT INDEX ON dbo.LogJSON DISABLE
DROP FULLTEXT INDEX ON dbo.LogJSON
END
GO
CREATE FULLTEXT INDEX ON dbo.LogJSON (InfoJSON) KEY INDEX pk ON JSON_FTC
GO
SELECT *
FROM dbo.LogJSON
WHERE CONTAINS(InfoJSON, 'ALTER_TABLE')
And finally, we have come to perhaps the most interesting question of this article: how much faster is JSON parsing compared to XML in SQL Server? To answer it, I prepared a series of tests.
Let's prepare two large files in JSON and XML formats:
USE AdventureWorks2014
GO
DROP PROCEDURE IF EXISTS ##get_xml
DROP PROCEDURE IF EXISTS ##get_json
GO
CREATE PROCEDURE ##get_xml
AS
SELECT r.ProductID
, r.[Name]
, r.ProductNumber
, d.OrderQty
, d.UnitPrice
, r.ListPrice
, r.Color
, r.MakeFlag
FROM Sales.SalesOrderDetail d
JOIN Production.Product r ON d.ProductID = r.ProductID
FOR XML PATH ('Product'), ROOT('Products')
GO
CREATE PROCEDURE ##get_json
AS
SELECT (
SELECT r.ProductID
, r.[Name]
, r.ProductNumber
, d.OrderQty
, d.UnitPrice
, r.ListPrice
, r.Color
, r.MakeFlag
FROM Sales.SalesOrderDetail d
JOIN Production.Product r ON d.ProductID = r.ProductID
FOR JSON PATH
)
GO
DECLARE @sql NVARCHAR(4000)
SET @sql =
'bcp "EXEC ##get_xml" queryout "X:\sample.xml" -S ' + @@servername + ' -T -w -r -t'
EXEC sys.xp_cmdshell @sql
SET @sql =
'bcp "EXEC ##get_json" queryout "X:\sample.txt" -S ' + @@servername + ' -T -w -r -t'
EXEC sys.xp_cmdshell @sql
Checking the performance of OPENJSON, OPENXML, and XQuery:
SET NOCOUNT ON
SET STATISTICS TIME ON
DECLARE @xml XML
SELECT @xml = BulkColumn
FROM OPENROWSET(BULK 'X:\sample.xml', SINGLE_BLOB) x
DECLARE @jsonu NVARCHAR(MAX)
SELECT @jsonu = BulkColumn
FROM OPENROWSET(BULK 'X:\sample.txt', SINGLE_NCLOB) x
SELECT ProductID = t.c.value('(ProductID/text())[1]', 'INT')
, [Name] = t.c.value('(Name/text())[1]', 'NVARCHAR(50)')
, ProductNumber = t.c.value('(ProductNumber/text())[1]', 'NVARCHAR(25)')
, OrderQty = t.c.value('(OrderQty/text())[1]', 'SMALLINT')
, UnitPrice = t.c.value('(UnitPrice/text())[1]', 'MONEY')
, ListPrice = t.c.value('(ListPrice/text())[1]', 'MONEY')
, Color = t.c.value('(Color/text())[1]', 'NVARCHAR(15)')
, MakeFlag = t.c.value('(MakeFlag/text())[1]', 'BIT')
FROM @xml.nodes('Products/Product') t(c)
DECLARE @doc INT
EXEC sys.sp_xml_preparedocument @doc OUTPUT, @xml
SELECT *
FROM OPENXML(@doc, '/Products/Product', 2)
WITH (
ProductID INT
, [Name] NVARCHAR(50)
, ProductNumber NVARCHAR(25)
, OrderQty SMALLINT
, UnitPrice MONEY
, ListPrice MONEY
, Color NVARCHAR(15)
, MakeFlag BIT
)
EXEC sys.sp_xml_removedocument @doc
SELECT *
FROM OPENJSON(@jsonu)
WITH (
ProductID INT
, [Name] NVARCHAR(50)
, ProductNumber NVARCHAR(25)
, OrderQty SMALLINT
, UnitPrice MONEY
, ListPrice MONEY
, Color NVARCHAR(15)
, MakeFlag BIT
)
SET STATISTICS TIME, IO OFF
Now let's compare the performance of the JSON_VALUE scalar function with XQuery:
SET NOCOUNT ON
DECLARE @jsonu NVARCHAR(MAX) = N'[
{"User":"Sergii Syrovatchenko","Age":28,"Skills":["SQL Server","T-SQL","JSON","XML"]},
{"User":"JC Denton","Skills":["Microfibral Muscle","Regeneration","EMP Shield"]},
{"User":"Paul Denton","Age":32,"Skills":["Vision Enhancement"]}]'
DECLARE @jsonu_f NVARCHAR(MAX) = N'[
{
"User":"Sergii Syrovatchenko",
"Age":28,
"Skills":[
"SQL Server",
"T-SQL",
"JSON",
"XML"
]
},
{
"User":"JC Denton",
"Skills":[
"Microfibral Muscle",
"Regeneration",
"EMP Shield"
]
},
{
"User":"Paul Denton",
"Age":32,
"Skills":[
"Vision Enhancement"
]
}
]'
DECLARE @json VARCHAR(MAX) = @jsonu
, @json_f VARCHAR(MAX) = @jsonu_f
DECLARE @xml XML = N'
<Users>
<User Name="Sergii Syrovatchenko">
<Age>28</Age>
<Skills>
<Skill>SQL Server</Skill>
<Skill>T-SQL</Skill>
<Skill>JSON</Skill>
<Skill>XML</Skill>
</Skills>
</User>
<User Name="JC Denton">
<Skills>
<Skill>Microfibral Muscle</Skill>
<Skill>Regeneration</Skill>
<Skill>EMP Shield</Skill>
</Skills>
</User>
<User Name="Paul Denton">
<Age>32</Age>
<Skills>
<Skill>Vision Enhancement</Skill>
</Skills>
</User>
</Users>'
DECLARE @i INT
, @int INT
, @varchar VARCHAR(100)
, @nvarchar NVARCHAR(100)
, @s DATETIME
, @runs INT = 100000
DECLARE @t TABLE (
iter INT IDENTITY PRIMARY KEY
, data_type VARCHAR(100)
, [path] VARCHAR(1000)
, [type] VARCHAR(1000)
, time_ms INT
)
SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
SELECT @int = JSON_VALUE(@jsonu, '$[0].Age')
, @i += 1
INSERT INTO @t
SELECT '@jsonu', '$[0].Age', 'INT', DATEDIFF(ms, @s, GETDATE())
SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
SELECT @int = JSON_VALUE(@jsonu_f, '$[0].Age')
, @i += 1
INSERT INTO @t
SELECT '@jsonu_f', '$[0].Age', 'INT', DATEDIFF(ms, @s, GETDATE())
SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
SELECT @int = JSON_VALUE(@json, '$[0].Age')
, @i += 1
INSERT INTO @t
SELECT '@json', '$[0].Age', 'INT', DATEDIFF(ms, @s, GETDATE())
SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
SELECT @int = JSON_VALUE(@json_f, '$[0].Age')
, @i += 1
INSERT INTO @t
SELECT '@json_f', '$[0].Age', 'INT', DATEDIFF(ms, @s, GETDATE())
SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
SELECT @int = @xml.value('(Users/User[1]/Age/text())[1]', 'INT')
, @i += 1
INSERT INTO @t
SELECT '@xml', '(Users/User[1]/Age/text())[1]', 'INT', DATEDIFF(ms, @s, GETDATE())
SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
SELECT @nvarchar = JSON_VALUE(@jsonu, '$[1].User')
, @i += 1
INSERT INTO @t
SELECT '@jsonu', '$[1].User', 'NVARCHAR', DATEDIFF(ms, @s, GETDATE())
SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
SELECT @nvarchar = JSON_VALUE(@jsonu_f, '$[1].User')
, @i += 1
INSERT INTO @t
SELECT '@jsonu_f', '$[1].User', 'NVARCHAR', DATEDIFF(ms, @s, GETDATE())
SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
SELECT @varchar = JSON_VALUE(@json, '$[1].User')
, @i += 1
INSERT INTO @t
SELECT '@json', '$[1].User', 'VARCHAR', DATEDIFF(ms, @s, GETDATE())
SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
SELECT @varchar = JSON_VALUE(@json_f, '$[1].User')
, @i += 1
INSERT INTO @t
SELECT '@json_f', '$[1].User', 'VARCHAR', DATEDIFF(ms, @s, GETDATE())
SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
SELECT @nvarchar = @xml.value('(Users/User[2]/@Name)[1]', 'NVARCHAR(100)')
, @i += 1
INSERT INTO @t
SELECT '@xml', '(Users/User[2]/@Name)[1]', 'NVARCHAR', DATEDIFF(ms, @s, GETDATE())
SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
SELECT @varchar = @xml.value('(Users/User[2]/@Name)[1]', 'VARCHAR(100)')
, @i += 1
INSERT INTO @t
SELECT '@xml', '(Users/User[2]/@Name)[1]', 'VARCHAR', DATEDIFF(ms, @s, GETDATE())
SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
SELECT @nvarchar = JSON_VALUE(@jsonu, '$[2].Skills[0]')
, @i += 1
INSERT INTO @t
SELECT '@jsonu', '$[2].Skills[0]', 'NVARCHAR', DATEDIFF(ms, @s, GETDATE())
SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
SELECT @nvarchar = JSON_VALUE(@jsonu_f, '$[2].Skills[0]')
, @i += 1
INSERT INTO @t
SELECT '@jsonu_f', '$[2].Skills[0]', 'NVARCHAR', DATEDIFF(ms, @s, GETDATE())
SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
SELECT @varchar = JSON_VALUE(@json, '$[2].Skills[0]')
, @i += 1
INSERT INTO @t
SELECT '@json', '$[2].Skills[0]', 'VARCHAR', DATEDIFF(ms, @s, GETDATE())
SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
SELECT @varchar = JSON_VALUE(@json_f, '$[2].Skills[0]')
, @i += 1
INSERT INTO @t
SELECT '@json_f', '$[2].Skills[0]', 'VARCHAR', DATEDIFF(ms, @s, GETDATE())
SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
SELECT @varchar = @xml.value('(Users/User[3]/Skills/Skill/text())[1]', 'VARCHAR(100)')
, @i += 1
INSERT INTO @t
SELECT '@xml', '(Users/User[3]/Skills/Skill/text())[1]', 'VARCHAR', DATEDIFF(ms, @s, GETDATE())
SELECT * FROM @t
Results:
data_type path type 2016 SP2(CU17) 2017(CU25) 2019(CU11)
---------- -------------------------------------- --------- ------------- ----------- -----------
@jsonu $[0].Age INT 1030 314 250
@jsonu_f $[0].Age INT 1014 333 290
@json $[0].Age INT 1100 393 337
@json_f $[0].Age INT 1166 444 396
@xml (Users/User[1]/Age/text())[1] INT 28037 30513 34427
@jsonu $[1].User NVARCHAR 1223 513 490
@jsonu_f $[1].User NVARCHAR 1367 647 673
@json $[1].User VARCHAR 1343 623 604
@json_f $[1].User VARCHAR 1527 790 800
@xml (Users/User[2]/@Name)[1] NVARCHAR 22487 25094 26840
@xml (Users/User[2]/@Name)[1] VARCHAR 22816 25190 26676
@jsonu $[2].Skills[0] NVARCHAR 1654 856 814
@jsonu_f $[2].Skills[0] NVARCHAR 1926 1124 1103
@json $[2].Skills[0] VARCHAR 1790 960 907
@json_f $[2].Skills[0] VARCHAR 2090 1250 1196
@xml (Users/User[3]/Skills/Skill/text())[1] VARCHAR 23530 25426 27840
And another interesting nuance: try not to mix JSON_VALUE and OPENJSON calls, and specify only those columns that you really need after parsing. With JSON, everything is extremely simple: the fewer columns you need to parse, the faster you get the result:
SET NOCOUNT ON
SET STATISTICS TIME ON
DECLARE @json NVARCHAR(MAX)
SELECT @json = BulkColumn
FROM OPENROWSET(BULK 'X:\sample.txt', SINGLE_NCLOB) x
SELECT COUNT_BIG(*)
FROM OPENJSON(@json)
WITH (
ProductID INT
, ProductNumber NVARCHAR(25)
, OrderQty SMALLINT
, UnitPrice MONEY
, ListPrice MONEY
, Color NVARCHAR(15)
)
WHERE Color = 'Black'
SELECT COUNT_BIG(*)
FROM OPENJSON(@json) WITH (Color NVARCHAR(15))
WHERE Color = 'Black'
SELECT COUNT_BIG(*)
FROM OPENJSON(@json)
WHERE JSON_VALUE(value, '$.Color') = 'Black'
Briefly summarizing the results:
- Parsing data from JSON is 2x to 10x faster than from XML.
- JSON storage is often more redundant than XML storage.
- Parsing JSON data in Unicode is 5-15% faster.
- When using JSON, you can significantly reduce the CPU load on the server compared to XML.
- Starting with SQL Server 2017, parsing of scalar values from JSON has been significantly accelerated.
- If you can get rid of XML in favor of JSON, do it!
You can get all the latest code samples from my GitHub. Thanks!