Top 8 New SQL Features of MySQL 8

Yuri Danilov

5.00/5 (1 vote)

16 Mar 2021CPOL6 min read

7.7K

An overview of Top 8 new SQL features of MySQL 8

In this tip, you will learn about MySQL 8's top 8 new features that can make your code more efficient.

Introduction

Even though MySQL 8 was released almost 3 years ago (in April 2018), some projects in development are still targeting version 5.7 without considering the useful features of version 8, which can make your code more efficient. In addition, some features continue to appear and improve as minor versions are released.

Let's take a look at 8 of the most interesting ones.

8 of 8 TL;DR:

Common Table Expressions
Window Functions
Descending Indexes
Expressions as default values
Functional index key parts
Additional target types for casts
Row and column aliases with ON DUPLICATE KEY UPDATE
Regular expression support

Common Table Expressions (CTEs)

A CTE (also known as WITH query) is a named temporary result set that exists within the scope of a single statement and that can be referred to later within that statement, recursively or not:

SQL

WITH
  cte1 AS (SELECT a, b FROM table1),
  cte2 AS (SELECT c, d FROM table2)
SELECT b, d FROM cte1 JOIN cte2 WHERE cte1.a = cte2.c;

Recursive CTEs can be useful to generate some data (Oracle "Connect By Prior" equivalent). For example, we can generate 30 rows and there is no need to create a special procedure or temporary table:

SQL

WITH RECURSIVE sequence AS (
  SELECT 0 AS level
  UNION ALL
  SELECT level + 1 AS value FROM sequence WHERE sequence.level < 30
)
SELECT level
FROM sequence;

We can use these rows to generate some random data:

SQL

WITH RECURSIVE sequence AS (
  SELECT 1 AS level
  UNION ALL
  SELECT level + 1 AS value FROM sequence WHERE sequence.level < 10
)
SELECT LEVEL,
REGEXP_REPLACE(CONCAT(SUBSTRING(MD5(UUID()), 1, 20 + rand()*20) , _
               '@mail.com'), '\\d', '') random_email,
concat('+', FLOOR(rand() * 100), ' ', FLOOR(rand() * 1000), ' ', _
       FLOOR(rand() * 1000), ' ', FLOOR(rand() * 10000)) random_phone,
REGEXP_REPLACE(MAKE_SET(rand()*9999999999999999,
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', _
'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'
), ',', '') random_str,
date(date_add(now(), INTERVAL LEVEL DAY)) days_sequence,
date_format(date_add(now(), INTERVAL LEVEL YEAR), '%Y') years_sequence,
date_add(now(), INTERVAL rand() * 999999999 SECOND) rand_future_date,
date_add(now(), INTERVAL - rand() * 999999999 - 16 * 365.25 * 24 * 3600 SECOND) _
         rand_past_date_older16
FROM sequence;

As of MySQL 8.0.19, the recursive SELECT part of a recursive CTE supports a LIMIT clause, OFFSET is also supported:

SQL

WITH RECURSIVE sequence AS (
  SELECT 1 AS level
  UNION ALL
  SELECT level + 1 AS value FROM sequence WHERE sequence.level < 100
)
SELECT LEVEL
FROM sequence
LIMIT 10 OFFSET 20;

Expressions can also be used to create mathematical sequences such as Fibonacci numbers or to traverse data hierarchically:

SQL

WITH RECURSIVE fibonacci (n, fib_n, next_fib_n) AS
(
  SELECT 1, 0, 1
  UNION ALL
  SELECT n + 1, next_fib_n, fib_n + next_fib_n
    FROM fibonacci WHERE n < 20
)
SELECT * FROM fibonacci;
-- WHERE n = 10; -- use WHERE to select n-th Fibonacci number

Window Functions

Window functions perform some group calculation on a set of rows, just like grouped aggregate functions. But unlike a grouped aggregate that collapses set of rows into a single row, a window function performs the aggregation for each row in the result set.

There are two types of window functions:

SQL aggregate functions used as window functions: COUNT, SUM, AVG, MIN, MAX, BIT_OR, BIT_AND, BIT_XOR, STDDEV_POP, STDDEV_SAMP, VAR_POP and VAR_SAMP
Specialized window functions: RANK, DENSE_RANK, PERCENT_RANK, CUME_DIST, NTILE, ROW_NUMBER, FIRST_VALUE, LAST_VALUE, NTH_VALUE, LEAD and LAG

Let's assume that we have a table containing sales by employee:

SQL

CREATE TABLE sales(employee VARCHAR(50), `date` DATE, sale INT);
 
INSERT INTO sales VALUES
  ('Peter', '2020-03-01', 100),
  ('Peter', '2020-04-01', 200),
  ('Peter', '2020-05-01', 300),
  ('Peter', '2020-06-01', 100),
  ('John',  '2020-03-01', 300),
  ('John',  '2020-04-01', 400),
  ('John',  '2020-05-01', 500),
  ('Jane',  '2020-03-01', 600),
  ('Jane',  '2020-04-01', 800)
;

In the following select, we have a grouped aggregate and there is no GROUP BY clause, but it is implicit and contains all rows:

SQL

SELECT employee, SUM(sale) FROM sales GROUP BY employee;

Unlike GROUP BY, we can let a window function only see the rows of a subset of the total set of rows. This is called a partition, which is similar to grouping, except that the sums are different for each salesman:

SQL

SELECT employee, date, sale, SUM(sale) OVER (PARTITION BY employee) AS sum FROM sales;

To see the sales of the different months, and how the contributions from our salesmen contribute:

SQL

SELECT employee, MONTHNAME(date) AS month, sale, SUM(sale)
  OVER (PARTITION BY MONTH(date)) AS sum FROM sales;

The window specification can contain an ordering clause for the rows in a partition:

SQL

SELECT employee, sale, date, SUM(sale)
  OVER (PARTITION by employee ORDER BY date) AS cum_sales FROM sales;

The above window specification is equivalent to the explicit:

SQL

(PARTITION by employee ORDER BY date RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

that is, for each sorted row, the SUM should see all rows before it (UNBOUNDED), and up to and including the current ROW. To calculate the percentage of total sales by employee, we can use:

SQL

SELECT employee, sale, sale*100/SUM(sale)
  OVER (PARTITION BY employee) AS percent_of_sales FROM sales;

Expressions as Default Values

MySQL now supports use of expressions as default values for the BLOB, TEXT, GEOMETRY and JSON data types, which previously could not be assigned default values at all.

SQL

CREATE TABLE table1 (
  field1 FLOAT       DEFAULT (RAND() * 10),
  field2 BINARY(16)  DEFAULT (UUID_TO_BIN(UUID())),
  field3 DATE        DEFAULT (CURRENT_DATE + INTERVAL 1 YEAR),
  field4 JSON        DEFAULT (JSON_ARRAY()),
  field5 BLOB        DEFAULT (SUBSTR(UUID(), 1, 18)),
  field6 TEXT        DEFAULT (CONCAT('UUID: ', UUID())),
  field7 POINT       DEFAULT (Point(5, 10)),
  field8 GEOMETRY    DEFAULT (ST_MPointFromText('MULTIPOINT (1 1, 2 2, 3 3)'))
);

Descending Indexes

Previously, DESC could be used in defining an index, but this flag was ignored. Key values are now physically stored in descending order. Previously, it was possible to scan indexes in reverse order, but this resulted in performance degradation. The descending index can now be scanned in forward order, which is more efficient.

The benefits can be better understood with an example:

SQL

CREATE TABLE tab1 (
  col1 INT, col2 INT,
  INDEX idx1 (col1 ASC, col2 ASC),
  INDEX idx2 (col1 ASC, col2 DESC),
  INDEX idx3 (col1 DESC, col2 ASC),
  INDEX idx4 (col1 DESC, col2 DESC)
);

The optimizer can perform a forward index scan for each ORDER BY clause and not use the file sort operation:

SQL

ORDER BY col1 ASC,  col2 ASC    -- optimizer can use idx1
ORDER BY col1 ASC,  col2 DESC   -- optimizer can use idx2
ORDER BY col1 DESC, col2 ASC    -- optimizer can use idx3
ORDER BY col1 DESC, col2 DESC   -- optimizer can use idx4

Functional Index Key Parts

Starting with version 8.0.13, MySQL supports function key parts that index expression values rather than column values or column prefixes. Using parts of a function key allows you to index values that are not directly stored in the table:

SQL

CREATE TABLE tab1 (
  col1 VARCHAR(10),
  col2 VARCHAR(10),
  col3 INT,
  col4 INT,
  INDEX f_idx1 ((concat(col1, col2))),
  INDEX f_idx2 ((ABS(col3))),
  INDEX f_idx3 ((col3 + col4)),
  INDEX f_idx4 ((col3 + col4), (col3 - col4), col3),
  INDEX f_idx5 ((col3 * 40) DESC),
  INDEX f_idx6 ((md5(col1)))
);

Additional Target Types for Casts

Beginning with MySQL 8.0.17, the CAST and CONVERT functions now support conversions to the DOUBLE, FLOAT, and REAL types:

SQL

SELECT CAST('123.45' AS DOUBLE); -- 123.45
SELECT CAST('123.45' AS FLOAT);  -- 123.45
SELECT CAST('123.45123123123123' AS FLOAT(10)); -- 123.451
SELECT CAST('123.45123123123123' AS FLOAT(20)); -- 123.451
SELECT CAST('123.45123123123123' AS FLOAT(50)); -- 123.45123123123123
SELECT CAST('123.45123123123123' AS REAL);      -- 123.45123123123123

As of 8.0.22, the server allows casting to YEAR:

SQL

SELECT CONVERT(1994.35, YEAR), CONVERT(1994.50, YEAR);            -- 1994, 1995
SELECT CONVERT(20.35, YEAR), CONVERT(20.50, YEAR);                -- 2020, 2021
SELECT CONVERT("15:20:00", YEAR), CONVERT(TIME "15:20:00", YEAR); -- 2015, 2021

Row and Column Aliases With On Duplicate Key Update

Beginning with MySQL 8.0.19, it is possible to reference the row to be inserted and its columns, using aliases. Using the alias new for the new row, and (if needed) the aliases m and n for this row's columns, the INSERT statement:

SQL

INSERT INTO tab1 SET a=5, b=8
ON DUPLICATE KEY UPDATE a = VALUES(a) + VALUES(b);

can be rewritten in many different ways:

SQL

INSERT INTO tab1 SET a = 5, b = 8 AS new
ON DUPLICATE KEY UPDATE a = new.a + new.b;

SQL

INSERT INTO tab1 VALUES(5, 8) AS new
ON DUPLICATE KEY UPDATE a = new.a + new.b;

SQL

INSERT INTO tab1 SET a = 5, b = 8 AS new(m, n)
ON DUPLICATE KEY UPDATE a = m + n;

SQL

INSERT INTO tab1 VALUES(6, 8) AS new(m, n)
ON DUPLICATE KEY UPDATE a = m + n;

Regular Expression Support

In addition to the already existing functions NOT REGEXP, REGEXP and RLIKE, MySQL 8 provides four new ones:

REGEXP_LIKE - whether string matches regular expression
REGEXP_INSTR - starting index of substring matching regular expression
REGEXP_REPLACE - replace substrings matching regular expression
REGEXP_SUBSTR - return substring matching regular expression

REGEXP_LIKE takes a "subject" string, a pattern that gets compiled into a regular expression, and an optional parameter string, more on that later. In its simplest forms, you use it like this:

SQL

SELECT regexp_like('aba', 'b+');   -- 1
SELECT regexp_like('aba', 'b{2}'); -- 0

MySQL now implements regular expression support using International Components for Unicode (ICU), which provides full Unicode support and is multibyte safe, so we can collate apples and cherries 🙂:

SQL

SELECT regexp_like('🍎🍎🍎🍒🍒🍒', '🍎🍒'); -- 1
SELECT regexp_like('🍎🍎🍎🍒🍒', '🍒🍎');   -- 0

REGEXP_INSTR is very similar, but it doesn’t just return a 1 or 0. It gives you the position of the match, or 0 if there was none, and a few more options.

For instance, you can choose starting position for search:

SQL

SELECT regexp_instr( 'abba', 'b{2}', 2 ); -- 2
SELECT regexp_instr( 'abba', 'b{2}', 3 ); -- 0

which occurrence of the match it is that counts:

SQL

SELECT regexp_instr( 'abbabba', 'b{2}', 1, 2 ); -- 5
SELECT regexp_instr( 'abbabba', 'b{2}', 1, 3 ); -- 0

and the position after match:

SQL

SELECT regexp_instr( 'abbabba', 'b{2}', 1, 2, 0 ); -- 5
SELECT regexp_instr( 'abbabba', 'b{2}', 1, 3, 1 ); -- 7

0 means that MySQL should return the first position of the match, while 1 means the position after the match.

REGEXP_SUBSTR, in addition to expression and pattern, takes three optional arguments: position, occurrence and match_type:

SQL

SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+'); -- abc
SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+', 1, 3); -- ghi

REGEXP_REPLACE performs a full search-and-replace operation:

SQL

SELECT REGEXP_REPLACE('abc def ghi', '[a-z]+', 'X', 1, 3); -- abc def X

Conclusion

Based on the description of these new functions, an analogy can be drawn with some industrial DBMS, in particular with Oracle. For example, Common Table Expressions are a kind of CONNECT BY LEVEL in Oracle, and Window Functions are similar in syntax to Analytic Functions, and the sets of functions also overlap.

REGEXP_REPLACE and other functions now provide a powerful string manipulation mechanism. UTF8MB4 is the default character set now, whereby SQL performance – such as sorting UTF8MB4 strings – has been improved by a factor of 20 in 8.0 as compared to 5.7.

In addition to what was covered, also added: NOWAIT and SKIP LOCKED, JSON enhancements, GROUPING function, GIS, Cost Model and Histograms, and more.

There have also been a number of architectural improvements and performance optimizations that DBAs are sure to be interested in.

To summarize, we can say that MySQL is gradually becoming a more mature, industrial tool, while continuing to constantly refine and improve. Despite a number of specific drawbacks of MySQL, we can be sure that the new version is one step closer towards creating a stable and functional solution that will continue to be popular in its segment.

A full description of new features is available at the links: MySQL reference manual and MySQL team blog.

History

15^th March, 2021: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)