In this tip, you will learn about MySQL 8's top 8 new features that can make your code more efficient.
Introduction
Even though MySQL 8 was released almost 3 years ago (in April 2018), some projects in development are still targeting version 5.7 without considering the useful features of version 8, which can make your code more efficient. In addition, some features continue to appear and improve as minor versions are released.
Let's take a look at 8 of the most interesting ones.
8 of 8 TL;DR:
- Common Table Expressions
- Window Functions
- Descending Indexes
- Expressions as default values
- Functional index key parts
- Additional target types for casts
- Row and column aliases with ON DUPLICATE KEY UPDATE
- Regular expression support
Common Table Expressions (CTEs)
A CTE (also known as WITH
query) is a named temporary result set that exists within the scope of a single statement and that can be referred to later within that statement, recursively or not:
WITH
cte1 AS (SELECT a, b FROM table1),
cte2 AS (SELECT c, d FROM table2)
SELECT b, d FROM cte1 JOIN cte2 WHERE cte1.a = cte2.c;
Recursive CTEs can be useful to generate some data (Oracle "Connect By Prior" equivalent). For example, we can generate 30 rows and there is no need to create a special procedure or temporary table:
WITH RECURSIVE sequence AS (
SELECT 0 AS level
UNION ALL
SELECT level + 1 AS value FROM sequence WHERE sequence.level < 30
)
SELECT level
FROM sequence;
We can use these rows to generate some random data:
WITH RECURSIVE sequence AS (
SELECT 1 AS level
UNION ALL
SELECT level + 1 AS value FROM sequence WHERE sequence.level < 10
)
SELECT LEVEL,
REGEXP_REPLACE(CONCAT(SUBSTRING(MD5(UUID()), 1, 20 + rand()*20) , _
'@mail.com'), '\\d', '') random_email,
concat('+', FLOOR(rand() * 100), ' ', FLOOR(rand() * 1000), ' ', _
FLOOR(rand() * 1000), ' ', FLOOR(rand() * 10000)) random_phone,
REGEXP_REPLACE(MAKE_SET(rand()*9999999999999999,
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', _
'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'
), ',', '') random_str,
date(date_add(now(), INTERVAL LEVEL DAY)) days_sequence,
date_format(date_add(now(), INTERVAL LEVEL YEAR), '%Y') years_sequence,
date_add(now(), INTERVAL rand() * 999999999 SECOND) rand_future_date,
date_add(now(), INTERVAL - rand() * 999999999 - 16 * 365.25 * 24 * 3600 SECOND) _
rand_past_date_older16
FROM sequence;
As of MySQL 8.0.19, the recursive SELECT
part of a recursive CTE supports a LIMIT
clause, OFFSET
is also supported:
WITH RECURSIVE sequence AS (
SELECT 1 AS level
UNION ALL
SELECT level + 1 AS value FROM sequence WHERE sequence.level < 100
)
SELECT LEVEL
FROM sequence
LIMIT 10 OFFSET 20;
Expressions can also be used to create mathematical sequences such as Fibonacci numbers or to traverse data hierarchically:
WITH RECURSIVE fibonacci (n, fib_n, next_fib_n) AS
(
SELECT 1, 0, 1
UNION ALL
SELECT n + 1, next_fib_n, fib_n + next_fib_n
FROM fibonacci WHERE n < 20
)
SELECT * FROM fibonacci;
Window Functions
Window functions perform some group calculation on a set of rows, just like grouped aggregate functions. But unlike a grouped aggregate that collapses set of rows into a single row, a window function performs the aggregation for each row in the result set.
There are two types of window functions:
- SQL aggregate functions used as window functions:
COUNT
, SUM
, AVG
, MIN
, MAX
, BIT_OR
, BIT_AND
, BIT_XOR
, STDDEV_POP
, STDDEV_SAMP
, VAR_POP
and VAR_SAMP
- Specialized window functions:
RANK
, DENSE_RANK
, PERCENT_RANK
, CUME_DIST
, NTILE
, ROW_NUMBER
, FIRST_VALUE
, LAST_VALUE
, NTH_VALUE
, LEAD
and LAG
Let's assume that we have a table
containing sales
by employee
:
CREATE TABLE sales(employee VARCHAR(50), `date` DATE, sale INT);
INSERT INTO sales VALUES
('Peter', '2020-03-01', 100),
('Peter', '2020-04-01', 200),
('Peter', '2020-05-01', 300),
('Peter', '2020-06-01', 100),
('John', '2020-03-01', 300),
('John', '2020-04-01', 400),
('John', '2020-05-01', 500),
('Jane', '2020-03-01', 600),
('Jane', '2020-04-01', 800)
;
In the following select
, we have a grouped aggregate and there is no GROUP BY
clause, but it is implicit and contains all rows:
SELECT employee, SUM(sale) FROM sales GROUP BY employee;
Unlike GROUP BY
, we can let a window function only see the rows of a subset of the total set of rows. This is called a partition, which is similar to grouping, except that the sums are different for each salesman:
SELECT employee, date, sale, SUM(sale) OVER (PARTITION BY employee) AS sum FROM sales;
To see the sales of the different months, and how the contributions from our salesmen contribute:
SELECT employee, MONTHNAME(date) AS month, sale, SUM(sale)
OVER (PARTITION BY MONTH(date)) AS sum FROM sales;
The window specification can contain an ordering clause for the rows in a partition:
SELECT employee, sale, date, SUM(sale)
OVER (PARTITION by employee ORDER BY date) AS cum_sales FROM sales;
The above window specification is equivalent to the explicit:
(PARTITION by employee ORDER BY date RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
that is, for each sorted row, the SUM
should see all rows before it (UNBOUNDED
), and up to and including the current ROW
. To calculate the percentage of total sales by employee
, we can use:
SELECT employee, sale, sale*100/SUM(sale)
OVER (PARTITION BY employee) AS percent_of_sales FROM sales;
Expressions as Default Values
MySQL now supports use of expressions as default values for the BLOB
, TEXT
, GEOMETRY
and JSON
data types, which previously could not be assigned default values at all.
CREATE TABLE table1 (
field1 FLOAT DEFAULT (RAND() * 10),
field2 BINARY(16) DEFAULT (UUID_TO_BIN(UUID())),
field3 DATE DEFAULT (CURRENT_DATE + INTERVAL 1 YEAR),
field4 JSON DEFAULT (JSON_ARRAY()),
field5 BLOB DEFAULT (SUBSTR(UUID(), 1, 18)),
field6 TEXT DEFAULT (CONCAT('UUID: ', UUID())),
field7 POINT DEFAULT (Point(5, 10)),
field8 GEOMETRY DEFAULT (ST_MPointFromText('MULTIPOINT (1 1, 2 2, 3 3)'))
);
Descending Indexes
Previously, DESC
could be used in defining an index, but this flag was ignored. Key values are now physically stored in descending order. Previously, it was possible to scan indexes in reverse order, but this resulted in performance degradation. The descending index can now be scanned in forward order, which is more efficient.
The benefits can be better understood with an example:
CREATE TABLE tab1 (
col1 INT, col2 INT,
INDEX idx1 (col1 ASC, col2 ASC),
INDEX idx2 (col1 ASC, col2 DESC),
INDEX idx3 (col1 DESC, col2 ASC),
INDEX idx4 (col1 DESC, col2 DESC)
);
The optimizer can perform a forward index scan for each ORDER BY
clause and not use the file sort operation:
ORDER BY col1 ASC, col2 ASC
ORDER BY col1 ASC, col2 DESC
ORDER BY col1 DESC, col2 ASC
ORDER BY col1 DESC, col2 DESC
Functional Index Key Parts
Starting with version 8.0.13, MySQL supports function key parts that index expression values rather than column values or column prefixes. Using parts of a function key allows you to index values that are not directly stored in the table:
CREATE TABLE tab1 (
col1 VARCHAR(10),
col2 VARCHAR(10),
col3 INT,
col4 INT,
INDEX f_idx1 ((concat(col1, col2))),
INDEX f_idx2 ((ABS(col3))),
INDEX f_idx3 ((col3 + col4)),
INDEX f_idx4 ((col3 + col4), (col3 - col4), col3),
INDEX f_idx5 ((col3 * 40) DESC),
INDEX f_idx6 ((md5(col1)))
);
Additional Target Types for Casts
Beginning with MySQL 8.0.17, the CAST
and CONVERT
functions now support conversions to the DOUBLE
, FLOAT
, and REAL
types:
SELECT CAST('123.45' AS DOUBLE);
SELECT CAST('123.45' AS FLOAT);
SELECT CAST('123.45123123123123' AS FLOAT(10));
SELECT CAST('123.45123123123123' AS FLOAT(20));
SELECT CAST('123.45123123123123' AS FLOAT(50));
SELECT CAST('123.45123123123123' AS REAL);
As of 8.0.22, the server allows casting to YEAR
:
SELECT CONVERT(1994.35, YEAR), CONVERT(1994.50, YEAR);
SELECT CONVERT(20.35, YEAR), CONVERT(20.50, YEAR);
SELECT CONVERT("15:20:00", YEAR), CONVERT(TIME "15:20:00", YEAR);
Row and Column Aliases With On Duplicate Key Update
Beginning with MySQL 8.0.19, it is possible to reference the row to be inserted and its columns, using aliases. Using the alias new for the new row, and (if needed) the aliases m and n for this row's columns, the INSERT
statement:
INSERT INTO tab1 SET a=5, b=8
ON DUPLICATE KEY UPDATE a = VALUES(a) + VALUES(b);
can be rewritten in many different ways:
INSERT INTO tab1 SET a = 5, b = 8 AS new
ON DUPLICATE KEY UPDATE a = new.a + new.b;
INSERT INTO tab1 VALUES(5, 8) AS new
ON DUPLICATE KEY UPDATE a = new.a + new.b;
INSERT INTO tab1 SET a = 5, b = 8 AS new(m, n)
ON DUPLICATE KEY UPDATE a = m + n;
INSERT INTO tab1 VALUES(6, 8) AS new(m, n)
ON DUPLICATE KEY UPDATE a = m + n;
Regular Expression Support
In addition to the already existing functions NOT REGEXP
, REGEXP
and RLIKE
, MySQL 8 provides four new ones:
REGEXP_LIKE
- whether string
matches regular expression REGEXP_INSTR
- starting index of substring matching regular expression REGEXP_REPLACE
- replace substrings matching regular expression REGEXP_SUBSTR
- return substring matching regular expression
REGEXP_LIKE
takes a "subject
" string
, a pattern that gets compiled into a regular expression, and an optional parameter string
, more on that later. In its simplest forms, you use it like this:
SELECT regexp_like('aba', 'b+');
SELECT regexp_like('aba', 'b{2}');
MySQL now implements regular expression support using International Components for Unicode (ICU), which provides full Unicode support and is multibyte safe, so we can collate apples and cherries 🙂:
SELECT regexp_like('🍎🍎🍎🍒🍒🍒', '🍎🍒');
SELECT regexp_like('🍎🍎🍎🍒🍒', '🍒🍎');
REGEXP_INSTR
is very similar, but it doesn’t just return a 1
or 0
. It gives you the position of the match, or 0
if there was none, and a few more options.
For instance, you can choose starting position for search:
SELECT regexp_instr( 'abba', 'b{2}', 2 );
SELECT regexp_instr( 'abba', 'b{2}', 3 );
which occurrence of the match it is that counts:
SELECT regexp_instr( 'abbabba', 'b{2}', 1, 2 );
SELECT regexp_instr( 'abbabba', 'b{2}', 1, 3 );
and the position after match:
SELECT regexp_instr( 'abbabba', 'b{2}', 1, 2, 0 );
SELECT regexp_instr( 'abbabba', 'b{2}', 1, 3, 1 );
0
means that MySQL should return the first position of the match, while 1
means the position after the match.
REGEXP_SUBSTR
, in addition to expression and pattern, takes three optional arguments: position, occurrence and match_type:
SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+');
SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+', 1, 3);
REGEXP_REPLACE
performs a full search-and-replace operation:
SELECT REGEXP_REPLACE('abc def ghi', '[a-z]+', 'X', 1, 3);
Conclusion
Based on the description of these new functions, an analogy can be drawn with some industrial DBMS, in particular with Oracle. For example, Common Table Expressions are a kind of CONNECT BY LEVEL
in Oracle, and Window Functions are similar in syntax to Analytic Functions, and the sets of functions also overlap.
REGEXP_REPLACE
and other functions now provide a powerful string
manipulation mechanism. UTF8MB4
is the default character set now, whereby SQL performance – such as sorting UTF8MB4 string
s – has been improved by a factor of 20 in 8.0 as compared to 5.7.
In addition to what was covered, also added: NOWAIT
and SKIP LOCKED
, JSON enhancements, GROUPING
function, GIS, Cost Model and Histograms, and more.
There have also been a number of architectural improvements and performance optimizations that DBAs are sure to be interested in.
To summarize, we can say that MySQL is gradually becoming a more mature, industrial tool, while continuing to constantly refine and improve. Despite a number of specific drawbacks of MySQL, we can be sure that the new version is one step closer towards creating a stable and functional solution that will continue to be popular in its segment.
A full description of new features is available at the links: MySQL reference manual and MySQL team blog.
History
- 15th March, 2021: Initial version