How I can get back from query only not null fields? Yes, you can do it easily with a modern programming language, but you may lose resources by getting redundant information from tables and, many times, it could add redundant code. This task is only oriented to solve in MySQL since version 5.3.
Introduction
You have a lot of information with a tons of spreadsheets, csv, xml, bin, logs or plain text files and after a lite oversight, there are several columns inside the tables that have no values or, of course, NULL
values. For desktop users, it could be only an annoyance. On another hand, for Cloud users, you may not have cheap resources and costs only pay low bill. And for DBA, those NULL
S are a lot of space of cent 'spents' that have no data. But the fact is that ETL jobs only bring data, not guilty here. The present article solved this situation using a very high popular MySQL database engine, for purpose of didactical way.
Background
I present an example of the situation with the help of a database called cinemashop
, based on a streaming video store service company. It is running on MySQL, but it would be functional also for MariaDB and Percona forks. Work further could be advanced from here for these ones.
Using the Code
Inside cinemashop
database, there is a customer
table with several columns as the row size can allowed put there (at date, over 1000 columns).
MySQL ETL jobs put the information there and only it is necessary to take data from columns with no null
values. The next figure shows a small content of the first registers:
So, it is worthy checking if all the registers in the field have at least solely one data. This one data alone could change the model of the datalake / datawarehouse / reporting service input, may be 1% of the population. Due to this situation, the following procedure called check_field_null
generates this behaviour. The procedure uses PREPARE
, EXECUTE
and DEALLOCATE
statements to manage SQL dynamically. It stores the result of the COUNT
scalar function in QN
variable. Thus, it is possible to get the value of how many register are NULL
values.
use cinemashop;
drop procedure if exists check_field_null;
set delimiter //
create procedure check_field_null(col varchar(64), schemaname varchar(255), _
tablename varchar(255), out QN int)
BEGIN
SET @sSQL = concat('SELECT @N := COUNT(*) FROM ', schemaname, '.', _
tablename , ' WHERE (', col, ' <=> NULL);');
prepare stm from @sSQL;
execute stm;
set QN =@N;
deallocate prepare stm;
END
//
set delimiter ;
The following code illustrates the action of the above mentioned function. With respect to customer_id
field, it shows that there are values.
set @p = 0;
call check_field_null('customer_id', 'cinemashop', 'customer', @p);
select @p;
By another hand, if a field like gender
has only NULL
values, the expected result is checked by the counter:
set @p = 0;
call check_field_null('gender', 'cinemashop', 'customer', @p);
select @p;
Finally, the work consists of getting the field name of each column in the table. Today, it could be a data structure, but tomorrow, it could have changed on demand, and, of course, NULL
values may be presented there. This way uses CURSOR
technique inside a stored procedure called cur_cs_customer
. It only works if information_schema
is allowed to fetch, so, some permissions will be required there. Note that Count_Null
variable is used in the same way of our previous p
variable. Also, allcols
variable stores all fields from each column in the MYCOL
variable that do not have any NULL
values.
use cinemashop;
drop procedure if exists cur_cs_customer;
set delimiter //
create procedure cur_cs_customer(inout allcols varchar(255))
BEGIN
DECLARE Count_Null int default 0;
DECLARE initial INT DEFAULT 0;
DECLARE MYCOL char(64);
DECLARE ch_done INT DEFAULT 0;
DECLARE cs_cur1 CURSOR FOR SELECT C.COLUMN_NAME
FROM information_schema.COLUMNS C
WHERE C.TABLE_SCHEMA = 'cinemashop' _
AND C.TABLE_NAME ='customer';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET ch_done = true;
open cs_cur1;
read_cs_cur1:
LOOP
FETCH cs_cur1 INTO MYCOL;
IF (ch_done ) THEN
LEAVE read_cs_cur1;
END IF;
IF NOT isnull(MYCOL) THEN
call check_field_null(MYCOL, 'cinemashop', 'customer', Count_Null);
if Count_Null = 0 then
set initial = initial + 1;
if initial = 1 then
SET allcols = MYCOL;
else
SET allcols = concat( cast(allcols as char(255)), ',', MYCOL);
end if;
end if;
END IF;
END LOOP read_cs_cur1;
close cs_cur1;
select allcols;
END
//
Put all together, we can invoke with non interactive statements using @my_args
variable like a string
of all columns with no NULL
values.
set delimiter ;
call cur_cs_customer(@my_args);
select @my_args;
set @stm = concat('SELECT ', @my_args, ' FROM cinemashop.customer;');
PREPARE stmt1 FROM @stm;
execute stmt1;
deallocate prepare stmt1;
When the process runs, it gets the following output:
Of course, cur_cs_customer
procedure can be created with dynamic SQL for generic use, but we may check this adventure later. My purpose for this moment is only to get a primer solution.
Points of Interest
MySQL is not my favorite database engine, but, a lot of large websites have this RDBMS due to simplicity with the use of many programming languages from web development. Also, my job is focused on platforms integrations where the cookbook has some tasks of ETL, Datawarehouse, Data Pipelines, Data lakes, Data hubs and, of course, Databases for Data Mining & Data Science tasks.
History
- 9th April, 2020: Initial version