Introduction
Many times, there is a need to extract large sets of data from a database. In these cases, most DB tools often fail, either with out-of-memory exceptions, buffer problems, memory access violations or they simply stop responding.
Thus a need emerges to extract the data in the cleanest possible way - by spooling it directly to a text file, mostly as comma-separated values.
Unfortunately, one cannot simply spool an ordinary query - there is some output manipulation in the query itself that needs to be done, otherwise the result will be unusable.
The tool presented in this article is an automatization of the process of preparing a regular SQL query to a form where it would output delimiter-separated values that would be readable and transferrable to, i.e., Excel format.
Background
As already explained, in order to spool the results of a query from Oracle database, the query needs to be prepared in a special way for the output data to be usable.
There are several things to be taken into consideration: first, we need to set the proper session parameters, then we need to mark to SQL processing engine where the spool begins and where it ends, and lastly, to achieve the delimiter-separated output format, we need to manipulate the output of columns from the query in a certain way - all of which is explained in further text.
P.S. I am not saying that this is the only way that spool can work from an Oracle database, but after reading many articles and comments all around the Internet, the conclusion is definitely that the Oracle spool engine has some limitations which need to be overridden manually.
Necessary Session Parameters
SET PAGESIZE 0 LINESIZE 5000 TRIMSPOOL ON FEEDBACK OFF ECHO OFF TERMOUT OFF VERIFY OFF;
PAGESIZE 0
is used to set the page size to infinite - otherwise, the rows in the output would be split 20 by 20 rows (by default) between which there would be an empty line and header repeated on each page. LINESIZE 5000
means to set the size of one line to 5000 characters - otherwise, the line would break after 1024 characters (by default). You can put here any number >= 1 and <= 32767. TRIMSPOOL ON
means that anything that is shorter than the linesize will be trimmed - otherwise you would get whitespaces for each line until max linesize. FEEDBACK OFF
means that SQL engine feedback is turned off - otherwise, you get messages inside your output, such as 'xxx rows selected'. ECHO OFF
means the echo command will not work - otherwise, its output would also end up in the delimited file. TERMOUT OFF
is used to turn off the output of the SQL command itself - otherwise, there is the command also written out in the output file along with the results. VERIFY OFF
is used to turn off information about substitution off substitute variables inside the query - otherwise, the output would contain something like this at the top:
old: ... &as_of_date ...<br /> new: ... 31.08.2019 ...
Oracle Spool Engine Limitations and Workarounds
There is a possibility to create a delimited file from query results through spool automatically using Oracle engine - without the changes to the original query.
This would be done using a list of session parameters like this one:
set heading on
set colsep ;
set headsep on
set pagesize 0
set trimspool on
set linesize 5000
However, this will produce a huge amount of whitespaces, because the default behaviour is such that each column is output with as many characters as it is defined, so if a column is defined as VARCHAR(100)
, it will be 100 characters long in every row - i.e., if there is a value 'Hello world!
' which is 12 characters long, it will be followed by 88 whitespaces.
So, in order to avoid this, we are not using the colsep
parameter to achieve delimiting, but rather we are wrapping the original query inside a query that has only one column, which incorporates all the columns from the original query (now subquery) joined together in one big string and separated with the chosen delimiter, like this:
SELECT
col1 || ';' ||
col2 || ';' ||
col3 || ';' ||
...
FROM
(
SELECT
col1,
col2,
col3,
...
FROM
...
)
Also, the heading functionality often doesn't work quite as expected, so the heading is not showing; therefore, we also apply the same logic for the heading:
SELECT
'col1;' ||
'col2;' ||
'col3;' ||
...
FROM DUAL
UNION ALL
...
There is one special requirement from the original query in order for this to work: ALL THE COLUMNS FROM THE ORIGINAL QUERY NEED TO BE SINGLE-WORD OR HAVE AN ALIAS!!!
These are OK:
column_1
t.column_1
nvl(t.column_1, 0) col1
sum(t.column_1 + t.column_2) col12
These are not OK:
nvl(t.column_1, 0)
t.column_1 + t.column_2
case when column_1 is null then 0 else column_1 end
Resulting Query
SET PAGESIZE 0 LINESIZE 5000 TRIMSPOOL ON FEEDBACK OFF ECHO OFF TERMOUT OFF VERIFY OFF;
SPOOL "&spool_location";
SELECT
'col1;' ||
'col2;' ||
'col3;' ||
...
FROM DUAL
UNION ALL
SELECT
col1 || ';' ||
col2 || ';' ||
col3 || ';' ||
...
FROM
(
SELECT
col1,
col2,
col3,
...
FROM
...
)
Using the Code
HTML Layout
The tool consists of 2 buttons, input and output textarea
, and 2 textarea
s for line numbering.
The original query is written or pasted into the upper textarea
, and the result is produced by clicking the Convert
button and displayed in the lower textarea
(which is readonly
).
The labels below the textarea
s are displaying the current selection.
Clear button clears both upper and lower textarea
.
I will not go into details about the HTML code, I will just mention the event handlers that are defined on each element, and the IDs of the elements:
Convert button:
onclick="convert()"
Clear button:
onclick="clear_text()"
Input line numbering textarea:
id="sql_in_rownr"
Input textarea:
id="sql_in" onclick="selectionchanged(this)" onkeyup="keyup(this,event)"
onfocus="inputfocus()" onfocusout="resetinput()" oninput="input_changed()"
onscroll="scroll_changed(this)"
Output line numbering textarea:
id="sql_out_rownr"
Output textarea:
id="sql_out" onclick="selectionchanged(this)"
onkeyup="keyup(this,event)" onscroll="scroll_changed(this)"
JavaScript
Event Handlers
The following functions are mainly responsible for the HTML display, and are not related to parsing or conversion of the SQL code:
keyup(obj, e)
- This function is called on onkeyup
event and enables scrolling with keyboard. selectionchanged(obj)
- Refreshes the current selection display. Called from keyup
function and on onclick
event. initialize()
- Called on onload
event. Initializes line numbering. populate_rownr(obj, cntline)
- Function that fills rownr
textareas with line numbers. input_changed()
- Called by the oninput
event on input textarea
. It counts the lines of the changed input, compares to the number of lines before the change, and updates rownr
and scrolls accordingly. scroll_changed(obj)
- Called upon the onscroll
event on both input and output textarea
s. Makes sure that the scrolls are in sync between the line numbering and main textarea
. scrollsync(obj1, obj2)
- Called by scroll_changed
event handler. count_lines(txt)
- counts the lines in the string txt
. inputfocus()
- When the input textarea is in focus, this function deletes the default text. resetinput()
- When the input textarea loses focus, this function brings back the default text (if the textarea is empty). clear_text()
- Clears both input and output textareas, and resets selections and line numbering.
Line Numbering
The line numbering on this page is more thoroughly explained in my other article:
SQL Parsing and Conversion
The function convert()
is the main function for converting input SQL and creating output SQL code. It calls all the sub-functions, and handles display of error messages.
First, it checks whether the input SQL query is valid - this means that there must be only one command - if there is more than one command, it will report an error.
insql = trimFunc(sql_in.value);
res = checksql(insql);
Note that the function checksql
does a very robust check; it will go through the entire query char by char, and count the number of occurrences of character ';'.
function checksql(sql)
{
var inquote = false;
var comm_inline = false, comm_mult = false;
var cnt = 0;
for(var i=0; i<sql.length; i++)
{
c1 = sql.substring(i, i+1);
c2 = sql.substring(i, i+2);
if(c1=='\'' || c1=='\"') inquote=!inquote;
if(c2=='--') comm_inline=!comm_inline;
if(c1=='\n' && comm_inline) comm_inline = false;
if(c2=='/*' || c2=='*/') comm_mult = !comm_mult;
if(!inquote && !comm_inline && !comm_mult && c1==';' &&
trimFunc(sql.substring(i,sql.length-1))!='')
{
error_msg = 'INVALID SQL - Multiple commands detected!';
return -1;
}
}
return 0;
}
After this, function gather_cols
is called.
res = gather_cols();
This function first searches and isolates the SELECT
clause by calling the searchsql
function, and then parses the SELECT
clause char
by char
, to extract each column expression, and finally calls the get_alias
function on each of these expressions to extract the valid column name or alias.
Based on the return values from the functions searchsql
and get_alias
, it sets the error message and returns to the main function convert()
.
function gather_cols()
{
var sql = insql;
if(sql.substring(sql.length - 1,sql.length) == ';')
sql = sql.substring(0, sql.length - 1);
sql = searchsql(sql, 'SELECT', 'FROM');
if(sql == '' || sql == 'SELECT')
{
cols = '';
sql == '' ? error_msg = 'INVALID SQL - SELECT not found!' :
error_msg = 'INVALID SQL - FROM not found!';
return -1;
}
sql = trimFunc(sql.substring('SELECT'.length+1,sql.length));
cols = '';
var cnt_par = 0;
var inquote = false;
var expr = '';
var alias = '';
var c;
for(var i = 0; i < sql.length; i++)
{
if(i <= sql.length - 1) c = sql.substring(i, i + 1);
else c = ',';
if(c == '(') cnt_par++;
else if(c == ')') cnt_par--;
if(c == '\"') inquote = !inquote;
if(c == ',' && !inquote && cnt_par == 0)
{
expr = trimFunc(expr);
alias = get_alias(expr);
if(alias == '')
{
error_msg = 'INVALID SQL - Invalid or no alias found in expression' + expr;
return -1;
}
cols = cols + alias + ';' + '\n';
expr = '';
}
else
{
expr = expr + c;
}
}
if(cols != '')
{
cols = cols.substring(0, cols.length - (';' + '\n').length);
}
return 0;
}
The function searchsql
is designed to return everything between 2 keywords from the SQL query.
It parses the SQL expression char
by char
, first looking for the keyword1
, and then when it finds it, records all the characters in a string
variable until it hits the keyword2
.
(The function needs to track when it is in single/double quotes, parentheses or comments, as these should not be taken into consideration.)
If it doesn't find the first keyword, it will return an empty string
, and if it doesn't find the second keyword, it will return only the first keyword.
function searchsql(sql,keyw_from,keyw_to)
{
var found1 = false, found2 = false;
var tmpsql = sql.toUpperCase();
var tmpkeyw1 = keyw_from.toUpperCase();
var tmpkeyw2 = keyw_to.toUpperCase();
var retval = '';
var inquote = false;
var cnt_par = 0, comm_inline = false, comm_mult = false;
var lensql = tmpsql.length;
var skip = true;
var c1, c2, cw1, cw2, lookbehind, lookahead1, lookahead2;
for(var i=0; i<lensql; i++)
{
c1 = tmpsql.substring(i, i+1);
c2 = tmpsql.substring(i, i+2);
cw1 = tmpsql.substring(i, i+tmpkeyw1.length);
cw2 = tmpsql.substring(i, i+tmpkeyw2.length);
lookbehind = tmpsql.substring(i-1, i);
lookahead1 = tmpsql.substring(i+tmpkeyw1.length,
i+tmpkeyw1.length+1);
lookahead2 = tmpsql.substring(i+tmpkeyw2.length,
i+tmpkeyw2.length+1);
if(c1=='\'' || c1=='\"') inquote=!inquote;
if(c1=='(' || c1==')') c1=='(' ? cnt_par++ : cnt_par--;
if(c2=='--' && !comm_inline || c1=='\n' && comm_inline)
comm_inline = !comm_inline;
if(c2=='/*' && !comm_mult || c2=='*/' && comm_mult)
{
comm_mult = !comm_mult;
i++;
continue;
}
if(!comm_inline && !comm_mult)
{
if(cw1==tmpkeyw1 && trimFunc(lookbehind)==''
&& trimFunc(lookahead1)=='' && cnt_par == 0 &&
!inquote)
{
skip = false;
found1 = true;
}
if(!skip)
{
if(cw2==tmpkeyw2 && trimFunc(lookbehind)==''
&& trimFunc(lookahead2)=='' && cnt_par == 0
&& !inquote)
{
found2 = true;
break;
}
retval += sql.substring(i, i+1);
}
}
}
if(!found1) return '';
if(!found2) return keyw_from;
return retval;
}
The get_alias
function is a bit complicated, since there were various challenges that needed to be considered, such as:
- column name or alias might contain whitespaces, and is encapsulated with double quotes
- there could be
string
s in the expression, containing whitespaces, which makes it harder to detect separate words in the expression as a whole - there could be also whitespaces across the expression, that are not word separators, such as in front or after arithmetic operators
- there could also be whitespaces inside functions and commas inside functions, which would make it harder to detect actual words that could represent alias candidates
In order to overcome these challenges, I used substitutions and mapping dictionaries to make things easier and more readable before actually splitting the expression into words and checking the possible alias.
for(var i = 0; i < expr.length; i++)
{
c = expr.substring(i,i+1);
if(c=='\'') inquote1=!inquote1;
if(c=='\"') inquote2=!inquote2;
if(inquote1 || c=='\'') word1 += c;
if(inquote2 || c=='\"') word2 += c;
if(!inquote1 && word1.length > 0)
{
dict_quotes1['#' + i + '#'] = word1;
tmpexpr = tmpexpr.replace(word1, '#' + i + '#');
word1='';
}
if(!inquote2 && word2.length > 0)
{
dict_quotes2['#' + i + '#'] = word2;
tmpexpr = tmpexpr.replace(word2, '#' + i + '#');
word2='';
}
}
word1='';
for(i = 0; i < tmpexpr.length; i++)
{
c = tmpexpr.substring(i,i+1);
if(c=='(') inpar++;
if(c==')') inpar--;
if(inpar>0 || c=='(' || c==')') word1 += c;
if(inpar==0 && word1.length > 0)
{
dict_par['#' & i & '#'] = word1;
tmpexpr = tmpexpr.replace(word1, '#' + i + '#');
word1='';
}
}
Also, I needed to erase all the leftover whitespaces around arithmetic operators; these several replace commands with regular expressions did the trick:
tmpexpr = tmpexpr.replace(/ *\+ */g,'+').replace(/ *\- */g,'-').replace
(/ *\* */g,'*').replace(/ *\/ */g,'/');
Finally, when the expression is readable, the function will split the expression into "words", and we presume the last word should be alias.
The following rules need to be satisfied to have a valid alias:
- If the last word is a single-quote word, then it is not a valid alias.
- If the word contains dot but is not the only word in the expression, it is not a valid alias.
- If the word contains arithmetic operators, it is not an alias.
At last, the dot notation is eliminated, and if the alias is double-quoted, then the original word is taken out of the mapping dictionary.
The entire code for the function get_alias
:
function get_alias(expr)
{
var alias;
var dict_quotes1 = {};
var dict_quotes2 = {};
var dict_par = {};
var tmpexpr = expr;
var inquote1 = false, inquote2 = false;
var c, word1 = '', word2 = '';
var inpar = 0;
for(var i = 0; i < expr.length; i++)
{
c = expr.substring(i,i+1);
if(c=='\'') inquote1=!inquote1;
if(c=='\"') inquote2=!inquote2;
if(inquote1 || c=='\'') word1 += c;
if(inquote2 || c=='\"') word2 += c;
if(!inquote1 && word1.length > 0)
{
dict_quotes1['#' + i + '#'] = word1;
tmpexpr = tmpexpr.replace(word1, '#' + i + '#');
word1='';
}
if(!inquote2 && word2.length > 0)
{
dict_quotes2['#' + i + '#'] = word2;
tmpexpr = tmpexpr.replace(word2, '#' + i + '#');
word2='';
}
}
word1='';
for(i = 0; i < tmpexpr.length; i++)
{
c = tmpexpr.substring(i,i+1);
if(c=='(') inpar++;
if(c==')') inpar--;
if(inpar>0 || c=='(' || c==')') word1 += c;
if(inpar==0 && word1.length > 0)
{
dict_par['#' & i & '#'] = word1;
tmpexpr = tmpexpr.replace(word1, '#' + i + '#');
word1='';
}
}
tmpexpr = tmpexpr.replace(/ *\+ */g,'+').replace(/ *\- */g,'-').replace
(/ *\* */g,'*').replace(/ *\/ */g,'/');
alias = tmpexpr.split(' ')[tmpexpr.split(' ').length-1];
for(var key in dict_par)
if(alias==key) return '';
for(var key in dict_quotes1)
if(alias==key) return '';
if(alias.replace('.','').length != alias.length && tmpexpr.split(' ').length > 1
|| alias.replace('+','').replace('-','').replace('*','').replace('/','').length
!= alias.length) return '';
alias = alias.split('.')[alias.split('.').length-1];
for(var key in dict_quotes2)
if(alias==key) alias = dict_quotes2[key];
return trimFunc(alias);
}
Spool Query Output
Once all the columns are extracted, the only thing left to do is to arrange the output SQL, which is done by function generate_spool_sql
:
function generate_spool_sql()
{
var colsarr = cols.split('\n');
cols = '';
var colshead = '';
var col = '';
for(i = 0; i < colsarr.length; i++)
{
col = colsarr[i];
if(i < colsarr.length - 1) col = col.substring(0, col.length - 1);
colshead += '\'' + col.replace(/\"/g, '') + '\'||\';\'';
cols += col + '||\';\'';
if(i < colsarr.length - 1)
{
colshead += '||' + '\n';
cols += '||' + '\n';
}
}
// get rid of ; from the end of the insql
if(insql.substring(insql.length-1,insql.length)==';')
insql = insql.substring(0,insql.length-1);
// output of spool script
outsql = sql_set + '\n' + '\n' +
'SPOOL \"&spool_location\";' + '\n' + '\n' +
'SELECT' + '\n' +
colshead + '\nFROM DUAL\nUNION ALL\n' +
'SELECT\n' + cols + '\nFROM\n(\n' +
insql + '\n);\n\n' +
'SPOOL OFF;\n' +
'CLEAR BUFFER;';
}
History
- 17th August, 2019: Initial version