Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / web / HTML

Prepare (Oracle) SQL for Spooling

5.00/5 (1 vote)
16 Aug 2019CPOL8 min read 10.3K   67  
An HTML tool written in JavaScript to convert a regular Oracle SQL query into a spooling script

Introduction

Many times, there is a need to extract large sets of data from a database. In these cases, most DB tools often fail, either with out-of-memory exceptions, buffer problems, memory access violations or they simply stop responding.

Thus a need emerges to extract the data in the cleanest possible way - by spooling it directly to a text file, mostly as comma-separated values.

Unfortunately, one cannot simply spool an ordinary query - there is some output manipulation in the query itself that needs to be done, otherwise the result will be unusable.

The tool presented in this article is an automatization of the process of preparing a regular SQL query to a form where it would output delimiter-separated values that would be readable and transferrable to, i.e., Excel format.

Background

As already explained, in order to spool the results of a query from Oracle database, the query needs to be prepared in a special way for the output data to be usable.

There are several things to be taken into consideration: first, we need to set the proper session parameters, then we need to mark to SQL processing engine where the spool begins and where it ends, and lastly, to achieve the delimiter-separated output format, we need to manipulate the output of columns from the query in a certain way - all of which is explained in further text.

P.S. I am not saying that this is the only way that spool can work from an Oracle database, but after reading many articles and comments all around the Internet, the conclusion is definitely that the Oracle spool engine has some limitations which need to be overridden manually.

Necessary Session Parameters

SQL
SET PAGESIZE 0 LINESIZE 5000 TRIMSPOOL ON FEEDBACK OFF ECHO OFF TERMOUT OFF VERIFY OFF;
  • PAGESIZE 0 is used to set the page size to infinite - otherwise, the rows in the output would be split 20 by 20 rows (by default) between which there would be an empty line and header repeated on each page.
  • LINESIZE 5000 means to set the size of one line to 5000 characters - otherwise, the line would break after 1024 characters (by default). You can put here any number >= 1 and <= 32767.
  • TRIMSPOOL ON means that anything that is shorter than the linesize will be trimmed - otherwise you would get whitespaces for each line until max linesize.
  • FEEDBACK OFF means that SQL engine feedback is turned off - otherwise, you get messages inside your output, such as 'xxx rows selected'.
  • ECHO OFF means the echo command will not work - otherwise, its output would also end up in the delimited file.
  • TERMOUT OFF is used to turn off the output of the SQL command itself - otherwise, there is the command also written out in the output file along with the results.
  • VERIFY OFF is used to turn off information about substitution off substitute variables inside the query - otherwise, the output would contain something like this at the top: 
    old: ... &as_of_date ...<br /> new: ... 31.08.2019 ...

Oracle Spool Engine Limitations and Workarounds

There is a possibility to create a delimited file from query results through spool automatically using Oracle engine - without the changes to the original query.

This would be done using a list of session parameters like this one:

SQL
set heading on
set colsep ;
set headsep on
set pagesize 0
set trimspool on
set linesize 5000

However, this will produce a huge amount of whitespaces, because the default behaviour is such that each column is output with as many characters as it is defined, so if a column is defined as VARCHAR(100), it will be 100 characters long in every row - i.e., if there is a value 'Hello world!' which is 12 characters long, it will be followed by 88 whitespaces.

So, in order to avoid this, we are not using the colsep parameter to achieve delimiting, but rather we are wrapping the original query inside a query that has only one column, which incorporates all the columns from the original query (now subquery) joined together in one big string and separated with the chosen delimiter, like this:

SQL
SELECT
col1 || ';' ||
col2 || ';' ||
col3  || ';' ||
...
FROM
(
SELECT
col1,
col2,
col3,
...
FROM
...
)

Also, the heading functionality often doesn't work quite as expected, so the heading is not showing; therefore, we also apply the same logic for the heading:

SQL
SELECT
'col1;' ||
'col2;' ||
'col3;' ||
...
FROM DUAL
UNION ALL
...

There is one special requirement from the original query in order for this to work: ALL THE COLUMNS FROM THE ORIGINAL QUERY NEED TO BE SINGLE-WORD OR HAVE AN ALIAS!!!

These are OK:

SQL
column_1

t.column_1

nvl(t.column_1, 0) col1

sum(t.column_1 + t.column_2) col12

These are not OK:

SQL
nvl(t.column_1, 0)

t.column_1 + t.column_2

case when column_1 is null then 0 else column_1 end

Resulting Query

SQL
SET PAGESIZE 0 LINESIZE 5000 TRIMSPOOL ON FEEDBACK OFF ECHO OFF  TERMOUT OFF VERIFY OFF;

SPOOL "&spool_location";

SELECT
'col1;' ||
'col2;' ||
'col3;' ||
...
FROM DUAL
UNION ALL
SELECT
col1 || ';' ||
col2 || ';' ||
col3  || ';' ||
...
FROM
(
-- original query
SELECT
col1,
col2,
col3,
...
FROM
...
-- end of original query
)

Using the Code

HTML Layout

Image 1

The tool consists of 2 buttons, input and output textarea, and 2 textareas for line numbering.

The original query is written or pasted into the upper textarea, and the result is produced by clicking the Convert button and displayed in the lower textarea (which is readonly).

The labels below the textareas are displaying the current selection.

Clear button clears both upper and lower textarea.

I will not go into details about the HTML code, I will just mention the event handlers that are defined on each element, and the IDs of the elements:

Convert button:

HTML
onclick="convert()"

Clear button:

HTML
onclick="clear_text()"

Input line numbering textarea:

HTML
id="sql_in_rownr"

Input textarea:

HTML
id="sql_in" onclick="selectionchanged(this)" onkeyup="keyup(this,event)" 
onfocus="inputfocus()" onfocusout="resetinput()" oninput="input_changed()" 
onscroll="scroll_changed(this)"

Output line numbering textarea:

HTML
id="sql_out_rownr"

Output textarea:

HTML
id="sql_out" onclick="selectionchanged(this)" 
    onkeyup="keyup(this,event)" onscroll="scroll_changed(this)"

JavaScript

Event Handlers

The following functions are mainly responsible for the HTML display, and are not related to parsing or conversion of the SQL code:

  • keyup(obj, e) - This function is called on onkeyup event and enables scrolling with keyboard.
  • selectionchanged(obj) - Refreshes the current selection display. Called from keyup function and on onclick event.
  • initialize() - Called on onload event. Initializes line numbering.
  • populate_rownr(obj, cntline) - Function that fills rownr textareas with line numbers.
  • input_changed() - Called by the oninput event on input textarea. It counts the lines of the changed input, compares to the number of lines before the change, and updates rownr and scrolls accordingly.
  • scroll_changed(obj) - Called upon the onscroll event on both input and output textareas. Makes sure that the scrolls are in sync between the line numbering and main textarea.
  • scrollsync(obj1, obj2) - Called by scroll_changed event handler.
  • count_lines(txt) - counts the lines in the string txt.
  • inputfocus() - When the input textarea is in focus, this function deletes the default text.
  • resetinput() - When the input textarea loses focus, this function brings back the default text (if the textarea is empty).
  • clear_text() - Clears both input and output textareas, and resets selections and line numbering.

Line Numbering

The line numbering on this page is more thoroughly explained in my other article:

SQL Parsing and Conversion

The function convert() is the main function for converting input SQL and creating output SQL code. It calls all the sub-functions, and handles display of error messages.

First, it checks whether the input SQL query is valid - this means that there must be only one command - if there is more than one command, it will report an error.

JavaScript
insql = trimFunc(sql_in.value);
res = checksql(insql);

Note that the function checksql does a very robust check; it will go through the entire query char by char, and count the number of occurrences of character ';'.

JavaScript
function checksql(sql)
{
    // check if the sql is a single query

    var inquote = false;
    var comm_inline = false, comm_mult = false;
    var cnt = 0;
    for(var i=0; i<sql.length; i++)
    {
        c1 = sql.substring(i, i+1);
        c2 = sql.substring(i, i+2);

        if(c1=='\'' || c1=='\"') inquote=!inquote;
        if(c2=='--') comm_inline=!comm_inline;
        if(c1=='\n' && comm_inline) comm_inline = false;
        if(c2=='/*' || c2=='*/') comm_mult = !comm_mult;

        if(!inquote && !comm_inline && !comm_mult && c1==';' &&
                        trimFunc(sql.substring(i,sql.length-1))!='')
        {
            error_msg = 'INVALID SQL - Multiple commands detected!';
            return -1;
        }
    }
    return 0;
}

After this, function gather_cols is called.

JavaScript
res = gather_cols();

This function first searches and isolates the SELECT clause by calling the searchsql function, and then parses the SELECT clause char by char, to extract each column expression, and finally calls the get_alias function on each of these expressions to extract the valid column name or alias.

Based on the return values from the functions searchsql and get_alias, it sets the error message and returns to the main function convert().

JavaScript
function gather_cols()
{
    // runs through input sql char by char
    // identifies SELECT clause by calling searchsql
    // isolates expressions from the SELECT clause,
    // and from each expression extracts alias by calling get_alias

    var sql = insql;

    // eliminate ; in the end of SQL
    if(sql.substring(sql.length - 1,sql.length) == ';')
           sql = sql.substring(0, sql.length - 1);

    // search for SELECT clause
    sql = searchsql(sql, 'SELECT', 'FROM');
    if(sql == '' || sql == 'SELECT')
    {
        cols = '';
        sql == '' ? error_msg = 'INVALID SQL - SELECT not found!' :
                    error_msg = 'INVALID SQL - FROM not found!';
        return -1;
    }
    sql = trimFunc(sql.substring('SELECT'.length+1,sql.length));

    cols = '';
    var cnt_par = 0;
    var inquote = false;
    var expr = '';
    var alias = '';
    var c;
    // run through the query char by char
    // identify each column expression - search for ,
    // ignore if you are inside a comment or parentheses or quotes
    // for each expr call get_alias
    for(var i = 0; i < sql.length; i++)
    {
        if(i <= sql.length - 1) c = sql.substring(i, i + 1);
        else c = ',';
        // check for parentheses
        if(c == '(') cnt_par++;
        else if(c == ')') cnt_par--;
        // check for quotes
        if(c == '\"') inquote = !inquote;
        if(c == ',' && !inquote && cnt_par == 0) // found end of expression
        {
            expr = trimFunc(expr);
            // get alias from expr
            alias = get_alias(expr);
            if(alias == '')
            {
                error_msg = 'INVALID SQL - Invalid or no alias found in expression' + expr;
                return -1;
            }
            cols = cols + alias + ';' + '\n';
            expr = '';
        }
        else
        {
            expr = expr + c;
        }
    }
    if(cols != '')
    {
        cols = cols.substring(0, cols.length - (';' + '\n').length);
    }
    return 0;
}

The function searchsql is designed to return everything between 2 keywords from the SQL query.

It parses the SQL expression char by char, first looking for the keyword1, and then when it finds it, records all the characters in a string variable until it hits the keyword2.

(The function needs to track when it is in single/double quotes, parentheses or comments, as these should not be taken into consideration.)

If it doesn't find the first keyword, it will return an empty string, and if it doesn't find the second keyword, it will return only the first keyword.

JavaScript
function searchsql(sql,keyw_from,keyw_to)
{
    /*
    this function will return the substring of sql string
    from the first occurrence of keyw_from (not in parentheses or comment or quote)
    until the first following occurrence of keyw_to
    (also not in parentheses or comment or quote)
    + it will erase all the comments from sql
    */

    var found1 = false, found2 = false;
    var tmpsql = sql.toUpperCase();
    var tmpkeyw1 = keyw_from.toUpperCase();
    var tmpkeyw2 = keyw_to.toUpperCase();
    var retval = '';
    // search for first keyword keyw outside of parentheses,
    // comments or quotes (eliminate WITH clause)
    var inquote = false;
    var cnt_par = 0, comm_inline = false, comm_mult = false;
    var lensql = tmpsql.length;
    var skip = true;
    var c1, c2, cw1, cw2, lookbehind, lookahead1, lookahead2;
    // read sql char by char
    // skip everything until first occurrence of keyw_from found
    // (ignore comments, parentheses and quotes)
    // after keyw_from found, stop skipping until keyw_to reached, then return
    for(var i=0; i<lensql; i++)
    {
        c1 = tmpsql.substring(i, i+1); // current char
        c2 = tmpsql.substring(i, i+2); // current and next char

        cw1 = tmpsql.substring(i, i+tmpkeyw1.length); // chars in length of keyw_from
        cw2 = tmpsql.substring(i, i+tmpkeyw2.length); // chars in length of keyw_to

        lookbehind = tmpsql.substring(i-1, i); // prev character
        lookahead1 = tmpsql.substring(i+tmpkeyw1.length,
                                      i+tmpkeyw1.length+1); // char following cw1
        lookahead2 = tmpsql.substring(i+tmpkeyw2.length,
                                      i+tmpkeyw2.length+1); // char following cw2

        if(c1=='\'' || c1=='\"') inquote=!inquote; // check quotes
        if(c1=='(' || c1==')') c1=='(' ? cnt_par++ : cnt_par--; // check parentheses
        if(c2=='--' && !comm_inline || c1=='\n' && comm_inline)
                        comm_inline = !comm_inline;          // check inline comment
        if(c2=='/*' && !comm_mult || c2=='*/' && comm_mult)  // check multiline comment
        {
            comm_mult = !comm_mult;
            i++;
            continue;
        }

        if(!comm_inline && !comm_mult) // ignore comments
        {
            // keyw1 found - stop skipping chars
            if(cw1==tmpkeyw1 && trimFunc(lookbehind)==''
                 && trimFunc(lookahead1)=='' && cnt_par == 0 &&
                 !inquote) // 1st keyw found - it has to be surrounded by blank chars,
                           // and not in parentheses or quotes
            {
                skip = false;
                found1 = true;
            }
            if(!skip)
            {
                if(cw2==tmpkeyw2 && trimFunc(lookbehind)==''
                       && trimFunc(lookahead2)=='' && cnt_par == 0
                       && !inquote) // 2nd keyw found - it has to be surrounded
                                    // by blank chars, and not in parentheses or quotes
                {
                    found2 = true;
                    break;
                }
                retval += sql.substring(i, i+1);
            }
        }
    }
    if(!found1) return ''; // first keyw not found
    if(!found2) return keyw_from; // found only the first keyw -- second keyw not found
    return retval;
}

The get_alias function is a bit complicated, since there were various challenges that needed to be considered, such as:

  • column name or alias might contain whitespaces, and is encapsulated with double quotes
  • there could be strings in the expression, containing whitespaces, which makes it harder to detect separate words in the expression as a whole
  • there could be also whitespaces across the expression, that are not word separators, such as in front or after arithmetic operators
  • there could also be whitespaces inside functions and commas inside functions, which would make it harder to detect actual words that could represent alias candidates

In order to overcome these challenges, I used substitutions and mapping dictionaries to make things easier and more readable before actually splitting the expression into words and checking the possible alias.

JavaScript
//map-substitute single and double quoted words
for(var i = 0; i < expr.length; i++)
{
    c = expr.substring(i,i+1);
    if(c=='\'')    inquote1=!inquote1;
    if(c=='\"')    inquote2=!inquote2;
    if(inquote1 || c=='\'') word1 += c;
    if(inquote2 || c=='\"') word2 += c;
    if(!inquote1 && word1.length > 0)
    {
        dict_quotes1['#' + i + '#'] = word1;
        tmpexpr = tmpexpr.replace(word1, '#' + i + '#');
        word1='';
    }
    if(!inquote2 && word2.length > 0)
    {
        dict_quotes2['#' + i + '#'] = word2;
        tmpexpr = tmpexpr.replace(word2, '#' + i + '#');
        word2='';
    }
}
// map-substitute expressions in parentheses
word1='';
for(i = 0; i < tmpexpr.length; i++)
{
    c = tmpexpr.substring(i,i+1);
    if(c=='(') inpar++;
    if(c==')') inpar--;
    if(inpar>0 || c=='(' || c==')') word1 += c;
    if(inpar==0 && word1.length > 0)
    {
        dict_par['#' & i & '#'] = word1;
        tmpexpr = tmpexpr.replace(word1, '#' + i + '#');
        word1='';
    }
}

Also, I needed to erase all the leftover whitespaces around arithmetic operators; these several replace commands with regular expressions did the trick:

JavaScript
tmpexpr = tmpexpr.replace(/ *\+ */g,'+').replace(/ *\- */g,'-').replace
          (/ *\* */g,'*').replace(/ *\/ */g,'/');

Finally, when the expression is readable, the function will split the expression into "words", and we presume the last word should be alias.

The following rules need to be satisfied to have a valid alias:

  • If the last word is a single-quote word, then it is not a valid alias.
  • If the word contains dot but is not the only word in the expression, it is not a valid alias.
  • If the word contains arithmetic operators, it is not an alias.

At last, the dot notation is eliminated, and if the alias is double-quoted, then the original word is taken out of the mapping dictionary.

The entire code for the function get_alias:

JavaScript
function get_alias(expr)
{
    // function to extract alias from a column expression from SELECT clause
    // first it searches for all single and double-quoted expressions
    // and replaces them with some mapping
    // then it does the same thing with expressions in parentheses
    // --> this is done to clean out the expression, to be more readable
    // next it deletes all the white spaces around arithmetic operators -
    // this will be needed since the next step will be dividing the
    // expression into words --> so we want to see an arithmetic expression as one word
    // next - take the last word and presume it to be alias - check if this word
    // qualifies as alias, if OK, return it, else return blank (error)

    var alias;

    var dict_quotes1 = {};
    var dict_quotes2 = {};
    var dict_par = {};

    var tmpexpr = expr;

    var inquote1 = false, inquote2 = false;
    var c, word1 = '', word2 = '';

    var inpar = 0;

    //map-substitute single and double quoted words
    for(var i = 0; i < expr.length; i++)
    {
        c = expr.substring(i,i+1);
        if(c=='\'')    inquote1=!inquote1;
        if(c=='\"')    inquote2=!inquote2;
        if(inquote1 || c=='\'') word1 += c;
        if(inquote2 || c=='\"') word2 += c;
        if(!inquote1 && word1.length > 0)
        {
            dict_quotes1['#' + i + '#'] = word1;
            tmpexpr = tmpexpr.replace(word1, '#' + i + '#');
            word1='';
        }
        if(!inquote2 && word2.length > 0)
        {
            dict_quotes2['#' + i + '#'] = word2;
            tmpexpr = tmpexpr.replace(word2, '#' + i + '#');
            word2='';
        }
    }
    // map-substitute expressions in parentheses
    word1='';
    for(i = 0; i < tmpexpr.length; i++)
    {
        c = tmpexpr.substring(i,i+1);
        if(c=='(') inpar++;
        if(c==')') inpar--;
        if(inpar>0 || c=='(' || c==')') word1 += c;
        if(inpar==0 && word1.length > 0)
        {
            dict_par['#' & i & '#'] = word1;
            tmpexpr = tmpexpr.replace(word1, '#' + i + '#');
            word1='';
        }
    }
    // delete all the spaces around arithmetic operators
    tmpexpr = tmpexpr.replace(/ *\+ */g,'+').replace(/ *\- */g,'-').replace
              (/ *\* */g,'*').replace(/ *\/ */g,'/');
    // presume last word is alias
    alias = tmpexpr.split(' ')[tmpexpr.split(' ').length-1];
    // if the last word is in parentheses or in single quotes, this is not a valid alias
    for(var key in dict_par)
        if(alias==key) return '';
    for(var key in dict_quotes1)
        if(alias==key) return '';
    // if the alias contains table alias but is not the only word in the expression,
    // or it contains arithmetic operator, this is not a valid alias
    if(alias.replace('.','').length != alias.length && tmpexpr.split(' ').length > 1
    || alias.replace('+','').replace('-','').replace('*','').replace('/','').length
    != alias.length) return '';
    // remove any table alias
    alias = alias.split('.')[alias.split('.').length-1];
    // if it is double-qouted alias, get the original expression from the mapping dictionary
    for(var key in dict_quotes2)
        if(alias==key) alias = dict_quotes2[key];
    return trimFunc(alias);
}

Spool Query Output

Once all the columns are extracted, the only thing left to do is to arrange the output SQL, which is done by function generate_spool_sql:

JavaScript
function generate_spool_sql()
{
    // generate output (spool) sql using the previously populated column list
    // saved in variable cols
    var colsarr = cols.split('\n');
    cols = '';
    var colshead = '';
    var col = '';
    for(i = 0; i < colsarr.length; i++)
    {
        col = colsarr[i];
        if(i < colsarr.length - 1) col = col.substring(0, col.length - 1);
        colshead += '\'' + col.replace(/\"/g, '') + '\'||\';\'';
        cols += col + '||\';\'';
        if(i < colsarr.length - 1)
        {
            colshead += '||' + '\n';
            cols += '||' + '\n';
        }
    }
    // get rid of ; from the end of the insql
    if(insql.substring(insql.length-1,insql.length)==';')
        insql = insql.substring(0,insql.length-1);
    // output of spool script
    outsql = sql_set + '\n' + '\n' +
        'SPOOL \"&spool_location\";'  + '\n' + '\n' +
        'SELECT' + '\n' +
        colshead + '\nFROM DUAL\nUNION ALL\n' +
        'SELECT\n' + cols + '\nFROM\n(\n' +
        insql + '\n);\n\n' +
        'SPOOL OFF;\n' +
        'CLEAR BUFFER;';
}

History

  • 17th August, 2019: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)