Optimized way to join tables residing across multiple databases in Oracle

Karthik Narahari

5.00/5 (1 vote)

1 Apr 2013CPOL4 min read

32.6K

Querying data from tables across multiple databases when we need data from all those tables in a single query.

Introduction

With a huge enterprise level datasystems these days, we might have some requirements where we would need to get data from different databases residing on different servers (sometimes residing at different locations), Here we might need to get data joining two tables residing on different databases

For example there exists a table_a under schema_a in database_a and table_b under schema_b in database_b with below columns:

table_a columns
Employee_id
Employee_name

table_b columns
Employee_id
Employee_email

Let's say we need to write a query to get top 9 employee details with there names and emails.

Here this might look simple, thinking that we can join two tables by refering tables with dblinks pointing to needed database just like below queries.

SQL

CREATE DATABASE LINK "ABDBLINK.com" CONNECT TO schema_a IDENTIFIED BY <PWD> USING database_b';

select * from table_a x,table_b@ABDBLINK.com y where x.id=y.id where rownum > 10;

Background

The above query works perfect to get data needed, but if we learn how the above query works, we will get to know how this would create problems when those tables have thousands of records and when we are expecting only a couple from those. When we run the query, actually oracle will fetch all the records in the table_B from database_B into database_a and then try to filter the matching records in both table_a and table_b So now we can imagine how much data traffic it would cause to move thousands of records across databases just to fetch a couple from those.

So the other alternative we can think about is, we need to get only the required records from table_b in database_b to database_a and then try joining both to get the columns needed from both the tables but unfortunately oracle allows only primitive data types (Number, Varchar..etc..) to return using a function across databases but here we need to get a table type (user defined type) in order to be able to join both.

Using The Code

Here I came with a thought of how we can use an old technique/ predefined procedure in oracle which is used to execute dynamic SQL queries called DBMS_SQL. The main advantage or reason to use this is DBMS_SQL is a procedure and as all know oracle supports RPC (Remote Procedure Calls) and this procedure can be executed across databases, Let us see how this works (Refer the link to know more on DBMS_SQL ).

Define a function in database_b which would execute a query to return only the needed records from table_b with the help of its input parameter which would be sent while calling this function from database_a

SQL

PROCEDURE DefineQueryCursor (c IN OUT NUMBER, p_emp_ids VARCHAR)
IS
   rc   NUMBER;
BEGIN
   c := DBMS_SQL.open_cursor;


   DBMS_SQL.parse (c,
                   'SELECT *
 FROM table_b Where employee_id in (' || p_emp_ids || ')',
                   DBMS_SQL.native);


   rc := DBMS_SQL.Execute (c);
END DefineQueryCursor;

We would be calling the above function from database_a to get only the required employee details. Observe the first parameter in function is a out param that sends out the cursor reference in database_b to database_a on which the query would get executed and the results would be saved from that point of cursor where as the second parameter is used to filter out the needed records from table_b. Here I would send a comma separated employee ids from database_a that are needed for us.

Declare a record type and table type variables in database_a with the columns needed from table_b to catch the records/result set we get from query declared in the above function.

SQL

TYPE rec_employee IS RECORD
   (
      EMPLOYEE_ID             VARCHAR (10),
      EMPLOYEE_MAIL     VARCHAR2 (4000)
   );

TYPE tab_employee IS TABLE OF rec_employee;

Define a function in database_a to call the function in database_b and then wrap the data in a table type, we use pipelined functions here that helps in returning table types, please go through the link for more information on pipelined functions

SQL

FUNCTION get_employee_info
      RETURN tab_employee
      PIPELINED
   AS
      v_result        rec_employee;
      cmfs            VARCHAR (4000);
      c               NUMBER;
      v_employee_id             DBMS_SQL@ABDBLINK.com.Varchar2_Table;
      v_employee_name   DBMS_SQL@ABDBLINK.com.Varchar2_Table;
      indx            NUMBER := 0;
      d               NUMBER;
      fetch_count     NUMBER := 10;
   BEGIN
   /*Below Query gives us top 9 employee ids each separated by a comma in a single record 
     which would be sent to function defined in database_b to get the details of only those 9 employees*/
      SELECT RTRIM (
                XMLAGG (XMLELEMENT (e, employee_id || '|')).EXTRACT (
                   '//text()'),
                ',')
        INTO emp_ids
        FROM (SELECT t1.employee_id
                FROM table_a t1
               WHERE rownum < 10);

      SELECT COUNT (t1.employee_id) + 5
        INTO fetch_count
        FROM table_a t1
       WHERE rownum < 10;

    /* Below we are calling the function defined in database_b with a OUT param c that holds 
       the cusrsor reference at database_b and a second param which have all the ids that we need*/
      DefineQueryCursor@ABDBLINK.com (c, emp_ids); 

    /* Below are array definations which we need to define based on the results of query in database_b,
    C is the cursor reference,1 is the column index in the query in database_b,
    employee_id is the column name corresponding to the column index 1 in the query result at database_b,
    fetch_count is a variable which says how many records does the procedure need to fetch at one stretch 
    here we declared it with the no of records we are expecting so as to get all the records in a single stretch */

      dbms_sql@ABDBLINK.com.define_array (c,
                                 1,
                                 employee_id,
                                 fetch_count,
                                 indx);

      dbms_sql@ABDBLINK.com.define_array (c,
                                 2,
                                 employee_email,
                                 fetch_count,
                                 indx);

     
      LOOP
         d := dbms_sql@ABDBLINK.com.fetch_rows (c);

     /*Below we save all the record values into a table variable that we already declared */
         dbms_sql@ABDBLINK.com.COLUMN_VALUE (c, 1, v_employee_id);
         dbms_sql@ABDBLINK.com.COLUMN_VALUE (c, 2, v_employee_name);
         
         EXIT WHEN d != fetch_count;
      END LOOP;

      IF v_employee_id.COUNT > 0
      THEN

    /* Below we save all the values in local table variable 
       and then pipe it to the record type variables we declared earlier */

         FOR i IN v_employee_id.FIRST .. v_employee_id.LAST
         LOOP
            v_result.EMPLOYEE_ID := v_employee_id (i);
            v_result.EMPLOYEE_MAIL := v_employee_name (i);
            
            PIPE ROW (v_result);
         END LOOP;
      END IF;

      dbms_sql@ABDBLINK.com.close_cursor (c);

      RETURN;
   EXCEPTION
      WHEN OTHERS
      THEN
         IF DBMS_SQL@ABDBLINK.com.IS_OPEN (c)
         THEN
            dbms_sql@ABDBLINK.com.close_cursor (c);
         END IF;

         raise_application_error (
            -20001,
            'func-getempdetails-dbms_sql: - ' || SQLERRM);
   END get_employee_info;

Please go through the comments for a detailed explanation at each step

Now we can get only the needed records from table_b in database_b by querying the above function as below in.............................. of course database_a
SQL
```
select * from table(get_employee_info());
```
So now we can easily join the both the above query and table_a and that would give us the required/expected result.
SQL
```
SELECT t1.employee_id, t1.employee_name, t2.employee_email
  FROM table_a t1, TABLE (get_employee_info ()) t2
 WHERE t1.employee_id = t2.employee_id
```

Points of Interest

I know all these looks a bit cumbersome or heavy to maintain but if we can organize all these functions in a single package in two databases, that would give us a lot of performance to the application using these tables, Below are some sample timestamps which we in our team experienced (though table/database names are not real, all the times are facts which we experienced)

Time elapsed When used the query as mentioned in my introduction is 3Min 10 Sec, check the below explain plan

Time elapsed when used the query using DBMS_SQL is 900ms, check the below explain plan

we can observe in the above explain plans that though the cost is less in both the cases, Bytes transferred is very high in the first case which was greatly reduced with out approach.

History

Version created on 4/1/2013.
Added images to show the performance on 4/2/2013

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)