Problem statement
In a recent
assignment I came across an issue where we needed to fetch changed records
from the database. The data is updated regularly and the application has
to fetch the modified records at set intervals and send them to the requester.
The initial
design was to add a timestamp column (named modification time) to the table and
use it to know when a record has been modified or created. The idea was to save
the modification time for the table in another table. In each fetch query,
fetch the records where the timestamp in the modification column is greater than the
last saved timestamp. After the query, fetch the max modification time in the
second table for use in a subsequent fetch query.
This looks
like a simple and easy design for this scenario. But soon we discovered a major issue with
this approach. Here the records are being updated by concurrent transactions.
Different transactions can run for different time durations. This could cause
the records committed late to having modification time before the time committed
by some other transaction. This in turn would lead to some records getting
missed in the fetch query.
The following
scenario will help in understanding the issue in detail (the time taken here are for illustration purpose only, actual times
might be quite less):
- There is table ‘A’. A transaction
T1 starts at 10:00.
- T1 updates record with Primary Key = 1 in table A at 10:05 and continues with other table updates.
- T1 commits at 10:20. So record with
Primary Key = 1 in table A will get updated with time
10:05.
- There was another transaction T2 which starts at 10:10 and updates record with Primary Key = 2 in table A at 10:12.
- T2 commits at 10:14.
- Assume the subscription
notification for this ran at 9:50 last time.
Now the
subscription notification will run at 10:15 and it will search for records with
time > 9:50. It will find a record with Primary Key = 2. The subscription notification read time for table A will be
reset to 10:12 from 9:50.
When T1 commits at 10:20, the subscription notification will run again. This time it
will look for records with time > 10:12. Since records with Primary Key = 1 has time as 10:05 it will
not be picked ever.
The issue here is the commit time is different from the record time and with
multiple transactions, other transactions update records with a later time,
which may result in some records never being picked up while reading. Hence we need
to design a solution in such a way in which we can pick records in the
order of commit performed on the database.
ORA_ROWSCN pseudo column
ORA_ROWSCN
is
based on the internal Oracle system clock, the SCN (System Change Number).
Every time you commit in Oracle, the SCN advances. This pseudo column is useful
for determining approximately when a row was last updated. It is not absolutely
precise, because Oracle tracks SCNs by transaction committed for the block in
which the row resides.
By default, ORA_ROWSCN
is stored at the block level, not at the row level. That is, by default many
rows on a single block will share the same ORA_ROWSCN
value. If you update a
row on a block with 50 other rows, then they will all have their ORA_ROWSCN
advanced as well.
We can obtain a
more fine-grained approximation of the SCN at row level by creating tables
with row-level dependency tracking. We need to use ROWDEPENDENCIES
while
creating a table to store ORA_ROWSCN
at the row level. It increases the size of
each row by 6 bytes.
The key feature of ORA_ROWSCN
that is useful for us is that it is incremented when
the transaction is committed. So it can be used in our solution to find out the
rows committed after the last read.
Solution details
Below are the high level details of the solution to find
the delta records using this pseudo column.
- Recreate
all the tables with
ROWDEPENDENCIES
enabled. This will enable ORA_ROWSCN
to be
stored at row level. - In
each fetch query we need to fetch the
ORA_ROWSCN
as well along with the other
data. - Find the maximum value of
ORA_ROWSCN
for each
table in the above query. Store this value for that fetch query. - In subsequent fetch queries fetch the records which have
ORA_ROWSCN
> the value stored in
the above step.
Query_Details (sample table structure)
Column Name | Description |
Query id | The query id. |
Table Name | Name of table. |
Last Read Number | The max ORA_ROWSCN number from last read.
|
The following example illustrates how this solution will fetch the correct delta records in
a concurrent transactions scenario.
- Request 1 initiates Transaction 1 (T1)
- Start time: 10:00
- Updates table A record with Primary Key = 1 at 10:05
- Continues with updating other tables of the
transaction…
- Request
2 initiates Transaction 2 (T2)
- Start time: 10:10
- Updates table A record with Primary Key =2 at 10:12
- Commits
ORA_ROWSCN
is set to say 1000.
- Fetch query runs at 10:15
- Query with ID 1 fetches data from table A
- The last read number for (max.
ORA_ROWSCN
stored for query
ID) table A
becomes 1000
- T1 commits at 10:20
- Table A record with Primary Key = 1 updated with 10:15
ORA_ROWSCN
is set to say 1020.
- Fetch query runs at 10:25
- Query with ID 1 fetches data from table A with
ORA_ROWSCN
> 1000 - This will fetch the record committed in step 4
References