In my previous article, How to use columnstore indexes in SQL Server, we saw how to create a columnstore
index on a table and how certain queries can leverage it to significantly reduce the I/O needed and thus increase performance. But once a columnstore
index is added to a table, the table becomes read-only. Trying to insert a new row into the table results in an error:
insert into sales ([date], itemid, price, quantity) values ('20110713', 1, 1.0, 1);
The error message:
Msg 35330, Level 15, State 1, Line 1
INSERT statement failed because data cannot be updated in a table
with a columnstore index. Consider disabling the columnstore index before
issuing the INSERT statement, then rebuilding the columnstore index
after INSERT is complete.
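For reference, the workaround the message suggests looks like this (a sketch; I'm assuming the columnstore index on the sales table is named cs_sales_price, to match the naming used for the staging table later in this article):

-- disable the columnstore index so the table becomes writable again
alter index cs_sales_price on sales disable;
go
insert into sales ([date], itemid, price, quantity) values ('20110713', 1, 1.0, 1);
go
-- the rebuild re-reads and re-compresses the entire table,
-- which is expensive on a large fact table
alter index cs_sales_price on sales rebuild;
go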
The error message recommends a 'workaround', but disabling and rebuilding the columnstore
index for every update may be prohibitively expensive. For the DW and BI scenarios that columnstore
indexes target, there is a much better solution: table partitioning. In SQL Server 11, the previous limit of 1,000 partitions per table has been raised to 15,000, so an ETL process can load each day's data into a new partition and still retain many years of data. The ETL process uploads the daily data into a staging table, creates a columnstore
index on the staging table, then uses the fast ALTER TABLE … SWITCH
operation to 'switch in' the new data. Using the very same example as in my previous article, let's create a staging table with a structure identical to the sales fact table:
create table sales_staging (
    [id] int not null identity (1000000,1),
    [date] date not null,
    itemid smallint not null,
    price money not null,
    quantity numeric(18,4) not null,
    constraint check_date check ([date] = '20110716')) on [PRIMARY];
go
create unique clustered index cdx_sales_staging_date_id
on sales_staging ([date], [id]) on [PRIMARY];
go
Note how the staging table has a check constraint that restricts the date to the one valid for the next partition to be switched in. Now let's populate the staging table with some more dummy sales facts:
set nocount on
go
declare @i int = 0;
begin transaction;
while @i < 250000
begin
    insert into sales_staging ([date], itemid, price, quantity)
    values ('20110716', rand()*10000, rand()*100 + 100, rand()*10.0 + 1);
    set @i += 1;
    if @i % 10000 = 0
    begin
        raiserror (N'Inserted %d', 0, 1, @i);
        commit;
        begin tran;
    end
end
commit;
go
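Note that the check constraint pins the staging table to a single day, so a daily ETL job would have to move the date forward before each load. Check constraints cannot be altered in place, so the job would drop and re-create it; a minimal sketch (the date shown is the hypothetical next day):

-- re-point the staging table at the next day's partition
alter table sales_staging drop constraint check_date;
go
alter table sales_staging
    add constraint check_date check ([date] = '20110717');
go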
Now that our fake ETL process has finished preparing the last day's sales data in the staging table, let's add a columnstore
index identical to the one on the real sales table:
create columnstore index cs_sales_price_staging
on sales_staging ([date], itemid, price, quantity);
go
OK, our staging table is complete, so let's switch it into the 'big' sales table:
alter partition scheme ps next used [PRIMARY];
alter partition function pf() split range ('20110717');
go
alter table sales_staging switch to sales partition $PARTITION.pf('20110716');
go
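To double-check that the switched-in rows landed in the expected partition, we can group the sales table by the $PARTITION function (a quick sanity check; the exact counts depend on the data loaded in the previous article):

-- count rows per partition; the partition covering '20110716'
-- should now contain the 250,000 staged rows
select $PARTITION.pf([date]) as partition_number,
    count(*) as row_count,
    min([date]) as min_date,
    max([date]) as max_date
from sales
group by $PARTITION.pf([date])
order by partition_number;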
That's it! We've just updated our sales table with the last day's sales facts, despite its columnstore
index and without ever disabling that index. The increased partition count supported in SQL Server 11, combined with the fact that aligned columnstore
indexes are supported for fast partition switch operations, makes tables with columnstore
indexes updatable in practice, as long as the ETL process uses a staging table and the ETL schedule matches the partitioning scheme.
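The same mechanism works in reverse for retiring old data: the oldest partition can be switched out into an archive table and the emptied range merged away, keeping the partition count bounded. A sketch, assuming a sales_archive table with the same structure and indexes as sales, and a hypothetical oldest boundary date of '20110101':

-- switch the oldest partition out into the archive table
alter table sales switch partition $PARTITION.pf('20110101') to sales_archive;
go
-- then remove the now-empty range from the partition function
alter partition function pf() merge range ('20110101');
go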