Save judiciously
As we have seen in the previous articles, CRUD operations on table storage entities follow slightly different rules than the ubiquitous SQL tables and rows, so we should save our data in a way that lets us perform those CRUD operations faster.
Take an e-commerce business' data, say Myntra, where each item for sale is categorised first by target audience, then by the commodity group it belongs to, and then by genre. Something like: Women > Footwear > Casual.
At first thought, we might model our entities with the item id as the primary identifier, which makes our lives easy. And that is correct from a DBA's perspective: the item id can indeed identify each record uniquely.
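A first-cut entity class might look like the minimal sketch below. It assumes the old Microsoft.WindowsAzure.StorageClient library (the same one used by the archiving code later in this article); the property names are illustrative, not taken from a real schema.

using Microsoft.WindowsAzure.StorageClient;

// First-cut entity: the item id alone identifies each record.
public class ItemEntity : TableServiceEntity
{
    public ItemEntity() { }                    // required for deserialization

    public ItemEntity(string itemId)
    {
        PartitionKey = itemId;                 // item id as the primary identifier
        RowKey = string.Empty;
    }

    public string Name { get; set; }
    public string Audience { get; set; }       // e.g. Women
    public string CommodityGroup { get; set; } // e.g. Footwear
    public string Genre { get; set; }          // e.g. Casual
    public string Vendor { get; set; }
    public double Price { get; set; }
}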
However, let's put a little more thought into how these sites are actually used by their audience. Each customer comes looking for a specific category of commodities and explores the possibilities within it, at least most of the time, when the person already knows what they want to buy. In that case, most record fetches will happen by category, not by item.
So, we could modify our entity as below:
Here we make the category the partition key, so that all items in the same category stay together on one partition and retrieving them is quick.
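A sketch of the revised entity, under the same assumptions as before (old StorageClient library, illustrative property names):

using Microsoft.WindowsAzure.StorageClient;

// Revised entity: the category becomes the PartitionKey, the item id becomes the RowKey.
public class ItemEntity : TableServiceEntity
{
    public ItemEntity() { }

    public ItemEntity(string category, string itemId)
    {
        PartitionKey = category;   // e.g. "Women-Footwear-Casual"
        RowKey = itemId;           // still unique within the partition
    }

    public string Name { get; set; }
    public double Price { get; set; }
}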
Another advantage that comes as a by-product of this entity structure: whenever the business decides to discontinue a category, deletion is also quick, since everything sits on the same partition and batch operations are therefore valid.
Also notice that I have removed the "Vendor" property from the entity. Instead, we will store the entities in tables named after vendor ids, for example HiDesign20114576Items.
This lets us drop all of a vendor's items whenever a vendor ends their contract and the business no longer wants to sell their products (which is quite a frequent situation). Deleting a table takes less than a second and is a single URL request. Had we kept the previous entity structure, we would have to loop through every partition, effectively scanning the whole table to find which items belong to the vendor, and then delete them. That would be pretty expensive.
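As a rough sketch of what that looks like with the old StorageClient library (the connection string and table name below are placeholders):

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class VendorCleanup
{
    // Dropping a vendor's whole catalogue is a single table deletion,
    // instead of scanning every partition for matching items.
    public static void DropVendor(string connectionString, string vendorTableName)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudTableClient tableClient = account.CreateCloudTableClient();

        // One HTTP request removes the table and every entity in it.
        tableClient.DeleteTableIfExist(vendorTableName);
    }
}

// Usage: VendorCleanup.DropVendor(connectionString, "HiDesign20114576Items");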
Another example is the entries in WADLogsTable. The entries keep growing rapidly, yet it takes time for developers to be sure that the logs are no longer important and can be deleted.
We can schedule a monthly job that archives the logs into tables named after the month, like those shown in the screen capture below, and then deletes those logs from WADLogsTable.
This gives us the ability to delete a whole month's logs with a single request, in a fraction of a second, whenever the person in charge decides they are no longer required.
private void Background_PurgeAndArchive(object sender, DoWorkEventArgs e)
{
    // Arguments passed in by the caller: source table name plus the start and end of the range to archive.
    List<Object> genericlist = e.Argument as List<Object>;
    SelectedTableName = genericlist[0].ToString();
    DateTime StartDate = Convert.ToDateTime(genericlist[1]);
    DateTime EndDate = Convert.ToDateTime(genericlist[2]);

    // WADLogsTable partition keys are "0" + event ticks, so a date range maps directly to a PartitionKey range.
    TableQuery = "PartitionKey ge '0" + StartDate.Ticks + "' and PartitionKey le '0" + EndDate.Ticks + "'";

    if (!OpenAccount()) return;   // initialises the CloudStorageAccount member used below

    try
    {
        CloudTableClient tableClient = CloudStorageAccount.CreateCloudTableClient();
        TableServiceContext tableServiceContext = CreateTableServiceContext(tableClient);

        // Query the source table, applying the PartitionKey range filter when one was built.
        IEnumerable<GenericEntity> entities =
            from entity in tableServiceContext.CreateQuery<GenericEntity>(SelectedTableName)
            select entity;
        if (!String.IsNullOrEmpty(TableQuery))
        {
            entities = from entity in tableServiceContext.CreateQuery<GenericEntity>(SelectedTableName)
                                                         .AddQueryOption("$filter", TableQuery)
                       select entity;
        }
        entities = entities.ToList();   // materialise once instead of re-querying per partition

        // GenericEntityComparer treats entities with the same PartitionKey as equal,
        // so Distinct gives us one representative per partition.
        var partitions = entities.Distinct(new GenericEntityComparer()).Select(p => p.PartitionKey);

        // Split each partition into chunks of at most 100 entities, the maximum size of an entity group transaction.
        IEnumerable<IEnumerable<GenericEntity>> ChunksOfWork = null;
        foreach (string partition in partitions)
        {
            var ThisPartitionEntities = entities.Where(en => en.PartitionKey == partition).ToList();
            ChunksOfWork = ChunksOfWork != null
                ? ChunksOfWork.Union(ThisPartitionEntities.Chunk(100))
                : ThisPartitionEntities.Chunk(100);
        }

        // Archive tables are named after the month, e.g. "JanLogs".
        string TableName = StartDate.ToString("MMM", CultureInfo.InvariantCulture) + "Logs";

        int archivedCount = 0;
        if (ChunksOfWork != null && NewTableIfNotExists(TableName))
        {
            // MaxDegreeOfParallelism = 1 keeps the copy sequential; raise it to parallelise the batches.
            var options = new ParallelOptions { MaxDegreeOfParallelism = 1 };

            // Step 1: copy each chunk into the monthly archive table, one batch per chunk.
            Parallel.ForEach(ChunksOfWork, options, chunk =>
            {
                CloudTableClient MonthlyTableClient = CloudStorageAccount.CreateCloudTableClient();
                TableServiceContext tsContext = CreateTableServiceContext(MonthlyTableClient);
                foreach (GenericEntity entity in chunk)
                    tsContext.AddObject(TableName, entity);
                tsContext.SaveChangesWithRetries(SaveChangesOptions.Batch);
            });

            // Step 2: delete the same chunks from the source table, again one batch per chunk.
            Parallel.ForEach(ChunksOfWork, options, chunk =>
            {
                TableServiceContext tsContext1 = CreateTableServiceContext(tableClient);
                foreach (GenericEntity entity in chunk)
                {
                    tsContext1.AttachTo(SelectedTableName, entity, "*");
                    tsContext1.DeleteObject(entity);
                }
                tsContext1.SaveChangesWithRetries(SaveChangesOptions.Batch);
            });

            // Every entity that matched the filter has now been copied and deleted.
            archivedCount = entities.Count();
        }

        ReportSuccess(archivedCount == 1
            ? "1 entity archived"
            : archivedCount + " entities archived");
    }
    catch (Exception ex)
    {
        ReportException(ex);
    }
}
The same code could live in a controller, and a scheduled URL call to that controller will do the trick.
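As a rough illustration, assuming an ASP.NET MVC project and a hypothetical LogArchiver class that wraps the purge-and-archive logic above, the controller could be as small as this:

using System;
using System.Web.Mvc;

public class MaintenanceController : Controller
{
    // A scheduler hits /Maintenance/ArchiveLogs once a month.
    [HttpGet]
    public ActionResult ArchiveLogs()
    {
        DateTime end = DateTime.UtcNow;
        DateTime start = end.AddMonths(-1);

        // LogArchiver is a hypothetical wrapper around the PurgeAndArchive logic shown above.
        new LogArchiver().PurgeAndArchive("WADLogsTable", start, end);

        return Content("Archive job completed for " + start.ToString("MMM yyyy"));
    }
}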
So, to conclude, we have to put some thought into what should be a table, what should be an entity and, most importantly, which property to designate as the partition key and which as the row key.