Introduction
So far, the focus has been on fetching data quickly. Now let's walk through the options available for deleting data in a jiffy.
When we have a never-ending list of entities and want to delete most of it, going sequentially can be really expensive. That means one HTTP DELETE request per entity, and each delete request takes around 17sec to respond successfully, so it could take more than an hour to delete an hour's logs.
What we can do is divide the whole list of entities into smaller lists, each holding the entities that belong to the same partition.
// Distinct partition keys present in the list
var partitions = entities.Select(e => e.PartitionKey).Distinct();
foreach (string partition in partitions)
{
    // All entities that share this partition key
    var ThisPartitionEntities = entities.Where(en => en.PartitionKey == partition).ToList();
}
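The loop above scans the full list once per partition. As a minimal sketch of an alternative, LINQ's GroupBy builds the same per-partition lists in a single pass. The GenericEntity class here is a hypothetical stand-in that assumes only the two key properties, and PartitionGrouping.GroupByPartition is an illustrative name rather than part of any library.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the post's GenericEntity; only the key properties are assumed.
public class GenericEntity
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
}

public static class PartitionGrouping
{
    // Returns one list of entities per distinct PartitionKey, built in a single pass.
    public static Dictionary<string, List<GenericEntity>> GroupByPartition(
        IEnumerable<GenericEntity> entities)
    {
        return entities
            .GroupBy(e => e.PartitionKey)
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

Each value in the returned dictionary plays the role of ThisPartitionEntities for its key.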
Then split these partition-specific lists into chunks of 100 entities. Why 100? Because that's the upper limit on the number of operations allowed per batch. Rules of the game.
// Distinct partition keys present in the list
var partitions = entities.Select(e => e.PartitionKey).Distinct();
var ChunksOfWork = new List<IEnumerable<GenericEntity>>();
foreach (string partition in partitions)
{
    var ThisPartitionEntities = entities.Where(en => en.PartitionKey == partition).ToList();
    // Append this partition's chunks; every chunk holds at most 100 entities from one partition
    ChunksOfWork.AddRange(ThisPartitionEntities.Chunk(100));
}
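// Splits a sequence into consecutive chunks of at most chunksize items.
// (.NET 6 and later ship an equivalent Enumerable.Chunk in System.Linq.)
public static class IEnumerableExtension
{
    public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
    {
        var chunk = new List<T>(chunksize);
        foreach (T item in source)
        {
            chunk.Add(item);
            if (chunk.Count == chunksize)
            {
                yield return chunk;
                chunk = new List<T>(chunksize);
            }
        }
        // Emit the final, partially filled chunk
        if (chunk.Count > 0)
            yield return chunk;
    }
}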
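To get a feel for what chunking buys, here is a small worked calculation of how many batch requests are needed once each partition is split into chunks of at most 100. BatchMath.CountBatches is a hypothetical helper written for this illustration, and the partition sizes in the usage note are made-up numbers.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BatchMath
{
    // Number of batch requests needed when each partition is split into
    // chunks of at most batchSize entities (ceiling division per partition).
    public static int CountBatches(IEnumerable<int> partitionSizes, int batchSize = 100)
    {
        return partitionSizes.Sum(n => (n + batchSize - 1) / batchSize);
    }
}
```

For three partitions holding 250, 100 and 1 entities, BatchMath.CountBatches(new[] { 250, 100, 1 }) returns 5 (3 + 1 + 1), compared with 351 requests when deleting one entity at a time.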
Create a context, attach all the entities of a single chunk to it, mark each one for deletion, and trigger the delete request as a batch.
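TableServiceContext tsContext = CreateTableServiceContext(tableClient);
foreach (GenericEntity entity in chunk)
{
    // "*" as the ETag matches any stored version, so the entity need not be fetched first
    tsContext.AttachTo(SelectedTableName, entity, "*");
    tsContext.DeleteObject(entity);
}
// Send all pending deletes of this chunk as one batch request
tsContext.SaveChangesWithRetries(SaveChangesOptions.Batch);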
To make it faster, we can trigger the requests for each partition in parallel, using the .NET Framework's Parallel class. This works because the operation going on in each partition is independent of the others, and a batch operation can only target one partition per batch.
Parallel.ForEach(ChunksOfWork, chunk =>
{
    TableServiceContext tsContext = CreateTableServiceContext(tableClient);
    foreach (GenericEntity entity in chunk)
    {
        tsContext.AttachTo(SelectedTableName, entity, "*");
        tsContext.DeleteObject(entity);
    }
    tsContext.SaveChangesWithRetries(SaveChangesOptions.Batch);
});
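Unbounded parallelism can send more simultaneous requests than the storage service is willing to serve, so it may be worth capping the number of batches in flight. The sketch below does that with ParallelOptions.MaxDegreeOfParallelism. BoundedBatchRunner and its deleteBatch callback are hypothetical names introduced here; the callback stands in for the attach, DeleteObject and SaveChangesWithRetries body shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedBatchRunner
{
    // Runs deleteBatch once per chunk, with at most maxParallel chunks in flight.
    // Returns the number of chunks processed.
    public static int Run<T>(
        IEnumerable<IEnumerable<T>> chunks,
        Action<IEnumerable<T>> deleteBatch,
        int maxParallel)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        int processed = 0;
        Parallel.ForEach(chunks, options, chunk =>
        {
            deleteBatch(chunk);
            // Thread-safe counter, since the loop body runs concurrently
            Interlocked.Increment(ref processed);
        });
        return processed;
    }
}
```

A suitable value for maxParallel depends on the workload and is an assumption to tune rather than a fixed rule. If any batch throws, Parallel.ForEach surfaces the failures as an AggregateException.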