Introduction
“If at first you don’t succeed, retry X times with a configurable delay between attempts.” – No One
Engineers can follow every best practice and errors will still happen. The cause can be completely outside the developer’s control, such as a network or hard disk failure. Some systems are resilient: within a few seconds a failover takes over and the system is healthy again. Unfortunately, by then the error has already been reported to the user, who now has to take further action. It is not uncommon for the user to simply click the same button again without changing anything. “Magically”, “it works”, and the user is left with a bad impression. Negative reviews arrive calling the app unreliable or “dumb”. A RetryPolicy can help in this situation by automatically resubmitting the same action when a transient failure occurs. If the retry succeeds, the end user is none the wiser that an error happened and was handled gracefully.
Background
Retrying an action and expecting a different result is very common in the real world. A driver turns the ignition, the car fails to start, and they try again until it does. When the television fails to change channel or a vending machine fails to deliver a snack, a person repeats the same action until the television responds or the snack drops. The same behavior shows up in video games, mobile apps, and web sites. As software engineers, we can automate this process with a RetryPolicy and give users a better experience.
A RetryPolicy is not useful in every situation. The first step is to identify when and where an operation might need to be retried. Good candidates are failures that are likely to be temporary: a network call affected by a brief delay, or a database commit that fails because of a short-lived table lock. Once a good candidate has been identified, the code that might fail is passed as a delegate to the RetryPolicy.
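One way to make “identify when and where” concrete is a small predicate that separates transient failures from permanent ones. The sketch below is a hypothetical helper: TransientErrors.IsTransient is not part of the RetryPolicy project, and the list of exception types is an assumption you would tune for your own application. A C# exception filter such as catch (Exception ex) when (TransientErrors.IsTransient(ex)) can then restrict retries to failures that are worth retrying.

```csharp
using System;
using System.IO;
using System.Net.Http;

// Hypothetical helper (not part of the RetryPolicy project) that decides whether
// a failure is likely to be temporary and therefore worth retrying.
public static class TransientErrors
{
    public static bool IsTransient(Exception ex)
    {
        // Unwrap single-exception AggregateExceptions, which Task-based code can surface.
        if (ex is AggregateException agg && agg.InnerExceptions.Count == 1)
        {
            return IsTransient(agg.InnerExceptions[0]);
        }

        // Assumed transient: timeouts, failed HTTP connections, and I/O hiccups.
        // A missing file or directory will not appear on retry, so those are excluded.
        // Everything else (bad arguments, missing permissions, bugs) fails the same
        // way on every attempt, so it is not retried.
        return ex is TimeoutException
            || ex is HttpRequestException
            || (ex is IOException
                && !(ex is FileNotFoundException)
                && !(ex is DirectoryNotFoundException));
    }
}
```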
Using the Code
The RetryPolicy class requires a logger factory, a retry limit, and a retry delay. These are used by its two execution methods, Execute and ExecuteAsync. Both share the same logic; ExecuteAsync is the variant for asynchronous code. Each method executes the code and, if a failure occurs, executes it again until it succeeds or the maximum retry limit is reached. The process flow is depicted in the accompanying flow chart.
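The loop described above fits in a few lines of C#. This is a minimal, hypothetical SimpleRetryPolicy, not the project’s actual RetryPolicy: it omits logging, treats the limit as the maximum number of attempts (an assumption), and rethrows the last exception once the limit is reached. The exception filter (when attempt &lt; _retryLimit) is what lets the final failure propagate to the caller instead of being swallowed.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the synchronous Execute method. The real
// RetryPolicy also takes an ILoggerFactory and may differ in details.
public sealed class SimpleRetryPolicy
{
    private readonly int _retryLimit;             // maximum number of attempts (assumed meaning)
    private readonly int _retryDelayMilliseconds; // pause between attempts

    public SimpleRetryPolicy(int retryLimit, int retryDelayMilliseconds)
    {
        if (retryLimit < 1) throw new ArgumentOutOfRangeException(nameof(retryLimit));
        if (retryDelayMilliseconds < 0) throw new ArgumentOutOfRangeException(nameof(retryDelayMilliseconds));
        _retryLimit = retryLimit;
        _retryDelayMilliseconds = retryDelayMilliseconds;
    }

    // Runs the action; on failure waits and tries again. Returns the number of
    // attempts used. The last exception is rethrown once the limit is reached.
    public int Execute(Action action)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                action();
                return attempt; // success: stop retrying
            }
            catch (Exception) when (attempt < _retryLimit)
            {
                // Failure with attempts left: wait, then loop around and retry.
                Thread.Sleep(_retryDelayMilliseconds);
            }
        }
    }
}
```

Using Thread.Sleep keeps the synchronous version simple; an async variant would use await Task.Delay instead so that no thread is blocked while waiting.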
The Console App in the project demonstrates the usage. Once the RetryPolicy class is instantiated, the code passed to the ExecuteAsync and Execute methods is run by the RetryPolicy until it succeeds or hits an exit condition. The RetryPolicy stops when the execution count limit is reached, or when an optional exit-condition delegate, passed as the second argument, returns success. The example code below randomly throws an Exception to simulate an error that is handled gracefully.
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;
// Plus a using directive for the namespace of the RetryPolicy class from the project.

internal class Program
{
    private static async Task Main(string[] args)
    {
        // Console logging. AddConsole(LogLevel) on ILoggerFactory comes from
        // Microsoft.Extensions.Logging.Console 2.x (it was removed in 3.0).
        ILoggerFactory loggerFactory = new LoggerFactory();
        loggerFactory.AddConsole(LogLevel.Debug);
        var logger = loggerFactory.CreateLogger(typeof(Program));
        var random = new Random();

        // Retry limit of 7 with a 500 millisecond delay between attempts.
        var retryPolicy = new RetryPolicy(loggerFactory, 7, 500);
        var executionAttemptCount = 0;

        // 1. Async: random.Next(1, 3) returns 1 or 2, so roughly half of the
        //    attempts throw a simulated transient exception.
        await retryPolicy.ExecuteAsync(async (token) =>
        {
            var randomNumber = random.Next(1, 3);
            logger.LogInformation("Executing Async");
            if (randomNumber % 2 == 1)
            {
                logger.LogInformation("Simulating Random Transient Exception");
                throw new Exception("Random Transient Exception");
            }
            logger.LogInformation("Execution Async Complete");
            await Task.Yield();
        });

        // 2. Async with a custom exit-condition delegate as the second argument.
        executionAttemptCount = 0;
        await retryPolicy.ExecuteAsync(async (token) =>
        {
            var randomNumber = random.Next(1, 3);
            executionAttemptCount++;
            logger.LogInformation("Executing Async with custom exit condition");
            if (randomNumber % 2 == 1)
            {
                logger.LogInformation("Simulating Random Transient Exception");
                throw new Exception("Random Transient Exception");
            }
            logger.LogInformation("Execution Async Complete with custom exit condition");
            await Task.Yield();
        }, async (token) => await Task.FromResult(executionAttemptCount % 2 == 0)); // true after an even number of attempts

        // 3. Synchronous Execute with the same random failure.
        retryPolicy.Execute(() =>
        {
            var randomNumber = random.Next(1, 3);
            logger.LogInformation("Executing");
            if (randomNumber % 2 == 1)
            {
                logger.LogInformation("Simulating Random Transient Exception");
                throw new Exception("Random Transient Exception");
            }
            logger.LogInformation("Execution Complete");
        });

        // 4. Synchronous Execute with a custom exit condition.
        executionAttemptCount = 0;
        retryPolicy.Execute(() =>
        {
            var randomNumber = random.Next(1, 3);
            executionAttemptCount++;
            logger.LogInformation("Executing with custom exit condition");
            if (randomNumber % 2 == 1)
            {
                logger.LogInformation("Simulating Random Transient Exception");
                throw new Exception("Random Transient Exception");
            }
            logger.LogInformation("Execution Complete with custom exit condition");
        }, () => executionAttemptCount % 2 == 0); // true after an even number of attempts

        Console.ReadLine();
    }
}
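The async variant with a custom exit condition can be sketched the same way. SimpleAsyncRetry.ExecuteAsync is hypothetical, and its semantics are an assumption rather than a description of the project’s code: the exit condition is evaluated after each failed attempt, and returning true stops retrying early. The sketch returns true if the action eventually succeeded and false if the exit condition ended the retries; on the last permitted attempt, the exception propagates to the caller.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical async retry helper with an optional exit condition. Assumed
// semantics: the condition runs after each failed attempt, and returning true
// stops retrying early. Not the project's actual RetryPolicy.ExecuteAsync.
public static class SimpleAsyncRetry
{
    // Returns true if the action succeeded, false if the exit condition stopped
    // the retries. On the last permitted attempt, the failure is rethrown.
    public static async Task<bool> ExecuteAsync(
        Func<CancellationToken, Task> action,
        int maxAttempts,
        int delayMilliseconds,
        Func<CancellationToken, Task<bool>> exitCondition = null,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                await action(cancellationToken);
                return true; // success: no more attempts needed
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Swallow the failure; the checks below decide what happens next.
            }

            // Custom exit condition: let the caller stop retrying early.
            if (exitCondition != null && await exitCondition(cancellationToken))
            {
                return false;
            }

            // Wait before the next attempt; throws if cancellation is requested.
            await Task.Delay(delayMilliseconds, cancellationToken);
        }
    }
}
```

For example, SimpleAsyncRetry.ExecuteAsync(token => SaveAsync(token), 7, 500) would reuse the 7 and 500 values from the Console App, where SaveAsync is a hypothetical placeholder for your own asynchronous operation.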
The Console output shows exactly what happens when the RetryPolicy handles an error:
- Execute the delegate
- A random failure occurs (e.g., network disconnect, hard disk error, table lock)
- After a 500 millisecond delay, try again
- No random failure this time
- Execution completes successfully
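Because the Console App relies on Random, its output differs from run to run. To reproduce the step sequence above deterministically, the hypothetical RetryTrace.Run helper below replaces Random with a scripted array saying which attempts fail, and records each step as a log line. It illustrates the flow only; it is not the project’s logging output.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical, deterministic replay of the retry steps: a script decides which
// attempts fail (true = fail), and every step is recorded in a list.
public static class RetryTrace
{
    public static List<string> Run(bool[] failOnAttempt, int maxAttempts, int delayMilliseconds)
    {
        var log = new List<string>();
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            log.Add($"Attempt {attempt}: executing the delegate");

            // Attempts beyond the end of the script are treated as successes.
            var fails = attempt <= failOnAttempt.Length && failOnAttempt[attempt - 1];
            if (!fails)
            {
                log.Add($"Attempt {attempt}: completed successfully");
                return log;
            }

            log.Add($"Attempt {attempt}: simulated transient failure");
            if (attempt < maxAttempts)
            {
                // Only wait when another attempt will actually follow.
                log.Add($"Waiting {delayMilliseconds} ms before retrying");
                Thread.Sleep(delayMilliseconds);
            }
        }

        log.Add("Giving up: attempt limit reached");
        return log;
    }
}
```

For instance, RetryTrace.Run(new[] { true, false }, 7, 500) records five lines: execute, failure, a 500 ms wait, execute again, and success, matching the list above.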
Points of Interest
This project targets .NET Core 2.x.
Source code on GitHub: https://github.com/SenseiCris/RetryPolicy
History
- Version 1.0 – Initial release