
Blocking and Asynchronous Operations Without Timeout are Broken

User 11633146


25 May 2015, CC (BY-ND 3.0), 4 min read


Any time we are waiting for something to happen, from reading the disk to locking a mutex, we need to have a timeout. Without a timeout, we run the risk of that operation never actually completing and our program completely hanging. It’s unfortunate that so many languages still have APIs lacking such timeouts.

A Typical File Hang

File code like the following is abundant. I’d be surprised if there were any non-trivial server programs that don’t have such code.

handle = open(some_file)
text = handle.read()

It may not be obvious why this type of code can hang. If the file is on the local disk, we generally expect a very quick response and never expect the disk to simply not respond. The problem is with abstraction. In the case of a file, it’s never certain where that file really is. It doesn’t matter that open doesn’t accept URLs. The drive could be a network mount, it may be a pipe, or it could be a cloud block device. These things can all result in hung ‘read’ calls.
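One way to bound such a read in Python is to run it on a helper thread and stop waiting after a deadline. The following is only a minimal sketch: the names `call_with_timeout`, `read_text`, and `OperationTimeout` are hypothetical, and the approach has a real limitation in that the hung thread is abandoned, not cancelled.

```python
import threading


class OperationTimeout(Exception):
    """Raised when the wrapped call does not finish in time (hypothetical name)."""


def call_with_timeout(func, timeout_s, *args):
    """Run func(*args) on a daemon thread and wait at most timeout_s seconds."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = func(*args)
        except BaseException as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        # The blocked call keeps running in the background; we only stop waiting.
        raise OperationTimeout(f"no result after {timeout_s} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def read_text(path):
    """The plain open-and-read from the article, wrapped so it can be bounded."""
    with open(path) as handle:
        return handle.read()
```

A caller would then write something like `text = call_with_timeout(read_text, 2.0, some_file)` and decide what to do when `OperationTimeout` is raised. The two-second value is an assumption, not a recommendation.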

Everywhere by Default

Any IO function has the potential to block. This includes writing functions; anything which writes can also wait endlessly for a buffer to flush, or for a connection to re-establish. Meta-information functions, like stat, also require a timeout.

Any function which “waits” for something should also have a timeout. This goes beyond IO as it includes locking functions, such as mutexes. The assumption must always be that whatever we’re waiting for may never actually happen.
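For locks, Python's standard `threading.Lock.acquire` already accepts a `timeout` argument. The sketch below shows how a caller might use it; `state_lock`, `update_shared_state`, and the half-second default are assumptions made for illustration.

```python
import threading

state_lock = threading.Lock()  # hypothetical lock guarding some shared state


def update_shared_state(action, timeout_s=0.5):
    """Run action() under the lock, giving up if the lock is not free in time."""
    if not state_lock.acquire(timeout=timeout_s):
        return False  # the lock holder may never release; fail instead of hanging
    try:
        action()
        return True
    finally:
        state_lock.release()
```

Returning `False` keeps the failure path explicit, so the caller has to decide whether to retry, report an error, or skip the update.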

And this timeout must be applied by default; it is not merely an “available” feature. The default timeout should also be a short period of time, short enough so that any unusual delay will trigger it. This forces programmers to think about what happens in these cases, dealing with the timeout or consciously extending it.
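Python's socket module offers something close to this idea through `socket.setdefaulttimeout`, although it is opt-in and not the language default. A small sketch, where `DEFAULT_TIMEOUT_S` and `make_socket` are hypothetical names and the two-second value is an assumption:

```python
import socket

# Assumed value: short enough that an unusual delay trips it.
DEFAULT_TIMEOUT_S = 2.0

# Every socket created after this call starts with the timeout already set.
socket.setdefaulttimeout(DEFAULT_TIMEOUT_S)


def make_socket(timeout_s=None):
    """Create a socket; a longer timeout must be requested explicitly."""
    sock = socket.socket()
    if timeout_s is not None:
        sock.settimeout(timeout_s)  # a conscious, visible extension
    return sock
```

With this in place, code that never mentions a timeout still gets one, and code that needs more time has to say so.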

Asynchronous

To be clear, the timeout applies to blocking and asynchronous calls equally. While it’s certainly helpful that a “hanging” async operation doesn’t block processing of other activities, it still results in some operation never making any progress. This usually leads to some external process never getting the response it was waiting for.
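In Python's asyncio, for example, an awaited operation can be bounded with `asyncio.wait_for`. The sketch below uses a hypothetical `never_completes` coroutine as a stand-in for a stalled async call; `fetch_with_deadline` is likewise an illustrative name.

```python
import asyncio


async def never_completes():
    """Stand-in for an async operation that stalls forever (hypothetical)."""
    await asyncio.Event().wait()  # the event is never set


async def fetch_with_deadline(timeout_s):
    """Wait for the operation, but give up after timeout_s seconds."""
    try:
        return await asyncio.wait_for(never_completes(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the stalled task before raising
        return None
```

Calling `asyncio.run(fetch_with_deadline(1.0))` returns `None` after about a second, where an unbounded `await` would leave the task pending indefinitely.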

Bandwidth

We need to consider the definition of timeout as well. A common definition is simply a period of time where nothing happens. In many APIs, especially network sockets, the timeouts are only triggered if no data is exchanged. I don’t think this is valid.

Consider a situation with low bandwidth. There is little practical difference between a hung connection and one sending data at only 1 B/s. Having one time out and the other not seems wrong.

Streaming operations should have minimum bandwidth requirements. If a certain speed is not maintained, then it should simply fail.
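A minimum-rate check could be sketched as follows. Everything here is an assumption made for illustration: `read_chunk` is any callable that returns the next bytes and an empty value at end of stream, and the grace period exists so that the first few chunks are not judged too early. The check only runs after a chunk arrives, so each individual read still needs its own ordinary timeout.

```python
import time


class TooSlow(Exception):
    """Raised when the average transfer rate drops below the required minimum."""


def read_with_min_rate(read_chunk, min_bytes_per_s, grace_s=1.0,
                       clock=time.monotonic):
    """Read until read_chunk() returns empty, failing if the stream is too slow."""
    start = clock()
    received = bytearray()
    while True:
        chunk = read_chunk()
        if not chunk:
            return bytes(received)
        received.extend(chunk)
        elapsed = clock() - start
        # Compare the average rate so far against the required minimum.
        if elapsed > grace_s and len(received) / elapsed < min_bytes_per_s:
            raise TooSlow(f"{len(received)} bytes in {elapsed:.1f} s")
```

The check uses the average rate since the start, which is the simplest choice; a sliding window would react faster to a stream that begins quickly and then stalls.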

Total Timeout

Many HTTP libraries are silly when it comes to timeout handling. We can find different parameters for the DNS lookup, the initial connection, the exchange of headers, and the document exchange. The one thing that very few provide is what I actually want: a total timeout.

I want to specify an upper limit for the time from when I start the HTTP request to the time the document is fully retrieved. I really don’t care why the request has failed.
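Where a library only exposes per-phase timeouts, a total limit can be approximated by giving each phase whatever remains of one shared budget. This is a sketch under assumptions: `Deadline` and `run_phases` are hypothetical names, and each phase is assumed to be a callable that accepts its timeout in seconds and honours it.

```python
import time


class Deadline:
    """Tracks one overall time budget shared by several phases."""

    def __init__(self, total_s, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + total_s

    def remaining(self):
        """Return the seconds left, or raise once the budget is used up."""
        left = self._expires_at - self._clock()
        if left <= 0:
            raise TimeoutError("total time budget exhausted")
        return left


def run_phases(phases, total_s, clock=time.monotonic):
    """Call each phase in order with the time still left as its timeout."""
    deadline = Deadline(total_s, clock)
    results = []
    for phase in phases:
        results.append(phase(deadline.remaining()))
    return results
```

For an HTTP request, the phases would be the lookup, the connection, the header exchange, and the document transfer. The caller sees a single `TimeoutError` regardless of which phase consumed the budget.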

If I expect a large document, or am streaming, I’d prefer to give an upper limit to the “negotiations” phase and a bandwidth requirement for the document phase. I’ve unfortunately not seen either of these options in an HTTP library before.

No Excuses

It’s unfortunate that libraries are not designed around failures by default. Perhaps 20 years ago, this could have been forgiven, but now that even the most trivial of devices are multitasking and network enabled, it’s just not acceptable. All it takes is one minor hiccup to render some programs completely inoperable.

The simple rule is: if we are waiting for some event, we have to assume that it may never happen. We don’t always need advanced error handling; some way to fail gracefully is often enough. Simply hanging there doing nothing is rarely helpful.

If you like to explore languages and compilers, then follow me on Twitter. I always have more ideas and things to uncover. If there’s something special you’d like to hear about, or you want to arrange a presentation, feel free to contact me.

License

This article, along with any associated source code and files, is licensed under The Creative Commons Attribution-NoDerivatives 3.0 Unported