You're (Probably) Documenting That Wrong

mwootendev

3.75/5 (4 votes)

1 Nov 2011Apache11 min read

8.5K

This post will look to address bad software documentation habits and how they can be improved.

Programmers Don't Like to Document Code

Code documentation is one of those tasks that software developers like to slack on. It's not the documentation that's important, it is the code that is important! Lots of developers either don't add documentation blocks or fill them in with only the basic amount of information. This post will look to address bad software documentation habits and how they can be improved.

Why Should We Document Code?

I've heard lots of arguments as to why writing code documentation is a waste of time. It takes too much time to write! The code is self-documenting! It will just be out of date by the next release, so why bother! Sure, documentation is time consuming. Agreed, well written code can be partially self documenting. Yes, if you don't have good habits in place the documentation will be out of date.

A lot of the arguments are perfectly valid if you are documenting the wrong things. Good code documentation provides a clear understanding of the contract that code will adhere to. Good code documentation will alert consumers of the code of the "gotchas" that can crop up from using it. Good code documentation will serve as a reminder for why something was done or not done.

What Should You Be Documenting Internally?

By internally I mean inside functions. This refers to code that is not part of an interface, but in the lowest level blocks of code that only developers on the same project will ever be able to see.

Internal Code that Is Hard to Understand

First, you don't need to be documenting each and every line of code in a function with what it does. If the code is not clear enough, it should be rewritten. One of the "code smells" from Martin Fowler's book Refactoring: Improving the Design of Existing Code is too much inline documentation. If you think you need to write a lengthy treatise, first consider renaming some variables for clarity or extracting the block into a well-named method with its own documentation. That's not to say that there aren't cases where you need internal developer documentation. If the code is difficult to understand because it is a complex formula or a particularly tricky regular expression, then by all means throw some comments above it. Just don't do this:

<script src="https://gist.github.com/1250644.js?file=gistfile1.java"></script>

The "Why" and not the "How"

So we've covered not documenting how something is done unless absolutely necessary, but sometimes comments are still needed to explain why something is being done. Often far more important than how you did something is why you did it, especially if it is not immediately clear to someone unfamiliar with the code. Sometimes comments are needed because the code looks funny, or relies on a side effect elsewhere that most readers of the code wouldn't know about. Comments like the following can be extremely useful for other team members (or yourself weeks later when you forget why you wrote the code that way):

<script src="https://gist.github.com/1250650.js?file=gistfile1.java"></script>

What Should You Be Documenting Externally?

Far more important than what you document internally is what you document externally. By externally, I mean an interface exposed to consumers (either internal through private methods, or external through public methods). While the code's interface provides a contract between the code and its consumer, the external documentation provides the finer details of that contract.

What You Should Definitely NOT Document

Before we dive in to what you should document, I think it is very important to call out exactly what you should not document. Never expose implementation details in your external documentation. Doing so can lead to a lot of the issues that developers complain about. Internal details are far more likely to change and there's a good chance someone will forget to update the documentation. Even worse, a consumer of the code might make assumptions based on the details, so if the implementation does change the user's code might be broken. Consider the following:

<script src="https://gist.github.com/1264597.js?file=gistfile1.java"></script>

Not the best interface, but it serves a point. The only method you can call on Iterable is iterator(), so that is all that is really exposed to the consumer. Except, the documentation explicitly calls out which Iterable will be returned. Since the documentation is a contract, if you change the implementation of the Iterable from LinkedHashSet you will be breaking that contract. The sorting method mentioned is purely fluff; all the consumer really needs to know is that the items in the iterator are sorted. Now consider what a consumer can do with this:

<script src="https://gist.github.com/1264608.js?file=gistfile1.java"></script>

Most likely we don't want users adding new line items to the collection of sorted ones, but by documenting the actual type we allow users who want to be a little risky to try to anyway. I say risky because a few months down the road you may realize that line items don't have to be unique and so a set is not appropriate, so you switch to a LinkedList instead. You may also realize that the bubble sort is not as efficient as a quick sort and change that too. In both cases the documentation would have to be updated, and in the first case you may have just broken a consumer relying on the original documentation.

In general, make sure your code documentation is a black box that is only exposing information that its consumers really need. Avoid mentioning specific classes used for interfaces (I see ArrayList called out a lot for the List return type). Also avoid describing the algorithms used, since you might want to change them later.

Document At Minimum What the Documentation Tool Supports

So what do you want to document? Most programming languages have a documentation system that can be used to generate detailed developer documentation. Java has Javadoc, there is XML documentation for .NET, and C++ and C code can use Doxygen. In most of those cases the documentation will include a description of the class or method, any parameters it takes, the return value, and any exceptions it might throw. Make sure that you document all of these things at a very minimum.

Document Units of Measure

Here's something I see developers trip up on all the time. Consider that you're going to sell your software and I come up and hand you a contract offering to buy it for one million! Would you accept that offer? You shouldn't! One million dollars? One million pesos? One million bottle caps? There's a very vital piece of information missing from that contract: the units. At least one ~$200M space craft was lost due to using the wrong units of measure. Now considering the following piece of code:

<script src="https://gist.github.com/1264614.js?file=gistfile1.java"></script>

Does that look fairly standard to you. I've seen code like it at least a half of a dozen times. Since the person who wrote that was probably also the consumer it made perfect sense to him at the time. That is not a good specification though. What is the distance measured in? What is the speed measured in? What unit of time is being returned? Now consider the following:

<script src="https://gist.github.com/1264622.js?file=gistfile1.java"></script>

That looks a lot better and provides the missing information that consumers will need to call the method. Common places to look for missing units of measure include times, distances, sizes, amounts, weights, temperatures, and screen measurements (em vs. pixel).

Assumptions About Parameters

When we write new functions we automatically make a lot of unconscious assumptions about the parameters, especially if we are also writing the code that will be consuming that function. A lot of common assumptions we make are about the format or expected values of parameters. We expect that certain parameters will never be null. We expect that the string provided representing a phone number will be in the form "XXX-XXX-XXXX" and not "(XXX) XXX-XXXX". We expect that the number of items the user wants to add to their cart is a positive number. We expect the user's GPS latitude value to be between the range of -90 and 90. Many times we test for these assumptions, and some times we don't. The import thing is to document those assumptions though, so that consumers can know why something went wrong when they pass in bad values.

I've found it is a good practice to always document any restrictions on parameters next to the parameters themselves. I usually do it in parenthesis following the parameter description. Here is an example:

<script src="https://gist.github.com/1264636.js?file=gistfile1.java"></script>

The Reason for Thrown Exceptions

I see this one quite a lot. Take a look at this example:

<script src="https://gist.github.com/1264640.js?file=gistfile1.java"></script>

It is nice that at least you know what checked and unchecked exceptions can be thrown from the method. Unfortunately, that's all you know. If a consumer does catch a NullPointerException thrown from this method, will they know why (especially if the message is the wonderful default "null")? Just as important as listing the exceptions is to list why they are thrown so consumers can troubleshoot their code. Consider the following revised version instead:

<script src="https://gist.github.com/1302085.js"> </script>

That adds some clarity as to why an exception would be thrown and gives the user something to look for in their own code.

Null Return Values

I encountered this issue with a fairly well known API a few weeks ago. According to the documentation, I provide a URL to the function and I get an InputStream back with the contents of the file located at the URL. For the protection of the offender I offer this version:

<script src="https://gist.github.com/1302097.js"> </script>

That seems pretty good. If the URL can't be reached, exceptions will be thrown. Ok, I can handle that. I write my code, wrap it in a try...catch block, and use the InputStream. During testing we get a NullPointerException and we trace it back to the InputStream returned from the method being null. What? Nothing in the documentation says that the InputStream returned can be null. It turns out that if the URL can be reached but the contents of the document are blank then null is returned instead of an empty InputStream. Well, instead of having to figure that out through trial and error, it would have been nice for the documentation to have read more like this:

<script src="https://gist.github.com/1302105.js"> </script>

The general rule is that if your methods can return null, make sure that the user knows that so they can null check the response. A common offender is "search" methods that don't find a result.

<ph4>All Values and Meanings of "Return Codes"

Return codes, which are integer values that represent the response from a function, have a way of slipping in at the last minute. Often a developer will realize after writing a method that the caller needs to know the result of the operation (especially if it fails), so they change "void" to "int" and return some magic number. The programmer then updates his code to make use of this number and moves on with his day, and the actual meaning of it is lost in time. Consider the following:

<script src="https://gist.github.com/1302113.js"> </script>

I've seen code like that too many times before. I'm certain that the author knows what magic codes can be returned from that function, but I don't. Consider this revision:

<script src="https://gist.github.com/1302119.js"> </script>

That's a nice start. The next step is a bit more refactoring than it is documentation, but it really adds clarity. With a few constants declared for dropbox return values the comments and code will be much clearer:

<script src="https://gist.github.com/1302129.js"> </script>

That makes the documentation and the code make more sense than the magic number return codes presented before.

Side Effects and Relationships

Code that has "side effects" (i.e. affects something other than it was intended to or that is not obvious) should definitely have the side effect documented. Side effects, in general, should be avoided. To reduce possible confusion about the side effects, always make sure that they are well documented and stand out (preferably in bold text). Additionally, if there is a relationship between two calls, such as one call cannot be made until another one is made, those relationships need to be listed. In those cases, always make sure to point out the required order of the calls and indicate exactly which method must be called first.

<script src="https://gist.github.com/1322779.js"> </script>

Use Examples for Tricky Interfaces

Someone actually asked me the other day if it was appropriate to put code examples in the documentation. Yes! They say a picture is worth a thousand words, and sometimes a simple example code snippet can be be worth a few paragraphs. I have encountered many classes where I have read the class documentation and method documentation and still have had no idea how to use it. Usually, I end up searching Google for an example to use for clarification. For classes that are complex to use (for instance, have a lot of associations between calls, calls that have to be in specific orders, have specific "states", methods with lots of parameters, etc.) it is a good idea to show an example how of the class or is used in its own documentation. The same can be true for methods that require very specific sets of parameters. This is one of those areas where the documentation can easily become out of sync with the code, so make sure to always check for examples when you change an interface.

Conclusion

I hope that these tips help you to better document your code. Code documentation can take a lot of time, but really good code documentation can pay off big in providing clarity and understanding of how the code works and should be used. If you can think of anything I missed, please feel free to drop a comment.

License

This article, along with any associated source code and files, is licensed under The Apache License, Version 2.0