Introduction
Unlike in the real world where the winner is good, in programming world good programming wins. It is not uncommon to see developers shaking their head in frustration when they are looking at the code written by others or themselves and using all those unparliamentary four letter words. If not all, many of the times this arises because of bad programming practices.
While it is universally accepted by programmers that standards are very important, the disagreement comes in the standards that are to be followed. (Un)Fortunately, programming is not a science and still remains an art. I believe that it shall remain to be so for quite a while, if not forever. Until that time comes, the best that can be done is in following some good programming practices.
Being of the opinion that there is no point in listing the good programming practices (which have been discussed ad-nauseam by computer scholars, project leaders and Quality Assurance Groups) I only wish to list the bad coding practices which I have experienced or heard so far.Most of the discussion here is limited to C,C++ programming. Though some of the concepts can be applied to other languages, the discussion in this document is concentrated primarily on C and C++.
It is not possible for all the programmers to program in all kinds of scenarios. So, I wish to maintain a list of scenarios and bad programming practices in that scenario. As the saying goes, "Fools learn from their own experience and wise men learn from others". I was foolish enough to implement some of the bad programming practices and learnt frommy mistakes.
1. Oooops!!! This is not OOP
I have fallen prey to this many times. The temptation of writing a program in the procedural way is sometimes too strong to resist and I have exposed the internals of a class by either making the member variables as public or worse, grouping them as structures instead of classes.
Before I proceed any further, let me say it for the record: I am not against procedural programming. It is the way we all started learning programming and is one of the easiest ways to imagine business logic in terms of code. And what the heck, it is a lot simpler!!! There is no shame in stating yourself to be a procedural programmer.
But then for some reason like a stubborn project leader or an adamant project manager or a god damn crazed visionary on the top, we are forced to write the program in an Object Oriented Programming language. Here starts the war of the worlds.
OOP is more than a group of classes. It is much vaster than the simple textbook definition of class as a collection of related data and functions. If we go by this definition alone, then all the functions in any software package can be viewed as related and can all be made as the member functions of a single class. This might work. And with some additional effort, it actually does work. But with all due respects, it ain’t Object Oriented Programming.
For example, consider the following program. This is one of the classic cases of
procedural object oriented jumbled up garbage.
Example 1A
#include <stdio.h>
class CSingleClass
{
int i,j;
public:
void Execute()
{
int k=0;
scanf("%d%d",&i,&j);
}
};
void main()
{
CSingleClass o;
o.Execute();
}
Syntactically, the above program is object oriented. But that is only because, it cant be compiled in a C compiler. Semantically, this program is not even close to what is called object orientated programming. The above is a fairly simple example and I hope you are still with me. Now let us see a slightly more tricky case where I have seen programmers more experienced than me succumbing to Procedural programming temptation.
Example 1B
Let’s say I have a class which deals with rectangles as shown below. After a few months into the development, there was a need to support parallelograms too. It is sometimes tempting to just add an Enum or a flag to the existing rectangle class and use it for parallelograms too.
typedef struct sPoint
{
double x; double y;
} sPoint:
class CRectangle
{
sPoint m_BottomLeft;
sPoint m_TopRight;
public:
void Area();
};
This class can be used for Parallelograms with little modifications as
class CRectangle
{
sPoint m_BottomLeft;
sPoint m_TopRight;
double m_dInclusiveAngle;
bool m_bIsParallelogram;
public:
void Area()
{
if (m_bIsParallelogram)
{.. .. ..}
else
{.. .. ..}
}
}
This sure as hell works. It saves a lot of time for you because you need not write the code for set/get of the points etc. The more such functions, the more time you save when writing code. It may even win you a pat on the back from your project leader for finishing the work quickly. But, with all due respects, it is neither Object oriented nor is it procedural. It is a badly cooked spaghetti. For every minute you saved in avoiding code duplication, the time spent in maintaining this code is manytimes more than that.
With a diligent analysis of the design, I am sure that such spaghetti code can be completely avoided with minimum code duplication.
Note: I am not a student of probability. So, I plead guilty at the arrogance of using the word "completely".
2. Enums or #defines?
We all had the need to use some or other arrays in our programming. Many a time, we also know the index of the object we are interested in. It is a very compelling to argue that "This index is not something which the user can modify. Why should I not hard code the number?"
One of the best examples I have seen for this is in the case of UI programming using Grids or tables. Let say, we are collecting user information in a table as shown below:
User Name (Last,First) | User Id | Date of Birth | City of Birth | Marital Status |
Tom Cruise | tcruise | 03-july-1962 | Syracusae, New York | Married |
| | | | |
Then we can safely use
Example 2A
Table[RowNumber][0] = strUserName;
Table[RowNumber][1] = nUserId;
Table[RowNumber][2] = DOB;
Table[RowNumber][3] = strCity;
Table[RowNumber][4] = bIsSingle?"Single":"Married";
This definitely works and there seems to be no reason why this should not be done. But the reason is that, there might always be a tom fool of a user who wants ‘Marital status’ to come before ‘City of birth’. He might even be willing to pay for this change. Now imagine doing this change for this one user’s installation. Imagine the development effort of replacing 3 with 2 and vice versa. The nightmare continues into testing. A ‘1’ mistakenly replaced with ‘2’ or a unreplaced ‘3’ shall wreak havoc and keep you from your night’s sleep.
This gets even more tricky when one field is a dropdown list (Example Marital status can be Single or married.) Then the sequence of selections might not really matter to the user. But then your project manger gets the novel idea of providing options like "Single and looking", "Married but interested" etc. Lo!!! Afore mentioned nightmare has materialized.
A safe way to minimize troubles with this kind of situations is using a #define or an enum for the column numbers or the dropdown box entries etc. Consider the same case but now with #defines.
Example 2B
#define COL_USER_NAME 0
#define COL_USER_ID 1
#define COL_DOB 2
#define COL_CITY 3
#define COL_MARITAL_STATUS 4
Table[RowNumber][COL_USER_NAME] = strUserName;
Table[RowNumber][COL_USER_ID] = nUserId;
Table[RowNumber][COL_DOB] = DOB;
Table[RowNumber][COL_CITY] = strCity;
Table[RowNumber][COL_MARITAL_STATUS] = bIsSingle?"Single":"Married";
Now changing the sequence of columns is limited to correcting the column numbers in the #define
s. Rest is automatically taken care of.
Corollary:
Over a period of time, I have found myself leaning towards using Enum
s instead of #define
s. The Enumerations have the unique advantage that, they maintain their numbering all by themselves.
When using #define
s, reordering would mean that we not only need to remember the order in which they should appear, but also make sure that the numbering is maintained. i.e. if the user has asked for DOB as the first field and ID as the last field, the #define
s shown in the Example 2B shall change as
Example 2C
#define COL_USER_NAME 1
#define COL_USER_ID 4
#define COL_DOB 0
#define COL_CITY 3
#define COL_MARITAL_STATUS 2
If the number of columns is large around 20 say, then maintaining all the numbers becomes confusing while modifying the code. You may need 3 or 4 debugging sessions to get the sequence correct. Enum
s offer a better way of doing it.
Consider Example 2B rewritten using Enums
Example 2D
enum{
COL_USER_NAME = 0,
COL_USER_ID,
COL_DOB,
COL_CITY,
COL_MARITAL_STATUS
};
Table[RowNumber][COL_USER_NAME] = strUserName;
Table[RowNumber][COL_USER_ID] = nUserId;
Table[RowNumber][COL_DOB] = DOB;
Table[RowNumber][COL_CITY] = strCity;
Table[RowNumber][COL_MARITAL_STATUS] = bIsSingle?"Single":"Married";
By virtue of being declared as enums, COL_USER_ID and others are automatically assigned a value by the compiler. This can be used as an index in the array.
Now, changing the columns sequence is a piece of cake. Change the relative positions in the Enum and you are done.
Example 2E
enum{
COL_DOB,
COL_USER_NAME,
COL_MARITAL_STATUS,
COL_CITY,
COL_USER_ID
};
Table[RowNumber][COL_USER_NAME] = strUserName;
Table[RowNumber][COL_USER_ID] = nUserId;
Table[RowNumber][COL_DOB] = DOB;
Table[RowNumber][COL_CITY] = strCity;
Table[RowNumber][COL_MARITAL_STATUS] = bIsSingle?"Single":"Married";
Note: The “Single”, “Married” strings being hard coded into the code is a glaring bad practice. I have put it here only for clarity sake.
3. Macros or inline functions
Why are C/C++ programmers so happy at writing macros? Sure, macros offer an easy way of writing reusable pieces of code. But they also have the side effect of making the changes in-context. Indiscriminate use of macros can cause rather unintended effects. Consider the following example. The macro checks the return value and throws exceptions based on various circumstances.
Example 3A
#define CheckValueForError \
{ \
if(i+j-k!=0) \
{ \
value = value/(i+j-k); \
} \
if(value>1) \
throw exception; \
} \
Can you reuse this macro? Of course, you can!!! But how easy is it to reuse? That is a debatable question. This macro has the preconditions that the variables i,j,k and value are declared in context of the macro call and that they have the appropriate values. Moreover, the macro changes the value of these variables. If for any reason, the variable name conflicts, then this macro cannot be used. By the time the macro execution finishes, you have no idea of whether or not the macro has done what it should.
The solution for these is to use inline functions. The same macro above can be rewritten as a function as
Example 3B
bool inline CheckValueForError(int i, int j, int k, float value)
{
bool bFlag = true;
if (i + j - k != 0)
{
value = value / (i + j - k);
bFlag = true;
}
if (value > 1)
throw exception;
return bFlag;
}
While it is true that you need to explicitly provide the variables that need to be checked as input parameters, this actually is helpful. All the above mentioned evils of macros shall vanish with inline
functions. Saying that function has the overhead of stack maintenance is not valid here since this is an inline
function.
So, are Macros devils in disguise? Noph. They are not. They have their uses. If you want a change to be done in the context of the call, and this change to be done is many places in the code, then use a macro. For example, you want a code which swaps the value of two variables. The same algorithm needs to be written over and over for many different data types and there is no way for you to use function overloading or perhaps function overloading is overkill. Then macros are the only way out. Macros also come handy for a clean and streamlined implementation for Single entry single exit programming. This, I shall discuss further on in my discourse.
If you observe carefully, I have slightly cheated you above to put forth my point. The example I gave for bad macros does not have any input arguments. And when I said of macros being a good way in the previous paragraph, I was referring to macros which have input arguments. When using macros with input arguments, then all the advantages which I attributed to inline functions shall be available for macros too. Still, the inline functions have one significant advantage. They can be stepped through in the debugger!!!
When a macro is called, then the macro is processed by the preprocessor and the macro code is expanded in the source program. The expanded source program is written in a file known as Intermediate file and it is this intermediate file which is processed by the compiler. In the debug information stored in the PDB file, all these lines of code in the intermediate file are linked to the single line where macro is written in the source C/C++ file. As a result, you will see the debugger passing your macro call as if it is only a single statement.
This behavior is both helpful and troublesome. When the macro call is working fine and you don’t need to check the code flow inside the macro, then this is good. But if you are debugging some problem and are suspecting that the macro itself is having the bug, then it is very difficult to see inside the macro.
As with everything, there are counter arguments to this statement. It is certainly possible to instruct the compiler to save the Intermediate file and use it in the debugger so that, you can see the expanded code. But the catch is that, the intermediate file contains the code which was written in the way which is easy for the compiler to understand. It contains the expanded code for various macros provided by the SDK you are using, written by you or your colleagues and the various libraries you are using. In the best cases, this code can be understood easily. At times, this intermediate file can be so cryptic that, you might find it difficult to find where your code dealing with business logic is.
4. Single Entry Multiple exit
Single Entry Single Exit is a significant tenet in structured programming (Pls. do not confuse Structured programming with Procedural programming).
Single Entry Single Exit programming says that a function/procedure should have a single entry point and single exit point. In all cases of the code flow i.e. success, failure, error or exception the control must enter the procedure at one point and exit and another. There should not be any other entry or exit.
I find it hard to imagine how one can write a procedure which has multiple entry points in a modern languages. But writing code with multiple exit points is a very common practice. For example, consider the following program which finds the square root
Example 4A
bool SquareRoot(double dValue, double& dSquareRoot)
{
if (dValue < 0)
{
return false;
}
dSquareRoot = pow(dValue, 0.5);
return true;
}
A similar coding is very common. This is a single entry multiple exit program. Now let us consider a single entry single exit counterpart for this function.
Example 4B
bool SquareRoot(double dValue, double dSquareRoot)
{
bool bRetVal = false;
if (dValue < 0)
{
dSquareRoot = NAN;
bRetVal = false;
}
else
{
dSquareRoot = pow(dValue, 0.5);
bRetVal = true;
}
return bRetVal;
}
Though this is a simple example chosen specifically for being compact, in real life, programming is seldom this simple. The condition where procedure has to be exited is encountered in the middle for a nested loop which is 3 for()
loops deeps. Then having multiple return statements all over the place is good or bad? Consider the below where The tan(arcsine())
is calculated for an array of numbers.
Example 4C
bool TanArcSine(double* pValues, int nLength, double** ppResult)
{
if (nLength < 0)
return false;
*ppResult = new double[nLength];
for (int inx=0; inx <= nLength; inx++)
{
if ((-1 <= pValues[inx]) && (pValues[inx] <= 1))
{
*ppResult[inx] = asin(pValues[inx]);
if (fabs(*ppResult[inx]) != M_PI / 2)
{
*ppResult[inx] = tan(*ppResult[inx]);
}
else
{
return false;
}
}
else
{
return false;
}
}
return true;
}
Consider the same program written using a Single entry single exit mechanism.
Example 4D
bool TanArcSine(double* pValues, int nLength, double** ppResult)
{
bool bRetVal = false;
if (nLength < 0)
bRetVal = false;
else
{
*ppResult = new double[nLength];
for (int inx=0; inx < nLength; inx++)
{
if ((-1 <= pValues[inx]) && (pValues[inx] <= 1))
{
*ppResult[inx] = asin(pValues[inx]);
if (fabs(*ppResult[inx]) != M_PI / 2)
{
*ppResult[inx] = tan(*ppResult[inx]);
}
else
{
bRetVal = false;
break;
}
}
else
{
bRetVal = false;
break;
}
}
bRetVal = true;
}
return bRetVal;
}
An interestingly large number of programmers believe writing Single entry multiple exit programs is easier, safer, compact, reliable etc… The argument they often provide is – “why carry the excess baggage when it is no longer needed?” Looking at the programs in example 4A and 4B, one tends to agree with them. And they can’t be blamed for doing so. It certainly is of no value addition to execute a procedure when an error condition is encountered.
But as a responsible programmer, it is our responsibility to make sure that whenever our function fails, it does so very gracefully. No one should be given a chance to grumble when it fails. For this purpose, I find single exit point concept extremely helpful.
As you can imagine, procedures are written to achieve some really complicated tasks like writing data to a table in the database, entering a critical section and performing some operation in a thread safe manner in multi threaded/multi process system. Now when we return from the procedure we must do things like committing or rolling back the database, releasing the mutex and exiting the critical tasks. There might also be things to do like closing a file etc.
If there are more than one exit points, then this exit tasks need to be done in all those exit points. In most cases, this would mean blunt code duplication. As you know very well, code duplication is a bad practice.
Instead, if we maintain a single exit point, then by virtue of having a single exit, all the exit tasks will be concentrated in a single location in the code. This directly reflects on the code quality. So, with a little extra baggage, we can avoid possible code duplications and keep our code top class.
I will not be so arrogant here as to call Single entry Multiple exit as a bad coding practice all by myself. The jury is still out on this. Single Entry Single Exit is a good practice in my opinion and all my remaining discussion is based on this style of programming.
Fortunate or not C, C++ do not have any built-in support for single exit programming. If you want to employ this kind of programming, then you have to do it with carefully prepared standards and adhering to them strictly. Needless to say, this is very subjective and varies with taste. C# helps in this to some extent in the try catch finally
blocks. But beyond that, you are on your own.
5. Resource lock-up
One of the easily committed mistakes is forgetting to free the memory allocated when an error condition is encountered. Both in example 6A and 6B, whenever the return false statement is executed, the memory allocated to ppResult has not been freed. As a responsible programmer, it is the procedure’s responsibility to free all the resources that are not used. We cannot expect the caller to check the return value and free all the memory. If the ppResult happens to be a local variable, then the caller cannot free that memory.
Java, C# and .Net extensions to C++ have introduced Garbage Collecter to take care of this situation. But then, there might be other resources like open files or network sockets or semaphores etc which must be released. There is no built-in support for this kind of resources.
Most of the times, programmers do free up resources. But they do it only in the success cases. When employing Single Entry Multiple exit, very often some or other resource shall not get freed in the error case. This can clog the resources bringing the system to its knees. If you adopt Single Entry Single Exit style of coding, then solving this problem becomes a bit easy. At the exit point, you need to write the code for freeing the resources and in all circumstances, i.e. success, failure, exception or error you are sure to have your resources freed up.
Consider the following example:
Example 5A
bool TanArcSine(double* pValues, int nLength, double** ppResult)
{
bool bRetVal = false;
if (nLength < 0)
bRetVal = false;
else
{
*ppResult = new double[nLength];
for (int inx=0; inx < nLength; inx++)
{
if ((-1 <= pValues[inx]) && (pValues[inx] <= 1))
{
*ppResult[inx] = asin(pValues[inx]);
if (fabs(*ppResult[inx]) != M_PI / 2)
{
*ppResult[inx] = tan(*ppResult[inx]);
}
else
{
bRetVal = false;
break;
}
}
else
{
bRetVal = false;
break;
}
}
bRetVal = true;
}
if (false == bRetVal)
{
if(*pResult)
delete *pResult;
*pResult = NULL;
}
return bRetVal;
}
There is nothing like a free lunch in this world. The code quality in this case comes with the price of extra baggage. In my opinion, the extra baggage is worth it.
6. The infamous goto
How many of you like goto
? How many of you think goto
is bad? How many of you believe goto
is bad? What will you say when you see a person writing code using goto
s? With break
, continue
and return
statements, it is almost always possible to write code without using gotos. But gotos come in handy for a very easy and simple implementation of Single Entry Single Exit style of programming.
Consider the following example
Example 6A
bool TanArcSine(double* pValues, int nLength, double** ppResult)
{
bool bRetVal = false;
if (nLength < 0)
goto error_exit;
*ppResult = new double[nLength];
for (int inx=0; inx < nLength; inx++)
{
if ((1 < 1))
goto error_exit;
*ppResult[inx] = asin(pValues[inx]);
if (fabs(*ppResult[inx]) == M_PI / 2)
goto error_exit;
*ppResult[inx] = tan(*ppResult[inx]);
}
bRetVal = true;
goto success_exit;
error_exit:
bRetVal = false;
if(*ppResult) delete *ppResult;
*ppResult = NULL;
success_exit:
return bRetVal;
}
This code is certainly more understandable than the one in Example 7. Isn’t it? The only goto
statements present are for success_exit
or error_exit
. No other goto
statement is permitted. I know of many people who employ code quality check tools to trap all other usages of goto
. Thereby Single Entry Single Exit is achieved with minimum over head of extra variables and maintaining the ease of programming. A very controlled usage of goto
is perhaps a viable tradeoff in these cases. So the next time your quality assurance team screams at you for using a goto
, ask them what they prefer. A clean looking code with all resources freed with the over head of exactly two goto
statements OR a code which gets mired in if then else
statements , hard to write, understand and maintain but does not have goto
.
If seeing a goto
statement brings a tear to your cheek, then try this. The same program as Example 6A but slightly rewritten to use macros. This is one place where I find macros not just handy, but indispensable.
Example 6B
#define IF_TRUE_EXIT(expr) \
{ \
if(expr) \
goto error_exit; \
} \
#define SUCCESS_EXIT goto success_exit;
bool TanArcSine(double* pValues, int nLength, double** ppResult)
{
bool bRetVal = false;
IF_TRUE_EXIT(nLength < 0);
*ppResult = new double[nLength];
for (int inx=0; inx < nLength; inx++)
{
IF_TRUE_EXIT((1 < -1));
*ppResult[inx] = asin(pValues[inx]);
IF_TRUE_EXIT(fabs(*ppResult[inx]) == M_PI / 2);
*ppResult[inx] = tan(*ppResult[inx]);
}
bRetVal = true;
SUCCESS_EXIT;
error_exit:
bRetVal = false;
if(*ppResult) delete *ppResult;
*ppResult = NULL;
success_exit:
return bRetVal;
}
7. Return values
I add this section as an after thought. The intention for this section is just to remind people to check the return value of the function call. Even if you are sure that the function cannot possibly fail, it is better to check for the return value. You cant be too careful when it comes to coding.
Assume the case where the function you have called has failed. But the function has forgotten to free the memory allocated to an output pointer. (Probably because the called function is not Single Entry Single Exit). Now you don’t check the return value but go ahead to access the output parameter. This can result in an memory access violation or overflow etc. This is a perfect hacker’s exploit. Sounding farfetched??? Think again.
Conclusion
Can Bad coding practices be concluded??? No. I don’t think so. There isn’t enough space on mybrain to learn all the coding practices. There certainly isn’t enough computing power available in my brain to judge them all and classify as good and bad. Feeling rather proud of my experiences with C,C++ coding, I have started listing the practices which I found bad. Most of these bad practices were once followed by me. I make a diligent effort to not commit them now-a-days but that is something of a perfection which has always remained beyond my grasp. I wish to keep this doc updated whenever I learn a lesson.