Introduction
Are you a developer who finds it easier to read code than comments, only to discover how hard the code is to analyse, even when the function or procedure turns out to be simple? Perhaps you have wanted to re-document 20 pages (or more) of code from scratch, knowing how tedious it is to remove comments line by line. Or maybe you are pondering how to "water down" your code before transmitting it, to reduce network transfer times.
What happens to comments in your code when you compile your program? This article offers some insight into these questions and presents a tool which strips existing comments from an ASCII source code file.
How it works
Compilers do not interpret comments; they simply skip over them. Free-form text in the plain source would otherwise cause errors during compilation, so most C-style languages use /* */ or // to denote comments. These delimiters tell the compiler or interpreter not to "read" what follows: everything up to the end of the line for //, and everything up to the closing */ for /*. For example:
int x; //commentary about an unknown alien x, in one line
Or
int x; /* Tell me more about the
stars and the moon, in an essay */
Ever wondered what happens to those pesky comments the moment you hit that compile button? They get dumped! Well, I mean they stay in the source file. Surprised? As mentioned earlier, comments are not for the compiler! Having said that, it may be possible to store comments in a binary's metadata section (although I don't know of a compiler that implements this, and it would likely come with a performance trade-off).
Take this code snippet in Java, for example:
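@VisibleForTesting
static String simpleName(Class<?> clazz) {
    // Fully qualified name, e.g. "com.example.Outer$Inner"
    String name = clazz.getName();
    // A nested class name follows the last '$'
    int start = name.lastIndexOf('$');
    /* Not a nested class: fall back to the
       last package separator instead */
    if (start == -1) {
        start = name.lastIndexOf('.');
    }
    return name.substring(start + 1);
}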
When you compile the code, the compiler sees it as
@VisibleForTesting
static String simpleName(Class<?> clazz) {
    String name = clazz.getName();
    int start = name.lastIndexOf('$');
    if (start == -1) {
        start = name.lastIndexOf('.');
    }
    return name.substring(start + 1);
}
as the comments are stripped on the fly. That's all the compiler needs to generate object and/or machine code! Comments are discarded early, during lexical analysis (or preprocessing in C and C++), before any code is generated, and the step is transparent to developers.
The application I present here today implements this functionality. Given a text source file, it strips out the comments, leaving compilable code behind. This comes in handy when you wish to redocument your own or someone else's code without having to remove the comments manually, line by line. In the example screenshot, all single-line, multiline and even Javadoc comments are removed. Likewise, in C#, XML comments are also removed.
Algorithm Overview
The basic rule is that when a single-line comment token (//) is found in a line of code, the program should stop processing until a newline ('\n') is encountered, then carry on with the next line.
When a multiline comment token (/*) is found, the program should stop processing until a */ is found. In short, a '\n' ends a single-line comment and a */ ends a multiline comment; either one returns the state to normal.
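The two rules above can be read as a small state machine. The following is a minimal sketch in Java (the language of the sample snippet earlier), not the article's actual implementation: the class name CommentStateSketch, the State enum and the strip method are hypothetical names chosen for illustration, and string literals are deliberately ignored here.

```java
class CommentStateSketch {
    // Hypothetical names for illustration; this is not the article's Uncommenter class.
    enum State { CODE, LINE_COMMENT, BLOCK_COMMENT }

    /** Returns src with // and block comments removed (string literals are NOT handled). */
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            char next = (i + 1 < src.length()) ? src.charAt(i + 1) : '\0';
            switch (state) {
                case CODE:
                    if (c == '/' && next == '/') {
                        state = State.LINE_COMMENT;   // stop copying until '\n'
                        i += 2;
                    } else if (c == '/' && next == '*') {
                        state = State.BLOCK_COMMENT;  // stop copying until "*/"
                        i += 2;
                    } else {
                        out.append(c);                // ordinary code: keep it
                        i++;
                    }
                    break;
                case LINE_COMMENT:
                    if (c == '\n') {
                        out.append(c);                // keep the line break itself
                        state = State.CODE;
                    }
                    i++;
                    break;
                case BLOCK_COMMENT:
                    if (c == '*' && next == '/') {
                        state = State.CODE;           // "*/" returns the state to normal
                        i += 2;
                    } else {
                        i++;
                    }
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("int x; // one line\nint y; /* two\nlines */ int z;"));
    }
}
```

Note that a newline inside a block comment does not end it in this sketch; only */ does. An unterminated /* simply swallows the rest of the input.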
In this implementation, a StreamReader is employed to read our ASCII source code. As the code is processed line by line using the ReadLine() method, detecting and handling comment delimiters becomes slightly more difficult. Many compilers, such as gcc, scan the source character by character for performance and optimisation. However, what we are developing is nothing close to a full-fledged compiler, so line-by-line processing should be adequate.
I was able to keep it to just two methods: the main method doUncomment() and an internal method to handle string literals. All methods are implemented as static, so there is no need to create instances. For more information, please refer to the Uncommenter class.
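To illustrate why line-by-line reading is slightly harder, here is a rough sketch in Java, with BufferedReader.readLine() standing in for the C# StreamReader. It is an assumption about how such a loop could look, not the code inside doUncomment(). The names LineByLineSketch, strip and stripText are hypothetical, and string literals are again left out. The key point is that the "inside a block comment" flag has to survive from one line to the next.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LineByLineSketch {
    /** Strips comments line by line; every output line ends with '\n'. */
    static String strip(BufferedReader reader) throws IOException {
        StringBuilder out = new StringBuilder();
        boolean inBlock = false;              // carried over between lines
        String line;
        while ((line = reader.readLine()) != null) {
            int pos = 0;
            while (pos < line.length()) {
                if (inBlock) {
                    int end = line.indexOf("*/", pos);
                    if (end < 0) {
                        pos = line.length();  // comment continues on the next line
                    } else {
                        inBlock = false;
                        pos = end + 2;
                    }
                } else {
                    int lineStart = line.indexOf("//", pos);
                    int blockStart = line.indexOf("/*", pos);
                    if (lineStart >= 0 && (blockStart < 0 || lineStart < blockStart)) {
                        out.append(line, pos, lineStart);
                        pos = line.length();  // rest of the line is a comment
                    } else if (blockStart >= 0) {
                        out.append(line, pos, blockStart);
                        inBlock = true;
                        pos = blockStart + 2;
                    } else {
                        out.append(line, pos, line.length());
                        pos = line.length();  // no comment token left on this line
                    }
                }
            }
            out.append('\n');
        }
        return out.toString();
    }

    /** Convenience wrapper so the sketch can be tried on an in-memory string. */
    static String stripText(String source) {
        try {
            return strip(new BufferedReader(new StringReader(source)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(stripText("int a; /* one\ntwo */ int b; // tail\nint c;"));
    }
}
```

Lines that consisted only of a comment come out as empty lines in this sketch; whether to drop them is a design choice the sketch leaves open.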
Using the code
To use the code, insert the directive
using UcommenterCS;
Then simply call the static function to do the work. For example:
Uncommenter.doUncomment("src.cpp");
That's it.
If, however, you wish to run it standalone "out of the box", or would simply like to try it out, I have included a compiled binary which works just as well. It is in the /bin folder. To use it, issue the following command:
UcommenterCS <source.cpp/c/cs/java/h/js>
Parsing capabilities
Comment delimiters that appear inside string literals must not be treated as comments. This application checks that a delimiter is not part of a string: it ignores any comment delimiters found between an opening " and its closing ".
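One possible way to make that check is to count unescaped double quotes to the left of a candidate delimiter. The Java sketch below is an assumption for illustration only, not the article's internal string-handling method. The names StringLiteralSketch, insideString and findLineComment are hypothetical, and the sketch does not cover character literals such as '"', verbatim or multi-line strings, or delimiters that sit inside a block comment.

```java
class StringLiteralSketch {
    /** True if position index in line falls inside a "..." literal. */
    static boolean insideString(String line, int index) {
        boolean inString = false;
        for (int i = 0; i < index && i < line.length(); i++) {
            char c = line.charAt(i);
            if (inString && c == '\\') {
                i++;                      // skip the escaped character, e.g. \" or \\
            } else if (c == '"') {
                inString = !inString;     // an unescaped quote opens or closes a literal
            }
        }
        return inString;
    }

    /** Index of the first "//" that is not inside a string literal, or -1 if none. */
    static int findLineComment(String line) {
        int from = 0;
        while (true) {
            int at = line.indexOf("//", from);
            if (at < 0 || !insideString(line, at)) {
                return at;
            }
            from = at + 1;                // this "//" was inside a string; keep looking
        }
    }

    public static void main(String[] args) {
        String line = "String url = \"http://example.com\"; // real comment";
        int at = findLineComment(line);
        System.out.println(at < 0 ? line : line.substring(0, at));
    }
}
```

With this helper, the // inside the URL is skipped and only the trailing comment is cut off.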
Future functionality includes detecting and warning about unterminated block comments and string literals, with the option of halting execution should any be found.
Because I'm not a compiler linguist, I cannot think of every scenario in which the code may fail. However, if you are up for a challenge, you are welcome to try to break my code. If you succeed, please do let me know.
History
- 1st version - 9th July
- 2nd update (repackaged under different class name and namespace and changed to static methods) - 14th September
I plan to write a Windows Forms version in the not too distant future. Also in the works is a Java version.