Introduction
The C preprocessor is a useful tool that lets programmers work around the limitations of C. It has been abused so often to do clever, unmaintainable things that C++ programmers now tend to avoid it, and more modern languages like C# have left macro substitution out altogether (C# keeps only directives such as #define and #if for conditional compilation). If you really wanted to, you could add a separate preprocessing step using a standalone preprocessor such as cpp, but those who have to read the code later would probably not thank you.
The class I'll present here does the lexical substitution part of the C preprocessor. You can add macros that behave like C macros and that can do useful things like 'stringizing' and token-pasting. Calling the Substitute() method then returns your string with all macros expanded. It is not intended to replace standalone C preprocessors (that has been done repeatedly), but it is sometimes convenient to be able to make complex macro substitutions. I wrote this class as part of an interpreter project, and it makes typing in an interactive session much easier. It's also a cool way to explore C macro substitution interactively, and it shows how .NET regular expressions can make complex text substitutions easier. The full implementation is only 130 lines of C#, including comments!
Background
System.Text.RegularExpressions.Regex is a marvelous class and everyone should be familiar with it. It is very useful for making intelligent, possibly context-sensitive replacements in text. The idea is to find a match and then, depending on the matched value, replace it with some other text. For instance, the formal parameters of a macro are the dummy variables used when defining it; when the macro is invoked, the preprocessor has to find the actual parameters and substitute them into the macro text.
Here is an example from MacroSubstitutor, where we have to replace the formal parameters of a macro with the actual parameters. The code has been slightly simplified (ignoring '#') to show the match/replace loop more clearly.
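using System;
using System.Text.RegularExpressions;

public class MacroEntry {
    public string Subst;     // replacement text of the macro
    public string[] Parms;   // names of the formal parameters
}
...
static Regex iden = new Regex(@"[a-zA-Z_]\w*");

public string ReplaceParms(MacroEntry me, string[] actual_parms) {
    Match m;
    int istart = 0;
    string subst = me.Subst;
    // Find each identifier in turn, starting the search at istart.
    while ((m = iden.Match(subst,istart)).Success) {
        int idx = Array.IndexOf(me.Parms,m.Value);
        int len = m.Length;
        if (idx != -1) {
            // The identifier is a formal parameter: replace this one occurrence.
            string actual = actual_parms[idx];
            // Escape '$' so that Replace inserts the text literally.
            subst = iden.Replace(subst,actual.Replace("$","$$"),1,istart);
            len = actual.Length;
        }
        // Continue after the identifier, or after the text just inserted.
        istart = m.Index + len;
    }
    return subst;
}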
Please note the overloaded forms of Regex.Match and Regex.Replace that are used here. They let you specify the position where matching and replacing begin, and how many replacements to make (by default, Replace replaces every match, which is no good for context-sensitive replacements). Obviously, this could be faster, but it's fast enough for what I needed it to do, and (most importantly) it's concise and readable code.
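The same context-sensitive replacement can also be expressed with the Regex.Replace overload that takes a MatchEvaluator delegate, which is called once per match and returns the replacement text. The following is only a sketch of that alternative, assuming a C# compiler with lambda support; the class and method names are illustrative and are not part of MacroSubstitutor.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParmReplaceSketch {
    static readonly Regex Iden = new Regex(@"[a-zA-Z_]\w*");

    // Replace every identifier that names a formal parameter with the
    // matching actual parameter; all other identifiers are left unchanged.
    public static string ReplaceParms(string body, string[] formals, string[] actuals) {
        return Iden.Replace(body, m => {
            int idx = Array.IndexOf(formals, m.Value);
            return idx >= 0 ? actuals[idx] : m.Value;
        });
    }

    public static void Main() {
        string body = "for(int i = 0; i < (n); i++)";
        Console.WriteLine(ReplaceParms(body, new[] { "i", "n" }, new[] { "k", "N" }));
        // prints: for(int k = 0; k < (N); k++)
    }
}
```

The evaluator's return value is inserted literally, so a '$' in an argument needs no escaping, and the inserted text is not scanned again during the same pass.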
The tricky method is MacroSubstitutor.Substitute, because we have to be careful when extracting the arguments passed to the macro. At first, this seems easy: grab the string up to ')' and split on ','. But consider FOR(i,f(x,g(y))); it is necessary to count the bracket level carefully and only pull in arguments at level one. Not everything can be done with regular expressions, and sometimes a loop is more obvious.
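// Assumes i indexes the first character after the macro's '('
// and that parenDepth is 1 at this point.
int idx = 0, isi = i;   // isi marks the start of the current argument
while (parenDepth > 0) {
    // A ',' or ')' at depth one ends the current argument.
    if (parenDepth == 1 && (str[i] == ',' || str[i] == ')')) {
        actuals[idx] = str.Substring(isi,i - isi);
        idx++;
        isi = i+1;
    }
    if (str[i] == '(') parenDepth++;
    else if (str[i] == ')') parenDepth--;
    i++;
}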
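For experimenting with the depth-counting idea on its own, here is a self-contained sketch of the same loop as a standalone function. The names (ArgSplitSketch, SplitArgs) are hypothetical, and the bounds check and the trimming of whitespace are additions that the fragment above does not show.

```csharp
using System;
using System.Collections.Generic;

public static class ArgSplitSketch {
    // 'open' is the index of the '(' that starts the argument list.
    // Returns the top-level arguments; 'end' receives the index just
    // past the matching ')'.
    public static List<string> SplitArgs(string str, int open, out int end) {
        var args = new List<string>();
        int depth = 1;
        int i = open + 1, argStart = i;
        while (depth > 0) {
            if (i >= str.Length)
                throw new FormatException("unbalanced parentheses in macro call");
            char c = str[i];
            if (c == '(') depth++;
            else if (c == ')') depth--;
            // A ',' at depth one or the final ')' ends the current argument.
            if ((c == ',' && depth == 1) || (c == ')' && depth == 0)) {
                args.Add(str.Substring(argStart, i - argStart).Trim());
                argStart = i + 1;
            }
            i++;
        }
        end = i;
        return args;
    }

    public static void Main() {
        int end;
        foreach (string a in SplitArgs("FOR(i,f(x,g(y)))", 3, out end))
            Console.WriteLine(a);   // prints "i", then "f(x,g(y))"
    }
}
```

This sketch does not treat string or character literals specially, so a ',' or ')' inside quotes would still be counted; an empty argument list such as F() yields a single empty argument.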
The interesting point is that the results of substitutions are themselves examined for any new macros to be substituted. We start our next match at the newly substituted text. It works like this:
FOR(i,N)
for(int i = 0; i < (N); i++)
for(int i = 0; i < (20); i++)
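The rescanning step can be illustrated with a small sketch that handles only parameterless macros. It is not the author's Substitute method: the class name, the dictionary-based macro table, and the expansion limit are all assumptions made for illustration. The key line is the one that resets the search position to the start of the inserted text.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RescanSketch {
    static readonly Regex Iden = new Regex(@"[a-zA-Z_]\w*");

    // Expands parameterless macros, rescanning each inserted body so that
    // macros appearing inside a substitution are expanded as well.
    public static string Expand(string text, IDictionary<string, string> macros,
                                int maxExpansions = 1000) {
        int pos = 0, count = 0;
        Match m;
        while ((m = Iden.Match(text, pos)).Success) {
            string body;
            if (macros.TryGetValue(m.Value, out body)) {
                // Crude guard against macros that expand to themselves.
                if (++count > maxExpansions)
                    throw new InvalidOperationException("too many expansions; recursive macro?");
                text = text.Substring(0, m.Index) + body
                     + text.Substring(m.Index + m.Length);
                pos = m.Index;               // rescan from the inserted text
            } else {
                pos = m.Index + m.Length;    // not a macro: move past it
            }
        }
        return text;
    }

    public static void Main() {
        var macros = new Dictionary<string, string> { { "N", "M+1" }, { "M", "20" } };
        Console.WriteLine(Expand("i < (N)", macros));   // prints: i < (20+1)
    }
}
```

A standard C preprocessor avoids infinite recursion differently, by refusing to re-expand a macro's own name inside its expansion; the counter here is just the simplest safeguard for a sketch.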
Using the code
The class is straightforward to use:
static void Test() {
    MacroSubstitutor ms = new MacroSubstitutor();
    // Both formal parameters used in the macro text must be listed.
    ms.AddMacro("FOR","for(int i = 0; i < (n); i++)",new string[]{"i","n"});
    ms.AddMacro("N","20",null);
    Console.WriteLine(ms.Substitute("FOR(k,N)"));
}
There's an even easier method, where you let the class handle the macro definitions. These look just like C preprocessor #defines, except that I prefer #def because it's easier to type in a hurry. Here is the complete source for a simple interactive preprocessor.
using System;

class MacroTest {
    // Prompt for a line; returns null at end of input.
    static string Read() {
        Console.Write(">> ");
        return Console.In.ReadLine();
    }

    static void Main() {
        MacroSubstitutor ms = new MacroSubstitutor();
        string line;
        while ((line = Read()) != null)
            Console.WriteLine(ms.ProcessLine(line));
    }
}
Here is an example of an interactive session:
>> #def N 20
>> #def write Console.WriteLine
>> #def FOR(i,n) for(int i = 0; i < (n); i++)
>> FOR(k,N) write(k);
for(int k = 0; k < (20); k++) Console.WriteLine(k);
>> #def quote(x) #x
>> #def cat(x,y) x##y
>> quote(cat(dog,mouse))
"dogmouse"
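To make the two operators in the last lines of the session concrete, here is a rough sketch of how '#' (stringizing) and '##' (token-pasting) could be applied to a single macro body with regular expressions. It is an illustration under stated assumptions, not the code inside MacroSubstitutor: the class name, the Apply method, and the three patterns are all hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HashOpsSketch {
    // Either '#name' (a single '#', not part of '##') or a plain identifier.
    static readonly Regex Token =
        new Regex(@"(?<!#)#\s*([a-zA-Z_]\w*)|([a-zA-Z_]\w*)");
    static readonly Regex Paste = new Regex(@"\s*##\s*");

    public static string Apply(string body, string[] formals, string[] actuals) {
        // One pass: stringize '#x', substitute plain parameters.
        string s = Token.Replace(body, m => {
            bool stringize = m.Groups[1].Success;
            string name = stringize ? m.Groups[1].Value : m.Groups[2].Value;
            int idx = Array.IndexOf(formals, name);
            if (idx < 0) return m.Value;          // not a parameter
            return stringize ? "\"" + actuals[idx] + "\"" : actuals[idx];
        });
        // Token-pasting: drop '##' together with any surrounding whitespace.
        return Paste.Replace(s, "");
    }

    public static void Main() {
        Console.WriteLine(Apply("#x", new[] { "x" }, new[] { "hello" }));
        // prints: "hello"
        Console.WriteLine(Apply("x##y", new[] { "x", "y" }, new[] { "dog", "mouse" }));
        // prints: dogmouse
    }
}
```

The sketch does not escape quotes or backslashes inside a stringized argument. Note also that the session above shows the argument being expanded before it is stringized, whereas a standard C preprocessor stringizes the argument text exactly as written.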
Occasionally, you will need to customize the lookup. The example here is where we want to replace DATE and USER with their values at the time of substitution. MacroSubstitutor provides a CustomReplacement method which can be overridden; it will be called if the macro replacement for a symbol is the special value MacroEntry.Custom:
class MyMacroSubstitutor : MacroSubstitutor {
    public override string CustomReplacement(string s) {
        switch(s) {
        case "DATE": return DateTime.Now.ToString();
        case "USER": return Environment.GetEnvironmentVariable("USERNAME");
        }
        return "";
    }
}

static MacroSubstitutor ms = new MyMacroSubstitutor();
...
ms.AddMacro("DATE",MacroEntry.Custom,null);
ms.AddMacro("USER",MacroEntry.Custom,null);
Alternatively, we could have used a delegate here, but a good old-fashioned virtual method does the job nicely. When C# 2.0 becomes widely available, it will be a pleasure to write this kind of code using anonymous methods:
ms.AddMacro("DATE",delegate(string s) {
    return DateTime.Now.ToString();
});