Keywords
C# parse tree, C# compiler, mono
Important Update
The original article describes how to obtain the parse tree of C# code produced by the Mono C# compiler. Unfortunately, the Mono C# compiler code is very hard to use as a library. Fortunately, another project, SharpDevelop (http://www.icsharpcode.net/OpenSource/SD/Default.aspx), provides a much better solution for parsing C#. The project I tried is located in
SharpDevelop_4.0.0.6721_Beta3_Source\src\Libraries\NRefactory\NRefactory.sln
Simply remove the file GlobalAssemblyInfo.cs and compile the solution. Set the NRefactoryDemo project as the startup project. See screenshot. More work is required on my part to compare the SharpDevelop and Mono solutions.
What is the C# Mono Compiler?
The Mono Project ships an open source C# compiler along with other software, such as an implementation of the .NET Base Class Library and a .NET virtual machine. In fact, Mono ships several compilers; gmcs and dmcs, for example, target versions 3.0 and 4.0 of C# respectively. They are built from the same source code with different build configurations. The Mono compiler is fairly complete and is likely to parse most of the C# source code one encounters in practice.
Use cases
An open source C# compiler is useful in a number of scenarios. One big group of applications is source code analysis, which includes IntelliSense, code metrics and static analysis tools. Another possible use of the C# compiler is code generation.
Alternatives
When I wrote the original article, I could not find another C# compiler or parser that was freely available for this kind of use. The .NET BCL defines an interface for parsing source code (ICodeParser in System.CodeDom.Compiler), but the C# code provider does not implement it. If Visual Studio is installed, one might try some of the undocumented COM interfaces that ship with it for parsing.
Challenges to using Mono C# Compiler
The greatest challenge in using the Mono C# compiler is that it was not designed with code reusability in mind or for use as a library. While the Mono C# compiler works, using its code requires the library user to understand the compiler as a whole. Among the issues I encountered in practice were the following:
- Certain classes and member variables that a library user needs are protected rather than public.
- Later passes of the compiler reset various fields of parse-tree objects and, as a result, destroy parts of the parse tree.
- One has to trace the code from the parser into the parse tree to find out where certain information is stored, because many fields are misleadingly named.
- The large number of classes used to represent the parse tree, and their deep inheritance hierarchy. After getting familiar with the code, this turned out to be the greatest challenge.
Goals
My goal is to show how to use the C# parser to dump information from the parse tree. I will not touch on code generation in the Mono C# compiler. I will show two applications. The first is a parse tree browser, which lets one relate parts of the source text to the parse tree interactively. It is useful in any scenario that requires understanding the parser code. The second application extracts parts of the parse tree for use in an application that searches source code.
Preparing the Mono C# Compiler Code
One can either download my code, which contains modifications to the Mono compiler, or start from scratch by downloading Mono.
- Option 1 (My code): My code is available as a Visual Studio 2010 project and contains the generated cs-parser.cs together with the other modifications explained in this article. See the beginning of the article for the download link.
- Option 2 (From scratch):
  - download Mono
  - compile the jay parser generator: mono-2.6.7\mcs\jay
  - generate cs-parser.cs from mono-2.6.7\mcs\mcs\cs-parser.jay
  - compile the dmcs project in mono-2.6.7\mcs\mcs\dmcs.csproj
I found it easiest to compile the complete Mono project on Ubuntu Linux: I used apt-get to install the additional libraries, then ran configure and make. The file cs-parser.cs was produced in the mono-2.6.7\mcs\mcs\ directory, and I copied it to my Visual Studio environment on Windows.
Understanding the code
Method 1
The code that launches the compiler is in driver.cs. I looked at this code to see how to obtain a parse tree:
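// Parses one C# file with the Mono driver and returns the resulting module tree.
// Requires the access-modifier changes described below.
static ModuleContainer ParseFile(string fileName)
{
    // "--parse" asks the driver to stop after parsing
    string[] args = new string[] { fileName, "--parse" };

    // Clear static compiler state left over from a previous run
    CompilerCallableEntryPoint.Reset();

    Mono.CSharp.Driver d = Mono.CSharp.Driver.Create(args, false, new ConsoleReportPrinter());
    d.Compile();

    // The parser leaves its result in a static field
    return RootContext.root;
}

string fileName = @"monitor.cs";
ModuleContainer parseTree = ParseFile(fileName);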
Note that the parse tree is stored in a static field of a class called RootContext. This may cause problems when parsing more than one file if the static fields are not cleared properly. To call this code from an external project, I had to change the access modifiers of a few classes and member fields to public. Before running the parser, it is a good idea to clear the state by calling CompilerCallableEntryPoint.Reset().
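Because RootContext.root is static, every parse overwrites (or accumulates into) shared state. The sketch below illustrates a reset-before-parse discipline using a hypothetical StaticParser class as a stand-in for the Mono driver; it does not call Mono. ParseIsolated locks around the shared state, resets it first (the role CompilerCallableEntryPoint.Reset() plays above), and copies the result out before returning.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a parser that, like the Mono driver, leaves its
// result in a static field (compare RootContext.root). This is not Mono code.
static class StaticParser
{
    // Shared result; every call to Parse writes into it.
    public static List<string> Root;

    public static void Reset()
    {
        Root = null;
    }

    public static void Parse(string source)
    {
        // Accumulates into shared state: without Reset(), declarations from
        // a previously parsed file leak into the next result.
        if (Root == null)
            Root = new List<string>();
        foreach (string line in source.Split('\n'))
            if (line.Trim().Length > 0)
                Root.Add(line.Trim());
    }
}

static class IsolatedParsing
{
    static readonly object Gate = new object();

    // Serializes access to the static state, resets it before parsing,
    // and returns a copy so that later parses cannot modify this result.
    public static List<string> ParseIsolated(string source)
    {
        lock (Gate)
        {
            StaticParser.Reset();
            StaticParser.Parse(source);
            return new List<string>(StaticParser.Root);
        }
    }
}
```

With the real compiler, the body of ParseIsolated would call ParseFile instead; the lock only matters if files are parsed from several threads.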
A great number of classes represent pieces of information from the parse tree. It took me more than four hours to go through each of them and add an additional interface. I suggest using the application that comes with this article to browse the representation generated for a particular piece of code. For example, a C# class is represented by an instance of a class called Class, which holds lists of methods (class Method), constructors (class Constructor), fields (class Field), and so on.
Often it is unclear from browsing the code where a particular piece of information is stored. As a specific example, consider where the types of method parameters are stored: concretely, where is Bar in “void foo(Bar b)” stored? To answer this, we need to look at the parser grammar in cs-parser.jay; the C# file cs-parser.cs is generated automatically and is not meant to be read by programmers. Searching cs-parser.jay for the keyword method, I find that methods are parsed via the method_declaration rule, and that method parameters are in turn parsed via method_header.
Inspection of method_header shows that types are represented as FullNamedExpression objects stored in the Method object; in the excerpt below, for instance, the return type (member_type, $3) is cast to FullNamedExpression and passed to the Method constructor.
method_header
    : opt_attributes
      opt_modifiers
      member_type
      method_declaration_name
      ...
      method = new Method (current_class, generic, (FullNamedExpression) $3, (int) $2, name, ...
I then looked at the implementation of class Method and found that the types of method parameters can be obtained with code like this:
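// m is a Method object from the parse tree
foreach (var maybeParam in m.Parameters.FixedParameters)
{
    // Only Parameter instances carry a TypeName; skip anything else
    var param = maybeParam as Parameter;
    if (param == null)
        continue;

    // "?" marks FullNamedExpression subclasses not handled below
    string typeName = "?";
    var simpleName = param.TypeName as TypeNameExpression;
    var lookupName = param.TypeName as TypeLookupExpression;
    if (simpleName != null)
        typeName = simpleName.name;
    else if (lookupName != null)
        typeName = lookupName.name;

    // use typeName here
}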
Looking at this code, I am not sure that it handles every way a type name can be represented. FullNamedExpression is an abstract class, so I need to find all of its subclasses and obtain the type name from each of them. The class diagram tool in Visual Studio shows which classes need to be handled. It is probably best to add an abstract getter property called Name to FullNamedExpression and override it in each subclass.
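A minimal sketch of the suggested design. The classes below are hypothetical stand-ins, not Mono's actual FullNamedExpression subclasses (whose members differ); they only show how an abstract Name property removes the if/else chain over subclasses from the calling code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for FullNamedExpression.
abstract class FullNamedExpressionSketch
{
    // Every concrete subclass must say how its type name is spelled.
    public abstract string Name { get; }
}

// A plain type name such as "Bar".
sealed class SimpleTypeName : FullNamedExpressionSketch
{
    private readonly string name;
    public SimpleTypeName(string name) { this.name = name; }
    public override string Name { get { return name; } }
}

// A qualified name such as "System.Collections.ArrayList".
sealed class QualifiedTypeName : FullNamedExpressionSketch
{
    private readonly FullNamedExpressionSketch left;
    private readonly string right;

    public QualifiedTypeName(FullNamedExpressionSketch left, string right)
    {
        this.left = left;
        this.right = right;
    }

    // Builds the name recursively from the qualifier.
    public override string Name { get { return left.Name + "." + right; } }
}

// An array type such as "int[]".
sealed class ArrayTypeName : FullNamedExpressionSketch
{
    private readonly FullNamedExpressionSketch element;
    public ArrayTypeName(FullNamedExpressionSketch element) { this.element = element; }
    public override string Name { get { return element.Name + "[]"; } }
}

static class ParameterTypes
{
    // Callers just ask for Name; no type tests against individual subclasses.
    public static List<string> TypeNames(IEnumerable<FullNamedExpressionSketch> types)
    {
        var result = new List<string>();
        foreach (var t in types)
            result.Add(t.Name);
        return result;
    }
}
```

Because Name is abstract, a concrete subclass that forgets to override it fails to compile instead of silently producing "?".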
Method 2
As a second example, consider where using directives such as "using System.Collections;" are stored. The suggested approach is the following. Create two source files that are identical except that the second one contains the using directive. Then dump the parse trees of both files to two text files and run diff to find the differences. I generate the parse tree dumps in two ways: one uses reflection and one uses the interface. The advantage of reflection is that it is simple to implement; its disadvantage is the large amount of data it dumps. As an alternative, I implemented a different strategy that lists only the information I ask for: each class that represents information from the parser implements an interface and exports the interesting pieces of data.
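The reflection strategy can be sketched as a generic dumper that does not depend on Mono: it prints one indented line per field, recurses into child objects and collections, caps the depth, and tracks visited objects because parse trees can contain back-references. LinesOnlyIn is a deliberately crude, hypothetical stand-in for diff (it ignores line order and duplicates); all names here are illustrative.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Text;

// Compares objects by identity so shared or cyclic references are dumped once.
sealed class IdentityComparer : IEqualityComparer<object>
{
    bool IEqualityComparer<object>.Equals(object x, object y) { return ReferenceEquals(x, y); }
    int IEqualityComparer<object>.GetHashCode(object o) { return RuntimeHelpers.GetHashCode(o); }
}

static class ReflectionDumper
{
    // One line per node: "name : TypeName" for objects, "name = value" for leaves.
    public static string Dump(object root, int maxDepth)
    {
        var sb = new StringBuilder();
        DumpNode(root, "root", 0, maxDepth, sb, new HashSet<object>(new IdentityComparer()));
        return sb.ToString();
    }

    static void DumpNode(object node, string label, int depth, int maxDepth,
                         StringBuilder sb, HashSet<object> visited)
    {
        string indent = new string(' ', depth * 2);
        if (node == null)
        {
            sb.AppendLine(indent + label + " = null");
            return;
        }
        Type type = node.GetType();
        // Primitive values, enums and strings are printed directly.
        if (type.IsPrimitive || type.IsEnum || node is string || node is decimal)
        {
            sb.AppendLine(indent + label + " = " + node);
            return;
        }
        sb.AppendLine(indent + label + " : " + type.Name);
        // Stop at the depth limit or if this object was already dumped.
        if (depth >= maxDepth || !visited.Add(node))
            return;
        // Collections (e.g. a list of methods): dump each element.
        var items = node as IEnumerable;
        if (items != null)
        {
            int i = 0;
            foreach (object item in items)
                DumpNode(item, "[" + i++ + "]", depth + 1, maxDepth, sb, visited);
            return;
        }
        // Other objects: dump the instance fields reported by Type.GetFields
        // (private fields declared in base classes are not included).
        foreach (FieldInfo f in type.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic))
            DumpNode(f.GetValue(node), f.Name, depth + 1, maxDepth, sb, visited);
    }

    // Crude stand-in for diff: lines of 'after' that never occur in 'before'.
    public static List<string> LinesOnlyIn(string before, string after)
    {
        char[] newlines = { '\r', '\n' };
        var old = new HashSet<string>(before.Split(newlines, StringSplitOptions.RemoveEmptyEntries));
        var added = new List<string>();
        foreach (string line in after.Split(newlines, StringSplitOptions.RemoveEmptyEntries))
            if (!old.Contains(line))
                added.Add(line);
        return added;
    }
}
```

For the using-directive experiment, one would dump the two parse trees (for example the ModuleContainer returned by ParseFile) and inspect what LinesOnlyIn reports; a real diff tool gives more context.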
Application 1
The first application I want to show is a browser for parsed C# code. I have already described the techniques for finding my way through the Mono C# compiler code. For all relevant classes I implement an interface called IVisitable, which helps me navigate the parse tree.
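The members of IVisitable are not spelled out above, so the interface below is a hypothetical shape: each node reports a label, the key/value pairs it chooses to export, and its children. ClassNode and MethodNode are illustrative stand-ins for Mono's Class and Method, not the real classes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical shape for IVisitable: each node decides what to export.
interface IVisitable
{
    string Label { get; }
    IEnumerable<KeyValuePair<string, string>> ExportedData { get; }
    IEnumerable<IVisitable> Children { get; }
}

// Illustrative stand-in for a parsed method.
sealed class MethodNode : IVisitable
{
    public string Name;
    public string ReturnType;

    public MethodNode(string name, string returnType) { Name = name; ReturnType = returnType; }

    public string Label { get { return "Method"; } }
    public IEnumerable<KeyValuePair<string, string>> ExportedData
    {
        get
        {
            yield return new KeyValuePair<string, string>("Name", Name);
            yield return new KeyValuePair<string, string>("ReturnType", ReturnType);
        }
    }
    public IEnumerable<IVisitable> Children { get { return new IVisitable[0]; } }
}

// Illustrative stand-in for a parsed class holding a list of methods.
sealed class ClassNode : IVisitable
{
    public string Name;
    public readonly List<MethodNode> Methods = new List<MethodNode>();

    public ClassNode(string name) { Name = name; }

    public string Label { get { return "Class"; } }
    public IEnumerable<KeyValuePair<string, string>> ExportedData
    {
        get { yield return new KeyValuePair<string, string>("Name", Name); }
    }
    public IEnumerable<IVisitable> Children
    {
        get { foreach (var m in Methods) yield return m; }
    }
}

static class TreePrinter
{
    // Depth-first walk that prints only what each node chose to export.
    public static string Print(IVisitable node)
    {
        var sb = new StringBuilder();
        Walk(node, 0, sb);
        return sb.ToString();
    }

    static void Walk(IVisitable node, int depth, StringBuilder sb)
    {
        var parts = new List<string>();
        foreach (var kv in node.ExportedData)
            parts.Add(kv.Key + "=" + kv.Value);
        sb.Append(new string(' ', depth * 2)).Append(node.Label);
        if (parts.Count > 0)
            sb.Append(" (").Append(string.Join(", ", parts)).Append(')');
        sb.Append('\n');
        foreach (var child in node.Children)
            Walk(child, depth + 1, sb);
    }
}
```

A TreeView front end would follow the same walk but create one TreeNode per visited node instead of appending a line of text.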
As I already mentioned, there are two modes:
- the first one uses an interface, and exports what I tell it to
- the second one uses reflection and exports everything except what I tell it to skip
The two modes complement each other and are shown in the screenshots.
To demonstrate this application, consider the "Hello world" program. I simply want to dump its parse tree into a TreeView control and look at the C# representation of the parse tree.
This article is still a work in progress. Please check back soon for the finished version. Comments and corrections are appreciated.