Introduction
In this tip, I am demonstrating CodeConvert
, a Java based translator which converts arithmetic expressions to 3-address codes.
This program was written as a part of Compiler Design course and its aim was to help students familiarize with various compiler modules, for example: lexer, parser, intermediate code generator and optimizer.
Using the Program
To run executable jar file, open terminal and do:
$ cd demo
$ java -jar CodeConvert.jar
Instruction File Path: input_file_path
CodeConvert
takes input in the form of text file which contains arithmetic expressions. The content of input file is converted to 3-address codes which may contain many redundancies. Finally, the program performs various optimization techniques, such as Constant folding, Algebric simplification, Copy propagation and Dead code removal to give final optimized code.
Input | Unoptimized 3-address codes | Codes after 4 optimization rounds | Final output (8 optimization rounds) |
x = 9
y = sin(23) + cos(76)
z = x*x + 2*y*x + y*y
x = z/2 + y/2
y = (x + y - z * (z - 2))/(x/z + y/z + 2) | x = 9 + 0
y = sin(23) + cos(76)
t_1 = x * x
t_2 = 2 * y
t_3 = t_2 * x
t_4 = t_1 + t_3
t_5 = y * y
z = t_4 + t_5
t_6 = z / 2
t_7 = y / 2
x = t_6 + t_7
t_8 = x + y
t_9 = z - 2
t_10 = z * t_9
t_11 = t_8 - t_10
t_12 = x / z
t_13 = y / z
t_14 = t_12 + t_13
t_15 = t_14 + 2
y = t_11 / t_15 | x = 9
y = 0
t_4 = 81
z = 81
t_6 = 81 / 2
x = t_6
t_9 = 81 - 2
t_10 = 81 * t_9
t_11 = t_6 - t_10
t_12 = t_6 / 81
t_13 = 0 / 81
t_14 = t_12 + t_13
t_15 = t_14 + 2
y = t_11 / t_15 | x = 9
y = 0
z = 81
x = 40
y = -3179 |
Language
Note from the example in the previous section that the input file has a specific structure.
- Each line contains only one expression
- Each expression can have only a variable on LHS. Numbers or other expressions are not allowed in LHS
- RHS may contain nested expression with both numbers and variables.
- Any variable can be used on LHS any number of times.
- Expression may contain undefined variables in them, unlike the example given above.
-
Some other functions are also supported with the help of JExel. These functions are:
max
, min
floor
, ceil
neg
, abs
cos
, sin
, tan
acos
, asin
, atan
deg
, rad
Alternatives
This program was written to help understand different parts of compiler and challenges faced in their implementation. Instead of writing these parts from scratch, one can simply use existing utilities such as flex and bison to build lexer and parser.
Also I have assumed that input file follows the structure specified above and contains no syntactic errors. Alternatively, one can modify the source code and place checks for correctness.
Brief Description of Working
1. Reading the File
File is read line-by-line and its contents are saved in an ArrayList
. Each item in ArrayList
corresponds to one expression (one line from file).
See Test.java, line 60 to 68
while(scan.hasNextLine()){
lines.add(scan.nextLine());
System.out.println(lines.get(lines.size()-1)); // Print input line
}
2. Converting to Token Stream
Each element of ArrayList
is processed to find tokens using regular expressions.
See Builder/TreeBuilder.java, function GetTokens, line 81
[a-zA-Z_]+[a-zA-Z0-9_]*\\([^()]*\\) to find functions e.g. floor(2.1), cos(9.1)
[0-9]+ to find numbers e.g. 9, 13, 103
[a-zA-Z_][a-zA-Z0-9_]* to find variables e.g. abc, _velocity, speed0
\\Q(\\E to find opening brackets (
\\Q)\\E to find closing brackets )
Each line of input file is itself converted to an ArrayList
of tokens by the function GetTokens
referred above.
3. Token Stream to Syntax Tree
Token stream from previous section is now passed to function GenerateTree
which reads tokens one-by-one and identifies its type. Utilizing tokens' types, it places them accordingly in a Binary Tree. Here we have to take care of operator precedence and importance of brackets. We define root
as the root node of tree generated so far. current
is a new node containing current token and is not attached to the tree yet.
- If
root
is a number and current
is an operator, root
will become the left child of current
. current
will become new root
of the tree. - If
root
has lower precedence than current
, then it will remain root
of the tree and current
will become its right child. - Entities within brackets
(...)
will be treated as units for all outside nodes. We define current_br
as the root of subexpression within brackets, it is yet not attached to the expression tree. - If
root
has higher precedence than current
and current
is not a current_br
then root
will replace left child of current
. The original left child of current
will become right child of root
. current
will become new root
. - If
root
has higher precedence than current
but current
is a current_br
then root
will remain root and will take current_br
as right child.
Actions in other situations are simpler and if you wish to check them, see Builder/TreeBuilder.java
, line 96-169
. There may be flaws in this logic but all the test expressions have given desired results so far. To test validity of any expression, you can print its expression tree using the TreePrinter
class as follows:
TreeBuilder tb = new TreeBuilder();
Node n = tb.GenerateTree("a*b-p/(a+b)"); // build syntax tree
n = ThreeAddressBuilder.normalize(n); // hack for special case
TextNode m = TreePrint.TreePrinter.TreeToTextTree(n); // convert to printable TextNode
String print = TreePrint.TreePrinter.TreeString(m);
System.out.println(print);
The above code will print the tree in standard output as:
[-]
/ \
/ \
/ \
/ \
/ \
[*] [/]
/ \ / \
[a] [b] [p] \
\
\
\
[+]
/ \
[a] [b]
4. Syntax Tree to 3-Address Codes
Every single expression will now be converted to a series of 3-address codes. Class ExprBuilder
in Builder
package does this job. Tree is traversed top-bottom and temporary variables are defined to build 3-address codes.
See Builder/ThreeAddressBuilder.java, line 65-77
ArrayList<Instruction> i = new ArrayList<>(); // temporary storage for 3-address codes
Node n = tb.GenerateTree(s[1].trim());
n = normalize(n);
ExprBuilder.Parse(i, n, start); // convert tree to 3-address codes, start: starting index of temp-variables
code.addAll(i); // merge 3-address codes of current expression to codes so far
start = LastNumbered(code) + 1; // update starting index of temp-variables of next series so to avoid confusion
This stage will give us unoptimized 3-address codes of entire input file.
5. Optimization
See Optimizer.java
in Optimize
package to check working of various optimization techniques.
See Test.java, line 102-109
boolean optimized = true;
int i = 1;
while(optimized && i < MAX_ROUNDS){ // try optimization as long as possible
optimized = false;
optimized = op.Optimize(ins);
i++;
}
Features Missing
Some features are not implemented but I may do so in future. These are:
- Custom operators and with defined semantics. See
Elements/Operator.java
- Associativity of operators may have bugs
- Optimization implementation is very basic and can be extended to add other optimizations