Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / web / HTML

3-Address Code Generator

5.00/5 (4 votes)
24 May 2014GPL35 min read 65.9K   2.9K  
A Java program to translate arithmetic expression to three-address codes

Sample Image - maximum width is 600 pixels

Introduction

In this tip, I am demonstrating CodeConvert, a Java based translator which converts arithmetic expressions to 3-address codes.

This program was written as a part of Compiler Design course and its aim was to help students familiarize with various compiler modules, for example: lexer, parser, intermediate code generator and optimizer.

Using the Program

To run executable jar file, open terminal and do:

$ cd demo
$ java -jar CodeConvert.jar
  Instruction File Path: input_file_path

CodeConvert takes input in the form of text file which contains arithmetic expressions. The content of input file is converted to 3-address codes which may contain many redundancies. Finally, the program performs various optimization techniques, such as Constant folding, Algebric simplification, Copy propagation and Dead code removal to give final optimized code.

Input Unoptimized 3-address codes Codes after 4 optimization rounds Final output (8 optimization rounds)
x = 9
y = sin(23) + cos(76)
z = x*x + 2*y*x + y*y
x = z/2 + y/2
y = (x + y - z * (z - 2))/(x/z + y/z + 2)
x = 9 + 0
y = sin(23) + cos(76)
t_1 = x * x
t_2 = 2 * y
t_3 = t_2 * x
t_4 = t_1 + t_3
t_5 = y * y
z = t_4 + t_5
t_6 = z / 2
t_7 = y / 2
x = t_6 + t_7
t_8 = x + y
t_9 = z - 2
t_10 = z * t_9
t_11 = t_8 - t_10
t_12 = x / z
t_13 = y / z
t_14 = t_12 + t_13
t_15 = t_14 + 2
y = t_11 / t_15
x = 9
y = 0
t_4 = 81
z = 81
t_6 = 81 / 2
x = t_6
t_9 = 81 - 2
t_10 = 81 * t_9
t_11 = t_6 - t_10
t_12 = t_6 / 81
t_13 = 0 / 81
t_14 = t_12 + t_13
t_15 = t_14 + 2
y = t_11 / t_15
x = 9
y = 0
z = 81
x = 40
y = -3179

Language

Note from the example in the previous section that the input file has a specific structure.

  1. Each line contains only one expression
  2. Each expression can have only a variable on LHS. Numbers or other expressions are not allowed in LHS
  3. RHS may contain nested expression with both numbers and variables.
  4. Any variable can be used on LHS any number of times.
  5. Expression may contain undefined variables in them, unlike the example given above.
  6. Some other functions are also supported with the help of JExel. These functions are:

    • max, min
    • floor, ceil
    • neg, abs
    • cos, sin, tan
    • acos, asin, atan
    • deg, rad

Alternatives

This program was written to help understand different parts of compiler and challenges faced in their implementation. Instead of writing these parts from scratch, one can simply use existing utilities such as flex and bison to build lexer and parser.

Also I have assumed that input file follows the structure specified above and contains no syntactic errors. Alternatively, one can modify the source code and place checks for correctness.

Brief Description of Working

1. Reading the File

File is read line-by-line and its contents are saved in an ArrayList. Each item in ArrayList corresponds to one expression (one line from file).

See Test.java, line 60 to 68

while(scan.hasNextLine()){
    lines.add(scan.nextLine());
    System.out.println(lines.get(lines.size()-1));   // Print input line
}

2. Converting to Token Stream

Each element of ArrayList is processed to find tokens using regular expressions.

See Builder/TreeBuilder.java, function GetTokens, line 81

[a-zA-Z_]+[a-zA-Z0-9_]*\\([^()]*\\) to find functions e.g. floor(2.1), cos(9.1)
[0-9]+                              to find numbers   e.g. 9, 13, 103
[a-zA-Z_][a-zA-Z0-9_]*              to find variables e.g. abc, _velocity, speed0
\\Q(\\E                             to find opening brackets (
\\Q)\\E                             to find closing brackets )

Each line of input file is itself converted to an ArrayList of tokens by the function GetTokens referred above.

3. Token Stream to Syntax Tree

Token stream from previous section is now passed to function GenerateTree which reads tokens one-by-one and identifies its type. Utilizing tokens' types, it places them accordingly in a Binary Tree. Here we have to take care of operator precedence and importance of brackets. We define root as the root node of tree generated so far. current is a new node containing current token and is not attached to the tree yet.

  1. If root is a number and current is an operator, root will become the left child of current. current will become new root of the tree.
  2. If root has lower precedence than current, then it will remain root of the tree and current will become its right child.
  3. Entities within brackets (...) will be treated as units for all outside nodes. We define current_br as the root of subexpression within brackets, it is yet not attached to the expression tree.
  4. If root has higher precedence than current and current is not a current_br then root will replace left child of current. The original left child of current will become right child of root. current will become new root.
  5. If root has higher precedence than current but current is a current_br then root will remain root and will take current_br as right child.

Actions in other situations are simpler and if you wish to check them, see Builder/TreeBuilder.java, line 96-169. There may be flaws in this logic but all the test expressions have given desired results so far. To test validity of any expression, you can print its expression tree using the TreePrinter class as follows:

TreeBuilder tb = new TreeBuilder();
Node n = tb.GenerateTree("a*b-p/(a+b)");                // build syntax tree
        
n = ThreeAddressBuilder.normalize(n);                   // hack for special case
        
TextNode m = TreePrint.TreePrinter.TreeToTextTree(n);   // convert to printable TextNode
String print = TreePrint.TreePrinter.TreeString(m);     

System.out.println(print);

The above code will print the tree in standard output as:

            [-]
           /   \
          /     \
         /       \
        /         \
       /           \
    [*]             [/]
   /   \           /   \
[a]     [b]     [p]     \
                         \
                          \
                           \
                            [+]
                           /   \
                        [a]     [b]

4. Syntax Tree to 3-Address Codes

Every single expression will now be converted to a series of 3-address codes. Class ExprBuilder in Builder package does this job. Tree is traversed top-bottom and temporary variables are defined to build 3-address codes.

See Builder/ThreeAddressBuilder.java, line 65-77

ArrayList<Instruction> i = new ArrayList<>();       // temporary storage for 3-address codes
Node n = tb.GenerateTree(s[1].trim());
n = normalize(n);
            
ExprBuilder.Parse(i, n, start);        // convert tree to 3-address codes, start: starting index of temp-variables
code.addAll(i);                        // merge 3-address codes of current expression to codes so far

start = LastNumbered(code) + 1;        // update starting index of temp-variables of next series so to avoid confusion

This stage will give us unoptimized 3-address codes of entire input file.

5. Optimization

See Optimizer.java in Optimize package to check working of various optimization techniques.

See Test.java, line 102-109

boolean optimized = true;
int i = 1;
        
while(optimized && i < MAX_ROUNDS){     // try optimization as long as possible
    optimized = false;
            
    optimized = op.Optimize(ins);
    i++;
}

Features Missing

Some features are not implemented but I may do so in future. These are:

  1. Custom operators and with defined semantics. See Elements/Operator.java
  2. Associativity of operators may have bugs
  3. Optimization implementation is very basic and can be extended to add other optimizations

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)