Compiler and language 2

Asif Bahrainwala

5.00/5 (3 votes)

5 May 2015CPOL5 min read

12K

176

Download Expression_Evaluation.zip

Introduction

A continuation of my previous article related to Compiler and language , http://www.codeproject.com/Articles/802873/Compiler-and-language

Background

As this is a continuation, it would be best if you go through my previous article related to this topic. As before, the input program is provided using program.txt

A series of programs are provided in the attachment to give you a fair idea of this language's syntax. They exploit its function calling, array, auto variable capabilities

You will require a strong back ground in assembly programming since the output of a language is an assembly, though I have generated a C file with inline assembly to make it easier for us to view the results of the variables.

We will extend the language to support simple concepts (that we take for granted). As before the syntax must support both versions: compiler and interpreter, this allows users of this language to maintain code coherence.

Array

Array access via array index is supported using the following syntax:

z=1
s=20
ARRAY:a1:1+z=24+3*s

Here we can see that array 'a1' is written to at index 1+z, the value written is 24+3*s

Similarly

d=ARRAY:a1:z*2

Here we read the value of array a1 at index z*2

AUTO (variable on stack)

These variables have their scope limited to the function call , i.e. their scope is limited from the LABEL under which they are defined till the RETURN the program will encounter.

Usage of AUTO variables:

AUTO:a=10*20

This will create an auto variable 'a' and associate value 200 to it

c=AUTO:a

This will read the value from AUTO variable 'a', this variable must exists in the scope of the function call, else the asm code generated will be incorrect, (we can always place a simple check for this).

One optimization:- tail call : http://en.wikipedia.org/wiki/Tail_call

This optimization is enabled for VS2012 release build and gcc -o3 optimization

It is a simple optimization that checks if additional stack frame creation can be ignored and the code generated for a 'call' can be replaced by 'jmp' (this saves the trouble of pushing return address to stack), a 'return' from such optimization forces the code to return to a previous function.

Using the code

As with my previous articles, the code must be referred to at all times and debugged to get a better understanding of how it works.

Let's look at array support

We start by declaring arrays via DIM:a1:10, this creates a one dimension array of size 10, this is required during the compilation stage since this array size has to be declared globally, for the interpreter, this is ignored

For working with arrays, we need 2 temporary variables: __TarrayIndexProcessing_ , __TarrayValueProcessing_. These variables are used for calculating and holding array index and value at that index. You will notice that index processing for compilers have one additional step: to multiply by 4 since the size of int is 4 bytes to correctly access the required element . This is not required for interpreter since it holds the values using:

struct arrayvar{
 string m_name; 
 int index;
 DWORD sz;  //maintains the size of the whole array (only one instance is actually required per name)
};

map<arrayvar,int> arrayList;

Notice that arrayvar (used by interpreter) uses index and name to uniquely identify an element, please refer to code to view the comparator function required by the map.

AUTOs (variables on stack)

Variables on stack for interpreter is fairly straight forward, we must maintain the stack frame for every function call

This is done using the following line of code

stack< map<string,pair<UINT,int>> *> stackFrame; 

map<string,pair<UINT,int>> &autoStack=*stackFrame.top(); //maintains the current stack frame

Stack frame is created at the start of the program and at all times we point to the top of the stack, when a function call is encountered, a new stack frame is created and top of stack is updated, this ensures that the current stack is always maintained.

Please refer to CALL functionality implementation for the interpreter.

Similarly RETURN: functionality pops the stack and updates the top of stack.

Stack implementation for compiler is not very straight forward as the code has to parsed multiple times to find the definition of this variable in current LABEL scope.

If variable is not found within LABEL scope, then a variable is created by calling push element

The below function looks up auto variables from the source code, since we are in compilation phase the source code is available to find the index with respect to LABEL keyword.

int getVariablesIndex(string varName, UINT uiLineNumber)

It returns -1 if variable is not found else the index on stack frame if variable is found

eg:

AUTO:a=10
b=20
AUTO:c=30

Auto variable 'a' will be at index 1
Auto variable 'c' will be at index 2
Accessing these variables will be via EBP (this register is used for holding the frame pointer)
We never use ESP since its value can change when a push/pop/call is encountered
a will be accessed using EBP-4,
c will be used accessing EBP-8

Tail call:-

This optimization is similar for compilers and interpreter.

We parse through the file after the CALL is processed to look for an immediate RETURN, we ignore comments generated using '#'. If RETURN is found immediately after the CALL (ignoring comments) , we know that this code can be tail call optimized by placing a 'JMP' instead of a 'call', if AUTOs are used then additional code is required before RETURN for stack clean up and will not perform tail call optimization.

If RETURN is not immediately encountered , then the code has to return to a point after CALL to execute the statements found before RETURN, hence a 'call' will be used and not a 'JMP'.

This optimization is coded in place where 'CALL' language keyword is being evaluated, please refer to the code.

e.g.:-

The above program will output the below displayed assembly, notice that JMP f1 is used in place of call f1

f:
jmp f1
f1:
ret

The above program will output the below displayed assembly, notice that call f1 was used since tail call optimization was not used

f:
mov EBP,ESP
sub EBP,4
call f1
mov EBP,ESP
add EBP,4
MOV EAX,10
MOV a,EAX
ret

Code usage

The following input program:-

CALL:f
BREAK:
LABEL:f
AUTO:a=1020
CALL:f1
c=AUTO:a
# c will hold the value 200
RETURN:
LABEL:f1
AUTO:a=1030
c=AUTO:a
RETURN:

Notice that a call has been made to 'f' , an AUTO variable 'a' has been used and then a call to 'f1', and a new AUTO variable 'a' has been created , notice in the compile output that the stack will get cleaned up and when 'f1' returns , AUTO:a will correctly point to the variable belonging to its scope, therefore c will hold value '200' and not '300'.

Points of Interest

This article introduces the implementation of ARRAYs and AUTOs in compilers and interpreters. We also see the active use of EBP for maintaining stack frames so as to correctly access stack variables.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)