Home Made Java Virtual Machine

Maruf Maniruzzaman

4.92/5 (87 votes)

16 Mar 2008GPL317 min read

6.2K

Functional java virtual machine - can run java applications for most of the instructions.

Introduction

Back in year 2004 I had to choose a thesis topic as a prerequisite of completion of my undergraduate course in Computer Science and Engineering. I had choosen Process Migration. And our teacher Mahmud Shahriar Hossain agreed to supervise my work. My partner was Md. Helal Uddin. As part of the thesis I had to implement a Java Virtual Machine. I wanted to write an article since then. But it never actually happened. Today (March 2) is my birth day and I want to start it. The virtual machine is also used in my new project Morpheus - a prototype of Silverlight 1.1. The seminar presentation downloadable from above link shows how a JVM works. You may also look at the JVM source code from above link. Please note that most the implementation decision taken may not match with other commercially available JVM implementation. Whenever JVM Spec is found to say nothing, the most easiest approach is taken to save time.

Java Virtual Machine Parts

Class File Structure

The java virtual machine needs an application that is made up of collection of java classes. At the beginning of any class there is a defined structure like JavaClassFileFormat.

struct  JavaClassFileFormat
{
    u4 magic;
    u2 minor_version;
    u2 major_version;
    u2 constant_pool_count;
    cp_info **constant_pool; //[constant_pool_count-1];
    u2 access_flags;
    u2 this_class;
    u2 super_class;
    u2 interfaces_count;
    u2* interfaces; //[interfaces_count];
    u2 fields_count;
    field_info_ex *fields; //[fields_count];
    u2 methods_count;
    method_info_ex* methods; //[methods_count];
    u2 attributes_count;
    attribute_info** attributes; //[attributes_count];
};

Following are the structures used in the format. They represents constant pool (constant values used in class files), fields, methods and attributes in a class file. I'll describe them in details later.

struct cp_info
{
    u1 tag;
    u1* info;
};

struct field_info
{
    u2 access_flags;
    u2 name_index;
    u2 descriptor_index;
    u2 attributes_count;
    attribute_info* attributes; //[attributes_count];
};

struct method_info
{
    u2 access_flags;
    u2 name_index;
    u2 descriptor_index;
    u2 attributes_count;
    attribute_info* attributes; //[attributes_count];
};

struct attribute_info
{
    u2 attribute_name_index;
    u4 attribute_length;
    u1* info;//[attribute_length];
};

We first load the class file in memory as raw byte and then use JavaClass object to parse the raw bytes and identify the fields, methods, exception table etc. The JavaClass class represent a class in memory in structured form. It holds a pointer to the raw byte stream that was loaded from class file.

Java Class in Memory

Here we just inherit the JavaClassFileFormat for simplicity of storing the values in in memory class representation. We must parse the raw class file in memory to get the values for JavaClassFileFormat fields.

class JavaClass: public JavaClassFileFormat
{
public:
    JavaClass(void);
    virtual ~JavaClass(void);

public:
    virtual BOOL LoadClassFromFile(CString lpszFilePath);
    void SetByteCode(void* pByteCode);

    BOOL ParseClass(void);
    BOOL ParseInterfaces(char* &p);
    BOOL ParseFields(char* &p);
    BOOL ParseMethods(char* &p);
    BOOL ParseAttributes(char* &p);
    BOOL GetConstantPool(u2 nIndex, cp_info& const_pool);

    BOOL GetStringFromConstPool(int nIndex,CString& strValue);
    CString GetName(void);
    CString GetSuperClassName(void);
    BOOL ParseMethodCodeAttribute(int nMethodIndex, Code_attribute* pCode_attr);
    int GetMethodIndex(CString strMethodName, CString strMethodDesc,
          JavaClass* &pClass);
    int GetFieldIndex(CString strName, CString& strDesc);
    void SetClassHeap(ClassHeap *pClassHeap){this->m_pClassHeap=pClassHeap;}
    virtual u4 GetObjectSize(void);
    virtual u4 GetObjectFieldCount(void);
    JavaClass* GetSuperClass(void);
    BOOL CreateObject(u2 index, ObjectHeap *pObjectHeap, Object& object);
    BOOL CreateObjectArray(u2 index, u4 count, ObjectHeap *pObjectHeap,
       Object& object);
private:
    size_t m_nByteCodeLength;
    void *m_pByteCode;
    u2    m_nObjectFieldsCount;
    BOOL ParseConstantPool(char* &p);
    int GetConstantPoolSize(char* p);
    ClassHeap *m_pClassHeap;
};

To do that we first load the file in memory and then call the ParseClass method which is shown in the next section.

Class Loader

As there are some variable length fields it is not possible to directly load the structure. So we load the values one by one. First we load the value of magic which is an unsigned integer (u4) value. It must be value of 0xCafeBabe. If it is not the class file may be either corrupted or not a java class file at all. Then we load other values and structures. To load structures we first load the count and then load the structure. For example first we load short (u2) value constant_pool_count and then load that number of constant pool. To parse I used definitions getu4(p) or similar which just picks 4 bytes starting at p and returns the unsigned int value. To parse structures hare use sepaate methods like ParseConstantPool. It takes the reference of the byte stream pointer and increments in that class. I did this only for simplicity. It would be more readable if I'd return the total length and increent in ParseClass method but that would me less managable.

BOOL JavaClass::ParseClass(void )
{
    //just to be safe
    if (m_pByteCode==NULL ||
            m_nByteCodeLength < sizeof (JavaClassFileFormat)+20)
        return FALSE;
    char *p=( char *)m_pByteCode;
    magic = getu4(p); p+=4;
    ASSERT(magic == 0xCAFEBABE);

    if(magic != 0xCAFEBABE)
        return FALSE;

    minor_version=getu2(p); p+=2;
    major_version=getu2(p); p+=2;
    constant_pool_count=getu2(p); p+=2;

    if (constant_pool_count>0)
        ParseConstantPool(p);

    access_flags=getu2(p); p+=2;
    this_class=getu2(p); p+=2;
    super_class=getu2(p); p+=2;
    interfaces_count=getu2(p); p+=2;

    if (interfaces_count>0)
        ParseInterfaces(p);

    fields_count=getu2(p); p+=2;

    if (fields_count > 0)
        ParseFields(p);

    methods_count = getu2(p);p+=2;

    if (methods_count > 0)
    {
        ParseMethods(p);
    }
    attributes_count = getu2(p);p+=2;

    if (attributes_count > 0)
        ParseAttributes(p);

    return 0;
}

Constant Pool

In a java class several constant value is stored. It stores numeric, string and reference values in a pool and those are used in machine codes known as 'Java Byte Code'. Constant pool contains constant_pool_count items in a sequential list of following structure.

struct cp_info
{
    u1 tag;
    u1* info;
};

Constant pool information structure starts with one byte of tag information that indicates the type of the constant pool. Constant pool structure length is variable depending on the type of constant. Constant pool tag value can be one of the following values depending on the type of constant.

#define CONSTANT_Integer  3
#define CONSTANT_Float  4
#define CONSTANT_Long  5
#define CONSTANT_Double  6
#define CONSTANT_Utf8  1
#define CONSTANT_String  8

#define CONSTANT_Class  7
#define CONSTANT_Fieldref  9
#define CONSTANT_Methodref  10
#define CONSTANT_InterfaceMethodref  11
#define CONSTANT_NameAndType  12

Depending on the value of tag we can cast the cp_info structure in more precise structures listed here.

CONSTANT_Integer_info

If tag value equals CONSTANT_Integer in cp_info structure then it is an integer constant. We can cast the cp_info structure in following structure.

struct CONSTANT_Integer_info {
    u1 tag;
    u4 bytes;
};

This structure does not have any reference to any other constant. It represents direct 4 byte integer value.

CONSTANT_Float_info

If tag value equals CONSTANT_Float in cp_info structure then it is a float constant. We can cast the cp_info structure in following structure.

struct CONSTANT_Float_info {
    u1 tag;
    u4 bytes;
};

It is a direct value constant without any reference.

CONSTANT_Long_info

If tag value equals CONSTANT_Long in cp_info structure then it is a long constant. We can cast the cp_info structure in following structure.

struct CONSTANT_Long_info {
    u1 tag;
    u4 high_bytes;
    u4 low_bytes;
};

It is a direct value constant without any reference. It uses two four bute values to construct the 8 byte long value.

CONSTANT_Long_info

If tag value equals CONSTANT_Double in cp_info structure then it is a double constant. We can cast the cp_info structure in following structure.

struct CONSTANT_Double_info {
    u1 tag;
    u4 high_bytes;
    u4 low_bytes;
};

It is a direct value constant without any reference. It uses two four bute values to construct the 8 byte double value.

CONSTANT_Utf8_info

If tag value equals CONSTANT_Utf8 in cp_info structure then it is a utf8 string constant. We can cast the cp_info structure in following structure.

struct CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1* bytes;//[length];
};

It is a direct value constant without any reference. The short value length defines the length of the byte array followed by length number of bytes.

CONSTANT_String_info

If tag value equals CONSTANT_String in cp_info structure then it is a string reference constant. We can cast the cp_info structure in following structure.

struct CONSTANT_String_info {
    u1 tag;
    u2 string_index;
};

It is a reference value constant. The short value string_index refers to a CONSTANT_Utf8_info index in the constant pool.

CONSTANT_Class_info

If tag value equals CONSTANT_Class in cp_info structure then it is a class reference constant. We can cast the cp_info structure in following structure.

struct CONSTANT_Class_info {
     u1 tag;
       u2 name_index;
};

It is a reference value constant. The short value name_index refers to a CONSTANT_Utf8_info index in the constant pool that is the fully qualified name (ie java/lang/String) of the class- dot replaced by slash.

CONSTANT_Fieldref_info

If tag value equals CONSTANT_Fieldref in cp_info structure then it is a field reference constant. We can cast the cp_info structure in following structure.

struct CONSTANT_Fieldref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
};

It is a reference value constant. The short value class_index refers to a CONSTANT_Class_info index in the constant pool and name_and_type_index refers to a string index in the constant pool that is the fully qualified name (ie java/lang/String@valueOf(F)Ljava/lang/String;) of the class- dot replaced by slash.

CONSTANT_Methodref_info

If tag value equals CONSTANT_Methodref in cp_info structure then it is a method reference constant. We can cast the cp_info structure in following structure.

struct CONSTANT_Methodref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
};

CONSTANT_InterfaceMethodref_info

If tag value equals CONSTANT_InterfaceMethodref in cp_info structure then it is an interface method reference constant. We can cast the cp_info structure in following structure.

struct CONSTANT_InterfaceMethodref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
};

CONSTANT_NameAndType_info

If tag value equals CONSTANT_NameAndType in cp_info structure then it is an interface method reference constant. We can cast the cp_info structure in following structure.

struct CONSTANT_NameAndType_info {
    u1 tag;
    u2 name_index;
    u2 descriptor_index;
};

It is a reference value constant. The short value name_index refers to a string index in the constant pool and descriptor_index refers to another string index in the constant pool.

Parsing constant pool

Here we set the values of constant pool list pointers. When we need to retrieve the actual value we look at the tag and pick the value directly.

BOOL JavaClass::ParseConstantPool(char* &p)
{
    constant_pool = new cp_info*[constant_pool_count-1];
    if(constant_pool == NULL) return FALSE;
    for(int i=1;i<constant_pool_count;i++)
    {
        //We set the constant pointer here
        constant_pool[i]=(cp_info*)p;

        //We now calculate constant size. If it is an integer we get size = 5
        int size = GetConstantPoolSize(p);
        p+= size;

        // If constant type is long or double constant pool takes two entries.
        // Second entry is not used by virtual machine but kept NULL to walk
        // constant pool correctly.
        if(constant_pool[i]->tag == CONSTANT_Long || constant_pool[i]->tag ==
          CONSTANT_Double)
        {
            constant_pool[i+1]=NULL;
            i++;
        }
    }
    return TRUE;
}

Interfaces

In the interfaces field of a class there are interfaces_count number of short (u2) values. Each value is a reference to a constant pool entry of type CONSTANT_Class. We parse them and store in our in-memory object-

BOOL JavaClass::ParseInterfaces(char* &p)
{
    interfaces = new u2[interfaces_count];
    for(int i=0;i<interfaces_count;i++)
    {
        interfaces[i] = getu2(p); p+=2;
    }

    return TRUE;
}

Fields

A class may contain zero, one or more fields. The actual number is stored in the fields_count field. A list of field_info structure followes this value.

struct field_info
{
    u2 access_flags;
    u2 name_index;
    u2 descriptor_index;
    u2 attributes_count;
    attribute_info* attributes;//[attributes_count];
};

The short value access_flags describes the allowed field access. Here is the possible access flags values which is shared also by methods and classes-

#define ACC_PUBLIC  0x0001
/*Declared public; may be accessed from outside its package.  */

#define ACC_PRIVATE  0x0002
/*Declared private; accessible only within the defining class.  */

#define ACC_PROTECTED  0x0004
/*Declared protected; may be accessed within subclasses.  */

#define ACC_STATIC  0x0008  /*Declared static.  */

#define ACC_FINAL  0x0010
/*Declared final; may not be overridden.  */

#define ACC_SYNCHRONIZED  0x0020
/*Declared synchronized; invocation is wrapped in a monitor lock.  */

#define ACC_NATIVE  0x0100
/*Declared native; implemented in a language other than Java.  */

#define ACC_ABSTRACT  0x0400
/*Declared abstract; no implementation is provided.  */

#define ACC_STRICT  0x0800
/*Declared strictfp; floating-point mode is FP-strict  */

The name_index and descriptor_index are reference to two constant pool of type utf8 string. The attributes field defined the attributes of the field. Attributes are described later. Here is how we parse fields in a class's raw bytes-

BOOL JavaClass::ParseFields(char* &p)
{
    fields = new field_info_ex[fields_count];
    if(fields == NULL) return FALSE;

    for(int i=0;i<fields_count;i++)
    {
        fields[i].pFieldInfoBase = (field_info*)p;

        fields[i].access_flags= getu2(p); p+=2; //access_flags
        fields[i].name_index= getu2(p);p+=2; //
        fields[i].descriptor_index= getu2(p);p+=2; //
        fields[i].attributes_count=getu2(p); p+=2;

        if(fields[i].attributes_count>0)
        {
            //skip attributes - we do not need in simple cases
            for(int a=0;a<fields[i].attributes_count;a++)
            {
                u2 name_index=getu2(p); p+=2;
                //printf("Attribute name index = %d\n", name_index);
                u4 len=getu4(p);p+=4;
                p+=len;
            }
        }
    }
    return TRUE;
}

Methods

A java class file may contain arbitrary number of methods. The count is stored in methods_count member of class file structure. As it is a two byte field the theoritical upper limit is essentially 2^16. Like fields info, method info structure contains access flags, name index, descriptor index, and attributes.

struct method_info
{
    u2 access_flags;
    u2 name_index;
    u2 descriptor_index;
    u2 attributes_count;
    attribute_info* attributes;//[attributes_count];
};

Method body (if any) is stored in an attribute named Code- which contains the actual 'Java Byte Code'. Here is how we parse methods in out virtual machine-

//TODO: Cashe the findings here
BOOL JavaClass::ParseMethods(char* &p)
{
    methods = new method_info_ex[methods_count];

    if(methods == NULL) return FALSE;

    for(int i=0;i<methods_count;i++)
    {
        //methods[i] = new method_info_ex;
        methods[i].pMethodInfoBase=(method_info*)p;
        methods[i].access_flags= getu2(p);    p+=2; //access_flags
        methods[i].name_index = getu2(p); p+=2; //name_index
        methods[i].descriptor_index= getu2(p); p+=2; //descriptor_index
        methods[i].attributes_count=getu2(p); p+=2;

        CString strName, strDesc;
        GetStringFromConstPool(methods[i].name_index, strName);
        GetStringFromConstPool(methods[i].descriptor_index, strDesc);

        TRACE(_T("Method = %s%s\n"),strName, strDesc);

        TRACE("Method has total %d attributes\n",methods[i].attributes_count);

        methods[i].pCode_attr=NULL;
        if(methods[i].attributes_count>0)
        {
            //skip attributes
            for(int a=0;a<methods[i].attributes_count;a++)
            {
                u2 name_index=getu2(p); p+=2;

                TRACE("Attribute name index = %d\n", name_index);
                u4 len=getu4(p);p+=4;
                p+=len;
            }

            methods[i].pCode_attr = new Code_attribute;
            ParseMethodCodeAttribute(i, methods[i].pCode_attr);
        }
    }

    return TRUE;
}

In case of method structure ( and also same in fields structure) I have used method_info_ex instead of method_info structure. This extended structure on pointer to pint to raw method info in the in-memory bytestream of class file. Here with other fields we parse the Code attribute. The details id given later in attributes section.

Attributes

In most classes attributes takes most of the space in file. Class has attributes, method has attributes, field has attributes. The raw definition of attribute is like this-

struct attribute_info
{
    u2 attribute_name_index;
    u4 attribute_length;
    u1* info;//[attribute_length];
};

The attribute_name_index field is the reference index in the constant pool for string type constant. The attribute_length field is the length of info field- which is another structure depending on the type/ name of the attribute. An arrtibute can be a constant value, exception table or code type.

Constant value attribute

struct ConstantValue_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 constantvalue_index;
};

Code Attribute

It is also a method specific attribute. The name of the attribute is hardcoded as 'Code'. This attribute has maximum stack and maximum local values of the method. The code field is variable length defined by code_length and it contains the actual 'Java Byte Code'.

struct Code_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 max_stack;
    u2 max_locals;
    u4 code_length;
    u1* code;//[code_length];
    u2 exception_table_length;
    Exception_table* exception_table;//[exception_table_length];
    u2 attributes_count;
    attribute_info* attributes;//[attributes_count];
};

Exception table structure

This structure is used to define the exception table for methods. The exception table describes the exception handler depending on the program counter value or offset of byte code. The handler code is also an offset in the byte code.

struct Exception_table
{
    u2 start_pc;
    u2 end_pc;
    u2  handler_pc;
    u2  catch_type;
};

The field catch_type is a reference to a constant pool entry that describes the type of the exception- for example reference to a class named 'java/lang/Exception'.

Java Instruction Set

Java has more than 200 instructions. The java language file, when compiled, is converted to a class file that contains intrtuctions as byte codes. If we have a method like this-

public int mul(int a, int b)
{
    return a * b;
}

we will get this method in byte code attribute like this- (java has also assembly like representation for instructions to represent byte codes in human readable format)

Code Attribute:
  Stack=2, Locals=3, Args_size=3,  Code Length = 4
  Code:
  0:   iload_1
  1:   iload_2
  2:   imul
  3:   ireturn

Here if we follow the instructions we go like this:

0: Push (load) the local variable 1 on stack
1: Push the local variable 2 on stack
3: Pop two values from stack, do an integer multipucation and push the result
4: Return the integer value from stack top.

What we need to do in our virtual machine is load classes and follow the instructions in methods. There are methods to create new objects, to call methods of object. It is also possible to call native methods from a java method. Please refer to source code for most other codes (opcodes.h) or Java Virtual Machine Specification for a complete list.

Class heap

In the virtual machine we must maintain a heap where the class definition objects can be stored. I have implemented it as a separate heap for simplity. In this heap we load classes from files and store it in the heap. The ClassHeap class is responsible for maintaining the class heap in meory.

class ClassHeap
{
    CMapStringToPtr m_ClassMap;
    FilePathManager *pFilePathManager;
public:
    ClassHeap(void);
public:
    virtual ~ClassHeap(void);
public:
    BOOL AddClass(JavaClass* pJavaClass);
    JavaClass* GetClass(CString strClassName);
    BOOL LoadClass(CString strClassName, JavaClass *pClass);


};

We store JavaClass objects pointer in the m_ClassMap member using the class name as key.

Object heap

Object heap is virtual machine's RAM. All objects are created on object heap and its reference can be stored in another object or on the stack. Any reference is stored in an union type storage named Variable. Any field of a class can be represented using variable object. Anything can be stored in a Variable object.

union Variable
{
    u1 charValue;
    u2 shortValue;
    u4 intValue;
    f4 floatValue;
    LONG_PTR ptrValue;
    Object object;
};

Object creation on heap is described later in detail.

Virtual Machine Stack

Java instruction set is designed in such a way that it can work with very limited set of registers. Instead it uses its stack very extensively. The JVM stack element is one item regardless of - it may be primitive ype or object type. Only long and double type takes two stack spaces. The 'Execution Engine' maintains the JVM stack. For example when the execution engine executes iadd instruction it pops two numeric values from the stack and add the values and pushes the result on the stack. The virtual machine has empty stack initially and the stack is populated after the initial thread and the initial method is started for execution. Each method instruction is allowed to operate on stack within a limited boundary of the stack. The compiler sets the upper limit (top) as max_stack field of each methods Code attribute. The lower limit is the top+1 stack position of previous method. This boundary of stack is named as Stack Frame of that method.

The Stack Frame

As we mentioned each methods boundary in the JVM stack is known as 'Stack Frame'. Each stack frame reserves positions for the parameters and local variables of that method. If it is not a static method the first parameter is the object reference ( the this parameter) of type of the class of the method. The Execution Engine operates between the frame and when the method returns a value it pops every element from the current frame including this reference and pushes the return value (if method is not void return type) value on the frame of previous frames top. To keep the implementation simple I used slightly different methodology. I used stack of stack. Each frame is stack type and it is pushed on the JVM stack. The stack grows on method invocation and shrinks on method return.

Local Variables

In a stack frame local variables takes positions from zero to max_locals - 1 positions or less. If the method is not static the object takes the position zero and other locals follow it. Local variables are accessed using putfield and getfield instructions.

Native method stack

Unlike virtual machine stack, native methods stack is not maintained by JVM. It is maintained by the native system. Actually while a native method is being executed the virtual machine component that was managing the java thread waits until the native method completes and returns.

Runtime Environment

Each java thread has its own frame stack. All java threads in a process share common class heap and object heap. This things are bundled together in a RuntimeEnvironment object and carried among the execution engine components.

class RuntimeEnvironment
{
   public:
       Frame *pFrameStack;
       ClassHeap *pClassHeap;
       ObjectHeap *pObjectHeap;
};

Execution Unit

This is the main module of the JVM. It interprates the instructons. Advanced JVMs may use JIT compiler to convert java instructions into native instruction. But I did not do that because of the complexity of the JIT compiler. When a JVM starts it usually takes initial class name as parameter. Our JVM also takes class name as a parameter. The class heap is then requested to load that class. Then the JVM finds its main method (it can be any name like Entry in case of my first implementation), creates the initial stack frame and requests the execution engine to start execution. The heart of the Execution Unit is the Execute method. Here is the skeleton:

u4 ExecutionEngine::Execute(Frame* pFrameStack)
{
    ASSERT(pFrameStack);
    ASSERT(pFrame);

    Frame* pFrame=&pFrameStack[0];

    DbgPrint(_T("Current Frame %ld Stack start at %ld\n"),
            pFrame-Frame::pBaseFrame,     pFrame->stack-Frame::pOpStack );

    if(pFrame->pMethod->access_flags & ACC_NATIVE)
    {
        ExecuteNativeMethod(pFrame);
        return 0;
    }

    u1 *bc=pFrame->pMethod->pCode_attr->code + pFrame->pc;

    i4 error=0;
    JavaClass *pClass = pFrame->pClass;

    CString strMethod;
    pClass->GetStringFromConstPool(pFrame->pMethod->name_index, strMethod);

    DbgPrint(_T("Execute At Class %s Method %s \n"), pClass->GetName(), strMethod);
    i4 index=0;
    i8 longVal;
    while(1)
    {
        switch(bc[pFrame->pc])
        {
        case nop: //Do nothing
            pFrame->pc++;
            break;

        //Integer Arithmetic
        case iadd: //96 : Pop two int values from stack add them and push result
            pFrame->stack[pFrame->sp-1].intValue=pFrame->stack[pFrame->sp-1].intValue
                    + pFrame->stack[pFrame->sp].intValue;
            pFrame->sp--;
            pFrame->pc++;
            break;

        //Method return instructions
        case ireturn:
            //172 (0xac) : Pop everything from stack and push return value (int)
            pFrame->stack[0].intValue=pFrame->stack[pFrame->sp].intValue;
            return ireturn; // here we break the while loop
            break;

        // Method invokation Instructions

        // Here actually we do a recursive call to Execute
        // to keep things simple- after the java method return we
        // also return from Execute- some memory waste for simplicity
        case invokevirtual: //182: Invoke a virtual method.
            // The object reference and parameters are on stack by java instructions
            ExecuteInvoke(pFrame, invokevirtual);
            pFrame->pc+=3;
            break;
        }

        //Instructions that deal with objects

        case _new:// 187 (0xbb)
            ExecuteNew(pFrame);
            pFrame->pc+=3;
            break;
        case putfield: //181 (0xb5): Set field in object from stack top
            PutField(pFrame);
            pFrame->sp-=2;
            pFrame->pc+=3;
            break;

        case getfield:
            //180 (0xb4) Fetch field from object and push on stack
            GetField(pFrame);
            pFrame->pc+=3;
            break;
    }
    return 0;
}

Creating object on Object Heap

An object is usually created by JVM when a new or newarray or multinewarray instruction is executed. When a virtual machine creates an object it first calculate the size of the object. To calculate the object size we we first take the fields_count value in the class structure then we add its super classes fields_count value with it then we add super classes super classes fields_count and so on recursively until we reach the final base class java.lang.Object. This way we calculate total fields of the object and add one with it that holds the class pointer in the ClassHeap. Now we multiply the sizeof(Variable) to the count and get number of bytes required for the object. We now allocate the required bytes and return the pointer to that memory in a Variable object on the stack top. Here is the implementation.

int ExecutionEngine::ExecuteNew(Frame* pFrame)
{
    pFrame->sp++;
    u1 *bc=pFrame->pMethod->pCode_attr->code;
    u2 index=getu2(&bc[pFrame->pc+1]);
    if(!pFrame->pClass->CreateObject(
        index, this->pObjectHeap, pFrame->stack[pFrame->sp].object))
        return -1;
    return 0;
}

BOOL JavaClass::CreateObject(u2 index, ObjectHeap *pObjectHeap, Object& object)
{
    char *cp=(char*)this->constant_pool[index];
    ASSERT(cp[0] == CONSTANT_Class);
    ASSERT(pObjectHeap);
    if(cp[0] != CONSTANT_Class)
        return FALSE;

    u2 name_index=getu2(&cp[1]);
    CString strClassName;
    if(!this->GetStringFromConstPool(name_index, strClassName))
        return FALSE;

    JavaClass *pNewClass=this->m_pClassHeap->GetClass(strClassName);
    if(pNewClass == NULL) return FALSE;
    object=pObjectHeap->CreateObject(pNewClass);
    return TRUE;
}

Setting or getting value in object

The putfield instruction is used to set a value (from stack) of a field and the getfield instruction is used to load a variables value on the stack. When the execution engine needs to execute a getfield instruction it pops two values from the stack. One value is the object pointer another is field position (zero based index). Here is my implementation:

// Gets value or reference from stack and set in object
void ExecutionEngine::PutField(Frame* pFrameStack)
{
    u2 nIndex = getu2(
        &pFrameStack[0].pMethod->pCode_attr->code[pFrameStack[0].pc+1]);
    Variable obj=pFrameStack[0].stack[pFrameStack[0].sp-1];
    Variable value=pFrameStack[0].stack[pFrameStack[0].sp];
    Variable *pVarList=this->pObjectHeap->GetObjectPointer(obj.object);
    pVarList[nIndex+1]=value;
}

//Gets the value from variable and push on stack
void ExecutionEngine::GetField(Frame* pFrame)
{
    //TODO: Bug check for long and double
    u2 nIndex = getu2(
        &pFrame->pMethod->pCode_attr->code[pFrame->pc+1]);
    Variable obj=pFrame->stack[pFrame->sp];
    Variable *pVarList=this->pObjectHeap->GetObjectPointer(obj.object);
    pFrame->stack[pFrame->sp]=pVarList[nIndex+1];
}

Invoking method

When execution engine requires a method invocation it needs to creates a new 'Stack Frame' and the pc or Program Counter is set to the first byte of the methods byte code. Before that, the execution engine must save the current methods pc so that it can come back to resume method execution after the callee method returns. Here is our implementation. Please note how we handle static method invocation - simply we do not have the this reference on our stack.

void ExecutionEngine::ExecuteInvoke(Frame* pFrameStack, u2 type)
{
    u2 mi=getu2(
        &pFrameStack[0].pMethod->pCode_attr->code[pFrameStack[0].pc+1]);
    Variable objectRef = pFrameStack[0].stack[pFrameStack[0].sp];
    char *pConstPool = (char *)pFrameStack[0].pClass->constant_pool[mi];

    ASSERT(pConstPool[0] == CONSTANT_Methodref);

    u2 classIndex = getu2(&pConstPool[1]);
    u2 nameAndTypeIndex = getu2(&pConstPool[3]);

    //get class at pool index
    pConstPool =
        (char *)pFrameStack[0].pClass->constant_pool[classIndex];

    ASSERT(pConstPool[0] == CONSTANT_Class);

    u2 ni=getu2(&pConstPool[1]);

    CString strClassName;
    pFrameStack[0].pClass->GetStringFromConstPool(
            ni, strClassName);


    JavaClass *pClass=pClassHeap->GetClass(strClassName);
    pConstPool =
        (char *)pFrameStack[0].pClass->constant_pool[nameAndTypeIndex];

    ASSERT(pConstPool[0] == CONSTANT_NameAndType);

    method_info_ex method;
    method.name_index = getu2(&pConstPool[1]);
    method.descriptor_index = getu2(&pConstPool[3]);
    method.access_flags = 0; // set later

    CString strName, strDesc;
    pFrameStack[0].pClass->GetStringFromConstPool(
        method.name_index, strName);
    pFrameStack[0].pClass->GetStringFromConstPool(
        method.descriptor_index, strDesc);


    JavaClass *pVirtualClass=pClass;
    int nIndex=pClass->GetMethodIndex(strName, strDesc, pVirtualClass);

    memset(&pFrameStack[1],0,sizeof(pFrameStack[1]));
    pFrameStack[1].pMethod = &pClass->methods[nIndex];

    method.access_flags = getu2((char *)pFrameStack[1].pMethod);
    if( ACC_SUPER & method.access_flags)
    {
        pFrameStack[1].pClass = pVirtualClass->GetSuperClass();
    }
    else
    {
        pFrameStack[1].pClass=pVirtualClass;
    }

    int params=GetMethodParametersStackCount(strDesc)+1;

    //invokestatic - there is no this pointer
    if(type==invokestatic) params--;
    // else invokevirtual has this pointer

    int nDiscardStack =params;
    if(pFrameStack[1].pMethod->access_flags & ACC_NATIVE)
    {
    }
    else
    {
        nDiscardStack+=pFrameStack[1].pMethod->pCode_attr->max_locals;
    }

    pFrameStack[1].stack =
    &Frame::pOpStack[pFrameStack->stack-Frame::pOpStack+pFrameStack[0].sp-params+1];
    pFrameStack[1].sp=nDiscardStack-1;

    this->Execute(&pFrameStack[1]);

    //if returns then get on stack
    if(strDesc.Find(_T(")V")) < 0)
    {
        nDiscardStack--;
    }
    //Before we return to caller make the stack of caller right
    pFrameStack[0].sp-=nDiscardStack;
}

Invoking native method

In a java class a method may be marked as native-

public class Test
{
    public native void Print(string message);
}

In byte code ACC_NATIVE is set in the access_flags field of the method_info structure. We decide here like this:

if(pFrame->pMethod->access_flags & ACC_NATIVE)
{
    ExecuteNativeMethod(pFrame);
    return 0;
}

Each native method usually has a fixed predefined prototype. Here is the type definitation for our JVM:

typedef Variable (*pNativeMethod)(RuntimeEnvironment* pRuntimeEnvironment);

Here is how we handle native methods in the JVM:

u4 ExecutionEngine::ExecuteNativeMethod(Frame* pFrameStack)
{
    ASSERT(pFrameStack);

    ASSERT(pFrame->pMethod->access_flags & ACC_NATIVE);
    Frame* pFrame=&pFrameStack[0];

    JavaClass *pClass = pFrame->pClass;
    CString strClassName, strMethod, strDesc, strSignature;
    strClassName=pClass->GetName();
    pClass->GetStringFromConstPool(
        pFrame->pMethod->name_index, strMethod);
    pClass->GetStringFromConstPool(
        pFrame->pMethod->descriptor_index, strDesc);
    DbgPrint(_T("Execute At Class %s Method %s%s  \n")
        ,strClassName , strMethod, strDesc);
    strSignature=strClassName+_T("@")+strMethod+strDesc;
    pNativeMethod pNativeMethod=GetNativeMethod(strSignature);
    RuntimeEnvironment rte;
    rte.pFrameStack=pFrameStack;
    rte.pClassHeap= pClassHeap;
    rte.pObjectHeap= pObjectHeap;

    if(pNativeMethod == NULL)
    {
        // what should I do here??
        // System Panic??
        ASSERT(FALSE);
        return -1;
    }
    else
    {
        //Here we go native
        Variable retVal = pNativeMethod(&rte);

        //if returns then get on stack
        if(strDesc.Find(_T(")V")) < 0)
        {
            pFrame->stack[0]=retVal;
        }
    }
    return 0;
}

Here is the implementation of Print method of Test class:

//Signature: _T("Test@Print(Ljava/lang/String;)V")
 Variable Print(RuntimeEnvironment* pRuntimeEnvironment)
{
    Variable returnVal;
    Frame *pFrame=&pRuntimeEnvironment->pFrameStack[0];
    Object object=pFrame->stack[pFrame->sp].object;
    Variable *pVar
        =pRuntimeEnvironment->pObjectHeap->GetObjectPointer(object);
    if(pVar)
    {
        CString *pString = (CString *)pVar[1].ptrValue;
        if(pString)    wprintf(_T("%s"),*pString);
    }

    returnVal.intValue=0;
    return returnVal;
}

It is the native methods responsibility to correctly operate on stack. The java instruction is here out of scope. Everything is running on real machine here. So, here we pop the string type object reference from stack, convert it to CString object and do our native printing to console. This way we can handle any native operation like creating new window, drawing or do network operation. All these things are done in the implementation of Morpheus project.

The Garbage Collector

Java language does not have memory release mechanism. So the JVM must take the responsibility to release memory when some objects are out of scope no longer required or referenced by the application. To do this the JVM may take a number of strategy like reference count, mark and sweep etc. I used mark and sweep method because of its simplicily and accuracy. We start from the stack. We mark each object that is being referenced from the stack references. Then we mark all the objects that is being referenced by the marked objects and so on recursively. After the mark operation we know which objects are connected and which are out of scope. Then we take each object one by one and release its memory from heap. Before that we must call the finalize method of that object to do any cleanup required by the object itself programetically.

Conclusion

Thats all about to say for now about how we can implement a simple JVM. The JVM I present here is very limited implementation - though most of the java instructions are supported. It lacks heavily for library and native interface. Please look at the seminar presentation downloadable above for a visual description of JVM and how instructions are executed in a JVM. Full screen view of the presentation is best viewed with spacebar to go to next slide. I am busy with the implementation of Morpheus project at my free time and wish to come with that as well as with a new JVM with a new Window subsystem implementation having Windows Vista and Office 2007 look and feel (ok, ok, thats not technically a big deal but good to have for user interface degign) and oh yes also with a .NET Virtual Execution System with Morpheus for .NET support also. The .NET VES is similar except the complex onfile structure is really painfull when I decode them. All these things I do because I love to do- so I do not try to make a zero bug system. I leave when it seems to work for 'Hello World ++' applications.

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)