Introduction
The purpose of this article is to share with the community an understanding of how and why Object Oriented Programming was born.
In this article we shall attempt to construct an array of Shape pointers, where each specific Shape shall be either a Circle, a Square, or a Triangle.
And all of this shall be done in pure C.
Background
OOP (Object Oriented Programming) is unarguably one of the greatest programming paradigms to have manifested itself in the grey matter of Sapiens minds.
This article is a humble attempt to retrace the adventure of arriving, by ourselves, at an implementation of OOP in C.
The naysayers may be cursing me for trying to reinvent the wheel, but this article is designed to serve the following purposes.
- Consolidate your understanding of the inner mechanics of OOP.
- Consolidate your understanding of C.
Using the code
I used the "gcc" compiler found in the MinGW compiler suite for Windows to compile all the code. (Note: I am not certain that every feature I exploited is standard C; some might be compiler specific.)
Article
Oh how long has it been since I wrote in pure C. Let us get reacquainted with the basics. This is how "Hello World" is supposed to look.
#include <stdio.h>
int main(void){
    printf("Hello World\n");
    return 0;
}
Compile it, run it, make sure your system is functioning properly, and let's start accelerating. What we need is to make three Shape classes that inherit from an abstract Shape class. Well, C doesn't have classes or inheritance, but it does have structs :), and that's a start.
struct Square{
    int width;
    int height;
};
struct Circle{
    float radius;
};
struct Triangle{
    int base;
    int height;
};
Nothing fancy here. Let's investigate this further. Type the following inside main():
printf("size of square is %d\n", (int)sizeof(struct Square));
printf("size of Circle is %d\n", (int)sizeof(struct Circle));
printf("size of Triangle is %d\n", (int)sizeof(struct Triangle));
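The output should be (not guaranteed; this assumes a 4-byte int and a 4-byte float):
size of square is 8
size of Circle is 4
size of Triangle is 8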
sizeof is a special operator in C that tells you how many bytes a certain type takes. We asked our compiler how many bytes it uses in memory to represent a "struct Square" and it said 8. This is logical, because a Square is made out of two ints, and sizeof( int ) should give you 4 on most current desktop compilers (the C standard does not fix this size). This is cool, because apparently there is no overhead in these structs (in general the compiler is allowed to insert padding between members, but it has no reason to do so here). This is what our simple structs look like in memory.
Carefully notice that the order of the members declared in a struct coincides with the order of those variables in memory. Meaning that
struct Square{
    int width;
    int height;
};
will be laid out in memory as 4 bytes for width and only then 4 bytes for height. Let's not assume anything and assert this ourselves.
struct Square square;
square.width = 1;
square.height = 2;
printf("the first 4 bytes of square are %d", (*(int*)(&square)));
Basically this is how you would spy on yourself. We make a Square called square, assign its values, and print the int in its first 4 bytes. My output was 1, which means that, as expected, the width variable is represented by the first 4 bytes of Square, and the height by the second 4 bytes. Now don't get frightened by the insanity in the last argument of the printf() function; let's take it slowly.
printf("the first 4 bytes of square are %d\n", square);
This actually printed the first 4 bytes of square as an int too, to my surprise! However, passing a whole struct where %d expects an int is undefined behavior in C, so even when it appears to achieve the same effect it is not a reliable technique for investigating memory.
printf("the address of square is %p\n", (void*)&square);
The & operator gives us the address of the beginning of square in memory, and this can be printed as well (with %p, the format meant for pointers). However, what we want is to print the int whose address also begins there. So we cast this address to an int address:
printf("the address of square is still %p\n", (void*)(int*)&square);
Now we can happily print the int pointed to by our cast int pointer, which is how I did it in the first example. But before you dereference it, you can choose to move the pointer wherever you want, which you couldn't do in the second example. So if we would like to print the second 4 bytes of the square, we need to move the pointer 4 bytes forward, or one int forward, thus:
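printf("the second 4 bytes of square are %d\n", *((int*)&square + 1));
Note that pointer arithmetic is done in steps relative to the type the pointer refers to, meaning that an int pointer moves forwards and backwards in int-sized steps. Thus +1 is actually +1 int forward, or +4 bytes forward. Also note the parentheses: the + 1 must be applied to the pointer before it is dereferenced, otherwise we would merely add 1 to the value of width.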
Gandalf: OK, enough C, back to polymorphism. Now that we know how structs get constructed in memory, let's give each struct two functions: a print() function and an area() function.
Frodo Baggins: But but but, one does not simply give functions to structs in C!
Gandalf: Well one does not simply eat an invisible hobbit either, so what?
Frodo Baggins: Then what shall we do master Gandalf?
Gandalf: We shall use function pointers my little hobbit friend!
Frodo Baggins: I don't want to play this game anymore!
Gandalf: You shall not pass!
That's right, more pointers, and this time they are function pointers. C has no compassion for the weak. Any struct in C treats a function pointer just like any other member, be it an int, a pointer to an int, or a pointer to itself. Thus, theoretically, we can give our structs functions through a function pointer member. Consider this function for example.
void print_square( void ){
    printf("Hello Square\n");
}
This is a function called print_square that receives nothing and returns nothing. This means that a pointer to this function is of type "pointer to a function that receives void and returns void", and here is how you would declare such a variable as a member of our struct.
struct Square{
    int width;
    int height;
    void (* print)( void );
    float (* area)( struct Square * this );
};
The rule of thumb for reading and writing types in C is to start at the variable name, go right as far as possible, then go left.
Let's do it together, step by step.
print
1) print is a
(* print)
2) print is a pointer to
(* print)(
3) print is a pointer to a function that
(* print)( void )
4) print is a pointer to a function that receives void (nothing) and returns
void (* print)( void );
5) print is a pointer to a function that receives void (nothing) and returns void (nothing)
Now we want to give our Square an area function that will receive a Square (itself) and return a float representing its area. Its declaration reads by exactly the same process as the print function.
It's not over yet. Pointers by themselves are great and all, but they kind of have to point to something to be useful. So how do you make a pointer point to the memory location where a function is located? How does one know where a function is located? Good news for once: the function name itself evaluates to the address of the function. Here is a practical example.
struct Square square;
square.print = print_square;
Like I mentioned before, function pointers behave just like any other member. When we create a Square and call it square, all its members have garbage values. We need to manually assign them the correct values, and this job requires a constructor. C doesn't have that either, so we shall make our own constructor function, like so.
void init_square( struct Square * square, int w, int h ){
    (*square).print = print_square;
    (*square).width = w;
    (*square).height = h;
    (*square).area = calc_square_area;
}
This constructor is just another function that needs to change the values of the square passed to it, thus its parameter MUST be a pointer to a Square; passing by value here won't do. As you can see, we assign the proper functions to the two function pointers. You can go ahead and implement the calc_square_area function yourself, or you can peek at the downloadable complete example.
Kindly forgive me for not supplying all 9 functions (print, area, init) of the three Shapes, because time is of the essence. We must proceed onward to victory.
Let us test what we have crafted thus far.
struct Square square;
struct Circle circle;
struct Triangle triangle;
init_square( &square, 2, 2 );
init_circle( &circle, 7.7 );
init_triangle( &triangle, 2, 3 );
square.print();
circle.print();
triangle.print();
printf("the area of the square is %f\n", square.area(&square));
printf("the area of the circle is %f\n", circle.area(&circle));
printf("the area of the triangle is %f\n", triangle.area(&triangle));
In C everything is backwards: instead of making all three shapes inherit from a Shape struct, we shall make a Shape struct that will father (in a way) all three shapes. Since we need to start somewhere, let's create a logical struct Shape.
struct Shape{
    void (* print)( void );
    float (* area)( struct Shape * this );
};
Now that we know where to begin, let's start thinking about what we want to happen. Say you create a Square one more time, initialize it and everything.
struct Square square;
init_square(&square, 2, 4);
What would happen if I tried to print it?
square.print();
Now what would happen if some unwanted pointer to a Shape tried to print our square?
struct Shape * pointer = (struct Shape *)&square;
(*pointer).print();
I got a segmentation fault, how about you? Well, the results are currently unpredictable, as expected. What we want is for a print() call through the Shape pointer to activate the print() function of the Square object it is pointing to.
We haven't talked about memory in a while, so let's do that again. What does our Shape struct look like in memory compared to our Square struct? How do function pointers affect the layout? Like I mentioned before, they behave just like any other member, so in the end a function pointer is just another pointer stored inside the struct. Here is a picture.
My spider senses are tingling! The print() pointer in the Shape memory model is the first 4 bytes, while the print() pointer in the Square memory model is the third 4 bytes (on a build where ints and pointers both take 4 bytes)! Aha, first does not equal third. What is going on?
When we cast a pointer to a Square into a pointer to a Shape, the memory was left untouched. Remember this in general: casting a pointer doesn't change anything inside the memory it points to. Unless you're some weirdo like me using dynamic casting, memory is never changed by casting. It is not even done at run time; all these casts are resolved during compilation. However, although the memory itself has not changed, we now treat it as if it had a different layout.
square.print();
This uses the pointer in the third 4 bytes to activate a function called print(), while this
(*pointer).print();
uses the pointer in the FIRST 4 bytes to activate a function called print(). But there isn't a pointer in the first 4 bytes, there is just an old int, which we know is equal to 2. Well, our pointer doesn't really care that it's 2; it is faithful to the belief that 2 is the address of a function it must activate, so it jumps to memory location 2 (which is certainly not where any of our functions live, and is usually not even memory our program is allowed to touch) and tries to run whatever is there. And that's the downfall.
Now that we kind of know the problem, let's make a solution. If only the print() pointer of the Shape struct were in the third 4 bytes! Oh, how I wish the Shape struct had something to fill up the first 8 bytes, so that its function pointers would align with those of the Square struct. And so it shall be; we call it the "padding technique" in C.
struct Shape{
    char padd[8];
    void (* print)( void );
    float (* area)( struct Shape * this );
};
Take a look at the Circle struct: what kind of padding would we need in order for the Shape struct to be aligned with the Circle too?
Final Test.
struct Shape * shapes[3];
shapes[0] = (struct Shape *)&square;
shapes[1] = (struct Shape *)&circle;
shapes[2] = (struct Shape *)&triangle;
int i;
for(i=0; i<3; ++i){
    (*shapes[i]).print();
    printf("%f\n\n", (*shapes[i]).area(shapes[i]));
}
If you got this to work properly, then you have achieved implementing the basics of OOP up to polymorphism in C. Now that is the taste of success.
Conclusion
Overall this design is fairly simple to implement. One must always arrange the members in one's structs deliberately to achieve the desired compatibility and usability. The more acute readers might have noticed that it would be simpler, and more memory efficient, to declare the function pointers first, so that no padding is needed at all. But again, there must always be room for improvement. I have greatly enjoyed programming in C once again, and believe that if fortune ever smiles down upon anyone, this practice can become useful. Polymorphism is not the only technique achievable by these means. With proper knowledge of how memory is laid out you can quickly implement inheritance and virtual tables, and stuff no one has ever thought to do. I believe that even if you won't ever use OOP designs in C, this article is still a good lesson, since you are forced to delve into the nature of things to understand how things work behind the curtains of modern languages, including C, which is by all means a modern language.