This article shows the basics of working with multiple files and how to create objects without allocating them on the heap.
Introduction
What is the optimum size of a function? How many things should a function do? What should be the desired minimum size? Does one line make sense? By which criteria do I sort functions into files?
This article will suit readers with a more academic C background: those familiar with moderately complex algorithmic tasks, but lacking blue-collar experience.
Pure Functions
Functions are meaningless without data. Some functions deal with a datatype as a whole and some with each elementary aspect of it. If you have a Person datatype, the former operate on Persons, the latter on a Person's properties, like mass, age, name, address, picture...
I found a more indirect solution to my 'size of a function' problem in pure functions. Pure functions have no side effects. They receive copies of their data through parameters, perform whatever calculations they need and return a result. A pure function always returns the same result given the same arguments.
Because it has only one value to return to the outside world, a pure function defines its own size without any mention of quantities like lines of code: it is supposed to have only one effect. You should not modify five outside objects of different types through pointers in one function.
Some elementary values are bound to others and are meaningless alone; together they represent a logical entity. For instance, a 3D point could represent a Person's location. It is more practical to have the object's location returned as a point than to have three different functions that return its x, y and z coordinates.
Languages like Python can return multiple values from a function, and I think C's way is more appropriate. Why would you want to return a Person's mass, picture and address from one function? What you want is to return, as one thing, a few elementary values that represent a logical entity, in a struct. Like some bio info: weight, height and age.
The Heap and the Stack
I avoid malloc and the heap (if possible). It makes me happy to think that the default way of passing arguments in C is by copy/value. Sorry, not the default: the only way. Unlike C++ and Pascal, C has no pass-by-reference. It has pointer types, whose values you pass by value, and such a value is good at representing a reference to an object.
The code examples here have no practical meaning; they only illustrate points in the discussion.
Example 1
point.h
struct point {
unsigned x;
unsigned y;
};
struct point point_new(unsigned, unsigned);
struct point point_move(struct point, int, int);
point.c
#include "point.h"
struct point point_new(unsigned x, unsigned y) {
struct point t;
t.x = x;
t.y = y;
return t;
}
struct point point_move(struct point t, int dx, int dy) {
/* compare as int: with unsigned t.x, t.x + dx could never be negative */
t.x = (int)t.x + dx > 0 ? t.x + dx : 0;
t.y = (int)t.y + dy > 0 ? t.y + dy : 0;
return t;
}
rect.h
struct rect {
struct point p;
unsigned w;
unsigned h;
};
struct rect rect_new(struct point, unsigned, unsigned);
struct rect rect_move(struct rect, int, int);
struct rect rect_size(struct rect, unsigned, unsigned);
rect.c
#include "point.h"
#include "rect.h"
struct rect rect_new(struct point p, unsigned w, unsigned h) {
struct rect t;
t.p.x = p.x;
t.p.y = p.y;
t.w = w;
t.h = h;
return t;
}
struct rect rect_move(struct rect t, int dx, int dy) {
return rect_new(point_move(t.p, dx, dy), t.w, t.h);
}
struct rect rect_size(struct rect t, unsigned w, unsigned h) {
t.w = w;
t.h = h;
return t;
}
example1.c
#include "point.h"
#include "rect.h"
#include <stdio.h>
int main(void) {
struct rect a = rect_new(point_new(5, 10), 20, 30);
printf("rectangle at (%u, %u), width %u, height %u\n", a.p.x, a.p.y, a.w, a.h);
a = rect_size(rect_move(a, 10, 5), 40, 50);
printf("rectangle at (%u, %u), width %u, height %u\n", a.p.x, a.p.y, a.w, a.h);
return 0;
}
compile:
cl example1.c point.c rect.c
output:
rectangle at (5, 10), width 20, height 30
rectangle at (15, 15), width 40, height 50
The Appendix at the end of this article explains how to get some free compilers and compile the examples on Windows.
Let's get the notions of caller and callee straight. If function A is called from inside the body of function B, we say that function B is the caller and function A is the callee. On the call stack, the callee is always on top of the caller.
The crucial thing here is that objects of type point and rect are created as local variables of the callee and returned to the caller, where they are again just local variables in the caller's stack frame. No pointers, no allocation on the heap, and when the main function exits, everything is nicely cleaned up.
You would also have everything nicely cleaned up if you used the heap and forgot to free the allocated memory, because when the program terminates, the operating system reclaims the memory of the process that ran it, so no memory gets lost. It was virtual memory to begin with. Still, you don't want to send an embedded device into space with memory leaks like that.
Sooner or later, every newcomer to C notices that, instead of passing an entire object to the callee, the pros pass only a pointer to it. This has always been justified as being faster and more efficient: the larger the structure, the more it pays to pass its pointer instead of the entire structure. There is no reason you cannot do that here as well.
The stack is the most used data structure in all of computing. It is so important that much has been done to improve its efficiency. CPUs have hardware support for it: a stack pointer register, special instructions that move data to and from the stack... If any memory you can point to is likely to be in the CPU cache, it is the stack. Allocating each object separately on the heap is an invitation to a cache miss when you later dereference it.
If people are willing to sacrifice efficiency for the benefits of a virtual machine, I'm willing to sacrifice some efficiency by copying entire structures on the stack, and I bet C does that faster than whatever the virtual machine does. Large objects (a kilobyte or more) on the stack, however, defeat its purpose, because you cannot fit many of them in the CPU cache.
Function calls pile up on the stack in the same Last-In-First-Out manner as everything else, which means the caller's local variables stay valid while the callee runs, and you can pass their addresses to the callee. We are going to add some functions to the point and rect classes that use this. It's a bastardization of the idea of nested functions found in some languages. For instance, Pascal passes a hidden parameter to a nested function that enables it to access the local variables of the enclosing function: a pointer to the enclosing function's stack frame.
Much like that other bastardization, which passes a hidden parameter pointing to the object, so that y.f(x) looks as if it were something other than f(&y, x)...
The venerable master Qc Na once told his student Anton: "objects are a poor man's closures" and "closures are a poor man's objects".
Decomposing to Multiple Files in C
There are two datatypes, point and rect (rectangle). Each of them is represented by two files. One is the header file, with the .h extension; the other, with the .c extension, is called the source file in C jargon.
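The first file is the interface, the second is the implementation. This is a convention rather than a rule of C, and it is up to you. You could even swap the extensions of the files and things would still work, provided you swap them in the include statements and when invoking the compiler (some compilers decide how to treat a file by its extension, so you may have to tell them explicitly, e.g. with GCC's -x c option).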
Header files in C are inserted into the source file as plain text, at the place where the #include directive is. Conceptually, a new temporary file is produced from the source file with all the header files inserted into it. That is what the compiler translates into a binary file; hence the name translation unit.
The code in those five files could easily fit into one .c file, so why did I split it into multiple files?
It seems that people split programs to separate and group things logically and create some sort of utopia, but the most powerful reason is that the work can then be split among multiple people, each responsible for one or more source files.
What criteria did I use to split them the way I did, and why didn't I put all the _move functions in one file and the _new functions in another?
The criterion is the data. Data and code are equal parts of a program, but the data is more equal.
You organize your functions and source files around the datatypes, and you can think of every datatype with its functionality as a programming library. Whatever new functionality you add to a type (like point here), it is considered good practice to put it into the point.c file rather than coding it in place in whatever other file you are working on at the moment.
The datatype and all the functions that operate on its data are together called a class. Working with an object only through its type's interface is called encapsulation.
Example 1 has poor encapsulation. It's not just that you can peek and poke the datatype directly, but rather that the functions do not offer enough support to avoid direct usage. I have also meddled with the encapsulation of point, in the lines of function rect_new where I set the value of the rect's point: t.p.x = p.x and t.p.y = p.y. Here be dragons. I should have coded t.p = p.
When a project grows to 1500 files, 500 of which deal with points in this manner, and the way point works changes, then instead of making that change only in the implementation of point, I'm going to have to make changes in 500 files.
There is something fishy in the way I use the header files here. Say you want to make a rect datatype for other people to use. You don't want them to bother getting the include directives in their own source files into the correct order.
The rect type depends on the point type, so it would be fine to just include point.h in rect.h; the person using rectangles then wouldn't have to include rect's dependencies in his source file himself. But there is a trap here. C doesn't allow the same struct type to be defined more than once in a translation unit.
Imagine a scenario where someone uses rect but also wants to use point directly, so he includes point.h in his source file. Then the point.h included in rect.h kicks in as well, and you have two definitions of the same thing.
Another scenario: someone uses a rect and a circle, so she includes both rect.h and circle.h in her source file. Now rect.h and circle.h each include point.h on their own. ERR: multiple declaration, earlier declaration or redefinition of the thing called point.
The way to prevent this is to use macro guards, also called include guards or header guards. I firmly believe it is beneficial to know how to include all the header files correctly without inclusion guards, but inclusion guards are a must.
It is also good to know how to use a struct through its naked declaration, although the typedef keyword makes things more convenient.
Example 2
point.h
#ifndef POINT_H
#define POINT_H
typedef struct point {
unsigned x;
unsigned y;
} point;
point point_new(unsigned, unsigned);
point point_move(point, int, int);
void point_movep(point *, int, int);
#endif
point.c
#include "point.h"
point point_new(unsigned x, unsigned y) {
struct point t;
t.x = x;
t.y = y;
return t;
}
point point_move(point t, int dx, int dy) {
/* compare as int: with unsigned t.x, t.x + dx could never be negative */
t.x = (int)t.x + dx > 0 ? t.x + dx : 0;
t.y = (int)t.y + dy > 0 ? t.y + dy : 0;
return t;
}
void point_movep(point * t, int dx, int dy) {
t->x = (int)t->x + dx > 0 ? t->x + dx : 0;
t->y = (int)t->y + dy > 0 ? t->y + dy : 0;
}
rect.h
#include "point.h"
#ifndef RECT_H
#define RECT_H
typedef struct rect {
point p;
unsigned w;
unsigned h;
} rect;
rect rect_new(point, unsigned, unsigned);
rect rect_move(rect, int, int);
rect rect_size(rect, unsigned, unsigned);
void rect_movep(rect *, int dx, int dy);
void rect_sizep(rect *, unsigned w, unsigned h);
#endif
rect.c
#include "rect.h"
rect rect_new(point p, unsigned w, unsigned h) {
rect t;
t.p.x = p.x;
t.p.y = p.y;
t.w = w;
t.h = h;
return t;
}
rect rect_move(rect t, int dx, int dy) {
return rect_new(point_move(t.p, dx, dy), t.w, t.h);
}
rect rect_size(rect t, unsigned w, unsigned h) {
t.w = w;
t.h = h;
return t;
}
void rect_movep(rect * t, int dx, int dy) {
point_movep(&t->p, dx, dy);
}
void rect_sizep(rect * t, unsigned w, unsigned h) {
t->w = w;
t->h = h;
}
example2.c
#include "rect.h"
#include "point.h"
#include <stdio.h>
int main(void) {
rect a = rect_new(point_new(5, 10), 20, 30);
printf("rectangle at (%u, %u), width %u, height %u\n", a.p.x, a.p.y, a.w, a.h);
a = rect_size(rect_move(a, 10, 5), 40, 50);
printf("rectangle at (%u, %u), width %u, height %u\n", a.p.x, a.p.y, a.w, a.h);
rect_movep(&a, -5, -5);
rect_sizep(&a, 60, 70);
printf("rectangle at (%u, %u), width %u, height %u\n", a.p.x, a.p.y, a.w, a.h);
return 0;
}
output:
rectangle at (5, 10), width 20, height 30
rectangle at (15, 15), width 40, height 50
rectangle at (10, 10), width 60, height 70
The #ifndef preprocessor directive works like an ordinary if statement. If the condition is false, it skips the following block. In preprocessor terms, this means that the text from the #ifndef up to the matching #endif does not enter the translation unit. On the other hand, if the macro POINT_H is not defined, it immediately gets defined, and the text with the point type definition enters the translation unit. That happens once in each of point.c, rect.c and example2.c, because each of them is a separate translation unit.
Just to check things, I have reversed the include directives for point.h and rect.h in example2.c, so now it seems as if rect is defined before point. But rect.h includes point.h in its first line. Since everything is guarded, when the C preprocessor gets to the line #include "point.h" in example2.c, it includes an empty string, a void, a nothing...
Strings
There is room in the market for an entire book on C strings alone. Here, what interests us is the special case of passing strings by value.
To be able to pass strings by value in C, you need two things:
- The string should be of a known, fixed size.
- It should be embedded in a struct.
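The former is needed because, to pass something by value to a function, it has to be a complete object: the compiler has to know the object's size beforehand to reserve the required space in the function's stack frame. The latter is needed because an array passed to a function decays into a pointer to its first element; only an array embedded in a struct is copied as a whole.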
To make it look more like a real-world example, I have decided that objects like point and rect have a UUID string. What could be more fixed in size than that? If I decided to store something like a Person's full name instead of an ID, no problem: I'd just pick an arbitrarily large string to represent it, something like 80 characters.
There are names shorter than 10 characters, so I'd be wasting 70 bytes. There are names that require more than 80 characters; who cares? You can never get it right. It's all a compromise, especially the way you handle strings in C. The most common way is a char pointer to a null-terminated string allocated on the heap.
I will use this UUID implementation, with all due respect to its authors, Paul J. Leach and Rich Salz. It depends on this document describing the RSA Data Security, Inc. MD5 Message-Digest Algorithm by Ronald L. Rivest.
The source files, with their copyright notices, are included in the article archive. I will build them into the static library uuid.lib. Here, only the usage of their header files is shown.
Example 3
point.h
#ifndef POINT_H
#define POINT_H
typedef struct {
char id[40];
unsigned x;
unsigned y;
} point;
point point_new(unsigned, unsigned);
point point_move(point, int, int);
#endif
point.c
#include "point.h"
#include "sysdep.h"
#include "uuid.h"
#include <stdio.h>
point point_new(unsigned x, unsigned y) {
long i;
uuid_t u;
point t;
uuid_create(&u);
sprintf(t.id, "%8.8lx-%4.4x-%4.4x-%2.2x%2.2x-", (unsigned long)u.time_low, u.time_mid,
u.time_hi_and_version, u.clock_seq_hi_and_reserved,
u.clock_seq_low);
for (i = 0; i < 6; i++)
sprintf(&t.id[24 + 2 * i], "%2.2x", u.node[i]);
t.x = x;
t.y = y;
return t;
}
point point_move(point t, int dx, int dy) {
/* compare as int: with unsigned t.x, t.x + dx could never be negative */
t.x = (int)t.x + dx > 0 ? t.x + dx : 0;
t.y = (int)t.y + dy > 0 ? t.y + dy : 0;
return t;
}
rect.h
#include "point.h"
#ifndef RECT_H
#define RECT_H
typedef struct {
char id[40];
point p;
unsigned w;
unsigned h;
} rect;
rect rect_new(point, unsigned, unsigned);
rect rect_move(rect, int, int);
rect rect_size(rect, unsigned, unsigned);
#endif
rect.c
#include "rect.h"
#include "sysdep.h"
#include "uuid.h"
#include <stdio.h>
rect rect_new(point p, unsigned w, unsigned h) {
long i;
uuid_t u;
rect t;
uuid_create(&u);
sprintf(t.id, "%8.8lx-%4.4x-%4.4x-%2.2x%2.2x-", (unsigned long)u.time_low, u.time_mid,
u.time_hi_and_version, u.clock_seq_hi_and_reserved,
u.clock_seq_low);
for (i = 0; i < 6; i++)
sprintf(&t.id[24 + 2 * i], "%2.2x", u.node[i]);
t.p.x = p.x;
t.p.y = p.y;
t.w = w;
t.h = h;
return t;
}
rect rect_move(rect t, int dx, int dy) {
return rect_new(point_move(t.p, dx, dy), t.w, t.h);
}
rect rect_size(rect t, unsigned w, unsigned h) {
t.w = w;
t.h = h;
return t;
}
example3.c
#include "rect.h"
#include "point.h"
#include <stdio.h>
int main(void) {
rect R = rect_new(point_new(5, 10), 20, 30);
printf("rectangle R at (%u, %u), width %u, height %u\n", R.p.x, R.p.y, R.w, R.h);
printf("rectangle R id %s\n", R.id);
printf("point of rect R id %s\n", R.p.id);
R = rect_size(rect_move(R, 10, 5), 40, 50);
printf("rectangle R at (%u, %u), width %u, height %u\n", R.p.x, R.p.y, R.w, R.h);
printf("rectangle R id %s\n", R.id);
printf("point of rect R id %s\n", R.p.id);
return 0;
}
compile:
cl example3.c point.c rect.c ..\uuid\uuid.lib wsock32.lib
output:
rectangle R at (5, 10), width 20, height 30
rectangle R id 2afc6b63-e09a-11eb-babc-e837ef16bb95
point of rect R id Ю№
rectangle R at (15, 15), width 40, height 50
rectangle R id 2afeccc2-e09a-11eb-babc-e837ef16bb95
point of rect R id ╝№
It has at least two issues.
First, the point's id in the rect is garbage: I forgot to copy the id. If I keep adding new properties to point, I will have to remember to add code for them in rect. That's bad! My disrespect for point's otherwise permissive encapsulation bit me.
To mend this, one doesn't write:
t.p.id = p.id;
t.p.x = p.x;
t.p.y = p.y;
But simply:
t.p = p;
Always code what to do, not how to do it. Let the point take care of itself.
Note that t.p.id = p.id would not even compile. You cannot assign an array in C by value the way you would an int. You have to embed the array in a struct, or cast it to one.
The second problem is that the id of the rect changes. I over-engineered the rect_move function: in my desire to make it one line, I reused point_move and rect_new, but that's overkill, and rect_new generates a fresh id. Over-engineering is a weakness. Things should be done with as little work as possible.
Here, I can use one of those unholy procedures that take the address of a point and mutate its x and y, playing the game as if I were using a nested function in a higher-level language. Because the stack frame of the point_movep procedure would sit on top of the rect_move function's stack frame, it's safe.
Or I can use the function point_move to copy the entire point from the rect, modify the copy's x and y, then return the copy to replace the old point in the rect. That sounds more functional.
Or... I can do something more evil, later.
On the question of what the desired minimum size of a function should be: in Example 3, point_move is two lines and rect_move is one line... What justifies a function of only one or two lines? Another saying that goes well with "always code what to do, not how to do it" is "always program in the language of the domain". In other words, it's better to read something like "move the point" than t->x = t->x + dx > 0 ? t->x + dx : 0.
I'm a bit bored by the ugliness of the code inside point_new and rect_new, the code right after the call to the uuid_create procedure. It repeats itself, and it will keep repeating in every datatype I create that uses uuid.lib. Maybe it's time to put it into its own procedure, and put that procedure into a new source file, or maybe into one of the source files of the original creators of the uuid library, which feels a bit unethical.
I use the term procedure for the special case of a function that does not return anything and mutates an object in place, like those void procedures in Example 2. They may be more useful if I turn them into functions that take the address of the object to be mutated and, when done, return that address to the caller, so that calls can be chained.
Example 4
id40.h
#ifndef ID40_H
#define ID40_H
typedef char id40[40];
typedef struct {
id40 x;
} str40;
void id40_set(char *);
#endif
id40.c
#include "id40.h"
#include "sysdep.h"
#include "uuid.h"
#include <stdio.h>
void id40_set(id40 t) {
long i;
uuid_t u;
uuid_create(&u);
sprintf(t, "%8.8lx-%4.4x-%4.4x-%2.2x%2.2x-", (unsigned long)u.time_low, u.time_mid,
u.time_hi_and_version, u.clock_seq_hi_and_reserved,
u.clock_seq_low);
for (i = 0; i < 6; i++)
sprintf(&t[24 + 2 * i], "%2.2x", u.node[i]);
}
point.h
#include "id40.h"
#ifndef POINT_H
#define POINT_H
typedef struct {
id40 id;
unsigned x;
unsigned y;
} point;
point point_new(unsigned, unsigned);
int point_equals(point, point);
int point_equalsp(point *, point *);
point point_move(point, int, int);
point * point_movep(point *, int, int);
#endif
point.c
#include "point.h"
point point_new(unsigned x, unsigned y) {
point t;
id40_set(t.id);
t.x = x;
t.y = y;
return t;
}
int point_equals(point a, point b) {
return a.x == b.x && a.y == b.y;
}
int point_equalsp(point * a, point * b) {
return a == b;
}
point point_move(point t, int dx, int dy) {
/* compare as int: with unsigned t.x, t.x + dx could never be negative */
t.x = (int)t.x + dx > 0 ? t.x + dx : 0;
t.y = (int)t.y + dy > 0 ? t.y + dy : 0;
return t;
}
point * point_movep(point * t, int dx, int dy) {
t->x = (int)t->x + dx > 0 ? t->x + dx : 0;
t->y = (int)t->y + dy > 0 ? t->y + dy : 0;
return t;
}
rect.h
#include "id40.h"
#include "point.h"
#ifndef RECT_H
#define RECT_H
typedef struct {
char id[40];
point p;
unsigned w;
unsigned h;
} rect;
rect rect_new(point, unsigned, unsigned);
int rect_equals(rect, rect);
int rect_equalsp(rect *, rect *);
rect rect_move(rect, int, int);
rect * rect_movep(rect *, int, int);
rect rect_size(rect, unsigned, unsigned);
rect * rect_sizep(rect *, unsigned, unsigned);
#endif
rect.c
#include "rect.h"
rect rect_new(point p, unsigned w, unsigned h) {
rect t;
id40_set(t.id);
t.p = p;
t.w = w;
t.h = h;
return t;
}
int rect_equals(rect a, rect b) {
return point_equals(a.p, b.p) && a.w == b.w && a.h == b.h;
}
int rect_equalsp(rect * a, rect * b) {
return a == b;
}
rect rect_move(rect t, int dx, int dy) {
t.p = point_move(t.p, dx, dy);
return t;
}
rect * rect_movep(rect * t, int dx, int dy) {
point_movep(&t->p, dx, dy);
return t;
}
rect rect_size(rect t, unsigned w, unsigned h) {
t.w = w;
t.h = h;
return t;
}
rect * rect_sizep(rect * t, unsigned w, unsigned h) {
t->w = w;
t->h = h;
return t;
}
example4.c
#include "rect.h"
#include "point.h"
#include <stdio.h>
int main(void) {
rect R = rect_new(point_new(5, 10), 20, 30);
printf("rectangle R at (%u, %u), width %u, height %u\n", R.p.x, R.p.y, R.w, R.h);
printf("rectangle R id %s\n", R.id);
printf("point of rectangle R id %s\n", R.p.id);
R = rect_size(rect_move(R, 10, 5), 40, 50);
printf("rectangle R at (%u, %u), width %u, height %u\n", R.p.x, R.p.y, R.w, R.h);
printf("rectangle R id %s\n", R.id);
printf("point of rectangle R id %s\n", R.p.id);
{
rect S = rect_new(R.p, 88, 77);
rect * T = &S;
printf("rectangle S at (%u, %u), width %u, height %u\n", S.p.x, S.p.y, S.w, S.h);
printf("rectangle S id %s\n", S.id);
printf("point of rect S id %s\n", S.p.id);
printf("rectangle R and rectangle S are %s\n",
rect_equals(R, S) ? "equal" : "unequal");
printf("points of rectangle R and S are %s\n",
point_equals(R.p, S.p) ? "equal" : "unequal");
printf("rect R and rect S are %s object\n",
rect_equalsp(&R, &S) ? "the same" : "not the same");
printf("points of rect R and S are %s object\n",
point_equalsp(&R.p, &S.p) ? "the same" : "not the same");
*T = rect_new(R.p, R.w, R.h);
printf("rectangle R and rectangle *T are %s\n",
rect_equals(R, *T) ? "equal" : "unequal");
printf("rect R and rect *T are %s object\n",
rect_equalsp(&R, T) ? "the same" : "not the same");
printf("rect S and rect *T are %s object\n",
rect_equalsp(&S, T) ? "the same" : "not the same");
}
return 0;
}
compile:
cl example4.c id40.c point.c rect.c ..\uuid\uuid.lib wsock32.lib
output:
rectangle R at (5, 10), width 20, height 30
rectangle R id 86122912-bf91-11eb-bddc-e1995e751a4c
point of rectangle R id 860fc623-bf91-11eb-bddc-e1995e751a4c
rectangle R at (15, 15), width 40, height 50
rectangle R id 86122912-bf91-11eb-bddc-e1995e751a4c
point of rectangle R id 860fc623-bf91-11eb-bddc-e1995e751a4c
rectangle S at (15, 15), width 88, height 77
rectangle S id 86122913-bf91-11eb-bddc-e1995e751a4c
point of rect S id 860fc623-bf91-11eb-bddc-e1995e751a4c
rectangle R and rectangle S are unequal
points of rectangle R and S are equal
rect R and rect S are not the same object
points of rect R and S are not the same object
rectangle R and rectangle *T are equal
rect R and rect *T are not the same object
rect S and rect *T are the same object
typedef is just an alias for a name; it does not create a new type. To illustrate this, rect.h uses char id[40] instead of id40, without any complaint from the compiler.
The most interesting thing in Example 4 is the type declaration of str40, which is not used anywhere in the code by itself. We need to declare this struct, which embeds an array of 40 chars, just to be able to assign one array to another. For instance:
*(str40 *)t.p.id = *(str40 *)p.id;
Instead of working with a real array inside a struct, we type-cast the array: first to a pointer to the aforementioned struct, then we dereference that pointer to get the struct itself. Now we have copied an array by value.
To be frank, t.p.id and p.id are arrays, but in almost every expression an array is converted into the address of its first element, and an array is not a modifiable lvalue, so it cannot be assigned to. That is why we first convert that address into a pointer to str40 and then dereference that pointer to get to the real meat of the struct/array.
Appendix
Let's finish the job with some command-line tools. Be warned: any time you see a build error mentioning something like htons, a library called ws2_32 or wsock32 has to be included in the build.
Embarcadero Free C++ Compiler
This is a modern 32-bit, Clang-based C/C++ compiler with C11 support. It can be downloaded [here].
Unzip it into some folder, say C:\LANG, so that the directory structure is C:\LANG\BCC102\bin. Next, we need to add it to the PATH environment variable. Right-click on My Computer, select Properties, then Advanced system settings. In the System Properties window, on the Advanced tab, there is an Environment Variables button at the bottom; click it.
If you see a Path entry in the User variables list, click Edit and add a new value C:\LANG\BCC102\bin. If there is no Path variable, click New and enter Variable name: Path, Variable value: C:\LANG\BCC102\bin.
Now you can open the Command Prompt and enter bcc32x. If everything is OK, you will be greeted by the compiler saying that its version is 7.30 for Win32. Don't close the window yet. Download the examples' source code from this article and unzip it into, say, the C:\Source folder. It has five folders in it, and the directory structure is C:\Source\Example3 and so on.
Change the directory in the console to C:\Source\uuid.
cd \Source\uuid
Now compile the sources and create the uuid library.
bcc32c -c md5c.c sysdep.c uuid.c
tlib uuid.lib /u /a /C +md5c +sysdep +uuid
Let's go into one of the example directories and build its executable.
cd ..\example4
bcc32c example4.c id40.c point.c rect.c ..\uuid\uuid.lib
That's it. The Borland/Embarcadero compilers don't complain about an undefined reference to htons.
Mingw-w64
The "Minimalist GNU for Windows" is a free and open-source software development environment, a port of the GNU Compiler Collection. For those who want to compile for 64-bit, the standalone version of the compiler, packaged with some handy tools and libraries, can be found on [TCL's page].
Download it, run it and tell the self-extracting archive to unzip into the C:\LANG folder. It will create a MinGW subfolder. Add the C:\LANG\MinGW\bin directory to the PATH environment variable, just as we did before with the Embarcadero compiler.
Open the Command Prompt. To test the installation, enter gcc -v. This shows the version of the GNU C compiler; at the time of this writing, it is 9.2.0.
Switch to the \Source\uuid directory, then compile the sources and create the uuid library.
gcc -c uuid.c md5c.c sysdep.c
ar ru libuuid.a uuid.o md5c.o sysdep.o
As before, we either need to explicitly add the static library libuuid.a to the build:
gcc example4.c id40.c point.c rect.c ..\uuid\libuuid.a -o example4.exe
or the object files that created it:
gcc example4.c id40.c point.c rect.c ..\uuid\uuid.o ..\uuid\md5c.o ..\uuid\sysdep.o
And here, the linker complains about an undefined reference called __imp_ntohs...
To fix this, we have to add the compiler's own ws2_32 or wsock32 library to the build.
gcc example4.c id40.c point.c rect.c ..\uuid\libuuid.a -lwsock32 -o example4
or:
gcc example4.c id40.c point.c rect.c ..\uuid\libuuid.a -lws2_32 -o example4
Somewhere in the folder hierarchy of the MinGW compiler toolset, there is a file libwsock32.a that does the job.
Visual C++ Toolkit 2003
Last but not least, there is the same C/C++ compiler that shipped with Visual Studio .NET 2003, without the IDE, which Microsoft made freely available. It's a basic C89 compiler. To use it for Win32 applications (which work on anything from Windows 95 to the latest Windows 10), you need the Platform SDK. Borland, MinGW, LCC, Pelles C and other Windows compilers include platform files of their own.
Let's arm this tool so it can do some damage. First, get the compiler: search Google for a file named VCToolkitSetup.exe.
Tell the installation wizard to install it in C:\LANG\VS2003\. This alone sets an environment variable called VCToolkitInstallDir, but you also have to add C:\LANG\VS2003\BIN to the PATH, so do it.
Now you can compile academic C code that creates linked lists and binary trees, multiplies matrices, writes stuff to the console... but you cannot make Win32 GUI apps. For that, download the Windows Server 2003 R2 Platform SDK in IMG format from [CNET].
Unpack it with something like 7z and start Setup. Choose the custom installation. Although the Platform SDK is big by Y2K standards, we actually need very little from it (header files like windows.h and some build tools missing from the VCToolkit). Tell the installation wizard to put the files into the same directory as the compiler (C:\LANG\VS2003), so we won't have to set additional environment variables in Windows.
When it gets to the "Check the options below to select and deselect individual features" window, deselect everything by clicking on the main feature box. That will mark it with a red cross. Now open "Microsoft Windows Core SDK" box, choose: "Build Environment (x86 32-bit)" and "Tools (AMD 64-bit)". Finish the installation.
Technically, you now have two compilers: the Visual C++ 2003 32-bit compiler and the Visual C++ 2005 Express 64-bit compiler. We'll stick to the former. Add the AMD64 tools to the PATH as well; they should be in C:\LANG\VS2003\Bin\win64\x86\AMD64, but make sure this entry comes after C:\LANG\VS2003\BIN in the list.
You need to add two new environment variables to Windows: INCLUDE and LIB. Their values should be C:\LANG\VS2003\INCLUDE and C:\LANG\VS2003\LIB, respectively.
Once again, open the Command Prompt and go to the uuid directory where you extracted the examples.
cl -c uuid.c md5c.c sysdep.c
lib -nologo -out:uuid.lib uuid.obj md5c.obj sysdep.obj
Let's go to the example4 folder and build it, this time adding wsock32.lib up front instead of waiting for the linker to complain.
cl example4.c id40.c point.c rect.c ..\uuid\uuid.lib wsock32.lib
Happy coding!
History
- 9th July, 2021: Initial version
- 10th July, 2021: Update