Programming in C with Type Inference

Leandro T C Melo

3.86/5 (7 votes)

10 Sep 2022BSD6 min read

14.9K

In this article, I will present PsycheC, a type inference engine for the C language.

Introduction

[Portuguese version available here.]

The C programming language has been around for 4 decades. Over the years, C has evolved in numerous aspects (C11 being the current standard), but its fundamental characteristics remain intact: a textual #inclusion mechanism, an imperative programming model, scopes are lexically delimited, and, among other things, the typing of expressions happen statically.

The type system of C, besides being static, which means that type-checking is performed during compilation, has a few properties.

It is nominal, meaning that the equivalence of types in C is determined by names, those specified in type declarations (e.g., of a struct). This approach contrasts with that of a structural type system, which relies on the structure of types in order to determine their equivalence.
When we declare a variable in C, it is necessary that a type is explicitly associated to it (e.g., int or double). This is a distinction to languages that feature a so-called type inference, where programmers are freed from the need of annotating types.
Although there is no formal or universally accepted definition for "weak typing" and "strong typing", we colloquially designate C to the first group, due to the permissiveness of its type system. This fact can be observed in implicit conversions like those involving void*.

A complete discussion about type systems is beyond the scope of this article. In particular, topics related to polymorphism and its varieties like ad-hoc polymorphism, parametric polymorphism, and subtyping. Therefore, my goal is restricted to a discussion of type inference. Precisely, I would like to present a C "compiler" that supports such feature. With this tool, you can program in C without writing declarations for a struct or a typedef.

Reconstructing Programs with PsycheC

Before presenting PsycheC, it is worth distinguishing between the notion of global type inference and that of local type inference from an isolated expression. Nowadays, several languages offer an "auto-deduction" functionality, where a variable can be declared without an associated type specifier, provided that it is accompanied by an initialization value.

C++

void f () {auto v = 42; } // auto is the placeholder in C++
void g () {var v = 42; } // in C #, the placeholder is var

In the above snippet, there is indeed the inference of a type going on. However, in a restricted context. The implementation of this functionality is relatively simple and consists of inspecting the types of subexpressions (e.g., a literal or annotated declaration) that make up the complete expression, combined with the imposition of the language rules.

The kind of type inference that I refer to is complex and may involve an entire program. In this formulation, a set of expressions are collected, but complete expressions are not necessarily formed by sub-expressions declared with type annotations. Ultimately, we have a system of constraints composed by many expressions. The final problem is to find the most generic type capable of solving this system. This type is called a principal type; classical examples of languages with such type system, known as Hindley-Milner, are ML and Haskell.

A question pops up: is it possible to integrate an advanced type inference to the benevolent type system of C? Yes, and that is exactly what PsycheC does! And this is not an easy task. C's type system is not a trivial one. Despite inheritance not being present in the language, void* conversions mimic a form of polymorphism; type qualifiers pose interesting inchallenges - non-const pointers can be assigned to const pointers, but the inverse is not permitted; furthermore, in the absence of declarations, the C language becomes both syntactically and semantically ambiguous (e.g., x * y, can be a pointer declaration or a multiplication, while z = 0; may be the initialization of an integral or a NULL pointer).

In a strict sense, PsycheC is not a compiler. And it does not rely on any weird syntax or introduce any new keyword. Its goal is to work on plain C syntax. Essentially, PsycheC is an analysis capable of "discovering" the principle type of expressions that appear in a program, but whose declarations are missing. Therefore, PsycheC is sort of a type declaration producer: If it finds a name T whose declaration is absent, it creates an according struct for T or it makes T a typedef for an existing type.

int main()
{
    T v = 0;
    v->value = 3.14;
    v->next = v;
    return 0;
}

Save the contents of the above snippet to a file test.c and try to compile it with gcc, clang, or the C compiler of your choice. You will receive an error similar to the one shown below. This error is an expected one, since the definition of T does not appear in the program.

$ clang test.c
test.c:3:5: error: use of undeclared identifier 'T'
    T v = 0;
    ^

However, if the C language natively supported type inference, the compiler would be able to determine that one solution to this program is to make T a pointer to a struct, which is initialized with 0; to add a field of name value and type double to T; and to add to T a field of name next, with a type recursively referring to itself. This intelligence is precisely what the PsycheC tool offers, producing, for this program, the following declarations.

typedef struct TYPE_2__ TYPE_1__ ;
struct TYPE_2__ { double value; struct TYPE_2__* next; } ;
typedef TYPE_1__* T ;

If you would like to give a try, without cloning/building the github repository, take a look at this online interface.

Use Cases: PsycheC in Practice

At this point, you must be wondering whether it is worth so much trouble to get a type inference engine for C. Well... it depends. First of all, building PsycheC has been a great adventure. In any case, if you appreciate a style of programming closer to that of Python, it just might be that you have some fun. Nevertheless, PsycheC can be more than a mere toy. The typical uses cases are the following ones:

You want to quickly prototype an algorithm, focusing on its functional aspect, without having to worry about how the types are represented. Suggestion: Try to implement the functional version of mergesort, without declaring a single struct.
You are working on a legacy, embedded, or cross-platform project in which types are declared only for the target architecture, but they do not compile on the platform where you are simulating or testing the program. In this situation, PsycheC can be used to reconstruct incompatible headers.
You need to run an analysis, debug, or test a snippet submitted via bug tracker, but you do not have access or time to compile the original entire program. In certain cases, type stubs might be sufficient to reproduce an issue.

From Academia to Industry

PsycheC is primarily an academic project. In fact, the theory behind it has been published at POPL (Principles of Programming Languages) in this year of 2018. To use it, it is necessary to follow a certain "protocol" of tasks which can be a bit cumbersome. To deal with this gap, the Cnippet tool was created.

Cnippet is a wrapper around PsycheC that integrates with gcc or clang, abstracting away the aformentioned process. In addition, Cnippet understands part of the standard library and recognizes a few typical macros (common in various platforms). To have types inferred for your C program, just invoke Cnippet by forwarding any option that you would pass to the actual C compiler.

$ cnip clang -c test.c -o test.o

I hope you enjoyed the article and give PsycheC a try. Thank you!

License

This article, along with any associated source code and files, is licensed under The BSD License