StringLib C: Library Adding String Type to C

8 Mar 2018, MIT license, 8 min read
C library defining string type and string manipulation functions

Introduction

StringLib C is a library (a set of functions) designed for C99+ (or C89 implementations supporting stdint.h) that defines a string type and several functions that work with it, allowing easy string manipulation, reading and writing without having to worry about memory allocation (scary stuff, really).

All dynamic allocation is done automatically: a newly defined string is considered an empty string (though it's safer to initialize it to STRING_NEW), while calling the string_delete(string*) function frees the memory and empties the string once again.

The string type is a structure with no visible members, which prevents the user from modifying it directly, but it still has a well defined size.

C11+ additional features (not required):

The library checks for _Generic() keyword support and, if present, uses it to overload functions through macros: this allows different parameter types (such as a pointer to string instead of a string literal or pointer to char) in many functions.

The header also checks for _Atomic and _Thread_local type qualifiers' support, enabling lock functions and thread safe operations.

These options are automatically disabled if not supported, but you can also disable them separately with a macro.

How String Type Works

WARNING: This paragraph deals with structure obfuscation, which is done to prevent (as much as possible) inexperienced users from editing members that are not intended to be edited separately; if you hate obfuscation and/or obfuscation is against your religion, please close your browser and clear your history before it's too late.

The string type inside stringlib.h is not defined as a pointer to a structure but as an actual structure with no members (hidden members, Russian hackers won't mess with my code this time [please ignore this comment if you are Russian]); this is done through unnamed bit fields whose widths are computed with the sizeof keyword.

C++
typedef struct
#ifndef _STRINGLIB_OVERLOADING_WAIT_
{
    uintptr_t:8*sizeof(uintptr_t);
    size_t:8*sizeof(size_t);
    size_t:8*sizeof(size_t);
    uintptr_t:8*sizeof(uintptr_t);
    uintptr_t:8*sizeof(uintptr_t);
    //Thread lock
    size_t:8*sizeof(size_t);
    _Atomic volatile struct
    {
        volatile uintptr_t:8*sizeof(volatile uintptr_t);
        volatile uintptr_t:8*sizeof(volatile uintptr_t);
    };
}_string;

#if (!defined(string) && _STRINGLIB_DEFINE_) || _STRINGLIB_DEFINE_==2
#undef string
#define string _string
#endif // string

While the hypothetical accessible structure should be:

C++
struct _string_accessible
{
    //actual string
    char *getString;
    //string size in bytes
    size_t getSize;
    //string size in characters (=size in bytes-1)
    size_t getSizeChars;
    //should both point to _STRINGLIB_ALLOCATION_TRUE_ when string is allocated
    void *stringAllocation;
    void *stringSignature;
    //Thread lock
    size_t locklevel;
    _Atomic volatile struct
    {
        volatile uintptr_t thread_id_pt1;
        volatile uintptr_t thread_id_pt2;
    } lock;
};

This gives the type the combined size of a pointer to char (the actual string), two size_t values holding the string size, two pointers to void (used for dynamic allocation inside the functions) and the lock members, despite having no visible members. The string therefore cannot be edited, either at declaration or later, except by passing it through the stringlib.h functions or by reading its memory through a pointer cast, which prevents memory allocation faults caused by wrong edits from inexperienced users (note: string_getString(string*) returns the corresponding char pointer).

Having the char pointer as the first member allows the string to be written by simply passing it as a parameter to printf, fwrite, fputs, memcpy, etc.; this is however discouraged, and I advise passing string_getString(string*) or using the dedicated string_print(string*, ...), string_write(string*, ...), etc. functions instead (passing false as the optional last parameter suppresses the new line at the end).

The stringlib.c file defines five macros for reading and writing each of the type members (plus two additional macros for the lock members).

C++
#define _STRINGLIB_ACCESS_STRING_(STRING)\
    (*(*((char***)((_string*[]){(&(STRING))}))))
#define _STRINGLIB_ACCESS_SIZE_(STRING)\
    (*((size_t*)&(((uint8_t*)(&(STRING)))[sizeof(uintptr_t)])))
#define _STRINGLIB_ACCESS_SIZECHARS_(STRING)\
    (*((size_t*)&(((uint8_t*)(&(STRING)))[sizeof(uintptr_t)+sizeof(size_t)])))
#define _STRINGLIB_ACCESS_ALLOCATION_(STRING)\
    (*((void**)&(((uint8_t*)(&(STRING)))[sizeof(uintptr_t)+2*sizeof(size_t)])))
#define _STRINGLIB_ACCESS_SIGNATURE_(STRING)\
    (*((void**)&(((uint8_t*)(&(STRING)))[2*sizeof(uintptr_t)+2*sizeof(size_t)])))

In the following sections, the string type will be referred to as the hypothetical accessible structure for convenience and better readability.
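The offset arithmetic used by the access macros can be sanity-checked against a named structure. The sketch below is only an illustration: `demo_accessible`, `demo_read_size_at` and `demo_offsets_match` are hypothetical names that are not part of stringlib, and the check assumes a platform where a char pointer has the same size as uintptr_t and no padding is inserted between the members (true on common 32-bit and 64-bit targets, but not guaranteed by the C standard).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical named twin of the hidden layout (first five members only). */
struct demo_accessible
{
    char  *getString;
    size_t getSize;
    size_t getSizeChars;
    void  *stringAllocation;
    void  *stringSignature;
};

/* Reads a size_t stored `offset` bytes into an object. memcpy is used
   instead of a pointer cast to stay clear of alignment and aliasing issues. */
size_t demo_read_size_at(const void *object, size_t offset)
{
    size_t value;
    assert(object != NULL);
    memcpy(&value, (const uint8_t *)object + offset, sizeof value);
    return value;
}

/* Returns 1 when the offsets hard-coded in the access macros match the
   offsets the compiler chose for the named structure, 0 otherwise. */
int demo_offsets_match(void)
{
    return offsetof(struct demo_accessible, getString) == 0
        && offsetof(struct demo_accessible, getSize) == sizeof(uintptr_t)
        && offsetof(struct demo_accessible, getSizeChars)
               == sizeof(uintptr_t) + sizeof(size_t)
        && offsetof(struct demo_accessible, stringAllocation)
               == sizeof(uintptr_t) + 2 * sizeof(size_t)
        && offsetof(struct demo_accessible, stringSignature)
               == 2 * sizeof(uintptr_t) + 2 * sizeof(size_t);
}
```

If `demo_offsets_match()` returns 0 on a given target, byte-offset access of this kind would read the wrong member there, which is the main portability risk of the approach.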

How String Allocation Works

String allocation is checked by every function in the library through the string_isAllocated(string*) function (which the user never needs to call); this behaviour makes it possible to use all the functions on a newly declared string with an extremely low risk of errors, as such a string is treated as an empty "" string.

C++
int (string_isAllocated)(_string *string_a)
{
    int ret;
    string_lock(string_a);
    ret=(string_a->stringAllocation == _STRINGLIB_ALLOCATION_TRUE_ &&\
	    string_a->stringSignature == _STRINGLIB_ALLOCATION_TRUE_ &&\
	    (string_a->getSize == string_a->getSizeChars+1));
    string_unlock(string_a);
    return ret;
}

The return value of string_isAllocated(string*) is then used inside (and in some cases before) the string_init(string*) function (again, not needed by the user), which sets the string to an empty string with a size of 1 byte.

C++
size_t (string_init)(_string *string_a)
{
    int stringMallocFail = 0;
    char *tempStrMalloc = NULL;

    string_lock(string_a);

    //frees string_a if already allocated
    if (string_isAllocated(string_a))
    {
        free(string_a->getString);
        //unallocates string
        string_a->stringAllocation = _STRINGLIB_ALLOCATION_FALSE_;
        string_a->stringSignature = _STRINGLIB_ALLOCATION_FALSE_;
    }
    string_a->getString = NULL;

    //initializes string, if initialization fails ends function returning 0
    tempStrMalloc = (char*) malloc(1 * sizeof(char));
    while (tempStrMalloc == NULL)
    {
        tempStrMalloc = (char*) malloc(1 * sizeof(char));
        //should never fail in normal circumstances
        if (++stringMallocFail == _STRINGLIB_MAX_ALLOC_FAILS_)
        {
            string_unlock(string_a);
            free(tempStrMalloc);
            printf("_string memory initialization failed\n");
            return 0;
        }
    }
    string_a->getString = tempStrMalloc;
    string_a->getString[0] = '\0';

    string_a->getSize = 1;
    string_a->getSizeChars = 0;

    //allocates string
    string_a->stringAllocation = _STRINGLIB_ALLOCATION_TRUE_;
    string_a->stringSignature = _STRINGLIB_ALLOCATION_TRUE_;

    string_unlock(string_a);
    return 1;
}

To count as allocated, a string must have both the stringAllocation and stringSignature members set to _STRINGLIB_ALLOCATION_TRUE_ and its getSize member exactly one greater than getSizeChars, which makes it nearly impossible for a newly declared string to be mistaken for an initialized one (the world will most certainly end first). It is nevertheless a safe and recommended practice (especially for multithreaded operations) to assign STRING_NEW (defined as an empty string, ((string){})) to a newly declared string before passing it to any function, which rules out any possible allocation error. This assignment can only be done before the string is edited and will leak memory if done afterwards: to re-use a string, just pass it through any library function such as string_set(string*) or string_read(string*).

C++
void (string_delete)(_string *string_a)
{
    string_lock(string_a);
    //frees string_a if already allocated
    if (string_isAllocated(string_a))
    {
        free(string_a->getString);
        //unallocates string
        string_a->stringAllocation = _STRINGLIB_ALLOCATION_FALSE_;
        string_a->stringSignature = _STRINGLIB_ALLOCATION_FALSE_;
    }
    string_a->getString = NULL;
    string_unlock(string_a);
}

Deallocation of a string is done by calling the function string_delete(string*), which frees the char pointer and sets allocation and signature to _STRINGLIB_ALLOCATION_FALSE_; any other function can still be used over a previously deleted string, and the string will be automatically reallocated.
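The allocation life cycle described in this section (check, delete, automatic reallocation on the next call) can be sketched in a few self-contained functions. This is not the library's code: `demo_string`, `DEMO_ALLOCATED`, `demo_is_allocated`, `demo_delete` and `demo_set` are hypothetical stand-ins, locking is left out, and the marker value is simply the address of a private object.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from stringlib.h. */
static const char demo_tag = 0;      /* its address is the "allocated" marker */
#define DEMO_ALLOCATED ((const void *)&demo_tag)

typedef struct
{
    char       *text;
    size_t      size;        /* bytes, including the terminator */
    size_t      chars;       /* bytes, excluding the terminator */
    const void *allocation;
    const void *signature;
} demo_string;

/* Allocated only if both markers match and the two sizes are consistent. */
int demo_is_allocated(const demo_string *s)
{
    assert(s != NULL);
    return s->allocation == DEMO_ALLOCATED
        && s->signature == DEMO_ALLOCATED
        && s->size == s->chars + 1;
}

/* Frees the buffer if it is owned and clears the markers. */
void demo_delete(demo_string *s)
{
    if (demo_is_allocated(s))
        free(s->text);
    s->text = NULL;
    s->allocation = NULL;
    s->signature = NULL;
}

/* Replaces the content; works on zeroed, deleted and live strings alike.
   Returns the new size in bytes, or 0 if malloc fails. */
size_t demo_set(demo_string *s, const char *value)
{
    size_t chars;
    char *copy;
    assert(value != NULL);
    chars = strlen(value);
    copy = malloc(chars + 1);
    if (copy == NULL)
        return 0;
    memcpy(copy, value, chars + 1);
    demo_delete(s);              /* releases the old buffer, if any */
    s->text = copy;
    s->chars = chars;
    s->size = chars + 1;
    s->allocation = DEMO_ALLOCATED;
    s->signature = DEMO_ALLOCATED;
    return s->size;
}
```

A zero-initialized `demo_string` plays the role of STRING_NEW here: it fails the check, so `demo_set` never frees a garbage pointer, and calling `demo_set` again after `demo_delete` simply allocates a fresh buffer.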

Text and Binary File Reading/Writing

The library also relies on specific dedicated functions for file reading and writing:

C++
size_t (string_write)(_string *string_a, FILE *file_a, ...)
{
    //new version also buffers the whole string if multithreading
    size_t initPos = 0;
    size_t  pos = 0;
    char c_return = '\r';
    va_list valist;
    string_lock(string_a);
    if (string_isAllocated(string_a))
    {
        while (*(string_a->getString+pos)!= '\0')
        {
            //if string has a new line, prints '\r' char before new line
            if (*(string_a->getString+pos)== '\n')
            {
                *(string_a->getString+pos)='\0';
                fputs(string_a->getString+initPos, file_a);
                fwrite(&c_return, sizeof(char), 1, file_a);
                fputc('\n', file_a);
                *(string_a->getString+pos)='\n';
                initPos = pos+1;
            }
            ++pos;
        }
        //prints string last line (could be whole string)
        fputs(string_a->getString+initPos, file_a);
    }
    string_unlock(string_a);
    //print new line if second argument is not 0
    va_start(valist, file_a);
    if (va_arg(valist, int)) fputc('\n', file_a);
    va_end(valist);
    return pos+1;
}

The text file writing function does not simply write the string as-is: for every new line within the string it also writes a carriage return character before the new line character (don't worry, it's only visible with a hexadecimal editor), allowing the string_read and string_readAppend functions to determine whether the string continues after a new line.
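To make the carriage-return convention concrete, here is a minimal round-trip sketch under stated assumptions: `demo_write_record` and `demo_read_record` are hypothetical helpers working on plain char buffers, not the library's string_write and string_read, and the file is assumed to be opened in binary mode so that no platform new-line translation interferes.

```c
#include <assert.h>
#include <stdio.h>

/* Writes `text` as one record: embedded '\n' become "\r\n", and a bare
   '\n' closes the record. Returns the number of characters of `text`. */
size_t demo_write_record(const char *text, FILE *file)
{
    size_t pos = 0;
    assert(text != NULL && file != NULL);
    for (; text[pos] != '\0'; ++pos)
    {
        if (text[pos] == '\n')
            fputc('\r', file);      /* marks "the string continues" */
        fputc(text[pos], file);
    }
    fputc('\n', file);              /* bare new line: end of record */
    return pos;
}

/* Reads one record back into `out` (capacity `cap`, always terminated).
   "\r\n" is decoded to '\n'; a '\n' not preceded by '\r' ends the record.
   Returns the number of characters stored. */
size_t demo_read_record(char *out, size_t cap, FILE *file)
{
    size_t len = 0;
    int c;
    int pending_cr = 0;
    assert(out != NULL && cap > 0 && file != NULL);
    while ((c = fgetc(file)) != EOF)
    {
        if (c == '\n' && !pending_cr)
            break;                              /* end of record */
        if (pending_cr && c != '\n' && len + 1 < cap)
            out[len++] = '\r';                  /* lone '\r' was real data */
        pending_cr = (c == '\r');
        if (!pending_cr && len + 1 < cap)
            out[len++] = (char)c;
    }
    out[len] = '\0';
    return len;
}
```

One limitation of this sketch: a string whose last character is '\r' becomes indistinguishable from a continued line once the closing new line is written, so such input would need escaping in a real implementation.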

C++
size_t (string_writeBin)(_string *string_a, FILE *file_a)
{
    //new version also buffers the whole string if multithreading
    string_lock(string_a);
    if (!string_isAllocated(string_a))
    {
        string_a->getSize = 0;
        fwrite(&(string_a->getSize), sizeof(size_t), 1, file_a);
        string_unlock(string_a);
        return 0;
    }
    //writes string size
    fwrite(&(string_a->getSize), sizeof(size_t), 1, file_a);
    //writes string
    fwrite(string_a->getString, sizeof(char), string_a->getSize, file_a);
    string_unlock(string_a);
    return string_a->getSize;
}

The simpler binary writing function writes the string size followed by the string characters or simply writes 0 for unallocated strings; string_readBin on the other hand reads the first number, then allocates the string.
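The size-prefixed layout can be illustrated with a small round trip. The helpers `demo_write_bin` and `demo_read_bin` below are hypothetical and operate on plain buffers rather than on the library's string type; like the original, they store the size as a raw size_t, so the resulting file is only readable on platforms with the same size_t width and byte order.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Writes the byte count (terminator included) followed by the bytes.
   Returns the number of bytes of string data written, or 0 on failure. */
size_t demo_write_bin(const char *text, size_t size, FILE *file)
{
    assert(file != NULL);
    if (text == NULL)
        size = 0;                   /* unallocated string: size 0, no data */
    if (fwrite(&size, sizeof size, 1, file) != 1)
        return 0;
    if (size == 0)
        return 0;
    return fwrite(text, sizeof(char), size, file);
}

/* Reads the size first, then allocates exactly that many bytes.
   Returns a malloc'ed, terminated buffer (caller frees) or NULL. */
char *demo_read_bin(FILE *file, size_t *size_out)
{
    size_t size = 0;
    char *text;
    assert(file != NULL && size_out != NULL);
    *size_out = 0;
    if (fread(&size, sizeof size, 1, file) != 1 || size == 0)
        return NULL;
    text = malloc(size);
    if (text == NULL)
        return NULL;
    if (fread(text, sizeof(char), size, file) != size)
    {
        free(text);
        return NULL;
    }
    text[size - 1] = '\0';          /* never trust the file for termination */
    *size_out = size;
    return text;
}
```

Reading the size before allocating means the reader needs a single allocation per string, at the cost of trusting the size field; a truncated or corrupted file is caught here by the short fread.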

Function Overloading

This feature works only on standard C11 implementations or compiler versions supporting the _Generic() keyword (GCC 4.9+, Clang 3.0+, xlC 12.01[V2R1]+); its availability can be checked with the _STRINGLIB_OVERLOADING_ macro.

The _Generic() keyword allows type checking in C and can be used inside a macro function for overloading purposes (just like C++, but NO, not really), as in the example:

C++
///Allocation checking inside macro should be removed in next update
#define string_set(A, B)\
    _Generic(B,\
    char*: (string_set)((_string*)A, (char*)B),\
    _string*: (string_isAllocated)((_string*)B)?(string_set)((_string*)A,\
         ((string_getString)(*((_string*)((void*)B+0))))+0):(string_set)((_string*)A, ""),\
    default: (string_set)((_string*)A, ""))

In this example, generic selection allows a pointer to string to be passed in the function instead of a pointer to char, thus giving the user more flexibility.
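For readers new to generic selection, the same idea can be shown in isolation. The sketch below is an assumption-laden illustration rather than the library's macro: `demo_str`, `demo_len_cstr`, `demo_len_str` and `demo_len` are hypothetical names, and the macro selects a function by type and then calls it, which is a slightly different arrangement from the one used in string_set above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper type standing in for the library's string. */
typedef struct { const char *text; } demo_str;

size_t demo_len_cstr(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

size_t demo_len_str(const demo_str *s)
{
    assert(s != NULL);
    return s->text ? strlen(s->text) : 0;
}

/* _Generic picks a function by the static type of X, then the selected
   function is called with X. Selecting the function first (rather than a
   full call expression per branch) keeps unselected branches type-correct. */
#define demo_len(X)                      \
    _Generic((X),                        \
        char *:           demo_len_cstr, \
        const char *:     demo_len_cstr, \
        demo_str *:       demo_len_str,  \
        const demo_str *: demo_len_str)(X)
```

With this in place, `demo_len("abc")` and `demo_len(&wrapped)` both compile and dispatch to the matching function, while passing an unsupported type (an int, say) is rejected at compile time because no association matches.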

Simple Overloading

Simple overloading consists of defining function macros that check only the number of parameters passed (or whether the last parameter is passed), without checking the parameter types; it replaces the overloading method described above when the _Generic() keyword is not supported, and is also used for functions that only need the number of parameters checked.

Here are two examples of simple overloading:

C++
//Overloading of string_appendPos
#define string_appendPos(A, B, C...) ((string_appendPos)(A, B, (size_t)C+0))
//Overloading of string_print
#define string_print(A, B...) ((string_print)(A, (sizeof((int[]){B}))?B+0:1))

While the former simply passes the argument C, or 0 if C is empty, the latter creates an array containing the extra parameters and checks its size.

Both generic selection overloading and simple overloading can be bypassed by placing parentheses around the function name when calling it.
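The two macros above use the named variadic parameter syntax (`C...`), which is a GNU extension. As a portable alternative, a default for an omitted trailing argument can also be supplied with standard C99 `__VA_ARGS__`. The sketch below swaps in that technique; it is not how the library does it, and `demo_count` and `DEMO_COUNT_PICK` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Counts occurrences of `c` in `s`, starting at index `start`. */
size_t demo_count(const char *s, char c, size_t start)
{
    size_t pos = 0, hits = 0;
    assert(s != NULL);
    for (; s[pos] != '\0'; ++pos)
        if (pos >= start && s[pos] == c)
            ++hits;
    return hits;
}

/* Appending spare zeros and keeping the first three arguments supplies
   `start = 0` when the caller omits it. The parenthesized (demo_count)
   names the real function, because a function-like macro is only expanded
   when its name is directly followed by an opening parenthesis. */
#define DEMO_COUNT_PICK(s, c, start, ...) ((demo_count)((s), (c), (start)))
#define demo_count(...) DEMO_COUNT_PICK(__VA_ARGS__, 0, 0)
```

Calling `demo_count("banana", 'a')` counts from the start of the string, `demo_count("banana", 'a', 2)` counts from index 2, and `(demo_count)("banana", 'a', 4)` skips the macro entirely and therefore requires all three arguments.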

Thread Locking

The functions string_lock(string*), string_unlock(string*) and string_multilock(string_count, string*...) lock the string for the calling thread.

They are called by every function in the library, making every action thread safe, and can be nested, allowing multiple functions to be called consecutively.

The locks are similar to a mutex: each string stores a unique ID identifying the thread that is accessing it and a lock count giving the number of nested locks held by that thread. The unlock function decreases the lock count and releases the lock once it reaches zero. The multilock function provides a safe way to lock multiple strings; any locks nested inside it should be single locks.
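The owner-plus-count scheme can be sketched with C11 atomics. This is an illustration under assumptions, not the library's implementation: `demo_lock`, `demo_trylock`, `demo_lock_acquire` and `demo_unlock` are hypothetical names, the thread ID is a caller-supplied non-zero number instead of a real thread handle, and waiting is a plain spin loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lock record; thread IDs are caller-supplied and non-zero. */
typedef struct
{
    _Atomic uintptr_t owner;   /* 0 means unlocked */
    size_t            level;   /* nesting depth, touched only by the owner */
} demo_lock;

/* Tries to take the lock for thread `self`. Returns 1 on success (first
   acquisition or a nested one), 0 if another thread holds it. */
int demo_trylock(demo_lock *lock, uintptr_t self)
{
    uintptr_t expected = 0;
    assert(lock != NULL && self != 0);
    if (atomic_load(&lock->owner) == self)
    {
        ++lock->level;                 /* nested lock by the owner */
        return 1;
    }
    if (atomic_compare_exchange_strong(&lock->owner, &expected, self))
    {
        lock->level = 1;
        return 1;
    }
    return 0;
}

/* Blocking variant: spins until the lock is obtained. */
void demo_lock_acquire(demo_lock *lock, uintptr_t self)
{
    while (!demo_trylock(lock, self))
        ;                              /* a real lock would yield or sleep */
}

/* Drops one nesting level; releases the lock when the count reaches zero. */
void demo_unlock(demo_lock *lock, uintptr_t self)
{
    assert(lock != NULL && atomic_load(&lock->owner) == self);
    assert(lock->level > 0);
    if (--lock->level == 0)
        atomic_store(&lock->owner, 0);
}
```

Because only the owning thread ever touches `level`, the counter itself does not need to be atomic; the atomic `owner` field is what serializes access between threads.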

Implementation Example

C++
#include <stdio.h>
#include "stringlib.h"

int main(void)
{
    //not recommended (unsafe)
    string string_a;
    //recommended (safe)
    string string_b = STRING_NEW;
    
    string_set(&string_a, "hello world\nnew line test");
    string_newline(&string_a, "this is the third line");
    
    string_print(&string_a);
    //string_getSize returns size_t, so %zu is the matching format
    printf("SIZE: %zu\n", string_getSize(&string_a));
    
    FILE *foo = fopen("text.txt", "w");
    if (foo == NULL) { string_delete(&string_a); return 1; }
    string_write(&string_a, foo);
    fclose(foo);
    
    foo = fopen("text.txt", "r");
    if (foo == NULL) { string_delete(&string_a); return 1; }
    string_read(&string_b, foo);
    fclose(foo);
    string_print(&string_b);
    printf("SIZE: %zu\n", string_getSize(&string_b));
    string_delete(&string_a);
    string_delete(&string_b);
    return 0;
}

Additional Information

StringLib C is an open source set of functions that started as an exercise and that I then decided to share; any help or suggestion is welcome.

If you try the library, please leave your feedback (here or on the SourceForge page... or both).

Full Documentation

Lock Functions

C++
void string_lock(_string *string_a);

locks the string for current thread, works on multiple levels

C++
void string_unlock(_string *string_a);

unlocks string when locklevel==0 (same number of locks and unlocks)

C++
void string_multilock(int string_count, _string *string_a, ...);

locks multiple strings safely (by address order)

Basic Functions

C++
const char *const string_getString(_string *string_a);

converts string to char pointer.

C++
size_t string_getSize(_string *string_a); 

returns string size

Checking Functions

C++
int string_contains(_string *string_a, const char *string_b, ...);

checks if string contains string literal, returns the position where it is found plus one; pass the optional third parameter to start searching from a different position in the string

C++
int string_containsChars(_string *string_a, int checkall, ...);

checks if string contains all passed characters (if argument checkall is 1 or true) or one of the characters (checkall = 0 or false); characters are passed from the third parameter

C++
int string_equals(_string *string_a, const void *string_v);

checks if string is the same as the string literal

Input/Output Functions

C++
size_t string_set(_string *string_a, const void *string_b); 

sets string to string literal

C++
size_t string_scan(_string *string_a); 

sets string to user input

C++
size_t string_append(_string *string_a, const void *string_v);
size_t string_appendPos(_string *string_a, const void *string_v, ...);
size_t string_appendCiclePos(_string *string_a, const void *string_v, unsigned int repeatTimes, ...);

appends string literal to string

C++
size_t string_scanAppend(_string *string_a);
size_t string_scanAppendPos(_string *string_a, ...); 

appends user input to string

C++
size_t string_newline(_string *string_a, ...); 

creates new line, optionally appends string literal (can pass stdin to append user input)

C++
size_t string_cut(_string *string_a, size_t pos, ...); 

cuts string from position ‘pos’ to the optional end position

C++
size_t string_override(_string *string_a, const void* string_v, ...); 

overwrites the string with the string literal, starting from the optional position (if specified)

C++
void string_swap(_string *string_a, _string *string_b); 

swaps two strings

C++
void string_delete(_string *string_a); 

deletes a string, deallocating memory

C++
void string_print(_string *string_a, ...); 

prints a string to output console

File Input/Output Functions

C++
size_t string_write(_string *string_a, FILE *file_a, ...);

write to text file

C++
size_t string_writeBin(_string *string_a, FILE *file_a);

write to binary file

C++
size_t string_read(_string *string_a, FILE *file_a, ...); 
size_t string_readAppend(_string *string_a, FILE *file_a, ...);
size_t string_readBuffered(_string *string_a, FILE *file_a, size_t buffersize, ...);
size_t string_readAppendBuffered(_string *string_a, FILE *file_a, size_t buffersize, ...);

read from text file

C++
size_t string_readBin(_string *string_a, FILE *file_a);

read from binary file

Unnecessary Functions (Used By Other Functions But Not Necessary for User)

C++
size_t string_init(_string *string_a);

initializes string

C++
int string_isAllocated(_string *string_a);

checks if the string is allocated

History

  • [21/03/2017] First beta upload
  • [30/03/2017] Major changes to structure access methods
  • [09/04/2017] Changed string access method, optimized algorithm complexity
  • [19/04/2017] Added simple overloading (prevents bugs in Linux)
  • [09/05/2017] Split string_contains function; string_contains now returns position+1
  • [13/05/2017] string_contains can now check from any starting position
  • [28/05/2017] Added string_appendCiclePos, cleaned code
  • [08/07/2017] Defined STRING_NEW and added buffered read functions
  • [15/01/2018] Thread safe functions and locks
  • [08/03/2018] Multilock function and minor changes

License

This article, along with any associated source code and files, is licensed under The MIT License