Introduction
StringLib C is a set of functions designed for C99 and later (or C89 implementations that provide stdint.h). It defines a string type and several functions that work with it, allowing easy string manipulation, reading and writing without having to worry about memory allocation (scary stuff, really).
All dynamic allocation is done automatically: a newly defined string is considered an empty string (though it's safer to initialize it to STRING_NEW), while calling the string_delete(string*) function frees the memory and empties the string once again.
The string type is a structure with no visible members, which prevents the user from modifying it directly, but it still has a well-defined size.
C11+ additional features (not required):
The library checks for _Generic() support and, if present, uses it for function macro overloading: this allows different parameter types (such as a pointer to string instead of a string literal or pointer to char) in many functions.
The header also checks for support of the _Atomic type qualifier and the _Thread_local storage-class specifier, which enable the lock functions and thread-safe operations.
These options are automatically disabled if not supported, but you can also disable each of them separately with a macro.
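The checks described above can be approximated with the standard predefined macros. The following is only a sketch of how such detection might look, not the contents of stringlib.h: every MINI_* name is hypothetical, and the real header uses its own macro names.

```c
/* Hypothetical feature switches; stringlib.h defines its own names. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define MINI_HAS_GENERIC      1   /* _Generic is part of C11 */
#  define MINI_HAS_THREAD_LOCAL 1   /* _Thread_local is part of C11 */
#  ifdef __STDC_NO_ATOMICS__
#    define MINI_HAS_ATOMIC     0   /* atomics are an optional C11 feature */
#  else
#    define MINI_HAS_ATOMIC     1
#  endif
#else
#  define MINI_HAS_GENERIC      0
#  define MINI_HAS_THREAD_LOCAL 0
#  define MINI_HAS_ATOMIC       0
#endif

/* A feature can still be switched off by hand (hypothetical macro name). */
#ifdef MINI_DISABLE_GENERIC
#  undef  MINI_HAS_GENERIC
#  define MINI_HAS_GENERIC 0
#endif

/* Number of optional features that ended up enabled. */
static int mini_feature_count(void)
{
    return MINI_HAS_GENERIC + MINI_HAS_THREAD_LOCAL + MINI_HAS_ATOMIC;
}
```

Defining MINI_DISABLE_GENERIC before this fragment would force the plain (non-overloaded) code path even on a C11 compiler.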
How String Type Works
WARNING: This paragraph deals with structure obfuscation, which is done to prevent (as much as possible) inexperienced users from editing members that are not intended to be edited separately; if you hate obfuscation and/or obfuscation is against your religion, please close your browser and clear your history before it's too late.
The string type inside stringlib.h is not defined as a pointer to a structure but as an actual structure with no visible members (hidden members, russian hackers won't mess with my code this time [please ignore this comment if you are Russian]); this is done through unnamed bit fields whose widths are computed with the sizeof operator.
typedef struct
#ifndef _STRINGLIB_OVERLOADING_WAIT_
{
    uintptr_t:8*sizeof(uintptr_t);
    size_t:8*sizeof(size_t);
    size_t:8*sizeof(size_t);
    uintptr_t:8*sizeof(uintptr_t);
    uintptr_t:8*sizeof(uintptr_t);
    size_t:8*sizeof(size_t);
    _Atomic volatile struct
    {
        volatile uintptr_t:8*sizeof(volatile uintptr_t);
        volatile uintptr_t:8*sizeof(volatile uintptr_t);
    };
}_string;
#if (!defined(string) && _STRINGLIB_DEFINE_) || _STRINGLIB_DEFINE_==2
#undef string
#define string _string
#endif // string
The hypothetical accessible structure, with the same layout, would be:
struct _string_accessible
{
    char *getString;
    size_t getSize;
    size_t getSizeChars;
    void *stringAllocation;
    void *stringSignature;
    size_t locklevel;
    _Atomic volatile struct
    {
        volatile uintptr_t thread_id_pt1;
        volatile uintptr_t thread_id_pt2;
    } lock;
};
This gives the type the combined size of a pointer to char (the actual string), two size_t values indicating the string size, two pointers to void (used for dynamic allocation inside functions) and the lock data, despite not having visible members. The string type therefore cannot be edited, either when declared or later, except by passing it through the stringlib.h functions or by reading its memory through a pointer cast; this prevents memory allocation faults caused by incorrect editing by inexperienced users (note: string_getString(string*) returns the corresponding char pointer).
Having the char pointer as the first member allows the string to be written by simply passing it as a parameter to printf, fwrite, fputs, memcpy, etc.; this is however discouraged, and I advise passing string_getString(string*) or using the dedicated string_print(string*, ...), string_write(string*, ...), etc. functions instead (if the optional last parameter is false, no new line is added at the end).
The stringlib.c file defines five macros for reading and writing each of the type members (plus two additional macros for the lock members).
#define _STRINGLIB_ACCESS_STRING_(STRING)\
(*(*((char***)((_string*[]){(&(STRING))}))))
#define _STRINGLIB_ACCESS_SIZE_(STRING)\
(*((size_t*)&(((uint8_t*)(&(STRING)))[sizeof(uintptr_t)])))
#define _STRINGLIB_ACCESS_SIZECHARS_(STRING)\
(*((size_t*)&(((uint8_t*)(&(STRING)))[sizeof(uintptr_t)+sizeof(size_t)])))
#define _STRINGLIB_ACCESS_ALLOCATION_(STRING)\
(*((void**)&(((uint8_t*)(&(STRING)))[sizeof(uintptr_t)+2*sizeof(size_t)])))
#define _STRINGLIB_ACCESS_SIGNATURE_(STRING)\
(*((void**)&(((uint8_t*)(&(STRING)))[2*sizeof(uintptr_t)+2*sizeof(size_t)])))
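The byte offsets used by these macros only work if they line up with the layout of the hypothetical accessible structure. As a rough check, the sketch below compares them with offsetof on a mirror of the first five members. It assumes a common platform where pointers, uintptr_t and size_t need no padding between them; mini_layout, the OFF_* macros and both helper functions are hypothetical names, not part of the library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the first five members described in the text. */
struct mini_layout {
    char  *getString;
    size_t getSize;
    size_t getSizeChars;
    void  *stringAllocation;
    void  *stringSignature;
};

/* The same byte offsets the access macros above compute. */
#define OFF_SIZE       (sizeof(uintptr_t))
#define OFF_SIZECHARS  (sizeof(uintptr_t) + sizeof(size_t))
#define OFF_ALLOCATION (sizeof(uintptr_t) + 2 * sizeof(size_t))
#define OFF_SIGNATURE  (2 * sizeof(uintptr_t) + 2 * sizeof(size_t))

/* Returns 1 when the mirror has no padding and matches the macro offsets. */
static int mini_layout_matches(void)
{
    return offsetof(struct mini_layout, getString) == 0
        && offsetof(struct mini_layout, getSize) == OFF_SIZE
        && offsetof(struct mini_layout, getSizeChars) == OFF_SIZECHARS
        && offsetof(struct mini_layout, stringAllocation) == OFF_ALLOCATION
        && offsetof(struct mini_layout, stringSignature) == OFF_SIGNATURE;
}

/* Reads a size_t stored at a byte offset; memcpy sidesteps the alignment
   and aliasing concerns of a direct pointer cast. */
static size_t mini_read_size(const void *object, size_t offset)
{
    size_t value;
    memcpy(&value, (const unsigned char *)object + offset, sizeof value);
    return value;
}
```

On a platform where mini_layout_matches() returns 0, the offsets would have to be derived from the real structure rather than from sums of sizeof.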
In the following sections, the string type will be referred to as the hypothetical accessible structure, for convenience and readability.
How String Allocation Works
String allocation is checked by every function in the library via the string_isAllocated(string*) function (which the user never needs to call); this makes it possible to use all the functions with a newly declared string with extremely low risk of errors, as such a string is treated as an empty "" string.
int (string_isAllocated)(_string *string_a)
{
    int ret;
    string_lock(string_a);
    ret=(string_a->stringAllocation == _STRINGLIB_ALLOCATION_TRUE_ &&
         string_a->stringSignature == _STRINGLIB_ALLOCATION_TRUE_ &&
         (string_a->getSize == string_a->getSizeChars+1));
    string_unlock(string_a);
    return ret;
}
The return value of string_isAllocated(string*) is then used inside (and in some cases before) the string_init(string*) function (again, not needed by the user), which sets the string to an empty string with a size of 1 byte.
size_t (string_init)(_string *string_a)
{
    int stringMallocFail = 0;
    char *tempStrMalloc = NULL;
    string_lock(string_a);
    /* release any previous buffer before starting over */
    if (string_isAllocated(string_a))
    {
        free(string_a->getString);
        string_a->stringAllocation = _STRINGLIB_ALLOCATION_FALSE_;
        string_a->stringSignature = _STRINGLIB_ALLOCATION_FALSE_;
    }
    string_a->getString = NULL;
    tempStrMalloc = (char*) malloc(1 * sizeof(char));
    /* retry a limited number of times if malloc fails */
    while (tempStrMalloc == NULL)
    {
        tempStrMalloc = (char*) malloc(1 * sizeof(char));
        if (tempStrMalloc == NULL && ++stringMallocFail == _STRINGLIB_MAX_ALLOC_FAILS_)
        {
            string_unlock(string_a);
            printf("_string memory initialization failed\n");
            return 0;
        }
    }
    /* empty string: one byte holding only the terminator */
    string_a->getString = tempStrMalloc;
    string_a->getString[0] = '\0';
    string_a->getSize = 1;
    string_a->getSizeChars = 0;
    string_a->stringAllocation = _STRINGLIB_ALLOCATION_TRUE_;
    string_a->stringSignature = _STRINGLIB_ALLOCATION_TRUE_;
    string_unlock(string_a);
    return 1;
}
A string is considered allocated only if both its stringAllocation and stringSignature members are set to _STRINGLIB_ALLOCATION_TRUE_ and its getSize member is exactly one greater than getSizeChars, which makes it nearly impossible for a newly declared string to be mistaken for an initialized one (the world will most certainly end before that happens). It is however a safe and recommended practice (especially for multithreaded operations) to assign STRING_NEW (defined as an empty string, ((string){})) to a newly declared string before passing it to any function, avoiding any possible allocation error. This assignment can only be done before editing the string and will leak memory if done afterwards: if you want to re-use a string, just pass it through any library function like string_set(string*) or string_read(string*).
void (string_delete)(_string *string_a)
{
    string_lock(string_a);
    /* free the buffer only if this string really owns one */
    if (string_isAllocated(string_a))
    {
        free(string_a->getString);
        string_a->stringAllocation = _STRINGLIB_ALLOCATION_FALSE_;
        string_a->stringSignature = _STRINGLIB_ALLOCATION_FALSE_;
    }
    string_a->getString = NULL;
    string_unlock(string_a);
}
Deallocation of a string is done by calling the function string_delete(string*), which frees the char pointer and sets allocation and signature to _STRINGLIB_ALLOCATION_FALSE_; any other function can still be used on a previously deleted string, and the string will be automatically reallocated.
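The allocate, delete and re-use cycle described in this section can be illustrated with a much smaller stand-in type. This is a sketch under assumptions, not the library's code: mini_string, the MINI_ALLOC_* sentinels and the mini_* functions are all hypothetical, and the real values of _STRINGLIB_ALLOCATION_TRUE_ and _STRINGLIB_ALLOCATION_FALSE_ are not shown in this article.

```c
#include <stdlib.h>
#include <string.h>

static char mini_magic;   /* its address serves as the "allocated" sentinel */
#define MINI_ALLOC_TRUE  ((void *)&mini_magic)
#define MINI_ALLOC_FALSE ((void *)0)

typedef struct {
    char  *text;
    size_t size;         /* bytes, including the terminator */
    size_t size_chars;   /* characters, excluding the terminator */
    void  *allocation;
    void  *signature;
} mini_string;

/* Same three conditions as string_isAllocated. */
static int mini_is_allocated(const mini_string *s)
{
    return s->allocation == MINI_ALLOC_TRUE
        && s->signature == MINI_ALLOC_TRUE
        && s->size == s->size_chars + 1;
}

/* Frees the buffer only when the string really owns one. */
static void mini_delete(mini_string *s)
{
    if (mini_is_allocated(s)) {
        free(s->text);
        s->allocation = MINI_ALLOC_FALSE;
        s->signature = MINI_ALLOC_FALSE;
    }
    s->text = NULL;
}

/* Sets the content, allocating as needed; returns the new size or 0. */
static size_t mini_set(mini_string *s, const char *source)
{
    size_t length = strlen(source);
    char *buffer = malloc(length + 1);
    if (buffer == NULL) return 0;
    memcpy(buffer, source, length + 1);
    mini_delete(s);   /* harmless on a fresh or already deleted string */
    s->text = buffer;
    s->size = length + 1;
    s->size_chars = length;
    s->allocation = MINI_ALLOC_TRUE;
    s->signature = MINI_ALLOC_TRUE;
    return s->size;
}
```

A zero-initialized mini_string plays the role of STRING_NEW here: it fails the allocation check, so mini_set and mini_delete can both be called on it safely, and a deleted string can be set again without any extra step.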
Text and Binary File Reading/Writing
The library also relies on specific dedicated functions for file reading and writing:
size_t (string_write)(_string *string_a, FILE *file_a, ...)
{
    size_t initPos = 0;
    size_t pos = 0;
    char c_return = '\r';
    va_list valist;
    string_lock(string_a);
    if (string_isAllocated(string_a))
    {
        while (*(string_a->getString+pos)!= '\0')
        {
            if (*(string_a->getString+pos)== '\n')
            {
                /* temporarily terminate the string to write one line */
                *(string_a->getString+pos)='\0';
                fputs(string_a->getString+initPos, file_a);
                /* carriage return marks a new line inside the string */
                fwrite(&c_return, sizeof(char), 1, file_a);
                fputc('\n', file_a);
                *(string_a->getString+pos)='\n';
                initPos = pos+1;
            }
            ++pos;
        }
        fputs(string_a->getString+initPos, file_a);
    }
    string_unlock(string_a);
    /* optional last argument: append a final new line if true */
    va_start(valist, file_a);
    if (va_arg(valist, int)) fputc('\n', file_a);
    va_end(valist);
    return pos+1;
}
The text file writing function does not simply write the string as-is, but also writes an additional carriage return character (don't worry, it's only visible with a hexadecimal editor) before the new line character for every new line within the string, allowing the string_read and string_readAppend functions to determine whether the string continues after a new line.
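The carriage return convention can be shown on plain char buffers. The writer below follows the behaviour described above; the reader is only my assumption of how a matching read could tell the two cases apart (a "\r\n" pair continues the string, a bare "\n" ends it), since string_read itself is not listed in this article. Both function names are hypothetical.

```c
#include <stdio.h>

/* Writes text to file, emitting "\r\n" for every '\n' inside it.
   Returns the number of bytes written. */
static size_t mini_write_text(const char *text, FILE *file)
{
    size_t written = 0;
    for (; *text != '\0'; ++text) {
        if (*text == '\n') {
            fputc('\r', file);
            ++written;
        }
        fputc(*text, file);
        ++written;
    }
    return written;
}

/* Assumed counterpart: reads one string back into out (capacity cap).
   "\r\n" becomes an embedded '\n'; a bare '\n' ends the string. */
static size_t mini_read_text(char *out, size_t cap, FILE *file)
{
    size_t n = 0;
    int c;
    if (cap == 0) return 0;
    while (n + 1 < cap && (c = fgetc(file)) != EOF) {
        if (c == '\r') {
            int next = fgetc(file);
            if (next == '\n') {
                out[n++] = '\n';      /* new line inside the string */
                continue;
            }
            if (next != EOF) ungetc(next, file);
        } else if (c == '\n') {
            break;                    /* bare new line: string ends here */
        }
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

Writing "a\nb", then a single '\n', then "c" to a binary stream and reading twice would give back "a\nb" and then "c", which is the distinction the extra carriage return makes possible.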
size_t (string_writeBin)(_string *string_a, FILE *file_a)
{
    size_t size;
    string_lock(string_a);
    if (!string_isAllocated(string_a))
    {
        /* unallocated strings are stored as a single zero size */
        string_a->getSize = 0;
        fwrite(&(string_a->getSize), sizeof(size_t), 1, file_a);
        string_unlock(string_a);
        return 0;
    }
    fwrite(&(string_a->getSize), sizeof(size_t), 1, file_a);
    fwrite(string_a->getString, sizeof(char), string_a->getSize, file_a);
    /* read the size while the lock is still held */
    size = string_a->getSize;
    string_unlock(string_a);
    return size;
}
The simpler binary writing function writes the string size followed by the string characters, or simply writes 0 for unallocated strings; string_readBin, on the other hand, reads the size first and then allocates the string accordingly.
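The size-prefixed format can be sketched with two small functions working on plain char buffers. The names mini_write_bin and mini_read_bin are hypothetical, and the reader is an assumption about what string_readBin does, based only on the description above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Writes the size (terminator included) followed by the bytes;
   a NULL text is stored as a single zero size. Returns the size written. */
static size_t mini_write_bin(const char *text, FILE *file)
{
    size_t size = (text != NULL) ? strlen(text) + 1 : 0;
    if (fwrite(&size, sizeof size, 1, file) != 1) return 0;
    if (size != 0 && fwrite(text, 1, size, file) != size) return 0;
    return size;
}

/* Reads the size first, then allocates exactly that many bytes.
   Returns a malloc'd string, or NULL for an empty record or on error. */
static char *mini_read_bin(FILE *file)
{
    size_t size = 0;
    char *buffer;
    if (fread(&size, sizeof size, 1, file) != 1 || size == 0) return NULL;
    buffer = malloc(size);
    if (buffer == NULL) return NULL;
    if (fread(buffer, 1, size, file) != size) {
        free(buffer);
        return NULL;
    }
    buffer[size - 1] = '\0';   /* do not trust the file to be terminated */
    return buffer;
}
```

Because the raw size_t is written as-is, a file produced this way is only readable on a platform with the same size_t width and byte order.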
Function Overloading
This feature works only on standard C11 implementations or compiler versions supporting the _Generic() keyword (GCC 4.9+, Clang 3.0+, xlC 12.01[V2R1]+); its availability can be checked with the _STRINGLIB_OVERLOADING_ macro.
The _Generic() keyword allows type checking in C and can be used inside a function-like macro for overloading purposes (just like C++, but NO, not really), as in the example:
#define string_set(A, B)\
_Generic(B,\
    char*: (string_set)((_string*)A, (char*)B),\
    _string*: (string_isAllocated)((_string*)B)?(string_set)((_string*)A,\
        ((string_getString)((_string*)((void*)B+0)))+0):(string_set)((_string*)A, ""),\
    default: (string_set)((_string*)A, ""))
In this example, generic selection allows a pointer to string to be passed to the function instead of a pointer to char, thus giving the user more flexibility.
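The same idea can be tried in isolation with a tiny stand-in type. This sketch is not taken from the library: mini_string, mini_len and the two helper functions are hypothetical names, and it needs a C11 compiler.

```c
#include <stddef.h>
#include <string.h>

typedef struct { const char *text; } mini_string;   /* hypothetical stand-in */

static size_t mini_len_cstr(const char *s)
{
    return strlen(s);
}

static size_t mini_len_str(const mini_string *s)
{
    return s->text != NULL ? strlen(s->text) : 0;
}

/* Generic selection picks a function from the type of X, then calls it. */
#define mini_len(X) _Generic((X),            \
    char *:              mini_len_cstr,      \
    const char *:        mini_len_cstr,      \
    mini_string *:       mini_len_str,       \
    const mini_string *: mini_len_str)(X)
```

Here the selection only chooses a function name and the call happens outside _Generic, so the branches need no casts. In the string_set macro above the calls sit inside the branches, and since every branch must compile for every argument type, the arguments are cast explicitly.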
Simple Overloading
Simple overloading consists of defining functions as macros that check only the number of passed parameters (or whether the last parameter is passed), without checking the parameter types; it replaces the overloading method described above if the _Generic() keyword is not supported, and is also used in some functions that only need to check the number of parameters.
Here are two examples of simple overloading:
#define string_appendPos(A, B, C...) ((string_appendPos)(A, B, (size_t)C+0))
#define string_print(A, B...) ((string_print)(A, (sizeof((int[]){B}))?B+0:1))
The former simply passes the argument C, or 0 if it is omitted, while the latter creates an array from the extra parameters and checks its size.
Both generic selection overloading and simple overloading can be bypassed by wrapping the function name in parentheses when calling it.
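Both points, the optional trailing parameter and the parenthesized call, can be reproduced with a two-line macro. This sketch uses a standard C99 variadic macro rather than the named variadic parameters (C...) shown above, which are a GNU extension; mini_add and MINI_ADD_PICK are hypothetical names.

```c
/* The real function, defined before the macro of the same name. */
static int mini_add(int base, int offset)
{
    return base + offset;
}

/* Picks the first two arguments; the padding zeros supply the default.
   The parenthesized (mini_add) is not expanded as a macro again. */
#define MINI_ADD_PICK(base, offset, ...) (mini_add)((base), (offset))
#define mini_add(...) MINI_ADD_PICK(__VA_ARGS__, 0, 0)
```

With these definitions mini_add(5) expands to (mini_add)((5), (0)) and mini_add(5, 3) to (mini_add)((5), (3)), while writing (mini_add)(5, 3) skips the macro entirely, because a function-like macro is only expanded when its name is directly followed by an opening parenthesis.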
Thread Locking
The functions string_lock(string*), string_unlock(string*) and string_multilock(string_count, string*...) lock the string for the calling thread.
They are called by each function in the library, making every action thread safe, and can be nested, allowing multiple functions to be called consecutively.
The locks are similar to a mutex: each string saves a unique ID, indicating which thread is accessing it, and a lock count, indicating the number of nested locks taken by that thread. The unlock function decreases the lock count and releases the lock once it reaches zero. The multilock function provides a safe way to lock multiple strings at once; any locks nested inside it should be single locks.
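A nested lock of this kind can be sketched with C11 atomics. This is an illustration under assumptions, not the library's implementation: mini_lock and its functions are hypothetical, the thread ID is taken from the address of a _Thread_local variable, and waiting is a plain spin loop.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    _Atomic uintptr_t owner;   /* 0 means unlocked */
    size_t level;              /* nesting depth, only touched by the owner */
} mini_lock;

static _Thread_local char mini_thread_marker;   /* one per thread */

static uintptr_t mini_self_id(void)
{
    return (uintptr_t)&mini_thread_marker;
}

static void mini_lock_acquire(mini_lock *lock)
{
    uintptr_t me = mini_self_id();
    if (atomic_load(&lock->owner) != me) {
        uintptr_t expected = 0;
        /* spin until the lock is free and we manage to claim it */
        while (!atomic_compare_exchange_weak(&lock->owner, &expected, me))
            expected = 0;
    }
    ++lock->level;
}

static void mini_lock_release(mini_lock *lock)
{
    if (atomic_load(&lock->owner) != mini_self_id() || lock->level == 0)
        return;                        /* not ours, nothing to release */
    if (--lock->level == 0)
        atomic_store(&lock->owner, 0); /* last nested unlock frees it */
}

/* Locks two objects in address order, so that two threads locking the
   same pair in opposite argument order cannot deadlock each other. */
static void mini_lock_pair(mini_lock *a, mini_lock *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        mini_lock *swap = a;
        a = b;
        b = swap;
    }
    mini_lock_acquire(a);
    if (b != a) mini_lock_acquire(b);
}
```

Taking locks in a fixed order is what the "by address order" remark about string_multilock refers to: if every thread acquires the lower address first, no two threads can each hold one lock while waiting for the other.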
Implementation Example
#include <stdio.h>
#include "stringlib.h"

int main(void)
{
    string string_a; /* works as-is, though STRING_NEW is safer */
    string string_b = STRING_NEW;
    FILE *foo;
    string_set(&string_a, "hello world\nnew line test");
    string_newline(&string_a, "this is the third line");
    string_print(&string_a);
    /* string_getSize returns size_t, so use %zu */
    printf("SIZE: %zu\n", string_getSize(&string_a));
    foo = fopen("text.txt", "w");
    if (foo != NULL)
    {
        string_write(&string_a, foo);
        fclose(foo);
    }
    foo = fopen("text.txt", "r");
    if (foo != NULL)
    {
        string_read(&string_b, foo);
        fclose(foo);
    }
    string_print(&string_b);
    printf("SIZE: %zu\n", string_getSize(&string_b));
    string_delete(&string_a);
    string_delete(&string_b);
    return 0;
}
Additional Information
StringLib C is an open source set of functions that started as an exercise and that I then decided to share; any help or suggestion is welcome.
If you try the library, please give your feedback (here or on the SourceForge page... or both).
Full Documentation
Lock Functions
void string_lock(_string *string_a);
locks the string for the current thread; works on multiple levels
void string_unlock(_string *string_a);
unlocks the string when locklevel==0 (same number of locks and unlocks)
void string_multilock(int string_count, _string *string_a, ...);
locks multiple strings safely (by address order)
Basic Functions
const char *const string_getString(_string *string_a);
converts string to char pointer
size_t string_getSize(_string *string_a);
returns string size
Checking Functions
int string_contains(_string *string_a, const char *string_b, ...);
checks if string contains string literal and returns the position where it is found plus one; pass the optional third parameter to start searching from a different position of the string
int string_containsChars(_string *string_a, int checkall, ...);
checks if string contains all passed characters (if argument checkall is 1 or true) or at least one of them (checkall = 0 or false); characters are passed from the third parameter onwards
int string_equals(_string *string_a, const void *string_v);
checks if string is the same as the string literal
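Returning the position plus one lets the same value serve as both a truth test and an index, since 0 can then only mean "not found". The sketch below shows that convention on plain char pointers; mini_contains is a hypothetical name, and its size_t return type and explicit start parameter differ from the variadic string_contains declared above.

```c
#include <stddef.h>
#include <string.h>

/* Returns the index of the first occurrence of needle at or after start,
   plus one, or 0 when it is not found. */
static size_t mini_contains(const char *haystack, const char *needle,
                            size_t start)
{
    const char *hit;
    if (start > strlen(haystack)) return 0;
    hit = strstr(haystack + start, needle);
    return hit != NULL ? (size_t)(hit - haystack) + 1 : 0;
}
```

A match at the very beginning therefore returns 1 rather than 0, so the result can be used directly in an if, and subtracting one gives the zero-based index.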
Input/Output Functions
size_t string_set(_string *string_a, const void *string_b);
sets string to string literal
size_t string_scan(_string *string_a);
sets string to user input
size_t string_append(_string *string_a, const void *string_v);
size_t string_appendPos(_string *string_a, const void *string_v, ...);
size_t string_appendCiclePos(_string *string_a, const void *string_v, unsigned int repeatTimes, ...);
appends string literal to string
size_t string_scanAppend(_string *string_a);
size_t string_scanAppendPos(_string *string_a, ...);
appends user input to string
size_t string_newline(_string *string_a, ...);
creates a new line, optionally appending a string literal (stdin can be passed to append user input)
size_t string_cut(_string *string_a, size_t pos, ...);
cuts string from position 'pos' to an optional end position
size_t string_override(_string *string_a, const void* string_v, ...);
writes the string literal over the string, starting from an optional position (if specified)
void string_swap(_string *string_a, _string *string_b);
swaps two strings
void string_delete(_string *string_a);
deletes a string, deallocating memory
void string_print(_string *string_a, ...);
prints a string to the output console
File Input/Output Functions
size_t string_write(_string *string_a, FILE *file_a, ...);
write to text file
size_t string_writeBin(_string *string_a, FILE *file_a);
write to binary file
size_t string_read(_string *string_a, FILE *file_a, ...);
size_t string_readAppend(_string *string_a, FILE *file_a, ...);
size_t string_readBuffered(_string *string_a, FILE *file_a, size_t buffersize, ...);
size_t string_readAppendBuffered(_string *string_a, FILE *file_a, size_t buffersize, ...);
read from text file
size_t string_readBin(_string *string_a, FILE *file_a);
read from binary file
Unnecessary Functions (Used By Other Functions But Not Necessary for User)
size_t string_init(_string *string_a);
initializes string
int string_isAllocated(_string *string_a);
checks if the string is allocated
History
- [21/03/2017] First beta upload
- [30/03/2017] Major changes to structure access methods
- [09/04/2017] Changed string access method, optimized algorithm complexity
- [19/04/2017] Added simple overloading (prevents bugs in Linux)
- [09/05/2017] Split string_contains function; string_contains now returns position+1
- [13/05/2017] string_contains can now check from any starting position
- [28/05/2017] Added string_appendCiclePos, cleaned code
- [08/07/2017] Defined STRING_NEW and added buffered read functions
- [15/01/2018] Thread safe functions and locks
- [08/03/2018] Multilock function and minor changes