A parser built on regular expressions that does not use any collections. It can parse different formats of configuration files (also known as INI files) and keeps the original file formatting when you edit entries.
Introduction
Parsing INI files is a fairly common task in programming when working with configurations. INI files are simple and easy to read by both humans and machines. There are several main ways to implement this:
- Manual parsing using string manipulation functions. This approach allows for maximum flexibility in handling various INI file formats, but requires more effort to implement.
- Using modules of various APIs. They provide ready-made functions for reading, writing, and processing data in the INI format. This is a simpler and faster way, but it is limited by the capabilities of the libraries themselves, and it also makes the project platform-dependent.
- Parsing using common libraries for working with configuration files, such as configparser in Python or .NET's ConfigurationManager. This approach is universal, but may be less flexible than specialized solutions.
- Processing using regular expressions.
In this article, I will talk about parsing INI files using regular expressions in C#. This approach provides greater flexibility and customization of processing logic.
Regular expressions require a deeper understanding of syntax, but with them you can modify existing entries and add new ones without collections. This approach provides high performance, flexibility and preservation of original formatting, which makes it an effective solution for working with INI files in various formats using a single tool. However, before I get into the article, I would like to answer a question you may have.
Why do you need to parse INI files in the 21st century?
There is an opinion that the INI format is outdated and unsuitable for storing parameters. I will not argue with this statement, but I will give several examples where using an INI file is justified and is the optimal solution:
- If your software works with the command line, it is more convenient to group and transfer a large number of parameters in bulk via an INI file.
- If your software uses third-party utilities that accept parameters via INI.
- If your software has a small number of parameters and does not have a "settings" window or any graphical interface at all.
- If you need the ability to export and import parameters to a file in a format understandable to the user.
- Just as a backup option or as a nod to the past when the grass was greener and we were younger, when Doom was installed from a floppy disk, and parameters were transferred via INI.
INI file format
This format is quite simple and has long been known to most developers. In general, it is a list of key-value pairs, called parameters, in which the key and the value are separated by an equal sign. For convenience, parameters are grouped into sections, whose names are enclosed in square brackets. Despite this simplicity, there are a number of nuances and small differences between implementations, since no single standard is strictly defined. My goal is a universal parser that extracts as much information as possible from any of these variants, so these differences must be taken into account.
For example, different symbols can be used to indicate comments (the most common are a hash or a semicolon), and different separators can be used between the key and the value: in addition to the usual equal sign, a colon is sometimes used. There are also files with no sections, only key-value pairs. Different systems may use different characters to terminate a line. It is not strictly defined whether the keys "Key" and "key" should be considered different or treated as the same, regardless of case. The file might include syntax errors or undefined data, but that shouldn't prevent the parser from correctly reading the valid part of the content.
There is also no consensus on storing arrays of strings. Some implementations allow multiple keys with the same name, others use escaped characters to separate strings within the parameter value. Most often, though, a parser simply returns the first value it finds. Our parser can handle all these cases equally well.
Here is an example of syntax highlighting using a popular text editor. As you can see, its format does not provide for a comment after the section name or entry value. Why not?
Regular expression
After much research, I came up with the following regular expression, which determines the meaning of each character in an INI file. In its entirety it looks like this:
(?=\S)(?<text>(?<comment>(?<open>[#;]+)(?:[^\S\r\n]*)(?<value>.+))|(?<section>(?<open>\[)(?:\s*)(?<value>[^\]]*\S+)(?:[^\S\r\n]*)(?<close>\]))|(?<entry>(?<key>[^=\r\n\[\]]*\S)(?:[^\S\r\n]*)(?<delimiter>:|=)(?:[^\S\r\n]*)(?<value>[^#;\r\n]*))|(?<undefined>.+))(?<=\S)|(?<linebreaker>\r\n|\n)|(?<whitespace>[^\S\r\n]+)
Before we move on to writing the code, I want to break down the parsing regular expression itself and explain what each piece is for.
- (?=\S) is a positive lookahead that checks that the next character is not whitespace. This is necessary to skip leading whitespace in the line.
- (?<text>....) is a named group that captures a text block of the file. This will allow us to get the entire content of the file for further analysis.
- (?<comment>(?<open>[#;]+)(?:[^\S\r\n]*)(?<value>.+)) is a named group that captures comments in the file. It consists of:
  - (?<open>[#;]+) is a group that captures one or more "#" or ";" characters, denoting the beginning of a comment.
  - (?:[^\S\r\n]*) is a non-capturing group that matches all whitespace except newlines. This is how indentation within a line is handled.
  - (?<value>.+) is a group that captures all characters up to the end of the line, i.e. the entire comment text.
- (?<section>(?<open>\[)(?:\s*)(?<value>[^\]]*\S+)(?:[^\S\r\n]*)(?<close>\])) is a named group that captures sections. It consists of:
  - (?<open>\[) is a group that captures the "[" character, which denotes the beginning of a section.
  - (?:\s*) is a non-capturing group that matches zero or more whitespace characters.
  - (?<value>[^\]]*\S+) is a group that captures the section name: zero or more characters other than "]", followed by one or more non-whitespace characters, so the name cannot end with whitespace.
  - (?:[^\S\r\n]*) is, again, a non-capturing group that matches indents.
  - (?<close>\]) is a group that captures the "]" character, which marks the end of a section.
- (?<entry>(?<key>[^=\r\n\[\]]*\S)(?:[^\S\r\n]*)(?<delimiter>:|=)(?:[^\S\r\n]*)(?<value>[^#;\r\n]*)) is a named group that captures entries (key-value pairs). It consists of:
  - (?<key>[^=\r\n\[\]]*\S) is a group that captures the key: zero or more characters other than "=", newlines, "[" and "]", followed by one non-whitespace character, so the key cannot end with whitespace.
  - (?:[^\S\r\n]*) matches indents, see above.
  - (?<delimiter>:|=) is a group that captures the ":" or "=" character separating the key and value.
  - (?:[^\S\r\n]*) matches indents.
  - (?<value>[^#;\r\n]*) is a group that captures zero or more characters, not including "#", ";", and newline characters.
- (?<undefined>.+) is a named group that captures any undefined parts of the text that did not match the previous groups.
- (?<=\S) is a positive lookbehind that checks that the preceding character is not a whitespace character. This is necessary to skip trailing whitespace in the line.
- (?<linebreaker>\r\n|\n) is a named group that captures newline characters ("\r\n" or "\n").
- (?<whitespace>[^\S\r\n]+) is a named group that captures indents at the beginning and end of a line.
This detailed regular expression is designed to accurately parse the structure of an INI file and extract all the necessary components (sections, keys, values, comments, etc.) from it. It can handle various formatting variations of INI files and provides a robust and flexible way of parsing.
Take a look at how this regular expression parses the configuration file above:
You can experiment with this regular expression using this link.
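To try the expression locally, it can be run over a small in-memory sample. The following is only a minimal sketch: the class IniRegexDemo, the method Classify and the sample text are hypothetical names introduced for illustration, and the key character class is written as in the breakdown above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IniRegexDemo
{
    // The pattern from this article, split across lines only for readability.
    public static readonly Regex Pattern = new Regex(
        @"(?=\S)(?<text>" +
        @"(?<comment>(?<open>[#;]+)(?:[^\S\r\n]*)(?<value>.+))|" +
        @"(?<section>(?<open>\[)(?:\s*)(?<value>[^\]]*\S+)(?:[^\S\r\n]*)(?<close>\]))|" +
        @"(?<entry>(?<key>[^=\r\n\[\]]*\S)(?:[^\S\r\n]*)(?<delimiter>:|=)(?:[^\S\r\n]*)(?<value>[^#;\r\n]*))|" +
        @"(?<undefined>.+))(?<=\S)|" +
        @"(?<linebreaker>\r\n|\n)|" +
        @"(?<whitespace>[^\S\r\n]+)");

    // Returns one description per comment, section, entry or undefined block.
    public static List<string> Classify(string content)
    {
        var result = new List<string>();
        for (Match m = Pattern.Match(content); m.Success; m = m.NextMatch())
        {
            if (m.Groups["comment"].Success)
                result.Add("comment: " + m.Groups["value"].Value);
            else if (m.Groups["section"].Success)
                result.Add("section: " + m.Groups["value"].Value);
            else if (m.Groups["entry"].Success)
                result.Add("entry: " + m.Groups["key"].Value + " -> " + m.Groups["value"].Value);
            else if (m.Groups["undefined"].Success)
                result.Add("undefined: " + m.Value);
            // Line breaks and indents also produce matches; they are skipped here.
        }
        return result;
    }

    public static void Main()
    {
        string sample = "; settings\nkey = value\n[Section1]\nnumber1 : 10\n???\n";
        foreach (string line in Classify(sample))
            Console.WriteLine(line);
    }
}
```

Because the group names "open" and "value" are reused in several alternatives, the same match.Groups["value"] lookup works for comments, sections and entries alike.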
C-Sharp coding
To solve the problem of parsing INI files using regular expressions, I created the IniFile class. This class will be responsible for reading and parsing the contents of an INI file using regular expressions to extract keys, values, and sections. The class has methods for loading a file, getting a list of sections, getting values by keys, and writing changes back to the file. Using regular expressions, IniFile will be able to handle various configuration file formats, including files with comments, indents, spaces, syntax errors, and other features. This will make the parser more flexible and universal. To use the class, you need to pass it a string or stream containing the INI file data and parsing settings.
See how to read the same configuration file from the previous example using the IniFile class.
var iniFile = IniFile.Load("config.ini");
string keyAboveSection = iniFile.ReadString(null, "key");
int num1 = iniFile.ReadInt32("Section1", "number1");
int num2 = iniFile.Read<int>("Section1", "Number2");
double pi = iniFile.ReadDouble("Section2", "NumberPI");
string singleString = iniFile["Section2", "SingleString"];
string multiString = iniFile.ReadString("Section2", "MultiString");
string[] arrayString = iniFile.ReadStrings("Section2", "ArrayString");
Encoding encoding = iniFile.Read<Encoding>("Section3", "encoding",
    Encoding.UTF8, new CustomEncodingConverter());
CultureInfo culture = iniFile.Read<CultureInfo>("Section3", "culture",
    CultureInfo.InvariantCulture);
Uri uri = iniFile.Read<Uri>("Section3", "url");
Amazing, right? Let's take a look at how this was possible and what other features this class offers.
Key features of the class
- Support for various loading and saving methods: the class provides methods for loading INI files from a string, stream, or file, as well as saving them to a stream or file.
- Using regular expressions: Using regular expressions allows for flexible and efficient handling of various INI file formats, including support for comments, sections, and key-values. This makes the code more compact and easily extensible compared to using manual string processing.
- No dependence on collections: the class does not use collections to store INI file data, which makes it more memory efficient and simplifies working with large files.
- Preserving original formatting: When modifying existing entries or adding new ones, the class preserves the original INI file formatting, including the location of comments, spaces, and line breaks. This helps maintain the readability and structure of the file.
- Support for escape characters: the class provides the ability to work with escape characters in key values. This allows for the correct handling of special characters such as tabs, line feeds, etc.
- Automatic detection of line break characters: the class automatically detects the type of line break characters (CRLF, LF, or CR) in the INI file and uses them when saving changes. If no new lines are found, the default choice for the current operating system will be used.
- Automatic detection of encoding: the class automatically detects the encoding used in a file based on its first 4 bytes, where the byte order mark (BOM) is stored. This feature may not be the most powerful, but it's better than not having any detection at all.
- Flexible customization of string comparison: the class allows you to customize the string comparison rules (case sensitivity, cultural specificity) according to the requirements of the application.
- Convenient API for working with INI files: the class offers a simple and intuitive API for reading and writing values to INI files, including support for various data types.
Thus, using the IniFile class allows you to efficiently and flexibly work with INI files, preserving their structure and formatting, and also provides ample opportunities for customization and expansion of functionality.
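The line break detection described in the feature list can be illustrated with a few lines of code. This is a minimal sketch, not the class's actual implementation: the name DetectLineBreak is hypothetical, and it is an assumption that the first line break found in the text decides the result.

```csharp
using System;

public static class LineBreakSketch
{
    // Hypothetical helper: returns the first line-break sequence found in the text,
    // or the platform default when the text contains no line breaks at all.
    public static string DetectLineBreak(string text)
    {
        int i = text.IndexOfAny(new[] { '\r', '\n' });
        if (i < 0)
            return Environment.NewLine;   // nothing to detect: use the OS default
        if (text[i] == '\n')
            return "\n";                  // LF
        bool followedByLf = i + 1 < text.Length && text[i + 1] == '\n';
        return followedByLf ? "\r\n" : "\r";   // CRLF or a lone CR
    }
}
```

Remembering the detected sequence and reusing it for every inserted line keeps a saved file consistent with the line endings it was loaded with.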
Structure of the class
The class stores fields that allow you to more precisely customize the file analysis process.
- Storing the INI file contents. The class has a private field _content to store the INI file contents.
- Regular expression for parsing. The class uses a regular expression stored in the _regex field to parse the INI file.
- Support for escaped characters. The _allowEscapeChars flag determines whether escaped characters are allowed in the INI file.
- Defining the type of line breaker. Different operating systems use different methods to mark the end of a line. Before we can process a file, we must determine which method is used in the file. The _lineBreaker field contains the string used to represent line breaks in the INI file.
- Culture information. The _culture field contains information about the culture used for parsing.
- String comparison rules. The _comparison field determines how string comparisons are performed in the INI file.
Class methods
The IniFile class provides convenient methods for reading and writing values of various types to INI files:
- Reading values:
  - ReadString, ReadStrings - for reading string values, including string arrays.
  - ReadObject, Read<T> - for reading values of arbitrary types using TypeConverter.
  - ReadArray - for reading arrays of values of arbitrary types.
  - Methods for reading primitive data types: ReadBoolean, ReadChar, ReadSByte, ReadByte, ReadInt16, ReadUInt16, ReadInt32, ReadUInt32, ReadInt64, ReadUInt64, ReadSingle, ReadDouble, ReadDecimal, ReadDateTime.
- Writing values:
  - WriteString, WriteStrings - for writing string values, including string arrays.
  - WriteObject, Write<T> - for writing values of arbitrary types using TypeConverter.
  - WriteArray - for writing arrays of values of arbitrary types.
  - Methods for writing primitive data types: WriteBoolean, WriteChar, WriteSByte, WriteByte, WriteInt16, WriteUInt16, WriteInt32, WriteUInt32, WriteInt64, WriteUInt64, WriteSingle, WriteDouble, WriteDecimal, WriteDateTime.
These methods allow you to easily read and write values to INI files, automatically performing type conversion using TypeConverter. This simplifies working with INI files and makes the code more readable and reliable.
In addition, the IniFile class provides convenient methods for automatically initializing object properties based on data stored in an INI file. This significantly simplifies and speeds up the process of reading and writing settings to INI files.
The ReadSettings and WriteSettings methods allow you to automatically read and write all static properties of a given type, including nested types. This is very useful when an application has many settings distributed across different classes.
The ReadProperty and WriteProperty methods allow you to read and write the values of individual properties of objects. In doing so, they automatically determine the section and key for the property based on its name and type, which eliminates the need for the developer to manually specify this information.
Thus, using these methods greatly simplifies working with INI files, making it more efficient and less prone to errors compared to manually managing reading and writing settings. They also support various data types, including arrays, and provide the ability to use custom type converters.
These methods use a regular expression stored in the _regex field to process the contents of the INI file. The class also provides a number of helper methods for working with regular expressions, strings, and the file system.
The class provides static Load methods for loading INI files from various sources (string, stream, file) and Save methods for saving the contents of the INI file to various output streams.
The IniFile class contains a number of additional helper methods for working with parser settings, regular expressions, strings, and the file system:
- GetCultureInfo: returns a CultureInfo object that defines the string comparison rules for the specified StringComparison.
- GetRegexOptions: sets or clears the RegexOptions flags based on the specified StringComparison, returning the modified value.
- GetComparer: returns a StringComparer object based on the specified StringComparison.
- ToEscape: escapes special characters in the input string using backslashes.
- UnHex: converts a hexadecimal number to a Unicode character.
- UnEscape: converts any escaped characters in the input string.
- MoveIndexToEndOfLinePosition: moves the index to the end of the current line in the StringBuilder.
- InsertLine: inserts the specified string at the specified index in the StringBuilder, followed by the specified new line separator, and updates the index.
- AutoDetectLineBreaker: determines the type of line separator ("\r\n", "\n", or "\r") in the specified string.
- AutoDetectLineEncoding: tries to determine the text encoding based on the first four bytes of the file content (BOM).
- MayBeToLower: converts a string to lowercase if necessary according to the specified StringComparison.
- IsInvalidPath: checks if a file name string contains invalid characters for a path.
- ValidateFileName: checks if a file name is valid and, optionally, whether the file exists.
- GetFullPath: returns the full path to the file with the given name, checking its validity.
- GetDeclaringPath: returns the declaration path of the specified type using the specified separator.
These helper methods are used inside the IniFile class to ensure correct work with regular expressions, strings, the file system, and parser settings.
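As an illustration of BOM-based encoding detection, here is a minimal sketch that inspects up to the first four bytes of the content. The name DetectEncoding and the UTF-8 fallback are assumptions made for this example, not the actual implementation of the class.

```csharp
using System.Text;

public static class BomSketch
{
    // Hypothetical helper: maps a byte order mark to an Encoding,
    // or returns the fallback (UTF-8 by default) when no BOM is present.
    public static Encoding DetectEncoding(byte[] b, Encoding fallback = null)
    {
        fallback = fallback ?? Encoding.UTF8;
        if (b == null)
            return fallback;

        // The 4-byte marks are tested first, because the UTF-32 LE mark
        // begins with the same two bytes as the UTF-16 LE mark.
        if (b.Length >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
            return new UTF32Encoding(true, true);   // UTF-32 big endian
        if (b.Length >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
            return Encoding.UTF32;                  // UTF-32 little endian
        if (b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
            return Encoding.UTF8;
        if (b.Length >= 2 && b[0] == 0xFF && b[1] == 0xFE)
            return Encoding.Unicode;                // UTF-16 little endian
        if (b.Length >= 2 && b[0] == 0xFE && b[1] == 0xFF)
            return Encoding.BigEndianUnicode;       // UTF-16 big endian
        return fallback;
    }
}
```

A file without a BOM cannot be identified this way, which is why the fallback encoding matters.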
How it works
This class does not use collections to store data. Instead, it uses regular expressions to parse the contents of the INI file. This allows you to do without collections and preserve the original file format when editing.
The general algorithm for traversing the contents of an INI file, used in the GetSections, GetKeys, GetValue, GetValues, SetValue, and SetValues methods, is as follows:
- A regular expression is initialized that splits the file contents into sections, keys, and values.
- All matches of the regular expression in the file contents are iterated over.
- Each match is checked to see whether it is a section, key, or value.
- Depending on the match type, information about it is saved and used in the corresponding methods.
- For the GetValue, GetValues, SetValue, and SetValues methods, the section containing the current match is also tracked, so that the value is returned or set in the correct section.
- The results of processing all matches are returned or used to modify the file contents.
This approach allows you to efficiently work with the contents of an INI file without the need to use collections, while preserving the original file format.
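The core of all these methods looks like this:
Regex regex;    // the parsing expression shown above
string content; // the text of the INI file
for (Match match = regex.Match(content); match.Success; match = match.NextMatch())
{
    if (match.Groups["section"].Success)
    {
        // A section header: the "value" group holds the section name.
    }
    if (match.Groups["entry"].Success)
    {
        // A key-value entry: see the "key" and "value" groups.
    }
}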
If we need to get a list of all sections in a file, it is very simple:
HashSet<string> sections = new HashSet<string>();
for (Match match = _regex.Match(Content); match.Success; match = match.NextMatch())
{
    if (match.Groups["section"].Success)
    {
        // For a section match, the "value" group holds the section name.
        sections.Add(match.Groups["value"].Value);
    }
}
In other cases where we need to process keys, we first check whether we are in the required section using an additional flag, inSection.
string section; // name of the section to read, supplied by the caller
HashSet<string> keys = new HashSet<string>();
bool inSection = false;
for (Match match = _regex.Match(Content); match.Success; match = match.NextMatch())
{
    if (match.Groups["section"].Success)
    {
        // A new section starts: remember whether it is the one we want.
        inSection = match.Groups["value"].Value.Equals(section);
        continue;
    }
    if (inSection && match.Groups["entry"].Success)
    {
        string key = match.Groups["key"].Value;
        keys.Add(key);
    }
}
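The same traversal can return a single value. The following self-contained sketch shows one way to do it; the method GetValue here is a hypothetical stand-alone helper, not the class member of the same name, and treating a null or empty section name as "the keys above the first section" is an assumption based on the usage examples in this article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IniReadSketch
{
    // The pattern from this article, split across lines only for readability.
    static readonly Regex Pattern = new Regex(
        @"(?=\S)(?<text>" +
        @"(?<comment>(?<open>[#;]+)(?:[^\S\r\n]*)(?<value>.+))|" +
        @"(?<section>(?<open>\[)(?:\s*)(?<value>[^\]]*\S+)(?:[^\S\r\n]*)(?<close>\]))|" +
        @"(?<entry>(?<key>[^=\r\n\[\]]*\S)(?:[^\S\r\n]*)(?<delimiter>:|=)(?:[^\S\r\n]*)(?<value>[^#;\r\n]*))|" +
        @"(?<undefined>.+))(?<=\S)|" +
        @"(?<linebreaker>\r\n|\n)|" +
        @"(?<whitespace>[^\S\r\n]+)");

    // Hypothetical helper: returns the first value of `key` in `section`,
    // or null when the key is not present there.
    public static string GetValue(string content, string section, string key,
        StringComparison comparison = StringComparison.OrdinalIgnoreCase)
    {
        // Entries that precede any section header belong to the global section.
        bool inSection = string.IsNullOrEmpty(section);
        for (Match m = Pattern.Match(content); m.Success; m = m.NextMatch())
        {
            if (m.Groups["section"].Success)
            {
                inSection = m.Groups["value"].Value.Equals(section, comparison);
                continue;
            }
            if (inSection && m.Groups["entry"].Success &&
                m.Groups["key"].Value.Equals(key, comparison))
            {
                return m.Groups["value"].Value;
            }
        }
        return null;
    }
}
```

With the content "key = top\n[Section1]\nnumber1 = 10\n", the call GetValue(content, "Section1", "number1") returns "10" and GetValue(content, null, "key") returns "top". No dictionary or list is built along the way; the scan stops at the first matching entry.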
The general algorithm for writing data in the SetValue and SetValues methods is as follows:
- A string builder instance is created, which will be used to modify the contents of the INI file.
- The contents of the file are iterated over using a regular expression.
- If an entry matching the searched key is found in the target section, then:
  - A group representing the value is obtained.
  - The index and length of the group representing the value are calculated.
  - The old value is removed from the string builder.
  - If the new value is not empty, it is inserted into the string builder.
- If, after the iteration, the flag indicating that the value has not been set is still raised, then:
  - The index is calculated where the new entry should be inserted.
  - If this is not a global section and the section has not yet been encountered, a new section is inserted into the string builder.
  - A new entry with the key and value is inserted into the string builder.
- The contents of the string builder are written back to the content of the IniFile instance.
Updating an existing key value is done by replacing the found value by its index and length. The new value is inserted in the same place, instead of the previous one, preserving the indent. The implementation is quite simple:
string section; // target section, supplied by the caller
string key;     // key to update
string value;   // new value
bool inSection = false;
StringBuilder sb = new StringBuilder(content);
for (Match match = _regex.Match(content); match.Success; match = match.NextMatch())
{
    if (match.Groups["section"].Success)
    {
        inSection = match.Groups["value"].Value.Equals(section);
        continue;
    }
    if (inSection && match.Groups["entry"].Success)
    {
        if (!match.Groups["key"].Value.Equals(key))
            continue;
        // Replace only the value span; the key, delimiter and indents stay untouched.
        Group group = match.Groups["value"];
        int index = group.Index;
        int length = group.Length;
        sb.Remove(index, length);
        sb.Insert(index, value);
        // Stop here: the indexes of later matches refer to the unmodified text.
        break;
    }
}
content = sb.ToString();
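The fragment above covers only the update of an existing key. A self-contained sketch of the whole write algorithm, including insertion of a missing key or section, could look like the following. Everything here is illustrative: the stand-alone SetValue helper, its lineBreak parameter and the choice to add new entries after the last entry of the section are assumptions, not the actual implementation of the class.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class IniWriteSketch
{
    // The pattern from this article, split across lines only for readability.
    static readonly Regex Pattern = new Regex(
        @"(?=\S)(?<text>" +
        @"(?<comment>(?<open>[#;]+)(?:[^\S\r\n]*)(?<value>.+))|" +
        @"(?<section>(?<open>\[)(?:\s*)(?<value>[^\]]*\S+)(?:[^\S\r\n]*)(?<close>\]))|" +
        @"(?<entry>(?<key>[^=\r\n\[\]]*\S)(?:[^\S\r\n]*)(?<delimiter>:|=)(?:[^\S\r\n]*)(?<value>[^#;\r\n]*))|" +
        @"(?<undefined>.+))(?<=\S)|" +
        @"(?<linebreaker>\r\n|\n)|" +
        @"(?<whitespace>[^\S\r\n]+)");

    // Hypothetical helper: returns the content with `key` set to `value` in `section`.
    public static string SetValue(string content, string section, string key,
        string value, string lineBreak = "\n")
    {
        const StringComparison cmp = StringComparison.OrdinalIgnoreCase;
        bool global = string.IsNullOrEmpty(section);
        bool inSection = global, sectionSeen = global;
        int anchor = -1; // end of the last header or entry seen in the target section

        for (Match m = Pattern.Match(content); m.Success; m = m.NextMatch())
        {
            if (m.Groups["section"].Success)
            {
                inSection = m.Groups["value"].Value.Equals(section, cmp);
                if (inSection) { sectionSeen = true; anchor = m.Index + m.Length; }
                continue;
            }
            if (!inSection || !m.Groups["entry"].Success) continue;

            anchor = m.Index + m.Length;
            if (!m.Groups["key"].Value.Equals(key, cmp)) continue;

            // Existing key: swap only the value span, so indents and comments survive.
            Group g = m.Groups["value"];
            return new StringBuilder(content)
                .Remove(g.Index, g.Length).Insert(g.Index, value).ToString();
        }

        var sb = new StringBuilder(content);
        string entry = key + "=" + value;
        if (!sectionSeen)
        {
            // Unknown section: append a new header and the entry at the end of the text.
            if (sb.Length > 0 && sb[sb.Length - 1] != '\n' && sb[sb.Length - 1] != '\r')
                sb.Append(lineBreak);
            sb.Append('[').Append(section).Append(']').Append(lineBreak)
              .Append(entry).Append(lineBreak);
        }
        else if (anchor < 0)
        {
            // Global section without entries: put the new entry on the first line.
            sb.Insert(0, entry + lineBreak);
        }
        else
        {
            // Insert after the end of the anchor's line so a trailing comment stays put.
            int eol = content.IndexOfAny(new[] { '\r', '\n' }, anchor);
            sb.Insert(eol < 0 ? content.Length : eol, lineBreak + entry);
        }
        return sb.ToString();
    }
}
```

Updating number1 to 42 in "[Section1]\n  number1 = 10   ; keep me\n" changes only the two characters of the old value, so the leading indent and the trailing comment are left exactly as they were.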
Using the code
Basic operations
Here are some examples of using the IniFile class.
Opening a file:
IniFile ini = IniFile.Load("config.ini", Encoding.UTF8,
StringComparison.InvariantCultureIgnoreCase, true);
ini = IniFile.Load("config.ini");
Reading a parameter into a string variable:
string value = ini.ReadString("Section1", "Key1", "default value");
value = ini.ReadString("", "Key0", "default value");
value = ini.ReadString(null, "Key0", "default value");
Reading a parameter into an integer variable:
int intValue = ini.ReadInt32("Section1", "IntKey", 42);
Reading an array of strings into a new variable:
string[] values = ini.ReadStrings("Section1", "ArrayKey", "default1", "default2");
Reading using an indexer:
string value = ini["Section1", "Key1", "default"];
Reading various types of objects:
CultureInfo culture1 = (CultureInfo)ini.ReadObject("Settings", "Culture",
typeof(CultureInfo),
CultureInfo.InvariantCulture,
new CultureInfoTypeConverter());
Uri uri = ini.Read<Uri>("Settings", "Culture");
Writing using the indexer:
ini["Section1", "Key1"] = "new value";
Writing a string:
ini.WriteString("Section1", "Key1", "new value");
Writing an array of strings:
ini.WriteStrings("Section1", "ArrayKey", "value1", "value2", "value3");
Saving a file:
ini.Save("config.ini");
Initializing custom classes
First, let's create the Person class:
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
    public DateTime Birthday { get; set; }
}
Now, let's look at an example of using the ReadSettings method for the class we created:
IniFile ini = IniFile.Load("person.ini");
Person person = new Person();
ini.ReadSettings(person);
Console.WriteLine($"Name: {person.Name}");
Console.WriteLine($"Age: {person.Age}");
Console.WriteLine($"Birthday: {person.Birthday.ToString("yyyy-MM-dd")}");
Contents of person.ini used by this code:
[Person]
Name=John
Age=35
Birthday=1989-04-25
At the same time, to read a Person object from the INI file parameters, you can use the following approach based on a type converter. It is necessary to create a PersonTypeConverter for the Person class:
public class PersonTypeConverter : TypeConverter
{
    public override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType)
    {
        return sourceType == typeof(string) || base.CanConvertFrom(context, sourceType);
    }

    public override object ConvertFrom(ITypeDescriptorContext context,
        CultureInfo culture, object value)
    {
        // Expected format: "Name,Age,Birthday", for example "John,35,1989-04-25".
        if (value is string str)
        {
            string[] parts = str.Split(',');
            if (parts.Length == 3)
            {
                return new Person
                {
                    Name = parts[0].Trim(),
                    Age = int.Parse(parts[1].Trim()),
                    Birthday = DateTime.Parse(parts[2].Trim())
                };
            }
        }
        return base.ConvertFrom(context, culture, value);
    }

    public override object ConvertTo(ITypeDescriptorContext context,
        CultureInfo culture, object value, Type destinationType)
    {
        if (destinationType == typeof(string) && value is Person person)
        {
            // Build the string on one line so the value fits in a single INI entry.
            return $"{person.Name},{person.Age},{person.Birthday:yyyy-MM-dd}";
        }
        return base.ConvertTo(context, culture, value, destinationType);
    }
}
This way we can store and read Person objects in an INI file using a custom TypeConverter:
var iniFile = new IniFile();
DateTime birthDay = DateTime.ParseExact(
    "25-04-1989",
    "dd-MM-yyyy",
    CultureInfo.InvariantCulture
);
Person person = new Person { Name = "John", Age = 35, Birthday = birthDay };
iniFile.WriteObject("Section1", "Person", person, new PersonTypeConverter());
// Read the object back under a different name to avoid redeclaring "person".
var restored = iniFile.Read<Person>("Section1", "Person", null, new PersonTypeConverter());
iniFile.Save("persons.ini");
Contents of persons.ini generated by this code:
[Section1]
Person=John,35,1989-04-25
As you can see from the examples, IniFile provides many advantages when working with various data types.
By using regular expressions and without collections, IniFile provides fast reading and writing of data to the INI file, with preservation of formatting. This allows working with large amounts of data without performance degradation.
IniFile supports automatic initialization of object properties based on data from the INI file. This simplifies setting up an application and eliminates the need to manually extract and assign values to properties.
IniFile provides convenient methods for reading and writing data of different types, including standard .NET types. It also allows the use of custom types using TypeConverter, which reduces the chance of errors when converting types. Thus, IniFile is a powerful and flexible tool for working with INI files, providing high performance, ease of use and extensibility when working with arbitrary data types.
Conclusion
Using regular expressions to parse INI files in C# provides an efficient and flexible way to handle configuration data. This approach allows not only to parse the contents correctly, but also to preserve the original formatting, which can be critical in some applications.
Although the class does not use collections, if you need to add data caching, it can be easily modified to use MatchCollection.
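One possible shape of such caching is sketched below. It is an assumption-laden illustration rather than part of the class: the type CachedIniSketch and its members are hypothetical, and the cache is valid only as long as the content does not change.

```csharp
using System;
using System.Text.RegularExpressions;

public sealed class CachedIniSketch
{
    // The pattern from this article, split across lines only for readability.
    static readonly Regex Pattern = new Regex(
        @"(?=\S)(?<text>" +
        @"(?<comment>(?<open>[#;]+)(?:[^\S\r\n]*)(?<value>.+))|" +
        @"(?<section>(?<open>\[)(?:\s*)(?<value>[^\]]*\S+)(?:[^\S\r\n]*)(?<close>\]))|" +
        @"(?<entry>(?<key>[^=\r\n\[\]]*\S)(?:[^\S\r\n]*)(?<delimiter>:|=)(?:[^\S\r\n]*)(?<value>[^#;\r\n]*))|" +
        @"(?<undefined>.+))(?<=\S)|" +
        @"(?<linebreaker>\r\n|\n)|" +
        @"(?<whitespace>[^\S\r\n]+)");

    // All matches for the content; reused by every lookup instead of re-running the regex.
    private readonly MatchCollection _matches;

    public CachedIniSketch(string content)
    {
        _matches = Pattern.Matches(content);
    }

    // Returns the first value of `key` in `section`, or null when it is absent.
    public string GetValue(string section, string key)
    {
        const StringComparison cmp = StringComparison.OrdinalIgnoreCase;
        bool inSection = string.IsNullOrEmpty(section);
        foreach (Match m in _matches)
        {
            if (m.Groups["section"].Success)
            {
                inSection = m.Groups["value"].Value.Equals(section, cmp);
                continue;
            }
            if (inSection && m.Groups["entry"].Success &&
                m.Groups["key"].Value.Equals(key, cmp))
            {
                return m.Groups["value"].Value;
            }
        }
        return null;
    }
}
```

The trade-off is memory for speed: the matches stay alive between calls, and the collection has to be rebuilt whenever an entry is written, because the stored indexes refer to the old text.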
I hope this article helps you better understand and use regular expressions to work with INI files!