This Tip demonstrates a solution using Qt. The regular expression itself can be used with any Perl-compatible regex engine. You can find the discussion which led me to write this Tip at the RegEx Forum.
Introduction
Parsing a Key-Value pair isn't that hard, you say? I'm sure it isn't if you keep it as simple as possible and define that a Key can't contain anything other than letters and digits, a Value can't contain anything other than letters and digits, and the two are separated by a '=':
Key1=Value1 Key2=Value2
But what if we take that a bit further? I want a Value to optionally contain whitespace:
Key1=Value 1 Key2=Value Key3=Value 3
We are getting somewhere, as you can see. But I want more. I want the Value to be able to contain anything. Of course this leads us to a problem, because the '=' is already reserved as the separator between Key and Value. It's solvable by escaping the '=' with a backslash whenever it occurs in a Value. I made up this practical example where this pattern might be of use:
ErrorMessage=The file was not found. Path\=C:/Temp/File.txt ErrorNumber=12312
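The escaping convention above can be sketched with two small helpers. This is only an illustration in plain C++ (no Qt needed); the names escapeValue and unescapeValue are hypothetical and not part of the original tip, and the sketch assumes that '=' is the only character that ever gets escaped.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: prefix every '=' in a value with a backslash
// so the value can be stored in a Key=Value string.
std::string escapeValue(const std::string& value) {
    std::string out;
    out.reserve(value.size());
    for (char c : value) {
        if (c == '=')
            out += '\\';
        out += c;
    }
    return out;
}

// Hypothetical helper: turn every "\=" back into a plain '='.
std::string unescapeValue(const std::string& escaped) {
    std::string out;
    out.reserve(escaped.size());
    for (std::size_t i = 0; i < escaped.size(); ++i) {
        const bool escapesEquals = escaped[i] == '\\'
            && i + 1 < escaped.size() && escaped[i + 1] == '=';
        if (escapesEquals)
            continue; // drop the escaping backslash, keep the '='
        out += escaped[i];
    }
    return out;
}
```

With these helpers, a value such as "Path=C:/Temp/File.txt" would be written as "Path\=C:/Temp/File.txt" before being stored, and restored with unescapeValue after parsing.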
Using the regular expression
The solution presented here is feasible in any Perl-compatible regex engine, even though I will use Qt to demonstrate its use. The regex looks rather intimidating when you see it for the first time:
^((\b[^\s=]+)=(([^=]|\\=)+))*$
The regex looks strange at first, but as soon as you put it into Expresso, its structure becomes much clearer.
The regex essentially contains two capture groups of interest: group 2 is the key, (\b[^\s=]+), and group 3 is the value, (([^=]|\\=)+). Group 1 wraps the whole pair. Key and value must be separated by a '='. A key starts at a word boundary (\b) and can contain anything but whitespace or a '='; a value can contain anything but an unescaped '='. The whole pair can be repeated any number of times.
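To make the group numbering concrete, here is a minimal sketch that uses std::regex from the C++ standard library instead of Qt. That substitution is an assumption made to keep the example free of dependencies; std::regex uses ECMAScript syntax by default, which should accept this particular pattern unchanged. The function name captureGroups is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the text of groups 0..4 for `input`, or an empty vector
// when the input does not match the key-value pattern at all.
// Group 1 = whole pair, group 2 = key, group 3 = value.
std::vector<std::string> captureGroups(const std::string& input) {
    static const std::regex keyValueRegex(
        R"(^((\b[^\s=]+)=(([^=]|\\=)+))*$)");
    std::vector<std::string> groups;
    std::smatch match;
    if (std::regex_match(input, match, keyValueRegex)) {
        for (std::size_t i = 0; i < match.size(); ++i)
            groups.push_back(match[i].str());
    }
    return groups;
}
```

Calling captureGroups("Key1=Value 1 Key2=Value") yields "Key2" in group 2 and "Value" in group 3: when the input holds several pairs, the groups describe the final one.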
Now that you know what the regex essentially does, you also need to be able to parse a string with it. One important thing remains to be said: the proposed regex only returns the last Key-Value pair, because a repeated capture group keeps only the text of its last iteration. We therefore need to process the input string multiple times, removing the last pair after each pass.
QRegularExpression keyValueRegex("^((\\b[^\\s=]+)=(([^=]|\\\\=)+))*$");
QString example = "ErrorMessage=File wasn't found_Path\\=C:/Temp/File.txt ErrorNumber=12312";
while (example.length() > 0) {
    QRegularExpressionMatch keyValueRegexMatch = keyValueRegex.match(example);
    // Stop on malformed input instead of looping forever.
    if (!keyValueRegexMatch.hasMatch())
        break;
    qDebug() << "Key=" << keyValueRegexMatch.captured(2);
    qDebug() << "Value=" << keyValueRegexMatch.captured(3);
    // Group 1 is the last pair in the string, so cut the string where it starts.
    example.truncate(keyValueRegexMatch.capturedStart(1));
}
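The above solution is close to complete, yet it needs some tweaking: when a captured pair is not the last one in the original string, the whitespace between its value and the following key is not removed, so the captured value ends with trailing whitespace. Calling trimmed() on the captured value is one way to clean that up.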
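The following sketch shows one possible way to apply that tweak. It again uses std::regex rather than Qt (an assumption, so that the example compiles without Qt), strips the trailing whitespace from each value, and returns the pairs in their original order. The names KeyValue, trimRight and parseKeyValuePairs are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, std::string>;

// Strips the whitespace left over between a value and the next key.
std::string trimRight(std::string text) {
    const auto last = text.find_last_not_of(" \t\r\n");
    text.erase(last == std::string::npos ? 0 : last + 1);
    return text;
}

// Repeatedly matches the whole string, records the LAST pair
// (groups 2 and 3), then cuts that pair off the end of the input.
std::vector<KeyValue> parseKeyValuePairs(std::string input) {
    static const std::regex keyValueRegex(
        R"(^((\b[^\s=]+)=(([^=]|\\=)+))*$)");
    std::vector<KeyValue> pairs;
    std::smatch match;
    while (!input.empty()
           && std::regex_match(input, match, keyValueRegex)
           && match[1].matched) {
        pairs.emplace_back(match[2].str(), trimRight(match[3].str()));
        // Group 1 is the last pair, so erase from where it starts.
        input.erase(static_cast<std::size_t>(match.position(1)));
    }
    // Pairs were collected back to front; restore the input order.
    std::reverse(pairs.begin(), pairs.end());
    return pairs;
}
```

Values are returned still escaped, so "Path\=C:/Temp/File.txt" keeps its backslash. Input that does not match the pattern produces an empty result instead of an endless loop.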
Points of interest
It's fascinating how powerful regular expressions are. But this example has also shown me that not all regex engines work the same way, and sometimes you need to tweak a regex to get it to work on a specific engine, even though it worked perfectly with another one. I tested this regex with the Qt regex engine (QRegularExpression, to be exact, as distinct from the older QRegExp) and the .NET regex engine, yet I'm confident that it will work well with most of the popular regex engines out there.