Parsing a Key-Value pair with a Regular Expression

6 Sep 2014, CPOL, 2 min read
A walk-through on how a Key-Value pair can be parsed using a Perl-compatible regex engine
This Tip demonstrates a solution using Qt. The regular expression itself can be used with any Perl-compatible regex engine. You can find the discussion which led me to write this Tip at the RegEx Forum.

Introduction

Parsing a Key-Value pair ain't that hard, you say? It isn't, as long as you keep things as simple as possible: a Key may contain only letters and digits, a Value may contain only letters and digits, and the two are separated by a '=':

Key1=Value1 Key2=Value2
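For this simplest case, a minimal sketch could look like the following. It uses std::regex from the C++ standard library instead of Qt so that it compiles without Qt; the function name parseSimplePairs and the ASCII-only character classes are assumptions made for illustration, not part of the original Tip.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the Tip): collects every Key=Value pair whose
// Key and Value consist only of ASCII letters and digits.
std::vector<std::pair<std::string, std::string>> parseSimplePairs(const std::string& input)
{
    // Group 1 is the Key, group 2 is the Value.
    static const std::regex pairRegex(R"(([A-Za-z0-9]+)=([A-Za-z0-9]+))");

    std::vector<std::pair<std::string, std::string>> pairs;
    // sregex_iterator walks over all non-overlapping matches, left to right.
    for (std::sregex_iterator it(input.begin(), input.end(), pairRegex), end; it != end; ++it) {
        pairs.emplace_back((*it)[1].str(), (*it)[2].str());
    }
    return pairs;
}
```

Calling parseSimplePairs("Key1=Value1 Key2=Value2") should yield the two pairs in input order. Anything that is not a letter or digit, such as the space between the pairs, simply ends the current Value.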

But what if we take that a step further? I want a Value to be able to contain white-space:

Key1=Value 1 Key2=Value Key3=Value 3

We are getting somewhere, as you can see. But I want more: I want the Value to be able to contain anything. This leads to a problem, of course, because '=' is already reserved as the separator between Key and Value. The problem is solvable by escaping the '=' with a backslash whenever it occurs in a Value. I made up this practical example where the pattern might be of use:

ErrorMessage=The file was not found. Path\=C:/Temp/File.txt ErrorNumber=12312
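The escaping convention itself can be sketched in a few lines of standard C++. The helper names escapeValue and unescapeValue are hypothetical. The Tip only states that a '=' inside a Value is written as '\=', so backslashes that are not followed by '=' are left untouched here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: prefixes every '=' in a raw Value with a backslash
// so that it can no longer be mistaken for the Key/Value separator.
std::string escapeValue(const std::string& raw)
{
    std::string escaped;
    for (char c : raw) {
        if (c == '=') {
            escaped += '\\';
        }
        escaped += c;
    }
    return escaped;
}

// Hypothetical helper: turns every "\=" back into a plain '='.
std::string unescapeValue(const std::string& escaped)
{
    std::string raw;
    for (std::size_t i = 0; i < escaped.size(); ++i) {
        const bool escapesEquals =
            escaped[i] == '\\' && i + 1 < escaped.size() && escaped[i + 1] == '=';
        if (escapesEquals) {
            continue; // drop the backslash; the '=' is copied on the next pass
        }
        raw += escaped[i];
    }
    return raw;
}
```

A writer would run escapeValue on each Value before joining it to its Key, and a reader would run unescapeValue on each Value after parsing.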

Using the regular expression

The solution presented here works in any Perl-compatible regex engine, even though I use Qt to demonstrate its use. The regex looks rather intimidating when you see it for the first time:

^((\b[^\s=]+)=(([^=]|\\=)+))*$

As soon as you put it into Expresso, you can see more clearly what it means:

[Image 1: the regular expression as displayed in Expresso]

The regex essentially contains two capture groups of interest: the Key, (\b[^\s=]+), and the Value, (([^=]|\\=)+). The two must be separated by a '='. A Key can contain anything but white-space or a '=', and a Value can contain anything but an unescaped '='. The enclosing group, and with it the whole Key=Value sequence, can repeat any number of times. Counting opening parentheses from the left, group 1 is the whole pair, group 2 is the Key and group 3 is the Value.
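To check which group holds what, the regex can be tried out in isolation. The sketch below uses std::regex (ECMAScript grammar) rather than Qt, on the assumption that the pattern behaves the same way there; capturedGroup is a hypothetical helper name.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: matches the whole input against the regex from the Tip
// and returns the text of the requested capture group, or an empty string if
// the input does not match.
std::string capturedGroup(const std::string& input, std::size_t group)
{
    // Raw string literal, so the backslashes need no extra C++ escaping.
    static const std::regex keyValueRegex(R"(^((\b[^\s=]+)=(([^=]|\\=)+))*$)");

    std::smatch match;
    if (!std::regex_match(input, match, keyValueRegex)) {
        return std::string();
    }
    return match[group].str();
}
```

For the input "ErrorNumber=12312", group 2 should come back as the Key and group 3 as the Value. Feeding it an input with several pairs is a quick way to see what the groups contain once the outer group has repeated.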

Now that you know what the regex does, you also need to parse a string with it. One important point remains: because the outer group is repeated, its captures are overwritten on every repetition, so the regex returns only the last Key-Value pair. We therefore need to process the input string multiple times.

#include <QDebug>
#include <QRegularExpression>
#include <QString>

// Regular expression as described at
// http://www.codeproject.com/Tips752372/Parsing-a-Key-Value-pair-with-a-Regular-Expression
QRegularExpression keyValueRegex("^((\\b[^\\s=]+)=(([^=]|\\\\=)+))*$");
// Note the doubled backslash: "\\=" in C++ source is the two characters \= in the string
QString example = "ErrorMessage=File wasn't found_Path\\=C:/Temp/File.txt ErrorNumber=12312";
while (example.length() > 0) { // As long as there is input left in example
   // Match the remaining input; the capture groups hold the last Key-Value pair
   QRegularExpressionMatch keyValueRegexMatch = keyValueRegex.match(example);
   if (!keyValueRegexMatch.hasMatch())
      break; // Malformed input: stop instead of looping forever

   // Output the found Key-Value pair
   qDebug() << "Key=" << keyValueRegexMatch.captured(2);
   qDebug() << "Value=" << keyValueRegexMatch.captured(3);

   // Remove the processed Key-Value pair from the end of the input
   example.chop(keyValueRegexMatch.capturedLength(1));
}

The solution above is close to perfect, yet it needs some tweaking: if a pair is not the last one in the string, the space between its Value and the following Key is captured as part of the Value and isn't removed. Calling QString::trimmed() on the captured Value would be one way to strip it.
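For readers without Qt at hand, the loop above can also be sketched in standard C++. This is an illustration under assumptions, not the Tip's own code: it uses std::regex, the function name parseAllPairs is hypothetical, and it additionally trims trailing white-space from each Value and returns the pairs in input order.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: repeatedly matches the remaining input, records the
// last Key-Value pair and cuts it off the end until nothing is left.
std::vector<std::pair<std::string, std::string>> parseAllPairs(std::string input)
{
    static const std::regex keyValueRegex(R"(^((\b[^\s=]+)=(([^=]|\\=)+))*$)");

    std::vector<std::pair<std::string, std::string>> pairs;
    std::smatch match;
    // Stops on empty input or on input that does not match at all.
    while (!input.empty() && std::regex_match(input, match, keyValueRegex)) {
        // Copy everything out of the match before modifying the input.
        std::string key = match[2].str();
        std::string value = match[3].str();
        const std::size_t pairLength = static_cast<std::size_t>(match.length(1));

        // Strip the white-space that separated this Value from the next Key.
        const std::size_t lastKept = value.find_last_not_of(" \t\r\n");
        value.erase(lastKept == std::string::npos ? 0 : lastKept + 1);

        pairs.emplace_back(std::move(key), std::move(value));
        // Group 1 always ends at the end of the input, so cut it off there.
        input.erase(input.size() - pairLength);
    }
    // Pairs were collected last to first; restore the input order.
    std::reverse(pairs.begin(), pairs.end());
    return pairs;
}
```

Cutting the pair off the end of the string, rather than replacing its text wherever it occurs, keeps an identical pair earlier in the input from being removed by accident. The escaped '\=' is left as it is in the returned Values.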

Points of interest

It's fascinating how powerful regular expressions are. But this example has also shown me that not all regex engines work the same way, and sometimes you need to tweak a regex to get it to work on a specific engine, even though it worked perfectly with another one. I tested this regex with the Qt regex engine (QRegularExpression, to be exact, not the older QRegExp) and the .NET regex engine, and I'm confident that it will work well with most of the popular regex engines out there.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)