Scanf in C# using Regex

28 Sep 2004
An article on scanf functionality for C# implemented using Regex

Introduction

A friend asked me whether there is a scanf method in .NET. I said, "I don't know." After looking around, I didn't find one, so I said, "I can't find one, but you can write something using regular expressions."

Background

An understanding of regular expressions and the .NET Regex class is essential. There are many excellent sources on these topics, so I will only briefly introduce some important concepts.

A regular expression is a pattern that matches various text strings.

\w matches any word character

\w+ matches one or more occurrences of any word character

\w? matches zero or one occurrence of any word character

Here are some more complex patterns:

[0-9]+ matches any unsigned integer string

true|false matches the string "true" or the string "false"

[0-9]{1,3} matches any unsigned integer string containing one to three digits.
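To try these patterns out, here is a minimal sketch. The class name PatternBasics and the helper MatchesWhole are my own for illustration; the helper anchors each pattern so that it has to match the whole input rather than a substring.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternBasics
{
    // True only when the pattern matches the entire input, not just part of it.
    public static bool MatchesWhole(string input, string pattern)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    public static void Main()
    {
        Console.WriteLine(MatchesWhole("hello", @"\w+"));        // one or more word characters
        Console.WriteLine(MatchesWhole("", @"\w?"));             // zero occurrences is allowed
        Console.WriteLine(MatchesWhole("ab", @"\w?"));           // two characters is one too many
        Console.WriteLine(MatchesWhole("2004", @"[0-9]+"));      // unsigned integer string
        Console.WriteLine(MatchesWhole("true", @"true|false"));  // either alternative
        Console.WriteLine(MatchesWhole("1234", @"[0-9]{1,3}"));  // four digits exceeds the limit
    }
}
```

Without the anchors, Regex.IsMatch would report a match as soon as any substring fits, which is why "1234" would otherwise appear to satisfy [0-9]{1,3}.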

The scanf method that many have used since the C programming days is very handy for parsing a string looking for specific values.

I often used the string version of scanf known as sscanf.

Here is some C++ code:

char buff[128];
sprintf(buff, "Hello there. 1 2 3.0"); // load the buffer
printf("%s\n", buff);

int i, j;
float k;
char buff2[128];
// %127s keeps the scanned word within the size of buff2
sscanf(buff, "Hello %127s %d %d %f", buff2, &i, &j, &k);
printf("%s %d %d %f\n", buff2, i, j, k);

The output is:

Hello there. 1 2 3.0

there. 1 2 3.000000

Look up the details on scanf for more information.

I have developed a class called Scanner with two Scan methods.

public object[] Scan(string text, string fieldSpecification)

public void Scan(string text, string fieldSpecification,
 params object[] targets)

The respective call format is:

targets = Scan("Hello there. -1 2 3.0 ",
 "Hello {String} {Int16} {UInt32} {Single}");
Scan("Hello there. -1 2 3.0","Hello {0} {1} {2} {3}",targets);

The first Scan method returns an array of objects that box or refer to the results for each specified target.

The second method receives an array of objects whose types define what each numbered placeholder should match. That same array is then updated with the result for each target.
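The Scanner source reads each target's type with GetType().Name and uses that name to choose a pattern. The sketch below isolates just that step; the class name TargetTypes and the helper TypeNames are hypothetical and are not part of the Scanner class.

```csharp
using System;

public static class TargetTypes
{
    // Returns the runtime type name of each target, e.g. "Int16" for a boxed short.
    public static string[] TypeNames(object[] targets)
    {
        string[] names = new string[targets.Length];
        for (int i = 0; i < targets.Length; i++)
        {
            names[i] = targets[i].GetType().Name;
        }
        return names;
    }

    public static void Main()
    {
        // The same placeholder objects used in the example Main of this article.
        object[] targets = { "", new Int16(), new UInt32(), new Single() };
        Console.WriteLine(string.Join(", ", TypeNames(targets)));
    }
}
```

This is why the targets must be initialized with objects of the right type before the call: a null entry has no type to inspect.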

This is a simple implementation. It does not escape regular expression metacharacters such as . * ? ( and { when they appear as literal text in the field specification, so a specification containing them is interpreted as part of the generated pattern rather than matched literally.
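One possible way around the escaping limitation is Regex.Escape, which is part of System.Text.RegularExpressions. The following is only a sketch: EscapeLiterals is a hypothetical helper that is not in the Scanner class, and it assumes placeholders always look like {Name} and are separated from literal text by white space. Wiring it into Scanner would take more work, because Scanner decides what is a placeholder by looking for a { character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralEscaping
{
    // Hypothetical helper: escapes every whitespace-separated token that is not
    // a {placeholder}, so characters such as . * ? ( are matched literally.
    public static string EscapeLiterals(string fieldSpecification)
    {
        string[] tokens = Regex.Split(fieldSpecification.Trim(), @"\s+");
        for (int i = 0; i < tokens.Length; i++)
        {
            bool isPlaceholder = Regex.IsMatch(tokens[i], @"^\{\w+\}$");
            if (!isPlaceholder)
            {
                tokens[i] = Regex.Escape(tokens[i]);
            }
        }
        return string.Join(" ", tokens);
    }

    public static void Main()
    {
        // The parentheses become \( and \) while the placeholder is left alone.
        Console.WriteLine(EscapeLiterals("Price(USD): {Double}"));
    }
}
```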

Using the code

Here is an example Main to use the Scanner class:

static void Main(string[] args)
{
    object[] targets = new object[4];
    targets[0] = "";
    targets[1] = new Int16();
    targets[2] = new UInt32();
    targets[3] = new Single();

    Scanner scanner = new Scanner();
    scanner.Scan("Hello there. -1 2 3.0",
        "Hello {0} {1} {2} {3}", targets);

    Console.WriteLine("Results:");
    foreach (object o in targets)
    {
        Console.WriteLine(o.ToString());
    }

    targets = scanner.Scan("Hello there. -1 2 3.0 ",
        "Hello {String} {Int16} {UInt32} {Single}");
    Console.WriteLine("Results:");
    foreach (object o in targets)
    {
        Console.WriteLine(o.ToString());
    }
}

Points of Interest

The regular expressions that match the built-in types are defined in a hash table as follows:

typePatterns.Add("String",@"[\w\d\S]+");

typePatterns.Add("Int16", @"-[0-9]+|[0-9]+");

typePatterns.Add("UInt16", @"[0-9]+");

typePatterns.Add("Int32", @"-[0-9]+|[0-9]+");

typePatterns.Add("UInt32", @"[0-9]+");

typePatterns.Add("Int64", @"-[0-9]+|[0-9]+");

typePatterns.Add("UInt64", @"[0-9]+");

typePatterns.Add("Single", @"[-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+");

typePatterns.Add("Double", @"[-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+");

typePatterns.Add("Boolean", @"true|false");

typePatterns.Add("Byte", @"[0-9]{1,3}");

typePatterns.Add("SByte", @"-[0-9]{1,3}|[0-9]{1,3}");

typePatterns.Add("Char", @"[\w\S]{1}");

typePatterns.Add("Decimal", @"[-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+");
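The floating-point pattern used for Single, Double and Decimal is worth testing on its own, because its first three alternatives accept a lone - or . and its last alternative needs at least two digits. The sketch below checks a few inputs against it and against one possible replacement. The replacement pattern (Alternative), the class name and the helper Accepts are my own assumptions and are not used by the Scanner class.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FloatPatternCheck
{
    // The pattern registered for Single, Double and Decimal in the table above.
    public const string Original = @"[-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+";

    // Hypothetical alternative: optional sign, then digits with an optional
    // fraction, or a bare fraction. It uses only non-capturing groups so the
    // group numbering of a generated master pattern would not change.
    public const string Alternative = @"-?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)";

    // True when the pattern matches the whole input.
    public static bool Accepts(string pattern, string input)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    public static void Main()
    {
        string[] samples = { "3.0", "-1.5", "7", "-" };
        foreach (string s in samples)
        {
            Console.WriteLine("'" + s + "' original=" + Accepts(Original, s)
                + " alternative=" + Accepts(Alternative, s));
        }
    }
}
```

With the original pattern, "3.0" is accepted, but "-1.5" and the single digit "7" are not, while a lone "-" is. Inside Scan that last case would reach Single.Parse("-") and surface as an exception.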

The program takes an input string and creates a regular expression that will match each word of the input string and place the results in regular expression groups so that the items can be selected and placed into the targets.

If the input text is "Hello true 6.5" with the field specifications of {String} {Boolean} {Double} then the program generates this regular expression:

([\w\d\S]+)\s+(true|false)\s+([-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+)

This expression is run against the input string, and each group is placed into its corresponding target.
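The three steps that produce this pattern (wrap each word in parentheses, substitute the type patterns, replace white space) can be sketched in a few lines. This is a simplified stand-in, not the Scanner code itself: the class MasterPatternSketch and its Build method are hypothetical, it only knows three types, and it uses String.Replace for the substitution step.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MasterPatternSketch
{
    // A subset of the type patterns from the hash table, enough for the example.
    static readonly Dictionary<string, string> TypePatterns = new Dictionary<string, string>
    {
        { "String", @"[\w\d\S]+" },
        { "Boolean", @"true|false" },
        { "Double", @"[-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+" }
    };

    public static string Build(string fieldSpecification)
    {
        // Step 1: wrap every whitespace-separated token in grouping parentheses.
        string pattern = Regex.Replace(fieldSpecification.Trim(), @"(\S+)", "($1)");

        // Step 2: substitute each {Type} tag with the pattern for that type.
        foreach (KeyValuePair<string, string> entry in TypePatterns)
        {
            pattern = pattern.Replace("{" + entry.Key + "}", entry.Value);
        }

        // Step 3: let any run of white space in the specification match
        // any run of white space in the text.
        return Regex.Replace(pattern, @"\s+", @"\s+");
    }

    public static void Main()
    {
        Console.WriteLine(Build("{String} {Boolean} {Double}"));
    }
}
```

Because every token gets its own parentheses in step 1, literal words such as "Hello" also become groups, which is why Scanner has to remember which group numbers belong to placeholders.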

I found that the returned objects from Regex.Matches are a bit confusing.

The number of Matches returned represents how many substrings match the pattern. Since I generate a regular expression that matches the entire input string, there should be only one match returned.

The Groups of each match correspond to more than just the groups defined in the regular expression using the grouping parentheses.

The first Group or Group[0] is the fully captured match string. In this example that would give you "Hello true 6.5". So you rarely use Group[0].

Group[0] = "Hello true 6.5"

Group[1] = "Hello"

Group[2] = "true"

Group[3] = "6.5"
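The group listing can be reproduced with a short program. It is only a sketch: the class GroupDemo and the helper GroupValues are names I made up, and the pattern is copied from the generated expression shown earlier.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Returns the value of every group of the first match, index 0 included.
    public static string[] GroupValues(string text, string pattern)
    {
        Match m = Regex.Match(text, pattern);
        string[] values = new string[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
        {
            values[i] = m.Groups[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        string pattern =
            @"([\w\d\S]+)\s+(true|false)\s+([-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+)";
        string[] values = GroupValues("Hello true 6.5", pattern);
        for (int i = 0; i < values.Length; i++)
        {
            Console.WriteLine("Group[" + i + "] = \"" + values[i] + "\"");
        }
    }
}
```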

Also, what are these Capture things in each group? It takes a slightly more complex example to show this. I found a tutorial on the internet that used the string abracadabra1abracadabra2abracadabra3 with the regular expression:

(abra(cad)?)+

I found that the string itself confused my simple mind because of the "abra"s that are surrounding the "cad"s.

So, I changed the input string to be:

abracadxxxx1abracadabra2abracadabra3

and the regular expression to be:

((abra|xxxx)(cad)?)+

Here is the output of each match, group, and capture (notice each Group1 to see multiple captures):

match0

Group0=[abracadxxxx]

Capture0=[abracadxxxx] Index=0 Length=11

Group1=[xxxx]

Capture0=[abracad] Index=0 Length=7

Capture1=[xxxx] Index=7 Length=4

Group2=[xxxx]

Capture0=[abra] Index=0 Length=4

Capture1=[xxxx] Index=7 Length=4

Group3=[cad]

Capture0=[cad] Index=4 Length=3

match1

Group0=[abracadabra]

Capture0=[abracadabra] Index=12 Length=11

Group1=[abra]

Capture0=[abracad] Index=12 Length=7

Capture1=[abra] Index=19 Length=4

Group2=[abra]

Capture0=[abra] Index=12 Length=4

Capture1=[abra] Index=19 Length=4

Group3=[cad]

Capture0=[cad] Index=16 Length=3

match2

Group0=[abracadabra]

Capture0=[abracadabra] Index=24 Length=11

Group1=[abra]

Capture0=[abracad] Index=24 Length=7

Capture1=[abra] Index=31 Length=4

Group2=[abra]

Capture0=[abra] Index=24 Length=4

Capture1=[abra] Index=31 Length=4

Group3=[cad]

Capture0=[cad] Index=28 Length=3

A capture collection is all of the captured strings matched in a Group. By default they are in inner-most-leftmost-first order.
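A dump like the one above can be produced by walking the matches, groups and captures in nested loops. This is a sketch along the lines of the PrintMatches method in the Scanner source; the class CaptureDemo and the helper CaptureValues are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Returns every capture of one group in the first match, in the order captured.
    public static string[] CaptureValues(string text, string pattern, int groupNumber)
    {
        CaptureCollection captures = Regex.Match(text, pattern).Groups[groupNumber].Captures;
        string[] values = new string[captures.Count];
        for (int i = 0; i < captures.Count; i++)
        {
            values[i] = captures[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        string text = "abracadxxxx1abracadabra2abracadabra3";
        MatchCollection matches = Regex.Matches(text, @"((abra|xxxx)(cad)?)+");
        for (int m = 0; m < matches.Count; m++)
        {
            Console.WriteLine("match" + m);
            GroupCollection groups = matches[m].Groups;
            for (int g = 0; g < groups.Count; g++)
            {
                // A group's Value is its last capture; Captures holds all of them.
                Console.WriteLine("Group" + g + "=[" + groups[g].Value + "]");
                CaptureCollection captures = groups[g].Captures;
                for (int c = 0; c < captures.Count; c++)
                {
                    Console.WriteLine("Capture" + c + "=[" + captures[c].Value
                        + "] Index=" + captures[c].Index
                        + " Length=" + captures[c].Length);
                }
            }
        }
    }
}
```

The group inside the repeated + is matched once per repetition, so it collects one capture each time, while the group's own Value only keeps the last one.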

Maybe it would help to understand this concept if you understand that groups in regular expressions can be capturing groups or non-capturing groups. To specify a non-capturing group you do this:

(?:aaa) matches aaa but doesn't return it or in other words it doesn't capture it.
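The difference shows up in the number of groups a match reports. Here is a small sketch; the class NonCapturingDemo, the helper GroupCount and the sample string "aaabbb" are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonCapturingDemo
{
    // Number of groups in the first match, counting group 0 (the whole match).
    public static int GroupCount(string text, string pattern)
    {
        return Regex.Match(text, pattern).Groups.Count;
    }

    public static void Main()
    {
        // Both patterns match the same text; only the second hides "aaa".
        Console.WriteLine(GroupCount("aaabbb", @"(aaa)(bbb)"));
        Console.WriteLine(GroupCount("aaabbb", @"(?:aaa)(bbb)"));

        // With the non-capturing group, "bbb" moves up to group 1.
        Console.WriteLine(Regex.Match("aaabbb", @"(?:aaa)(bbb)").Groups[1].Value);
    }
}
```

This matters for a class like Scanner, which picks results by group number: any capturing group added inside a type pattern would shift the numbers of all groups after it.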

The above is based on a tutorial I found, which I think explains things better than I can.

Here is the source code for the Scanner class.


using System;
using System.Collections;
using System.Text.RegularExpressions;
using System.Runtime.Serialization;

namespace Scanning
{
/// <summary>
/// Parses text against a scanf-like field specification
/// using regular expressions.
/// </summary>

public class Scanner
{
protected readonly Hashtable typePatterns;
public Scanner()
{
typePatterns = new Hashtable();

typePatterns.Add("String",@"[\w\d\S]+");
typePatterns.Add("Int16",  @"-[0-9]+|[0-9]+");
typePatterns.Add("UInt16",  @"[0-9]+");
typePatterns.Add("Int32",  @"-[0-9]+|[0-9]+");
typePatterns.Add("UInt32",  @"[0-9]+");
typePatterns.Add("Int64",   @"-[0-9]+|[0-9]+");
typePatterns.Add("UInt64",   @"[0-9]+");
typePatterns.Add("Single",  
 @"[-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+");
typePatterns.Add("Double",  
 @"[-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+");
typePatterns.Add("Boolean",   @"true|false");
typePatterns.Add("Byte",  @"[0-9]{1,3}");
typePatterns.Add("SByte",  @"-[0-9]{1,3}|[0-9]{1,3}");
typePatterns.Add("Char",  @"[\w\S]{1}");
typePatterns.Add("Decimal",
 @"[-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+");
}

/// <summary>
/// Scan mimics scanf.
/// A master regular expression pattern is created that
/// will group each "word" in the text and, using regex grouping,
/// extract the values for the field specifications.
/// Example text: "Hello true 6.5"  fieldSpecification:
/// "{String} {Boolean} {Double}"
/// The fieldSpecification will result in the generation
/// of a master pattern:
/// ([\w\d\S]+)\s+(true|false)\s+([-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+)
/// This masterPattern is run against the text string
/// and the groups are extracted.
/// </summary>
/// <param name="text">The input text to scan</param>
/// <param name="fieldSpecification">A string that may contain
/// simple field specifications of the form {Int16}, {String}, etc</param>
/// <returns>object[] that contains values for each field</returns>

public object[] Scan(string text, string fieldSpecification)
{
object[] targets = null;
try
{
ArrayList targetMatchGroups = new ArrayList();
ArrayList targetTypes = new ArrayList();

string matchingPattern = "";
Regex reggie = null;
MatchCollection matches = null;

//masterPattern is going to hold a "big" regex pattern
//that will be run against the original text

string masterPattern = fieldSpecification.Trim();
matchingPattern =  @"(\S+)";
masterPattern = Regex.Replace(masterPattern,
 matchingPattern,"($1)");//insert grouping parens


//store the group location of the format tags so that 

//we can select the correct group values later.

matchingPattern = @"(\([\w\d\S]+\))";
reggie = new Regex(matchingPattern);
matches = reggie.Matches(masterPattern);
for(int i = 0; i < matches.Count; i++)
{
Match m = matches[i];
string sVal = m.Groups[1].Captures[0].Value;

//is this value a {n} value. We will determine this by checking for {

if(sVal.IndexOf('{') >= 0)
{
targetMatchGroups.Add(i);
string p = @"\(\{(\w*)\}\)";//pull out the type

sVal = Regex.Replace(sVal,p,"$1");
targetTypes.Add(sVal);
}
}

//Replace all of the types with the pattern 

//that matches that type

masterPattern = Regex.Replace(masterPattern,@"\{String\}",  
(String)typePatterns["String"]);
masterPattern = Regex.Replace(masterPattern,@"\{Int16\}",  
(String)typePatterns["Int16"]);
masterPattern = Regex.Replace(masterPattern,@"\{UInt16\}",  
(String)typePatterns["UInt16"]);
masterPattern = Regex.Replace(masterPattern,@"\{Int32\}",  
(String)typePatterns["Int32"]);
masterPattern = Regex.Replace(masterPattern,@"\{UInt32\}",  
(String)typePatterns["UInt32"]);
masterPattern = Regex.Replace(masterPattern,@"\{Int64\}",  
(String)typePatterns["Int64"]);
masterPattern = Regex.Replace(masterPattern,@"\{UInt64\}",   
(String)typePatterns["UInt64"]);
masterPattern = Regex.Replace(masterPattern,@"\{Single\}",  
 (String)typePatterns["Single"]);
masterPattern = Regex.Replace(masterPattern,@"\{Double\}",  
 (String)typePatterns["Double"]);
masterPattern = Regex.Replace(masterPattern,@"\{Boolean\}",  
 (String)typePatterns["Boolean"]);
masterPattern = Regex.Replace(masterPattern,@"\{Byte\}",  
(String)typePatterns["Byte"]);
masterPattern = Regex.Replace(masterPattern,@"\{SByte\}",  
(String)typePatterns["SByte"]);
masterPattern = Regex.Replace(masterPattern,@"\{Char\}",  
(String)typePatterns["Char"]);
masterPattern = Regex.Replace(masterPattern,@"\{Decimal\}", 
(String)typePatterns["Decimal"]);

masterPattern = Regex.Replace(masterPattern,@"\s+","\\s+");
 //replace the white space with the pattern for white space


//run our generated pattern against the original text.

reggie = new Regex(masterPattern);
matches = reggie.Matches(text);
//PrintMatches(matches);


//allocate the targets

targets = new object[targetMatchGroups.Count];
for(int x = 0; x < targetMatchGroups.Count; x++)
{
int i = (int)targetMatchGroups[x];
string tName = (string)targetTypes[x];
if(i < matches[0].Groups.Count)
{
//add 1 to i because i is a result of several matches,
//each resulting in one group.
//this query is one match resulting in several groups.

string sValue = matches[0].Groups[i+1].Captures[0].Value;
targets[x] = ReturnValue(tName,sValue);
}
}
}
catch(Exception ex)
{
throw new ScanExeption("Scan exception",ex);
}

return targets;
}//Scan


/// <summary>
/// Scan mimics scanf.
/// A master regular expression pattern is created that will group
/// each "word" in the text and, using regex grouping,
/// extract the values for the field specifications.
/// Example text: "Hello true 6.5"  fieldSpecification: "{0} {1} {2}"
/// and the target array has objects of these types:
/// "String, Boolean, Double"
/// The targets are scanned and each target type is extracted
/// in order to build a master pattern based on these types.
/// The fieldSpecification and target types will result
/// in the generation of a master pattern:
/// ([\w\d\S]+)\s+(true|false)\s+([-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+)
/// This masterPattern is run against the text string and the
/// groups are extracted and placed back into the targets.
/// </summary>
/// <param name="text">The input text to scan</param>
/// <param name="fieldSpecification">A string that contains numbered
/// placeholders of the form {0}, {1}, etc</param>
/// <param name="targets">Objects whose types select the pattern for
/// each placeholder; replaced with the scanned values</param>

public void Scan(string text, string fieldSpecification, 
   params object[] targets)
{
try
{
ArrayList targetMatchGroups = new ArrayList();

string matchingPattern = "";
Regex reggie = null;
MatchCollection matches = null;

//masterPattern is going to hold a "big" regex
//pattern that will be run against the original text

string masterPattern = fieldSpecification.Trim();
matchingPattern =  @"(\S+)";
masterPattern = Regex.Replace(masterPattern,matchingPattern,
  "($1)");//insert grouping parens


//store the group location of the format tags so that we can 

//select the correct group values later.

matchingPattern = @"(\([\w\d\S]+\))";
reggie = new Regex(matchingPattern);
matches = reggie.Matches(masterPattern);
for(int i = 0; i < matches.Count; i++)
{
Match m = matches[i];
string sVal = m.Groups[1].Captures[0].Value;

//is this value a {n} value. We will determine this by checking for {

if(sVal.IndexOf('{') >= 0)
{
targetMatchGroups.Add(i);
}
}

matchingPattern = @"(\{\S+\})";//match each parameter
//tag of the format {n} where n is a digit

reggie = new Regex(matchingPattern);
matches = reggie.Matches(masterPattern);

for(int i = 0; i < targets.Length && i < matches.Count; i++)
{
string groupID = String.Format("${0}",(i+1));
string innerPattern = "";

Type t = targets[i].GetType();
innerPattern = ReturnPattern(t.Name);

//replace the {n} with the type's pattern

string groupPattern = "\\{" + i + "\\}";
masterPattern = Regex.Replace(masterPattern,
groupPattern,innerPattern);
}

masterPattern = Regex.Replace(masterPattern,@"\s+","\\s+");
//replace white space with the whitespace pattern


//run our generated pattern against the original text.

reggie = new Regex(masterPattern);
matches = reggie.Matches(text);
for(int x = 0; x < targetMatchGroups.Count; x++)
{
int i = (int)targetMatchGroups[x];
if(i < matches[0].Groups.Count)
{
//add 1 to i because i is a result of several matches,
//each resulting in one group.
//this query is one match resulting in several groups.

string sValue = matches[0].Groups[i+1].Captures[0].Value;
Type t = targets[x].GetType();
targets[x] = ReturnValue(t.Name,sValue);
}
}
}
catch(Exception ex)
{
throw new ScanExeption("Scan exception",ex);
}
}//Scan


/// <summary>

/// Return the Value inside of an object that boxes the 

/// built in type or references the string

/// </summary>

/// <param name="typeName"></param>

/// <param name="sValue"></param>

/// <returns></returns>

private object ReturnValue(string typeName, string sValue)
{
object o = null;
switch(typeName)
{
case "String":
o = sValue;
break;

case "Int16":
o = Int16.Parse(sValue);
break;

case "UInt16":
o = UInt16.Parse(sValue);
break;

case "Int32":
o = Int32.Parse(sValue);
break;

case "UInt32":
o = UInt32.Parse(sValue);
break;

case "Int64":
o = Int64.Parse(sValue);
break;

case "UInt64":
o = UInt64.Parse(sValue);
break;

case "Single":
o = Single.Parse(sValue);
break;

case "Double":
o = Double.Parse(sValue);
break;

case "Boolean":
o = Boolean.Parse(sValue);
break;

case "Byte":
o = Byte.Parse(sValue);
break;

case "SByte":
o = SByte.Parse(sValue);
break;

case "Char":
o = Char.Parse(sValue);
break;

case "Decimal":
o = Decimal.Parse(sValue);
break;
}
return o;
}//ReturnValue


/// <summary>

/// Return a pattern for regular expressions that will 

/// match the built in type specified by name

/// </summary>

/// <param name="typeName"></param>

/// <returns></returns>

private string ReturnPattern(string typeName)
{
string innerPattern = "";
switch(typeName)
{
case "Int16":
innerPattern = (String)typePatterns["Int16"];
break;

case "UInt16":
innerPattern = (String)typePatterns["UInt16"];
break;

case "Int32":
innerPattern = (String)typePatterns["Int32"];
break;

case "UInt32":
innerPattern = (String)typePatterns["UInt32"];
break;

case "Int64":
innerPattern = (String)typePatterns["Int64"];
break;

case "UInt64":
innerPattern = (String)typePatterns["UInt64"];
break;

case "Single":
innerPattern = (String)typePatterns["Single"];
break;

case "Double":
innerPattern = (String)typePatterns["Double"];
break;

case "Boolean":
innerPattern = (String)typePatterns["Boolean"];
break;

case "Byte":
innerPattern = (String)typePatterns["Byte"];
break;

case "SByte":
innerPattern = (String)typePatterns["SByte"];
break;

case "Char":
innerPattern = (String)typePatterns["Char"];
break;

case "Decimal":
innerPattern = (String)typePatterns["Decimal"];
break;

case "String":
innerPattern = (String)typePatterns["String"];
break;
}
return innerPattern;
}//ReturnPattern


static void PrintMatches(MatchCollection matches)
{
Console.WriteLine("===---===---===---===");
int matchCount = 0;
Console.WriteLine("Match Count = " + matches.Count);
foreach(Match m in matches)
{
if(m == Match.Empty) Console.WriteLine("Empty match");
Console.WriteLine("Match"+ (++matchCount));
for (int i = 0; i < m.Groups.Count; i++) 
{
Group g = m.Groups[i];
Console.WriteLine("Group"+i+"='" + g + "'");
CaptureCollection cc = g.Captures;
for (int j = 0; j < cc.Count; j++) 
{
Capture c = cc[j];
System.Console.Write("Capture"+j+"='" + c + "', Position="
  + c.Index + "   <");
for(int k = 0; k < c.ToString().Length; k++)
{
Console.Write(((Int32)(c.ToString()[k])));
}
Console.WriteLine(">");
}
}
}
}
}

/// <summary>

/// Exceptions that are thrown by this 

/// namespace and the Scanner Class

/// </summary>

class ScanExeption : Exception
{
public ScanExeption() : base()
{
}

public ScanExeption(string message) : base(message)
{
}

public ScanExeption(string message, Exception inner) : base(message, inner)
{
}

public ScanExeption(SerializationInfo info, 
  StreamingContext context) : base(info, context)
{
}
}
}

This is a simple scanf type class for C#. It does not handle all types of input strings, but it is a good start.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
