Introduction
A friend asked me whether there is a scanf method in .NET. I said, "I don't
know." After looking around, I couldn't find such a method. So I said, "I can't
find one, but you can write something using regular expressions."
Background
An understanding of regular expressions and the .NET Regex class is essential.
There are many excellent sources on these topics, so I will only briefly
introduce some important concepts.
A regular expression is a pattern that matches various text strings.
\w matches any word character
\w+ matches one or more occurrences of any word character
\w? matches zero or one occurrence of any word character
Here are some more complex patterns:
[0-9]+ matches any unsigned integer string
true|false matches the string "true" or the string "false"
[0-9]{1,3} matches any integer string containing one to three digits.
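The patterns listed above can be tried out with Regex.IsMatch. The following is a minimal sketch: the class name PatternBasics and the helper MatchesWhole are made up for this illustration, and the ^(?:...)$ wrapper is added so that the whole input, not just a part of it, has to match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternBasics
{
    // Returns true only when the entire input matches the pattern.
    // The ^(?:...)$ wrapper is an addition for this demo.
    public static bool MatchesWhole(string pattern, string input)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    public static void Main()
    {
        Console.WriteLine(MatchesWhole(@"\w+", "hello"));        // True
        Console.WriteLine(MatchesWhole(@"\w+", ""));             // False: + needs at least one
        Console.WriteLine(MatchesWhole(@"\w?", ""));             // True: ? allows zero
        Console.WriteLine(MatchesWhole(@"[0-9]+", "2024"));      // True
        Console.WriteLine(MatchesWhole(@"true|false", "false")); // True
        Console.WriteLine(MatchesWhole(@"[0-9]{1,3}", "1234"));  // False: four digits
    }
}
```

Without the anchors, Regex.IsMatch reports success as soon as any substring matches, which is why [0-9]{1,3} would otherwise accept "1234".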
The scanf method that many have used since the C programming days is very
handy for parsing a string looking for specific values.
I often used the string version of scanf known as sscanf.
Here is some C++ code:
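#include <cstdio>

int main()
{
    char buff[128];
    sprintf(buff, "Hello there. 1 2 3.0");
    printf("%s\n", buff);
    int i, j;
    float k;
    char buff2[128];
    // %127s keeps the read within buff2, leaving room for the terminator
    sscanf(buff, "Hello %127s %d %d %f", buff2, &i, &j, &k);
    printf("%s %d %d %f\n", buff2, i, j, k);
    return 0;
}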
The output is:
Hello there. 1 2 3.0
there. 1 2 3.000000
Look up the details on scanf for more information.
I have developed a class called Scanner with two Scan methods.
public object[] Scan(string text, string fieldSpecification)
public void Scan(string text, string fieldSpecification,
params object[] targets)
The respective call format is:
targets = Scan("Hello there. -1 2 3.0 ",
"Hello {String} {Int16} {UInt32} {Single}");
Scan("Hello there. -1 2 3.0","Hello {0} {1} {2} {3}",targets);
The first Scan method returns an array of objects that box or refer to the
results for each specified target.
The second method receives an array of objects whose types define the type
of each numbered placeholder; that object array is then updated with the
result for each target.
This is a simple implementation. It does not escape characters that have a
special meaning in regular expressions, such as . * ? ( and {. When such
characters appear as literal text in the field specification, they are
treated as regular expression directives rather than matched literally.
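One way this limitation could be addressed is with Regex.Escape, which is part of System.Text.RegularExpressions. The sketch below is not part of the Scanner class; the LiteralEscaping class and its LiteralGroup helper are hypothetical names used only to show the effect of escaping a literal word before it is placed in a group.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralEscaping
{
    // Hypothetical helper (not in the Scanner class): escapes regex
    // metacharacters in a literal word, then wraps it in a group.
    public static string LiteralGroup(string literalWord)
    {
        return "(" + Regex.Escape(literalWord) + ")";
    }

    public static void Main()
    {
        // Unescaped, the dot matches any character, so "3x0" is accepted.
        Console.WriteLine(Regex.IsMatch("3x0", "(3.0)"));             // True

        // Escaped, only the literal text "3.0" is accepted.
        Console.WriteLine(LiteralGroup("3.0"));                       // (3\.0)
        Console.WriteLine(Regex.IsMatch("3x0", LiteralGroup("3.0"))); // False
        Console.WriteLine(Regex.IsMatch("3.0", LiteralGroup("3.0"))); // True
    }
}
```

Escaping would have to be applied only to the literal words of a field specification, not to the {Type} placeholders, since those are replaced by patterns that must stay active.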
Using the code
Here is an example Main to use the Scanner class:
static void Main(string[] args)
{
object[] targets = new object[4];
targets[0] = "";
targets[1] = new Int16();
targets[2] = new UInt32();
targets[3] = new Single();
Scanner scanner = new Scanner();
scanner.Scan("Hello there. -1 2 3.0",
"Hello {0} {1} {2} {3}",targets);
Console.WriteLine("Results:");
foreach(object o in targets)
{
Console.WriteLine(o.ToString());
}
targets = scanner.Scan("Hello there. -1 2 3.0 ",
"Hello {String} {Int16} {UInt32} {Single}");
Console.WriteLine("Results:");
foreach(object o in targets)
{
Console.WriteLine(o.ToString());
}
}
Points of Interest
The regular expressions that match the built-in types are defined in a
hash table as follows:
typePatterns.Add("String",@"[\w\d\S]+");
typePatterns.Add("Int16", @"-[0-9]+|[0-9]+");
typePatterns.Add("UInt16", @"[0-9]+");
typePatterns.Add("Int32", @"-[0-9]+|[0-9]+");
typePatterns.Add("UInt32", @"[0-9]+");
typePatterns.Add("Int64", @"-[0-9]+|[0-9]+");
typePatterns.Add("UInt64", @"[0-9]+");
typePatterns.Add("Single", @"[-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+");
typePatterns.Add("Double", @"[-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+");
typePatterns.Add("Boolean", @"true|false");
typePatterns.Add("Byte", @"[0-9]{1,3}");
typePatterns.Add("SByte", @"-[0-9]{1,3}|[0-9]{1,3}");
typePatterns.Add("Char", @"[\w\S]{1}");
typePatterns.Add("Decimal", @"[-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+");
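The pattern shared by Single, Double, and Decimal is worth checking against a few tokens. Its first three alternatives match a lone "-" or ".", and the last one needs at least two digits and has no place for a sign. The sketch below compares it with one possible alternative. The class FloatPatternCheck, the Alternative pattern, and the helpers are assumptions made for this illustration, not part of the Scanner class. It also parses with the invariant culture, since the Parse calls otherwise depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class FloatPatternCheck
{
    // The pattern used in the table above for Single, Double and Decimal.
    public const string Original = @"[-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+";

    // Assumed alternative (not from the article): optional minus sign, then
    // digits with an optional fraction, or a bare fraction such as .5
    public const string Alternative = @"-?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)";

    // True when the whole token, not just a part of it, matches the pattern.
    public static bool MatchesWhole(string pattern, string token)
    {
        return Regex.IsMatch(token, "^(?:" + pattern + ")$");
    }

    // Parses with the invariant culture so that "6.5" means six and a half
    // regardless of regional settings.
    public static double ParseInvariant(string token)
    {
        return double.Parse(token, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        foreach (string token in new[] { "3.0", "42", "3", "-1.5", "-" })
        {
            Console.WriteLine("{0,-5} original={1} alternative={2}", token,
                MatchesWhole(Original, token), MatchesWhole(Alternative, token));
        }
    }
}
```

As whole tokens, the original pattern accepts "3.0", "42", and "-" but rejects "3" and "-1.5". The alternative accepts all of the numeric tokens and rejects "-".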
The program takes the field specification and builds a regular expression
from it. Each whitespace-separated word of the specification becomes a
regular expression group, so that the matched items of the input string can
be selected and placed into the targets.
If the input text is "Hello true 6.5" with the field specifications of
{String} {Boolean} {Double} then the program generates this regular
expression:
([\w\d\S]+)\s+(true|false)\s+([-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+)
This expression is run against the input string, and each group is placed
into the corresponding target.
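The pattern-building step described above can be sketched in a few lines. This is a simplified illustration, not the code of the Scanner class: PatternBuilderSketch and BuildPattern are hypothetical names, only three of the type patterns are included, and the specification is split on spaces and tabs instead of being rewritten with Regex.Replace.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PatternBuilderSketch
{
    // A subset of the type patterns from the table above.
    static readonly Dictionary<string, string> TypePatterns =
        new Dictionary<string, string>
        {
            { "String",  @"[\w\d\S]+" },
            { "Boolean", @"true|false" },
            { "Double",  @"[-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+" },
        };

    // Wraps each word of the specification in a group, swaps {TypeName}
    // for its pattern, and joins the groups with \s+.
    public static string BuildPattern(string fieldSpecification)
    {
        string[] words = fieldSpecification.Trim().Split(
            new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        var groups = new List<string>();
        foreach (string word in words)
        {
            string body = word;
            if (word.StartsWith("{", StringComparison.Ordinal) &&
                word.EndsWith("}", StringComparison.Ordinal))
            {
                // Throws KeyNotFoundException for an unknown type name.
                body = TypePatterns[word.Substring(1, word.Length - 2)];
            }
            groups.Add("(" + body + ")");
        }
        return string.Join(@"\s+", groups);
    }

    public static void Main()
    {
        string pattern = BuildPattern("{String} {Boolean} {Double}");
        Console.WriteLine(pattern);
        Console.WriteLine(Regex.IsMatch("Hello true 6.5", pattern)); // True
    }
}
```

For the specification {String} {Boolean} {Double}, the sketch prints the same regular expression as the one shown above.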
I found that the objects returned from Regex.Matches are a bit confusing.
The number of Matches returned represents how many substrings match the
pattern. Since I generate a regular expression that matches the entire input
string, there should be only one match returned.
The Groups of each match correspond to more than just the groups defined
in the regular expression using the grouping parentheses.
The first Group or Group[0] is the fully captured match string. In this
example that would give you "Hello true 6.5". So you rarely use Group[0].
Group[0] = "Hello true 6.5"
Group[1] = "Hello"
Group[2] = "true"
Group[3] = "6.5"
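The group numbering above can be reproduced with a short program. This is a minimal sketch: the class GroupsDemo and its GroupValues helper are names invented for this example, and the pattern is the one generated for {String} {Boolean} {Double}.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupsDemo
{
    // Returns the value of every group of the first match, where index 0
    // is the whole match. Returns an empty array when nothing matches.
    public static string[] GroupValues(string text, string pattern)
    {
        MatchCollection matches = Regex.Matches(text, pattern);
        if (matches.Count == 0)
        {
            return new string[0];
        }
        Match first = matches[0];
        string[] values = new string[first.Groups.Count];
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = first.Groups[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        string pattern =
            @"([\w\d\S]+)\s+(true|false)\s+([-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+)";
        string[] values = GroupValues("Hello true 6.5", pattern);
        for (int i = 0; i < values.Length; i++)
        {
            Console.WriteLine("Group[" + i + "] = \"" + values[i] + "\"");
        }
    }
}
```

The loop starts at index 0 only to show the whole-match entry. Code that fills the targets would start at index 1.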
Also, what are these Capture things in each group? It takes a slightly more
complex example to show this. I found a tutorial on the internet that used the
string abracadabra1abracadabra2abracadabra3 and the regular
expression:
(abra(cad)?)+
I found that the string itself confused my simple mind because of the
"abra"s that are surrounding the "cad"s.
So, I changed the input string to be:
abracadxxxx1abracadabra2abracadabra3
and the regular expression to be:
((abra|xxxx)(cad)?)+
Here is the output of each match, group, and capture (notice that Group1
and Group2 of each match have multiple captures):
match0
Group0=[abracadxxxx]
Capture0=[abracadxxxx] Index=0 Length=11
Group1=[xxxx]
Capture0=[abracad] Index=0 Length=7
Capture1=[xxxx] Index=7 Length=4
Group2=[xxxx]
Capture0=[abra] Index=0 Length=4
Capture1=[xxxx] Index=7 Length=4
Group3=[cad]
Capture0=[cad] Index=4 Length=3
match1
Group0=[abracadabra]
Capture0=[abracadabra] Index=12 Length=11
Group1=[abra]
Capture0=[abracad] Index=12 Length=7
Capture1=[abra] Index=19 Length=4
Group2=[abra]
Capture0=[abra] Index=12 Length=4
Capture1=[abra] Index=19 Length=4
Group3=[cad]
Capture0=[cad] Index=16 Length=3
match2
Group0=[abracadabra]
Capture0=[abracadabra] Index=24 Length=11
Group1=[abra]
Capture0=[abracad] Index=24 Length=7
Capture1=[abra] Index=31 Length=4
Group2=[abra]
Capture0=[abra] Index=24 Length=4
Capture1=[abra] Index=31 Length=4
Group3=[cad]
Capture0=[cad] Index=28 Length=3
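A capture collection is all of the captured strings matched in a Group. By
default they are in innermost-leftmost-first order. The value of the Group
itself is its last capture, which is why Group1 of the first match shows
[xxxx] even though it also captured [abracad].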
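A listing like the one above can be produced by walking the Matches, Groups, and Captures collections. The following is a sketch, not the program that produced the original listing: the class CaptureDump and its Describe method are hypothetical names, and the line format is copied from the output shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDump
{
    // Builds one line per match, group, and capture, in that nesting order.
    public static List<string> Describe(string input, string pattern)
    {
        var lines = new List<string>();
        MatchCollection matches = Regex.Matches(input, pattern);
        for (int m = 0; m < matches.Count; m++)
        {
            lines.Add("match" + m);
            GroupCollection groups = matches[m].Groups;
            for (int g = 0; g < groups.Count; g++)
            {
                // Group.Value is the text of the group's last capture.
                lines.Add("Group" + g + "=[" + groups[g].Value + "]");
                CaptureCollection captures = groups[g].Captures;
                for (int c = 0; c < captures.Count; c++)
                {
                    Capture cap = captures[c];
                    lines.Add("Capture" + c + "=[" + cap.Value + "] Index="
                        + cap.Index + " Length=" + cap.Length);
                }
            }
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Describe(
            "abracadxxxx1abracadabra2abracadabra3", "((abra|xxxx)(cad)?)+"))
        {
            Console.WriteLine(line);
        }
    }
}
```

Because the outer group is repeated by the + quantifier, each repetition adds a capture to Group1 and Group2, while Group3 gets a capture only when the optional "cad" is present.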
Maybe it would help to understand this concept if you understand that
groups in regular expressions can be capturing groups or non-capturing
groups. To specify a non-capturing group you do this:
(?:aaa) matches aaa but doesn't return it or in other words it doesn't
capture it.
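The effect of a non-capturing group can be seen by counting the groups of a match. This small sketch uses a made-up input, "aaab", and a hypothetical helper class NonCapturingDemo; it is only meant to illustrate the difference between (aaa) and (?:aaa).

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonCapturingDemo
{
    // Number of groups in the match, including group 0 (the whole match).
    public static int GroupCount(string input, string pattern)
    {
        return Regex.Match(input, pattern).Groups.Count;
    }

    public static void Main()
    {
        // Two capturing groups plus group 0.
        Console.WriteLine(GroupCount("aaab", "(aaa)(b)"));   // 3

        // (?:aaa) still has to match, but it is not numbered or stored.
        Console.WriteLine(GroupCount("aaab", "(?:aaa)(b)")); // 2
        Console.WriteLine(
            Regex.Match("aaab", "(?:aaa)(b)").Groups[1].Value); // b
    }
}
```

With the non-capturing form, the group that holds "b" moves from index 2 to index 1, which matters for any code that selects results by group number.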
The above is based on this tutorial and I think it explains things better
than I can:
Here is the source code for the Scanner class.
using System;
using System.Collections;
using System.Text.RegularExpressions;
using System.Runtime.Serialization;
namespace Scanning
{
public class Scanner
{
protected readonly Hashtable typePatterns;
public Scanner()
{
typePatterns = new Hashtable();
typePatterns.Add("String",@"[\w\d\S]+");
typePatterns.Add("Int16", @"-[0-9]+|[0-9]+");
typePatterns.Add("UInt16", @"[0-9]+");
typePatterns.Add("Int32", @"-[0-9]+|[0-9]+");
typePatterns.Add("UInt32", @"[0-9]+");
typePatterns.Add("Int64", @"-[0-9]+|[0-9]+");
typePatterns.Add("UInt64", @"[0-9]+");
typePatterns.Add("Single",
@"[-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+");
typePatterns.Add("Double",
@"[-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+");
typePatterns.Add("Boolean", @"true|false");
typePatterns.Add("Byte", @"[0-9]{1,3}");
typePatterns.Add("SByte", @"-[0-9]{1,3}|[0-9]{1,3}");
typePatterns.Add("Char", @"[\w\S]{1}");
typePatterns.Add("Decimal",
@"[-]|[.]|[-.]|[0-9][0-9]*[.]*[0-9]+");
}
public object[] Scan(string text, string fieldSpecification)
{
object[] targets = null;
try
{
ArrayList targetMatchGroups = new ArrayList();
ArrayList targetTypes = new ArrayList();
string matchingPattern = "";
Regex reggie = null;
MatchCollection matches = null;
string masterPattern = fieldSpecification.Trim();
matchingPattern = @"(\S+)";
masterPattern = Regex.Replace(masterPattern,
matchingPattern,"($1)");
matchingPattern = @"(\([\w\d\S]+\))";
reggie = new Regex(matchingPattern);
matches = reggie.Matches(masterPattern);
for(int i = 0; i < matches.Count; i++)
{
Match m = matches[i];
string sVal = m.Groups[1].Captures[0].Value;
if(sVal.IndexOf('{') >= 0)
{
targetMatchGroups.Add(i);
string p = @"\(\{(\w*)\}\)";
sVal = Regex.Replace(sVal,p,"$1");
targetTypes.Add(sVal);
}
}
masterPattern = Regex.Replace(masterPattern,@"\{String\}",
(String)typePatterns["String"]);
masterPattern = Regex.Replace(masterPattern,@"\{Int16\}",
(String)typePatterns["Int16"]);
masterPattern = Regex.Replace(masterPattern,@"\{UInt16\}",
(String)typePatterns["UInt16"]);
masterPattern = Regex.Replace(masterPattern,@"\{Int32\}",
(String)typePatterns["Int32"]);
masterPattern = Regex.Replace(masterPattern,@"\{UInt32\}",
(String)typePatterns["UInt32"]);
masterPattern = Regex.Replace(masterPattern,@"\{Int64\}",
(String)typePatterns["Int64"]);
masterPattern = Regex.Replace(masterPattern,@"\{UInt64\}",
(String)typePatterns["UInt64"]);
masterPattern = Regex.Replace(masterPattern,@"\{Single\}",
(String)typePatterns["Single"]);
masterPattern = Regex.Replace(masterPattern,@"\{Double\}",
(String)typePatterns["Double"]);
masterPattern = Regex.Replace(masterPattern,@"\{Boolean\}",
(String)typePatterns["Boolean"]);
masterPattern = Regex.Replace(masterPattern,@"\{Byte\}",
(String)typePatterns["Byte"]);
masterPattern = Regex.Replace(masterPattern,@"\{SByte\}",
(String)typePatterns["SByte"]);
masterPattern = Regex.Replace(masterPattern,@"\{Char\}",
(String)typePatterns["Char"]);
masterPattern = Regex.Replace(masterPattern,@"\{Decimal\}",
(String)typePatterns["Decimal"]);
masterPattern = Regex.Replace(masterPattern,@"\s+","\\s+");
reggie = new Regex(masterPattern);
matches = reggie.Matches(text);
targets = new object[targetMatchGroups.Count];
for(int x = 0; x < targetMatchGroups.Count; x++)
{
int i = (int)targetMatchGroups[x];
string tName = (string)targetTypes[x];
if(i + 1 < matches[0].Groups.Count)
{
string sValue = matches[0].Groups[i+1].Captures[0].Value;
targets[x] = ReturnValue(tName,sValue);
}
}
}
catch(Exception ex)
{
throw new ScanExeption("Scan exception",ex);
}
return targets;
}
public void Scan(string text, string fieldSpecification,
params object[] targets)
{
try
{
ArrayList targetMatchGroups = new ArrayList();
string matchingPattern = "";
Regex reggie = null;
MatchCollection matches = null;
string masterPattern = fieldSpecification.Trim();
matchingPattern = @"(\S+)";
masterPattern = Regex.Replace(masterPattern,matchingPattern,
"($1)");
matchingPattern = @"(\([\w\d\S]+\))";
reggie = new Regex(matchingPattern);
matches = reggie.Matches(masterPattern);
for(int i = 0; i < matches.Count; i++)
{
Match m = matches[i];
string sVal = m.Groups[1].Captures[0].Value;
if(sVal.IndexOf('{') >= 0)
{
targetMatchGroups.Add(i);
}
}
matchingPattern = @"(\{\S+\})";
reggie = new Regex(matchingPattern);
matches = reggie.Matches(masterPattern);
for(int i = 0; i < targets.Length && i < matches.Count; i++)
{
string groupID = String.Format("${0}",(i+1));
string innerPattern = "";
Type t = targets[i].GetType();
innerPattern = ReturnPattern(t.Name);
string groupPattern = "\\{" + i + "\\}";
masterPattern = Regex.Replace(masterPattern,
groupPattern,innerPattern);
}
masterPattern = Regex.Replace(masterPattern,@"\s+","\\s+");
reggie = new Regex(masterPattern);
matches = reggie.Matches(text);
for(int x = 0; x < targetMatchGroups.Count; x++)
{
int i = (int)targetMatchGroups[x];
if(i + 1 < matches[0].Groups.Count)
{
string sValue = matches[0].Groups[i+1].Captures[0].Value;
Type t = targets[x].GetType();
targets[x] = ReturnValue(t.Name,sValue);
}
}
}
catch(Exception ex)
{
throw new ScanExeption("Scan exception",ex);
}
}
private object ReturnValue(string typeName, string sValue)
{
object o = null;
switch(typeName)
{
case "String":
o = sValue;
break;
case "Int16":
o = Int16.Parse(sValue);
break;
case "UInt16":
o = UInt16.Parse(sValue);
break;
case "Int32":
o = Int32.Parse(sValue);
break;
case "UInt32":
o = UInt32.Parse(sValue);
break;
case "Int64":
o = Int64.Parse(sValue);
break;
case "UInt64":
o = UInt64.Parse(sValue);
break;
case "Single":
o = Single.Parse(sValue);
break;
case "Double":
o = Double.Parse(sValue);
break;
case "Boolean":
o = Boolean.Parse(sValue);
break;
case "Byte":
o = Byte.Parse(sValue);
break;
case "SByte":
o = SByte.Parse(sValue);
break;
case "Char":
o = Char.Parse(sValue);
break;
case "Decimal":
o = Decimal.Parse(sValue);
break;
}
return o;
}
private string ReturnPattern(string typeName)
{
string innerPattern = "";
switch(typeName)
{
case "Int16":
innerPattern = (String)typePatterns["Int16"];
break;
case "UInt16":
innerPattern = (String)typePatterns["UInt16"];
break;
case "Int32":
innerPattern = (String)typePatterns["Int32"];
break;
case "UInt32":
innerPattern = (String)typePatterns["UInt32"];
break;
case "Int64":
innerPattern = (String)typePatterns["Int64"];
break;
case "UInt64":
innerPattern = (String)typePatterns["UInt64"];
break;
case "Single":
innerPattern = (String)typePatterns["Single"];
break;
case "Double":
innerPattern = (String)typePatterns["Double"];
break;
case "Boolean":
innerPattern = (String)typePatterns["Boolean"];
break;
case "Byte":
innerPattern = (String)typePatterns["Byte"];
break;
case "SByte":
innerPattern = (String)typePatterns["SByte"];
break;
case "Char":
innerPattern = (String)typePatterns["Char"];
break;
case "Decimal":
innerPattern = (String)typePatterns["Decimal"];
break;
case "String":
innerPattern = (String)typePatterns["String"];
break;
}
return innerPattern;
}
static void PrintMatches(MatchCollection matches)
{
Console.WriteLine("===---===---===---===");
int matchCount = 0;
Console.WriteLine("Match Count = " + matches.Count);
foreach(Match m in matches)
{
if(m == Match.Empty) Console.WriteLine("Empty match");
Console.WriteLine("Match"+ (++matchCount));
for (int i = 0; i < m.Groups.Count; i++)
{
Group g = m.Groups[i];
Console.WriteLine("Group"+i+"='" + g + "'");
CaptureCollection cc = g.Captures;
for (int j = 0; j < cc.Count; j++)
{
Capture c = cc[j];
System.Console.Write("Capture"+j+"='" + c + "', Position="+c.Index + " <");
for(int k = 0; k < c.ToString().Length; k++)
{
Console.Write(((Int32)(c.ToString()[k])));
}
Console.WriteLine(">");
}
}
}
}
}
class ScanExeption : Exception
{
public ScanExeption() : base()
{
}
public ScanExeption(string message) : base(message)
{
}
public ScanExeption(string message, Exception inner) : base(message, inner)
{
}
public ScanExeption(SerializationInfo info,
StreamingContext context) : base(info, context)
{
}
}
}
This is a simple scanf-style class for C#. It does not handle all types of
input strings, but it is a good start.