Introduction
This article describes how a new version of the string.Format() method can be implemented with a new, more readable format string syntax.
Background
Personally, I like string.Format (or StringBuilder.AppendFormat) very much. I use it frequently, and it works well as long as the format string does not have too many arguments. With many arguments, things look less bright.
Let's consider the following code, which generates a SQL query:
var sql = string.Format("SELECT {0} FROM [{1}].[{2}].[{3}] INNER JOIN [{1}].[{2}].[{4}]{5}{6}",
    GetColumns(),
    GetDatabaseName(),
    GetSchemaName(),
    GetFirstTable(),
    GetSecondTable(),
    GetWhereClause(),
    GetGroupByClause());
To me it looks a little messy. It can take some time to work out which value corresponds to, say, argument #5. It gets worse if I need to change the query and add a new argument at the beginning of the string. For example, suppose I'd like to add a "TOP" expression. I can do it like this:
var sql = string.Format("SELECT {7}{0} FROM [{1}].[{2}].[{3}] INNER JOIN [{1}].[{2}].[{4}]{5}{6}",
    GetColumns(),
    GetDatabaseName(),
    GetSchemaName(),
    GetFirstTable(),
    GetSecondTable(),
    GetWhereClause(),
    GetGroupByClause(),
    GetTopClause());
Now argument #7 comes before argument #0 in my format string, which looks ugly to me. The alternative is to renumber all the arguments, but that is very error-prone.
What I want to have is something like this:
var sql = StringEx.Format("SELECT {TopClause}{Columns} FROM [" +
    "{Database}].[{Schema}].[{Table1}] INNER JOIN [{Database}]." +
    "[{Schema}].[{Table2}]{WhereClause}{GroupByClause}",
    new {
        TopClause = GetTopClause(),
        Columns = GetColumns(),
        Database = GetDatabaseName(),
        Schema = GetSchemaName(),
        Table1 = GetFirstTable(),
        Table2 = GetSecondTable(),
        WhereClause = GetWhereClause(),
        GroupByClause = GetGroupByClause()
    });
Let's see how we can do it.
Using the code
The basic idea behind the code is simple: I convert a format string in the new format into a format string in the old format. For example, "{Value} and {Score} or {Value}" becomes "{0} and {1} or {0}". While doing this, I have to keep two things in mind:
- I should not process format items enclosed in doubled curly brackets, such as "{{Value}}", because doubled brackets are escape sequences for literal brackets.
- I should preserve formatting components. This means that a string like "{Value,5:D3}" should be converted into "{0,5:D3}".
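Both rules follow from how string.Format itself treats brackets and formatting components. The short sketch below illustrates that behaviour; the class and method names are made up for this example and are not part of the article's code.

```csharp
using System;

public static class FormatRulesDemo
{
    // Doubled brackets are an escape sequence: string.Format emits literal
    // brackets for them and never treats the content as a format item.
    public static string EscapedBrackets()
    {
        return string.Format("{{Value}} = {0}", 42);   // "{Value} = 42"
    }

    // Alignment (",5") and format string (":D3") follow the index, so a
    // converter has to carry them over unchanged.
    public static string WithFormatting()
    {
        return string.Format("[{0,5:D3}]", 7);         // "[  007]"
    }
}
```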
Here is the method that converts the new format into the old one:
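public ConvertedFormat Convert(string format)
{
    // Maps each member name to its position in the argument array.
    var placeholders = new Dictionary<string, int>(StringComparer.InvariantCultureIgnoreCase);
    // Candidate format items: anything enclosed in a single pair of curly brackets.
    var regex = new Regex("{[^{}]+}");
    StringBuilder formatBuilder = new StringBuilder(format);

    // Process candidates from the end so that earlier match positions stay valid.
    foreach (var match in regex.Matches(format).OfType<Match>().OrderByDescending(m => m.Index))
    {
        // Skip escaped items such as "{{Value}}".
        if (!ShouldBeReplaced(formatBuilder, match))
        {
            continue;
        }

        var memberInfo = GetMemberInfo(match);
        // Assign the next free index the first time a name is seen.
        if (!placeholders.ContainsKey(memberInfo.MemberName))
        {
            placeholders[memberInfo.MemberName] = placeholders.Count;
        }
        var memberIndex = placeholders[memberInfo.MemberName];

        // Replace only the matched span, e.g. "{Value,5:D3}" -> "{0,5:D3}".
        formatBuilder.Replace(match.Value, string.Format("{{{0}{1}}}",
            memberIndex, memberInfo.Formatting), match.Index, match.Length);
    }

    // Member names ordered by their argument index.
    var convertedFormat = new ConvertedFormat(formatBuilder.ToString(),
        placeholders.OrderBy(p => p.Value).Select(p => p.Key).ToArray());
    return convertedFormat;
}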
First, I find all candidates for replacement using the regular expression "{[^{}]+}" (it means "something in curly brackets"). I replace them in the initial format string with new format items. To keep the positions of unprocessed candidates correct, I use OrderByDescending to replace candidates from the end to the beginning. For each candidate, the ShouldBeReplaced method checks whether it is valid for replacement ("{Value}", not "{{Value}}"). Then the GetMemberInfo method splits a format item with all its components ("{Value,5:D3}") into the name component ("Value") and the remaining components (",5:D3"). After this, I check whether I have already seen a format item with this name. For this purpose I use the placeholders dictionary, which stores, for each name component, its position in the array of arguments that I'll pass to string.Format later. The final step is the replacement of the candidate itself.
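GetMemberInfo itself is not listed here. As a rough sketch of the splitting it performs, assuming the name ends at the first comma or colon, something like the following would do. The types FormatItemParts and FormatItemParser are hypothetical stand-ins, not the actual implementation.

```csharp
using System;

// Hypothetical holder for the two parts of a format item.
public sealed class FormatItemParts
{
    public FormatItemParts(string memberName, string formatting)
    {
        MemberName = memberName;
        Formatting = formatting;
    }

    public string MemberName { get; private set; }
    public string Formatting { get; private set; }
}

public static class FormatItemParser
{
    // Takes a whole format item, including its brackets, e.g. "{Value,5:D3}".
    public static FormatItemParts Parse(string formatItem)
    {
        // Strip the enclosing curly brackets.
        var inner = formatItem.Substring(1, formatItem.Length - 2);

        // The name ends at the first ',' (alignment) or ':' (format string).
        var split = inner.IndexOfAny(new[] { ',', ':' });
        if (split < 0)
        {
            return new FormatItemParts(inner.Trim(), string.Empty);
        }

        // Everything from the separator onwards is kept verbatim.
        return new FormatItemParts(inner.Substring(0, split).Trim(), inner.Substring(split));
    }
}
```

With this sketch, "{Value,5:D3}" yields the name "Value" and the formatting ",5:D3", while "{Score}" yields "Score" and an empty formatting string.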
In the end, I have a format string in the old format and an array of the member names of my data object. It is easy to extract the values of these members using Reflection.
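The Reflection step could look roughly like this. It is a minimal sketch under my own assumptions: the helper name MemberValueReader is invented, only public instance properties and fields are considered, and lookup ignores case to match the case-insensitive dictionary used in Convert.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class MemberValueReader
{
    // Returns the values of the named members, in the same order as the names.
    public static object[] GetValues(object data, string[] memberNames)
    {
        var type = data.GetType();
        const BindingFlags flags =
            BindingFlags.Public | BindingFlags.Instance | BindingFlags.IgnoreCase;

        return memberNames.Select(name =>
        {
            // Anonymous types expose their members as properties.
            var property = type.GetProperty(name, flags);
            if (property != null)
            {
                return property.GetValue(data, null);
            }

            // Fall back to public fields for ordinary classes.
            var field = type.GetField(name, flags);
            if (field != null)
            {
                return field.GetValue(data);
            }

            throw new ArgumentException("Member '" + name + "' not found on " + type.Name);
        }).ToArray();
    }
}
```

The resulting array can be passed directly to string.Format together with the converted format string. Note that a type with two members differing only by case would make the case-insensitive lookup ambiguous.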
One more point of interest is how I determine whether a candidate is valid for replacement. For example, in the following texts "{Value}" must be replaced: "{Value}", "{{{Value}}}", "{{{{{Value}}}". And in the following it must not: "{{Value}}", "{{{{Value}}". Here is the code that solves this problem:
private static bool ShouldBeReplaced(StringBuilder formatBuilder, Match match)
{
    var bracketsBefore = 0;
    var index = match.Index - 1;
    while (index >= 0 && formatBuilder[index] == '{')
    {
        bracketsBefore++;
        index--;
    }
    return ((bracketsBefore % 2) == 0);
}
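I simply count the curly brackets immediately before the format item. An even count means they all pair up into escaped brackets, so the item is a real placeholder. An odd count means the item's own opening bracket is the second half of an escape, so the item must be left alone.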
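To check the parity rule against the five example texts, the same counting can be run on plain strings. This is an illustrative sketch only: BraceParityDemo is an invented name, and it assumes the text contains the literal item "{Value}".

```csharp
using System;

public static class BraceParityDemo
{
    // Returns true when the "{Value}" item inside text is a real format item,
    // i.e. it is preceded by an even number of '{' characters.
    public static bool IsFormatItem(string text)
    {
        var itemIndex = text.IndexOf("{Value}", StringComparison.Ordinal);
        if (itemIndex < 0)
        {
            throw new ArgumentException("Text does not contain {Value}.");
        }

        // Walk backwards over the run of '{' directly before the item.
        var bracketsBefore = 0;
        for (var i = itemIndex - 1; i >= 0 && text[i] == '{'; i--)
        {
            bracketsBefore++;
        }

        return bracketsBefore % 2 == 0;
    }
}
```

For "{Value}", "{{{Value}}}" and "{{{{{Value}}}" the count is 0, 2 and 4, so the method returns true. For "{{Value}}" and "{{{{Value}}" the count is 1 and 3, so it returns false.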
Points of Interest
Although my code only creates a new Format method that can be used instead of string.Format, you can easily write the same method for the StringBuilder class. For convenience, you may implement it as an extension method.
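One possible shape for such an extension method is sketched below. To keep the example self-contained it includes a compact re-implementation of the conversion (NamedFormat.Format) rather than the article's StringEx class; both NamedFormat and AppendFormatEx are names I made up, and the sketch reads only public properties.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Text;
using System.Text.RegularExpressions;

public static class NamedFormat
{
    private static readonly Regex Item = new Regex(@"\{[^{}]+\}");

    public static string Format(string format, object data)
    {
        var names = new List<string>();
        var builder = new StringBuilder(format);
        var matches = Item.Matches(format);

        // Go from the last match to the first so earlier positions stay valid.
        for (var m = matches.Count - 1; m >= 0; m--)
        {
            var match = matches[m];

            // Skip items whose opening bracket is part of an escaped "{{".
            var before = 0;
            for (var i = match.Index - 1; i >= 0 && format[i] == '{'; i--) before++;
            if (before % 2 != 0) continue;

            // Split "Name,5:D3" into the name and the formatting components.
            var inner = match.Value.Substring(1, match.Length - 2);
            var split = inner.IndexOfAny(new[] { ',', ':' });
            var name = (split < 0 ? inner : inner.Substring(0, split)).Trim();
            var formatting = split < 0 ? string.Empty : inner.Substring(split);

            // Reuse the index of a name that was already seen.
            var index = names.FindIndex(
                n => string.Equals(n, name, StringComparison.OrdinalIgnoreCase));
            if (index < 0) { names.Add(name); index = names.Count - 1; }

            builder.Remove(match.Index, match.Length)
                   .Insert(match.Index, "{" + index + formatting + "}");
        }

        // Read the property values in argument order.
        var type = data.GetType();
        const BindingFlags flags =
            BindingFlags.Public | BindingFlags.Instance | BindingFlags.IgnoreCase;
        var values = names.ConvertAll(n =>
        {
            var property = type.GetProperty(n, flags);
            if (property == null) throw new ArgumentException("No property named " + n);
            return property.GetValue(data, null);
        });

        return string.Format(builder.ToString(), values.ToArray());
    }
}

public static class StringBuilderExtensions
{
    // Appends the formatted text and returns the builder to allow chaining.
    public static StringBuilder AppendFormatEx(this StringBuilder builder, string format, object data)
    {
        return builder.Append(NamedFormat.Format(format, data));
    }
}
```

A call such as new StringBuilder("x=").AppendFormatEx("{A}", new { A = 5 }) then reads like the built-in AppendFormat. Returning the builder keeps the method chainable, which matches how the standard StringBuilder methods behave.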
History