Introduction
This tip is an alternative to Advanced String Split and Joiner.
Why the heck this alternative?
I attempt to provide a clearer problem description than the original post. Building on that description, I provide some alternatives to encode/decode (or serialize/deserialize) a List<string> to/from a string.
Example usage
List<string> source = ...
...
ICodecStringList codec = new ...
string encoded = codec.Encode(source);
...
List<string> target = codec.Decode(encoded);
...
With:
public interface ICodecStringList
{
    string Encode(List<string> item);
    List<string> Decode(string encoded);
}
I will provide two codec implementations for the sample code above.
The Problem Statement
- We need a class that encodes a List<string> into a string.
- That very class shall provide a decoding function to revert that encoded string into a List<string> again.
Below I show two approaches:
- XML serialization
- hand crafted encode/decode
Finally, the summary argues that XML serialization takes far less effort than a hand-crafted version.
XML Serializing
The most straightforward way to write the List<string> into a string is XML serialization. The following codec (coder/decoder) implementation employs XML serialization.
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Codec that delegates to the generic XML helper below.
public class CodecXml : ICodecStringList
{
    private readonly XmlSerializer<List<string>> _xml = new XmlSerializer<List<string>>();
    public string Encode(List<string> item) { return _xml.Serialize(item); }
    public List<string> Decode(string encoded) { return _xml.Deserialize(encoded); }
}

// Generic helper that serializes a reference type to/from an XML string.
public class XmlSerializer<T> where T : class
{
    // Empty prefix/namespace pair: suppresses the default namespace declarations on the root element.
    private static XmlSerializerNamespaces NoXmlNamespaces
    { get { var ns = new XmlSerializerNamespaces(); ns.Add(string.Empty, string.Empty); return ns; } }

    // Omits the leading <?xml ...?> declaration.
    private static XmlWriterSettings NoXmlEncoding
    { get { return new XmlWriterSettings() { OmitXmlDeclaration = true }; } }

    public string Serialize(T item)
    {
        StringBuilder xml = new StringBuilder();
        using (var writer = XmlWriter.Create(xml, NoXmlEncoding))
        {
            new XmlSerializer(typeof(T)).Serialize(writer, item, NoXmlNamespaces);
        }
        return xml.ToString();
    }

    public T Deserialize(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return new XmlSerializer(typeof(T)).Deserialize(reader) as T;
        }
    }
}
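To get a feel for what the XML-encoded string looks like, here is a minimal, self-contained sketch that uses the framework's XmlSerializer directly with the same two settings (no XML declaration, empty namespaces). The class name XmlShapeDemo and the sample values are made up for illustration. For a List<string>, the output is expected to be a single ArrayOfString element with one string child per entry, with markup characters escaped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical demo class, not part of the codec above.
public static class XmlShapeDemo
{
    // Serializes the list without XML declaration and without default namespaces.
    public static string ToXml(List<string> items)
    {
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add(string.Empty, string.Empty);
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var xml = new StringBuilder();
        using (var writer = XmlWriter.Create(xml, settings))
        {
            new XmlSerializer(typeof(List<string>)).Serialize(writer, items, namespaces);
        }
        return xml.ToString();
    }

    // Reads the list back from the XML string.
    public static List<string> FromXml(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return (List<string>)new XmlSerializer(typeof(List<string>)).Deserialize(reader);
        }
    }

    public static void Main()
    {
        // Sample values containing a separator, quotes, and markup characters.
        var source = new List<string> { "a;b", "say \"hi\"", "<tag>" };
        string encoded = ToXml(source);
        Console.WriteLine(encoded);
        List<string> target = FromXml(encoded);
        Console.WriteLine(string.Join("|", target));
    }
}
```

The point of the sketch is that no escaping logic has to be written by hand: the serializer takes care of it in both directions.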
So much for the standard way of serializing/deserializing.
The next approach shows how one could encode with a hand-crafted codec. Not sure why one would do so, though... ;-)
Hand crafted Codec
There may be situations where you do not want standard XML serialization for encoding an object into a single string.
If so, the following class may give you an idea of how this could be achieved.
The grammar
If you generate text that is to be parsed again, you must have a clear idea of the grammar that is encoded in the string. Here, I suggest a CSV-like grammar, where the List<string> represents one record and each element of the list is a field in that record.
The grammar:
If we assume BEGIN_CHAR = ", END_CHAR = ", SEP_CHAR = ; (or ,), we get the CSV grammar for one record.
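As a worked example of that parametrization, the following sketch encodes one record with BEGIN_CHAR and END_CHAR fixed to " and SEP_CHAR fixed to ;. The grammar in the comment is my reading of the codec in this tip, not a quotation, and the names CsvRecordSketch and EncodeRecord are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch with hard-coded characters.
public static class CsvRecordSketch
{
    // Assumed shape of one record:
    //   record := [ field ( SEP_CHAR field )* ]
    //   field  := BEGIN_CHAR ( any char except END_CHAR | END_CHAR END_CHAR )* END_CHAR
    public static string EncodeRecord(IEnumerable<string> fields)
    {
        // Double every embedded quote, wrap the field in quotes, join with ';'.
        return string.Join(";", fields.Select(f => "\"" + f.Replace("\"", "\"\"") + "\""));
    }

    public static void Main()
    {
        var record = new List<string> { "plain", "with;separator", "say \"hi\"" };
        Console.WriteLine(EncodeRecord(record));
        // prints: "plain";"with;separator";"say ""hi"""
    }
}
```

Because every field is delimited, a separator inside a field is harmless; only the END_CHAR needs escaping, which is done by doubling it.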
The codec
The following implementation provides first a generic encoder/decoder for the grammar above, plus the concrete CSV-like parametrization of the generic one.
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public class StringContainerToStringCoDec
{
    private string _begin, _end, _sep, _esc;
    private Regex _tokenizer;

    // Picks a one-character separator that does not collide with the begin/end characters;
    // tries the requested one first, then ',', ';' and tab.
    private static string GetSep(char begin, char end, string sep)
    {
        if (sep != null && sep.Length == 1)
            foreach (char c in string.Format("{0},;\t", sep[0]))
                if (c != begin && c != end) return new string(c, 1);
        return sep ?? ";";
    }

    public StringContainerToStringCoDec(char fieldBegin, char fieldEnd, string fieldSep)
    {
        _begin = new string(fieldBegin, 1);
        _end = new string(fieldEnd, 1);
        _esc = _end + _end; // an embedded end character is escaped by doubling it
        _sep = GetSep(fieldBegin, fieldEnd, fieldSep);
        // One token: begin, then (escaped end | any char but end)*, then end, then an optional separator.
        _tokenizer = new Regex(string.Format(@"{0}((?:{1}|[^{2}])*){2}(?:{3})?",
            Regex.Escape(_begin), Regex.Escape(_esc), Regex.Escape(_end), Regex.Escape(_sep)),
            RegexOptions.Compiled | RegexOptions.Singleline);
    }

    // Null items are written as empty fields.
    public string Encode(IEnumerable<string> items)
    {
        return string.Join(_sep, items.Select(
            item => _begin + (item ?? string.Empty).Replace(_end, _esc) + _end));
    }

    public IEnumerable<string> Decode(string s)
    {
        return _tokenizer.Matches(s).Cast<Match>().Where(m => m.Groups[1].Success)
            .Select(m => m.Groups[1].Value.Replace(_esc, _end));
    }
}

// CSV-like parametrization: quoted fields, separator taken from the current culture.
public class CsvCoDec : ICodecStringList
{
    private readonly StringContainerToStringCoDec _codec
        = new StringContainerToStringCoDec('"', '"', CultureInfo.CurrentCulture.TextInfo.ListSeparator);
    public string Encode(List<string> item) { return _codec.Encode(item); }
    public List<string> Decode(string encoded) { return _codec.Decode(encoded).ToList(); }
}
The code cannot distinguish between empty and null entries while decoding. If that is needed, a special field token must be invented, e.g. a special field (such as #) besides the normal fields ("...").
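The limitation is easy to reproduce. The sketch below is a stripped-down stand-in for the codec with the characters hard-coded (quotes as field delimiters, ; as separator); the name NullCollapseSketch is hypothetical. It shows that a null entry and an empty entry produce the same token and therefore decode to the same value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in with BEGIN_CHAR = END_CHAR = '"' and SEP_CHAR = ';'.
public static class NullCollapseSketch
{
    // One token: quote, then (doubled quote | any non-quote)*, then quote, then an optional ';'.
    private static readonly Regex Tokenizer =
        new Regex("\"((?:\"\"|[^\"])*)\"(?:;)?", RegexOptions.Singleline);

    public static string Encode(IEnumerable<string> items)
    {
        // A null item is written exactly like an empty one.
        return string.Join(";", items.Select(
            i => "\"" + (i ?? string.Empty).Replace("\"", "\"\"") + "\""));
    }

    public static List<string> Decode(string s)
    {
        return Tokenizer.Matches(s).Cast<Match>()
            .Select(m => m.Groups[1].Value.Replace("\"\"", "\""))
            .ToList();
    }

    public static void Main()
    {
        var source = new List<string> { "a;b", null, "" };
        string encoded = Encode(source);
        Console.WriteLine(encoded);                 // prints: "a;b";"";""
        List<string> target = Decode(encoded);
        Console.WriteLine(target[1] == null);       // prints: False
        Console.WriteLine(target[1] == target[2]);  // prints: True
    }
}
```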
A null-aware codec
The adjusted grammar:
The adjusted codec, which is null-value ready, is shown below:
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public class StringContainerToStringCoDec
{
    private string _begin, _end, _sep, _esc, _nil;
    private Regex _tokenizer;

    // Picks a one-character separator that does not collide with the begin/end/nil characters.
    private static string GetSep(char begin, char end, char nil, string sep)
    {
        if (sep != null && sep.Length == 1)
            foreach (char c in string.Format("{0},;\t|", sep[0]))
                if (c != begin && c != end && c != nil) return new string(c, 1);
        return sep ?? ";";
    }

    public StringContainerToStringCoDec(char fieldBegin, char fieldEnd, char fieldNil, string fieldSep)
    {
        _begin = new string(fieldBegin, 1);
        _end = new string(fieldEnd, 1);
        _esc = _end + _end; // an embedded end character is escaped by doubling it
        _nil = new string(fieldNil, 1);
        _sep = GetSep(fieldBegin, fieldEnd, fieldNil, fieldSep);
        // One token: either a delimited field (group 1) or the bare nil token (group 2),
        // followed by an optional separator.
        _tokenizer = new Regex(string.Format(@"(?:{0}((?:{1}|[^{2}])*){2}|({4}))(?:{3})?",
            Regex.Escape(_begin), Regex.Escape(_esc), Regex.Escape(_end), Regex.Escape(_sep),
            Regex.Escape(_nil)),
            RegexOptions.Compiled | RegexOptions.Singleline);
    }

    // Null items are written as the bare nil token, all others as delimited fields.
    public string Encode(IEnumerable<string> items)
    {
        return string.Join(_sep, items.Select(
            item => item == null ? _nil : _begin + item.Replace(_end, _esc) + _end));
    }

    public IEnumerable<string> Decode(string s)
    {
        return _tokenizer.Matches(s).Cast<Match>()
            .Where(m => m.Groups[1].Success || m.Groups[2].Success)
            .Select(m => m.Groups[2].Success ? null : m.Groups[1].Value.Replace(_esc, _end));
    }
}

// CSV-like parametrization with '#' as the null token.
public class CsvCoDec : ICodecStringList
{
    private readonly StringContainerToStringCoDec _codec
        = new StringContainerToStringCoDec('"', '"', '#', CultureInfo.CurrentCulture.TextInfo.ListSeparator);
    public string Encode(List<string> item) { return _codec.Encode(item); }
    public List<string> Decode(string encoded) { return _codec.Decode(encoded).ToList(); }
}
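To illustrate the effect of the extra token, here is a compact sketch with the characters hard-coded (quotes as field delimiters, ; as separator, # as null token). It is a simplified stand-in under those assumptions, not the class above, and the name NullAwareSketch is hypothetical. Note that a literal "#" entry stays distinguishable from null because it is written as a delimited field.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in with BEGIN_CHAR = END_CHAR = '"', SEP_CHAR = ';', nil token '#'.
public static class NullAwareSketch
{
    // Group 1: content of a quoted field. Group 2: the bare nil token.
    private static readonly Regex Tokenizer =
        new Regex("(?:\"((?:\"\"|[^\"])*)\"|(#))(?:;)?", RegexOptions.Singleline);

    public static string Encode(IEnumerable<string> items)
    {
        return string.Join(";", items.Select(
            i => i == null ? "#" : "\"" + i.Replace("\"", "\"\"") + "\""));
    }

    public static List<string> Decode(string s)
    {
        return Tokenizer.Matches(s).Cast<Match>()
            .Select(m => m.Groups[2].Success ? null : m.Groups[1].Value.Replace("\"\"", "\""))
            .ToList();
    }

    public static void Main()
    {
        var source = new List<string> { "a", null, "", "#" };
        string encoded = Encode(source);
        Console.WriteLine(encoded);              // prints: "a";#;"";"#"
        List<string> target = Decode(encoded);
        Console.WriteLine(target[1] == null);    // prints: True
        Console.WriteLine(target[2] == "");      // prints: True
        Console.WriteLine(target[3]);            // prints: #
    }
}
```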
Summary
The original post on using "advanced" Join/Split to encode/decode a List<string> into a string is, in my eyes, not a useful approach to the encoding/decoding problem. Either one uses established means (e.g. XML serialization), or one defines the problem carefully enough to cover the encoding/decoding in a reasonable way: define the grammar for the encoding.
Once the grammar is defined, the encoding is usually easy (the only challenge is to handle null values and embedded end-of-field characters). The decoder is a bit more of a challenge: you must tokenize the string and parse it. This is done either by hand again or by means of Regex (which may admittedly be a bit of a challenge on its own for many...;-)).
I would go for XML-serialized data unless I had a real issue with that. The hand-crafted variant is simply too much maintenance effort...