Introduction
Many, if not all, enterprise projects use serializers. There are a lot to choose from, but we tend to default to the built-in ones, and among those we lean towards good old XML,
because it is more readable and editable. What could be simpler than throwing an object at an XmlSerializer and getting a quick result?
Serializers are also used inside framework objects and in many third-party libraries, for configuration, state persistence, and sending data over communication channels.
But then you get to performance...
Background
I am a long-time performance tweaker in the field of .NET, and have recently tackled (yet again)
the problem of slow software startup times, in which XmlSerializer's temp assembly generation played a big part.
So I thought: why not put it on paper (so to speak) and have a resource I (and others) can go back to...
The Problem
At run time, XmlSerializer generates and compiles a temporary assembly the first time it is asked to handle a type, unless a pre-generated serialization assembly exists for the assembly that contains that type.
With the commonly used constructors the result is cached, so the cost is paid only once per process, but it may still amount to a pesky performance penalty on your application startup time (and on other first-time operation measurements).
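To see the penalty on your own machine, you can time how long it takes to construct a serializer. The following is a minimal sketch, not a proper benchmark; AppSettings and FirstUseTiming are hypothetical names invented for the illustration.

```csharp
using System;
using System.Diagnostics;
using System.Xml.Serialization;

// Hypothetical type, used only for this illustration.
public class AppSettings
{
    public string Name { get; set; }
    public int Retries { get; set; }
}

public static class FirstUseTiming
{
    // Measures how long it takes to construct an XmlSerializer for the given type.
    public static TimeSpan TimeConstruction(Type type)
    {
        Stopwatch watch = Stopwatch.StartNew();
        XmlSerializer serializer = new XmlSerializer(type);
        watch.Stop();
        GC.KeepAlive(serializer);
        return watch.Elapsed;
    }

    // Prints the cost of the first construction next to a repeated one.
    public static void Report()
    {
        TimeSpan first = TimeConstruction(typeof(AppSettings));
        TimeSpan second = TimeConstruction(typeof(AppSettings));
        Console.WriteLine("first:  {0} ms", first.TotalMilliseconds);
        Console.WriteLine("second: {0} ms", second.TotalMilliseconds);
    }
}
```

Call FirstUseTiming.Report() early in startup. If no pre-generated assembly is picked up, the first number should be noticeably larger than the second.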
Using another serializer
There are a lot of good serializers out there, both built-in and third-party:
- protobuf-net
- Fast JSON serializers (ServiceStack)
- BinaryFormatter
Some nice benchmarks can be found here:
Cons:
- You might not be able to use them, because you don't want to break support for previous versions, or because you are forced to use this specific format for some other reason.
If this point is not an issue for you, you should replace XmlSerializer with one of the above and be done with it.
Sgen
sgen sounds like a good option, since it allows you to keep the XmlSerializer and its flexibility, with no need to change code when the serialized objects change.
You just add a post-build step, which creates the serialization assemblies at compile time.
Problems + solutions:
- You can't sgen a whole assembly unless you have all the assemblies it depends on (its references)
  - solution: use /t: and specify only the types you need to sgen
  - solution: use sgen on a full bin dir after the build is complete.
  - solution: attach a debugger (Visual Studio is fine) to the sgen console and listen for exceptions
    - this can be done by:
      - create a new command-line project,
      - go to project properties -> debug
      - add the sgen command (with full paths)
      - allow all exceptions (CLR + C++)
      - run (F5)
- You can't effectively sgen types which are standard .NET collections, since to sgen List<string> you would need to sgen mscorlib.dll, which has a Microsoft signature.
  - solution: create a custom (non-generic) collection class which inherits List<string>
- You have to sgen all the assemblies containing the types which are serialized
  - solution: just do it with a post-build step / Jenkins command-line build step.
- If you are using strong names, you have to sign the XmlSerializers assembly with the same key as the original assembly (this can be problematic with third-party assemblies unless they are open source)
  - solution:
    - In VS2010 use a compiler flag: /c:/keyfile:"C:\somewhere\keyfile.snk"
    - In VS2012 use the built-in command-line flag: /keyfile:"C:\somewhere\keyfile.snk"
- XAML generated namespace problem (ildasm, rename, ilasm with key / use specific types)
  - The type 'XamlGeneratedNamespace.GeneratedInternalTypeHelper' exists in both 'lib1.dll' and 'lib2.dll'
  - Using ilasm, ildasm or ILMerge won't help you (XmlSerializer performs an id check)
  - solution: choosing types with /t: will help you avoid this problem.
- Generic types are not supported at all.
  - solution: you will have to live without serializing generic types if you want to use XmlSerializer and avoid temp assembly generation.
- Private/internal setter problem
  - solution: use specific types with /t:
  - solution: make them public, if you can bring yourself to allow it
  - solution: use InternalsVisibleTo and add the XmlSerializers assembly as a friend assembly. This works only for internals (privates are still private)
    - use sn.exe to get the full public key (from your assembly):
      - This is OK, since the XmlSerializers assembly will be signed with the same key (see the strong names item above)
      - Open a Visual Studio command prompt
      - Get the public key from the assembly: sn.exe -Tp <assembly>
    - In the project's AssemblyInfo.cs add (the key is split across lines here only for readability):
[assembly: InternalsVisibleTo("YourAssembly.XmlSerializers, " +
"PublicKey=0024000004800000940000000602000000240000525341310004000001000100f1844bc8cbdc" +
"3779b0e5970a30d800668414128135f5d6cd274e726f7c84f234234324234c64c11d0f6a9edbbe7b32" +
"b6f19d8f734e1c130814d40df54ff9d063ce29bf7af86b46a69f0e2342343241b52a2ae443648e199a0" +
"9547e74663cbe1e72e89365034ff53b6a3ce281415cbe7e2dfb5e40e54667f35dc04ca")]
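The custom collection workaround from the list above (for standard .NET collections such as List<string>) can be sketched roughly as follows. TagList, Article and CustomCollectionDemo are hypothetical names. The sketch only shows that such a wrapper serializes like an ordinary list; it does not run sgen itself.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical non-generic collection living in your own assembly,
// so that sgen can be pointed at it with /t: instead of at List<string>.
public class TagList : List<string>
{
}

// Hypothetical serialized type that uses the wrapper instead of List<string>.
public class Article
{
    public TagList Tags { get; set; }

    public Article()
    {
        Tags = new TagList();
    }
}

public static class CustomCollectionDemo
{
    // Serializes an Article to an XML string with XmlSerializer.
    public static string Serialize(Article article)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Article));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, article);
            return writer.ToString();
        }
    }
}
```

The items are still written as string elements inside Tags, so the switch should not change the XML that existing consumers read. It is worth diffing old and new output before relying on that.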
Sgen in MSBuild
Adding sgen to a simple project can be as easy as writing a post-build step and adding:
sgen /a:MyAssemblyName.dll /t:MyNamespace.MyType /c:/keyfile:"c:\directory\keyfile.snk" /f
But adding sgen to a corporate build process can be tricky... Since most .NET build environments eventually use MSBuild, there is a built-in MSBuild task for this which you can add:
<Target Name="PostCompile">
  <Sgen ShouldGenerateSerializer="true" UseProxyTypes="false" BuildAssemblyName="MYAssembly.dll" BuildAssemblyPath="..\bin\" Types="MynameSpace.MyTypeName" KeyFile="..\dir\keyfile.snk" />
</Target>
If more than one type exists:
<ItemGroup>
  <SgenTypes Include="MyNamespace1.MyTypename1" />
  <SgenTypes Include="MyNamespace2.MyTypename2" />
</ItemGroup>
<Target Name="PostCompile">
  <Sgen ShouldGenerateSerializer="true" UseProxyTypes="false" BuildAssemblyName="MYAssembly.dll" BuildAssemblyPath="..\bin\" Types="@(SgenTypes)" KeyFile="..\dir\keyfile.snk" />
</Target>
Problem:
The default ToolsVersion for use in MSBuild is almost always 2.0, and the Sgen task which comes with this ToolsVersion has no Types="" attribute. As we saw earlier, choosing types is the key to solving a host of problems, so we need to use a newer tools version...
- solution: change the version by adding ToolsVersion="4.0" to the Project tag at the top of the file.
  - problem: this may cause trouble with pre-existing steps which are used to the old tools and behaviors.
- solution: change only the sgen target to use the new ToolsVersion
  - To do this we need to move the previous lines out to another project file
  - The Import element will not help us, since it ignores the ToolsVersion in the new file.
  - Use the MSBuild task to call the new file instead:
In the new file (sgen.proj):
<Project DefaultTargets="PostCompile" xmlns="http://schemas.microsoft.com/developer/msbuild/2003" ToolsVersion="4.0">
  <ItemGroup>
    <SgenTypes Include="MyNamespace1.MyTypename1" />
    <SgenTypes Include="MyNamespace2.MyTypename2" />
  </ItemGroup>
  <Target Name="PostCompile" >
    <Sgen ShouldGenerateSerializer="true" UseProxyTypes="false" BuildAssemblyName="MYAssembly.dll" BuildAssemblyPath="..\bin\" Types="@(SgenTypes)" KeyFile="..\dir\keyfile.snk" />
  </Target>
</Project>
In the main MSBuild project file:
<MSBuild Projects="sgen.proj" ToolsVersion="4.0" Targets="PostCompile" />
This tells MSBuild to run another instance, load the new ToolsVersion and call the target, this time with the 4.0 version of the tools, which includes the Types attribute on the <Sgen> task.
Implementing IXmlSerializable
Although I had heard that using this prevents XmlSerializer from generating the temp assembly, and it made perfect sense to me that a manual override would take effect before anything is generated for the type, I found out this is not the case. Instead, it creates the temp assembly and then runs ReadXml/WriteXml from the generated code.
So with this method you gain no performance, even though XmlSerializer's own serialization logic is not used at all for the given type...
Looking into the framework code with ILSpy, we can take the functions which call the IXmlSerializable implementation and use them to bypass and imitate the framework.
To gain performance, call the IXmlSerializable implementation directly from your infrastructure code for objects which implement it:
ReadSerializable:
protected IXmlSerializable ReadSerializable(IXmlSerializable serializable, bool wrappedAny)
{
    string b = null;
    string b2 = null;
    if (wrappedAny)
    {
        b = this.r.LocalName;
        b2 = this.r.NamespaceURI;
        this.r.Read();
        this.r.MoveToContent();
    }
    serializable.ReadXml(this.r);
    if (wrappedAny)
    {
        while (this.r.NodeType == XmlNodeType.Whitespace)
        {
            this.r.Skip();
        }
        if (this.r.NodeType == XmlNodeType.None)
        {
            this.r.Skip();
        }
        if (this.r.NodeType == XmlNodeType.EndElement &&
            this.r.LocalName == b && this.r.NamespaceURI == b2)
        {
            this.Reader.Read();
        }
    }
    return serializable;
}
WriteSerializable:
protected void WriteSerializable(IXmlSerializable serializable, string name, string ns, bool isNullable, bool wrapped)
{
    if (serializable == null)
    {
        if (isNullable)
        {
            this.WriteNullTagLiteral(name, ns);
        }
        return;
    }
    if (wrapped)
    {
        this.w.WriteStartElement(name, ns);
    }
    serializable.WriteXml(this.w);
    if (wrapped)
    {
        this.w.WriteEndElement();
    }
}
protected void WriteNullTagLiteral(string name, string ns)
{
    if (name == null || name.Length == 0)
    {
        return;
    }
    this.WriteStartElement(name, ns, null, false);
    this.w.WriteAttributeString("nil", "http://www.w3.org/2001/XMLSchema-instance", "true");
    this.w.WriteEndElement();
}
On closer inspection of ReadSerializable we can see that wrappedAny is usually false, as can be seen in the function that is actually called (the one-parameter overload of ReadSerializable):
return this.ReadSerializable(serializable, false);
So the code is reduced to a simple call to the IXmlSerializable's ReadXml function.
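A minimal sketch of that direct call, assuming a hypothetical Endpoint type and DirectXml helper (neither is part of the framework), with the reader positioned on the element before ReadXml is invoked:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical type that handles its own XML.
public sealed class Endpoint : IXmlSerializable
{
    public string Host { get; set; }
    public int Port { get; set; }

    public XmlSchema GetSchema()
    {
        return null;
    }

    public void ReadXml(XmlReader reader)
    {
        // The reader is positioned on this object's start element.
        Host = reader.GetAttribute("host");
        Port = XmlConvert.ToInt32(reader.GetAttribute("port"));
        reader.Skip(); // consume the element, including any end tag
    }

    public void WriteXml(XmlWriter writer)
    {
        writer.WriteAttributeString("host", Host);
        writer.WriteAttributeString("port", XmlConvert.ToString(Port));
    }
}

public static class DirectXml
{
    // Calls ReadXml directly; no XmlSerializer instance is created.
    public static T Read<T>(string xml) where T : IXmlSerializable, new()
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            T value = new T();
            value.ReadXml(reader);
            return value;
        }
    }
}
```

Usage would look like DirectXml.Read<Endpoint>("<Endpoint host=\"localhost\" port=\"8080\" />"). Since no XmlSerializer is constructed, no temp assembly should be generated for this path.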
WriteSerializable has a bit more code. After checking around, the default call emitted in a generated assembly looks like this:
WriteSerializable((System.Xml.Serialization.IXmlSerializable)((global::MyNamespace.MyClass)a[ia]), @"GeneratedSomethingLikeMyClassname", @"", true, true);
- wrapped = true
- nullable = true
- ns = ""
- name = ???
My version of WriteSerializable (below) includes these defaults for wrapped, nullable and namespace, plus some detection work for getting the name, including the XmlRoot attribute and the naming used for lists and arrays. You can change this as you see fit, for example by adding support for the XmlArray and XmlArrayItem attributes (or just pass the name to the function and delete all of the name creation code):
public void WriteSerializable(IXmlSerializable serializable, XmlWriter writer, string name = null, string ns = "", bool isNullable = true, bool wrapped = true)
{
    if (serializable == null)
    {
        // Checked first: a null object has no type to derive a name from,
        // so a nil element is written only when the caller supplied a name.
        if (isNullable && !string.IsNullOrEmpty(name))
        {
            writer.WriteStartElement(name, ns);
            writer.WriteAttributeString("nil",
                "http://www.w3.org/2001/XMLSchema-instance", "true");
            writer.WriteEndElement();
        }
        return;
    }
    if (name == null)
    {
        // Derive the element name from the runtime type.
        Type t = serializable.GetType();
        name = t.Name;
        if (typeof(IList).IsAssignableFrom(t) && t.IsGenericType)
        {
            name = "ArrayOf" + t.GetGenericArguments()[0].Name;
        }
        else if (t == typeof(Array) || t == typeof(ArrayList))
        {
            name = "ArrayOfObject";
        }
        else
        {
            object[] attribs = t.GetCustomAttributes(typeof(XmlRootAttribute), false);
            if (attribs.Length > 0)
            {
                XmlRootAttribute xmlRoot = (XmlRootAttribute)attribs[0];
                // Keep the type name when XmlRoot does not specify an element name.
                if (!string.IsNullOrEmpty(xmlRoot.ElementName))
                {
                    name = xmlRoot.ElementName;
                }
            }
        }
    }
    if (wrapped)
    {
        writer.WriteStartElement(name, ns);
    }
    serializable.WriteXml(writer);
    if (wrapped)
    {
        writer.WriteEndElement();
    }
}
Create Xml by hand
If all you need is a small amount of data, you can always create/read the XML by hand using XElement/XmlWriter. This has the benefit of being fast, 100% backward compatible with previous formats, and totally customizable.
Cons:
- More code, more bugs
- Has to be maintained when data fields are added
- Will not be feasible if you (along with other consumers in your project) are using a common framework which uses XmlSerializer under the hood.
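As a rough illustration of the hand-written approach, here is a sketch using XElement. The element names (Settings, Name, Retries) and the HandWrittenXml helper are made up for the example; in practice you would match whatever format your previous versions wrote.

```csharp
using System.Xml.Linq;

public static class HandWrittenXml
{
    // Builds the XML by hand; no XmlSerializer is involved.
    public static string Write(string name, int retries)
    {
        XElement root = new XElement("Settings",
            new XElement("Name", name),
            new XElement("Retries", retries));
        return root.ToString(SaveOptions.DisableFormatting);
    }

    // Reads the same shape back using XElement's explicit conversions.
    public static void Read(string xml, out string name, out int retries)
    {
        XElement root = XElement.Parse(xml);
        name = (string)root.Element("Name");
        retries = (int)root.Element("Retries");
    }
}
```

HandWrittenXml.Write("app", 3) produces <Settings><Name>app</Name><Retries>3</Retries></Settings>. Every new field means touching both methods, which is the maintenance cost listed in the cons above.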
Points of Interest
Configure XmlSerializer to show logs and leave temp assemblies + .cs files on disk
In case we want more data about its actions, XmlSerializer can be configured to write a log and to leave its generated files on disk (add this to your app.config / web.config file under the configuration tag):
<system.diagnostics>
  <switches>
    <add name="XmlSerialization.PregenEventLog" value="1" />
    <add name="XmlSerialization.Compilation" value="1" />
  </switches>
</system.diagnostics>
Using these flags lets you get at the XmlSerializer code and temporary assembly, which will be in the temp folder (just type: %temp%) or in the configured target folder.
XmlSerializer will write its log to the Windows event log, so to view it all you need to do is:
- Start menu, Run (WinKey+R):
- eventvwr
- Go to the Application log
To change the target folder of the temp assembly generation, add this to your configuration (app.config/web.config):
<system.xml.serialization>
  <xmlSerializer tempFilesLocation="c:\\foo"/>
</system.xml.serialization>
Discovering if an XmlSerializers.dll is being loaded
You can use fuslogvw to track the DLL loading process:
- Open a Visual Studio command prompt
- Type: fuslogvw
- An application will launch
- Press Settings
- Select "Log all binds to disk" (you can also use "Log bind failures")
- Check "Enable custom log path"
- Add your own path in the textbox (make sure it exists)
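If running fuslogvw is not convenient, a rough in-process alternative (a different technique from the fuslogvw steps above) is to inspect the assemblies loaded into the current AppDomain. This sketch assumes the pre-generated assembly follows the usual YourAssembly.XmlSerializers naming; SerializerAssemblyProbe is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class SerializerAssemblyProbe
{
    // Returns the names of already loaded assemblies that look like
    // pre-generated serializer assemblies (names ending in ".XmlSerializers").
    public static List<string> LoadedSerializerAssemblies()
    {
        List<string> names = new List<string>();
        foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
        {
            string name = assembly.GetName().Name;
            if (name.EndsWith(".XmlSerializers", StringComparison.OrdinalIgnoreCase))
            {
                names.Add(name);
            }
        }
        return names;
    }

    // Logs every assembly loaded from now on, so that serializer
    // assemblies (pre-generated or temporary) show up as they appear.
    public static void LogFutureLoads()
    {
        AppDomain.CurrentDomain.AssemblyLoad += (sender, args) =>
            Console.WriteLine("Loaded: " + args.LoadedAssembly.FullName);
    }
}
```

Call LogFutureLoads() at startup and LoadedSerializerAssemblies() after the first serialization. An empty list suggests that the pre-generated assembly was not picked up and a temp assembly was generated instead.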
Other Tools for the task
There are several tools that try to make life easier when optimizing XML generation in C#, but all of them are quite old (2007-8) and look stale.
If you find anything better online, feel free to tell me about it in the comments. Also, if you have other problems/solutions for sgen issues, tell me about them and
I'll add them here.