Introduction
This article demonstrates a simple means of automating object schema migration using a serialization surrogate and reflection. Or, in other words, what to do when .NET says "Possible Version mismatch. Type [whatever] has x members, number of members deserialized is y."
Background
After years of writing custom ORM-type solutions, or contenting myself with using ADO for business entities, I finally hit my breaking point and decided it was time to enter the shadowy world of object persistence. And so began my first foray into .NET's Serialization namespace.
I was pleasantly surprised by the ease with which I could begin serializing my business objects directly to disk or any persistent store (documented plenty elsewhere). Everything was sunshine and rainbows. I had traded what probably would have been a few days of work setting up tables and mapping fields between tables and objects for just a couple of hours of work implementing serialization, as follows:
[ MyClass.vb ]
<Serializable()> Public Class [MyClass] ' "MyClass" is a VB keyword, so the name must be escaped
    ' ... fields and properties ...

    Public Sub Save()
        Dim util As New CouldBeAnotherClass
        Dim bytes As Byte() = util.SerializeObject(Me)
    End Sub
End Class
[ CouldBeAnotherClass.vb ]
Imports System.IO
Imports System.Runtime.Serialization
Imports System.Runtime.Serialization.Formatters.Binary

Public Class CouldBeAnotherClass
    Public Function SerializeObject(ByVal obj As Object) As Byte()
        Dim stream As New MemoryStream
        Dim bf As New BinaryFormatter
        Dim bytes As Byte()
        bf.Serialize(stream, obj)
        bytes = stream.ToArray
        stream.Close()
        Return bytes
    End Function
End Class
The Problem
Then the rain came and my parade disbanded. I added a field to one of my objects. You know what happened then: the dreaded "Possible Version mismatch. Type [whatever] has x members, number of members deserialized is y." Crap.
To make matters worse, Google wasn't turning up an answer for me (gasp). I found lots of reasons why maybe I should have used an existing product like db4o or Bamboo Prevalence, but those, though easier than ORMish solutions, were still overkill for my purpose. And besides, I wanted to figure this out!
My searches led me to implementing ISerializable on my root objects as a means of creating custom mappings between the serialized fields and the objects' fields, with a custom constructor and GetObjectData(). I was dismayed, though, to be spending time creating the very sorts of mappings I was trying to avoid. If I was going to write all that tedious code, why not just set up traditional database tables and map the fields from there? I felt like I had come full circle. I was now wasting time doing stuff like this:
Imports System.Security.Permissions
Imports System.Runtime.Serialization

<Serializable()> Public Class [MyClass]
    Implements ISerializable

    ' ... fields and properties ...

    Public Sub Save()
        Dim util As New CouldBeAnotherClass
        Dim bytes As Byte() = util.SerializeObject(Me)
    End Sub

    Public Sub New()
    End Sub

    Private Sub New(ByVal info As SerializationInfo, _
                    ByVal context As StreamingContext)
        Me.SomeField = DirectCast(info.GetValue("SomeField", _
            GetType([SomeType])), [SomeType])
        Me.AnotherField = info.GetString("AnotherField")
        ' ... and on and on, with the possibility of conditional mappings ...
    End Sub

    <SecurityPermissionAttribute(SecurityAction.Demand, _
        SerializationFormatter:=True)> _
    Public Sub GetObjectData(ByVal info As SerializationInfo, _
                             ByVal context As StreamingContext) _
                             Implements ISerializable.GetObjectData
        info.AddValue("SomeField", Me.SomeField)
        info.AddValue("AnotherField", Me.AnotherField)
        ' ... and on and on, with the possibility of conditional mappings ...
    End Sub
End Class
But, umbrella in hand, I trudged on, confident there had to be a break in the clouds. I found it in the Bamboo source: some orphaned methods, long left out of that project's test plan, part of the author's early efforts now supplanted by a different approach, but a start for me. Why hadn't Google found that?
The Solution
The current Bamboo approach to object schema migration is to read in an XML file that defines the object and field mappings between what was serialized and the current objects, then use that information to create appropriate initializers called within an implementation of ISerializationSurrogate.SetObjectData. An object implementing ISerializationSurrogate is simply one that does that tedious field mapping on behalf of other objects, so your business objects themselves don't have to implement ISerializable. It's a clever solution, and one that the Java equivalent of Bamboo apparently lacks. But it's a lot more than I need.
I took my direction from another place in the Bamboo source, in a class apparently written prior to the XML mapping approach, and now largely abandoned. It looked something like this (the key method being SetObjectData()):
Imports System.Reflection
Imports System.Runtime.Serialization

Public Class MySurrogate
    Implements ISerializationSurrogate
    Implements ISurrogateSelector

    Private _assemblyToMigrate As System.Reflection.Assembly

    Public Sub New(ByVal assemblyToMigrate As System.Reflection.Assembly)
        _assemblyToMigrate = assemblyToMigrate
    End Sub

    Public Function SetObjectData(ByVal obj As Object, _
            ByVal info As SerializationInfo, _
            ByVal context As StreamingContext, _
            ByVal selector As ISurrogateSelector) As Object _
            Implements ISerializationSurrogate.SetObjectData
        Dim entityType As Type = obj.GetType
        For Each entry As SerializationEntry In info
            ' Match each serialized entry to a field on the target type by name.
            Dim fieldName As String = entry.Name
            Dim members As MemberInfo() = _
                entityType.GetMember(fieldName, MemberTypes.Field, _
                    BindingFlags.NonPublic Or BindingFlags.Public _
                    Or BindingFlags.Instance)
            If members.Length > 0 Then
                Dim newField As FieldInfo = CType(members(0), FieldInfo)
                Dim value As Object = entry.Value
                If Not value Is Nothing Then
                    If Not newField.FieldType.IsInstanceOfType(value) Then
                        value = Convert.ChangeType(value, newField.FieldType)
                    End If
                End If
                newField.SetValue(obj, value)
            End If
        Next
        Return Nothing
    End Function

    Public Sub GetObjectData(ByVal entity As Object, _
            ByVal info As SerializationInfo, _
            ByVal context As StreamingContext) _
            Implements ISerializationSurrogate.GetObjectData
        ' This surrogate is only used for deserialization.
        Throw New NotImplementedException
    End Sub

    Public Function GetSurrogate(ByVal type As System.Type, _
            ByVal context As StreamingContext, _
            ByRef selector As ISurrogateSelector) As ISerializationSurrogate _
            Implements ISurrogateSelector.GetSurrogate
        ' Use this surrogate for every type in the assembly being migrated.
        If type.Assembly Is _assemblyToMigrate Then
            selector = Me
            Return Me
        Else
            selector = Nothing
            Return Nothing
        End If
    End Function

    Public Function GetNextSelector() As ISurrogateSelector _
            Implements ISurrogateSelector.GetNextSelector
        Return Nothing
    End Function

    Public Sub ChainSelector(ByVal selector As _
            System.Runtime.Serialization.ISurrogateSelector) _
            Implements ISurrogateSelector.ChainSelector
        Throw New NotImplementedException("ChainSelector not supported")
    End Sub
End Class
The ISurrogateSelector implementation that you see here is required when constructing the BinaryFormatter (which we'll do next) that is used to serialize and deserialize your business objects, and with which we want to use an ISerializationSurrogate so we can customize the field mappings to avoid version mismatch errors.
An ISurrogateSelector could be used to choose among many ISerializationSurrogate implementations, if your various business objects need different serialization formats. In this case, however, we specifically want to create an ISerializationSurrogate that works with all of our objects, so the ISurrogateSelector is written to return either that surrogate or nothing, based on a trivial condition. As such, subsequent code blocks will omit the ISurrogateSelector implementation, necessary though it is.
Unfortunately, while that MySurrogate class (name changed to protect the innocent) looked promising, it failed when I attempted to deserialize with it, even when testing with object schemas that hadn't actually changed! Before we get into that, I'll show you how to use a surrogate. There was a bit of code above showing how to serialize. Deserializing is just as easy.
Imports System.IO
Imports System.Runtime.Serialization
Imports System.Runtime.Serialization.Formatters.Binary

Public Class CouldBeAnotherClass
    Public Function DeserializeObject(ByVal filename As String, _
            ByVal type As System.Type) As Object
        Dim file As New FileInfo(filename)
        Dim stream As FileStream = file.OpenRead
        Dim selector As New MySurrogate(type.Assembly)
        Dim bf As New BinaryFormatter(selector, _
            New StreamingContext(StreamingContextStates.All))
        Dim obj As Object = bf.Deserialize(stream)
        stream.Close()
        Return obj
    End Function

    Public Function SerializeObject(ByVal obj As Object) As Byte()
        ' ... as above ...
    End Function
End Class
In this case, I'm deserializing from a file, but you could deserialize from memory, from a database field, or a variety of other sources. Here, we are attempting to use the surrogate created above. If we wanted to deserialize without the surrogate, we could omit the declaration of selector and construct the BinaryFormatter without arguments. Once you have a functioning surrogate, using or not using it with a given BinaryFormatter is easy.
Overcoming The New Problem
The problem with our surrogate, the one following a pattern in legacy Bamboo code, is that it only works with very simple objects. If our business object uses a field from a base class, this surrogate will fail. The reason is simply that the Type.GetMember() method does not return the private members of base classes, even though the standard BinaryFormatter has successfully serialized those same members. So, as that code loops through the entries in the deserialized information, it won't find a match in our target object, and that field of our object will be left uninitialized.
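Here is a minimal sketch of that reflection behavior, using hypothetical BaseEntity and DerivedEntity classes (not part of the article's code) to show that a base class's private field is invisible to GetMember() on the derived type, but visible once you walk up to the declaring type:

```vbnet
Imports System.Reflection

Public Class BaseEntity
    Private _baseField As String = "hidden"
End Class

Public Class DerivedEntity
    Inherits BaseEntity
End Class

Module GetMemberDemo
    Sub Main()
        Dim flags As BindingFlags = BindingFlags.NonPublic Or _
            BindingFlags.Public Or BindingFlags.Instance
        Dim t As Type = GetType(DerivedEntity)
        ' Finds nothing: private members of base classes are not returned.
        Console.WriteLine(t.GetMember("_baseField", MemberTypes.Field, flags).Length) ' 0
        ' Walking up to the declaring type finds the field.
        Console.WriteLine(t.BaseType.GetMember("_baseField", _
            MemberTypes.Field, flags).Length) ' 1
    End Sub
End Module
```

That second lookup, against the base type rather than the object's own type, is the heart of the fix developed below.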
One way we might avoid that problem is to make those base class members non-private, such as Protected. Indeed, that will work. Those members are then visible to GetMember() on the derived type, and will get the value from the matching serialization entry. But if, like me, you've created some collections by inheriting from CollectionBase, then you don't have the option of changing the accessibility of its private list. Getting a serialized collection back without any of its members is a bummer. No doubt, this applies to myriad other classes that you might inherit from. So, what to do?
Since I was new to this namespace, the first thing I did was feel much consternation. The whole purpose of these efforts was to come up with a pattern of object persistence that would avoid tedious field mapping. I couldn't seem to get there. So, I did what any good programmer with Intellisense would do, and began hitting "." on the stuff in SetObjectData() to see what options I had. Without detailing those many adventures, I'll get (finally!) to the solution.
Imports System.Reflection
Imports System.Runtime.Serialization

Public Class MySurrogate
    Implements ISerializationSurrogate
    Implements ISurrogateSelector

    ' (Constructor, GetObjectData, and the ISurrogateSelector members
    ' are unchanged from the previous listing.)

    Public Function SetObjectData(ByVal obj As Object, _
            ByVal info As SerializationInfo, _
            ByVal context As StreamingContext, _
            ByVal selector As ISurrogateSelector) As Object _
            Implements ISerializationSurrogate.SetObjectData
        Dim fieldName As String = String.Empty
        Dim entityType As Type
        For Each entry As SerializationEntry In info
            fieldName = entry.Name
            If fieldName.IndexOf("+") <> -1 Then
                ' A name like "BaseClass+field" means the field belongs to a
                ' base class: split the name and walk up the inheritance chain.
                Dim name As String() = fieldName.Split("+".ToCharArray)
                Dim baseType As String = name(0)
                fieldName = name(1)
                entityType = obj.GetType
                Do While entityType.Name <> baseType
                    entityType = entityType.BaseType
                Loop
            Else
                entityType = obj.GetType
            End If
            Dim members As MemberInfo() = _
                entityType.GetMember(fieldName, MemberTypes.Field, _
                    BindingFlags.NonPublic Or BindingFlags.Public _
                    Or BindingFlags.Instance)
            If members.Length > 0 Then
                Dim newField As FieldInfo = CType(members(0), FieldInfo)
                Dim value As Object = entry.Value
                If Not value Is Nothing Then
                    If Not newField.FieldType.IsInstanceOfType(value) Then
                        value = Convert.ChangeType(value, newField.FieldType)
                    End If
                End If
                newField.SetValue(obj, value)
            End If
        Next
        Return Nothing
    End Function
End Class
Why It Works
You can see that this is the same ISerializationSurrogate implementation as above, but with this new block of code within SetObjectData():
fieldName = entry.Name
If fieldName.IndexOf("+") <> -1 Then
    Dim name As String() = fieldName.Split("+".ToCharArray)
    Dim baseType As String = name(0)
    fieldName = name(1)
    entityType = obj.GetType
    Do While entityType.Name <> baseType
        entityType = entityType.BaseType
    Loop
Else
    entityType = obj.GetType
End If
During the many times I stepped through this code in debug mode, I noticed that private fields from base classes always had a .Name of [BaseClass]+[Field] rather than just [Field]. For example, while deserializing my CollectionBase-derived object, I would see CollectionBase+list go by ... and not match anything. Then Intellisense showed me Type.BaseType.
The approach I've arrived at is a bit brute-force, but has worked well for me so far. If I run across a field that belongs to a base class, indicated by the presence of "+", I split out the name of its base type and use that name to drill into my target object's type, with Type.BaseType, until I find the match. This type, and the split-off field name, then become the type and name used by the field matching logic in the remainder of the For loop.
Using The Code
Without doing any performance testing, I have nonetheless imagined that this surrogate is slower than the standard BinaryFormatter for object deserialization. So, I wanted to use this ISerializationSurrogate only when the standard deserialization threw the "Possible Version mismatch" error. I handled it as follows:
Imports System.IO
Imports System.Runtime.Serialization
Imports System.Runtime.Serialization.Formatters.Binary

Public Class MyPersistenceClass
    Public Function Load(ByVal filename As String, ByVal type As System.Type, _
            ByRef schemaChange As Boolean) As Object
        Dim obj As Object
        schemaChange = False
        Dim file As New FileInfo(filename)
        If file.Exists Then
            Dim stream As FileStream = file.OpenRead
            Dim bf As BinaryFormatter
            bf = Me.CreateFormatter()
            Try
                ' First, try the standard formatter...
                obj = bf.Deserialize(stream)
            Catch ex As SerializationException
                ' ...and fall back to the surrogate on a version mismatch.
                stream.Seek(0, SeekOrigin.Begin)
                bf = Me.CreateFormatter(type)
                obj = bf.Deserialize(stream)
                schemaChange = True
            Finally
                stream.Close()
            End Try
        End If
        Return obj
    End Function

    Private Function CreateFormatter(ByVal type _
            As System.Type) As BinaryFormatter
        Dim selector As New MySurrogate(type.Assembly)
        Return New BinaryFormatter(selector, _
            New StreamingContext(StreamingContextStates.All))
    End Function

    Private Function CreateFormatter() As BinaryFormatter
        Dim formatter As New BinaryFormatter
        formatter.Context = _
            New StreamingContext(StreamingContextStates.Persistence)
        Return formatter
    End Function
End Class
As you can see, this example also assumes file-based persistence; any data store would work, though. When Load() is called, it first attempts to deserialize the specified object type from the specified file using the standard BinaryFormatter, created by a private method. For my use, this will work 99.9% of the time, or more. The business object schemas for the project I developed this for change only a couple of times per year.
But if it fails, it loads the alternate BinaryFormatter from the overloaded CreateFormatter(), which uses our ISurrogateSelector to grab our implementation of ISerializationSurrogate. This formatter is then used for a second deserialization attempt.
A Boolean schemaChange variable is passed by reference to Load() so that the calling method can decide what to do when a schema change is detected. In my case, the calling method immediately calls Save() on the object so that the serialized version will subsequently match the new schema.
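That calling pattern might look something like this (Order and its Save() method are hypothetical stand-ins for your own business objects):

```vbnet
Dim persistence As New MyPersistenceClass
Dim schemaChange As Boolean
Dim order As Order = DirectCast( _
    persistence.Load("orders.bin", GetType(Order), schemaChange), Order)
If schemaChange Then
    ' Re-save immediately so the stored bytes match the new schema
    ' and the surrogate isn't needed on the next load.
    order.Save()
End If
```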
Caveats
The ISerializationSurrogate implementation arrived at above will not handle all schema changes. The Bamboo approach is more robust in this regard, handling a range of field mappings with custom initializers. But even it, as far as I can tell, would require writing custom field initializers for many kinds of schema changes.
The changes handled by this implementation are field deletions and additions, and simple type changes. For me, that means it will automatically handle almost every schema change I anticipate. And since a schema change means deploying new assemblies anyway, I can easily, at the same time, add additional logic to ISerializationSurrogate.SetObjectData() to handle any odd field mapping requirements. Such an approach may not be appropriate for your application, and you should certainly consider alternatives if you expect frequent and complex object schema changes.
Conclusion
All this talk around object persistence, and little mention of important features like concurrency and resilience. I do have something working there, a possible follow-up article, but I'm waiting to see if it goes up in flames before I talk of it. Here too, db4o and Bamboo Prevalence have strong solutions, but I didn't wish to deal with a third-party layer, wanted to keep it simple, and wanted to figure it out for myself. So stay tuned.