Automated Object Schema Migration

8 Dec 2004
Use reflection and a serialization surrogate in .NET to automate basic object schema migration.

Introduction

This article demonstrates a simple means of automating object schema migration using a serialization surrogate and reflection. Or, in other words, what to do when .NET says "Possible Version mismatch. Type [whatever] has x members, number of members deserialized is y."

Background

After years of writing custom ORM-type solutions, or contenting myself with using ADO for business entities, I finally hit my breaking point and decided it was time to enter the shadowy world of object persistence. And so began my first foray into .NET's Serialization namespace.

I was pleasantly surprised by how easily I could begin serializing my business objects directly to disk or any persistent store (documented plenty elsewhere). Everything was sunshine and rainbows. I had traded what would probably have been a few days of work, setting up tables and mapping fields between tables and objects, for just a couple of hours implementing serialization, as follows:

[ MyClass.vb ]
' MyClass is a VB keyword, so the class name must be escaped with brackets
<Serializable()> Public Class [MyClass]
    ' ... fields and properties of the business object ...

    Public Sub Save()
        Dim util As New CouldBeAnotherClass
        Dim bytes As Byte() = util.SerializeObject(Me)
        ' now save it to disk, database, cache, whatever
    End Sub
End Class

[ CouldBeAnotherClass.vb ]
Imports System.IO
Imports System.Runtime.Serialization
Imports System.Runtime.Serialization.Formatters.Binary

Public Class CouldBeAnotherClass
    Public Function SerializeObject(ByVal obj As Object) As Byte()
        Dim stream As New MemoryStream ' or a FileStream, any stream
        Dim bf As New BinaryFormatter
        Try
            bf.Serialize(stream, obj)
            ' ToArray copies the whole buffer regardless of the current position
            Return stream.ToArray()
        Finally
            stream.Close()
        End Try
    End Function
End Class

The Problem

Then the rain came and my parade disbanded. I added a field to one of my objects. You know what happened then: the dreaded "Possible Version mismatch. Type [whatever] has x members, number of members deserialized is y." Crap.

To make matters worse, Google wasn't turning up an answer for me (gasp). I found lots of reasons why maybe I should have used an existing product like db4o or Bamboo Prevalence, but those, though easier than ORMish solutions, were still overkill for my purpose. And besides, I wanted to figure this out!

My searches led me to implementing ISerializable on my root objects as a means of creating custom mappings between serialized fields and the objects' fields, with a custom constructor and GetObjectData(). I was dismayed, though, to be spending time creating the very sorts of mappings I was trying to avoid. If I was going to write all that tedious code, why not just set up traditional database tables and map the fields from there? I felt like I had come full circle. I was now wasting time doing stuff like this:

Imports System.Collections
Imports System.Security.Permissions
Imports System.Runtime.Serialization

<Serializable()> Public Class [MyClass]
    Implements ISerializable

    ' ... fields and properties, for example:
    Private SomeField As ArrayList
    Private AnotherField As String

    Public Sub Save()
        Dim util As New CouldBeAnotherClass
        Dim bytes As Byte() = util.SerializeObject(Me)
        ' now save it to disk, database, cache, whatever
    End Sub

    Public Sub New()
        ' an empty constructor for standard object initialization
    End Sub

    Private Sub New(ByVal info As SerializationInfo, _
                    ByVal context As StreamingContext)
        ' a private constructor called automatically by the deserialization
        ' process; Me.SomeField is still Nothing here, so ask for the
        ' declared type with GetType() rather than Me.SomeField.GetType
        Me.SomeField = DirectCast(info.GetValue("SomeField", _
                                  GetType(ArrayList)), ArrayList)
        Me.AnotherField = info.GetString("AnotherField")
        ' ... and on and on, with the possibility of conditional mappings
    End Sub

    <SecurityPermissionAttribute(SecurityAction.Demand, _
     SerializationFormatter:=True)> _
    Public Sub GetObjectData(ByVal info As SerializationInfo, _
        ByVal context As StreamingContext) _
        Implements ISerializable.GetObjectData

        info.AddValue("SomeField", Me.SomeField)
        info.AddValue("AnotherField", Me.AnotherField)
        ' ... and on and on, with the possibility of conditional mappings
    End Sub
End Class

But, umbrella in hand, I trudged on, confident there had to be a break in the clouds. I found it in the Bamboo source: some orphaned methods, long left out of that project's test plan. They were part of the author's early efforts, since supplanted by a different approach, but they gave me a start. Why hadn't Google found that?

The Solution

The current Bamboo approach to object schema migration is to read in an XML file that defines the object and field mappings between what was serialized and the current objects, then use that information to create appropriate initializers called within an implementation of ISerializationSurrogate.SetObjectData. An object implementing ISerializationSurrogate is simply one that does that tedious field mapping on behalf of other objects, so your business objects themselves don't have to implement ISerializable. It's a clever solution, and one that the Java equivalent to Bamboo apparently lacks. But it's a lot more than I need.
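For contrast with the assembly-wide surrogate that follows, here is a minimal sketch of the more conventional arrangement: one surrogate that maps one type's fields by hand, registered with the framework's own SurrogateSelector class. Customer, CustomerSurrogate and RoundTrip are hypothetical names, and the sketch assumes the .NET Framework, where BinaryFormatter is available (it is marked obsolete in .NET 5 and later). Note that Customer is not even marked Serializable; the surrogate does all the work.

```vbnet
Imports System
Imports System.IO
Imports System.Runtime.Serialization
Imports System.Runtime.Serialization.Formatters.Binary

' Hypothetical business class: deliberately NOT marked <Serializable()>
' and not implementing ISerializable.
Public Class Customer
    Public Name As String
    Public Age As Integer
End Class

' Hypothetical surrogate that maps Customer's fields on its behalf.
Public Class CustomerSurrogate
    Implements ISerializationSurrogate

    Public Sub GetObjectData(ByVal obj As Object, ByVal info As SerializationInfo, _
        ByVal context As StreamingContext) _
        Implements ISerializationSurrogate.GetObjectData
        Dim c As Customer = DirectCast(obj, Customer)
        info.AddValue("Name", c.Name)
        info.AddValue("Age", c.Age)
    End Sub

    Public Function SetObjectData(ByVal obj As Object, ByVal info As SerializationInfo, _
        ByVal context As StreamingContext, ByVal selector As ISurrogateSelector) As Object _
        Implements ISerializationSurrogate.SetObjectData
        ' obj is an uninitialized Customer created by the formatter
        Dim c As Customer = DirectCast(obj, Customer)
        c.Name = info.GetString("Name")
        c.Age = info.GetInt32("Age")
        Return c
    End Function
End Class

Public Module SurrogateDemo
    ' Serializes and deserializes a Customer through the surrogate.
    Public Function RoundTrip(ByVal original As Customer) As Customer
        Dim context As New StreamingContext(StreamingContextStates.All)
        ' The framework's SurrogateSelector maps one Type to one surrogate
        Dim selector As New SurrogateSelector
        selector.AddSurrogate(GetType(Customer), context, New CustomerSurrogate)

        Dim bf As New BinaryFormatter(selector, context)
        Dim stream As New MemoryStream
        Try
            bf.Serialize(stream, original)
            stream.Seek(0, SeekOrigin.Begin)
            Return DirectCast(bf.Deserialize(stream), Customer)
        Finally
            stream.Close()
        End Try
    End Function
End Module
```

Registering a hand-written surrogate per type is precisely the per-class mapping effort this article is trying to avoid, which is why the approach below answers for every type in an assembly with one reflective surrogate.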

I took my direction from another place in the Bamboo source, in a class apparently written prior to the XML mapping approach, and now largely abandoned. It looked something like this (the key method being SetObjectData()):

Imports System.Reflection
Imports System.Runtime.Serialization

Public Class MySurrogate
    Implements ISerializationSurrogate
    Implements ISurrogateSelector

    Private _assemblyToMigrate As System.Reflection.Assembly

    Public Sub New(ByVal assemblyToMigrate As System.Reflection.Assembly)
        _assemblyToMigrate = assemblyToMigrate
    End Sub
    
    Function SetObjectData(ByVal obj As Object, ByVal info As SerializationInfo, _
        ByVal context As StreamingContext, _
        ByVal selector As ISurrogateSelector) As Object _
        Implements ISerializationSurrogate.SetObjectData

        Dim entityType As Type = obj.GetType

        For Each entry As SerializationEntry In info
            Dim members As MemberInfo() = _
                entityType.GetMember(entry.Name, MemberTypes.Field, _
                BindingFlags.NonPublic Or BindingFlags.Public _
                Or BindingFlags.Instance)

            If members.Length > 0 Then
                Dim newField As FieldInfo = CType(members(0), FieldInfo)
                Dim value As Object = entry.Value
                If Not value Is Nothing Then
                    If Not newField.FieldType.IsInstanceOfType(value) Then
                        value = Convert.ChangeType(value, newField.FieldType)
                    End If
                End If
                newField.SetValue(obj, value)
            End If
        Next
        Return Nothing
    End Function
    
    Sub GetObjectData(ByVal entity As Object, _
        ByVal info As SerializationInfo, _
        ByVal context As StreamingContext) Implements _
        ISerializationSurrogate.GetObjectData

        Throw New NotImplementedException
    End Sub

    Function GetSurrogate(ByVal type As System.Type, _
        ByVal context As StreamingContext, _
        ByRef selector As ISurrogateSelector) As ISerializationSurrogate _
        Implements ISurrogateSelector.GetSurrogate

        If type.Assembly Is _assemblyToMigrate Then
            selector = Me
            Return Me
        Else
            selector = Nothing
            Return Nothing
        End If
    End Function
    
    Function GetNextSelector() As ISurrogateSelector _
             Implements ISurrogateSelector.GetNextSelector
        Return Nothing
    End Function

    Sub ChainSelector(ByVal selector As _
        System.Runtime.Serialization.ISurrogateSelector) _
        Implements ISurrogateSelector.ChainSelector

        Throw New NotImplementedException("ChainSelector not supported")
    End Sub
End Class

The ISurrogateSelector implementation you see here is needed because a BinaryFormatter doesn't accept an ISerializationSurrogate directly. Its constructor (which we'll call next) takes an ISurrogateSelector, and the formatter asks that selector for a surrogate for each type it encounters. That is how our ISerializationSurrogate gets into the deserialization process, where it can customize the field mappings to avoid version mismatch errors.

An ISurrogateSelector could be used to choose among many ISerializationSurrogate implementations, if your various business objects need different serialization formats. In this case, however, we want a single ISerializationSurrogate that works with all of our objects, so the ISurrogateSelector returns either that surrogate or nothing, depending only on whether the requested type belongs to the assembly being migrated. Subsequent code blocks therefore omit the ISurrogateSelector implementation, necessary though it is.
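If you did need several surrogates, the selector can delegate rather than throw from ChainSelector. The following is a hypothetical variant (AssemblySurrogateSelector is an illustrative name, not Bamboo's class or the one above): it returns one surrogate for every type in a given assembly and hands any other type to a chained selector, such as the framework's SurrogateSelector holding per-type surrogates.

```vbnet
Imports System
Imports System.Runtime.Serialization

' Hypothetical selector: one surrogate for a whole assembly, with every
' other type deferred to an optional chained selector.
Public Class AssemblySurrogateSelector
    Implements ISurrogateSelector

    Private _assembly As System.Reflection.Assembly
    Private _surrogate As ISerializationSurrogate
    Private _next As ISurrogateSelector

    Public Sub New(ByVal targetAssembly As System.Reflection.Assembly, _
                   ByVal surrogate As ISerializationSurrogate)
        _assembly = targetAssembly
        _surrogate = surrogate
    End Sub

    Public Sub ChainSelector(ByVal selector As ISurrogateSelector) _
        Implements ISurrogateSelector.ChainSelector
        _next = selector
    End Sub

    Public Function GetNextSelector() As ISurrogateSelector _
        Implements ISurrogateSelector.GetNextSelector
        Return _next
    End Function

    Public Function GetSurrogate(ByVal requestedType As Type, _
        ByVal context As StreamingContext, _
        ByRef selector As ISurrogateSelector) As ISerializationSurrogate _
        Implements ISurrogateSelector.GetSurrogate

        If requestedType.Assembly Is _assembly Then
            selector = Me
            Return _surrogate
        End If
        If Not _next Is Nothing Then
            ' Let the next selector in the chain decide; it sets selector itself
            Return _next.GetSurrogate(requestedType, context, selector)
        End If
        selector = Nothing
        Return Nothing
    End Function
End Class
```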

Unfortunately, while that MySurrogate class (name changed to protect the innocent) looked promising, it failed when I attempted to deserialize with it, even when testing with object schemas that hadn't actually changed! Before we get into that, I'll show you how to use a surrogate. There was a bit of code above showing how to serialize; deserializing is just as easy.

Imports System.IO
Imports System.Runtime.Serialization
Imports System.Runtime.Serialization.Formatters.Binary

Public Class CouldBeAnotherClass

    Public Function DeserializeObject(ByVal file As FileInfo, _
                                      ByVal type As System.Type) As Object
        Dim stream As FileStream = file.OpenRead
        Dim selector As New MySurrogate(type.Assembly)
        Dim bf As New BinaryFormatter(selector, _
                  New StreamingContext(StreamingContextStates.All))
        Try
            Return bf.Deserialize(stream)
        Finally
            stream.Close()
        End Try
    End Function

    Public Function SerializeObject(ByVal obj As Object) As Byte()
        ' ... as above ...
    End Function
End Class

In this case, I'm deserializing from a file passed in as a FileInfo, but you could deserialize from memory, from a database field, or from a variety of other sources. Here, we attempt to use the surrogate created above. To deserialize without our surrogate, we could skip creating selector and construct the BinaryFormatter without arguments. Once you have a functioning surrogate, using or not using it with a given BinaryFormatter is easy.

Overcoming The New Problem

The problem with our surrogate, the one following a pattern in legacy Bamboo code, is that it only works with very simple objects. If our business object inherits a private field from a base class, this surrogate fails. The standard BinaryFormatter does serialize those private base-class fields, but Type.GetMember() on the derived type does not return private members declared in its base classes. So, as the code loops through the entries in the deserialized information, it finds no match in our target object, and that field of our object is left uninitialized.
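To see this visibility rule for yourself, the hypothetical sketch below runs the same GetMember lookup the surrogate uses against a derived class whose base declares one Private and one Protected field. BaseEntity, DerivedEntity and CountFieldMatches are illustrative names, not part of the article's code.

```vbnet
Imports System
Imports System.Reflection

' Hypothetical classes used only for this demonstration.
Public Class BaseEntity
    Private _hidden As Integer      ' private: not returned for DerivedEntity
    Protected _visible As Integer   ' protected: returned for DerivedEntity
End Class

Public Class DerivedEntity
    Inherits BaseEntity
End Class

Public Module VisibilityDemo
    ' Counts field matches using the same flags as the surrogate's lookup.
    Public Function CountFieldMatches(ByVal t As Type, _
                                      ByVal fieldName As String) As Integer
        Dim members As MemberInfo() = t.GetMember(fieldName, MemberTypes.Field, _
            BindingFlags.NonPublic Or BindingFlags.Public Or BindingFlags.Instance)
        Return members.Length
    End Function
End Module
```

CountFieldMatches(GetType(DerivedEntity), "_hidden") finds nothing, while the same call on GetType(BaseEntity) finds the field, and the Protected field is found from either type.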

One way we might avoid that problem is to make those base class members non-private, such as Protected. Indeed, that will work: those members are then visible to GetMember() on the derived type, and will get their values from the matching serialization entries. But if, like me, you've created some collections by inheriting from CollectionBase, you don't have the option of changing the accessibility of its private list. Getting a serialized collection back without any of its members is a bummer. No doubt this applies to myriad other classes that you might inherit from. So, what to do?

Since I was new to this namespace, the first thing I did was feel much consternation. The whole purpose of these efforts was to come up with a pattern of object persistence that would avoid tedious field mapping. I couldn't seem to get there. So, I did what any good programmer with Intellisense would do, and began hitting "." on the stuff in SetObjectData() to see what options I had. Without detailing those many adventures, I'll get (finally!) to the solution.

Imports System.Reflection
Imports System.Runtime.Serialization

Public Class MySurrogate
    Implements ISerializationSurrogate
    Implements ISurrogateSelector

    Function SetObjectData(ByVal obj As Object, _
             ByVal info As SerializationInfo, _
             ByVal context As StreamingContext, _
             ByVal selector As ISurrogateSelector) As Object _
             Implements ISerializationSurrogate.SetObjectData

        Dim fieldName As String = String.Empty
        Dim entityType As Type

        For Each entry As SerializationEntry In info
            ' for each member that was serialized,
            ' get the matching member in the new type
            fieldName = entry.Name
            If fieldName.IndexOf("+") <> -1 Then
                ' serialized field comes from a base class
                Dim name As String() = fieldName.Split("+".ToCharArray)
                Dim baseType As String = name(0)

                fieldName = name(1)
                entityType = obj.GetType

                ' drill into base classes until the type is found,
                ' or until we run out of base classes
                Do While Not entityType Is Nothing _
                         AndAlso entityType.Name <> baseType
                    entityType = entityType.BaseType
                Loop
            Else
                entityType = obj.GetType
            End If

            If Not entityType Is Nothing Then
                Dim members As MemberInfo() = _
                    entityType.GetMember(fieldName, MemberTypes.Field, _
                    BindingFlags.NonPublic Or BindingFlags.Public _
                    Or BindingFlags.Instance)

                If members.Length > 0 Then
                    ' entity has a member matching the serialized info
                    Dim newField As FieldInfo = CType(members(0), FieldInfo)
                    Dim value As Object = entry.Value
                    If Not value Is Nothing Then
                        ' null values need no type conversion
                        If Not newField.FieldType.IsInstanceOfType(value) Then
                            ' convert type if changed in new member
                            value = Convert.ChangeType(value, newField.FieldType)
                        End If
                    End If
                    newField.SetValue(obj, value)
                End If
            End If
        Next
        Return Nothing
    End Function

    ' Constructor, GetObjectData and the ISurrogateSelector
    ' implementations are as in the previous listing (not shown)

End Class

Why It Works

You can see that this is the same ISerializationSurrogate implementation as above, but with this new block of code within SetObjectData():

fieldName = entry.Name
If fieldName.IndexOf("+") <> -1 Then
    Dim name As String() = fieldName.Split("+".ToCharArray)
    Dim baseType As String = name(0)

    fieldName = name(1)
    entityType = obj.GetType

    Do While entityType.Name <> baseType
        entityType = entityType.BaseType
    Loop
Else
    entityType = obj.GetType
End If

During the many times I stepped through this code in debug mode, I noticed that private fields from base classes always had a .Name of [BaseClass]+[Field] rather than just [Field]. For example, while deserializing my CollectionBase-derived object, I would see CollectionBase+list go by ... and not match anything. Then Intellisense showed me Type.BaseType.
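You can observe the same naming without a debugger. FormatterServices.GetSerializableMembers returns the members the formatter writes for a type, and, as far as I can tell from the .NET Framework behavior described above, private fields inherited from a base class come back named BaseClass+field. ListHolder, CustomerList and SerializedFieldNames are hypothetical stand-ins for CollectionBase and a derived collection.

```vbnet
Imports System
Imports System.Reflection
Imports System.Runtime.Serialization

' Hypothetical types, loosely modeled on the CollectionBase case.
<Serializable()> Public Class ListHolder
    Private list As Integer = 0
End Class

<Serializable()> Public Class CustomerList
    Inherits ListHolder
    Private owner As String = "demo"
End Class

Public Module NameDemo
    ' Returns the names under which the formatter records each field.
    Public Function SerializedFieldNames(ByVal t As Type) As String()
        Dim members As MemberInfo() = FormatterServices.GetSerializableMembers(t)
        Dim names(members.Length - 1) As String
        For i As Integer = 0 To members.Length - 1
            names(i) = members(i).Name
        Next
        Return names
    End Function
End Module
```

For CustomerList, the result should contain both "owner" and "ListHolder+list", the same pattern as CollectionBase+list.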

The approach I've arrived at is a bit brute-force, but has worked well for me so far. If I run across a field that belongs to a base class, indicated by the presence of "+", I split out the name of its base type and use that name to drill into my target object type, with Type.BaseType, until I find the match. This type, and the split-off field name, then become the type and name used by the field matching logic in the remainder of the For-loop.
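The drill-down can also be factored into a standalone helper. This is a sketch rather than the article's code: ResolveField and the two demo classes are hypothetical. It walks Type.BaseType looking for the named class, uses BindingFlags.DeclaredOnly so it picks the field declared on that exact class, and returns Nothing when it runs out of base classes, which is what happens after a base class has been renamed or removed.

```vbnet
Imports System
Imports System.Reflection

Public Module FieldResolver
    Private Const Flags As BindingFlags = _
        BindingFlags.Public Or BindingFlags.NonPublic Or BindingFlags.Instance

    ' Maps a serialized entry name ("field" or "BaseClass+field") to a field
    ' on targetType or one of its base classes; Nothing if nothing matches.
    Public Function ResolveField(ByVal targetType As Type, _
                                 ByVal entryName As String) As FieldInfo
        Dim plus As Integer = entryName.IndexOf("+"c)
        If plus = -1 Then
            ' declared on the type itself, or inherited and non-private
            Return targetType.GetField(entryName, Flags)
        End If

        Dim declaringName As String = entryName.Substring(0, plus)
        Dim fieldName As String = entryName.Substring(plus + 1)
        Dim current As Type = targetType
        Do While Not current Is Nothing
            If current.Name = declaringName Then
                Return current.GetField(fieldName, Flags Or BindingFlags.DeclaredOnly)
            End If
            current = current.BaseType
        Loop
        Return Nothing ' the named base class is no longer in the hierarchy
    End Function
End Module

' Hypothetical classes for trying the helper out.
Public Class ResolverBase
    Private count As Integer
End Class

Public Class ResolverDerived
    Inherits ResolverBase
    Private label As String
End Class
```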

Using The Code

I haven't done any performance testing, but since this surrogate uses reflection to look up and set every field, I assume it is slower than standard BinaryFormatter deserialization. So I wanted to use this ISerializationSurrogate only when standard deserialization threw the "Possible Version mismatch" error. I handled it as follows:

Imports System.IO
Imports System.Runtime.Serialization
Imports System.Runtime.Serialization.Formatters.Binary

Public Class MyPersistenceClass

    Public Function Load(ByVal filename As String, ByVal type As System.Type, _
        ByRef schemaChange As Boolean) As Object

        Dim obj As Object = Nothing ' stays Nothing if the file doesn't exist
        schemaChange = False

        Dim file As New FileInfo(filename)
        If file.Exists Then
            Dim stream As FileStream = file.OpenRead
            Dim bf As BinaryFormatter

            bf = Me.CreateFormatter()

            Try
                obj = bf.Deserialize(stream)
            Catch ex As SerializationException
                ' standard deserialization didn't work,
                ' so rewind and attempt schema migration

                stream.Seek(0, SeekOrigin.Begin)
                bf = Me.CreateFormatter(type)
                obj = bf.Deserialize(stream)
                schemaChange = True
            Finally
                stream.Close()
            End Try
        End If

        Return obj
    End Function
    
    Private Function CreateFormatter(ByVal type _
                     As System.Type) As BinaryFormatter
        Dim selector As New MySurrogate(type.Assembly)
        Return New BinaryFormatter(selector, _
            New StreamingContext(StreamingContextStates.All))
    End Function

    Private Function CreateFormatter() As BinaryFormatter
        Dim formatter As New BinaryFormatter
        formatter.Context = _
            New StreamingContext(StreamingContextStates.Persistence)
        Return formatter
    End Function
End Class

As you can see, this example also assumes file-based persistence. Any data store would work, though. When Load() is called, it first attempts to deserialize the specified object type from the specified file using the standard BinaryFormatter, created by a private method. For my use, this will work 99.9% of the time, or more. The business object schemas for the project I developed this for change only a couple of times per year.

But if it fails, Load() rewinds the stream and creates an alternate BinaryFormatter with the overloaded CreateFormatter(type), which passes our ISurrogateSelector to the formatter so that it picks up our ISerializationSurrogate. This formatter is then used for a second deserialization attempt.

A Boolean schemaChange argument is passed by reference to Load() so that the calling method can decide what to do when a schema change is detected. In my case, the calling method immediately calls Save on the object, so that the stored copy matches the new schema from then on.

Caveats

The ISerializationSurrogate implementation arrived at above will not handle all schema changes. The Bamboo approach is more robust in this regard, handling a range of field mappings with custom initializers. But it too, as far as I can tell, would require writing custom field initializers for many kinds of schema changes.

The changes handled by this implementation are field deletions, field additions, and simple type changes that Convert.ChangeType can perform (Integer to Long, for example). A renamed field, however, looks like a deletion plus an addition, so its old value is dropped. For me, that means it will automatically handle almost every schema change I anticipate. And since a schema change means deploying new assemblies anyway, I can at the same time add logic to ISerializationSurrogate.SetObjectData() to handle any odd field mapping requirements. Such an approach may not be appropriate for your application, and you should certainly consider alternatives if you expect frequent and complex object schema changes.
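Which type changes count as "simple" comes down to what Convert.ChangeType accepts. This hypothetical probe (CanMigrate is an illustrative name) attempts the same conversion SetObjectData performs and reports whether it would succeed. Keep in mind that in the surrogate as written, a failed conversion throws out of SetObjectData, and because the second Deserialize() in Load() sits inside the Catch block, that exception propagates to Load()'s caller.

```vbnet
Imports System

Public Module ConversionProbe
    ' Hypothetical helper: can Convert.ChangeType, the call the surrogate
    ' uses for changed field types, turn value into targetType?
    Public Function CanMigrate(ByVal value As Object, ByVal targetType As Type) As Boolean
        Try
            Convert.ChangeType(value, targetType)
            Return True
        Catch ex As InvalidCastException  ' no IConvertible path, e.g. a custom class
            Return False
        Catch ex As FormatException       ' e.g. "abc" to Integer
            Return False
        Catch ex As OverflowException     ' e.g. a Long too large for Integer
            Return False
        End Try
    End Function
End Module
```

So a field that changed from Integer to Long or String migrates, while one that changed to an unrelated class type, or whose old values don't fit the new type, does not.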

Conclusion

All this talk around object persistence, and little mention of important features like concurrency and resilience. I do have something working there, a possible follow-up article, but I'm waiting to see if it goes up in flames before I talk of it. Here too, db4o and Bamboo Prevalence have strong solutions, but I didn't wish to deal with a third-party layer, wanted to keep it simple, and wanted to figure it out for myself. So stay tuned.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
