R language S4Object Serialization to .NET Object

24 Mar 2015
A simple wrapper that automatically does the data conversion job in VB/C# and R hybrid programming. It makes my laboratory research work a lot happier!

Download Links

The entire source code of the Shoal Shell language can be downloaded from the SourceForge SVN server:

svn checkout svn://svn.code.sf.net/p/shoal/Source/ shoal-Source

Testing example source code in this article:

Related links about Shoal Shell and hybrid programming:

Introduction

Hybrid scripting between VB and the R language is painful when you need to read the calculation results of an R expression, so I wanted to develop a simple wrapper that does this data conversion automatically.

In my recent laboratory research, I wanted to analyse the gene expression regulation signal in real-time gene chip data from a virtual cell. The R wavelets library does this job perfectly, and the code in this article makes the hybrid programming simple and pleasant.

The VB/R hybrid programming example in this article uses wavelet analysis to show how the gene expression regulation signal changes in a bacterial genome.

Picture 1. Overview of the steps in VB/C#/R hybrid programming

Using the Code

Overview of the hybrid programming steps:

1. Create mapping between the .NET class object property and the S4Object attribute

2. R expression evaluation

3. Serialize the R symbolic expression into a .NET object instance.

That's it: just 3 simple steps for hybrid programming between VB/C# and the R language. Let's go through them step by step.

1. Create Mapping Between the .NET Class Object Property and the S4Object Attribute

This step creates the schema mapping between the R object and your .NET object. It works the same way as XML serialization: before you can create an XML document through XML serialization, you define a class that describes the XML document format, and only after that type definition can you create the document.

The steps here are the same as for XML serialization; the only difference is that the R object serialization uses a different custom attribute.

Before we create the mapping, let's look at the types in the R language.

In my opinion, R objects can be divided into 3 types:

  1. S4Object. An s4object is just like a class object in a .NET language. A property of a .NET object corresponds to an s4object attribute (or slot) in R. The main job of the code in this article is to implement the mapping between our .NET class object and the R s4object.
  2. Function. A function object in R is just like a lambda expression or delegate in a .NET language, and the declaration of a function in R looks much like a lambda expression declaration in .NET.
  3. Generic vector. The generic vector is the most used object in R because almost all objects in R are vectors. Like an array or list in .NET, a vector can be a property (or attribute) of an s4object, and it can also hold a collection of s4objects.

So, as you can see, a class object in .NET corresponds to an s4object in R, and the mapping created in this step is defined on the class properties. The mapping between an s4object attribute and a .NET class property uses DataFrameColumnAttribute, which lives in the namespace Microsoft.VisualBasic.ComponentModel.DataSourceModel. As the class definition of this custom attribute shows, it can only be applied to a property or a field:

Namespace ComponentModel.DataSourceModel

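    ''' <summary>
    ''' Represents a column of certain data frames. The mapping between two schemas 
    ''' can also be represented by this attribute.
    ''' (This object can also be used to map properties between two data sources. When a
    ''' mapping has no column name, the property name is used as the column name, so take
    ''' care to maintain the mapping when you rename a property that has no preset column name.)
    ''' </summary>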
    <AttributeUsage(AttributeTargets.[Property] Or AttributeTargets.Field, Inherited:=True, 
     AllowMultiple:=False)> _
    Public Class DataFrameColumnAttribute : Inherits Attribute     

Here is example code that creates the mapping using this attribute:

Imports Microsoft.VisualBasic.ComponentModel.DataSourceModel

    Public Class Filter
        <DataFrameColumn> Public Property L As Integer
        <DataFrameColumn("level")> Public Property level As Integer
        <DataFrameColumn("h")> Public Property h As Double()
        <DataFrameColumn("g")> Public Property g As Double()
        <DataFrameColumn("wt.class")> Public Property wtclass As String
        <DataFrameColumn("wt.name")> Public Property wtname As String
        <DataFrameColumn("transform")> Public Property transform As String
        <DataFrameColumn("class")> Public Property [class] As String
    End Class

As you can see, the first property:

<DataFrameColumn> Public Property L As Integer

has no column name in its mapping, so when the mapping is created the serializer automatically uses the property name as the mapping name.

The mapping takes a name because some attribute names in an R s4object are illegal in .NET. For example, wt.class is not allowed as a .NET property name, so you give the R name to the DataFrameColumn mapping attribute and pick a legal property name on the .NET side.
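The body of DataFrameColumnAttribute.LoadMapping is not shown in this article, so the following is only a minimal sketch of how such a name mapping could be built with reflection. ColumnNameAttribute, FilterSketch and MappingSketch are hypothetical stand-ins invented for this illustration; they are not part of the Shoal Shell source.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Reflection

' Hypothetical stand-in for DataFrameColumnAttribute (illustration only).
<AttributeUsage(AttributeTargets.[Property] Or AttributeTargets.Field, AllowMultiple:=False)>
Public Class ColumnNameAttribute
    Inherits Attribute

    Public ReadOnly Name As String

    Public Sub New()
        Me.New("")
    End Sub

    Public Sub New(columnName As String)
        Me.Name = columnName
    End Sub
End Class

Public Class FilterSketch
    <ColumnName> Public Property L As Integer
    <ColumnName("wt.class")> Public Property wtclass As String
End Class

Public Module MappingSketch

    ' Returns slot name -> property. When the attribute carries no explicit
    ' name, the property name itself is used as the slot name.
    Public Function LoadMapping(targetType As Type) As Dictionary(Of String, PropertyInfo)
        Dim mapping As New Dictionary(Of String, PropertyInfo)

        For Each prop As PropertyInfo In targetType.GetProperties()
            Dim attr = TryCast(Attribute.GetCustomAttribute(prop, GetType(ColumnNameAttribute)),
                               ColumnNameAttribute)
            If attr Is Nothing Then Continue For   ' property is not mapped

            Dim slotName As String = If(String.IsNullOrEmpty(attr.Name), prop.Name, attr.Name)
            mapping(slotName) = prop
        Next

        Return mapping
    End Function

    Public Sub Main()
        For Each entry In LoadMapping(GetType(FilterSketch))
            Console.WriteLine("{0} -> {1}", entry.Key, entry.Value.Name)
        Next
    End Sub
End Module
```

With this sketch, the R slot wt.class resolves to the wtclass property, while L resolves to itself because its attribute has no name.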

2. R Expression Evaluation

We are going to get the result from R using RDotNET. In my opinion, this library is the best solution for hybrid programming between VB/C# .NET and the R language.

You can download the RDotNET library from its CodePlex home page.

Hybrid programming between a .NET language and R takes just two simple steps.

First, start the R engine services, for example:

If Not String.IsNullOrEmpty(R_HOME) Then
    Wavelets.R = RDotNET.REngine.StartEngineServices(R_HOME)
Else
    Wavelets.R = RDotNET.REngine.StartEngineServices
End If

Call Wavelets.R.Library(PackageName:="wavelets")

Starting the R engine services needs an R_HOME value, which is the directory where your R program is installed, such as the default location used by the R installer:

C:\Program Files\R\R-3.1.3\bin

If R is properly installed on your computer, RDotNET can find R_HOME automatically from the registry value written by the R installer, so you can simply call the parameterless version of RDotNET.REngine.StartEngineServices to create the instance. If not, use RDotNET.REngine.StartEngineServices(R_HOME) to set the R install location manually.

After you have created an R engine services instance using RDotNET, you can start coding in your .NET program. One thing to note is that many analysis programs in R are not included in the base package, so before running your program you should install the required R package from the R terminal. Once the package is installed successfully, you can use the Library function of the REngine to load it:

Call Wavelets.R.Library(PackageName:="wavelets")

Alternatively, you can put this step in the script itself:

Dim STDOUT = Wavelets.R <= "library(""wavelets"")"

Then you can simply invoke the R calculation using the R.Evaluate function. This function returns the RDotNET symbolic expression object, which exposes the R memory to your .NET program. Unlike R.Evaluate, the <= operator returns the STDOUT string collection that was displayed on the terminal console.

3. Serialize the R Symbolic Expression into a .NET Object Instance

In this step, we serialize an RDotNET symbolic expression into a .NET object with just one statement, which keeps your hybrid programming with R simple and happy :-).

We assume that you have properly created the mapping class in your program and have obtained a result value from the R evaluation. The serialization is then as simple as the operation shown below:

Dim Result = RDotNET.Extensions.ShellScriptAPI.Serialization.
             LoadFromStream(Of Wavelets.Waveletmodwt)(TestResultRS4Object)

How Does This Code Work?

This serialization operation can be found in the namespace RDotNET.Extensions.ShellScriptAPI.Serialization. There are two interfaces for invoking it:

Imports RDotNET.SymbolicExpressionExtension

''' <summary>
''' Convert the R object into a .NET object from the specific type schema information.
''' (Converts the in-memory object data in R into the specified object entity in .NET.)
''' </summary>
''' <remarks></remarks>
Public Module Serialization

    ''' <summary>
    ''' Deserialize the R object into a specific .NET object. 
    ''' <see cref="RDotNET.SymbolicExpression"></see>  =====> <see cref="T"></see>
    ''' </summary>
    ''' <typeparam name="T"></typeparam>
    ''' <param name="RData"></param>
    ''' <returns></returns>
    ''' <remarks>
    ''' Deserialization rules:
    ''' 1. A slot in the S4 object is a property of the object type.
    ''' 2. Every object property is represented as an array.
    ''' </remarks>
    Public Function LoadFromStream(Of T As Class)(RData As RDotNET.SymbolicExpression) As T
        Dim value As Object = InternalLoadFromStream(RData, GetType(T))
        Return DirectCast(value, T)
    End Function

    ''' <summary>
    ''' Needs your manual type casting in your program.
    ''' </summary>
    ''' <param name="RData"></param>
    ''' <param name="Type"></param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Public Function LoadRStream(RData As RDotNET.SymbolicExpression, Type As Type) As Object
        Dim value As Object = InternalLoadFromStream(RData, Type)
        Return value
    End Function  

Since an s4object in R may have vectors among its attributes, and the elements of such a vector may themselves be s4objects, the serialization of an s4object is a recursive operation. The recursion starts from this function:

''' <summary>
''' Load the R symbolic expression data recursively, starting from here.
''' </summary>
''' <param name="RData"></param>
''' <param name="TypeInfo"></param>
''' <returns></returns>
''' <remarks></remarks>
Private Function InternalLoadFromStream(RData As RDotNET.SymbolicExpression, _
      TypeInfo As System.Type) As Object
    Select Case RData.Type

        Case Internals.SymbolicExpressionType.S4

            'Load the R symbolic expression data recursively, starting from here.
            Return InternalLoadS4Object(RData, TypeInfo)

        Case Internals.SymbolicExpressionType.LogicalVector
            Return RData.AsLogical.ToArray
        Case Internals.SymbolicExpressionType.CharacterVector
            Return RData.AsCharacter.ToArray
        Case Internals.SymbolicExpressionType.IntegerVector
            Return RData.AsInteger.ToArray
        Case Internals.SymbolicExpressionType.NumericVector
            Return RData.AsNumeric.ToArray
        Case Internals.SymbolicExpressionType.List
            Return InternalCreateMatrix(RData, TypeInfo)

        Case Else
            Throw New NotImplementedException

    End Select

End Function

As you can see in this function, if the R object is an s4object, the program continues the operation recursively; if the object is an elementary type, the function exits the recursion and returns the value. This serializer only reads the simple .NET data types: Boolean, String, Integer, Double and Object(). Other data types, such as a function in R (a lambda expression in .NET), are not handled by this function and end up in the Case Else branch, which throws NotImplementedException, because we don't know how to save such data to the filesystem.

If the object we are mapping is an s4object in R, we enter the recursive step:

Case Internals.SymbolicExpressionType.S4

    'Load the R symbolic expression data recursively, starting from here.
    Return InternalLoadS4Object(RData, TypeInfo)

''' <summary>
''' The recursive operation of the S4Object in R starts from here.
''' This recursive operation will stop when the property value is not a S4Object.
''' (This may recurse until the R type of every property is no longer an S4 object.)
''' </summary>
''' <param name="RData"></param>
''' <returns></returns>
''' <remarks></remarks>
Private Function InternalLoadS4Object(RData As RDotNET.SymbolicExpression, _
       TypeInfo As System.Type) As Object
    Dim Mappings = Microsoft.VisualBasic.ComponentModel.DataSourceModel.
                      DataFrameColumnAttribute.LoadMapping(TypeInfo)
    Dim obj As Object = Activator.CreateInstance(TypeInfo)

    Call Console.WriteLine("[DEBUG] {0}  ---> R.S4Object (""{1}"")", _
                TypeInfo.FullName, String.Join("; ", RData.GetAttributeNames))

    For Each Slot In Mappings
        Dim RSlot As RDotNET.SymbolicExpression = RData.GetAttribute(Slot.Key.Name)
        Dim value As Object = InternalLoadFromStream(RSlot, Slot.Value.PropertyType)

        Call InternalValueMapping(value, Slot.Value, obj:=obj)
    Next

    Return obj
End Function

In this step, we first load the mapping using:

Dim Mappings = Microsoft.VisualBasic.ComponentModel.DataSourceModel.
               DataFrameColumnAttribute.LoadMapping(TypeInfo)

Then we create an instance of the target mapping type to hold the data:

Dim obj As Object = Activator.CreateInstance(TypeInfo)

Since an attribute of an S4Object corresponds to a .NET class property, once we have loaded the mapping from the metadata in the schema definition of the target type, we can load the data from the R expression for each property of our class. The For loop performs these steps:

  1. Get the specific attribute of the S4Object as the data source for the mapping:
    Dim RSlot As RDotNET.SymbolicExpression = RData.GetAttribute(Slot.Key.Name)
    
  2. Continue the deserialization of the R expression recursively:
    Dim value As Object = InternalLoadFromStream(RSlot, Slot.Value.PropertyType)
    
  3. Finally, with the value in .NET format, assign it to the property using reflection:
    Call InternalValueMapping(value, Slot.Value, obj:=obj)
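The recursion in the loop above can be tried out without an R installation. The following is a minimal sketch under a stated assumption: a Dictionary(Of String, Object) plays the role of an S4 object (slot name to value) and plain arrays play the role of R vectors. InnerModel, OuterModel and RecursiveLoadSketch are hypothetical names made up for this illustration, and the slot names are matched directly against property names rather than through a mapping attribute.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Reflection

Public Class InnerModel
    Public Property name As String()
End Class

Public Class OuterModel
    Public Property values As Double()
    Public Property child As InnerModel
End Class

Public Module RecursiveLoadSketch

    ' Stand-in rule: a Dictionary is treated like an S4 object, anything else
    ' is treated as an already converted elementary vector.
    Public Function Load(data As Object, targetType As Type) As Object
        Dim slots = TryCast(data, Dictionary(Of String, Object))
        If slots Is Nothing Then Return data   ' elementary value: the recursion ends here

        Dim result As Object = Activator.CreateInstance(targetType)

        For Each prop As PropertyInfo In targetType.GetProperties()
            If Not slots.ContainsKey(prop.Name) Then Continue For

            ' Recurse with the property type as the schema of the nested value.
            Dim value As Object = Load(slots(prop.Name), prop.PropertyType)
            prop.SetValue(result, value, Nothing)
        Next

        Return result
    End Function

    Public Sub Main()
        Dim innerSlots As New Dictionary(Of String, Object) From {
            {"name", New String() {"haar"}}
        }
        Dim outerSlots As New Dictionary(Of String, Object) From {
            {"values", New Double() {1.0, 2.0}},
            {"child", innerSlots}
        }

        Dim loaded = DirectCast(Load(outerSlots, GetType(OuterModel)), OuterModel)
        Console.WriteLine(loaded.child.name(0))   ' prints haar
    End Sub
End Module
```

The nested dictionary is converted by the same function that converts the outer one, which is the same shape as InternalLoadFromStream calling InternalLoadS4Object for each slot.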
    

The matrix value cannot be directly assigned using reflection.

As you can see in the previous steps, the value we get from the serialization mapping is not assigned directly to the property; a function does this job instead. This is because a matrix object in R is mapped as an array of object arrays. What we get from R is in fact an Object() (every data type in .NET inherits from Object, so the rows can always be held as Object), so the .NET program sees the matrix as an object array, not as an array of arrays of a specific type. When we assign this value directly to the property, the program crashes!

Picture 2. How an R matrix is converted to an object array

In the end, we get an Object() whose elements are of type Double(), not the Double()() matrix we want, and this causes the exception. So we use this function:
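The failure is easy to reproduce in isolation. The sketch below uses a hypothetical MatrixHolder class (not part of the article's source) with a Double()() property and tries to assign an Object() whose elements are Double() through reflection. PropertyInfo.SetValue rejects the value with an ArgumentException, because Object() is not convertible to Double()() even though every element has the right type.

```vbnet
Imports System
Imports System.Reflection

Public Class MatrixHolder
    Public Property W As Double()()
End Class

Public Module MatrixAssignSketch
    Public Sub Main()
        Dim holder As New MatrixHolder
        Dim prop As PropertyInfo = GetType(MatrixHolder).GetProperty("W")

        ' The shape a generic loader hands back: Object() whose elements are Double().
        Dim loose As Object() = {New Double() {1.0, 2.0}, New Double() {3.0, 4.0}}

        Try
            prop.SetValue(holder, loose, Nothing)
        Catch ex As ArgumentException
            ' The array type itself is checked, not the type of its elements.
            Console.WriteLine("Direct assignment failed: " & ex.Message)
        End Try
    End Sub
End Module
```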

''' <summary>
'''
''' </summary>
''' <param name="value"></param>
''' <param name="pInfo"></param>
''' <param name="obj">The object instance</param>
''' <returns></returns>
''' <remarks></remarks>
Private Function InternalValueMapping(value As Object, _
      pInfo As System.Reflection.PropertyInfo, ByRef obj As Object) As Boolean
    Dim pTypeInfo As System.Type = pInfo.PropertyType

    If pTypeInfo.HasElementType Then
       Call InternalMappingCollectionType(value, pInfo, obj, pTypeInfo)
    Else
       Call InternalRVectorToNETProperty(
           pTypeInfo:=value.GetType, value:=value, obj:=obj, pInfo:=pInfo)
    End If

    Return True
End Function

This function helps us correctly convert the vector or matrix value into the proper .NET array type.

Almost every R data type is a vector, but the property in our .NET class may be a single element, such as String/Integer/Double, rather than a vector String()/Integer()/Double(). When the reflected property type is a single element, we just convert the R data to an array and take the first element, and things work fine. When the property type is an array, we assign the converted R value to it directly, and things also work fine!

The object array is converted into a matrix of a specific type using this function:
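The body of InternalRVectorToNETProperty is not shown in this article, so the following is only a sketch of the scalar-versus-array rule described above, not the author's implementation. Sample and ScalarOrVectorSketch are hypothetical names, and the vector is assumed to have already been converted to a .NET array.

```vbnet
Imports System
Imports System.Reflection

Public Class Sample
    Public Property level As Integer
    Public Property h As Double()
End Class

Public Module ScalarOrVectorSketch

    ' Assigns an R-style vector (always an array) to a property that may be
    ' either a single element or an array.
    Public Sub AssignVector(target As Object, prop As PropertyInfo, vector As Array)
        If prop.PropertyType.IsArray Then
            ' Array property: assign the whole vector.
            prop.SetValue(target, vector, Nothing)
        ElseIf vector.Length > 0 Then
            ' Single-element property: take the first element of the vector.
            prop.SetValue(target, vector.GetValue(0), Nothing)
        End If
    End Sub

    Public Sub Main()
        Dim s As New Sample
        AssignVector(s, GetType(Sample).GetProperty("level"), New Integer() {5})
        AssignVector(s, GetType(Sample).GetProperty("h"), New Double() {0.5, 0.5})

        Console.WriteLine("{0} {1}", s.level, s.h.Length)   ' prints 5 2
    End Sub
End Module
```

An empty vector leaves a single-element property at its default value here; a real serializer might prefer to report that case instead.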

''' <summary>
''' Object() to T()()
''' </summary>
''' <param name="value"></param>
''' <param name="pInfo"></param>
''' <param name="obj"></param>
''' <param name="pTypeInfo"></param>
''' <remarks></remarks>
Private Sub InternalMappingCollectionType(value As Object, _
    pInfo As System.Reflection.PropertyInfo, ByRef obj As Object, pTypeInfo As System.Type)
    Dim EleTypeInfo As Type = pTypeInfo.GetElementType
    Dim SourceList = (From val As Object In DirectCast(value, System.Collections.IEnumerable) _
                      Select val).ToArray
    Dim List = Array.CreateInstance(EleTypeInfo, SourceList.Count)

    For i As Integer = 0 To SourceList.Count - 1
        Call List.SetValue(SourceList(i), i)
    Next

    Call pInfo.SetValue(obj, List)
End Sub

We can use Array.CreateInstance in this reflection function to create a type-specific array. Before creating the array, we need to know its element type, which we get by reflection from the property type:

Dim EleTypeInfo As Type = pTypeInfo.GetElementType

Since we already know that the converted R data is a matrix, we convert it directly into an array:

Dim SourceList = (From val As Object In DirectCast(value, System.Collections.IEnumerable) _
                  Select val).ToArray

Now we know the two key things needed to create an array: its element type and the number of elements (the array size):

Dim List = Array.CreateInstance(EleTypeInfo, SourceList.Count)

After using List.SetValue to assign the element at each position in the array, we have an array-of-arrays matrix in the .NET program. Finally, we assign this converted matrix to the property:

Call pInfo.SetValue(obj, List)
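The same technique can be packaged as a small stand-alone helper. This is a sketch rather than the article's code: ToTypedArray and TypedArraySketch are names invented for the illustration, and the element type is passed in directly instead of being read from a PropertyInfo.

```vbnet
Imports System
Imports System.Collections
Imports System.Collections.Generic

Public Module TypedArraySketch

    ' Copies the items of any enumerable into a new array whose element type
    ' is only known at run time.
    Public Function ToTypedArray(source As IEnumerable, elementType As Type) As Array
        Dim items As New List(Of Object)
        For Each item As Object In source
            items.Add(item)
        Next

        Dim typed As Array = Array.CreateInstance(elementType, items.Count)
        For i As Integer = 0 To items.Count - 1
            ' SetValue throws InvalidCastException if an item does not fit the element type.
            typed.SetValue(items(i), i)
        Next

        Return typed
    End Function

    Public Sub Main()
        Dim loose As Object() = {New Double() {1.0, 2.0}, New Double() {3.0, 4.0}}
        Dim matrix = DirectCast(ToTypedArray(loose, GetType(Double())), Double()())

        Console.WriteLine(matrix(1)(0))   ' prints 3
    End Sub
End Module
```

Because the result really is a Double()() at run time, it can be passed to PropertyInfo.SetValue for a Double()() property without the type error that a plain Object() causes.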

A Simple Code Testing Example

In the test project, you can learn how to do this happy and easy hybrid programming. There are two modules in the test project:

Module Wavelets defines the required R functions and the R object mapping types used to read the wavelets calculation result from the R invocation

Module Program contains the testing example code

Important Note

Before you run this code, R should be properly installed on your computer and the required wavelets R library should be installed in your R system.

1. The Simplest VB/C# Hybrid Programming Example

' VB/C# with R language hybrid programming example

Dim ChipData = (From row As Microsoft.VisualBasic.DataVisualization.DocumentFormat.Csv.File.RowObject
                In Microsoft.VisualBasic.DataVisualization.DocumentFormat.Csv.File.FastLoad("../DM_1184.GeneChipDataSamples.csv")
                Select ID = row.First, ExpressionData0 = (From s As String In row.Skip(1) _
                Select Val(s)).ToArray).ToArray

Call Wavelets.Initialize()

Dim TestResultRS4Object = Wavelets.DWT_RInvoke(ChipData.First.ExpressionData0, filter:="haar")
Dim Result = RDotNET.Extensions.ShellScriptAPI.Serialization.
             LoadFromStream(Of Wavelets.Waveletmodwt)(TestResultRS4Object)

Call Result.GetXml.SaveTo("./Test.Result.xml")

The program code follows the typical steps of R hybrid programming:

  1. Initialize the R engine services and load the required library:
    Call Wavelets.Initialize()
  2. Then invoke the R function, which returns an RDotNET symbolic expression:
    Dim TestResultRS4Object = Wavelets.DWT_RInvoke(ChipData.First.ExpressionData0, filter:="haar")
  3. Finally, get the result as a .NET class instance through serialization:
    Dim Result = RDotNET.Extensions.ShellScriptAPI.Serialization.
                 LoadFromStream(Of Wavelets.Waveletmodwt)(TestResultRS4Object)

Invoking the wavelets signal analysis needs only 3 simple and happy steps of coding, right? :)

2. Hybrid Scripting With the ShoalShell Language

The Shoal Shell language is a new embedded scripting language for .NET that was originally developed for my virtual cell system. It has many hybrid scripting abilities with R/Perl/SQL/LINQ; currently, I have released only the R hybrid scripting API for Shoal Shell.

The example shows how to do hybrid scripting with Shoal/R and your .NET program:

'Shoal Shell Script programming example

Dim ShoalShell As Microsoft.VisualBasic.Scripting.ShoalShell.Runtime.Objects.ShellScript = _
                 New Scripting.ShoalShell.Runtime.Objects.ShellScript()

Call ShoalShell.InstallModules(
       GetType(RDotNET.Extensions.ShellScriptAPI.Serialization).Assembly.Location)
Call ShoalShell.InstallModules(GetType(Wavelets).Assembly.Location)
Call ShoalShell.InstallModules(
       GetType(ShoalShell.PlugIns.Plot_Devices.DataSource).Assembly.Location)

Call ShoalShell.TypeLibraryRegistry.Save()

Dim Script As String =
<ShoalShell-Script>

imports wavelets
imports r.net
imports io_device.csv
imports system

chipdata &lt; (imports.csv) ../DM_1184.GeneChipDataSamples.csv
chipdata &lt;- $chipdata -> as.datasource
chipdata &lt;= $chipdata [0]
chipdata &lt;- $chipdata -> get.X

s4obj &lt;- $chipdata -> dwt.r.invoke filter haar n.levels 5
result.type &lt;- wavelets result.type.schema
result &lt;- ctype r.data $s4obj cast.type $result.type

call $result > ./Test.Result.ShoalInvoke.xml

return $result
</ShoalShell-Script>

Dim bResult = ShoalShell <= Script  'Execute the script and gets the return value
MsgBox(DirectCast(bResult, Wavelets.Waveletmodwt).GetXml, MsgBoxStyle.Information)

First, we instantiate a Shoal Shell scripting host in our code and then install the required module DLL files:

Dim ShoalShell As Microsoft.VisualBasic.Scripting.ShoalShell.Runtime.Objects.ShellScript = _
                         New Scripting.ShoalShell.Runtime.Objects.ShellScript()

To install an external dynamic API module DLL file, you can use:

Call ShoalShell.InstallModules("<DLL_filepath>")

For example:

Call ShoalShell.InstallModules(
         GetType(RDotNET.Extensions.ShellScriptAPI.Serialization).Assembly.Location)

Then we run the script and get its return value:

# Shoal shell statement
return $result
' VB code gets the result from the shoal shell returns value
Dim bResult = ShoalShell <= Script  'Execute the script and gets the return value

3. Dynamics Programming With Shoal Shell

Shoal Shell also offers a dynamic programming feature for your .NET program, where the script functions are called as late-bound members of a Dynamics object:

' Shoal Shell VB/C# dynamics programming example

Dim Dynamics As Object = New Microsoft.VisualBasic.Scripting.ShoalShell.Runtime.Objects.Dynamics(ShoalShell)
'  ---------------------------Translated version of the Shoal Shell script shown above--------------

Dim ChipDataDy = Dynamics.Imports.Csv("../DM_1184.GeneChipDataSamples.csv")
ChipDataDy = Dynamics.As.DataSource(ChipDataDy)
ChipDataDy = ChipDataDy(0)
ChipDataDy = Dynamics.Get.X(ChipDataDy)

Dim s4obj = Dynamics.dwt.r.invoke(ChipDataDy)
Result = DirectCast(Dynamics.CType(s4obj, GetType(Wavelets.Waveletmodwt)), Wavelets.Waveletmodwt)

'  ---------------------------------------------------------------------------------------------------

Result.GetXml.SaveTo("./Test.Result.ShoalInvoke.Dynamics.Programming.xml")
MsgBox(DirectCast(Result, Wavelets.Waveletmodwt).GetXml, MsgBoxStyle.Information)      

As you can see, the dynamic code shown above is the VB translation of the Shoal Shell script! Things are amazing!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.