Introduction
In my recent work, I needed a library to plot the correlation between genes and phenotypes in bacterial genomes. A heatmap is a good choice for this, and many good heatmap libraries already exist in R, so my research required hybrid programming between R and .NET.
The RDotNET project makes hybrid programming between R and .NET languages possible, but programming against it is still not very convenient. So I decided to develop this project to make hybrid programming with R more efficient.
Visit the RDotNET home page: https://rdotnet.codeplex.com/
Declare an R API
The R API is Different from the Win32 API
When we declare a Win32 API, we use the DllImport attribute, for example:
<DllImport("kernel32.dll", EntryPoint:="LoadLibrary", SetLastError:=True)> _
Private Shared Function InternalLoadLibrary(
<MarshalAs(UnmanagedType.LPStr)> lpFileName As String) As IntPtr
End Function
Or, in the old VB6-style Declare syntax:
Public Declare Function InternalLoadLibrary _
Lib "kernel32.dll" _
Alias "LoadLibrary" _
(<MarshalAs(UnmanagedType.LPStr)> lpFileName As String) As IntPtr
The situation in R is different: a function in R is itself an object (much as a method in VisualBasic can be wrapped in a delegate object). In R, everything is an object, so an R API is declared here as a class rather than as a method.
A Simple R API Example
A basic R API without any parameters can be an empty class, like:
<RFunc("heatmap.2")>
Public Class heatmap2 : Inherits IRToken
End Class
This declares an entry point for the following R function:
heatmap.2()
To add parameters to the API, just add properties to your class:
<RFunc("heatmap.2")>
Public Class heatmap2 : Inherits IRToken
Public Property x As RExpression
Public Property Rowv As Boolean = True
Public Property Colv As RExpression = [TRUE]
Public Property col As RExpression = "rev(brewer.pal(10,""RdYlBu""))"
Public Property revC As RExpression = [TRUE]
Public Property scale As RExpression = "row"
Public Property margins As RExpression = c(15, 15)
Public Property key As Boolean = True
<Parameter("density.info")>
Public Property densityInfo As String = Rstring("none")
End Class
The resulting R function call looks like this:
heatmap.2(x,
Rowv = TRUE,
Colv = TRUE,
col = rev(brewer.pal(10, "RdYlBu")),
revC = TRUE,
scale = "row",
margins = c(15, 15),
key = TRUE,
density.info = "none")
API Details
API Entry Point
The RFunc attribute defines an R API entry point, just as DllImport does for a Win32 API. With RFunc you can declare a function name that contains a dot, which is illegal in a VisualBasic identifier.
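The article does not show how RFunc itself is declared. As a hedged sketch, assuming it is an ordinary custom attribute that carries the R name (the attribute shape, the heatmap2 demo class and the GetRName helper below are illustrative assumptions, not the library's code), declaring and reading such an entry point with System.Reflection could look like this:

```vb
Imports System

' Hypothetical sketch of an entry-point attribute; the real RFunc in the
' library may be declared differently.
<AttributeUsage(AttributeTargets.Class, AllowMultiple:=False, Inherited:=True)>
Public Class RFunc : Inherits Attribute

    ' The R-side function name, which may contain dots (e.g. "heatmap.2").
    Public ReadOnly Property Name As String

    Public Sub New(name As String)
        Me.Name = name
    End Sub
End Class

' A parameterless API class, as in the simple example above.
<RFunc("heatmap.2")>
Public Class heatmap2
End Class

Public Module RFuncLookup
    ' Hypothetical helper: returns the R function name declared on a type,
    ' or Nothing when the type carries no RFunc attribute.
    Public Function GetRName(type As Type) As String
        Dim attr = DirectCast(Attribute.GetCustomAttribute(type, GetType(RFunc)), RFunc)
        Return If(attr Is Nothing, Nothing, attr.Name)
    End Function
End Module
```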
Tweaking the Names
If a parameter name contains a dot, which is not allowed in a VisualBasic identifier, use the Parameter attribute to declare an alias for it, as the densityInfo property above does with <Parameter("density.info")>.
In addition, if a property of your API class is not an R parameter, mark it with the Ignored attribute to hide it from the API builder.
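Likewise, here is a hedged sketch of how Parameter aliases and Ignored markers could be declared and honoured. The attribute definitions, the heatmapOptions demo class and the RName/RNames helpers are hypothetical; only standard System.Reflection and LINQ calls are used:

```vb
Imports System
Imports System.Linq
Imports System.Reflection

' Hypothetical sketches of the Parameter and Ignored attributes; the
' library's real declarations may differ.
<AttributeUsage(AttributeTargets.Property)>
Public Class Parameter : Inherits Attribute
    ' The R-side parameter name, e.g. "density.info".
    Public ReadOnly Property Name As String

    Public Sub New(name As String)
        Me.Name = name
    End Sub
End Class

<AttributeUsage(AttributeTargets.Property)>
Public Class Ignored : Inherits Attribute
End Class

' Hypothetical demo API class.
Public Class heatmapOptions
    <Parameter("density.info")>
    Public Property densityInfo As String = "none"

    ' Bookkeeping only; not an R parameter, so the builder must skip it.
    <Ignored> Public Property comment As String = "internal"
End Class

Public Module ParameterNames
    ' The alias from <Parameter> if present, otherwise the property's own name.
    Public Function RName(prop As PropertyInfo) As String
        Dim attr = prop.GetCustomAttribute(Of Parameter)()
        Return If(attr Is Nothing, prop.Name, attr.Name)
    End Function

    ' R names of all readable properties that are not marked <Ignored>.
    Public Function RNames(type As Type) As String()
        Return (From p In type.GetProperties()
                Where p.CanRead AndAlso p.GetCustomAttribute(Of Ignored)() Is Nothing
                Select RName(p)).ToArray()
    End Function
End Module
```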
IRToken Wrapper
Your R API class can optionally inherit from IRToken, because a set of R scripting extension methods has been defined for the IRToken wrapper object. The API builder can then serialize your API into R script just by calling the extension method, with or without an explicit type:
Me.GetScript(Me.GetType)
Me.GetScript
Why Choose a Class as the API?
Conveniently Share Your Script with Friends or Archive a Script Model
Drawing an R image sometimes takes a lot of parameter adjustment, and after tuning you may wish to share the R script with your friends. You just need to serialize your API object into a JSON file and e-mail it; your friend loads the script by JSON deserialization and makes further tweaks.
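A minimal sketch of this sharing workflow, assuming System.Text.Json (.NET Core 3.0 or later) as the serializer; the original project may use a different JSON library. The pngArgs class and the SaveArgs/LoadArgs helpers are hypothetical:

```vb
Imports System.Text.Json

' Hypothetical, simplified stand-in for an R API class. Only plain String and
' Integer properties are used, so the default System.Text.Json settings suffice;
' custom property types such as RExpression would likely need a JsonConverter.
Public Class pngArgs
    Public Property filename As String = "heatmap.png"
    Public Property width As Integer = 480
    Public Property height As Integer = 480
End Class

Public Module ScriptArchive
    ' Serializes the tuned parameters so they can be e-mailed or archived.
    Public Function SaveArgs(args As pngArgs) As String
        Return JsonSerializer.Serialize(args)
    End Function

    ' Restores the parameters on the receiving side for further tweaking.
    Public Function LoadArgs(json As String) As pngArgs
        Return JsonSerializer.Deserialize(Of pngArgs)(json)
    End Function
End Module
```

Because the API object carries all of its parameters as properties, the same round trip works for any API class whose property types the serializer understands.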
Makes Programming in Visual Basic Easier
As you can see, most R functions have many parameters that can be tweaked, so if an R API were written as a function, your program would need to declare all of those parameters.
I prefer to pass a single class instance in place of the many function parameters:
Function example(path, format, blablabla...) As Type
End Function
Function example(path, format, RAPI) As Type
End Function
Here RAPI is a class whose properties hold the many parameters (the blablabla...) of the first function.
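To make the comparison concrete, here is a hypothetical sketch (PlotOptions and Describe are illustrative names, not part of the library) in which the option list lives in a class:

```vb
' Hypothetical example: many optional plot settings grouped into one class
' instead of being spread across a long function signature.
Public Class PlotOptions
    Public Property width As Integer = 480
    Public Property height As Integer = 480
    Public Property title As String = ""
End Class

Public Module Plotting
    ' The signature stays short no matter how many options PlotOptions grows.
    Public Function Describe(path As String, imageFormat As String, opts As PlotOptions) As String
        Return $"{imageFormat}:{path} {opts.width}x{opts.height} '{opts.title}'"
    End Function
End Module
```

A caller sets only what it needs with an object initializer, for example Describe("genes.png", "png", New PlotOptions With {.title = "genes"}); adding a new option later changes PlotOptions, not every call site.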
APIs Can Inherit Other APIs, Which Makes Defining Families of Related R Functions Easier
For example, R's grDevices package contains image format functions such as bmp, jpeg, png and tiff. These functions share some common parameters for drawing an image, so when designing their APIs you just need to declare a base class for the common parameters and a subclass for each function's unique parameters. This inheritance relationship makes your program clearer and simpler.
Yes, declaring R function APIs as classes makes your program structure clearer!
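A hedged sketch of this design with plain classes. The imageDevice, pngDevice and tiffDevice types and their ToRCall method are hypothetical, not the library's grDevice API; the argument names filename, width, height and tiff's compression follow R's grDevices functions:

```vb
Imports System.Collections.Generic

' Hypothetical sketch: arguments shared by grDevices image functions live in a
' base class; format-specific arguments live in subclasses.
Public MustInherit Class imageDevice
    Public Property filename As String
    Public Property width As Integer = 480
    Public Property height As Integer = 480

    ' The R function this class maps to, e.g. "png".
    Public MustOverride ReadOnly Property RFunction As String

    ' Shared arguments; subclasses append their own.
    Public Overridable Function Arguments() As List(Of String)
        Return New List(Of String) From {
            $"filename = ""{filename}""",
            $"width = {width}",
            $"height = {height}"
        }
    End Function

    Public Function ToRCall() As String
        Return $"{RFunction}({String.Join(", ", Arguments())})"
    End Function
End Class

Public Class pngDevice : Inherits imageDevice
    Public Overrides ReadOnly Property RFunction As String
        Get
            Return "png"
        End Get
    End Property
End Class

Public Class tiffDevice : Inherits imageDevice
    ' Of these two devices, only tiff() takes a compression argument.
    Public Property compression As String = "none"

    Public Overrides ReadOnly Property RFunction As String
        Get
            Return "tiff"
        End Get
    End Property

    Public Overrides Function Arguments() As List(Of String)
        Dim args = MyBase.Arguments()
        args.Add($"compression = ""{compression}""")
        Return args
    End Function
End Class
```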
API Builder
R Script Token
Here I have defined a set of abstract classes as R API tokens, which wrap a series of extension methods:
Public MustInherit Class IRProvider
Implements IScriptProvider
Dim __requires As String()
<Ignored> Public Overridable Property Requires As String()
Get
Return __requires
End Get
Protected Set(value As String())
__requires = value
End Set
End Property
Public MustOverride Function RScript() As String Implements IScriptProvider.RScript
Public Overrides Function ToString() As String
Return RScript()
End Function
Public Shared Narrowing Operator CType(R As IRProvider) As String
Return R.RScript
End Operator
End Class
Public Class IRToken : Inherits IRProvider
Implements IScriptProvider
Public Overrides Function RScript() As String
Return Me.GetScript([GetType])
End Function
Public Overloads Shared Narrowing Operator CType(token As IRToken) As String
Return token.RScript
End Operator
Public Shared Operator &(token As IRToken, script As String) As String
Return token.RScript & script
End Operator
Public Shared Operator &(script As String, token As IRToken) As String
Return script & token.RScript
End Operator
End Class
The Builder API
Building an R API call from a class object relies on System.Reflection, so the API builder requires two basic parameters:
<Extension>
Public Function GetScript(token As Object, Optional type As Type = Nothing) As String
If token Is Nothing Then
Throw New ArgumentNullException(NameOf(token), "Script token is nothing!")
End If
If type Is Nothing Then
type = token.GetType
End If
Return __getScript(token, type)
End Function
First, the token parameter provides the R function object instance, that is, an instance of the class we defined in the previous section as the R function entry point.
Second, reflection needs a source, which comes from the type parameter: from it we obtain the class information and the property information. If type is omitted, the runtime type of token is used.
Getting the API name only requires reading the RFunc custom attribute declared on the class:
<Extension> Public Function GetAPIName(type As Type) As String
Dim name As RFunc = type.GetAttribute(Of RFunc)
If name Is Nothing Then
Dim ex As New Exception(IsNotAFunc)
ex = New Exception(type.FullName, ex)
Throw ex
Else
Return name.Name
End If
End Function
Since all of the R function parameters take the form of class properties, the next step of the builder is to collect all of the readable properties.
Furthermore, to mask a property from the builder, we skip every property that carries the Ignored attribute. A LINQ query does this job:
Dim props = (From prop As PropertyInfo In type.GetProperties
Where prop.GetAttribute(Of Ignored) Is Nothing AndAlso
prop.CanRead
Let param As Parameter = prop.GetAttribute(Of Parameter)
Select prop,
func = prop.__getName(param),
param.__isOptional,
param
Order By __isOptional Ascending)
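The article does not show __getScript itself. The following hedged sketch shows how the pieces described so far (the RFunc name, Parameter aliases and the Ignored filter) could be combined into a call string with plain reflection. The attribute stand-ins, the pngToken demo class and the BuildCall/ToR helpers are hypothetical; the sketch skips the Order By __isOptional step, and its value formatting is a simplified assumption rather than the library's __getValue:

```vb
Imports System
Imports System.Globalization
Imports System.Linq
Imports System.Reflection

' Minimal hypothetical stand-ins for the attributes described earlier.
<AttributeUsage(AttributeTargets.Class)>
Public Class RFunc : Inherits Attribute
    Public ReadOnly Property Name As String
    Public Sub New(name As String)
        Me.Name = name
    End Sub
End Class

<AttributeUsage(AttributeTargets.Property)>
Public Class Parameter : Inherits Attribute
    Public ReadOnly Property Name As String
    Public Sub New(name As String)
        Me.Name = name
    End Sub
End Class

<AttributeUsage(AttributeTargets.Property)>
Public Class Ignored : Inherits Attribute
End Class

' Hypothetical demo API class mirroring two of R's png() arguments.
<RFunc("png")>
Public Class pngToken
    Public Property filename As String = "heatmap.png"
    <Parameter("bg")> Public Property background As String = "white"
    <Ignored> Public Property comment As String = "not an R argument"
End Class

Public Module MiniBuilder
    ' Renders one .NET value as R source text. Strings are quoted but not
    ' escaped in this minimal sketch.
    Private Function ToR(value As Object) As String
        If TypeOf value Is Boolean Then Return If(DirectCast(value, Boolean), "TRUE", "FALSE")
        If TypeOf value Is String Then Return """" & DirectCast(value, String) & """"
        Return Convert.ToString(value, CultureInfo.InvariantCulture)
    End Function

    ' Builds "name(arg = value, ...)" from the RFunc name and every readable,
    ' non-ignored property whose value is not Nothing.
    Public Function BuildCall(token As Object) As String
        Dim tokenType = token.GetType()
        Dim func = tokenType.GetCustomAttribute(Of RFunc)()
        If func Is Nothing Then
            Throw New ArgumentException($"{tokenType.FullName} has no RFunc attribute", NameOf(token))
        End If

        Dim args = From prop In tokenType.GetProperties()
                   Where prop.CanRead AndAlso prop.GetCustomAttribute(Of Ignored)() Is Nothing
                   Let value = prop.GetValue(token)
                   Where value IsNot Nothing
                   Let param = prop.GetCustomAttribute(Of Parameter)()
                   Let argName = If(param Is Nothing, prop.Name, param.Name)
                   Select $"{argName} = {ToR(value)}"

        Return $"{func.Name}({String.Join(", ", args)})"
    End Function
End Module
```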
Important Notes on Data Types
Some data types need special attention.
1. Bool Logical Value
In R, logical values are written in uppercase as TRUE and FALSE (or the abbreviations T and F). Unlike VisualBasic, R is case-sensitive, so VisualBasic's True and False must be mapped onto R's logical values:
Public Structure RBoolean : Implements IScriptProvider
Public Shared ReadOnly Property [TRUE] As New RBoolean(RScripts.TRUE)
Public Shared ReadOnly Property [FALSE] As New RBoolean(RScripts.FALSE)
ReadOnly __value As String
Sub New(value As String)
__value = value
End Sub
Public Function RScript() As String Implements IScriptProvider.RScript
Return __value
End Function
End Structure
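The reason for the explicit mapping is easy to demonstrate: Boolean.ToString() in .NET produces "True" or "False", which R does not accept as logical constants. A minimal sketch of the same mapping as a plain function (ToRLogical is a hypothetical helper, equivalent in effect to the RBoolean structure above):

```vb
Public Module RLogical
    ' Hypothetical helper. VisualBasic's Boolean.ToString() yields "True"/"False",
    ' which R (being case-sensitive) does not recognise, so map explicitly.
    Public Function ToRLogical(value As Boolean) As String
        Return If(value, "TRUE", "FALSE")
    End Function
End Module
```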
2. String Value Type
For example, a string value in VisualBasic is:
Dim s As String = "abc"
When we write this variable to a text file, the content is just abc: the two quote characters are gone. The same happens when we write an R script.
An R function that requires a string value needs it wrapped in two quote characters, but when we write the script those quotes disappear as well, so string values must be processed before the script is written:
Public Function Rstring(s As String) As String
Return $"""{s}"""
End Function
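The one-line Rstring above does not escape characters that are special inside an R string literal, so a value containing a double quote or a backslash would produce a broken script. A hedged sketch of an escaping variant (RQuote is a hypothetical name, not part of the library), using R's backslash escapes:

```vb
Imports System.Text
Imports Microsoft.VisualBasic

Public Module RStrings
    ' Hypothetical helper: quotes a .NET string as an R string literal.
    ' Unlike the one-line Rstring, backslashes, double quotes and common
    ' control characters are escaped so the literal parses back to the same text.
    Public Function RQuote(s As String) As String
        Dim sb As New StringBuilder("""")
        For Each c As Char In s
            Select Case c
                Case "\"c
                    sb.Append("\\")
                Case """"c
                    sb.Append("\""")
                Case ControlChars.Lf
                    sb.Append("\n")
                Case ControlChars.Cr
                    sb.Append("\r")
                Case ControlChars.Tab
                    sb.Append("\t")
                Case Else
                    sb.Append(c)
            End Select
        Next
        Return sb.Append(""""c).ToString()
    End Function
End Module
```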
3. Expression as Parameter
An R expression can be represented directly by a string. Stored in an RExpression property, it is written into the script as-is, without added quotes.
4. String as File Path
Because \ is the escape character in R string literals, as it is in C/C++, the \ characters in a Windows file path will cause errors in R. An easy way to deal with this is to replace every \ with /, which R also accepts as a path separator on Windows:
<Extension>
Public Function UnixPath(file As String, Optional extendsFull As Boolean = False) As String
If String.IsNullOrEmpty(file) Then
Return ""
End If
If extendsFull Then
file = FileIO.FileSystem.GetFileInfo(file).FullName
End If
Return file.Replace("\"c, "/"c)
End Function
Finally, we can write a function that applies this additional processing for the different data types in the API builder:
<Extension>
Private Function __getValue(type As Type, value As Object, valueType As ValueTypes) As String
If value Is Nothing Then
Return Nothing
End If
Select Case type
Case GetType(String)
If valueType = ValueTypes.Path Then
Return Rstring(Scripting.ToString(value).UnixPath)
Else
Return Rstring(Scripting.ToString(value))
End If
Case GetType(Boolean)
If True = DirectCast(value, Boolean) Then
Return RBoolean.TRUE.__value
Else
Return RBoolean.FALSE.__value
End If
Case GetType(RExpression)
Return DirectCast(value, RExpression).RScript
Case Else
Return Scripting.ToString(value)
End Select
End Function
Example: Drawing heatmap in VisualBasic
An example of drawing a heatmap in R can be found at http://flowingdata.com/2010/01/21/how-to-make-a-heatmap-a-quick-and-easy-solution/.
Based on this example, we can create an R API wrapper:
Imports System.Text
Imports System.IO
Imports Microsoft.VisualBasic.DocumentFormat.Csv.DocumentStream.Tokenizer
Imports Microsoft.VisualBasic.Linq
Imports Microsoft.VisualBasic
Imports RDotNet.Extensions.VisualBasic
Imports RDotNet.Extensions.VisualBasic.utils.read.table
Imports RDotNet.Extensions.VisualBasic.stats
Imports RDotNet.Extensions.VisualBasic.Graphics
Imports RDotNet.Extensions.VisualBasic.grDevices
Public Class Heatmap : Inherits IRScript
Const df As String = "df"
Public Property rowNameMaps As String
Public Property dataset As readcsv
Public Property heatmap As heatmap_plot
Public Property image As grDevice
Sub New()
Requires = {"RColorBrewer"}
End Sub
Protected Overrides Function __R_script() As String
Dim script As StringBuilder = New StringBuilder()
Call script.AppendLine($"{df} <- " & dataset)
Call script.AppendLine($"row.names({df}) <- {df}${__getRowNames()}")
Call script.AppendLine($"{df}<-{df}[,-1]")
Call script.AppendLine($"{df} <- data.matrix({df})")
heatmap.x = df
If Not heatmap.Requires Is Nothing Then
For Each ns As String In heatmap.Requires
Call script.AppendLine(RScripts.library(ns))
Next
End If
Call script.AppendLine(image.Plot("result <- " & heatmap))
Return script.ToString
End Function
End Class
Using this heatmap API requires three parameters:
1. Define the heatmap data source, which here comes from reading a CSV file:
Property dataset As readcsv
To read the data from a location, just construct an instance of the read.csv API class, like:
dataset = New readcsv("http://datasets.flowingdata.com/ppg2008.csv")
2. Define the heatmap drawing method; the available APIs can be found in the gplots or stats namespaces:
Property heatmap As heatmap_plot
3. Define where the heatmap image is saved:
Property image As grDevice
In addition, a set of image file format APIs has been defined in the namespace RDotNet.Extensions.VisualBasic.grDevices, such as grDevices.bmp, grDevices.jpeg, grDevices.png and grDevices.tiff.
To use one of these R APIs, simply construct an object instance, like:
Dim image As grDevice = New tiff("imagefile.tiff", 8000, 6500)
You can download this example from GitHub:
Go ahead and try it!