Introduction
After many years of doing scientific computing work in the VB.NET language, I was curious whether there was a way to script my VB.NET library. After learning the R language at college, I wondered whether I could natively combine the R language and its vectorized programming feature with my VB.NET library. This idea led to the R# language.
The R# language was born from the idea of bringing vectorized programming to the .NET platform. There are several vectorized programming languages, such as MATLAB, S and R, and all of them were candidates for the prototype of my new language. After studying their language features and doing some background investigation, I chose the R language as the prototype. The new language is named R# because it is a dialect derived from the R language.
Here are some resource links that may be useful for learning the R and R# languages:
Design of the R# Interpreter
How it works?
The R# language is currently an interpreted programming language, and its interpreter consists of four modules:
- Interpreter: contains the R# language interpreter source code and all of the expression class model definitions.
- Language: contains the code for parsing language tokens from the input script text, and the syntax parser that creates the corresponding R# expression objects from the token sequence and the language context.
- Runtime: contains the code for importing external .NET functions and the runtime environment definition for R# expression evaluation. This folder also contains some primitive R# functions for manipulating your dataset, such as lapply, sapply, list and which.
- System: contains the code for the runtime configuration, the third-party package loader, and the tools for building your own R# package.
By combining the code in these four modules, we can build a workflow that runs an R# script, lets the script interoperate with the existing functions in our .NET library, and evaluates R# expressions to produce .NET objects.
Workflow: Run R# Code
Here is a workflow figure that can be used for illustrating how to run the R# code input:
- R# environment initialization: at the very beginning, the code modules of the R# system are called to:
  - load the configuration file,
  - initialize the global environment,
  - hook all of the .NET API functions inside the R# base package,
  - then load the startup packages and initialize the runtime environment.

  Finally, R# is ready to run our script code.
- The input script text is then parsed into R# language tokens by the scanner object, which is defined in the Language namespace. The token sequence is the output of the scanner's char-walking operation. The order of the tokens in the generated sequence carries the syntax context information that the syntax analysis module of the R# interpreter uses to create the syntax tree. Once the syntax tree model has been built from the token sequence, the script text has been parsed into an R# program: a collection of expression models.
- The expression model is the fundamental model of the R# language: it produces a result value from a given evaluation context. So we can abstract the R# expression model as a base class object:
Namespace Interpreter.ExecuteEngine
    Public MustInherit Class Expression
        Public MustOverride Function Evaluate(envir As Environment) As Object
    End Class
End Namespace
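To make the expression model more concrete, here is a minimal sketch of how a few expression classes could derive from such a base class. It is not the actual R# source code: ToyEnvironment, Literal, SymbolReference and Addition are hypothetical names invented for this illustration, and the environment is reduced to a plain symbol table.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical stand-in for the R# runtime environment: just a symbol table.
Public Class ToyEnvironment
    Public ReadOnly Symbols As New Dictionary(Of String, Object)
End Class

Public MustInherit Class Expression
    Public MustOverride Function Evaluate(envir As ToyEnvironment) As Object
End Class

' A literal evaluates to its own value and ignores the environment.
Public Class Literal
    Inherits Expression

    Private ReadOnly value As Double

    Public Sub New(value As Double)
        Me.value = value
    End Sub

    Public Overrides Function Evaluate(envir As ToyEnvironment) As Object
        Return value
    End Function
End Class

' A symbol reference looks its value up in the environment.
Public Class SymbolReference
    Inherits Expression

    Private ReadOnly name As String

    Public Sub New(name As String)
        Me.name = name
    End Sub

    Public Overrides Function Evaluate(envir As ToyEnvironment) As Object
        Return envir.Symbols(name)
    End Function
End Class

' A binary expression evaluates both operands first, then combines them.
Public Class Addition
    Inherits Expression

    Private ReadOnly lhs As Expression
    Private ReadOnly rhs As Expression

    Public Sub New(lhs As Expression, rhs As Expression)
        Me.lhs = lhs
        Me.rhs = rhs
    End Sub

    Public Overrides Function Evaluate(envir As ToyEnvironment) As Object
        Return CDbl(lhs.Evaluate(envir)) + CDbl(rhs.Evaluate(envir))
    End Function
End Class

Public Module ExpressionDemo
    Public Sub Main()
        Dim envir As New ToyEnvironment
        envir.Symbols("x") = 1.5

        ' The tree for the script text "x + 5"
        Dim program As Expression = New Addition(New SymbolReference("x"), New Literal(5))

        ' Prints the sum of x and 5
        Console.WriteLine(program.Evaluate(envir))
    End Sub
End Module
```

The point of this design is that running a program is nothing more than calling Evaluate on the root of the tree: each node asks its children for their values and then produces its own.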
Code Demo in VisualBasic
The R# language interpreter was originally written in the VB.NET language, so the R# language is fully compatible with the .NET runtime. This means you can embed the R# environment into your .NET application, which gives you the ability to script your .NET library. A full code example of running an R# script file in a VB.NET application is available on GitHub: "RunRScriptFile".
First, we need a runtime configuration file for the initialization workflow of the R# interpreter runtime. The runtime configuration file is an XML file, and it is generated automatically if it is missing from the given file location:
Dim R As RInterpreter = RInterpreter.FromEnvironmentConfiguration(
    configs:="/path/to/config.xml"
)
If an external third-party R# library DLL file is not located in the application directory or the library folder, you should set the DLL directory path through the runtime configuration:
If Not SetDllDirectory.StringEmpty Then
    Call R.globalEnvir.options.setOption("SetDllDirectory", SetDllDirectory)
End If
Load some startup packages before running the given R# script file:
For Each pkgName As String In startupsLoading
    Call R.LoadLibrary(
        packageName:=pkgName,
        silent:=silent,
        ignoreMissingStartupPackages:=ignoreMissingStartupPackages
    )
Next
Finally, we can run the script code via the Source function, which is exported from the R# interpreter:
result = R.Source(filepath)
If you just want to evaluate script text rather than run code from a text file, you can use the Evaluate function, which is exported from the R# interpreter engine:
Call R.Evaluate("
    # test script
    let word as string = ['world', 'R# user', 'GCModeller user'];
    let echo as function(words) {
        print( `Hello ${ words }!` );
    }
    echo(word);
")
Comparison between R# and LINQ
As mentioned above, the R# language is a vectorized programming language. Many operations in R# are vectorized, which means we can apply the same operation to every element of a collection in a single expression.
The LINQ language feature of the .NET platform gives all .NET languages something similar to vectorized programming, but it is still less convenient than the R/R# language. Here are some examples:
1. Arithmetic
We can do simple math such as addition, subtraction, multiplication and division via LINQ:
{1,2,3,4,5}.Select(Function(xi) xi + 5).ToArray
Doing exactly the same math operation in the R# language is simpler:
[1, 2, 3, 4, 5] + 5;
# [1] 6 7 8 9 10
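To show what this kind of vectorization amounts to on the .NET side, here is a minimal sketch of a vector type that overloads the + operator so that a scalar is applied to every element. NumericVector is a hypothetical type written only for this illustration; it is not part of R# or of the .NET base class library, and the real R# runtime may implement vectorized operators quite differently.

```vbnet
Imports System
Imports System.Linq

' Hypothetical wrapper type for illustration only.
Public Class NumericVector
    Public ReadOnly Values As Double()

    Public Sub New(ParamArray values As Double())
        Me.Values = values
    End Sub

    ' vector + scalar: the scalar is added to each element
    Public Shared Operator +(x As NumericVector, y As Double) As NumericVector
        Return New NumericVector(x.Values.Select(Function(xi) xi + y).ToArray())
    End Operator

    Public Overrides Function ToString() As String
        Return String.Join(" ", Values)
    End Function
End Class

Public Module VectorDemo
    Public Sub Main()
        Dim x As New NumericVector(1, 2, 3, 4, 5)

        ' The loop is hidden inside the operator, as in the R# expression
        Console.WriteLine(x + 5)   ' 6 7 8 9 10
    End Sub
End Module
```

With such a wrapper the calling code reads like the R# one-liner, but every vector type and every operator has to be written by hand, which is the inconvenience that a vectorized language removes.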
Here are the operators that are supported in the R# environment:
operator | description | example | VB.NET equivalent |
+ | addition | a + b | a + b |
- | subtraction | a - b | a - b |
* | multiplication | a * b | a * b |
/ | division | a / b | a / b |
\ | integer division | a \ b | a \ b |
% | mod | a % b | a Mod b |
! | not | !a | Not a |
== | equals | a == b | a = b |
!= | not equals | a != b | a <> b |
&& | and | a && b | a AndAlso b |
|| | or | a || b | a OrElse b |
like | string pattern matched | a like $"\d+" | a Like "*.jpg" |
in | contains | a in b | b.ContainsKey(a) |
2. Math Function
Calling a math function on a whole vector is also simpler in the R# language than in .NET LINQ:
log10([10, 100, 1000, 10000, 100000]);
.NET LINQ:
{10, 100, 1000, 10000, 100000}.Select(AddressOf Math.Log10).ToArray()
3. LINQ Function
Although most R# script code can be vectorized, some loop-like operations are still needed when we deal with a collection of complex, composite data. The R# language has the for loop and the while loop, but using them is not recommended most of the time. As in the original R language, the apply family of functions can be used for this purpose.
The sapply and lapply functions in the R# language are LINQ-like functions for dealing with complex data collections.
sapply means sequence apply, and it is equivalent to the Select function in LINQ. The sapply function accepts collection data in the R# language and produces a new vector. lapply means list apply, and it is equivalent to the ToDictionary function in LINQ. The lapply function works like sapply: it accepts collection data in the R# language, but it produces a new named key-value paired list.
Here is an example of using the sapply function in the R# language, together with the corresponding LINQ code:
[1,2,3,4,5] |> sapply(xi -> xi + 5);
# [1] 6 7 8 9 10
{1,2,3,4,5}.Select(Function(xi) xi + 5).ToArray()
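For the lapply side of the comparison, the LINQ counterpart mentioned above is ToDictionary. Here is a minimal sketch of it; the Add5ByKey function and the key naming scheme ("x1", "x2", ...) are assumptions made up for this example, because a dictionary needs explicit keys:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Public Module ApplyDemo
    ' Maps each element to a named slot, similar in spirit to lapply:
    ' the key is derived from the element, the value is the computed result.
    Public Function Add5ByKey(x As Integer()) As Dictionary(Of String, Integer)
        Return x.ToDictionary(Function(xi) $"x{xi}", Function(xi) xi + 5)
    End Function

    Public Sub Main()
        For Each item In Add5ByKey({1, 2, 3, 4, 5})
            Console.WriteLine($"{item.Key} = {item.Value}")
        Next
    End Sub
End Module
```

Note that ToDictionary throws an ArgumentException when two elements produce the same key, so the key selector must yield unique names.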
If you want to filter unwanted data out of your input collection, you can apply the Where function in .NET LINQ. Like LINQ, the R# language also has a data filter for data processing pipelines. The conditional filter Where in LINQ is equivalent to the R# function named which. Here is an example:
' filter data in .NET LINQ by Where
{1,2,3,4,5}
    .Where(Function(x) x > 3)
    .ToArray()

# filter data in R# language by which
[1,2,3,4,5]
|> which(x -> x > 3)
;

# another conditional filter syntax in original R language style
x = [1,2,3,4,5];
x[which(x > 3)];

# a simpler way:
x[x > 3];
Comparison between R# and VisualBasic
Vectorized programming is the biggest difference between the R# language and the VisualBasic.NET language, but there are many other language features that distinguish the two.
1. Declare New Function
The function is the basic building block of a program: we build a complex application by combining functions with some logic. With functions we can reuse our code and make our program modular and standardized. Declaring a new function in the R# language is very flexible.
As the R documentation says, a function is also a kind of data type in the R language. So we can create an R# function in the style of a VisualBasic symbol declaration, for example:
# formal style
const add5 as function(xi) {
    return(xi + 5);
}

# or replace the as with an equal sign,
# which makes the R# code look more like TypeScript:
const add5 = function(xi) {
    return(xi + 5);
}
In the formal style of an R# function declaration, the symbol name is the function name, the as part shows that the type of the declared symbol is a function, and the function closure body is the value of the symbol.
The formal style may take too many words to write, so you can also write an R# function in lambda style:
# syntax sugar borrowed from julia language
const f(x) = x + 5;
# syntax sugar from the original R language
const add5 = function(xi) xi + 5;
Please note that all of the R# functions we declare in our script are vectorized, so most of the time we don't need an extra for loop or while loop in our function:
const f(x) = x + 5;
f([1,2,3,4,5]);
# [1] 6 7 8 9 10
2. Lambda Function & Functional Programming
The R# language is also a functional programming language, so passing a function as the parameter value of another function is very easy. Using the same sapply function that we learned above, we can demonstrate functional programming in the R# language:
const add5 = function(xi) {
    return(xi + 5);
}

sapply([1,2,3,4,5], add5);
sapply([1,2,3,4,5], function(x) {
    x + 5;
});
The demo code above may still take too many words to write. So the lambda function was introduced into the R# language to make functional programming code simpler:
sapply([1,2,3,4,5], x -> x + 5);
3. Pipeline Compares the Extension Function
There is a great language feature in .NET called the extension method: by tagging a static function with ExtensionAttribute in the VisualBasic.NET language, we can call that function as if it were an instance method of the object. With extension methods, we can chain our function calls in .NET and build a data pipeline.
Compared with the original R language, the R# language introduces a pipeline operator. The pipeline operator lets every R# function be called in a pipelined way naturally. For example:
const add5 = function(x) {
    return(x + 5);
}

[1,2,3,4,5]
|> add5()
# we even can pipeline the anonymous function
# in R# language
|> (function(x) {
    return(x ^ 2);
})
;
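For comparison, the same pipeline can be sketched on the VisualBasic.NET side with extension methods. The module and function names here (PipelineExtensions, Add5, Square) are made up for this example; only the Extension attribute and the LINQ calls come from the .NET framework.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Runtime.CompilerServices

Public Module PipelineExtensions
    ' The Extension attribute lets us call Add5 as x.Add5()
    <Extension>
    Public Function Add5(x As IEnumerable(Of Double)) As IEnumerable(Of Double)
        Return x.Select(Function(xi) xi + 5)
    End Function

    <Extension>
    Public Function Square(x As IEnumerable(Of Double)) As IEnumerable(Of Double)
        Return x.Select(Function(xi) xi * xi)
    End Function
End Module

Public Module PipelineDemo
    Public Sub Main()
        Dim x As Double() = {1, 2, 3, 4, 5}

        ' Chained extension calls read left to right, like the |> operator
        Dim result As Double() = x.Add5().Square().ToArray()

        Console.WriteLine(String.Join(" ", result))   ' 36 49 64 81 100
    End Sub
End Module
```

The difference is that in VisualBasic.NET a function must be declared in a module and tagged before it can be chained, while the R# pipeline operator works with any function, including an anonymous one.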
4. Expression Based and Statement Based
The VisualBasic language is statement based, which means most VisualBasic code does not produce a value unless the statement is a function invocation. Unlike VisualBasic, the R# programming language is expression based, which means all R# code can produce a value. Here is an example that clearly shows the difference between the two languages:
Dim x As Double

If test1 Then
    x = 1
Else
    x = -1
End If
As you can see in the code above, because VB code is statement based, the If block cannot produce a value, so we need to assign the value of variable x in two statements. In contrast, the R# language is expression based, so we can get the result value from such an if branch directly:
const x as double = {
    if (test1) {
        1;
    } else {
        -1;
    }
}
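To be fair to VisualBasic.NET, the language does offer one expression form of branching: the If operator, which takes a condition and two values and yields one of them. It only covers a simple two-way choice, not a whole block of statements. A minimal sketch (PickValue is a name made up for this example):

```vbnet
Imports System

Public Module IfOperatorDemo
    Public Function PickValue(test1 As Boolean) As Double
        ' The If operator evaluates only the selected branch and yields its value
        Dim x As Double = If(test1, 1, -1)
        Return x
    End Function

    Public Sub Main()
        Console.WriteLine(PickValue(True))    ' 1
        Console.WriteLine(PickValue(False))   ' -1
    End Sub
End Module
```

In the R# language no special operator is needed, because the ordinary if block is already an expression.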
Dataset in R# Language
The table below lists the primitive data types in the R# language; each of them is a kind of atomic vector:
R# Primitive | VisualBasic.NET | Note |
num | Single, Double | Single is converted to Double |
int | Short, Integer, Long | Short and Integer are converted to Long |
raw | Byte | value in range [0,255] |
chr | Char, String | Char and String from VisualBasic.NET are unified as character in the R# runtime; a Char is a special string whose nchar value equals 1 |
logi | Boolean | besides TRUE and FALSE, the literal of a logical value in R# can also be true, false, yes, no |
any | Object | any kind of .NET object is also treated as a pseudo-primitive type in R# |
Based on these primitive types, we can compose more complex data types in the R# language:
Key-value Paired List
The list type in the R# language is similar to the Structure data type in VisualBasic. The list type is very flexible: you can store any kind of data in a value slot, but the key name in a list must be of character type. You can create a list via the list function, for example:
list(a = 1, b = 2, flag = [TRUE, FALSE], c = "Hello world!")
# List of 4
# $ a : int 1
# $ b : int 2
# $ flag : logical [1:2] TRUE FALSE
# $ c : chr "Hello world!"
As an alternative to the list function, the R# language introduces a piece of syntax sugar: the JSON literal:
# json literal in R# language will also produce a list object
{
    a: 1,
    b: 2,
    flag: [TRUE, FALSE],
    c: "Hello world!"
}
# List of 4
# $ a : int 1
# $ b : int 2
# $ flag : logical [1:2] TRUE FALSE
# $ c : chr "Hello world!"
To reference a slot value in an R# key-value paired list, we can use the $ operator if we know the slot name when writing the code, and the [[xxx]] indexer syntax if the slot name is only known at runtime, for example when it is held in a variable:
const x = list(a = 1, b = 2, flag = [TRUE, FALSE], c = "Hello world!");

# TRUE, FALSE
x$flag

for(name in names(x)) {
    # the code we demonstrate here is similar to
    # reflection code in .NET
    print(x[[name]]);
}
dataframe
The dataframe type in the R# language is a kind of 2D table. Each column of an R# dataframe is an atomic vector. You can treat a dataframe as a special kind of key-value paired list object. Different columns of a dataframe can have different data types.
A dataframe object can be created via the data.frame function:
data.frame(a = 1, b = 2, c = "Hello world!", flag = [TRUE, FALSE]);
# a b c flag
# ----------------------------------------------------
# <mode> <integer> <integer> <string> <boolean>
# [1, ] 1 2 "Hello world!" TRUE
# [2, ] 1 2 "Hello world!" FALSE
Alternatively, a dataframe can be cast from a list object via the as.data.frame function:
as.data.frame(list(a = 1, b = 2, c = "Hello world!", flag = [TRUE, FALSE]));
# a b c flag
# ----------------------------------------------------
# <mode> <integer> <integer> <string> <boolean>
# [1, ] 1 2 "Hello world!" TRUE
# [2, ] 1 2 "Hello world!" FALSE
The difference between the key-value list and the dataframe object is this: a value in a list can be any kind of data, but a value in a dataframe must be an atomic vector. A more obvious difference concerns the vector size: the vectors in a list can have any size, but each column of a dataframe must contain either 1 element or n elements, where n equals the number of rows of the dataframe. Here is an example of the error raised when creating a dataframe from vectors of different sizes:
data.frame(a = 1, b = [1,2,3], f = [TRUE, FALSE]);
# Error in <globalEnvironment> -> data.frame
# 1. arguments imply differing number of rows
# 2. a: 1
# 3. b: 3
# 4. f: 2
#
# R# source: Call "data.frame"("a" <- 1, "b" <- [1, 2, 3], "f" <- [True, False])
#
# base.R#_interop::.data.frame at REnv.dll:line <unknown>
# SMRUCC/R#.global.<globalEnvironment> at <globalEnvironment>:line n/a
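The size rule described above (each column has either 1 element or n elements) can be sketched as a small check in VisualBasic.NET. This is only an illustration of the rule as stated in this article, not the actual R# implementation; ResolveRowCount is a hypothetical helper that receives the column names with their vector sizes and assumes at least one column.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Public Module DataFrameRule
    ' Returns the number of rows implied by the column sizes,
    ' or throws if any column is neither of size 1 nor of size n.
    Public Function ResolveRowCount(columnSizes As IDictionary(Of String, Integer)) As Integer
        ' n is the largest column size
        Dim n As Integer = columnSizes.Values.Max()

        For Each column In columnSizes
            If column.Value <> 1 AndAlso column.Value <> n Then
                Throw New ArgumentException(
                    $"arguments imply differing number of rows ({column.Key}: {column.Value}, expected 1 or {n})")
            End If
        Next

        Return n
    End Function

    Public Sub Main()
        ' a = 1, b = 2, c = "Hello world!", flag = [TRUE, FALSE]
        Dim ok As New Dictionary(Of String, Integer) From {{"a", 1}, {"b", 1}, {"c", 1}, {"flag", 2}}
        Console.WriteLine(ResolveRowCount(ok))   ' 2

        ' a = 1, b = [1,2,3], f = [TRUE, FALSE]
        Dim bad As New Dictionary(Of String, Integer) From {{"a", 1}, {"b", 3}, {"f", 2}}
        Try
            ResolveRowCount(bad)
        Catch ex As ArgumentException
            Console.WriteLine(ex.Message)
        End Try
    End Sub
End Module
```

In the first case the single-element columns are repeated to fill both rows; in the second case column f has 2 elements while the dataframe would need 3 rows, so the check fails.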
With the atomic vector, list and dataframe data types, we have enough components to write an R# script that solves a specific scientific problem.
Visit Any .NET Object in R#
Besides the R# vector, list and dataframe, there is another kind of data type in the R# language: the native .NET object. Yes, R# code can interoperate with .NET code directly. To read a property of a given .NET object instance, the R# language borrows the .NET object property reference syntax from the PowerShell language. For example, suppose there is a Class definition in VisualBasic:
Class metadata
    Public Property name As String
    Public Property features As Double()
End Class
Then we can read the name property value from an instance of the class shown above:
# this syntax just works for get property
# set property value is not yet supported.
x = new metadata(name = "My Name", features = [1,2,3,4,5]);
[x]::name;
# if the property value is an array of the
# primitive type in R# language, then it will
# be treated as an atomic vector!
[x]::features + 5;
# [1] 6 7 8 9 10
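On the .NET side, reading a property by its name at runtime is usually done with reflection. The sketch below shows the general technique only; it is an assumption about how such a lookup could work, not the actual R# implementation, and GetPropertyValue is a hypothetical helper written for this example.

```vbnet
Imports System
Imports System.Reflection

Public Class Metadata
    Public Property Name As String
    Public Property Features As Double()
End Class

Public Module PropertyReader
    ' Reads a public instance property by name (case-insensitive);
    ' returns Nothing if the object has no such property.
    Public Function GetPropertyValue(target As Object, name As String) As Object
        Dim flags As BindingFlags = BindingFlags.Public Or BindingFlags.Instance Or BindingFlags.IgnoreCase
        Dim prop As PropertyInfo = target.GetType().GetProperty(name, flags)

        If prop Is Nothing Then
            Return Nothing
        End If

        Return prop.GetValue(target)
    End Function

    Public Sub Main()
        Dim x As New Metadata With {
            .Name = "My Name",
            .Features = {1.0, 2.0, 3.0, 4.0, 5.0}
        }

        Console.WriteLine(GetPropertyValue(x, "name"))   ' My Name

        ' An array property comes back as a .NET array, which a
        ' scripting runtime can then wrap as a vector
        Dim features As Double() = DirectCast(GetPropertyValue(x, "features"), Double())
        Console.WriteLine(features.Length)   ' 5
    End Sub
End Module
```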
Magic!
Data Visualization in R# Language
Besides making our .NET library scriptable, another purpose of creating the R# language is to let us inspect our data in a simple way. To inspect a dataset, we can use the str or print function in the R# language. More excitingly, we can plot our data directly in the R# environment and inspect it visually.
Before learning how to plot charts in R#, we should learn how to save a graphics image in the R# language. There are several graphics drivers in the R# environment currently:
- bitmap function for raster images
- wmf function for creating Windows Metafile images
- svg function for vector images
- pdf function for using a PDF file as the graphics canvas (not working well currently)
As in the original R language, we should create a graphics device before plotting any data, and then write the code that plots the data. After the drawing code, we should call the dev.off() function to close the graphics device driver and flush all of the data into the target file, which was opened by the bitmap or svg graphics driver function.
Plotting to a given image file usually follows this R# code pattern:
# for a vector image, simply change the bitmap function to the svg function
# svg(file = "/path/to/image.svg");
bitmap(file = "/path/to/image.png");
# code for charting plot
plot(...);
dev.off();
Now that we know how to create an image file in the R# language, we can learn how to plot our data in the R# environment. Some primitive chart plots are already defined in the R# base environment, and you can use them directly in R# scripts without installing any third-party library. For example, a scatter plot:
# read scatter point data from a given table file
# and then assign to tuple variables
[x, y, cluster] = read.csv("./scatter.csv", row.names = NULL);

# umap scatter with class colors
bitmap(file = "./scatter.png") {
    plot(x, y,
        padding = "padding:200px 400px 200px 250px;",
        class = cluster,
        title = "UMAP 2D Scatter",
        x.lab = "dimension 1",
        y.lab = "dimension 2",
        legend.block = 13,
        colorSet = "paper",
        grid.fill = "transparent",
        size = [2600, 1600]
    );
};
Plotting your data in the R# environment is very simple: we just plot it! The primitive plot function in the R# environment keeps things simple, but it is not very flexible: if we want to tweak the plot style further, there are not many parameters for modifying our plot. So here we introduce a charting library written for the R# environment: the ggplot package.
ggplot for R#
The ggplot package is a grammar of graphics library for R# programming, modeled on the ggplot2 package of the R language. R# is another scientific computing language, designed for the .NET runtime and evolved from the R language. Just as the R language has its famous graphics library ggplot2, the R# language has a graphics library called ggplot.
With the ggplot package, we can chart our data in the .NET environment in a more convenient and flexible way. Here is an example of stat plots in R# via ggplot:
ggplot(myeloma, aes(x = "molecular_group", y = "DEPDC1"))
+ geom_boxplot(width = 0.65)
+ geom_jitter(width = 0.3)
# Add horizontal line at base mean
+ geom_hline(yintercept = mean(myeloma$DEPDC1), linetype="dash",
    line.width = 6, color = "red")
+ ggtitle("DEPDC1 ~ molecular_group")
+ ylab("DEPDC1")
+ xlab("")
+ scale_y_continuous(labels = "F0")
# Add global anova p-value
+ stat_compare_means(method = "anova", label.y = 1600)
# Pairwise comparison against all
+ stat_compare_means(label = "p.signif", method = "t.test",
    ref.group = ".all.", hide.ns = TRUE)
+ theme(
    axis.text.x = element_text(angle = 45),
    plot.title = element_text(family = "Cambria Math", size = 16)
)
;
ggraph for R#
Network graph data visualization is not so easy in the .NET environment. The ggplot package for R# also provides a package module that makes network graph data visualization simple; this module is named ggraph.
As mentioned above, data visualization with the ggplot package in the .NET environment is easy and flexible. By combining ggraph and ggplot, we can write elegant code for network graph data visualization:
ggplot(g, padding = "padding: 50px 300px 50px 50px;")
+ geom_node_convexHull(aes(class = "group"),
    alpha = 0,
    stroke.width = 0,
    spline = 0,
    scale = 1.25
)
+ geom_edge_link(color = "black", width = [1,6])
+ geom_node_point(aes(
        size = ggraph::map("degree", [12, 50]),
        fill = ggraph::map("group", "paper"),
        shape = ggraph::map("shape", pathway = "circle", metabolite = "Diamond")
    )
)
+ geom_node_text(aes(size = ggraph::map("degree", [4, 9]),
    color = "gray"), iteration = -5)
+ layout_springforce(
    stiffness = 30000,
    repulsion = 100.0,
    damping = 0.9,
    iterations = 10000,
    time_step = 0.0001
)
+ theme(legend.text = element_text(
    family = "Bookman Old Style",
    size = 4
))
;
The ggplot and ggraph R# packages were inspired by the ggplot2 package for the R language, so the usage of many of their functions can be looked up in the ggplot2 documentation. The ggplot2 package manual may be useful when using the ggplot charting functions in the R# .NET environment.