Content
- Introduction
- Expression Trees
- Dive Deeply into Implementation
- Generic T[] Binary Operator
- Time Tests
- Vector<T> Class Implementation
- Code Use and Limitations
- Future Development
- History
- References
Using generics in Vector like class implementation is neverending story. This article depictes another way for implementation.
Main problem with generic Vector implementation is that there does not exists possibility to directly implement something like
class Vector<T>
{
public static Vector<T> operator +(Vector<T> a, Vector<T> b)
{
int cnt = a.Lenght;
Vector<T> result = new Vector(new T[cnt]);
for(int i = 0; i < cnt; i++)
result[i] = a[i] + b[i];
return result;
}
}
the compiler will throw an error at line
result[i] = a[i] + b[i];
So if you still want to implement it you have to solve this problem and avoid by some way the line code above.
One way which can help in this problem is use of lambda expression as implementation if operators are very simillar to lambda expressions. Let consider line code
Func<double, double, double> add = (a, b) => (a + b);
it behaves exactly as we will expect fot the + operator over double type. But if generic version will be typed compiler will throw an error, even if expression Func<T, T, T> add;
is for compiler OK.
Func<T, T, T> add = (a, b) => (a + b);
So the question is how to fill variable Func<T, T, T> add;
with proper generic form?
Expression trees represents set of classes which are useable for building structures which can be compiled into delegates. Some articles describing the process, structures, classes etc. can be found here:
Going directly to the goal of this article let show very simple expression tree use
ParameterExpression ap = Expression.Parameter(typeof(double), "a");
ParameterExpression bp = Expression.Parameter(typeof(double), "b");
Expression operationResult = Expression.Add(ap, bp);
Expression<Func<double, double, double>> lambda = Expression.Lambda<Func<double, double, double>>(operationResult, ap, bp);
Func<double, double, double> add = lambda.Compile();
this code will create Func<double, double, double> add
delegate with same functionallity as
Func<double, double, double> add = (a, b) => (a + b);
So the non generic form is implemented. What about generic form? If the double
keyword will be replaced in implementation it will still works? (header and footer were added to define full method)
Func<T, T, T> CreateAdd()
{
ParameterExpression ap = Expression.Parameter(typeof(T), "a");
ParameterExpression bp = Expression.Parameter(typeof(T), "b");
Expression operationResult = Expression.Add(ap, bp);
Expression<Func<T, T, T>> lambda = Expression.Lambda<Func<T, T, T>>(operationResult, ap, bp);
Func<T, T, T> add = lambda.Compile();
return add;
}
This implementation is nearly same as
Func<T, T, T> add = (a, b) => (a + b);
The difference is that CreateAdd
method is OK for compiler and the lambda.Compile()
expression will be called at runtime which can throw an error if operator + is not defined for type T. Second implementation is compiled at compile time and the generic avalability of operator + is tested (and failed).
Method CreateAdd
can be generalized for all operators so only one implementation is enought for any kind of binary operator.
public static class FuncGenerator<T>
{
public static Func<T, T, T> ExpressionToFunc(Func<ParameterExpression, ParameterExpression, BinaryExpression> f)
{
ParameterExpression ap = Expression.Parameter(typeof(T), "a");
ParameterExpression bp = Expression.Parameter(typeof(T), "b");
Expression operationResult = f(ap, bp);
Expression<Func<T, T, T>> lambda = Expression.Lambda<Func<T, T, T>>(operationResult, ap, bp);
return lambda.Compile();
}
}
Now it is easy to call method FuncGenerator<T>.ExpressionToFunc()
for creating appropriate Func delegate. But still the user must be familliar with expression trees.
Func<double, double, double> add = FuncGenerator<double>.ExpressionToFunc((a, b) => Expression.Add(a, b));
Func<double, double, double> sub = FuncGenerator<double>.ExpressionToFunc((a, b) => Expression.Subtract(a, b));
Now we have generic implementation of binary operators and the rest is quite easy. Or not?
Bellow is implemented generic operator implementation for array binary operations (see method ExpressionToFuncArray
). This implementation use for loop and evaluation of Func method returned by FuncGenerator<T>.ExpressionToFunc()
.
public static class FuncGenerator<T>
{
public static Func<T, T, T> ExpressionToFunc(Func<ParameterExpression, ParameterExpression, BinaryExpression> f)
{
ParameterExpression ap = Expression.Parameter(typeof(T), "a");
ParameterExpression bp = Expression.Parameter(typeof(T), "b");
Expression operationResult = f(ap, bp);
Expression<Func<T, T, T>> lambda = Expression.Lambda<Func<T, T, T>>(operationResult, ap, bp);
return lambda.Compile();
}
public static Func<T[], T[], T[]> ExpressionToFuncArray(Func<ParameterExpression, ParameterExpression, BinaryExpression> f)
{
Func<T, T, T> op = ExpressionToFunc(f);
return (a, b) =>
{
int len = a.Length;
T[] result = new T[len];
for (int i = 0; i < len; i++)
result[i] = op(ap[i], bp[i]);
return result;
};
}
}
Focus now on line
result[i] = op(ap[i], bp[i]);
op
delegate is called n times per one array operation which is really time consuming. So we have generic implementation but as can be tested it is very slow implementation. Back to expression trees and hope that it give us some solution.
public static Func<double[], double[], double[]> ArrayExpressionToFunc()
{
return (a, b) =>
{
int len = a.Length;
double[] result = new double[len];
for (int i = 0; i < len; i++)
result[i] = ap[i] + bp[i];
return result;
};
}
Above is depicted nongeneric implementation for double
type. This implementation is faster than generic implementation introduced earlier. Now lets try to construct expression tree which behaves exactly as this nongeneric implementation.
private static Func<double[], double[], double[]> ArrayExpressionToFunc(Func<IndexExpression, IndexExpression, BinaryExpression> f)
{
ParameterExpression apA = Expression.Parameter(typeof(double[]), "a");
ParameterExpression bpA = Expression.Parameter(typeof(double[]), "b");
ParameterExpression operationResult = Expression.Parameter(typeof(double[]), "c");
ParameterExpression iA = Expression.Parameter(typeof(int), "i");
LabelTarget labelReturn = Expression.Label(typeof(double[]));
Expression innerBlock = Expression.Block(
Expression.SubtractAssign(iA, Expression.Constant(1)),
Expression.IfThen(Expression.Equal(iA, Expression.Constant(-1)),
Expression.Return(labelReturn, operationResult)),
Expression.Assign(Expression.ArrayAccess(operationResult, iA), f(Expression.ArrayAccess(apA, iA), Expression.ArrayAccess(bpA, iA)))
);
Expression<Func<int, double[]>> newTA = (i) => new double[i];
Expression addeA = Expression.Block(
new[] { iA, operationResult },
Expression.Assign(iA, Expression.ArrayLength(apA)),
Expression.Assign(operationResult, Expression.Invoke(newTA, iA)),
Expression.Loop(innerBlock, labelReturn)
);
Expression<Func<double[], double[], double[]>> lambdaA = Expression.Lambda<Func<double[], double[], double[]>>(addeA, apA, bpA);
return lambdaA.Compile();
}
This implementation returns delegate which can be used for binary operators application on double arrays. Possible use is:
Func<double[], double[], double[]> addArray = FuncGenerator<double>.ArrayExpressionToFunc((a, b) => Expression.Add(a, b));
Func<double[], double[], double[]> subArray = FuncGenerator<double>.ArrayExpressionToFunc((a, b) => Expression.Subtract(a, b));
Now the last step is needed for generic implementation. Just replace double
keyword at appropriate places with T
keyword.
private static Func<T[], T[], T[]> ArrayExpressionToFunc(Func<IndexExpression, IndexExpression, BinaryExpression> f)
{
ParameterExpression apA = Expression.Parameter(typeof(T[]), "a");
ParameterExpression bpA = Expression.Parameter(typeof(T[]), "b");
ParameterExpression operationResult = Expression.Parameter(typeof(T[]), "c");
ParameterExpression iA = Expression.Parameter(typeof(int), "i");
LabelTarget labelReturn = Expression.Label(typeof(T[]));
Expression innerBlock = Expression.Block(
Expression.SubtractAssign(iA, Expression.Constant(1)),
Expression.IfThen(Expression.Equal(iA, Expression.Constant(-1)),
Expression.Return(labelReturn, operationResult)),
Expression.Assign(Expression.ArrayAccess(operationResult, iA), f(Expression.ArrayAccess(apA, iA), Expression.ArrayAccess(bpA, iA)))
);
Expression<Func<int, T[]>> newTA = (i) => new T[i];
Expression addeA = Expression.Block(
new[] { iA, operationResult },
Expression.Assign(iA, Expression.ArrayLength(apA)),
Expression.Assign(operationResult, Expression.Invoke(newTA, iA)),
Expression.Loop(innerBlock, labelReturn)
);
Expression<Func<T[], T[], T[]>> lambdaA = Expression.Lambda<Func<T[], T[], T[]>>(addeA, apA, bpA);
return lambdaA.Compile();
}
The code can be used for definition array operators:
Func<int[], int[], int[]> addArray = FuncGenerator<int>.ArrayExpressionToFunc((a, b) => Expression.Add(a, b));
Func<int[], int[], int[]> subArray = FuncGenerator<int>.ArrayExpressionToFunc((a, b) => Expression.Subtract(a, b));
For testing and comparison of implementation above set of tests was implemented.
- Simple implementation (full compile time specification expected to be faster implementation)
- Func<double, double, double> in for loop (use of delegate for + operator of type T, expected to be slow)
- Func<double[], double[], double[]> created at compile time (use of delegate for + operator of type T[], expected to be fast)
- Func<double[], double[], double[]> created at runtime (use of expression trees to get operator + of type T[], hope to be fast enought)
Code for all tests is bellow:
int lastTick = 0;
private void Msg(string msg)
{
int thisTick = Environment.TickCount;
int delta = thisTick - lastTick;
textBox1.AppendText(delta.ToString() + "ms\t" + msg + "\r\n");
lastTick = thisTick;
}
private void TimeTests()
{
double[] a = new double[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
double[] b = new double[] { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
double[] r = new double[] { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
Func<double, double, double> Add = (ap, bp) => (ap + bp);
Func<double[], double[], double[]> AddA = (ap, bp) =>
{
int len = ap.Length;
double[] result = new double[len];
for (int i = 0; i < len; i++)
result[i] = ap[i] + bp[i];
return result;
};
Func<double[], double[], double[]> AddA2 = FuncGenerator<double>.CreateArrayOperatorFuncAdd();
Msg("beginLoops");
int cnt = 1000000;
for (int i = 0; i < cnt; i++)
{
r = new double[a.Length];
for (int j = 0; j < a.Length; j++)
r[j] = a[j] + b[j];
}
Msg("Simple implementation");
for (int i = 0; i < cnt; i++)
for (int j = 0; j < a.Length; j++)
r[j] = Add(a[j], b[j]);
Msg("Func<double, double, double> in for loop");
for (int i = 0; i < cnt; i++)
r = AddA(a, b);
Msg("Func<double[], double[], double[]> created at compile time");
for (int i = 0; i < cnt; i++)
r = AddA2(a, b);
Msg("Func<double[], double[], double[]> created at runtime");
}
Result of this test:
0ms beginLoops
109ms Simple implementation
282ms Func<double, double, double> in for loop
109ms Func<double[], double[], double[]> without loop
109ms Func<double[], double[], double[]> created runtime
"Simple implementation" is fastest and it is on par with "created runtime". "Func in for loop" is slowest. This prove that goal was reached and this implementation is very good. Interesting is the difference between slowest and fastest implementation as this difference is spend on repeated calls of Func<double, double, double>
delegate.
Please take into account that during test garbage collection sometimes go to collect memory so test results can vary but still the introduced implementation is on par with "Simple implementation".
All partial jobs were done and now lets introduce template for full implementation. Full implementation is not goal so just main parts are depicted.
class Vector<T>
{
private static Func<T, T, T> Add = FuncGenerator<T>.CreateOperatorFuncAdd();
private static Func<T[], T[], T[]> AddT = FuncGenerator<T>.CreateArrayOperatorFuncAdd();
private T[] values;
public Vector(int dim)
{
this.values = new T[dim];
}
public Vector(IEnumerable<T> initialValues)
{
values = initialValues.ToArray();
}
public static Vector<T> operator +(Vector<T> a, Vector<T> b)
{
return AddT(a.values, b.values);
}
public static implicit operator Vector<T>(T[] a)
{
return new Vector<T>(a);
}
public override string ToString()
{
string result = "";
result = values.Select(t => t.ToString()).Aggregate((a, b) => a + ";" + b);
return "[" + result + "]";
}
public static T[] Parse(string value, Func<string, T> parse)
{
if (value[0] != '[' | value[value.Length - 1] != ']')
throw new FormatException(string.Format("{0} is not valid format for type {1}", value, typeof(T[]).ToString()));
string tmpStr = value.Substring(1, value.Length - 2).Trim();
string[] items = tmpStr.Split(new char[] { ';' });
var values = items.Select(s => parse(s.Trim()));
return values.ToArray();
}
}
Basic form of generic Vector<T> was introduced and now lets show possible way to use. Firstly there are set of numerical types which works with this implementation very well.
string aValue = "[0; 1; 2; 3; 4; 5; 6; 7; 8; 9]";
string bValue = "[1; 1; 1; 1; 1; 1; 1; 1; 1; 1]";
private void TypeTest<T>(Func<string, T> parser)
{
Msg("Test for " + typeof(T).ToString());
Vector<T> a = Vector<T>.Parse(aValue, parser);
Vector<T> b = Vector<T>.Parse(bValue, parser);
Vector<T> result = a + b;
Msg(string.Format("c = a + b; => {0} = {1} + {2}", result.ToString(), a.ToString(), b.ToString()));
}
private void ImplementationTest()
{
TypeTest<int>(s => int.Parse(s));
TypeTest<uint>(s => uint.Parse(s));
TypeTest<short>(s => short.Parse(s));
TypeTest<ushort>(s => ushort.Parse(s));
TypeTest<long>(s => long.Parse(s));
TypeTest<ulong>(s => ulong.Parse(s));
TypeTest<double>(s => double.Parse(s));
TypeTest<float>(s => float.Parse(s));
TypeTest<decimal>(s => decimal.Parse(s));
}
Result of this test:
0ms Test for System.Int32
0ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
15ms Test for System.UInt32
0ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
0ms Test for System.Int16
0ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
16ms Test for System.UInt16
0ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
0ms Test for System.Int64
16ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
0ms Test for System.UInt64
0ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
15ms Test for System.Double
0ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
16ms Test for System.Single
0ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
16ms Test for System.Decimal
0ms c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
Unfortunately there are types which cannot be used by this way even if someone can suppose that they will work. Presented implementation does not work for byte
and sbyte
types.
Test on user definded types is last test which have to be run with this implementation. For this test UserNumericType
(with + operator overloaded) was prepared and hypothesis about structure Vector<Vector<double>>
was set.
private void UserTypeTest()
{
Vector<UserNumericType> a = new UserNumericType[] { new UserNumericType(0, 1), new UserNumericType(10, 10) };
Vector<UserNumericType> b = new UserNumericType[] { new UserNumericType(0, 1), new UserNumericType(20, 20) };
Vector<UserNumericType> result = a + b;
Msg("Test for UserNumericType");
Msg(string.Format("c = a + b; => {0} = {1} + {2}", result.ToString(), a.ToString(), b.ToString()));
Vector<Vector<double>> avvd = new Vector<double>[] { new double[] { 0, 1, 2 }, new double[] { 10, 10, 10 } };
Vector<Vector<double>> bvvd = new Vector<double>[] { new double[] { 1, 1, 1 }, new double[] { 21, 21, 21 } };
Vector<Vector<double>> resultvvd = avvd + bvvd;
Msg("Test for Vector<Vector<double>> type");
Msg(string.Format("c = a + b; => {0} = {1} + {2}", resultvvd.ToString(), avvd.ToString(), bvvd.ToString()));
}
Results for this test:
0ms Test for UserNumericType
16ms c = a + b; => [(0; 2);(30; 30)] = [(0; 1);(10; 10)] + [(0; 1);(20; 20)]
15ms Test for Vector<Vector<double>> type
0ms c = a + b; => [[1;2;3];[31;31;31]] = [[0;1;2];[10;10;10]] + [[1;1;1];[21;21;21]]
Many times was in article mentioned that goal is not implement perfect class Vector<T> so it is not done. Just some principles which create solid basement for future development was set. If someone want it is possible to find many implementations of some kind of vectors over internet with friendly licence and reform them to principles presented in this article.
2014-07-21 First implementation after few days of investigation.