Arm64 (often referred to as AArch64) provides a power-optimized architecture that is the basis for many systems on a chip (SoC). SoCs integrate CPUs, memory, GPUs, and I/O devices to perform power-efficient computing operations across various industries, applications, and devices. Because of its portability and low power consumption, Arm64 architecture is ideal for mobile devices. However, laptops and desktops are also starting to use Arm64.
Microsoft Windows 11 has helped speed up this adoption by supporting Arm64 and offering several features that simplify app porting. Specifically, Windows on Arm (WoA) runs Python applications natively on Arm64, while Arm64EC (Emulation Compatible) lets you port your x64 apps to Arm64 gradually. Many frameworks, including Qt, also run UI-based applications natively on WoA. To help developers, Microsoft has introduced Windows Dev Kit 2023, a convenient Arm64-powered device.
Arm64 offers native advantages when using C++ and Python, but it also offers numerous benefits with other frameworks. For example, .NET is a cross-platform development framework for Windows, Linux, and macOS. Combining the framework with Arm64 enables you to implement highly efficient apps on several platforms.
This article demonstrates how to run .NET applications natively on Arm64 and gain the advantages of native execution, such as power efficiency and speed. You will set up your development environment for .NET and measure the performance boost you can expect when running code natively on Arm64. Download the companion code to follow along.
Environment Setup
To set up your development environment, you need the .NET 8.0 SDK, Visual Studio Code, and Git for Windows, installed as described below.
Begin by installing the .NET 8.0 SDK for Windows for both architectures, x64 and Arm64, on your Arm64 device. The SDK is currently available as a preview version (download it here). To confirm successful installation, open a Command Prompt window and enter:
dotnet --info
This produces the following output.
By default, the dotnet command uses the Arm64 architecture when running on an Arm64 device. However, it recognizes that the x64 architecture is also available in the Other architectures found list.
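To confirm from inside a program which architecture the current process uses, a minimal sketch such as the following may help. It relies on the RuntimeInformation class from System.Runtime.InteropServices. The ArchitectureInfo class and its Describe method are illustrative names and are not part of the companion code.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of the companion code.
public static class ArchitectureInfo
{
    // Returns a one-line description of the architecture of the current process
    // and of the .NET runtime that hosts it.
    public static string Describe()
    {
        Architecture process = RuntimeInformation.ProcessArchitecture;
        string runtime = RuntimeInformation.FrameworkDescription;
        return $"Process architecture: {process}, runtime: {runtime}";
    }
}
```

Calling Console.WriteLine(ArchitectureInfo.Describe()); from a console app should report Arm64 when the app runs natively and X64 when it runs through the x64 SDK.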
After installing the .NET SDK, install Visual Studio Code for Windows as your IDE. Select User Installer for Arm64, and then launch the installer. Use the default settings. Once installed, pick your color theme. Finally, install Git for Windows using the 64-bit standalone installer.
Benchmarking .NET Applications
The .NET team provides a set of benchmark tests you can use to evaluate the performance of various .NET versions on different architectures. These benchmarks depend on the BenchmarkDotNet library, which provides a framework to simplify measurements of code execution time.
You can add these measurements to your code using C# attributes. The library evaluates execution time and reports mean computation time and standard deviation. Moreover, the library can generate plots to help you assess code performance. All these features are also available in the .NET performance benchmarks.
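As a rough illustration of the attribute-based approach, the sketch below marks one method with the [Benchmark] attribute and hands the class to BenchmarkRunner. It assumes the BenchmarkDotNet NuGet package has been added to the project. The StringJoinBenchmarks class and its contents are hypothetical and do not come from the dotnet/performance repository.

```csharp
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Hypothetical benchmark class, not taken from the dotnet/performance repository.
public class StringJoinBenchmarks
{
    // Input data is created once, outside the measured method.
    private readonly string[] words = Enumerable.Repeat("arm64", 1000).ToArray();

    // BenchmarkDotNet discovers methods marked with [Benchmark] and measures them.
    [Benchmark]
    public string JoinWords() => string.Join(' ', words);
}

public static class BenchmarkProgram
{
    public static void Main(string[] args)
    {
        // Runs every [Benchmark] method in the class and prints a summary table.
        BenchmarkRunner.Run<StringJoinBenchmarks>();
    }
}
```

BenchmarkDotNet expects an optimized build, so a project like this is normally launched with dotnet run -c Release.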
To use these benchmarks, start by cloning the dotnet performance repository:
git clone https://github.com/dotnet/performance.git
Then, navigate to performance\src\benchmarks\micro as shown below.
In performance\src\benchmarks\micro, enter the following command:
dotnet run -c Release -f net8.0
The application builds and launches, showing you the list of available performance benchmarks.
Now, type "475" and press Enter to launch performance tests for the C# string data type. The result of this operation looks as follows (scroll up to see this table).
By default, the table summarizes the performance test results. You see each performance test’s execution time and statistics (mean, median, minimum, maximum, and standard deviation). This gives you comprehensive information about code performance.
As with the dotnet --info command, the dotnet run command uses the Arm64 architecture by default. To launch performance tests using x64, use the -a switch:
dotnet run -c Release -f net8.0 -a x64
However, at this time, the BenchmarkDotNet library is not compatible with the .NET SDK for x64 on Arm64-based machines. Consequently, BenchmarkDotNet reports errors and incorrect execution times.
To compare .NET performance on x64 and Arm64, use the Console App template and implement your custom benchmarks.
Custom Benchmarks
To implement custom benchmarks, use the System.Diagnostics.Stopwatch class. Start by creating the console application Arm64.Performance (dotnet new console -o Arm64.Performance). Then, open it in Visual Studio Code. Next, create a new file by clicking New File and typing in "PerformanceHelper.cs" as the file name.
Then, open PerformanceHelper.cs, and define the PerformanceHelper class:
using System.Diagnostics;

namespace Arm64.Performance
{
    public static class PerformanceHelper
    {
        private static readonly Stopwatch stopwatch = new();

        // Runs the given method executionCount times and prints the total elapsed time.
        public static void MeasurePerformance(Action method, int executionCount, string label)
        {
            stopwatch.Restart();

            for (int i = 0; i < executionCount; i++)
            {
                method();
            }

            stopwatch.Stop();

            // Elapsed.TotalMilliseconds is a double, so the f2 format keeps fractional milliseconds.
            Console.WriteLine($"[{label}]: {stopwatch.Elapsed.TotalMilliseconds:f2} ms");
        }
    }
}
The PerformanceHelper class is static. It has one method, MeasurePerformance, which invokes the provided function through the Action delegate. You pass this delegate as the first parameter of the MeasurePerformance method, which invokes it as many times as specified by the second parameter, executionCount. After that, the MeasurePerformance method prints the time needed to execute the specific code. Additionally, MeasurePerformance accepts a third parameter, label, which you can use to pass a string describing the performance test.
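The first call to a method in .NET includes just-in-time compilation, which can inflate short measurements. One possible variation, sketched below, runs the method once before starting the stopwatch and returns the elapsed time instead of printing it. WarmupPerformanceHelper and MeasureWithWarmup are hypothetical names and are not part of the companion code. A single warm-up call excludes only the initial compilation and does not guarantee fully optimized code.

```csharp
using System;
using System.Diagnostics;

// Hypothetical variation of the helper, not part of the companion code.
public static class WarmupPerformanceHelper
{
    // Runs the method once untimed, then times executionCount calls
    // and returns the total elapsed time in milliseconds.
    public static double MeasureWithWarmup(Action method, int executionCount)
    {
        if (executionCount <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(executionCount));
        }

        method(); // warm-up call, not measured

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < executionCount; i++)
        {
            method();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds;
    }
}
```

Returning the value rather than printing it makes it easier to collect timings from several trials and summarize them later.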
You define your performance tests by creating a new file, PerformanceTests.cs, where you declare the PerformanceTests class:
namespace Arm64.Performance
{
    public static class PerformanceTests
    {
    }
}
This class is empty. You will extend it in the next section.
List Sorting
The performance test for list sorting shows how long it takes to sort a list containing 100,000 elements of type double. You can prepare the list using the pseudo-random number generator available in the System.Random class.
To implement this test, supplement the PerformanceTests class with the following code:
public static class PerformanceTests
{
    private static readonly Random random = new();

    private static List<double> PrepareList()
    {
        const int listSize = 100000;

        return Enumerable.Range(0, listSize)
            .Select(r => random.NextDouble())
            .ToList();
    }

    public static void ListSorting()
    {
        var list = PrepareList();
        list.Sort();
    }
}
There are three new elements:
- Declaration and initialization of the private static random member. This is an instance of the pseudo-random number generator.
- A private static PrepareList method, which creates a pseudo-random list containing 100,000 elements of type double. To generate this list, it uses the Enumerable.Range method from System.Linq. The pseudo-random number generator creates each element of this list.
- A public static ListSorting method, which first invokes the PrepareList helper method to create the random list. Then, the Sort method sorts this random list.
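Because ListSorting calls PrepareList internally, the measured time covers both generating and sorting the list. If only the sort itself is of interest, one option is to prepare the data before starting the stopwatch, as in the sketch below. The SortOnlyTiming class, its method names, and the fixed seed are assumptions made for illustration and are not part of the companion code.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper, not part of the companion code.
public static class SortOnlyTiming
{
    // Builds a list of pseudo-random doubles from a fixed seed.
    public static List<double> PrepareList(int size, int seed)
    {
        var random = new Random(seed);
        return Enumerable.Range(0, size)
            .Select(_ => random.NextDouble())
            .ToList();
    }

    // Times only list.Sort(); the list must be prepared before this call.
    public static double TimeSortMilliseconds(List<double> list)
    {
        var stopwatch = Stopwatch.StartNew();
        list.Sort();
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }
}
```

For example, SortOnlyTiming.TimeSortMilliseconds(SortOnlyTiming.PrepareList(100000, 42)) times a single sort of 100,000 elements without counting the time spent generating them.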
Matrix Multiplication
Now, you implement square matrix multiplication. First, you extend the definition of the PerformanceTests class with the GenerateRandomMatrix method. Add this method to the PerformanceTests.cs file after ListSorting:
private static double[,] GenerateRandomMatrix()
{
    const int matrixSize = 500;
    var matrix = new double[matrixSize, matrixSize];

    for (int i = 0; i < matrixSize; i++)
    {
        for (int j = 0; j < matrixSize; j++)
        {
            matrix[i, j] = random.NextDouble();
        }
    }

    return matrix;
}
This method generates a 500 by 500 square matrix. A nested for loop fills each element with a pseudo-random value of type double by calling random.NextDouble.
Next, in the PerformanceTests class, add the following method:
private static double[,] MatrixMultiplication(double[,] matrix1, double[,] matrix2)
{
    if (matrix1.Length != matrix2.Length)
    {
        throw new ArgumentException("The matrices must be of equal size");
    }

    if (matrix1.GetLength(0) != matrix1.GetLength(1) || matrix2.GetLength(0) != matrix2.GetLength(1))
    {
        throw new ArgumentException("The matrices must be square");
    }

    int matrixSize = matrix2.GetLength(0);
    var result = new double[matrixSize, matrixSize];

    for (int i = 0; i < matrixSize; i++)
    {
        for (int j = 0; j < matrixSize; j++)
        {
            result[i, j] = 0;

            for (int k = 0; k < matrixSize; k++)
            {
                result[i, j] += matrix1[i, k] * matrix2[k, j];
            }
        }
    }

    return result;
}
The MatrixMultiplication method takes two square matrices as input and calculates their product from the standard definition: each element result[i, j] is the sum over k of matrix1[i, k] * matrix2[k, j]. Three nested for loops accumulate these sums in the result variable, which the MatrixMultiplication method returns.
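A small worked example can make the triple loop easier to check by hand. The sketch below multiplies two 2 by 2 matrices with the same loop structure. MatrixExample, Multiply, and Demo are hypothetical names, and the input validation from MatrixMultiplication is omitted on the assumption that both inputs are square and of equal size.

```csharp
using System;

// Hypothetical worked example, not part of the companion code.
public static class MatrixExample
{
    // Triple-loop product; assumes both matrices are square and the same size.
    public static double[,] Multiply(double[,] a, double[,] b)
    {
        int n = a.GetLength(0);
        var result = new double[n, n]; // elements start at 0.0

        for (int i = 0; i < n; i++)
        {
            for (int j = 0; j < n; j++)
            {
                for (int k = 0; k < n; k++)
                {
                    result[i, j] += a[i, k] * b[k, j];
                }
            }
        }

        return result;
    }

    public static double[,] Demo()
    {
        var a = new double[,] { { 1, 2 }, { 3, 4 } };
        var b = new double[,] { { 5, 6 }, { 7, 8 } };

        // Row 0: 1*5 + 2*7 = 19 and 1*6 + 2*8 = 22
        // Row 1: 3*5 + 4*7 = 43 and 3*6 + 4*8 = 50
        return Multiply(a, b);
    }
}
```

The three nested loops perform n * n * n multiply-add steps, so a 500 by 500 product takes 125 million of them, which is why a single matrix multiplication dominates the run time of this test.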
Finally, in the PerformanceTests class, you implement the following public method, which generates two square matrices, and then calculates their product:
public static void SquareMatrixMultiplication()
{
    var matrix1 = GenerateRandomMatrix();
    var matrix2 = GenerateRandomMatrix();

    MatrixMultiplication(matrix1, matrix2);
}
String Operations
For the last performance benchmark, use string operations. In the PerformanceTests class, define the loremIpsum variable, which stores a fragment of the placeholder text:
private static readonly string loremIpsum = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " +
    "Curabitur ut enim dapibus, pharetra lorem ut, accumsan massa. " +
    "Morbi et nisi feugiat magna dapibus finibus. Pellentesque habitant morbi " +
    "tristique senectus et netus et malesuada fames ac turpis egestas. Proin non luctus lectus, " +
    "vel sollicitudin ante. Nullam finibus lobortis venenatis. Nulla sit amet posuere magna, " +
    "a suscipit velit. Cras et commodo elit, nec vestibulum libero. " +
    "Cras at faucibus ex. Suspendisse ac nulla non massa aliquet sagittis. " +
    "Fusce tortor enim, feugiat ultricies ultricies at, viverra et neque. " +
    "Praesent dolor mauris, pellentesque euismod pharetra ut, interdum non velit. " +
    "Fusce vel nunc nibh. Sed mi tortor, tempor luctus tincidunt et, tristique id enim. " +
    "In nec pellentesque orci. Nulla efficitur, orci sit amet volutpat consectetur, " +
    "enim risus condimentum ex, ac tincidunt mi ipsum eu orci. Maecenas maximus nec massa in hendrerit.";
Then, implement the StringOperations public method:
public static void StringOperations()
{
    loremIpsum.Split(' ');
    loremIpsum.Substring(loremIpsum.LastIndexOf("consectetur"));
    loremIpsum.Replace("nec", "orci");
    loremIpsum.ToLower();
    loremIpsum.ToUpper();
}
This method splits the placeholder text into substrings using the space separator. Then, it takes the substring starting at the last index of the word consectetur. Next, it replaces nec with orci, converts the string to lowercase, and then to uppercase to mimic typical operations on the string variables in your C# apps.
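To see what each of these calls returns, the sketch below applies the same sequence to a short sample sentence and keeps the results instead of discarding them. StringOperationsExample and Run are hypothetical names. The sketch uses an ordinal comparison and the invariant-culture casing methods so that the results do not depend on the machine's culture settings, which differs slightly from the benchmark code.

```csharp
using System;

// Hypothetical worked example, not part of the companion code.
public static class StringOperationsExample
{
    // Applies the benchmark's sequence of operations and returns every result.
    // Substring throws if marker does not occur in text, because LastIndexOf returns -1.
    public static (string[] Words, string Tail, string Replaced, string Lower, string Upper)
        Run(string text, string marker)
    {
        string[] words = text.Split(' ');
        string tail = text.Substring(text.LastIndexOf(marker, StringComparison.Ordinal));
        string replaced = text.Replace("nec", "orci");
        string lower = text.ToLowerInvariant();
        string upper = text.ToUpperInvariant();

        return (words, tail, replaced, lower, upper);
    }
}
```

For the input "Lorem ipsum nec dolor nec amet" with the marker "nec", Words has six elements, Tail is "nec amet", Replaced is "Lorem ipsum orci dolor orci amet", and Lower and Upper are the all-lowercase and all-uppercase forms of the sentence.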
Putting Things Together
You can now use these performance tests in your console application. Modify the Program.cs file by replacing the default content (Console.WriteLine("Hello, World!");) with the following statements:
using Arm64.Performance;

Console.WriteLine($"Processor architecture: " +
    $"{Environment.GetEnvironmentVariable("PROCESSOR_ARCHITECTURE")}");

const int trialCount = 5;

for (int i = 0; i < trialCount; i++)
{
    Console.WriteLine($"Trial no: {i + 1} / {trialCount}");

    PerformanceHelper.MeasurePerformance(PerformanceTests.ListSorting,
        executionCount: 500, "List sorting");

    PerformanceHelper.MeasurePerformance(PerformanceTests.SquareMatrixMultiplication,
        executionCount: 10, "Matrix multiplication");

    PerformanceHelper.MeasurePerformance(PerformanceTests.StringOperations,
        executionCount: 500000, "String operations");
}
This code imports the Arm64.Performance namespace, where you defined the PerformanceHelper and PerformanceTests classes. Then, the code prints the processor architecture, either Arm64 or AMD64, depending on the architecture of the SDK used to run the app.
You have one constant, trialCount, which specifies how many times the set of performance tests runs. Each trial runs ListSorting 500 times, SquareMatrixMultiplication 10 times, and StringOperations 500,000 times. These counts give the test batches comparable execution times: a single matrix multiplication is much slower than a single string operation, so the latter needs many more executions.
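Running several trials is most useful when the per-trial times are summarized afterwards. The sketch below computes the mean and the sample standard deviation of a set of timings. It assumes you have collected the times yourself, for example by returning the elapsed time from the helper instead of printing it. TrialStatistics and its methods are hypothetical names and are not part of the companion code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the companion code.
public static class TrialStatistics
{
    // Arithmetic mean of the per-trial times; throws if the list is empty.
    public static double Mean(IReadOnlyList<double> times) => times.Average();

    // Sample standard deviation (n - 1 in the denominator);
    // returns 0 when there are fewer than two trials.
    public static double StandardDeviation(IReadOnlyList<double> times)
    {
        if (times.Count < 2)
        {
            return 0.0;
        }

        double mean = times.Average();
        double sumOfSquares = times.Sum(t => (t - mean) * (t - mean));
        return Math.Sqrt(sumOfSquares / (times.Count - 1));
    }
}
```

With five trials per architecture, comparing the means while keeping an eye on the standard deviations gives a rough sense of whether a difference is larger than the run-to-run noise.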
.NET Performance Boost on Arm64
You can now launch the app to evaluate its performance on different architectures. Start by running the app using Arm64. In the Arm64.Performance folder, enter:
dotnet run -c Release
This launches the console app, and you should see the following output.
Now, to compare the execution times, launch the app using the x64 architecture:
dotnet run -c Release -a x64
This command leads to the following output:
The operations all take more time on the emulated x64 than when you execute them natively on Arm64. On average, native execution provides about a 15 percent performance improvement for list sorting, 291 percent for matrix multiplication, and 239 percent for string operations.
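The article does not state which formula produced these percentages, so the sketch below shows two common ways to compare a pair of mean times: the ratio of emulated to native time, and the extra time of the emulated run relative to the native run. SpeedupCalculator and its methods are hypothetical names, and the numbers in the usage note are made-up illustrative values, not measurements from this article.

```csharp
using System;

// Hypothetical helper, not part of the companion code.
public static class SpeedupCalculator
{
    // How many times faster the native run is: emulated time divided by native time.
    public static double SpeedupFactor(double emulatedMs, double nativeMs)
    {
        if (nativeMs <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(nativeMs));
        }

        return emulatedMs / nativeMs;
    }

    // Extra time the emulated run takes relative to the native run, in percent.
    public static double PercentSlower(double emulatedMs, double nativeMs)
    {
        return (SpeedupFactor(emulatedMs, nativeMs) - 1.0) * 100.0;
    }
}
```

For hypothetical mean times of 300 ms under emulation and 100 ms natively, SpeedupFactor returns 3 and PercentSlower returns 200, which shows how the same pair of measurements can be quoted as either "three times faster" or "200 percent slower" depending on the definition chosen.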
The following chart summarizes the mean execution times for emulated x64 code and natively executed Arm64 code.
The chart shows a significant improvement in execution times.
Conclusion
This article demonstrated how to use the .NET SDK for cross-platform console app development. It explained how to create a project from the console application template and launch the application using different processor architectures (x64 or Arm64).
It then showed how to benchmark .NET applications using standard and custom code. It used the latter to demonstrate the significant performance boost when you natively execute the code on an Arm64-powered device: nearly three times faster for matrix multiplication. Code running on x64 must go through the emulation layer, which consumes extra CPU and memory. Without this extra layer, native Arm64 code achieves a performance advantage and better efficiency.
Now that you have learned how to harness the power of Arm64, start using Arm64 for your .NET apps.