.NET Performance on Arm64

8 Sep 2023
This article demonstrates how to use Arm64 to run .NET applications, acquiring advantages of native architecture like power efficiency and a speed gain.

Arm64 (often referred to as AArch64) provides a power-optimized architecture that is the basis for many systems on a chip (SoC). SoCs integrate CPUs, memory, GPUs, and I/O devices to perform power-efficient computing operations across various industries, applications, and devices. Because of its portability and low power consumption, Arm64 architecture is ideal for mobile devices. However, laptops and desktops are also starting to use Arm64.

Microsoft Windows 11 has helped speed up this adoption by supporting Arm64 and offering several features that simplify app porting. Specifically, Windows on Arm (WoA) runs Python applications natively on Arm64, while Arm64EC (Emulation Compatible) helps you port x64 apps to Arm64 gradually. Many frameworks, including Qt, now also run UI-based applications natively on WoA. To help developers, Microsoft has introduced Windows Dev Kit 2023, a convenient Arm64-powered device.

Arm64 offers native advantages when using C++ and Python, but it also benefits other frameworks. For example, .NET is a cross-platform development framework for Windows, Linux, and macOS. Combining the framework with Arm64 enables you to implement highly efficient apps on several platforms.

This article shows how to run .NET applications natively on Arm64 and how to measure the resulting speed gain. You will set up your development environment for .NET and see what performance boost to expect from running code natively on Arm64. Download the companion code to follow along.

Environment Setup

To set up your development environment, you need an Arm64 device running Windows with the .NET 8.0 SDK, Visual Studio Code, and Git for Windows installed.

Begin by installing the .NET 8.0 SDK for Windows for both architectures, x64 and Arm64, on your Arm64 device. The SDK is currently available as a preview version (download it here). To confirm successful installation, open a Command Prompt window and enter:

Terminal
dotnet --info

This produces the following output.

By default, the dotnet command uses the Arm64 architecture when running on an Arm64 device. However, it recognizes that the x64 architecture is also available in the Other architectures found list.

After installing the .NET SDK, install Visual Studio Code for Windows as your IDE. Select User Installer for Arm64, and then launch the installer. Use the default settings. Once installed, pick your color theme. Finally, install Git for Windows using the 64-bit standalone installer.

Benchmarking .NET Applications

The .NET team provides a set of benchmark tests you can use to evaluate the performance of various .NET versions on different architectures. These benchmarks depend on the BenchmarkDotNet library, which provides a framework to simplify measurements of code execution time.

You can add these measurements to your code using C# attributes. The library evaluates execution time and reports mean computation time and standard deviation. Moreover, the library can generate plots to help you assess code performance. All these features are also available in the .NET performance benchmarks.

To use these benchmarks, start by cloning the dotnet performance repository:

Terminal
git clone https://github.com/dotnet/performance.git

Then, navigate to performance\src\benchmarks\micro and enter the following command:

Terminal
dotnet run -c Release -f net8.0

The application builds and launches, showing you the list of available performance benchmarks.

Now, type "475" and press Enter to launch the performance tests for the C# string data type (the index may differ in other versions of the repository). The result of this operation looks as follows (scroll up in the terminal to see the table).

By default, the table summarizes the performance test results. You see each performance test’s execution time and statistics (mean, median, minimum, maximum, and standard deviation). This gives you comprehensive information about code performance.

As with the dotnet --info command, dotnet run uses the Arm64 architecture by default on an Arm64 device. To launch the performance tests using x64, use the -a switch:

Terminal
dotnet run -c Release -f net8.0 -a x64

However, at this time, the BenchmarkDotNet library is not compatible with the .NET SDK for x64 on Arm64-based machines. Consequently, BenchmarkDotNet reports errors and incorrect execution times.

To compare .NET performance on x64 and Arm64, use the Console App template and implement your custom benchmarks.

Custom Benchmarks

To implement custom benchmarks, use the System.Diagnostics.Stopwatch class. Start by creating the console application Arm64.Performance (dotnet new console -o Arm64.Performance). Then, open it in Visual Studio Code. Next, create a new file by clicking New File and typing in "PerformanceHelper.cs" as the file name.

Then, open PerformanceHelper.cs, and define the PerformanceHelper class:

C#
using System.Diagnostics;
 
namespace Arm64.Performance
{
    public static class PerformanceHelper
    {
        private static readonly Stopwatch stopwatch = new();
 
        public static void MeasurePerformance(Action method, int executionCount, string label)
        {
            stopwatch.Restart();
 
            for(int i = 0; i < executionCount; i++)
            {
                method();
            }
 
            stopwatch.Stop();
 
            // Elapsed.TotalMilliseconds keeps the fractional part; ElapsedMilliseconds is a whole number.
            Console.WriteLine($"[{label}]: {stopwatch.Elapsed.TotalMilliseconds:f2} ms");
        }
    }
}

The PerformanceHelper class is static. It has one method, MeasurePerformance, which takes the code to measure as an Action delegate in its first parameter. The method invokes this delegate as many times as specified by the second parameter, executionCount, and then prints the total time needed to execute those calls. The third parameter, label, is a string describing the performance test, which appears in the printed line.
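If you prefer to aggregate timings yourself rather than read them off the console, one option is a variant that returns the elapsed time. The following is a minimal sketch and not part of the companion code: the Measure name and the untimed warm-up call are assumptions added for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical variant of MeasurePerformance: returns the elapsed time
// instead of printing it, so the caller can average several trials.
static TimeSpan Measure(Action method, int executionCount)
{
    method(); // untimed warm-up call, so first-call JIT cost is not measured

    var stopwatch = Stopwatch.StartNew();

    for (int i = 0; i < executionCount; i++)
    {
        method();
    }

    stopwatch.Stop();

    return stopwatch.Elapsed;
}

int calls = 0;
TimeSpan elapsed = Measure(() => calls++, executionCount: 1000);

// calls counts the warm-up invocation plus the 1000 timed ones.
Console.WriteLine($"{calls} calls took {elapsed.TotalMilliseconds:f2} ms");
```

Returning a TimeSpan keeps the helper free of formatting decisions, which makes it easier to compute a mean over several trials later.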

You define your performance tests by creating a new file, PerformanceTests.cs, where you declare the PerformanceTests class:

C#
namespace Arm64.Performance
{
    public static class PerformanceTests 
    { 
    }
}

This class is empty. You will extend it in the next section.

List Sorting

The performance test for list sorting shows how long it takes to sort the list containing 100,000 elements of type double. You can prepare the list using the pseudo-random number generator available in the System.Random class.

To implement this test, supplement the PerformanceTests class with the following code:

C#
public static class PerformanceTests
{ 
    private static readonly Random random = new();
 
    private static List<double> PrepareList()
    {
        const int listSize = 100000;
 
        return Enumerable.Range(0, listSize)
                    .Select(r => random.NextDouble())
                    .ToList();
    }
 
    public static void ListSorting()
    {
        var list = PrepareList();
 
        list.Sort();
    }    
}

There are three new elements:

  • Declaration and initialization of the private static random member. This is an instance of the pseudo-random number generator.
  • A private static PrepareList method creates a pseudo-random list containing 100,000 elements of type double. To generate this list, use the Enumerable.Range method from System.Linq. The pseudo-random number generator creates each element of this list.
  • A public static ListSorting method first invokes the PrepareList helper method to create the random list. Then, the Sort method sorts this random list.
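Note that ListSorting builds the list inside the measured method, so its timing covers list generation as well as sorting. If you want to time the sort alone, one possible arrangement is to prepare the data first and start the stopwatch afterwards. This is an illustrative sketch, not the companion code: TimeSortOnly, the two-parameter PrepareList, and the fixed seed are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper: builds the input list outside of any timed region.
static List<double> PrepareList(int listSize, int seed)
{
    var random = new Random(seed); // fixed seed keeps runs repeatable

    return Enumerable.Range(0, listSize)
                     .Select(_ => random.NextDouble())
                     .ToList();
}

// Hypothetical helper: times List<double>.Sort only.
static TimeSpan TimeSortOnly(List<double> list)
{
    var stopwatch = Stopwatch.StartNew(); // timing starts after setup is done
    list.Sort();
    stopwatch.Stop();

    return stopwatch.Elapsed;
}

List<double> data = PrepareList(100_000, seed: 42);
TimeSpan sortTime = TimeSortOnly(data);

Console.WriteLine($"Sort only: {sortTime.TotalMilliseconds:f2} ms");
```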

Matrix Multiplication

Now, you implement square matrix multiplication. First, you extend the definition of the PerformanceTests class with the GenerateRandomMatrix method. Add this method to the PerformanceTests.cs file after ListSorting:

C#
private static double[,] GenerateRandomMatrix()
{
    const int matrixSize = 500;
 
    var matrix = new double[matrixSize, matrixSize];
 
    for (int i = 0; i < matrixSize; i++)
    {
        for (int j = 0; j < matrixSize; j++)
        {
            matrix[i, j] = random.NextDouble();
        }
    }
 
    return matrix;
}

This method generates a 500 by 500 square matrix. A nested for loop fills each element with a pseudo-random value of type double returned by random.NextDouble.

Next, in the PerformanceTests class, add the following method:

C#
private static double[,] MatrixMultiplication(double[,] matrix1, double[,] matrix2)
{
    if (matrix1.Length != matrix2.Length)
    {
        throw new ArgumentException("The matrices must be of equal size");
    }
 
    if (matrix1.GetLength(0) != matrix1.GetLength(1) || matrix2.GetLength(0) != matrix2.GetLength(1))
    {
        throw new ArgumentException("The matrices must be square");
    }
 
    int matrixSize = matrix2.GetLength(0);
 
    var result = new double[matrixSize, matrixSize];
 
    for (int i = 0; i < matrixSize; i++)
    {
        for (int j = 0; j < matrixSize; j++)
        {
            result[i, j] = 0;
 
            for (int k = 0; k < matrixSize; k++)
            {
                result[i, j] += matrix1[i, k] * matrix2[k, j];
            }
        }
    }
 
    return result;
}

The MatrixMultiplication method takes two square matrices of equal size as input and computes their product from the standard definition: each element result[i, j] is the sum over k of matrix1[i, k] * matrix2[k, j]. Three nested for loops accumulate these sums in the result variable, which the method returns.
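To check the triple loop by hand, it can help to run it on a 2 by 2 case. The sketch below repeats the same loop structure in a self-contained local function; the Multiply name and the sample values are chosen only for illustration.

```csharp
using System;

// Same triple-loop definition as MatrixMultiplication, reduced to a
// self-contained local function for a hand-checkable 2 x 2 case.
static double[,] Multiply(double[,] left, double[,] right)
{
    int n = left.GetLength(0);
    var result = new double[n, n]; // elements start at 0

    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < n; j++)
        {
            for (int k = 0; k < n; k++)
            {
                result[i, j] += left[i, k] * right[k, j];
            }
        }
    }

    return result;
}

double[,] a = { { 1, 2 }, { 3, 4 } };
double[,] b = { { 5, 6 }, { 7, 8 } };

double[,] product = Multiply(a, b);

// By hand: row 0 = (1*5 + 2*7, 1*6 + 2*8), row 1 = (3*5 + 4*7, 3*6 + 4*8)
Console.WriteLine($"{product[0, 0]} {product[0, 1]}"); // 19 22
Console.WriteLine($"{product[1, 0]} {product[1, 1]}"); // 43 50
```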

Finally, in the PerformanceTests class, you implement the following public method, which generates two square matrices, and then calculates a product:

C#
public static void SquareMatrixMultiplication()
{
    var matrix1 = GenerateRandomMatrix();
    var matrix2 = GenerateRandomMatrix();
 
    MatrixMultiplication(matrix1, matrix2);
}

String Operations

For the last performance benchmark, use string operations. In the PerformanceTests class, define the loremIpsum variable, which stores a fragment of the placeholder text:

C#
private static readonly string loremIpsum = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " +
    "Curabitur ut enim dapibus, pharetra lorem ut, accumsan massa. " +
    "Morbi et nisi feugiat magna dapibus finibus. Pellentesque habitant morbi " +
    "tristique senectus et netus et malesuada fames ac turpis egestas. Proin non luctus lectus, " +
    "vel sollicitudin ante. Nullam finibus lobortis venenatis. Nulla sit amet posuere magna, " +
    "a suscipit velit. Cras et commodo elit, nec vestibulum libero. " +
    "Cras at faucibus ex. Suspendisse ac nulla non massa aliquet sagittis. " +
    "Fusce tortor enim, feugiat ultricies ultricies at, viverra et neque. " +
    "Praesent dolor mauris, pellentesque euismod pharetra ut, interdum non velit. " +
    "Fusce vel nunc nibh. Sed mi tortor, tempor luctus tincidunt et, tristique id enim. " +
    "In nec pellentesque orci. Nulla efficitur, orci sit amet volutpat consectetur, " +
    "enim risus condimentum ex, ac tincidunt mi ipsum eu orci. Maecenas maximus nec massa in hendrerit.";

Then, implement the StringOperations public method:

C#
public static void StringOperations()
{
    loremIpsum.Split(' ');
 
    loremIpsum.Substring(loremIpsum.LastIndexOf("consectetur"));
 
    loremIpsum.Replace("nec", "orci");
 
    loremIpsum.ToLower();
 
    loremIpsum.ToUpper();
}

This method splits the placeholder text into substrings using the space separator. Then, it takes the substring starting at the last index of the word consectetur. Next, it replaces nec with orci, converts the string to lowercase, and then to uppercase to mimic typical operations on the string variables in your C# apps.
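StringOperations discards every return value because only the execution time matters for the benchmark. To see what the individual calls produce, you can try them on a single sentence. This is a small illustrative sketch: the shortened text and the "elit" replacement are assumptions chosen so the results are easy to verify, not the values used in the benchmark.

```csharp
using System;

string text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit";

// Each call returns a new value; the original string is never modified.
string[] words = text.Split(' ');
string tail = text.Substring(text.LastIndexOf("consectetur", StringComparison.Ordinal));
string replaced = text.Replace("elit", "orci");

Console.WriteLine(words.Length);    // 8
Console.WriteLine(tail);            // consectetur adipiscing elit
Console.WriteLine(replaced);        // Lorem ipsum dolor sit amet, consectetur adipiscing orci
Console.WriteLine(text.ToLower());
Console.WriteLine(text.ToUpper());
```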

Putting Things Together

You can now use these performance tests in your console application. Modify the Program.cs file by replacing the default content (Console.WriteLine("Hello, World!");) with the following statements:

C#
using Arm64.Performance;
 
Console.WriteLine($"Processor architecture: " +
    $"{Environment.GetEnvironmentVariable("PROCESSOR_ARCHITECTURE")}");
 
const int trialCount = 5;
 
for ( int i = 0; i < trialCount; i++ )
{
    Console.WriteLine($"Trial no: {i + 1} / {trialCount}");
 
    PerformanceHelper.MeasurePerformance(PerformanceTests.ListSorting, 
        executionCount: 500, "List sorting");
 
    PerformanceHelper.MeasurePerformance(PerformanceTests.SquareMatrixMultiplication, 
        executionCount: 10, "Matrix multiplication");
 
    PerformanceHelper.MeasurePerformance(PerformanceTests.StringOperations, 
        executionCount: 500000, "String operations");
}

This code imports the Arm64.Performance namespace, where you defined the PerformanceHelper and PerformanceTests classes. Then, the code prints the processor architecture, either ARM64 or AMD64, depending on the architecture of the SDK used to run the app.
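The PROCESSOR_ARCHITECTURE environment variable is specific to Windows. As an alternative that does not depend on environment variables, the System.Runtime.InteropServices.RuntimeInformation class exposes the same kind of information. The following sketch is an optional addition, not part of the companion code:

```csharp
using System;
using System.Runtime.InteropServices;

// Architecture of the running process (for example X64 or Arm64).
Architecture processArch = RuntimeInformation.ProcessArchitecture;

// Architecture reported for the operating system.
Architecture osArch = RuntimeInformation.OSArchitecture;

Console.WriteLine($"Process architecture: {processArch}");
Console.WriteLine($"OS architecture: {osArch}");
Console.WriteLine($"Runtime: {RuntimeInformation.FrameworkDescription}");
```

Printing the process architecture at startup makes it easy to confirm which SDK actually ran the app before you compare timings.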

The trialCount constant specifies how many times the whole set of performance tests executes. In each trial, ListSorting runs 500 times, SquareMatrixMultiplication 10 times, and StringOperations 500,000 times. These counts give the three batches comparable execution times: a single matrix multiplication is much slower than a single string operation, so the latter needs more executions.

.NET Performance Boost on Arm64

You can now launch the app to evaluate its performance on different architectures. Start by running the app using Arm64. In the Arm64.Performance folder, enter:

Terminal
dotnet run -c Release

This launches the console app, and you should see the following output.

Now, to compare the execution times, launch the app using the x64 architecture:

Terminal
dotnet run -c Release -a x64

This command leads to the following output:

The operations all take more time on the emulated x64 than when you execute them natively on Arm64. On average, native execution provides about a 15 percent performance improvement for list sorting, 291 percent for matrix multiplication, and 239 percent for string operations.
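The article does not state which formula produced these percentages. If you want to derive comparable figures from your own runs, the sketch below shows two common conventions: the ratio of mean times and the relative difference. The timing values are placeholders chosen for easy arithmetic, not measurements from this article.

```csharp
using System;
using System.Linq;

// Placeholder timings in milliseconds, NOT measurements from the article.
double[] x64Trials = { 400, 410, 390 };
double[] arm64Trials = { 200, 205, 195 };

double x64Mean = x64Trials.Average();     // 400
double arm64Mean = arm64Trials.Average(); // 200

// Ratio of mean times: how many times longer the x64 run takes.
double ratio = x64Mean / arm64Mean;

// Relative difference, expressed as a percentage of the Arm64 time.
double extraPercent = (x64Mean - arm64Mean) / arm64Mean * 100;

Console.WriteLine($"x64 / Arm64 time ratio: {ratio:f2}");
Console.WriteLine($"x64 takes {extraPercent:f0}% longer than Arm64");
```

The two conventions differ by a constant offset: a ratio of 2 corresponds to 100 percent longer, so it is worth stating which one you report.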

The following chart summarizes the mean execution times for x64 and Arm64’s natively executed code.

The chart shows shorter execution times for native Arm64 code in all three tests.

Conclusion

This article demonstrated how to use the .NET SDK for cross-platform console app development. It explained how to create a project from the console application template and launch the application using different processor architectures (x64 or Arm64).

It then showed how to benchmark .NET applications using standard and custom code. It used the latter to demonstrate the significant performance boost when you natively execute the code on an Arm64-powered device, nearly three times faster for matrix multiplication. Code built for x64 must go through the emulation layer, which consumes extra CPU and memory. Without this extra layer, native Arm64 code achieves a performance advantage and better efficiency.

Now that you have learned how to harness the power of Arm64, start using Arm64 for your .NET apps.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
