The Arm architecture brings power and efficiency to edge computing and mobile devices, especially for newer Windows on Arm (WoA) devices.
Python, a widely used programming language, now natively supports Windows on Arm. Starting with Python 3.11, an official installer for WoA is available, so it’s time to start targeting WoA.
This article shows how to use native Arm Python 3.11 on Arm-powered devices and demonstrates a performance boost of up to threefold over running Python in emulation mode.
Python on Arm
CPython is the official Python implementation and includes the standard library. It compiles source code into bytecode before interpreting it. Because the interpreter itself is written in C, it contains platform-specific code and must be built for each target architecture. You can install it using the official installer for ARM64, use an older version available from NuGet, or build it directly from the source.
Prerequisites
This tutorial uses the Windows Dev Kit 2023 (Project Volterra) for development. However, you can achieve a similar performance boost on the Surface Pro 9 5G and the Lenovo X13s, or on Apple silicon devices by using Parallels Desktop to run Windows 11.
Setting Up
To set up native Arm support for Python, you need native Arm C build tools. As explained in Arm documentation, you can install them through Visual Studio 2022 Community by selecting desktop development with C++, or by using the standalone installer for the build tools. Either option ensures that native Arm C build tools are present.
Once you have Visual Studio installed, switch to Visual Studio Code (VS Code) for Arm64 for development so you can use its included Python tools.
After installing VS Code, install the Python extensions.
For this demo, you will install both the Arm64-specific Python and the standard non-Arm Python to compare and contrast them.
Start by installing Arm64 Python 3.11 and then x64 Python 3.11. Choose the default settings for simplicity and consistency.
By default, the installers place both versions under the following path:
Users\<User_name>\AppData\Local\Programs\Python
There are two subfolders: Python311 for the x64 version and Python311-arm64 for the Arm64 version. By running python.exe from each subfolder, you see they are built with different 64-bit Microsoft C compilers: either AMD64 or ARM64.
Alternatively, to see the different Python versions, you could call py -3.11 or py -3.11-arm64.
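To confirm from inside Python which build is running, a short script along these lines can help. It is only a sketch: the helper name describe_interpreter is made up for this illustration, and the exact strings reported for the machine and compiler can vary between Python versions and between native and emulated runs.

```python
import platform
import struct
import sys


def describe_interpreter():
    """Return a few facts about the Python interpreter running this script.

    describe_interpreter is a hypothetical helper name, not part of any library.
    """
    return {
        "version": platform.python_version(),      # e.g. a 3.11.x release
        "machine": platform.machine(),             # architecture string reported by the platform module
        "compiler": platform.python_compiler(),    # compiler description taken from the build info
        "pointer_bits": struct.calcsize("P") * 8,  # pointer width in bits (64 on 64-bit builds)
        "executable": sys.executable,              # full path of the running python.exe
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running the same file with each python.exe (or through the py launcher) lets you compare the two reports side by side.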
The installation process is straightforward. However, because x64 and Arm64 builds of Python use different C compilers and target different architectures, Python packages that contain compiled code can have compatibility and porting issues. Traditionally, you install Python packages using pip, which automatically installs the dependencies. For each package, pip first looks for a prebuilt package (called a wheel) that is either platform-independent or matches your platform. If it finds none, it downloads the source code and builds the package locally.
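The platform that pip tries to match can be inspected from the interpreter itself. The sketch below is a simplification: it only converts the string returned by sysconfig.get_platform() into the underscore form used in wheel file names, whereas pip's real matching considers a longer list of compatible tags. The function name wheel_platform_tag is an assumption made for this example.

```python
import sysconfig


def wheel_platform_tag(platform_string=None):
    """Convert a sysconfig platform string into wheel platform-tag form.

    wheel_platform_tag is a hypothetical helper. Wheel file names use
    underscores where the platform string has dashes or dots.
    """
    if platform_string is None:
        # Ask the running interpreter which platform it was built for.
        platform_string = sysconfig.get_platform()
    return platform_string.replace("-", "_").replace(".", "_")


if __name__ == "__main__":
    print("Platform string:", sysconfig.get_platform())
    print("Wheel platform tag:", wheel_platform_tag())
```

Called with the two Windows platform strings, wheel_platform_tag("win-amd64") gives win_amd64 and wheel_platform_tag("win-arm64") gives win_arm64, which is why a wheel built for one interpreter does not satisfy the other.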
Python Packages on Arm64
If you are writing Python packages to take advantage of Arm64, you must ensure you compile your packages for Arm64, not x64. This problem is not present for pure (platform-independent) Python packages.
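The difference between pure and platform-specific packages is visible in the wheel file name, whose last three dash-separated fields are the Python tag, the ABI tag, and the platform tag. The following sketch classifies a wheel from its name alone. The helper names and the example file names are illustrative assumptions, not output captured from this tutorial.

```python
def wheel_tags(filename):
    """Split a wheel file name into its Python, ABI, and platform tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel file name: " + filename)
    # The last three fields are always python tag, ABI tag, platform tag.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def is_pure_python(filename):
    """Return True when the wheel declares no ABI and runs on any platform."""
    _, abi_tag, platform_tag = wheel_tags(filename)
    return abi_tag == "none" and platform_tag == "any"


if __name__ == "__main__":
    # Illustrative file names only.
    for name in ("six-1.16.0-py2.py3-none-any.whl",
                 "numpy-1.24.2-cp311-cp311-win_amd64.whl"):
        kind = "pure Python" if is_pure_python(name) else "platform-specific"
        print(f"{name}: {kind}")
```

A wheel ending in none-any installs unchanged on both interpreters, while one ending in win_amd64 is usable only by x64 Python.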
Make the Python directory your working directory:
cd Users\<User_name>\AppData\Local\Programs\Python
Now set up x64 by typing the following command:
Python311\python.exe -m pip install --upgrade pip
This upgrades pip to the most recent version.
Now, install the NumPy package, which you will use later to implement your sample application. To install NumPy, type:
Python311\Scripts\pip.exe install numpy
You’ll see that pip downloads the platform-specific NumPy wheel for x64 Python.
Now, repeat the procedure for the Arm64 version of Python:
Python311-arm64\python.exe -m pip install --upgrade pip
Python311-arm64\Scripts\pip.exe install numpy
For Arm64, there is not a platform-specific wheel. So, pip downloads and builds the package from the source code to create the local Arm64 package wheel.
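One way to confirm that each interpreter ended up with its own copy of NumPy is to query the installed version from Python. This is a minimal sketch using the standard library's importlib.metadata. The helper name installed_version is an assumption for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing.

    installed_version is a hypothetical helper built on importlib.metadata.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("numpy")
    if version is None:
        print("NumPy is not installed for this interpreter")
    else:
        print("NumPy version:", version)
```

Run it once with Python311\python.exe and once with Python311-arm64\python.exe, since each interpreter keeps its own set of installed packages.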
Development
You now have all the tools needed to implement the actual Python app.
Start by creating the new file, sample.py, in the PythonOnWoa directory.
Then, import the NumPy and time packages.
import numpy as np
import time
The first package is for numerical computations and the second is for measuring the computation time.
Next, define a function that calculates a signal’s fast Fourier transform (FFT). Here, the signal is composed of a single-frequency sine wave with some random noise.
Repeat the FFT multiple times (trial_count) to have a stable estimate of the computation time.
def perform_sin_fft(signal_length, frequency, trial_count):
    # Time trial_count repetitions of signal generation plus FFT.
    start = time.time()
    for _ in range(trial_count):
        # Single-frequency sine wave with uniform random noise added.
        ramp = np.linspace(0, 2 * np.pi, signal_length)
        noise = np.random.rand(signal_length)
        input_signal = np.sin(ramp * frequency) + 0.1 * noise
        np.fft.fft(input_signal)
    # Total elapsed time in seconds across all trials.
    computation_time = time.time() - start
    return computation_time
The above function returns the total time (in seconds) needed for calculating the FFT.
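As an alternative way to get a stable timing estimate, the standard library's timeit module can repeat a measurement several times and keep the fastest run, which tends to reduce the effect of background activity. The sketch below is not the method used in this article: best_time and sample_workload are hypothetical names, and the workload is a stand-in for the FFT call.

```python
import timeit


def best_time(func, repeats=5, number=100):
    """Return the fastest per-call time in seconds over several repeated runs.

    func must be a zero-argument callable. Each of the `repeats` runs
    calls it `number` times, and the quickest run is reported.
    """
    totals = timeit.repeat(func, repeat=repeats, number=number)
    return min(totals) / number


def sample_workload():
    # Hypothetical stand-in for the FFT; any zero-argument callable works.
    return sum(i * i for i in range(1000))


if __name__ == "__main__":
    print(f"Best time per call: {best_time(sample_workload):.6f} s")
```

To time the FFT instead, you could pass a lambda that calls np.fft.fft on a prepared signal in place of sample_workload.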
To measure the performance, invoke the perform_sin_fft function for various signal lengths.
signal_lengths = [2**10, 2**11, 2**12, 2**13, 2**14]
trial_count = 5000
for signal_length in signal_lengths:
    frequency = int(signal_length / 4)
    computation_time = perform_sin_fft(signal_length, frequency, trial_count)
    print("Signal length {}, Computation time {:.3f} s".format(signal_length, computation_time))
Now run this script using Arm64 and non-Arm64 Python 3.11 to measure the performance difference:
.\Python311\python.exe <path_to_your_sample.py>
.\Python311-arm64\python.exe <path_to_your_sample.py>
The first command executes the script with x64 Python, which runs in emulation mode on Arm64 hardware. The computation times depend on the signal length. For 16,384 points, the computation takes 6.86 seconds. The second command uses Arm64 Python and produces much shorter computation times: the same 16,384-point computation takes 2.72 seconds, about 40 percent of the time needed in emulation mode (x64). In other words, native Arm64 Python is about two and a half times faster than the emulated version.
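The arithmetic behind these two figures can be checked with a few lines of Python. The timings are the ones reported above for the 16,384-point signal. The function names are assumptions made for this illustration.

```python
def speedup(emulated_seconds, native_seconds):
    """How many times faster the native run is than the emulated run."""
    return emulated_seconds / native_seconds


def time_fraction(emulated_seconds, native_seconds):
    """Native run time as a fraction of the emulated run time."""
    return native_seconds / emulated_seconds


x64_time = 6.86    # seconds, x64 Python in emulation mode
arm64_time = 2.72  # seconds, native Arm64 Python

print(f"Speedup: {speedup(x64_time, arm64_time):.2f}x")
# prints "Speedup: 2.52x"
print(f"Native time as share of emulated time: {time_fraction(x64_time, arm64_time):.0%}")
# prints "Native time as share of emulated time: 40%"
```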
This graph illustrates the computation times and the corresponding performance boosts.
The Future of Python on Arm
Python 3.11 with native Arm64 presents a massive opportunity for Python developers looking to get the most out of their Arm-powered devices on Windows 11. As more developers add native Arm64 support to their Python packages, you will see even more performance improvements.
One example is this Linaro demonstration of porting TensorFlow to Arm64, which shows impressive speed improvements and offers tremendous possibilities for AI developers, data scientists, and researchers who rely on the ease and power of Python.
Conclusion
This article walked you through installing native Arm64 Python 3.11 on Windows 11, including setting up your development environment to ensure all the necessary tools are in place.
You wrote a simple script that applied a fast Fourier transform to a signal and saw the performance improvement that Arm64 Python unlocks. Gains like this are driving broader support for WoA: many companies are porting libraries and toolsets so they can use Arm64 to accelerate Python workloads.
Get started with WoA today and try Python 3.11 from the official WoA installer to get the power you need with the efficiency you demand from your Arm devices.