Parallel programming has emerged as an answer to the gradual end of Moore’s law. Twenty-first-century workloads have grown into complex problems that demand more computing power, but physical and practical limits impede further significant improvements in computer hardware.
As such, the focus has shifted from hardware to software. Parallel programming models are one way to improve performance by more efficient hardware use. This approach uses the heterogeneous architectures already available and significantly enhances the performance of computationally intensive applications. Let’s compare two parallel programming options: CUDA and SYCL.
CUDA
The Compute Unified Device Architecture (CUDA) is a parallel computing platform and programming model for general-purpose computing. CUDA enables developers to harness the power of graphics processing units (GPUs) to speed up applications.
This method distributes a computing load between the central processing unit (CPU) and the GPU. NVIDIA developed CUDA for its own GPUs, which developers program using CUDA C.
SYCL
Another parallel programming platform is SYCL, which offers several advantages over CUDA. SYCL is a single-source programming model based on standard C++. CUDA, by contrast, extends C++ with vendor-specific syntax and depends on a CUDA-aware compiler.
SYCL allows developers to support various devices, including CPUs, GPUs, and field-programmable gate arrays (FPGAs) by using a single code written in standard C++. It’s a platform-independent programming model without vendor lock.
As the range of available processors grows more heterogeneous, SYCL has gained importance thanks to its flexibility and vendor independence. CUDA, in contrast, only targets NVIDIA hardware.
SYCL provides an abstraction layer that allows code for heterogeneous processors to be in a single source file instead of separate host and kernel files.
Multiple SYCL implementations are available. One of them is part of oneAPI, an initiative spearheaded by Intel. The oneAPI specification is an open, standards-based unified programming model that provides the same developer experience across several architectures.
DPC++ is the primary language for oneAPI. It’s based on SYCL to provide data parallelism and heterogeneous programming for performance on CPUs and accelerators, and it builds on conventional C and C++ constructs. oneAPI aims to simplify programming and allow source code reuse across hardware platforms while still enabling accelerator-specific tuning.
Comparing SYCL and CUDA
SYCL and CUDA serve the same purpose: to enhance performance through processing parallelization in varied architectures. However, SYCL offers better functionality than CUDA while simplifying the coding process.
Instead of vendor-specific syntax, SYCL lets developers program in ISO C++. CUDA uses its own dialect, CUDA C, so code written for it can't be reused on non-NVIDIA hardware. Unlike CUDA, SYCL is a pure C++ domain-specific embedded language that doesn’t require C++ extensions, which allows for a simple CPU implementation that relies on a pure runtime library rather than a particular compiler.
Working with SYCL is easier than working with CUDA. Consider the languages themselves: SYCL is built from standard ISO C++ constructs, while CUDA uses the specially designed CUDA C. SYCL programming doesn’t require any other language extensions, which is a great convenience from a developer's point of view.
SYCL is a competitive alternative to CUDA in terms of programmability, needing fewer lines of code to create kernels and less frequent calls to crucial API functions. Also, there’s no need for a complex toolchain to develop an application, and the tools ecosystem is readily available, ensuring a hassle-free development experience. SYCL code is also more high-level than CUDA, and offers code clarity and readability.
SYCL offers a wide range of benefits over CUDA. SYCL doesn't need separate source files. Instead, you can find the code for the host and the device in the same source file. A SYCL implementation's responsibility is to split the C++ source file and forward each piece of parsed code to the appropriate compilation backend.
SYCL is also vendor-agnostic. SYCL-implemented code executes on practically any platform, from CPUs to accelerated servers. By generalizing and introducing additional APIs, SYCL has evolved into a high-level programming model that can target a wide range of hardware. It enables generic programming and backend-specific optimizations.
Unlike CUDA, SYCL tries to solve the challenge of architecture interoperability by providing a set of interfaces for standard building blocks. You can optimize these blocks for multiple manufacturers and target platforms. The oneAPI initiative using SYCL delivers an application platform with one unified interface that you can implement on diverse platforms. Just combine numerous libraries spanning a wide variety of building blocks with a standard programming style.
Studies and experiments have found that the performance of SYCL-based applications is comparable to that of CUDA. Where a performance gap exists, it's primarily due to vendor-specific enhancements and tuning, and it's narrowing quickly. Weighed against the convenience of portability and reusability, the gap appears minor. Fast-paced SYCL adoption should further improve performance as more parallel algorithms and tuned code paths for varied hardware become available.
Unlike CUDA, where you typically manage device allocations and host-device copies explicitly, in SYCL we only need to declare our intention to read or write a buffer. The SYCL runtime then figures out which buffers it must transfer to and from host containers. SYCL command queues are asynchronous, and while the actual execution order is unknown, the runtime satisfies data dependencies between kernels.
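The access-intent model described above can be sketched as follows. This is a minimal illustration, assuming a SYCL 2020 compiler such as DPC++. The function name `scale_then_shift` is hypothetical, and the plain-loop fallback used when no SYCL compiler is detected is only there so that the sketch also builds with an ordinary C++17 compiler. Two kernels are submitted to one queue. The second declares `read_write` access to the buffer the first one writes, so the runtime is expected to order them without any explicit synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: SYCL compilers predefine SYCL_LANGUAGE_VERSION (DPC++ does with -fsycl).
#if defined(SYCL_LANGUAGE_VERSION) && __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#define HAVE_SYCL 1
#else
#define HAVE_SYCL 0
#endif

// Hypothetical helper: returns out[i] = 2 * in[i] + 1, computed in two dependent steps.
std::vector<int> scale_then_shift(const std::vector<int>& in) {
    std::vector<int> out(in.size(), 0);
    if (in.empty()) return out;
#if HAVE_SYCL
    {
        sycl::queue q;  // default device selection
        sycl::buffer<int, 1> in_buf(in.data(), sycl::range<1>(in.size()));
        sycl::buffer<int, 1> out_buf(out.data(), sycl::range<1>(out.size()));

        // Kernel 1 declares its intent: read in_buf, overwrite out_buf.
        q.submit([&](sycl::handler& h) {
            sycl::accessor src(in_buf, h, sycl::read_only);
            sycl::accessor dst(out_buf, h, sycl::write_only, sycl::no_init);
            h.parallel_for(sycl::range<1>(in.size()),
                           [=](sycl::id<1> i) { dst[i] = 2 * src[i]; });
        });

        // Kernel 2 declares read_write on out_buf, which creates a data
        // dependency on kernel 1 that the runtime resolves.
        q.submit([&](sycl::handler& h) {
            sycl::accessor dst(out_buf, h, sycl::read_write);
            h.parallel_for(sycl::range<1>(out.size()),
                           [=](sycl::id<1> i) { dst[i] += 1; });
        });
    }  // buffers go out of scope: pending work finishes, results copy back to `out`
#else
    // Fallback without SYCL: the same two steps as sequential host loops.
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = 2 * in[i];
    for (std::size_t i = 0; i < out.size(); ++i) out[i] += 1;
#endif
    return out;
}
```

Calling `scale_then_shift({1, 2, 3})` yields `{3, 5, 7}` on either path. Note that the SYCL path contains no explicit copy or wait call; the closing brace of the buffer scope is what makes the results visible in `out`.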
Performance portability is a priority for SYCL. However, most performance portability comes from lower-level building blocks designed for that purpose, so the differences between architectures still need to be considered. Adapting a kernel to a new hardware platform should be as simple, convenient, and painless as possible.
Migrating from CUDA to SYCL Using DPC++
CUDA has established a dominant position in parallel programming and the general-purpose computing on graphics processing units (GPGPU) field. Since the industry has become accustomed to using CUDA despite its inherent vendor lock-in and higher operating costs, you may think that migration from CUDA to SYCL is impossible due to code incompatibility. However, there is a way forward if you want to move to SYCL but are held back by legacy code you’ve already written in CUDA.
oneAPI, which is based on SYCL, provides a compatibility tool to help migrate CUDA-based code to the SYCL-based Data Parallel C++ (DPC++) programming language. DPC++ is often described as C++ with SYCL and is at the heart of the oneAPI environment.
Because DPC++ is built on the SYCL programming model, migrating CUDA code to DPC++ effectively means converting it to SYCL. The compatibility tool generates human-readable code that preserves the identifiers from the original code. It also detects and transforms standard CUDA index computations to their SYCL equivalents.
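To make the index transformation concrete, here is a small worked sketch in plain C++. It is not output of the compatibility tool, and the helper names are hypothetical. It shows the standard one-dimensional CUDA index, `blockIdx.x * blockDim.x + threadIdx.x`, next to the equivalent SYCL expression built from the work-group id, local range, and local id (which is what `nd_item::get_global_id` returns when no offset is used). It also shows one detail that changes during migration: a CUDA launch specifies the number of blocks, while a SYCL `nd_range` specifies the total number of work-items.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type describing a 1-D launch in both vocabularies.
struct LaunchShape {
    std::size_t groups;       // CUDA: grid size in blocks
    std::size_t local_size;   // CUDA: block size; SYCL: local range
    std::size_t global_size;  // SYCL: global range = groups * local_size
};

// Round n up to a whole number of groups, as is usual in both models.
constexpr LaunchShape launch_shape_for(std::size_t n, std::size_t local_size) {
    const std::size_t groups = (n + local_size - 1) / local_size;
    return {groups, local_size, groups * local_size};
}

// CUDA-style index: blockIdx.x * blockDim.x + threadIdx.x
constexpr std::size_t cuda_style_index(std::size_t block_idx,
                                       std::size_t block_dim,
                                       std::size_t thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// SYCL-style index: group id * local range + local id
constexpr std::size_t sycl_style_index(std::size_t group_id,
                                       std::size_t local_range,
                                       std::size_t local_id) {
    return group_id * local_range + local_id;
}
```

For 1,000 elements and a local size of 256, `launch_shape_for` gives 4 groups and a global size of 1,024, so a migrated kernel still needs the usual `if (i < n)` bounds check for the last 24 work-items.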
The compatibility tool modifies CUDA-related code and leaves the rest alone. As a result, you must make only minor manual changes to create a runnable application. Furthermore, the modified elements are still human-readable and hence easily inspectable.
Conclusion
SYCL is a royalty-free, cross-platform abstraction layer that enables developers to code for heterogeneous processors in ISO C++. The application’s host and kernel code are in the same source file.
SYCL has evolved into a high-level programming model that can target a wide range of hardware. SYCL outperforms CUDA in capabilities and speeds the coding process. In terms of programmability, SYCL is a viable alternative to CUDA, requiring fewer lines of code to generate kernels and less frequent calls to critical API functions.
Code written in SYCL can run on almost any platform, from CPUs to accelerated servers. oneAPI, which is based on SYCL, provides a compatibility tool for migrating CUDA-based code to the SYCL-based DPC++ programming language.
SYCL’s advantages over CUDA are reason enough to give SYCL a chance, and we urge you to familiarize yourself with oneAPI to get the most out of the SYCL model. The oneAPI initiative is an attempt at developing an industry standard that is portable and vendor-agnostic. Industry leaders in high-performance computing (HPC), AI innovators, hardware vendors and original equipment manufacturers (OEMs), and universities are all taking a considerable interest in SYCL and coming on board. Try oneAPI for yourself to experience the difference.