GPU

Great Reads

A new Adaptive Scalable Texture Compression (ASTC) codec to speed up texture processing in games

by Jeremy C. Ong

Texture encoding and decoding are often among the most bandwidth-intensive parts of a game, and optimized encoding with Arm's ASTC encoder can significantly benefit runtime decoding.

Advancing OpenCL™ for FPGAs

by Intel

Boosting Performance with Intel® FPGA SDK for OpenCL™ Technology

ANNdotNET v1.0 Has Been Released

by Bahrudin Hrnjica

ANNdotNET v1.0 has been released

C++ Virtual-Array Implementation that is Backed by the Combined Video Memory of User-System

by tugrulGtx

A header-only C++ tool that supports a basic array-like usage pattern and uses the multiple graphics cards in a system as storage, with LRU caching.
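The tool's own interface is not shown in this listing. The following is a minimal, hypothetical C++ sketch of the general idea only: an array split into fixed-size pages that live in several backing stores, fronted by a small least-recently-used (LRU) page cache. It assumes each graphics card's memory can be modeled as a plain host buffer; the class name `VirtualArray` and its parameters are illustrative, not the tool's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: each "device" is simulated by a host std::vector<float>.
// A real implementation would keep these pages in the VRAM of several GPUs.
class VirtualArray {
public:
    VirtualArray(std::size_t size, std::size_t numDevices,
                 std::size_t pageSize, std::size_t cachePages)
        : pageSize_(pageSize), cachePages_(cachePages), devices_(numDevices) {
        assert(numDevices > 0 && pageSize > 0 && cachePages > 0);
        const std::size_t numPages = (size + pageSize - 1) / pageSize;
        // Distribute pages round-robin: page p lives on device p % numDevices.
        for (std::size_t p = 0; p < numPages; ++p) {
            std::vector<float>& dev = devices_[p % numDevices];
            dev.resize(dev.size() + pageSize, 0.0f);
        }
    }

    float get(std::size_t i) { return page(i / pageSize_).data[i % pageSize_]; }

    void set(std::size_t i, float v) {
        Page& pg = page(i / pageSize_);
        pg.data[i % pageSize_] = v;
        pg.dirty = true;  // must be written back to its device on eviction
    }

    std::size_t misses() const { return misses_; }

private:
    struct Page {
        std::vector<float> data;
        bool dirty = false;
        std::list<std::size_t>::iterator lruPos;
    };

    // Start of page p inside its backing device buffer.
    float* backing(std::size_t p) {
        const std::size_t n = devices_.size();
        return devices_[p % n].data() + (p / n) * pageSize_;
    }

    // Returns the cached copy of page p, loading it (and evicting the LRU page) on a miss.
    Page& page(std::size_t p) {
        auto it = cache_.find(p);
        if (it != cache_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second.lruPos);  // mark most recently used
            return it->second;
        }
        ++misses_;
        if (cache_.size() == cachePages_) {
            const std::size_t victim = lru_.back();  // least recently used page
            lru_.pop_back();
            auto vit = cache_.find(victim);
            if (vit->second.dirty)
                std::copy(vit->second.data.begin(), vit->second.data.end(), backing(victim));
            cache_.erase(vit);
        }
        lru_.push_front(p);
        Page& pg = cache_[p];
        const float* src = backing(p);
        pg.data.assign(src, src + pageSize_);
        pg.lruPos = lru_.begin();
        return pg;
    }

    std::size_t pageSize_;
    std::size_t cachePages_;
    std::vector<std::vector<float>> devices_;
    std::unordered_map<std::size_t, Page> cache_;
    std::list<std::size_t> lru_;
    std::size_t misses_ = 0;
};
```

With a cache smaller than the data, a sequential pass misses once per page, and modified pages are copied back to their backing store only when they are evicted, which keeps transfers to the (simulated) devices page-sized rather than element-sized.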

All Articles

Comparing Programming models: SYCL and CUDA

by Dhruv__Patel

In this article we compare and contrast SYCL and CUDA, and discuss how the oneAPI compiler can work with SYCL.

Cross-Architecture Capabilities: Thinking With GPUs

by Roger Winter

In this article, we look at how developers can take advantage of oneAPI's cross-architecture capabilities to make use of GPU resources in their applications.

Deep-Learning AI on Low-Power Microcontrollers: MNIST Handwriting Recognition Using TensorFlow Lite Micro on Arm Cortex-M Devices

by Raphael Mun

In this article, we’re going to build a fully functional MNIST handwriting recognition app that uses TensorFlow Lite to run AI inference on a low-power STMicroelectronics microcontroller with an Arm Cortex-M7-based processor.

Descriptive Statistics and Data Normalization with CNTK and C#

by Bahrudin Hrnjica

How to calculate some basic statistics on a data set.
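As a minimal illustration of the statistics involved, the following sketch computes the mean, the standard deviation, and two common normalizations (min-max scaling and z-score standardization). It is plain C++ rather than CNTK or C#, and the function names are hypothetical. This summary does not say whether the article uses the population or the sample standard deviation, so dividing by n here is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

double mean(const std::vector<double>& x) {
    assert(!x.empty());
    double sum = 0.0;
    for (double v : x) sum += v;
    return sum / x.size();
}

// Population standard deviation (divides by n, not n - 1).
double stddev(const std::vector<double>& x) {
    const double m = mean(x);
    double acc = 0.0;
    for (double v : x) acc += (v - m) * (v - m);
    return std::sqrt(acc / x.size());
}

// Min-max normalization: rescales values into [0, 1]; a constant column maps to 0.
std::vector<double> minMaxNormalize(const std::vector<double>& x) {
    assert(!x.empty());
    auto [lo, hi] = std::minmax_element(x.begin(), x.end());
    const double minV = *lo, range = *hi - *lo;
    std::vector<double> out;
    out.reserve(x.size());
    for (double v : x) out.push_back(range == 0.0 ? 0.0 : (v - minV) / range);
    return out;
}

// Z-score standardization: zero mean, unit (population) standard deviation.
std::vector<double> zScore(const std::vector<double>& x) {
    const double m = mean(x), s = stddev(x);
    std::vector<double> out;
    out.reserve(x.size());
    for (double v : x) out.push_back(s == 0.0 ? 0.0 : (v - m) / s);
    return out;
}
```

Min-max scaling preserves the shape of the distribution but is sensitive to outliers; z-scores are usually preferred when features have very different spreads.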

Direct3D* 12 - Console API Efficiency & Performance on PCs

by Michael Coppock

At GDC 2014, Microsoft announced stunning news for PC gaming in 2015: the next iteration of Direct3D, version 12. D3D 12 returns to low-level programming; it gives game developers more control and introduces many exciting new features.

DirectX 11 Compute Shaders

by Asif Bahrainwala

HPC via Compute Shaders (GPGPU).

Discover How oneAPI Is Revolutionizing Programming

by Intel

oneAPI aims at portability not just between one architecture and another today, but between the architectures of today and tomorrow. It has the potential to become the industry norm for compiling code for all kinds of architectures.

Efficient Heterogeneous Parallel Programming Using OpenMP

by Elmira Volkova, Alexander Bobyr, Igor Ermolaev

In this article, we divide the computation between the host CPU and a discrete Intel® GPU such that both processors are kept busy.

GCN Assembler for AMD GPUs

by Ryan Scott White

An assembler/compiler for AMD’s GCN (Graphics Core Next) assembly language.

Gelid Icy Vision Rev 2 - GPU Cooler

by DaveAuld

Replacement GPU Cooler - Icy cool or hot and bothered?

Generative AI Playground: LLMs with Camel-5b and Open LLaMA 3B on the Latest Intel® GPU

by Benjamin Consolvo

This article explores the use of Large Language Models (LLMs) in various applications, such as chatbots, code generation, and debugging.

Get Started with the Intel® Distribution of OpenVINO™ Toolkit and AWS Greengrass

by Intel

This section describes the implementation of FaaS inference samples (based on Python 2.7) using Amazon Web Services (AWS) Greengrass and AWS Lambda software.

Heterogeneous Computing Implementation via OpenCL™

by Intel

This article is a step-by-step guide on the methodology of dispatching a workload to all OpenCL devices in the platform with the same kernel to jointly achieve a computing task.

High Performance Computer Graphics for Android Mobile Game Development Using Vulkan API

by Raphael Mun

In this article we briefly look at two examples of how to use Vulkan to maximize the graphics performance in your game.

How to Determine if You’re Getting the Most from Your Cross-Architecture Implementation

by Dhruv__Patel

In this article we look at how developers can use Intel® Advisor and Intel® VTune™ Profiler to efficiently offload to GPU and optimize their cross-architecture applications.

Intel® RealSense™ SDK Background Segmentation Feature

by Intel

This whitepaper describes how developers can integrate Intel® RealSense™ SDK background segmentation (BGS) middleware to create new immersive collaboration applications.

Introducing the Process of Mining in Blockchain

by Packt Publishing

Bitcoin mining process

Is Your Game GPU-bound?

by Intel

In this article, we’ll walk through a quick and easy way to see whether your game is CPU-bound using a high-level system overview.

Learning Modern OpenGL

by Bartlomiej Filipek

A little guide about modern OpenGL and why it gives us so much value.

Linear Regression with CNTK and C#

by Bahrudin Hrnjica

Linear regression with CNTK and C#
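For reference, simple linear regression with one predictor has a closed-form least-squares solution: slope = Σ(x − mean(x))(y − mean(y)) / Σ(x − mean(x))², and intercept = mean(y) − slope · mean(x). The sketch below computes it directly in C++. This is ordinary least squares, not necessarily how the article proceeds with CNTK (a CNTK model would typically learn the same two parameters by iterative training), and `fitLine` and `LineFit` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: y is approximated by intercept + slope * x.
struct LineFit {
    double intercept;
    double slope;
};

// Ordinary least squares for a single predictor (closed form).
LineFit fitLine(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= n;
    meanY /= n;
    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - meanX) * (y[i] - meanY);  // covariance numerator
        sxx += (x[i] - meanX) * (x[i] - meanX);  // variance numerator
    }
    assert(sxx > 0.0);  // all x identical: slope is undefined
    const double slope = sxy / sxx;
    return {meanY - slope * meanX, slope};
}
```

For points that lie exactly on y = 2x + 1, the fit recovers slope 2 and intercept 1; a gradient-trained model should converge toward the same values.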

Optimizing Android Game mTricks Looting Crown on the Intel® Atom™ Platform

by Android on Intel

This article shows how to analyze and improve the performance of a mobile game and how to optimize graphic resources for a mobile platform, using mTricks Looting Crown as an example.

Port a CUDA App to oneAPI and DPC++ in 5 Minutes

by Jeremy C. Ong

A quick 5-minute introduction to porting a CUDA app to Data Parallel C++ (DPC++)

Sorting and Removing Elements from the Structure of Arrays (SOA) in C++

by Igor Gribanov

C++ iterators and algorithms work well for containers, but can we sort the Structure of Arrays?
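One common way to sort a Structure of Arrays, sketched below, is to sort a vector of indices by the key field and then gather every field through that permutation; the same gather step removes elements when the index list keeps only the survivors. This index-permutation technique is swapped in for illustration and may differ from the iterator-based approach the article describes; the `Particles` layout and the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical SoA: each field is stored in its own contiguous array.
struct Particles {
    std::vector<int> id;
    std::vector<float> mass;
    std::vector<std::string> name;
    std::size_t size() const { return id.size(); }
};

// Rebuilds v so that out[i] = v[indices[i]]; works for permutations and subsets.
template <typename T>
void gather(std::vector<T>& v, const std::vector<std::size_t>& indices) {
    std::vector<T> out;
    out.reserve(indices.size());
    for (std::size_t i : indices) out.push_back(std::move(v[i]));
    v = std::move(out);
}

// Sorts all fields together by mass, keeping equal masses in their original order.
void sortByMass(Particles& p) {
    std::vector<std::size_t> idx(p.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t a, std::size_t b) { return p.mass[a] < p.mass[b]; });
    gather(p.id, idx);
    gather(p.mass, idx);
    gather(p.name, idx);
}

// Removes every element whose mass is below the threshold, from all fields.
void removeLighterThan(Particles& p, float threshold) {
    std::vector<std::size_t> keep;
    for (std::size_t i = 0; i < p.size(); ++i)
        if (p.mass[i] >= threshold) keep.push_back(i);
    gather(p.id, keep);
    gather(p.mass, keep);
    gather(p.name, keep);
}
```

The extra index vector costs O(n) memory, but it keeps each field contiguous and lets standard algorithms such as `std::stable_sort` do the comparison work.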

Sparse Procedural Volumetric Rendering

by Doug Mcnabb

Sparse Procedural Volumetric Rendering (SPVR) is a technique for rendering real-time volumetric effects. We’re excited that the upcoming book “GPU Pro 6” will include an SPVR chapter. This document gives some additional details.

SYCL and OpenCL

by Sergiu Oprea

Differences and similarities between SYCL and OpenCL

Touch Response Measurement, Analysis, and Optimization for Windows Applications

by Tom Pantels

This paper describes two applications that demonstrate problems such as poor or no touch response times and high energy consumption, both of which are critical to app performance and UX. We then discuss how to optimize these applications to resolve these problems.

Transfer Learning with TensorFlow on Intel® Arc™ GPUs

by Intel

Fast and Easy Training and Inference Using Intel® Consumer GPUs and Windows* Subsystem for Linux 2

World of Tanks Blitz: Automated Performance Testing for Modern Graphics Needs

by Pavel Busko

Arm Mobile Studio Pro allows us to record hardware counters for Mali-family GPUs while running automated tests on a CI server, helping us solve performance-testing tasks.