One of the challenges of AI has been catering for the wide range of compute devices that might be used for inference. The OpenVINO™ toolkit helps speed up AI applications by providing a runtime that takes advantage of hardware features in the processor, GPU, or Vision Processing Unit (VPU). It abstracts away the complexity of writing for different architectures, while enabling developers to unlock the performance features of their target platform.
Now, with the 2022.1 release of OpenVINO, the Automatic Device Plugin (AUTO) makes it easy to target different devices. It can automatically select the most suitable target device, and configure it appropriately to prioritize either latency or throughput. The new plugin also includes a feature to accelerate first inference latency.
What Is AUTO in the OpenVINO Toolkit?
The Automatic Device Plugin (or AUTO for short) is a new virtual proxy device in OpenVINO that doesn’t bind to a specific type of hardware device.
When you choose AUTO as your target platform, OpenVINO automatically discovers the accelerators and hardware features of the platform, and chooses which one to use to achieve your goals. You provide hints in the configuration API to tell OpenVINO to optimize for latency or throughput, depending on the application.
The benefits of using AUTO include:
- Faster application development: Using AUTO, there is no need for the application to include the logic for detecting, choosing, or configuring the compute devices.
- Improved application portability: Because applications do not need to include dedicated code for a particular device, or even select devices from a list, the application is more portable. Not only is it easier to run the application on other platforms today, but it also enables the application to take advantage of new generations of hardware as they become available.
- Faster application startup: AUTO enables applications to start quickly by using the CPU while other target platforms, such as GPUs, are still loading the AI network. When the network has loaded, AUTO can switch inference over to the GPUs.
- Uses hints instead of configuration: Using AUTO, there’s no need to provide device-specific configuration. Instead, you express performance hints to prioritize either latency or throughput, and AUTO takes care of choosing the best device and applying the matching device settings, for example running multiple cores in parallel or using a larger task queue.
How to Configure AUTO in OpenVINO
AUTO is part of OpenVINO’s core functionality. To use it, either choose “AUTO” as the device name, or leave the device name out.
Here’s a C++ example:
ov::CompiledModel model0 = core.compile_model(model);
ov::CompiledModel model1 = core.compile_model(model, "AUTO");
Here’s a Python example:
compiled_model0 = core.compile_model(model=model)
compiled_model1 = core.compile_model(model=model, device_name="AUTO")
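For context, here is a minimal end-to-end Python sketch that compiles a model on AUTO and runs a single inference; the model path and the random input data are placeholders:
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("googlenet-v1.xml")  # placeholder model path
compiled_model = core.compile_model(model=model, device_name="AUTO")

# Create random data shaped like the model's first input, just to exercise the flow.
data = np.random.rand(*compiled_model.input(0).shape).astype(np.float32)
request = compiled_model.create_infer_request()
request.infer({compiled_model.input(0): data})
output = request.get_output_tensor(0).data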
Specify the Devices to Use
AUTO has an option to choose only from your preferred devices. For example, the following code shows the scenario where the CPU and GPU are the only two devices acceptable for network execution.
Here’s a C++ example:
ov::CompiledModel model3 = core.compile_model(model, "AUTO:GPU,CPU");
ov::CompiledModel model4 = core.compile_model(model, "AUTO", ov::device::priorities("GPU,CPU"));
Here’s a Python example:
compiled_model3 = core.compile_model(model=model, device_name="AUTO:GPU,CPU")
compiled_model4 = core.compile_model(model=model, device_name="AUTO", config={"MULTI_DEVICE_PRIORITIES": "GPU,CPU"})
Provide Performance Hints
You can also, optionally, give AUTO a performance hint of either “latency” or “throughput.” AUTO then chooses the best hardware device and configuration to achieve your goal.
Here’s a C++ example:
ov::CompiledModel compiled_model_tput = core.compile_model(model, "AUTO:GPU,CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
ov::CompiledModel compiled_model_latency = core.compile_model(model, "AUTO:GPU,CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));
Here’s a Python example:
compiled_model = core.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT":"THROUGHPUT"})
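The latency hint is passed the same way in Python; only the hint string changes:
compiled_model = core.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT": "LATENCY"})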
Using the googlenet-v1 model on an Intel® Core™ i7 processor, we found that using a throughput hint with an integrated GPU delivers twice the frames per second (FPS) compared to a latency hint¹. In contrast, using the latency hint with the GPU delivered more than 10 times lower latency than the throughput hint¹.
Note that the hints do not require device-specific settings, and are also completely portable between compute devices. That means more performance, with fewer code changes, and with less expert knowledge required.
To achieve the throughput results, the device is configured for higher utilization, for example with increased batch size, and more threads and streams. For the latency scenario, the size of the task queue and parallelization are reduced to achieve a faster turn-around time.
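One way to verify this on your own machine is to ask the compiled model how many parallel inference requests AUTO considers optimal under each hint. A minimal Python sketch, with a placeholder model path:
from openvino.runtime import Core

core = Core()
model = core.read_model("googlenet-v1.xml")  # placeholder model path

for hint in ("THROUGHPUT", "LATENCY"):
    compiled_model = core.compile_model(model, "AUTO", {"PERFORMANCE_HINT": hint})
    # AUTO translates the hint into device settings; the optimal number of
    # parallel infer requests reflects how much parallelism was configured.
    n = compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS")
    print(hint, "->", n, "parallel infer request(s)")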
A full Python example of AUTO usage with performance hints is available in this OpenVINO notebook.
How Does AUTO Choose Devices?
When you use the 2022.1 release of OpenVINO, AUTO selects devices in the order shown in Table 1, depending on whether the device is available and can support the precision of the AI model. The device is selected only once when the AI network is loaded. The CPU is the default fallback device.
Table 1: How AUTO prioritizes compute devices for use with AI networks, depending on the availability of the device and the AI network’s precision
Figure 1 shows how AUTO acts as a proxy device, and selects the most appropriate device for an AI network to run on.
Figure 1: AUTO acts as a proxy device between the applications and the devices.
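To check which physical devices AUTO can choose from on a given machine, you can enumerate them through the Core API. A minimal Python sketch:
from openvino.runtime import Core

core = Core()
# AUTO selects from the devices OpenVINO has discovered on this machine.
for device in core.available_devices:
    print(device, "->", core.get_property(device, "FULL_DEVICE_NAME"))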
Accelerating First Inference Latency
One of the key performance benefits of AUTO is in accelerating first inference latency (FIL).
For a GPU, compiling the network into GPU-optimized OpenCL kernels takes a few seconds. This initialization time may not be tolerable for some applications, such as face-based authentication.
Using a CPU would provide the shortest FIL, because the OpenVINO graph representations can be JIT-compiled quickly for the CPU. However, the CPU may not be the best platform to meet the developer’s throughput or latency goals after startup.
To speed up the FIL, AUTO uses the CPU as the first inference device until the GPU is ready (see Figure 2). The FIL with AUTO is close to that of a CPU device, even though the CPU does the inference in addition to the network compilation for the GPU. With AUTO, we have seen FIL reduced by more than 10 times compared to only using a GPU¹.
Note, however, that the throughput on the CPU may be worse than with a GPU. For real-time applications where you need to meet a throughput target, the initial period of slow inference may not be acceptable. It might be preferable to wait for the model to load on the GPU. In many cases, it is recommended to use model/kernel caching for faster model loading.
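Model caching is enabled with a single core property: on devices that support it, such as the GPU, compiled kernels are written to disk on the first run and reloaded on later runs, which shortens model loading. A minimal Python sketch; the cache directory and model path are placeholders:
from openvino.runtime import Core

core = Core()
# Compiled blobs/kernels are stored here on the first run and reused afterwards.
core.set_property({"CACHE_DIR": "model_cache"})  # placeholder directory
model = core.read_model("googlenet-v1.xml")  # placeholder model path
compiled_model = core.compile_model(model, "AUTO")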
Figure 2: AUTO cuts first inference latency (FIL) by running inference on the CPU until the GPU is ready.
Debugging Using Information AUTO Exposes
If there are execution problems, AUTO provides information on exceptions and error values. Should the returned data not be enough for debugging purposes, more information may be acquired using ov::log::Level.
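For example, the log level can be raised on the AUTO device before compiling the model. A minimal Python sketch, assuming the string form of the ov::log::Level property ("LOG_LEVEL", with values such as "LOG_DEBUG"):
from openvino.runtime import Core

core = Core()
# Ask the AUTO plugin to emit verbose logs about device selection and fallback.
core.set_property("AUTO", {"LOG_LEVEL": "LOG_DEBUG"})
model = core.read_model("googlenet-v1.xml")  # placeholder model path
compiled_model = core.compile_model(model, "AUTO")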
All major performance calls of both the runtime and AUTO are instrumented with Instrumentation and Tracing Technology (ITT) APIs. For more information, see the documentation on OpenVINO profiling and the Intel® VTune™ Profiler User Guide.
Future Releases of AUTO
In future releases, AUTO will provide more performance hints and will balance workloads at the system level, for example, by offloading the inferences of one neural network to multiple hardware devices (similar to the Multi-Device Plugin).
Summary
In summary, with the new AUTO device plugin in OpenVINO:
- Developers don’t need to update their application logic to use the advanced features and capabilities provided by Intel’s new platforms and new versions of OpenVINO.
- Developers can enjoy optimized performance with faster time to market.
For more information, see the AUTO documentation.
Notices and Disclaimers
Performance varies by use, configuration, and other factors. Learn more at www.intel.com/PerformanceIndex.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
Intel technologies may require enabled hardware, software or service activation.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
¹Test Configuration
Configuration one: Intel® Core™ i7-10710U Processor with DDR4 2*16 GB at 2,666 MHz, Integrated GPU, OS: Windows 10 Enterprise Version 10.0.19042 Build 19042, Microsoft Visual Studio Community 2019 Version 16.11.8, Intel(R) UHD Graphics driver Version 30.0.101.1191, OpenVINO 2022.1 (zip file download), googlenet-v1 network model. Tested with notebook 106-auto-device.
Configuration two: Intel® Core™ i7-1165G7 Processor with DDR4 2*16 GB at 4,266 MHz, Integrated GPU, Intel® Iris® Xe MAX Graphics, OS: Windows 10 Enterprise Version 10.0.19042 Build 19042, Microsoft Visual Studio Community 2019 Version 16.11.10, Intel(R) Iris® Xe Graphics driver Version 30.0.101.1003 (Integrated GPU), Intel(R) Iris® Xe MAX Graphics driver Version 30.0.101.1340 (Discrete GPU), OpenVINO 2022.1 (zip file download), googlenet-v1 network model. Tested with the C++ benchmark_app inside OpenVINO 2022.1.
The tests were conducted by Intel on April 20, 2022.