Summary
MXNet is an open-source deep learning framework for defining, training, and deploying deep neural networks on a wide array of devices, from cloud infrastructure to mobile devices. It scales well, which allows for fast model training, and it supports a flexible programming model and multiple languages. MXNet lets you mix symbolic and imperative programming styles to maximize both efficiency and productivity. It is built on a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of the scheduler makes symbolic execution fast and memory efficient. The latest version of MXNet includes built-in support for the Intel® Math Kernel Library (Intel® MKL) 2017. The latest version of Intel MKL includes optimizations for Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® AVX-512 instructions, which are supported in Intel® Xeon® and Intel® Xeon Phi™ processors.
Prerequisites
Follow the instructions given here.
Building/Installing with MKL
MXNet can be installed and used with several combinations of development tools and libraries on a variety of platforms. This tutorial provides one such recipe describing steps to build and install MXNet with Intel MKL 2017 on CentOS*- and Ubuntu*-based systems.
1. Clone the mxnet tree and pull down its submodule dependencies
git clone https://github.com/dmlc/mxnet.git
cd mxnet
git submodule update --init --recursive
2. Set the following lines in make/config.mk to "1" to enable MKL support.
When these are enabled, the build downloads the latest MKL package and installs it on your system.
USE_MKL2017 = 1
USE_MKL2017_EXPERIMENTAL = 1
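If you would rather script the edit than change make/config.mk by hand, the following is a minimal sketch of one way to do it. The helper name enable_flags and the sample text (including SOME_OTHER_FLAG) are hypothetical and only illustrate the idea; a real config.mk contains many more settings.

```python
import re


def enable_flags(config_text, flags):
    """Return config_text with each flag in `flags` set to 1.

    Hypothetical helper: it rewrites lines of the form `NAME = value`
    and appends `NAME = 1` if the flag is not present at all.
    """
    for flag in flags:
        # [ \t]* (not \s*) keeps the match on a single line.
        pattern = re.compile(
            r"^([ \t]*%s[ \t]*=[ \t]*).*$" % re.escape(flag), re.MULTILINE
        )
        config_text, count = pattern.subn(
            lambda m: m.group(1) + "1", config_text
        )
        if count == 0:
            config_text += "%s = 1\n" % flag
    return config_text


# Made-up excerpt for illustration only.
SAMPLE = (
    "USE_MKL2017 = 0\n"
    "USE_MKL2017_EXPERIMENTAL = 0\n"
    "SOME_OTHER_FLAG = 0\n"
)

print(enable_flags(SAMPLE, ["USE_MKL2017", "USE_MKL2017_EXPERIMENTAL"]))
```

To apply it to a real tree, read make/config.mk, pass its contents through the helper, and write the result back. Keep a copy of the original file first.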
3. Build the mxnet library
NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
make -j $NUM_THREADS
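The one-liner above counts distinct "core id" lines and doubles the result on the assumption that each core runs two hardware threads. On multi-socket machines, core ids typically repeat on every socket, so that count can come out too low. The sketch below shows one alternative, assuming the usual /proc/cpuinfo layout in which each processor block lists "physical id" before "core id". The function name and the sample text are illustrative, not part of MXNet.

```python
def count_physical_cores(cpuinfo_text):
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo text."""
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor blocks
        key = key.strip()
        value = value.strip()
        if key == "processor":
            physical_id = None  # a new logical processor block starts
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            cores.add((physical_id, value))
    return len(cores)


# Made-up excerpt: two sockets with two cores each, other fields omitted.
SAMPLE_CPUINFO = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 1

processor : 2
physical id : 1
core id : 0

processor : 3
physical id : 1
core id : 1
"""

print(count_physical_cores(SAMPLE_CPUINFO))
```

For this sample the pair-based count is 4, whereas counting unique "core id" lines alone would give 2. On a real system, pass in the contents of /proc/cpuinfo and double the result only if two hardware threads per core are enabled.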
4. Install the python modules
cd python
python setup.py install
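A quick way to confirm that the Python module is importable after step 4 is sketched below. The helper name mxnet_version is hypothetical. It only checks that the package imports and reports its version string; it does not verify that MKL is actually in use.

```python
import importlib


def mxnet_version():
    """Return the installed mxnet version string, or None if it cannot be imported."""
    try:
        mx = importlib.import_module("mxnet")
    except ImportError:
        return None
    return getattr(mx, "__version__", "unknown")


version = mxnet_version()
if version is None:
    print("mxnet is not importable; check the build and install steps")
else:
    print("mxnet version: %s" % version)
```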
Benchmarks
A range of standard image classification benchmarks can be found under example/image-classification. We’ll focus on a benchmark that tests inference across a range of topologies.
Running Inference Benchmark:
The provided benchmark_score.py runs a variety of standard topologies (AlexNet, Inception, ResNet, etc.) at a range of batch sizes and reports the results in images per second. Before running it, set the following environment variables for optimal performance:
export OMP_NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
export KMP_AFFINITY=granularity=fine,compact,1,0
Then run the benchmark from the directory that contains the script:
python benchmark_score.py
If everything is installed correctly, you should see image/sec figures for a variety of topologies and batch sizes. For example:
INFO:root:network: alexnet
INFO:root:device: cpu(0)
INFO:root:batch size 1, image/sec: XXX
INFO:root:batch size 2, image/sec: XXX
…
INFO:root:batch size 32, image/sec: XXX
INFO:root:network: vgg
INFO:root:device: cpu(0)
INFO:root:batch size 1, image/sec: XXX
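To compare runs, it can help to turn this log into a data structure. The sketch below assumes the line format shown in the example output above. The function name parse_benchmark_log is hypothetical, and the numbers in the sample log are placeholders chosen for illustration, not measurements.

```python
import re

NETWORK_RE = re.compile(r"^INFO:root:network: (\S+)$")
SCORE_RE = re.compile(r"^INFO:root:batch size (\d+), image/sec: ([0-9.]+)$")


def parse_benchmark_log(log_text):
    """Map network name -> {batch size: images per second}."""
    results = {}
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        match = NETWORK_RE.match(line)
        if match:
            current = match.group(1)
            results[current] = {}
            continue
        match = SCORE_RE.match(line)
        if match and current is not None:
            results[current][int(match.group(1))] = float(match.group(2))
    return results


# Placeholder values only; these are not benchmark results.
SAMPLE_LOG = """\
INFO:root:network: alexnet
INFO:root:device: cpu(0)
INFO:root:batch size 1, image/sec: 10.0
INFO:root:batch size 2, image/sec: 20.0
INFO:root:network: vgg
INFO:root:device: cpu(0)
INFO:root:batch size 1, image/sec: 5.0
"""

for network, scores in sorted(parse_benchmark_log(SAMPLE_LOG).items()):
    for batch_size, rate in sorted(scores.items()):
        print("%s batch=%d image/sec=%.1f" % (network, batch_size, rate))
```

Lines that match neither pattern, such as the device line, are ignored, so the parser tolerates extra log output.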