Intel® Data Analytics Acceleration Library (Intel® DAAL) is a performance library for data analytics that provides optimized building blocks for data analysis and machine learning on Intel® platforms. The functions in Intel DAAL cover all the stages of data processing in machine learning, from preprocessing, transformation, analysis, and modeling to decision-making. For each stage of processing, Intel DAAL includes functions and utilities that are optimized for Intel® architecture, including Intel® Atom™, Intel® Core™, Intel® Xeon®, and Intel® Xeon Phi™ processors. Intel DAAL supports batch, online, and distributed processing modes, and it supports three programming APIs: C++, Java*, and Python*.
Here, we’ll explore a C++ example that applies Intel DAAL to handwritten digit recognition, a classic machine learning problem. Several algorithms are relevant: support vector machines (SVM), principal component analysis (PCA), naïve Bayes, and neural networks can all be used to tackle this problem, with differing prediction accuracy. Intel DAAL includes most of these algorithms. We have chosen SVM to demonstrate how a library algorithm in Intel DAAL can be used in handwritten digit recognition.
Loading Data in Intel DAAL
In handwritten digit recognition, recognition is the prediction, or inference, stage of the machine learning pipeline. Given a handwritten digit, the system should be able to recognize or infer which digit was written. To predict an output for a given input, the system needs a trained model learned from a training data set. The first step before building that model is therefore to collect training data.
For our sample application, we use public data downloadable from the UCI Machine Learning Repository, which provides 3,823 preprocessed training samples and 1,797 preprocessed testing samples. Intel DAAL supports several data formats (CSV, SQL, HDFS, and KDB+), as well as user-defined data formats. In our example, we’ll use the CSV format. Let’s assume we store the training data in a file named digits_tra.csv and the testing data in a file named digits_tes.csv.
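To make the data layout concrete, here is a minimal sketch of how one row of such a CSV file could be split into features and a label. It assumes the file keeps the UCI layout, in which each row holds 64 integer features followed by the digit label in the last column. The DigitRow struct and the parseDigitRow function are hypothetical names used only for illustration; they are not part of Intel DAAL and do not describe how its CSV loader works.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one sample: feature values plus the digit label.
struct DigitRow {
    std::vector<double> features;
    int label;
};

// Split a comma-separated line into numbers. The last field is treated as the
// label, which is an assumption about the file layout, not a DAAL requirement.
DigitRow parseDigitRow(const std::string& line) {
    std::vector<double> values;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) {
        values.push_back(std::stod(field));
    }
    assert(values.size() >= 2 && "need at least one feature and a label");

    DigitRow row;
    row.label = static_cast<int>(values.back());
    values.pop_back();
    row.features = values;
    return row;
}
```

For a full row from the data set, row.features would hold 64 values and row.label a digit from 0 to 9.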
Intel DAAL provides utilities to help load data into memory from a data source. Here, we first define the object trainDataSource, a FileDataSource parameterized with CSVFeatureManager, which can load the data from a CSV file into memory. Inside Intel DAAL, data in memory are stored as a numeric table. Because the data source is constructed with DataSource::doAllocateNumericTable, the numeric table is created automatically. To load the data from the CSV file, simply invoke the member function loadDataBlock(). The data are then in memory as a numeric table that is ready for further computation. The core parts of the C++ code for loading the training data from the CSV file are shown in Figure 1.
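// File names for the training and testing data sets
string trainDatasetFileName = "digits_tra.csv";
string testDatasetFileName = "digits_tes.csv";
// Create a CSV data source that allocates its own numeric table
FileDataSource<CSVFeatureManager> trainDataSource(trainDatasetFileName,
    DataSource::doAllocateNumericTable, DataSource::doDictionaryFromContext);
// Read nTrainObservations rows from the file into the numeric table
trainDataSource.loadDataBlock(nTrainObservations);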
Figure 1 - C++ code for loading training data
Training SVM-Based Handwritten Digit Recognition Model
Now that the training data are loaded into memory, we are ready to learn from the data and build a trained model. We pick the SVM algorithm for training. In this example, the training data set is small enough to fit into memory all at once, so we use batch processing mode for our demonstration. After the algorithm is defined, set the parameters it needs, such as the number of classes (in this case, 10). Then the training data are passed to the algorithm via the training data numeric table, trainDataSource.getNumericTable().
Invoking algorithm.compute() runs the SVM computation. When it returns, training is complete. The trained model is embedded in the object trainingResult, which is obtained by invoking algorithm.getResult(). Sample code for the training process is shown in Figure 2.
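// Two-class SVM trainer used as the building block
services::SharedPtr<svm::training::Batch<> > training(
    new svm::training::Batch<>());
// Multi-class classifier that wraps the two-class SVM
multi_class_classifier::training::Batch<> algorithm;
algorithm.parameter.nClasses = nClasses;
algorithm.parameter.training = training;
// Pass the training data as a numeric table
algorithm.input.set(classifier::training::data,
    trainDataSource.getNumericTable());
// Train, then retrieve the result that holds the model
algorithm.compute();
trainingResult = algorithm.getResult();
// Write the trained model to the file named model
ModelFileWriter writer("./model");
writer.serializeToFile(trainingResult->get(classifier::training::model));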
Figure 2 - Sample code for the training process
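The training code wraps a two-class SVM in a multi-class classifier with nClasses set to 10. One common way to build a multi-class classifier from two-class ones is the one-against-one scheme: train one binary classifier per pair of classes and let them vote. The sketch below illustrates that scheme in plain C++. It is an assumption for illustration, not a description of Intel DAAL internals, and the names numPairwiseClassifiers, PairwiseDecision, and predictOneAgainstOne are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Number of pairwise binary classifiers in a one-against-one scheme.
int numPairwiseClassifiers(int nClasses) {
    return nClasses * (nClasses - 1) / 2;
}

// Hypothetical binary decision for the pair (i, j) with i < j:
// returns true if class i wins, false if class j wins.
using PairwiseDecision = std::function<bool(int i, int j)>;

// Each pair casts one vote; the class with the most votes is the prediction.
// Ties are resolved in favor of the lowest class index.
int predictOneAgainstOne(int nClasses, const PairwiseDecision& decide) {
    assert(nClasses >= 2);
    std::vector<int> votes(nClasses, 0);
    for (int i = 0; i < nClasses; ++i) {
        for (int j = i + 1; j < nClasses; ++j) {
            if (decide(i, j)) {
                ++votes[i];
            } else {
                ++votes[j];
            }
        }
    }
    int best = 0;
    for (int c = 1; c < nClasses; ++c) {
        if (votes[c] > votes[best]) best = c;
    }
    return best;
}
```

With 10 digit classes, this scheme needs 10 * 9 / 2 = 45 binary classifiers, which is one reason the cost of training and prediction grows with the number of classes.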
Intel DAAL also provides serialization functions that write the trained model from memory to a file, and deserialization functions that load a trained model file back into memory. As shown in the last two lines of code in Figure 2, we define a ModelFileWriter that writes to a file named model. By invoking writer.serializeToFile(), the trained model embedded in trainingResult is written to that file. This serialization/deserialization utility is useful when, after training, a server ports the trained model to a client, and the client then does the prediction or inference with the trained model without training it again. We will see how this model file is used in the section "Handwritten Digit Recognition Application."
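The idea of a serialize/deserialize round trip can be sketched without the library. The example below writes a toy model to a byte stream and reads it back in the same order. ToyModel, serializeModel, deserializeModel, and roundTrip are hypothetical names, and the byte layout is an assumption chosen for simplicity; it is not the format Intel DAAL uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical stand-in for a trained model; not DAAL's model layout.
struct ToyModel {
    std::vector<double> weights;
    double bias = 0.0;
};

// Write: element count, then weights, then bias (raw bytes, same machine only).
void serializeModel(const ToyModel& m, std::ostream& out) {
    const std::uint64_t n = m.weights.size();
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    out.write(reinterpret_cast<const char*>(m.weights.data()),
              static_cast<std::streamsize>(n * sizeof(double)));
    out.write(reinterpret_cast<const char*>(&m.bias), sizeof m.bias);
}

// Read the fields back in exactly the order they were written.
ToyModel deserializeModel(std::istream& in) {
    ToyModel m;
    std::uint64_t n = 0;
    in.read(reinterpret_cast<char*>(&n), sizeof n);
    m.weights.resize(static_cast<std::size_t>(n));
    in.read(reinterpret_cast<char*>(m.weights.data()),
            static_cast<std::streamsize>(n * sizeof(double)));
    in.read(reinterpret_cast<char*>(&m.bias), sizeof m.bias);
    assert(in && "stream ended before the whole model was read");
    return m;
}

// Usage sketch: round-trip through an in-memory buffer. A std::fstream opened
// in binary mode would play the same role for a file on disk.
ToyModel roundTrip(const ToyModel& m) {
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    serializeModel(m, buffer);
    return deserializeModel(buffer);
}
```

The key property is that the reader mirrors the writer field by field, so the client reconstructs the same model the server trained.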
Testing the Trained Model
With the trained model, we can run the test. As we did for training, we store the testing data from the UCI Machine Learning Repository in a CSV file and load it with testDataSource.loadDataBlock(). We still need to define the SVM algorithm object, algorithm, for prediction. It takes two inputs: the testing data and the trained model. The testing data are passed with testDataSource.getNumericTable(). In batch testing, the trained model is passed as the pointer returned by trainingResult->get(). (In the section "Handwritten Digit Recognition Application," we will see how to pass the trained model from a file.)
After the algorithm inputs are set, calling algorithm.compute() completes the testing process. Sample code for testing is shown in Figure 3. The predicted results are retrieved by calling algorithm.getResult().
// Two-class SVM predictor used as the building block
services::SharedPtr<svm::prediction::Batch<> > prediction(
    new svm::prediction::Batch<>());
// Load the testing data from the CSV file
FileDataSource<CSVFeatureManager> testDataSource(testDatasetFileName,
    DataSource::doAllocateNumericTable,
    DataSource::doDictionaryFromContext);
testDataSource.loadDataBlock(nTestObservations);
// Multi-class classifier that wraps the two-class SVM
multi_class_classifier::prediction::Batch<> algorithm;
algorithm.parameter.prediction = prediction;
// Inputs: the testing data and the model produced by training
algorithm.input.set(classifier::prediction::data,
    testDataSource.getNumericTable());
algorithm.input.set(classifier::prediction::model,
    trainingResult->get(classifier::training::model));
// Predict, then retrieve the result
algorithm.compute();
predictionResult = algorithm.getResult();
Figure 3 - Sample code for testing
Intel DAAL also includes functions to calculate quality metrics such as the confusion matrix, average accuracy, and error rate. With SVM, the quality metrics for the test data set from the UCI Machine Learning Repository are shown in Figure 4, where 99.6 percent average accuracy is achieved over the whole test data set.
Figure 4 - Quality metrics
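To show what these metrics measure, the sketch below computes a confusion matrix and two accuracy figures from true and predicted labels in plain C++. It is an illustration, not the Intel DAAL quality metrics API, and the function names are hypothetical. Two definitions of accuracy are included because "average accuracy" for a multi-class problem can mean either the fraction of correct predictions or the per-class accuracy (tp + tn) / total averaged over classes; which one Figure 4 reports is an assumption left open here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long>>;

// cm[t][p] counts samples whose true label is t and predicted label is p.
Matrix confusionMatrix(const std::vector<int>& truth,
                       const std::vector<int>& predicted, int nClasses) {
    assert(truth.size() == predicted.size());
    Matrix cm(nClasses, std::vector<long>(nClasses, 0));
    for (std::size_t i = 0; i < truth.size(); ++i) {
        assert(truth[i] >= 0 && truth[i] < nClasses);
        assert(predicted[i] >= 0 && predicted[i] < nClasses);
        ++cm[static_cast<std::size_t>(truth[i])]
            [static_cast<std::size_t>(predicted[i])];
    }
    return cm;
}

// Fraction of samples on the diagonal. The error rate is 1 minus this value.
double overallAccuracy(const Matrix& cm) {
    long correct = 0, total = 0;
    for (std::size_t t = 0; t < cm.size(); ++t) {
        for (std::size_t p = 0; p < cm.size(); ++p) {
            total += cm[t][p];
            if (t == p) correct += cm[t][p];
        }
    }
    return total == 0 ? 0.0 : static_cast<double>(correct) / total;
}

// One-vs-rest accuracy (tp + tn) / total per class, averaged over classes.
double averagePerClassAccuracy(const Matrix& cm) {
    const std::size_t n = cm.size();
    long total = 0;
    std::vector<long> rowSum(n, 0), colSum(n, 0);
    for (std::size_t t = 0; t < n; ++t) {
        for (std::size_t p = 0; p < n; ++p) {
            rowSum[t] += cm[t][p];
            colSum[p] += cm[t][p];
            total += cm[t][p];
        }
    }
    if (n == 0 || total == 0) return 0.0;
    double sum = 0.0;
    for (std::size_t c = 0; c < n; ++c) {
        const long tp = cm[c][c];
        const long fn = rowSum[c] - tp;  // true class c, predicted otherwise
        const long fp = colSum[c] - tp;  // predicted c, true class otherwise
        const long tn = total - tp - fn - fp;
        sum += static_cast<double>(tp + tn) / total;
    }
    return sum / static_cast<double>(n);
}
```

The per-class average is never lower than the overall accuracy, because every misclassified sample still counts as a true negative for all the classes it does not involve.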
SVM Performance Comparison between Intel DAAL and scikit-learn*
SVM is a classic algorithm that many machine learning frameworks and libraries include. scikit-learn* is a popular Python library for machine learning. In the scikit-learn SVM classification example, the same training and testing data are used for a handwritten digit application. We compared the SVM performance of Intel DAAL and scikit-learn for both training and prediction. The results are shown in Figure 5.
Figure 5 - SVM performance boosts using Intel® Data Analytics Acceleration Library (Intel® DAAL) versus scikit-learn* on an Intel® CPU
The comparison is done on a system with two-socket Intel® Xeon® processors E5-2680 v3 @ 2.50 GHz, 24 cores, 30 MB L3 cache per CPU, 256 GB RAM. The operating system used is Red Hat Enterprise Linux* Server release 6.6, 64 bit. The versions of libraries used are Intel DAAL 2016 Update 2 and scikit-learn 0.16.1.
As shown in Figure 5, for both training and testing, SVM in Intel DAAL outperforms scikit-learn. In training, SVM in Intel DAAL is 6.77x faster than scikit-learn, and in testing or prediction, SVM in Intel DAAL is 11.25x faster than scikit-learn.
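A figure such as "6.77x faster" is a ratio of elapsed times: the baseline time divided by the time of the library being evaluated. The sketch below shows one way such a ratio could be measured in C++. It is an assumption for illustration; the article does not describe its measurement harness, and the names secondsTaken and speedup are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Wall-clock seconds taken by one call to work().
double secondsTaken(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Speedup of the candidate over the baseline; values above 1 mean faster.
double speedup(double baselineSeconds, double candidateSeconds) {
    assert(candidateSeconds > 0.0);
    return baselineSeconds / candidateSeconds;
}
```

In practice each workload would be timed over several runs on the same machine, with the same data, before the ratio is taken.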
Handwritten Digit Recognition Application
As mentioned earlier, a model trained on a server can be ported to a client for use in an application. Here we have a simple handwritten digit recognition application to which the trained model has been ported. This interactive application performs only prediction, or testing. A snapshot of the application's interface, the Doodle Frame, is shown in Figure 6. The Doodle Frame contains two white panels. The large panel on the left is where a digit from 0 to 9 can be written, one at a time; the small panel on the right displays the digit that the system believes was written on the left. In Figure 6, the number 3 is handwritten in the left panel, and the system recognizes it and displays the inferred result, 3, in the right panel.
Figure 6 - Digit recognition application
For this application, we test or predict only one digit at a time. For each written digit, a test CSV file is generated that contains the features extracted from the handwritten digit by preprocessing. (The preprocessing is beyond the scope of this article and will not be covered.) Now that we have the test data, we still need a trained model as the second input to the algorithm object.
As mentioned above, we have already built a trained model and written it to a file named model. Now it is time to load that trained model from the file, which means deserializing the model into memory. We define a ModelFileReader to read from the file named model and call reader.deserializeFromFile(pModel). The trained model is then in memory, and pModel is the pointer to it. The C++ code is shown in Figure 7; most of it is the same as in Figure 3. After algorithm.compute(), we get the prediction result predictionResult1, which contains the label, or the predicted number, for the given input digit.
// Two-class SVM predictor used as the building block
services::SharedPtr<svm::prediction::Batch<> > prediction1(
    new svm::prediction::Batch<>());
// Load the single test sample from the CSV file
FileDataSource<CSVFeatureManager> testDataSource(testDatasetFileName,
    DataSource::doAllocateNumericTable,
    DataSource::doDictionaryFromContext);
testDataSource.loadDataBlock(1);
// Multi-class classifier that wraps the two-class SVM
multi_class_classifier::prediction::Batch<> algorithm;
algorithm.parameter.prediction = prediction1;
// Deserialize the trained model from the file named model
ModelFileReader reader("./model");
services::SharedPtr<multi_class_classifier::Model> pModel(
    new multi_class_classifier::Model());
reader.deserializeFromFile(pModel);
// Inputs: the test sample and the deserialized model
algorithm.input.set(classifier::prediction::data,
    testDataSource.getNumericTable());
algorithm.input.set(classifier::prediction::model, pModel);
// Predict, then read the predicted label
algorithm.compute();
predictionResult1 = algorithm.getResult();
predictedLabels1 = predictionResult1->get(classifier::prediction::prediction);
Figure 7 - C++ code for prediction with the deserialized model
Summary
Intel DAAL provides building blocks for the whole machine learning pipeline. With the SVM C++ code snippets, we’ve shown how to use Intel DAAL in an application: specifically, how to load data from a file, how to build the trained model, and how to apply the trained model in the application.
Because the functions in Intel DAAL are optimized for Intel architecture, developers building machine learning applications on Intel platforms can expect strong performance from these building blocks. In our comparison, SVM in Intel DAAL was 6.77x faster in training and 11.25x faster in prediction than SVM in scikit-learn.