Finally, we demonstrate how to use pyDAAL to implement a simple Linear Regression solution for a prediction problem.
Data science is a relatively new field that brings together concepts from other areas, such as data mining, data analysis, data modeling, data prediction, and data visualization. Performing such tasks as quickly as possible has become a central concern in today's data solutions. With that in mind, the Intel® Data Analytics Acceleration Library (Intel® DAAL) is a highly optimized library whose goal is to provide a full solution for data analytics targeting today's highly parallel systems, such as Intel® Xeon Phi™ processors.
Intel DAAL delivers solutions for many steps of a data analytics pipeline, such as pre-processing, data transformation, dimensionality reduction, data modeling, and prediction, as well as several drivers for reading and writing most of the common data formats. Figure 1 summarizes the features of the library.
Figure 1. Main algorithms delivered by Intel® Data Analytics Acceleration Library
As can be seen in Figure 1, all APIs are compatible with C++, Java*, and Python* (a recent addition, available from version 2017 beta). Many of the algorithms implemented in the library can be executed in three main modes:
- Batch: in this mode, the processing occurs on a single node over the complete dataset, e.g., the training algorithm is executed on one node without splitting the data;
- Distributed: as the name suggests, in this processing mode the dataset must be split and distributed among the computing nodes. The algorithm then calculates partial solutions on each node and, in the last step, merges them into a single solution; and
- Online: in this processing mode, the data is treated as a continuous stream. The processing occurs by building incremental partial models as the data arrives and, at the end, building a full model from the partial models.
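To make the difference between the three modes concrete, the following is a minimal sketch in plain Python. It is not the pyDAAL API: the `PartialResult` class and the `fit_batch`, `fit_distributed`, and `fit_online` functions are hypothetical names introduced only for illustration. The sketch assumes a one-feature linear regression, for which the model can be recovered from a handful of running sums, so partial results from different nodes or different chunks of a stream can simply be added together.

```python
# Illustrative sketch only: plain Python, NOT the pyDAAL API.
# All names below are hypothetical and exist only for this example.
from dataclasses import dataclass


@dataclass
class PartialResult:
    """Running sums that are sufficient to fit y = slope * x + intercept."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, xs, ys):
        """Fold one block of observations into the running sums."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        return self

    def merge(self, other):
        """Combine two partial results computed on disjoint blocks."""
        return PartialResult(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )

    def finalize(self):
        """Solve the least-squares normal equations; returns (slope, intercept)."""
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


def fit_batch(xs, ys):
    """Batch: the whole dataset is available and processed in one call."""
    return PartialResult().update(xs, ys).finalize()


def fit_distributed(blocks):
    """Distributed: each block stands in for the data held by one node."""
    # Step 1: every "node" computes a partial result on its own block.
    partials = [PartialResult().update(xs, ys) for xs, ys in blocks]
    # Step 2: a final step merges the partial results into one model.
    total = PartialResult()
    for partial in partials:
        total = total.merge(partial)
    return total.finalize()


def fit_online(stream):
    """Online: blocks arrive one at a time and update a single running state."""
    state = PartialResult()
    for xs, ys in stream:
        state.update(xs, ys)
    return state.finalize()
```

Given the same observations, all three functions return the same slope and intercept; what changes is only how the data reaches the algorithm (all at once, split across nodes, or chunk by chunk). In a real distributed setting the partial results would be computed on separate machines and sent to a master node, which this single-process sketch only imitates with a list.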
This whitepaper covers the processing modes in more detail, together with additional details on data management and on how to use pyDAAL to implement a simple linear regression solution for a prediction problem.
Source available on GitHub