Introduction
Machine learning is not a new concept, but putting it into practice remains out of reach for many businesses. Why? When they have big data problems, enterprises typically hire experts from different domains to help data engineers prepare their systems for data preprocessing, model training, and prediction. That means also managing the employees or other resources who provide domain expertise on cleaning the data sources, feature engineering, algorithm selection, and so on. This process can take months, and it continues after domain experts and data engineers build and test the initial models. In contrast to this traditional approach, new machine learning (ML) tools make it easier to process data automatically and to generate better predictive models faster.
R2 Learn is a cloud-based automated machine learning (AutoML) product designed to fill the gap between the data engineer and the domain-expert data scientist. Unlike many AutoML solutions today, which cover only algorithm selection and parameter tuning, R2 Learn covers the entire machine learning development and operation life cycle: data quality checks, data preprocessing, feature processing, algorithm selection, parameter tuning, model development, and model performance monitoring and optimization. This end-to-end automation and optimization of the ML workflow enables "one click, data in, model out".
The end-to-end process of R2 Learn
In this article we'll step you through the process of setting up an effective AutoML prediction model using R2 Learn, demonstrating how the product streamlines what is otherwise a process that requires extensive expert intervention.
Building the Model
R2 Learn requires an account, and provides a 14-day free trial account for you to evaluate the product. First, for this example, create a new trial account for the R2 Learn service. This gives you access to the tools needed to work through this example or your own project data. You can import CSV data, run training on your own data sets, and generate predictions.
To start, R2 Learn lets you create a project and set up basic information for it, such as the type of project, a description, and so on.
Next, R2 Learn provides two ways to choose a data set. One is to use the sample data sets that R2 Learn supplies for exploring the service. The other is to import your own data. R2 Learn accepts data sets in CSV or Excel file formats or as SQL scripts. We will use the publicly available California Housing data set to explore the service. This data set contains data about properties and the areas they are in. We will use this information to predict the price of a house, given its location, housing age, number of rooms and bedrooms, area population, and the other fields in the file.
This is a regression problem, since we are trying to predict a numeric value. The file is a CSV file, and you can simply upload it to the R2 Learn platform. R2 Learn then performs the basic ETL operations on the data. Once this part is done, the platform shows you the data and lets you define the variable to be predicted and the variables to be used as predictors (the dependent and independent variables, respectively).
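To make the dependent/independent split concrete, here is a minimal Python sketch of the same idea outside the platform. It assumes the target column is named median_house_value, as it is in the public California Housing file; this article does not name the column, and the single sample row is included only for illustration.

```python
import csv
import io

# One illustrative row. The target column name "median_house_value" is an
# assumption taken from the public data set, not from R2 Learn itself.
SAMPLE_CSV = (
    "longitude,latitude,housing_median_age,total_rooms,total_bedrooms,"
    "population,households,median_income,median_house_value,ocean_proximity\n"
    "-122.23,37.88,41.0,880.0,129.0,322.0,126.0,8.3252,452600.0,NEAR BAY\n"
)

TARGET = "median_house_value"


def split_target(rows, target):
    """Separate each row into predictors (a dict) and the numeric target."""
    predictors, targets = [], []
    for row in rows:
        row = dict(row)                       # copy so the input is not mutated
        targets.append(float(row.pop(target)))
        predictors.append(row)
    return predictors, targets


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
X, y = split_target(rows, TARGET)
print(sorted(X[0]))   # names of the predictor columns
print(y)              # values of the dependent variable
```

Because the target is a number rather than a category, this is the shape of input a regression model expects.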
Data quality directly determines the modeling outcome. R2 Learn can automatically detect data type mismatches, missing values, and outliers, and it lets you preview the abnormal data. The platform can automatically fix issues in the target variable and the predictors, or let you fix them manually.
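If you want a feel for what such checks involve, the sketch below profiles one column of raw CSV strings. It is not R2 Learn's implementation: the 1.5 × IQR outlier rule is a common convention chosen here as an assumption, and the sample values are made up.

```python
import statistics


def profile_numeric_column(values):
    """Count missing cells, non-numeric cells, and IQR outliers in a column.

    `values` is a list of raw strings as read from a CSV file. The column
    needs at least two numeric cells for the quartiles to be defined.
    """
    missing = 0
    mismatched = 0
    numbers = []
    for raw in values:
        if raw is None or raw.strip() == "":
            missing += 1                      # empty cell: missing value
            continue
        try:
            numbers.append(float(raw))
        except ValueError:
            mismatched += 1                   # text in a numeric column

    # Assumed rule: flag values more than 1.5 * IQR outside the quartiles.
    q1, _, q3 = statistics.quantiles(numbers, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in numbers if x < low or x > high]
    return {"missing": missing, "type_mismatch": mismatched, "outliers": outliers}


# Made-up values for a column such as total_bedrooms.
column = ["129", "", "190", "n/a", "235", "280", "213", "177", "201", "5000"]
report = profile_numeric_column(column)
print(report)
```

A report like this tells you which rows to inspect before deciding whether to drop, fill, or correct them.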
The easiest way to plan out the model training is by using R2 Learn’s automatic process. Once you click on Automatic Modeling, R2 Learn will automatically start the training process and provide you with the trained models for deployment. This one-click automatic modeling process is especially helpful for people with limited AI expertise.
If you decide to perform these actions manually, R2 Learn provides an Advanced Modeling function to support you in the selection process. It gives you more control over the dependent and independent variables and lets you override the suggestions that Automatic Modeling would otherwise make. You can decide which variables to keep and which to ignore. R2 Learn also lets you verify the integrity of the data set, validate the data source to remove any null values, and perform further transformations on the data.
In the Advanced Modeling view you can select the variables you want to use for training. R2 Learn supports you in this selection by showing the importance factor of each variable. You can also check the correlation matrix to better understand how the variables relate to one another. Another major benefit of Advanced Modeling is that you can select the algorithm used for learning. You can find these settings and options under the Advanced Modeling Setting tab. R2 Learn supports dozens of the most effective learning algorithms available and continues to expand its selection every quarter.
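For readers unfamiliar with correlation matrices, the following sketch computes one with the Pearson coefficient in plain Python. How R2 Learn computes its own matrix and importance factors is not described here, so treat this as a stand-in; the column values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up numbers, not taken from the housing data set.
columns = {
    "total_rooms": [880.0, 1200.0, 1500.0, 2100.0],
    "total_bedrooms": [129.0, 180.0, 230.0, 310.0],
    "housing_median_age": [41.0, 35.0, 28.0, 15.0],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Two predictors with a coefficient close to 1 or -1 carry largely the same information, which is one reason you might drop one of them in Advanced Modeling.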
When you are happy with these settings, you can proceed to the modeling phase. In this phase, you are presented with the model statistics, including an evaluation of the model's quality.
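Which statistics the platform reports is not listed in this article. As a point of reference, two metrics commonly used to judge a regression model are the root mean squared error (RMSE) and the coefficient of determination (R²). The sketch below computes both; the numbers are made up.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions explain."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up house values and predictions, for illustration only.
actual = [452600.0, 358500.0, 352100.0, 341300.0]
predicted = [430000.0, 365000.0, 349000.0, 350000.0]
print(round(rmse(actual, predicted)), round(r_squared(actual, predicted), 3))
```

A lower RMSE and an R² closer to 1 both indicate a better fit, which helps when comparing the models a training run produces.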
A complete report of the training stage is provided so that you can judge the models for yourself. R2 Learn continues to train and learn from the data to provide better (or different) trained models. You can choose which of the models you want to publish for users to access.
Making Predictions with the Model
Once you have trained a model and are ready to start making predictions, R2 Learn supports two ways to generate predictions for your input data. One is to upload your input data as a file and use the online portal to run predictions on it. The other is to call the API that is provided for each trained model.
First, let's explore the data import option. You can put all the input data in a CSV file and upload the file to the service. R2 Learn will process the input file and generate a response file for you. Remember that you need to provide all the data that the model needs to make a prediction. For a quick test, an easy approach is to take the top two rows of the data set (the housing.csv file), that is, the header row and the first data row.
longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,ocean_proximity
-122.23,37.88,41.0,880.0,129.0,322.0,126.0,8.3252,NEAR BAY
Save these rows in a new file and upload it. R2 Learn will return a file that contains the results for this query. In most cases, you will not need to use this approach, but R2 Learn supports it in case you ever need it.
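If you prefer to script this step, the sketch below writes the header and the first data row of a CSV file to a new file. The function name, the output file name, and the option to drop columns are all hypothetical. Dropping median_house_value assumes that the column you are predicting should be left out of the input, as it is in the sample rows above.

```python
import csv
import itertools


def write_sample_input(src_path, dst_path, n_rows=1, drop_columns=()):
    """Copy the header and the first `n_rows` data rows to a new CSV file.

    Columns named in `drop_columns` are left out of the output.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        kept = [name for name in reader.fieldnames if name not in drop_columns]
        # extrasaction="ignore" skips the dropped columns when writing rows.
        writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
        writer.writeheader()
        for row in itertools.islice(reader, n_rows):
            writer.writerow(row)
```

For example, write_sample_input("housing.csv", "sample_input.csv", drop_columns=("median_house_value",)) would produce a two-line file ready to upload.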
The API is a RESTful HTTP endpoint that you can use to get predictions for your input. The endpoints for the service are given on the same page, and you can copy most of the parameters from there. Once you have them, you can write the HTTP client in your favorite programming language or use an API debugger like Postman. We will explore this option using Postman.
R2 Learn requires that you provide the following request details to the service:
- POST method for HTTP request
- Endpoint where model is deployed (https://admin.r2.ai/api/deploy)
- Token header (your secret code)
- Deployment ID (specific to the project and the deployment)
- Data (the JSON payload)
Enter them in Postman as seen in the image below.
Note the format of the data that you must pass. The payload is a JSON array of documents, and each document must contain the fields that the platform requires to predict the values.
To help you with the test, here is the text version of the JSON data payload above.
[{
"longitude": "-122.23",
"latitude": "37.88",
"housing_median_age": "41.0",
"total_rooms": "880",
"total_bedrooms": "129",
"population": "322",
"households": "126",
"median_income": "8.3252",
"ocean_proximity": "NEAR BAY"
}]
You can use this payload to send a request to your deployment. The same applies to any HTTP client: you need to provide the same request details, and you can use the libraries of your favorite language for this, such as HttpClient from the .NET Framework.
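As an alternative to Postman, here is a minimal Python sketch of such a client using only the standard library. The header names token and deploymentId, and the choice to send the deployment ID as a header, are assumptions made for illustration; copy the exact names and placement from your deployment page. The function names are hypothetical as well.

```python
import json
import urllib.request

ENDPOINT = "https://admin.r2.ai/api/deploy"

SAMPLE_RECORD = {
    "longitude": "-122.23",
    "latitude": "37.88",
    "housing_median_age": "41.0",
    "total_rooms": "880",
    "total_bedrooms": "129",
    "population": "322",
    "households": "126",
    "median_income": "8.3252",
    "ocean_proximity": "NEAR BAY",
}


def build_request(endpoint, token, deployment_id, records):
    """Build (but do not send) a POST request carrying a JSON array of records."""
    body = json.dumps(records).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "token": token,                  # assumed header name
        "deploymentId": deployment_id,   # assumed header name and placement
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request(ENDPOINT, "YOUR_TOKEN", "YOUR_DEPLOYMENT_ID", [SAMPLE_RECORD])
print(request.get_method(), request.full_url)
```

Building the request separately from sending it lets you inspect the method, headers, and body before anything goes over the network. Call send(request) once the token and deployment ID are real.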
If a value is missing, the platform will return an error code.
There are other error codes that you need to watch for, and you can find them listed on the deployment page as well. If the query is successful, you will receive the prediction result in JSON notation.
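Since a missing value leads to an error, it can help to check each record on the client before sending it. The sketch below does this for the fields used in the example payload. The function name is hypothetical, and the field list is taken from this example rather than from the platform, so adjust it to your own model.

```python
# Fields taken from the example payload in this article.
REQUIRED_FIELDS = (
    "longitude",
    "latitude",
    "housing_median_age",
    "total_rooms",
    "total_bedrooms",
    "population",
    "households",
    "median_income",
    "ocean_proximity",
)


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required field names that are absent or empty in `record`."""
    return [
        name
        for name in required
        if record.get(name) is None or str(record[name]).strip() == ""
    ]


# A record with one field left out and one left empty.
incomplete = {
    "longitude": "-122.23",
    "latitude": "37.88",
    "housing_median_age": "41.0",
    "total_rooms": "880",
    "total_bedrooms": "",
    "population": "322",
    "households": "126",
    "ocean_proximity": "NEAR BAY",
}
print(missing_fields(incomplete))
```

An empty list means the record has every field and can be sent. Anything else names the values to fill in first.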
Once the models are in operation, R2 Learn can monitor the model performance and refit the models to keep them fresh and up-to-date.
Conclusion
As you can see, instead of relying on experience and intuition to find the best model for the task at hand, which can be a slow, tedious, expensive, and flawed process, R2 Learn makes it easy for data scientists or data engineers to get from importing a data set to training models and getting predictions in just a few steps. It doesn't require setting up unfamiliar software libraries or learning a new language. R2 Learn makes AutoML accessible both on-premises and as a SaaS offering. You get a transparent and easily interpreted modeling process, high-quality models, quick modeling, and self-learning capability, all in one platform. It enables businesses and users to develop, deploy, monitor, and optimize models efficiently and intelligently.
Give R2 Learn a try and see how quickly and easily you can build trained predictive models. Just sign up at www.r2.ai/product for the 14-day free trial, during which you can explore how R2 Learn can help with your data modeling tasks and uncover the value hidden in your data.