Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

Getting Started with Machine Learning DotNet for Clustering Model

0.00/5 (No votes)
31 Oct 2018 2  
In this article, we will see how to work on Clustering model for predicting the Mobile used by model, Sex, before 2010 and After 2010 using the Clustering model with ML.NET.

Introduction

In Build 2018, Microsoft introduced the preview of ML.NET (Machine Learning .NET) which is a cross platform, open source machine learning framework. Yes, now it's easy to develop our own Machine Learning application or develop costum module using Machine Learning framework. ML.NET is a machine learning framework which was mainly developed for .NET developers. We can use C# or F# to develop ML.NET applications. ML.NET is an open source which can be run on Windows, Linux and macOS. The ML.NET is still in development and now we can use the preview version to work and play with ML.NET.

Reference link: Introducing ML.NET: Cross-platform, Proven and Open Source Machine Learning Framework

In this article, we will see how to develop our ML.Net application for Clustering Model.

Machine Learning for Clustering Model

Machine Learning is nothing but a set of programs which is used to train the computer to predict and display the output for us. Example live applications which are using Machine Learning are Windows Cortana, Facebook News Feed, Self-Driving Car, Future Stock Prediction, Gmail Spam detection, Pay pal fraud detection, etc.

In Machine Learning, there is 3 main types:

  • Supervised learning: Machine gets labelled inputs and their desired outputs, example, we can say as Taxi Fare detection.
  • Unsupervised learning: Machine gets inputs without desired outputs, example, we can say as Customer Segmentations.
  • Reinforcement learning: In this kind of algorithm, we will interact with the dynamic interaction, example we can say as self-Driving Cars.

In each type, we will be using the Algorithm to train the Machine for producing the result. We can see the algorithm for each Machine Learning type.

  • Supervised learning has Regression and Classification Algorithms
  • Unsupervised learning has Clustering and Association Algorithms
  • Reinforcement learning has Classification and Control Algorithms

In my previous article, I explained about predicting Future Stock for an Item using ML.NET for the Regression Model for the Supervised learning.

In this article, we will see how to work on Clustering model for predicting the Mobile used by model, Sex, before 2010 and after 2010 using the Clustering model with ML.NET.

In this sample program, we will be using Machine Leaning Clustering model of ML.NET to predict the Customer Segmentation by Sex, Mobile Phone purchased before 2010 and After 2010 by mobile model. Some members can use Windows Mobile. Some members can use Samsung or Iphone, some members can use both Samsung and Windows or IPhone, to segment the members count by Sex, Mobile Phone purchased before 2010 and after 2010 by mobile model and to find the Cluster ID and Score value, we have created the sample program in this article. We have used a simple dataset with random members cluster count by Sex, Before 2010 and After 2010.

Reference link: ML.NET to cluster, Taxi fare predictor (regression), https://en.wikipedia.org/wiki/Machine_learning

Background

Things to Know Before Starting ML.NET

Initialize the Model

For working with Machine Learning, first we need to pick our best fit machine learning algorithm. Machine learning has Clustering, regression, classification and anomaly detection modules. Here in this article, we will be using the Clustering model for predicting the Customer Segmentation of Mobile phone usage.

Train

We need to train the machine learning model. Training is the process of analyzing input data by model. The training is mainly used for model to learn the pattern and save it as a trained model. For example, we will be creating a CSV file in our application and in the CSV file, we will be giving the Customer details as Male, Female, Before2010 and After2010 and MobilePhone type for the Input. We give more than 100 records in the CSV file as sample with all necessary details. We need to give this CSV file as input to our model. Our model needs to be trained and using this data, our model needs to be analyzed to predict the result. The Predicted result will be displayed to as Cluster ID and Score as Distance to us in our console application.

Score

Score here is not the same as our regression model where in Regression, we will be having the labeled input as well as labeled output, but for the Clustering model, we don’t have the desired output here in score will contain the array with squared Euclidean distances to the cluster centroids.

Reference link: ML.NET to cluster

Prerequisites

Make sure you have installed all the prerequisites in your computer. If not, then download and install Visual Studio 2017 15.6 or later with the ".NET Core cross-platform development" workload installed.

Using the Code

Step 1 - Create C# Console Application

After installing the prerequisites, click Start >> Programs >> Visual Studio 2017 >> Visual Studio 2017 on your desktop. Click New >> Project. Select Visual C# >> Windows Desktop >> Console APP (.NET Framework). Enter your project name and click OK.

Step 2 – Add Microsoft ML package

Right click on your project and click on Manage NuGet Packages.

Select Browse tab and search for Microsoft.ML.

Click on Install, I Accept and wait till the installation is complete.

We can see that the Microsoft.ML package was installed and all the references for Microsoft.ML have been added in our project references.

Step 3 – Creating Train Data

Now we need to create a Model training dataset. For creating this, we will add a CSV file for training the model. We will create a new folder called Data in our project to add our CSV files.

Add Data Folder

Right click the project and Add New Folder and name the folder as “Data”.

Creating Train CSV File

Right click the Data folder, click on Add >> New Item >> select the text file and name it as “custTrain.csv”.

Select the properties of the “StockTrain.csv”, change the Copy to Output Directory to “Copy always”.

Add your CSV file data like below:

Here, we have added the data with the following fields:

(Feature)

  • Male - Total No of phone using (Feature)
  • Female – Total No of phone using (Feature)
  • Before2010 – Total No of phone using (Feature)
  • After2010 – Total No of phone using (Feature)
  • MobilePhone – Mobile Phone Type.

Note: We need minimum 100 records of data to be added to train our Model.

Step 4 – Creating Class for Input Data and Prediction

Now, we need to create a class for Input Data and prediction. For doing this, right click our project and add new class and name it as “CustData.cs”.

In our class first, we need to import the Microsoft.ML.Runtime.Api for column and ClusterPrediction Class creation.

using Microsoft.ML.Runtime.Api;

Next, we need to add all our columns same like our CSV file in the same order in our class and set as the column 0 to 3.

class CustData
    {
        [Column("0")]
        public float Male;

        [Column("1")]
        public float Female;

        [Column("2")]
        public float Before2010;

        [Column("3")]
        public float After2010;
    } 

Creating prediction class. Now we need to create a prediction class and, in this class, we need to add our Prediction column. Here, we add PredictedLabel and Score column as PredictedCustId and Distances.Predicted Label will contains the ID of the predicted cluster. Score column contains an array with squared Euclidean distances to the cluster centroids. The array length is equal to the number of clusters. For more details, refer to this reference link.

Note: It is important to note that in the prediction column, we need to set the column name as the “Score”, also set the data type as the float[] for Score and for PredictedLabel set as uint.

public class ClusterPrediction
    {
        [ColumnName("PredictedLabel")]
        public uint PredictedCustId;

        [ColumnName("Score")]
        public float[] Distances;
    }

Step 5 – Program.cs

To work with ML.NET, we open our “program.cs” file and first, we import all the needed ML.NET references.

using Microsoft.ML.Legacy;
using Microsoft.ML.Legacy.Data;
using Microsoft.ML.Legacy.Trainers;
using Microsoft.ML.Legacy.Transforms;

Also, import the below to your program.cs file.

using System.Threading.Tasks; 
using System.IO;

Dataset Path

We set the custTrain.csv data and Model data path. For the traindata, we give “custTrain.csv” path.

The final trained model needs to be saved for producing results. For this, we set modelpath with the “custClusteringModel. zip” file. The trained model will be saved in the zip file automatically during runtime of the program, our bin folder with all needed files.

static readonly string _dataPath = Path.Combine(Environment.CurrentDirectory, "Data", "custTrain.csv");
static readonly string _modelPath = 
       Path.Combine(Environment.CurrentDirectory, "Data", "custClusteringModel.zip");

Change the Main method to async Task Main method like the below code:

static async Task Main(string[] args)
        {
             
        }

Before doing this, we need to perform 2 important tasks to successfully run our program:

First is to set Platform Target as x64.The ML.NET only runs in x64, for doing this, right click the project and select properties >> Select Build and change the Platform target to x64.

In order to run with our async Task Main method, we need to change the Language version to C#7.1.

In the Project Properties >> Build tab >> click on Advance button at the bottom and change the Language Version to C#7.1.

Working with Training Model

First, we need to train the model and save the model to the zip file for this in our main method, we call the predictionModel method and pass the CustData and ClusterPrediction class and return the model to the main method.

static async Task Main(string[] args)
        {
            PredictionModel<CustData, ClusterPrediction> model = await Train();
        }

  public static async Task<PredictionModel<CustData, ClusterPrediction>> Train()
        {
        }

Train and Save Model

In the above method, we add the function to train the model and save the model to the zip file.

LearningPipeline

In training, the first step will be working the LearningPipeline().

The LearningPipeline loads all the training data to train the model.

TextLoader

The TextLoader used to get all the data from train CSV file for training and here, we set as the useHeader:true to avoid reading the first row from the CSV file.

ColumnConcatenator

Next, we add all our features columns to be trained and evaluate.

Adding Learning Algorithm

KMeansPlusPlusClusterer

The learner will train the model. We have selected the Clustering model for our sample and we will be using KMeansPlusPlusClusterer learner. KMeansPlusPlusClusterer is one of the clustering learners provided by ML.NET. Here, we add the KMeansPlusPlusClusterer to our pipeline.

We also need to set the K value as how many clusters we are using for our model. Here, we have 3 segments as Windows Mobile, Samsung and Apple so we have set K=4 in our program for the 3 clustering.

Train and Save Model

Finally, we will train and save the model from this method.

public static async Task<PredictionModel<CustData, ClusterPrediction>> Train()
        {
            // Start Learning
            var pipeline = new LearningPipeline();
             
            // Load Train Data
            pipeline.Add(new TextLoader(_dataPath).CreateFrom<CustData>
                                  (useHeader: true, separator: ','));
            // </Snippet6>

            // Add Features columns
            pipeline.Add(new ColumnConcatenator(
                    "Features",
                    "Male",
                    "Female",
                    "Before2010",
                    "After2010"));
             
            // Add KMeansPlus Algorithm for k=3 (We have 3 set of clusters)
            pipeline.Add(new KMeansPlusPlusClusterer() { K = 3 });
            

            // Start Training the model and return the model
            var model = pipeline.Train<CustData, ClusterPrediction>();
            return model; 
        } 

Prediction Results

Now it's time for us to produce the result of predicted results by model. For this, we will add one more class and, in this Class, we will give the inputs.

Create a new Class named as “TestCustData.cs“.

We add the values to the TestCustDataClass which we already created and defined the columns for Model training.

static class TestCustData
    {
        internal static readonly CustData PredictionObj = new CustData
        {
            Male = 300f,
            Female = 100f,
            Before2010 = 400f,
            After2010 = 1400f
        };
    }

We can see in our custTrain.csv file that we have the same data for the inputs.

Produce the Model Predicted Results

In our program main method, we will add the below code at the bottom after Train method calling to predict the result of ClusterID and distances and display the results from model to users in command window.

var prediction = model.Predict(TestCustData.PredictionObj);
            Console.WriteLine($"Cluster: {prediction.PredictedCustId}");
            Console.WriteLine($"Distances: {string.Join(" ", prediction.Distances)}");
            Console.ReadLine(); 

Build and Run

When we can run the program, we can see the result in the command window like below:

Points of Interest

ML.NET (Machine Learning DotNet) is a great framework for all the dotnet lovers who are all looking to work with machine learning. Now only preview version of ML.NET is available and I can’t wait till the release of public version of ML.NET. Here in this article, I have used the clustering for Unsupervised type. If you are .NET lovers, not aware about Machine Learning and looking forward to work with machine learning, then ML.Net is for you all and it's a great framework to get you started with ML.NET. Hope you enjoy reading this article and see you all soon with another post.

History

  • 2018-11-01: Initial version

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here