IEI Tank AIoT Developer Kit and AWS Greengrass: Running Machine Learning Prediction on the Edge

Intel Corporation

20 Nov 2018

In this tutorial, we will set up a basic machine learning prediction model to run as an Amazon Web Services (AWS) Lambda function in an AWS Greengrass group.

Introduction

In this tutorial, we will set up a basic machine learning prediction model to run as an Amazon Web Services (AWS)* Lambda function in an AWS Greengrass* group. We will use basic K-means clustering to train the model for motor fault prediction. The Lambda function will use the resources of the Greengrass Core, which will be set up on an IEI Tank* AIoT Developer Kit. The IEI Tank AIoT Developer Kit comes with preinstalled developer tools and SDKs, such as the OpenVINO™ toolkit, Intel® Media SDK, and Intel® System Studio 2018, to help accelerate your path to deployment. The Lambda function will send status updates of its ML prediction process to the Greengrass group using MQTT messages.

Prerequisites

IEI TANK with Ubuntu* 16.04 OS

AWS account

AWS Greengrass

AWS Greengrass* Setup

First, we need to set up the Greengrass Core on the IEI TANK. Follow the instructions in modules 1 and 2 of the linked documentation, Environment Setup for Greengrass and Installing the Greengrass Core Software in AWS Greengrass.

Go to the AWS console, select Services from the top left ribbon, enter IoT in the search bar, and select IoT Core. On the IoT Core page, select Software from the bottom left. Download the AWS Greengrass Core SDK by clicking Configure Download. Choose Python* 2.7 and click Download Greengrass Core SDK. After the package has downloaded, untar it:

tar -xzvf greengrass-core-python-sdk-1.0.0.tar.gz

Go to the HelloWorld folder and unzip the file:

cd aws_greengrass_core_sdk/examples/HelloWorld
unzip greengrassHelloWorld.zip

Contents of the unzipped folder will be used later in the tutorial to create a zip folder for AWS Lambda.

IEI Tank* Setup

Because AWS Greengrass needs Python* 2.7, we need to install packages specifically for Python 2.7:

sudo apt install python-pip
sudo pip2 install pandas numpy matplotlib scipy scikit-learn
sudo pip2 install -U pandas numpy matplotlib scipy scikit-learn

Clone the Motor-Defect-Detector GitHub* repository and go to the Kmeans folder:

git clone https://github.com/intel-iot-devkit/motor-defect-detector.git
cd motor-defect-detector/Kmeans/

We will be using the Bearing Data Set for K-means basic model training and prediction. Download the Bearing Data Set by going to the website.

Install the apps to extract the files:

sudo apt-get install p7zip-full unrar

Unzip the data set:

7za x IMS.7z

Extract the rar files (only the first and second test sets are used in this tutorial):

unrar x 1st_test.rar 
unrar x 2nd_test.rar

Downgrading Code to Python* 2.7

Before we can use the GitHub repository code, we need to implement some changes to downgrade it from Python* 3.5 to Python 2.7, and run the training script. To modify the script on your own, follow these two steps.

In the Kmeans folder, open the kmeanstraining.py script and add the following as its first line:

from __future__ import print_function

Replace input with raw_input throughout the file, as in the following line:

filedir_testset1 = raw_input("enter the complete directory path for the testset1")

Alternatively, you can also get the completely modified training script kmeanstrainingall.py from the Sample Code section of this article.
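The two changes above can also be written so that one script runs under both interpreters. The following is a minimal sketch, not the repository's code: read_line and prompt_for_directory are hypothetical names, and the trailing-slash normalisation is an assumption based on the example paths used later in this tutorial.

```python
from __future__ import print_function

# raw_input exists only on Python 2; on Python 3 the same behaviour is input().
try:
    read_line = raw_input
except NameError:
    read_line = input


def prompt_for_directory(message, reader=read_line):
    """Ask for a directory path and make sure it ends with a slash."""
    path = reader(message).strip()
    return path if path.endswith("/") else path + "/"
```

Passing the reader as a parameter keeps the helper testable without a terminal, for example prompt_for_directory("dir? ", reader=lambda msg: "/data/1st_test").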

Training the Model

In the Kmeans folder, train the K-means model and follow the prompts:

python kmeanstrainingall.py
enter the complete directory path for the testset1 /<path-to>/motor-defect-detector/Kmeans/1st_test/
enter the complete directory path for the testset2 /<path-to>/motor-defect-detector/Kmeans/2nd_test/

The model is trained on the Bearing Data Set so that it can predict motor defects. The script writes the kmeanModel.npy file, which will be used for the actual prediction of motor defects.
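To make the clustering step less of a black box, here is a small standard-library illustration of the idea behind K-means (Lloyd's algorithm) on one-dimensional data. It is a teaching sketch only: the tutorial itself relies on sklearn's cluster.KMeans with five frequency features, and kmeans_1d and predict_1d are hypothetical names that do not appear in the repository.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on 1-D data with caller-supplied initial centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group.
        # A center that attracted no values keeps its previous position.
        centers = [sum(g) / float(len(g)) if g else c
                   for g, c in zip(groups, centers)]
    return centers


def predict_1d(value, centers):
    """Return the index of the center closest to value."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))
```

Prediction with a trained model is just the assignment step applied to new data, which is why the Lambda function only needs the saved model and not the training set.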

AWS* Lambda Setup

In this section, we will create a kmeans.zip compressed folder and create the AWS Lambda function with it. Then, we will deploy the Lambda in our Greengrass group.

Copy the Greengrass files into the Kmeans folder:

cp -r <path-to>/aws_greengrass_core_sdk/examples/HelloWorld/greengrasssdk .

Create kmeans_test.py in the Kmeans folder, using the code from the Sample Code section of this article.

Compress files into a zip folder:

zip -r kmeans.zip greengrasssdk/ utils.py kmeanModel.npy kmeans_test.py
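Before uploading, it can be worth checking that the archive really contains everything the Lambda function needs. The sketch below uses only Python's zipfile module; missing_from_bundle and REQUIRED are hypothetical names, and the list of required entries is taken from the zip command above.

```python
import zipfile

REQUIRED = ["kmeans_test.py", "utils.py", "kmeanModel.npy"]


def missing_from_bundle(zip_path, required=REQUIRED, package="greengrasssdk/"):
    """Return the required entries that are absent from the archive."""
    with zipfile.ZipFile(zip_path) as bundle:
        names = set(bundle.namelist())
    missing = [name for name in required if name not in names]
    # The SDK is a folder, so look for any entry stored underneath it.
    if not any(name.startswith(package) for name in names):
        missing.append(package)
    return missing
```

An empty list from missing_from_bundle("kmeans.zip") suggests the files sit at the top level of the archive, which is where the handler name used in the next steps expects to find kmeans_test.py.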

Go to the AWS console, click Services on the top left, enter Lambda in the search bar, and select it. The Lambda Management Console will open. Click Create function:

If not selected, select Author from scratch and fill out outlined fields:

Click Create function.

Upload kmeans.zip. Change handler name to kmeans_test.function_handler. Click Save:
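The handler name follows a module.function pattern: kmeans_test is the file kmeans_test.py at the top level of the zip, and function_handler is a function defined in it. As a rough illustration of how such a string can be resolved (this is not the actual AWS runtime code, and resolve_handler is a hypothetical name):

```python
import importlib


def resolve_handler(handler):
    """Split 'module.function' on the last dot and return the callable."""
    module_name, _, function_name = handler.rpartition(".")
    if not module_name or not function_name:
        raise ValueError("expected 'module.function', got %r" % handler)
    module = importlib.import_module(module_name)
    return getattr(module, function_name)
```

Importing the module runs its top-level code, which is how the sample kmeans_test.py starts its prediction loop even though function_handler itself does nothing.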

Click on Actions, select Create new version and add a version description. Click Publish:

Go to the IoT Core console. Choose Greengrass from left-side menu, select Groups underneath it, and select your group from the main window:

Select Lambdas from the left-side menu. Click Add Lambda on right top corner of the screen:

Select Use Existing Lambda:

Select kmeans_test from the menu and click Next:

Choose the version and click Finish:

Click on the dotted area and select Edit Configuration:

Change Memory Limit to 1024 MB, Timeout to 25 seconds, and choose Lambda lifecycle to be a long-lived function:

Locate the paths needed for the environment variables. For example, to locate Python packages like numpy, run this command:

locate 2.7/dist-packages/numpy
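If locate is not available or its database is stale, the interpreter itself can report where a package lives. This is a small sketch under the assumption that the package is importable by the same Python 2.7 interpreter that Greengrass will use; package_dir is a hypothetical helper name.

```python
import importlib
import os


def package_dir(name):
    """Return the directory that holds an importable package or module."""
    module = importlib.import_module(name)
    return os.path.dirname(os.path.abspath(module.__file__))
```

For example, package_dir("numpy") returns the numpy folder, and its parent directory is the dist-packages folder. Built-in modules such as sys have no __file__ attribute, so the helper only applies to packages installed on disk.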

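Add the environment variables, with the paths to the packages and the 2nd_test folder as values. The kmeans_test.py sample reads the data location from the TESTSET2 and TESTSET2FOLDER variables: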
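The sample kmeans_test.py reads TESTSET2 and TESTSET2FOLDER with os.environ.get, which silently returns None when a variable is missing. A slightly more defensive variant might look like the sketch below; load_data_settings is a hypothetical helper, not part of the sample code.

```python
import os


def load_data_settings(env=None):
    """Read the two variables used by kmeans_test.py and fail early if unset."""
    env = os.environ if env is None else env
    settings = {}
    for key in ("TESTSET2", "TESTSET2FOLDER"):
        value = env.get(key)
        if not value:
            # Failing here gives a clearer message than a later os.listdir(None).
            raise KeyError("environment variable %s is not set" % key)
        settings[key] = value
    return settings
```

Accepting the environment as a parameter makes it possible to try the function with a plain dictionary before deploying.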

Click Update on the bottom of the page.

Click the small grey back button and select Resources. Click the blue Add a local resource button:

Create a local resource to access the Kmeans folder on your IEI Tank. Attach kmeans_test Lambda to it with read and write access:

Create two more local resources for the Python packages folder and the 2nd_test folder, with read-only access. You should see a similar screen when you’re done:

Go to Subscriptions. Click Add Subscription or Add your first Subscription:

For the source, choose from the Lambdas tab, and select kmeans_test. For the target, select IoT Cloud:

Click Next. Add hello/world for the topic and click Next:

Click Finish.

On the group header, click Actions, select Deploy and wait until it is successfully completed:

Go to the AWS IoT console. Select Test from the left-side menu. Type hello/world in the topic field, change MQTT payload display to display it as strings, and click Subscribe to topic:

After some time, messages should display on the bottom of the screen:
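The sample code publishes plain strings to hello/world. If the messages are to be consumed by another program, a structured payload may be easier to parse. The following sketch assumes a client object with the same publish(topic=..., payload=...) call that the sample code uses; publish_status and the JSON field names are hypothetical.

```python
import json


def publish_status(client, topic, bearing, failure_ratio):
    """Publish one bearing's status as a JSON string and return the payload."""
    payload = json.dumps(
        {"bearing": bearing, "failure_ratio": failure_ratio},
        sort_keys=True,
    )
    client.publish(topic=topic, payload=payload)
    return payload
```

Because the client is passed in, the function can be exercised locally with a stand-in object that records calls, without a Greengrass Core.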

Conclusion

We have successfully set up the basic K-means model for motor defect detection as a Lambda function. As a next step, you can explore automatic updates: one Lambda function is set up to look for new test sets and, once it finds them, it triggers their download and creates a new training script based on those sets. The model is then updated to give new, improved predictions.

Sample Code

kmeanstrainingall.py

from __future__ import print_function
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import cluster
from utils import cal_max_freq,create_dataframe,elbow_method
import os

try:
    # reading all  the files from the testset1, and testset2
    filedir_testset1 = raw_input("enter the complete directory path for the testset1 ")
    filedir_testset2 = raw_input("enter the complete directory path for the testset2 ")
    all_files_testset1 = os.listdir(filedir_testset1)
    all_files_testset2 = os.listdir(filedir_testset2)

    # relative path of the dataset, after the current working directory
    path_testset2 = "2nd_test/"
    path_testset1 = "1st_test/"

    testset1_freq_max1,testset1_freq_max2,testset1_freq_max3,testset1_freq_max4,testset1_freq_max5 = cal_max_freq(all_files_testset1,path_testset1)
    testset2_freq_max1,testset2_freq_max2,testset2_freq_max3,testset2_freq_max4,testset2_freq_max5 = cal_max_freq(all_files_testset2,path_testset2)

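except (IOError, OSError):
    # os.listdir raises OSError on Python 2.7; stop here because the
    # frequency lists used below would otherwise be undefined.
    print("you have entered the wrong data directory path for either testset1 or testset2")
    raise SystemExit(1)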

result1 = create_dataframe(testset1_freq_max1,testset1_freq_max2,testset1_freq_max3,testset1_freq_max4,testset1_freq_max5,7)
result2 = create_dataframe(testset2_freq_max1,testset2_freq_max2,testset2_freq_max3,testset2_freq_max4,testset2_freq_max5,0)

result3 = create_dataframe(testset1_freq_max1,testset1_freq_max2,testset1_freq_max3,testset1_freq_max4,testset1_freq_max5,2)
result3 = result3[:1800]

result4 = create_dataframe(testset2_freq_max1,testset2_freq_max2,testset2_freq_max3,testset2_freq_max4,testset2_freq_max5,1)
result4 = result4[:800]


#creating the final result
print("creating the final result")
frames = [result1,result3,result2,result4]
result = pd.concat(frames)

X = result[["fmax1","fmax2","fmax3","fmax4","fmax5"]]

#elbow method: to calculate the optimal no of cluster
#elbow_method(X)
#plt.show()

#clustering
print("clustering")
k_means = cluster.KMeans(n_clusters = 8,n_init = 10,max_iter = 1000,n_jobs = -1,random_state = 42)
kmeans_model = k_means.fit(X)
label = kmeans_model.labels_

#plot the labels
print("plotting the labels")
plt.scatter((np.array(range(1,len(result)+1))),label)

#save the model
print("saving the model")
filename = "kmeanModel.npy"
np.save(filename,kmeans_model)

kmeans_test.py

from __future__ import print_function

import time
from threading import Timer
import os
import greengrasssdk
import platform

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from utils import cal_max_freq, plotlabels


# Creating a greengrass core sdk client
client = greengrasssdk.client('iot-data')

# Retrieving platform information to send from Greengrass Core
my_platform = platform.platform()


def kmeans_test_run():
    client.publish(topic='hello/world', payload='Started kmeans test run.')
    try:
        filedir = os.environ.get("TESTSET2")
        client.publish(topic='hello/world', payload='Got data dir.')
        #filepath ="2nd_test/"
        filepath = os.environ.get("TESTSET2FOLDER")
        client.publish(topic='hello/world', payload='Got data folder.')
        # load the files
        all_files = os.listdir(filedir)
        client.publish(topic='hello/world', payload='Got all files.')
        freq_max1, freq_max2, freq_max3, freq_max4, freq_max5  =  cal_max_freq(all_files, filedir)
        client.publish(topic='hello/world', payload='Got all frequencies.')
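    except (IOError, OSError):
        # os.listdir raises OSError on Python 2.7; without the data there is
        # nothing to predict, so report the problem and stop this run.
        print("you have entered either the wrong data directory path or filepath")
        client.publish(topic='hello/world', payload='Wrong data dir or folder.')
        return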

    # load the model
    filename = "kmeanModel.npy"
    model = np.load(filename).item()
    client.publish(topic='hello/world', payload='Loaded K-means model.')
    # checking the iteration
    if (filepath == "1st_test/"):
        rhigh = 8
    else:
        rhigh = 4
    testlabels = []
    for i in range(0,rhigh):
        print("Checking for the bearing",i+1)
        result = pd.DataFrame()
        result['freq_max1'] = list((np.array(freq_max1))[:,i])
        result['freq_max2'] = list((np.array(freq_max2))[:,i])
        result['freq_max3'] = list((np.array(freq_max3))[:,i])
        result['freq_max4'] = list((np.array(freq_max4))[:,i])
        result['freq_max5'] = list((np.array(freq_max5))[:,i])

        X = result[["freq_max1","freq_max2","freq_max3","freq_max4","freq_max5"]]

        label = model.predict(X)
        labelfive = list(label[-100:]).count(5)
        labelsix = list(label[-100:]).count(6)
        labelseven = list(label[-100:]).count(7)
        totalfailur = labelfive+labelsix+labelseven#+labelfour
        # Divide by 100.0: Python 2.7 integer division would truncate the ratio to 0
        ratio = (totalfailur/100.0)*100
        if(ratio >= 25):
            client.publish(topic='hello/world', payload='Bearing is suspected to fail.')
        else:
            client.publish(topic='hello/world', payload='Bearing is in normal condition.')

        testlabels.append(label[-100:])
    # Asynchronously schedule this function to be run again in 5 seconds
    Timer(5, kmeans_test_run).start()


# Start executing the function above
kmeans_test_run()


# This is a dummy handler and will not be invoked
# Instead the code above will be executed in an infinite loop for our example
def function_handler(event, context):
    return
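The decision rule inside kmeans_test_run can be isolated into a small function, which makes it easier to test away from the device. This is a sketch of the same rule (share of cluster labels 5, 6, and 7 among the last 100 predictions, with a 25 percent threshold); failure_ratio and bearing_status are hypothetical names. Unlike the sample code, it divides by the number of labels actually present rather than a fixed 100.

```python
def failure_ratio(labels, failure_labels=(5, 6, 7), window=100):
    """Percentage of the last `window` labels that fall in failure clusters."""
    recent = list(labels)[-window:]
    if not recent:
        return 0.0
    hits = sum(1 for label in recent if label in failure_labels)
    # Floating-point arithmetic avoids truncation under Python 2 integer division.
    return 100.0 * hits / len(recent)


def bearing_status(labels, threshold=25.0):
    """Map the failure ratio to the status messages used by the sample code."""
    if failure_ratio(labels) >= threshold:
        return "Bearing is suspected to fail."
    return "Bearing is in normal condition."
```

The returned string can be passed directly as the payload of client.publish.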

About the Author

Rosalia Nyurguhun is a software engineer at Intel in the Core and Visual Computing Group, working on scale enabling projects for the Internet of Things.
