Introduction
In this tutorial, we will set up a basic machine learning prediction model to run as an Amazon Web Services (AWS)* Lambda function in an AWS Greengrass* group. We will use basic K-means clustering to train the model for motor fault prediction. The Lambda function will use the resources of the Greengrass Core, which will be set up on an IEI Tank* AIoT Developer Kit. The IEI Tank AIoT Developer Kit comes with preinstalled developer tools and SDKs, such as the OpenVINO™ toolkit, Intel® Media SDK, and Intel® System Studio 2018, to help accelerate your path to deployment. The Lambda function will send status updates about its ML prediction process to the Greengrass group using MQTT messages.
Prerequisites
IEI TANK with Ubuntu* 16.04 OS
AWS account
AWS Greengrass
AWS Greengrass* Setup
First, we need to set up the Greengrass Core on the IEI TANK. Follow the instructions in modules 1 and 2 of the linked documentation, Environment Setup for Greengrass and Installing the Greengrass Core Software in AWS Greengrass.
Go to the AWS console, select Services from the top left ribbon, enter IoT in the search bar, and select IoT Core. On the IoT Core page, select Software from the bottom left. Download the AWS Greengrass Core SDK by clicking Configure Download. Choose Python* 2.7 and click Download Greengrass Core SDK. After the package has downloaded, untar it:
tar -xzvf greengrass-core-python-sdk-1.0.0.tar.gz
Go to the HelloWorld folder and unzip the file:
cd aws_greengrass_core_sdk/examples/HelloWorld
unzip greengrassHelloWorld.zip
Contents of the unzipped folder will be used later in the tutorial to create a zip folder for AWS Lambda.
IEI Tank* Setup
Because AWS Greengrass needs Python* 2.7, we need to install packages specifically for Python 2.7:
sudo apt install python-pip
sudo pip2 install pandas numpy matplotlib scipy sklearn
If the packages are already installed, upgrade them to their latest versions:
sudo pip2 install -U pandas numpy matplotlib scipy sklearn
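Before moving on, it can help to confirm that the Python 2.7 interpreter can actually import these packages. The following is a minimal sketch and not part of the original repository. The helper name missing_modules and the REQUIRED list are hypothetical, and the code relies only on importlib from the standard library.

```python
import importlib

# Import names of the packages installed above ("sklearn" is scikit-learn).
REQUIRED = ["pandas", "numpy", "matplotlib", "scipy", "sklearn"]


def missing_modules(names):
    """Return the names that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling missing_modules(REQUIRED) from a python2.7 session should return an empty list once the installation has succeeded.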
Clone the Motor-Defect-Detector GitHub* repository and go to the Kmeans folder:
git clone https:
cd motor-defect-detector/Kmeans/
We will use the Bearing Data Set for basic K-means model training and prediction. Download the Bearing Data Set archive (IMS.7z) from its website and place it in the Kmeans folder.
Install the tools needed to extract the archives:
sudo apt-get install p7zip-full unrar
Unzip the data set:
7za x IMS.7z
Extract the rar files (only the first and second test sets are used in this tutorial):
unrar x 1st_test.rar
unrar x 2nd_test.rar
Downgrading Code to Python* 2.7
Before we can use the code from the GitHub repository and run the training script, we need to modify it to run under Python* 2.7 instead of Python 3.5. To modify the script yourself, follow these two steps.
In the Kmeans folder, open the kmeanstraining.py script and add the following as the first line:
from __future__ import print_function
Replace input with raw_input throughout the file, as in the following line:
filedir_testset1 = raw_input("enter the complete directory path for the testset1")
Alternatively, you can use the fully modified training script, kmeanstrainingall.py, from the Sample Code section of this article.
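If you would rather keep one script that runs under both interpreters, the input change can be expressed as a small compatibility shim. This is a sketch under that assumption and is not the approach used in the repository. The names read_line and ask_directory are hypothetical.

```python
# Pick the line-reading function that exists on the running interpreter.
try:
    read_line = raw_input  # Python 2.7: returns the typed text as a string
except NameError:
    read_line = input      # Python 3: raw_input was renamed to input


def ask_directory(prompt, reader=read_line):
    """Prompt for a directory path and strip surrounding whitespace."""
    return reader(prompt).strip()
```

The reader parameter exists only so the function can be exercised without a terminal, for example ask_directory("path: ", reader=lambda prompt: "/data/1st_test/").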
Training the Model
In the Kmeans folder, train the K-means model and follow the prompts:
python kmeanstrainingall.py
enter the complete directory path for the testset1 /<path-to>/motor-defect-detector/Kmeans/1st_test/
enter the complete directory path for the testset2 /<path-to>/motor-defect-detector/Kmeans/2nd_test/
The script trains on the Bearing Data Set and writes the trained model to the kmeanModel.npy file, which the Lambda function later loads to predict motor defects.
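For readers new to K-means, the sketch below shows the idea behind the training step: assign each sample to its nearest centroid, then move each centroid to the mean of its samples, and repeat until nothing changes. It is a dependency-free illustration of Lloyd's algorithm, not scikit-learn's implementation, and the two-dimensional points are made-up values rather than bearing data.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point (the 'predict' step)."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, max_iter=100, seed=42):
    """Return (centroids, labels) after running Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # New centroid is the per-dimension mean of its members.
                updated.append([sum(col) / float(len(members))
                                for col in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Made-up sample: two well-separated groups of points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (10.0, 10.0), (10.2, 9.9), (9.8, 10.1)]
centroids, labels = kmeans(points, k=2)
```

The training script does the same thing with cluster.KMeans on five frequency features per sample and eight clusters, and model.predict plays the role of nearest.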
AWS* Lambda Setup
In this section, we will create a kmeans.zip compressed folder and create the AWS Lambda function with it. Then, we will deploy the Lambda in our Greengrass group.
Copy the Greengrass files into the Kmeans folder:
cp -r <path-to>/aws_greengrass_core_sdk/examples/HelloWorld/greengrasssdk .
Create kmeans_test.py in the Kmeans folder using the code from the Sample Code section of this article.
Compress the files into a zip archive:
zip -r kmeans.zip greengrasssdk/ utils.py kmeanModel.npy kmeans_test.py
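Because the handler is later set to kmeans_test.function_handler, Lambda needs kmeans_test.py at the top level of the archive rather than inside a subfolder. The sketch below is one way to check the archive layout before uploading. The helper name missing_from_archive and the REQUIRED list are hypothetical, and only the standard-library zipfile module is used.

```python
import zipfile

# Entries the archive is expected to contain; a trailing "/" marks a folder.
REQUIRED = ["greengrasssdk/", "utils.py", "kmeanModel.npy", "kmeans_test.py"]


def missing_from_archive(zip_path, required=REQUIRED):
    """Return the required top-level entries that the archive lacks."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
    missing = []
    for item in required:
        if item.endswith("/"):
            # A folder counts as present if any entry lives under it.
            found = any(name.startswith(item) for name in names)
        else:
            found = item in names
        if not found:
            missing.append(item)
    return missing
```

An empty list from missing_from_archive("kmeans.zip") suggests the archive has the layout the rest of this section assumes.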
Go to the AWS console, click Services in the top left, enter Lambda in the search bar, and select it. The Lambda Management Console will open. Click Create function.
If it is not already selected, select Author from scratch and fill out the fields, naming the function kmeans_test and choosing the Python 2.7 runtime.
Click Create function.
Upload kmeans.zip. Change the handler name to kmeans_test.function_handler. Click Save.
Click Actions, select Create new version, and add a version description. Click Publish.
Go to the IoT Core console. Choose Greengrass from the left-side menu, select Groups underneath it, and select your group in the main window.
Select Lambdas from the left-side menu. Click Add Lambda in the top right corner of the screen.
Select Use Existing Lambda.
Select kmeans_test from the menu and click Next.
Choose the version and click Finish.
Click on the dotted area and select Edit Configuration.
Change Memory Limit to 1024 MB and Timeout to 25 seconds, and set Lambda lifecycle to a long-lived function.
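The long-lived setting matters because kmeans_test.py never relies on its handler being invoked. Instead, it runs once when the function starts and then re-arms itself with threading.Timer. The sketch below illustrates that self-rescheduling pattern in isolation. Unlike the sample code, it stops after a fixed number of runs so that it can be tried safely. The names run_periodically and tick are hypothetical.

```python
import threading


def run_periodically(task, interval, repeats):
    """Run task now, then every `interval` seconds, `repeats` times in total.

    Returns a threading.Event that is set after the final run.
    """
    done = threading.Event()

    def tick(remaining):
        task()
        if remaining > 1:
            # Schedule the next run, as kmeans_test.py does with Timer(5, ...).
            timer = threading.Timer(interval, tick, args=(remaining - 1,))
            timer.daemon = True
            timer.start()
        else:
            done.set()

    tick(repeats)
    return done
```

For example, run_periodically(job, 5, 3).wait() blocks until job has run three times, five seconds apart. The sample Lambda omits the counter and reschedules indefinitely.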
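Find the paths you will need for the environment variables. For example, to find where Python packages such as numpy are installed, run this command:
locate 2.7/dist-packages/numpy
Add environment variables with the paths to the packages and to the 2nd_test folder as their values. The kmeans_test.py script reads the data location from the TESTSET2 and TESTSET2FOLDER variables.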
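The sample kmeans_test.py reads TESTSET2 and TESTSET2FOLDER with os.environ.get, which silently returns None when a variable is missing. The sketch below shows a stricter way to read them, which makes a misconfigured deployment fail with a clear message. The function name read_dataset_settings is hypothetical, and the paths in the usage note are placeholders.

```python
import os


def read_dataset_settings(environ=os.environ):
    """Return (TESTSET2, TESTSET2FOLDER), raising ValueError if either is unset."""
    values = []
    for name in ("TESTSET2", "TESTSET2FOLDER"):
        value = environ.get(name)
        if not value:
            raise ValueError("environment variable %s is not set" % name)
        values.append(value)
    return tuple(values)
```

Passing a plain dictionary, such as read_dataset_settings({"TESTSET2": "/<path-to>/2nd_test/", "TESTSET2FOLDER": "2nd_test/"}), exercises the function without touching the real environment.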
Click Update at the bottom of the page.
Click the small grey back button and select Resources. Click the blue Add a local resource button.
Create a local resource to access the Kmeans folder on your IEI Tank. Attach the kmeans_test Lambda to it with read and write access.
Create two more local resources, one for the Python packages folder and one for the 2nd_test folder, both with read-only access. When you are done, all three resources should be listed.
Go to Subscriptions. Click Add Subscription or Add your first Subscription.
For the source, choose the Lambdas tab and select kmeans_test. For the target, select IoT Cloud.
Click Next. Add hello/world for the topic and click Next.
Click Finish.
On the group header, click Actions, select Deploy, and wait until the deployment completes successfully.
Go to the AWS IoT console. Select Test from the left-side menu. Type hello/world in the topic field, change MQTT payload display to display payloads as strings, and click Subscribe to topic.
After some time, status messages such as "Started kmeans test run." should appear at the bottom of the screen.
Conclusion
We have successfully set up the basic K-means model for motor defect detection as a Lambda function. As a next step, you can explore automatic updates. One Lambda function can be set up to look for new test sets and, once it finds them, trigger an automatic download of the new sets and create a new training script based on them. The model is then updated to give new, improved predictions.
Sample Code
kmeanstrainingall.py
from __future__ import print_function
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import cluster
from utils import cal_max_freq, create_dataframe, elbow_method
import os

try:
    # reading all the files from the testset1, and testset2
    filedir_testset1 = raw_input("enter the complete directory path for the testset1 ")
    filedir_testset2 = raw_input("enter the complete directory path for the testset2 ")
    all_files_testset1 = os.listdir(filedir_testset1)
    all_files_testset2 = os.listdir(filedir_testset2)
    # relative path of the dataset, after the current working directory
    path_testset2 = "2nd_test/"
    path_testset1 = "1st_test/"
    (testset1_freq_max1, testset1_freq_max2, testset1_freq_max3,
     testset1_freq_max4, testset1_freq_max5) = cal_max_freq(all_files_testset1, path_testset1)
    (testset2_freq_max1, testset2_freq_max2, testset2_freq_max3,
     testset2_freq_max4, testset2_freq_max5) = cal_max_freq(all_files_testset2, path_testset2)
except (IOError, OSError):
    # os.listdir raises OSError in Python 2.7 when a directory does not exist
    print("you have entered the wrong data directory path for either testset1 or testset2")
    raise  # stop here; the steps below need the frequency data

result1 = create_dataframe(testset1_freq_max1, testset1_freq_max2, testset1_freq_max3,
                           testset1_freq_max4, testset1_freq_max5, 7)
result2 = create_dataframe(testset2_freq_max1, testset2_freq_max2, testset2_freq_max3,
                           testset2_freq_max4, testset2_freq_max5, 0)
result3 = create_dataframe(testset1_freq_max1, testset1_freq_max2, testset1_freq_max3,
                           testset1_freq_max4, testset1_freq_max5, 2)
result3 = result3[:1800]
result4 = create_dataframe(testset2_freq_max1, testset2_freq_max2, testset2_freq_max3,
                           testset2_freq_max4, testset2_freq_max5, 1)
result4 = result4[:800]

# creating the final result
print("creating the final result")
frames = [result1, result3, result2, result4]
result = pd.concat(frames)
X = result[["fmax1", "fmax2", "fmax3", "fmax4", "fmax5"]]

# elbow method: to calculate the optimal no of cluster
# elbow_method(X)
# plt.show()

# clustering
print("clustering")
# n_jobs is accepted by the scikit-learn releases that support Python 2.7
k_means = cluster.KMeans(n_clusters=8, n_init=10, max_iter=1000, n_jobs=-1, random_state=42)
kmeans_model = k_means.fit(X)
label = kmeans_model.labels_

# plot the labels
print("plotting the labels")
plt.scatter((np.array(range(1, len(result) + 1))), label)

# save the model (NumPy pickles the fitted estimator into the .npy file)
print("saving the model")
filename = "kmeanModel.npy"
np.save(filename, kmeans_model)
kmeans_test.py
from __future__ import print_function
import time
from threading import Timer
import os
import greengrasssdk
import platform
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from utils import cal_max_freq, plotlabels

# Creating a greengrass core sdk client
client = greengrasssdk.client('iot-data')

# Retrieving platform information to send from Greengrass Core
my_platform = platform.platform()


def kmeans_test_run():
    client.publish(topic='hello/world', payload='Started kmeans test run.')
    try:
        filedir = os.environ.get("TESTSET2")
        client.publish(topic='hello/world', payload='Got data dir.')
        #filepath ="2nd_test/"
        filepath = os.environ.get("TESTSET2FOLDER")
        client.publish(topic='hello/world', payload='Got data folder.')
        # load the files
        all_files = os.listdir(filedir)
        client.publish(topic='hello/world', payload='Got all files.')
        freq_max1, freq_max2, freq_max3, freq_max4, freq_max5 = cal_max_freq(all_files, filedir)
        client.publish(topic='hello/world', payload='Got all frequencies.')
    except (IOError, OSError):
        # os.listdir raises OSError in Python 2.7 when the directory is missing
        print("you have entered either the wrong data directory path or filepath")
        client.publish(topic='hello/world', payload='Wrong data dir or folder.')

    # load the model; the file holds a pickled object, so allow_pickle is required
    # on NumPy releases where it defaults to False
    filename = "kmeanModel.npy"
    model = np.load(filename, allow_pickle=True).item()
    client.publish(topic='hello/world', payload='Loaded K-means model.')

    # checking the iteration
    if (filepath == "1st_test/"):
        rhigh = 8
    else:
        rhigh = 4

    testlabels = []
    for i in range(0, rhigh):
        print("Checking for the bearing", i+1)
        result = pd.DataFrame()
        result['freq_max1'] = list((np.array(freq_max1))[:,i])
        result['freq_max2'] = list((np.array(freq_max2))[:,i])
        result['freq_max3'] = list((np.array(freq_max3))[:,i])
        result['freq_max4'] = list((np.array(freq_max4))[:,i])
        result['freq_max5'] = list((np.array(freq_max5))[:,i])
        X = result[["freq_max1","freq_max2","freq_max3","freq_max4","freq_max5"]]
        label = model.predict(X)
        # count how many of the last 100 predictions fall in the failure clusters
        labelfive = list(label[-100:]).count(5)
        labelsix = list(label[-100:]).count(6)
        labelseven = list(label[-100:]).count(7)
        totalfailur = labelfive+labelsix+labelseven#+labelfour
        # divide by a float: in Python 2.7, int/int truncates to an integer
        ratio = (totalfailur/100.0)*100
        if(ratio >= 25):
            client.publish(topic='hello/world', payload='Bearing is suspected to fail.')
        else:
            client.publish(topic='hello/world', payload='Bearing is in normal condition.')
        testlabels.append(label[-100:])

    # Asynchronously schedule this function to be run again in 5 seconds
    Timer(5, kmeans_test_run).start()


# Start executing the function above
kmeans_test_run()


# This is a dummy handler and will not be invoked
# Instead the code above will be executed in an infinite loop for our example
def function_handler(event, context):
    return
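The decision rule inside kmeans_test_run can be separated from the Greengrass client so that it is easy to try on its own. The sketch below restates it as two hypothetical helpers, failure_percentage and bearing_status. It assumes, as the sample code does, that cluster labels 5, 6, and 7 indicate failure, and it uses float arithmetic so that the result does not truncate under Python 2.7. One deliberate difference is that it divides by the number of samples actually present in the window, whereas the sample code always divides by 100.

```python
# Cluster labels that the sample code counts as failure states.
FAULT_LABELS = (5, 6, 7)


def failure_percentage(labels, window=100, fault_labels=FAULT_LABELS):
    """Percentage of the most recent `window` labels that are fault labels."""
    recent = list(labels)[-window:]
    if not recent:
        return 0.0
    failing = sum(1 for label in recent if label in fault_labels)
    return 100.0 * failing / len(recent)


def bearing_status(labels, threshold=25.0):
    """Return the status message that the Lambda would publish."""
    if failure_percentage(labels) >= threshold:
        return "Bearing is suspected to fail."
    return "Bearing is in normal condition."
```

For example, a label history whose last 100 entries contain 25 fault labels reaches the 25 percent threshold and is reported as suspected to fail.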
About the Author
Rosalia Nyurguhun is a software engineer at Intel in the Core and Visual Computing Group, working on scale enabling projects for the Internet of Things.