…continued from the previous post.
Once the model is built and the loss and validation functions meet our expectations, we need to validate and test the model using data that was not part of the training data set (unseen data). Model validation is very important because we want to see whether the model is trained well enough to evaluate unseen data approximately as well as the training data. A model that cannot predict the output for unseen data is called an overfitted model. Overfitting can happen when the model is trained long enough to show very high performance on the training data set but produces poor results on the testing data.
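To make the idea of unseen data concrete, the data set is split before training, and only the training portion is ever shown to the model. The following is a minimal sketch of such a split; the source file name and the 80/20 ratio are illustrative assumptions, not necessarily the exact split used in this series.

using System;
using System.IO;
using System.Linq;

class SplitIrisData
{
    static void Main()
    {
        // Hypothetical file names; the 80/20 ratio is an assumption for illustration.
        var rnd = new Random(1);
        var lines = File.ReadAllLines("iris_cntk.txt")
                        .OrderBy(_ => rnd.Next()) // shuffle so both parts contain all classes
                        .ToArray();

        int trainCount = (int)(lines.Length * 0.8);                       // 80% for training
        File.WriteAllLines("trainIris_cntk.txt", lines.Take(trainCount)); // seen by the trainer
        File.WriteAllLines("testIris_cntk.txt", lines.Skip(trainCount));  // held out for validation
    }
}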
We will continue with the implementation from the previous two posts and implement model validation. After the model is trained, the model and the trainer are passed to the evaluation method. The evaluation method loads the testing data and calculates the output using the passed model. Then it compares the calculated (predicted) values with the outputs from the testing data set and calculates the accuracy. The following source code shows the evaluation implementation.
private static void EvaluateIrisModel(Function ffnn_model, Trainer trainer, DeviceDescriptor device)
{
    // requires: using System; using System.Collections.Generic;
    //           using System.IO; using System.Linq; using CNTK;
    var dataFolder = "Data"; // the test file sits next to the program
    var testPath = Path.Combine(dataFolder, "testIris_cntk.txt");
    var featureStreamName = "features";
    var labelsStreamName = "label";

    // extract the feature and label variables from the trained model
    var feature = ffnn_model.Arguments[0];
    var label = ffnn_model.Output;

    // stream configuration to distinguish features and labels in the file
    var streamConfig = new StreamConfiguration[]
    {
        new StreamConfiguration(featureStreamName, feature.Shape[0]),
        new StreamConfiguration(labelsStreamName, label.Shape[0])
    };

    // prepare the testing data
    var testMinibatchSource = MinibatchSource.TextFormatMinibatchSource(
        testPath, streamConfig, MinibatchSource.InfinitelyRepeat, true);
    var featureStreamInfo = testMinibatchSource.StreamInfo(featureStreamName);
    var labelStreamInfo = testMinibatchSource.StreamInfo(labelsStreamName);

    int batchSize = 20;
    int testSampleCount = 20; // number of rows in the test file
    int miscountTotal = 0, totalCount = 0;
    while (true)
    {
        var minibatchData = testMinibatchSource.GetNextMinibatch((uint)batchSize, device);
        if (minibatchData == null || minibatchData.Count == 0)
            break;
        totalCount += (int)minibatchData[featureStreamInfo].numberOfSamples;

        // expected labels are stored in the minibatch as one-hot vectors
        var labelData = minibatchData[labelStreamInfo].data.GetDenseData<float>(label);
        var expectedLabels = labelData.Select(l => l.IndexOf(l.Max())).ToList();

        var inputDataMap = new Dictionary<Variable, Value>()
        {
            { feature, minibatchData[featureStreamInfo].data }
        };
        var outputDataMap = new Dictionary<Variable, Value>()
        {
            { label, null }
        };

        // evaluate the model on the test minibatch
        ffnn_model.Evaluate(inputDataMap, outputDataMap, device);
        var outputData = outputDataMap[label].GetDenseData<float>(label);
        var actualLabels = outputData.Select(l => l.IndexOf(l.Max())).ToList();

        // count how many predicted classes differ from the expected ones
        int misMatches = actualLabels.Zip(expectedLabels, (a, b) => a.Equals(b) ? 0 : 1).Sum();
        miscountTotal += misMatches;
        Console.WriteLine($"Validating Model: Total Samples = {totalCount}, Mis-classify Count = {miscountTotal}");

        // the source repeats infinitely, so stop once the whole test set has been seen
        if (totalCount >= testSampleCount)
            break;
    }
    Console.WriteLine("---------------");
    Console.WriteLine("------TESTING SUMMARY--------");
    // cast to float to avoid integer division, which would always yield 0 or 1
    float accuracy = 1.0F - (float)miscountTotal / totalCount;
    Console.WriteLine($"Model Accuracy = {accuracy}");
}
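The comparison between predicted and expected classes works because both the labels in the test file and the model output are vectors with three components (one per Iris class), so the class index is simply the position of the largest component, obtained with IndexOf(Max()). Here is a minimal stand-alone sketch of that decoding step, with made-up values:

using System;
using System.Collections.Generic;
using System.Linq;

class ArgMaxDecoding
{
    static void Main()
    {
        // One-hot expected label and a model output (values made up for illustration).
        var expected  = new List<float> { 0f, 1f, 0f };           // class index 1
        var predicted = new List<float> { 0.10f, 0.85f, 0.05f };  // largest value at index 1

        int expectedClass  = expected.IndexOf(expected.Max());    // -> 1
        int predictedClass = predicted.IndexOf(predicted.Max());  // -> 1

        Console.WriteLine($"match = {expectedClass == predictedClass}"); // match = True
    }
}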
The implemented method is called at the end of the Training method from the previous post.
EvaluateIrisModel(ffnn_model, trainer, device);
As can be seen, the model validation shows that the model predicts the unseen data with high accuracy, as shown in the following image:
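For example, if 1 of the 20 test samples were misclassified (hypothetical numbers), the reported accuracy would be 1.0 - 1/20 = 0.95.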
This is the last post in the series of blog posts about using feed-forward neural networks to train the Iris data with CNTK and C#.
The full source code for all three samples can be found here.