In this little post, I will go through a small and very basic prediction engine written in C# for one of my projects.
Basically, what it does is the following:
- It will collect data in the form of lists of strings.
- Given an input, it will give back a list of predictions of the next item. In falling probability order.
So no rocket science, but a good and basic prediction engine that can be built upon depending on what scenarios you want to solve.
First, Some Unit Tests
[TestMethod]
public void PredictorTest_NoMatch()
{
var target = new Predictor();
var actual = target.Predict("The rabbit-hole went straight on like a tunnel for some way");
Assert.AreEqual(0, actual.Count);
}
[TestMethod]
public void PredictorTest_SingleMatch()
{
var target = new Predictor();
target.Predict("Alice opened the door and found that it led into a small passage");
var actual = target.Predict("Alice opened the door");
Assert.AreEqual(1, actual.Count);
Assert.AreEqual("and", actual.First());
}
[TestMethod]
public void PredictorTest_SingleMatch_OtherNoise()
{
var target = new Predictor();
target.Predict("Alice opened the door and found that it led into a small passage");
target.Predict("Alice took up the fan and gloves");
var actual = target.Predict("Alice opened the door");
Assert.AreEqual(1, actual.Count);
Assert.AreEqual("and", actual.First());
}
[TestMethod]
public void PredictorTest_SingleMatch_TwoResultsInOrder()
{
var target = new Predictor();
target.Predict("Alice thought she might as well wait");
target.Predict("Alice thought this a very curious thing");
target.Predict("Alice thought she had never seen such a curious");
var actual = target.Predict("Alice thought");
Assert.AreEqual(2, actual.Count);
Assert.AreEqual("she", actual.First());
Assert.AreEqual("this", actual.Last());
}
So, nothing too fancy. You provide it with input and as expected, it returns a list of possible next values. Examples are from Alice's Adventures in Wonderland by Lewis Carroll, the Project Gutenberg edition.
Predictor Class
public class Predictor
{
private readonly Dictionary<string, Dictionary<string, int>>
_items = new Dictionary<string, Dictionary<string, int>>();
private readonly char[] _tokenDelimeter = {' '};
public List<string> Predict(string input)
{
var tokens = input.Split(_tokenDelimeter, StringSplitOptions.RemoveEmptyEntries);
var previousBuilder = new StringBuilder();
Dictionary<string, int> nextFullList;
foreach (var token in tokens)
{
nextFullList = GetOrCreate(_items, previousBuilder.ToString());
if (nextFullList.ContainsKey(token))
nextFullList[token] += 1;
else
nextFullList.Add(token, 1);
if (previousBuilder.Length > 0)
previousBuilder.Append(" ");
previousBuilder.Append(token);
}
nextFullList = GetOrCreate(_items, previousBuilder.ToString());
var prediction = (from x in nextFullList
orderby x.Value descending
select x.Key).ToList();
return prediction;
}
private static T GetOrCreate<T>(Dictionary<string, T> d, string key)
{
if (d.ContainsKey(key))
{
return d[key];
}
var t = Activator.CreateInstance<T>();
d.Add(key, t);
return t;
}
}
It's not smart, and it requires having seen quite a lot of samples to work well.
An additional improvement could be to add storage of previous/next token combinations so that if the full search doesn't yield any result, a guess could be made based on only the previous word. But I guess that is up to you.
All code provided as-is. This is copied from my own code-base, and may need some additional programming to work.
CodeProject
Until next time: Work to Live, Don’t Live to Work