Introduction
I am extremely interested in machine learning. However, implementing neural nets and Bayesian graphs etc., can take a long time, and are subject to domain definition problems. While they are the cutting edge, sometimes it is nice to sit back, relax, and do something no rational computer scientist would do: implement a worthless algorithm.
Actually, the algorithm is not worthless. Designing the code and running through it reveals weaknesses in the approach that makes it easier to conceptualize more advanced methods and accurately define the domain.
This algorithm will break typed messages up into words, analyze the frequency of words, and then generate a random response based on the frequency of words, the minimum sentence size, and the maximum sentence size. Once implemented, it is a fun algorithm to play with (implement an IParticipant
for human interaction) as it sometimes makes some interesting combinations of words.
A comment about the code:
I do not like attaching Zip files full of code unless absolutely necessary. Half of the fun of something like this is in the implementation. It also helps with the understanding. Besides, I hate having to download a Zip so I can find the answer I want. If I pose a question in this tutorial and give an answer, it is in the tutorial without the need for a download, unzip, and grep!
Background
If you are unfamiliar with a Turing test, please visit: The Turing Test. Other interesting Turing problems are CAPTCHA's which are fun to implement as well. In fact, my next tutorial may be on a captcha.
Using the code
Below is some unexplained code that should be obvious but are probably required for understanding the other aspects of this tutorial. For reasons I don't understand, I have IParticpant
calling Conversation.addParticipant(IParticipant)
. There is probably a better design pattern implementation, I just wasn't spending time with design issues.
public struct Message{
public IParticipant speaker;
public Conversation conversation;
public DateTime timeStamp;
public string message;
public override string ToString() {
return timeStamp.ToShortTimeString() +
" " + speaker.getName() + ": " + message;
}
public Message(IParticipant speaker, Conversation
conversation, string message){
this.timeStamp = System.DateTime.Now;
this.speaker = speaker;
this.conversation = conversation ;
this.message = message;
}
}
public interface IParticipant{
string getName();
void newMessage(Message message);
void joinConversation(Conversation conversation);
}
public class Conversation{
List<IParticipant> participants = new List<IParticipant>();
public void speak(Message message){
foreach(IParticipant p in participants){
p.newMessage(message);
}
}
public void addParticipant(IParticipant participant){
if(!participants.Contains(participant)){
participants.Add(participant);
}
}
}
Below is the class that implements the parroting logic. This code is easy and straightforward in most cases, so I will not burden the reader with silly things like a lot of explanations.
public class Word : IComparer<Word>{
private string word;
private int frequency = 0;
public event Action<Word> FrequencyChanged;
public string getWord(){
return word;
}
public int getFrequency(){
return frequency;
}
public void incrementFrequency(){
frequency++;
onFrequencyChanged();
}
protected void onFrequencyChanged(){
if(FrequencyChanged != null){
FrequencyChanged(this);
}
}
public int Compare(Word x, Word y) {
return x.getFrequency().CompareTo(y.getFrequency());
}
public Word(string word){
this.frequency = 1;
this.word = word;
}
}
public class ParrotAlgorithm : IParticipant{
private static int instanceCount = 0;
private string name =
"Parrot #" + instanceCount.ToString();
private Conversation conversation = null;
private List<Word> list = new List<Word>();
private Dictionary<string, Word> hash =
new Dictionary<string,Word>();
private int minMessageSize = 3;
private int maxMessageSize = 10;
private Random random = new Random();
private double chaos = .2d;
private int wordCount = 0;
public string getName(){
return name;
}
private void addWord(string wordString){
wordCount++;
Word word = null;
if(!hash.ContainsKey(wordString)) {
word = new Word(wordString);
hash.Add(wordString, word);
list.Add(word);
word.FrequencyChanged +=
new Action<Word>(word_FrequencyChanged);
}
else{
word = hash[wordString];
word.incrementFrequency();
}
}
public void word_FrequencyChanged(Word obj){
list.Sort(obj);
}
private void respond(){
int wordLength = (int)(minMessageSize +
(random.NextDouble() *
(maxMessageSize - minMessageSize)));
double runningTotal = 0.0d;
string message = "";
for(int i=0;i<wordLength;i++){
if(random.NextDouble() < chaos){
message += list[(int)(random.NextDouble()
* list.Count)].getWord() + " ";
}
else{
double requiredOdds = random.NextDouble();
runningTotal = 0.0d;
foreach(Word w in list){
runningTotal += ((double)w.getFrequency()/
(double)wordCount);
if(runningTotal > requiredOdds){
message += w.getWord() + " ";
break;
}
}
}
}
if(message.Length > 0){
Message mm = new Message(this,
this.conversation, message);
conversation.speak(mm);
}
}
public void newMessage(Message message){
if(message.speaker != this){
string[] tokens =
message.message.Split(new char[]{' '});
if(tokens.Length > maxMessageSize)
maxMessageSize = tokens.Length;
foreach(string s in tokens){
addWord(s);
}
respond();
}
}
public void joinConversation(Conversation conversation){
this.conversation = conversation;
conversation.addParticipant(this);
}
public ParrotAlgorithm() {
instanceCount++;
}
}
The first bit of explanation is the Word
class. What is it for, why did you do that? It wraps words or tokens in a message, and stores associated information such as frequency. The event FrequencyChanged
is called whenever the frequency is updated, to allow the list to be resorted. This is probably not the most efficient method. Adding all of the new words and then sorting the list is, possibly, more efficient.
private void addWord(string wordString){
wordCount++;
Word word = null;
if(!hash.ContainsKey(wordString)) {
word = new Word(wordString);
hash.Add(wordString, word);
list.Add(word);
word.FrequencyChanged +=
new Action<Word>(word_FrequencyChanged);
}
else{
word = hash[wordString];
word.incrementFrequency();
}
}
The above code will add a word to the memory banks. A dictionary object, which I call hash
, is used to update the frequency for words that have already been used and stored. The list is a List<Word>
object (.NET 2.0 Generics) that stores the words in sorted order. This is important for getting the random words later.
private void respond(){
int wordLength = (int)(minMessageSize +
(random.NextDouble() * (maxMessageSize - minMessageSize)));
double runningTotal = 0.0d;
string message = "";
for(int i=0;i<wordLength;i++){
if(random.NextDouble() < chaos){
message += list[(int)(random.NextDouble() *
list.Count)].getWord() + " ";
}
else{
double requiredOdds = random.NextDouble();
runningTotal = 0.0d;
foreach(Word w in list){
runningTotal += ((double)w.getFrequency()/
(double)wordCount);
if(runningTotal > requiredOdds){
message += w.getWord() + " ";
break;
}
}
}
}
if(message.Length > 0){
Message mm = new Message(this, this.conversation, message);
conversation.speak(mm);
}
}
This method contains all of the brains behind the parrot.
int wordLength = (int)(minMessageSize +
(random.NextDouble() * (maxMessageSize - minMessageSize)));
Randomly select the length of the response. This code guarantees a minimum sentence size, and won't let the parrot respond with enormous sentences unless its peers do so as well.
if(random.NextDouble() < chaos){
Randomness is crucial. This is what really makes things interesting. There just has to be mutation. Eliminate this code or change the chaos value to see how it affects the results.
runningTotal += ((double)w.getFrequency()/(double)wordCount);
This function normalizes the value of the frequencies, making the sum of all frequencies == 1 (maybe). Therefore, you can loop, starting with the lowest frequency, and add up the normals to compare to the random number you desire. To be honest, I have not proved this statement, but it seems intuitively correct. Someone with a stats book and free-time is welcome to prove or disprove it for me. For the interim, it works for me.
Points of Interest