Introduction
I don't know about everyone else, but I've always had a fascination with neural nets and artificial intelligence in general; a fascination that I think was brought about by the fact that I knew absolutely nothing about them, and they were always one of those things that one day I would get around to looking at to see how they worked. This set of articles is the result of that fascination, one that has led me on to some quite surprising discoveries and into some things that just plain didn't make sense. Like the idea of a single neuron neural net. Please, call it a function, wrap it in a class, but don't call a single neuron a neural net. It goes against everything that I think a neural net is supposed to be, and don't these people ever read the books they write?

I can honestly say that I have read books on neural networks and got to the end of the book none the wiser, because the theory got in the way. There's a lot of theory, some of it interesting, a lot of it up in the stratosphere where only serious mathematicians dare to venture. But some books on neural networks seem to use theory to bludgeon any chance you have of actually learning anything from the book right out of your head. There is also the wrong-book-at-the-wrong-time perspective to take into account; sometimes you just aren't in the right state of mind to get the best out of a particular book. My biggest example of this is the James Joyce book, Ulysses. I have tried to read that book three times in my life. The first time, I got a few chapters in and stopped because it bored me to tears. The second time, I read it all the way through and found it quite enjoyable. The third time, I got a few chapters in and thought he was spouting a load of crap. So, is Ulysses a good book? Is it worth reading? I would have to say that it is a difficult book, and if you're not personally in the right place to get the most out of it, then you won't. And this applies to any book on neural networks, as none of them are easy. The trick is not to be put off by the initial confusion and uncertainty, but to see it as a challenge that is going to take some time and study before it starts falling into place.
Finally, in desperation, I turned to the code. Being a programmer, I could surely understand the code. Being only qualified in very basic math, the bit of the page that read "squiggle x over different squiggle z times two" actually meant "take this value at point i in the array from that value at point n in the other array and times it by two". It made sense; I didn't know why they were doing this, but it was a start, and in my time, I have had to maintain code that didn't make anywhere near as much sense as that when I first looked at it.
It should be noted here that the original idea for this project was a series of sequential releases, each containing two working networks. This has been abandoned, with this release containing all six networks that were originally envisaged. This document will occasionally make references to that original intention. Also, some of the class diagrams have marginal differences from the code. This is due to the tandem development of the document and the code; where changes to the code made a difference to how the networks run, I have attempted to keep the diagrams up to date.
The project was developed with Developer Studio 2002 and Developer Studio 2003 using the .NET runtimes version 1.0 and 1.1. As the project was started on version 1.0 of the .NET framework, I see no reason why it should not compile and run on that framework although users of Developer Studio 2002 will have to rebuild the project files.
The networks themselves are set up to work at their best using the default values provided by the program, with options provided so that they can be played around with to see what happens. In many cases, changing the default options will obviously change the way the network behaves and can even prevent some networks from working altogether. As no attempt is made to save the optional changes to the networks, should you break a network, everything should work again if you restart the program.
What Is A Neural Net?
Don't you just hate those innocent little questions? I might as well just ask "why?". I suppose the traditional idea is that a neural net is like a miniature computer brain, designed so that the individual cells can all communicate with each other and come together to solve a bigger problem. But just think about that from a coding perspective: how do you write a program that is so completely generic that it is capable of solving every problem you can throw at it, with all the individual little pieces, or neurons, all identical, coming together to do whatever it is you may decide to throw at them? Sounds really hard to me. What tends to happen, though, is that you get two different types of networks: those that concentrate on trying to be biologically accurate and mimic the brain's neurons or nodes, and those that concentrate on the specific task in hand. I suspect that, on balance, most of them tend to be a compromise between accuracy and functionality. I mean, it's fine to have something that perfectly models the brain's functionality, but if you can't get it to do anything, then it's only of use for research purposes, and these days people tend to want results more than they want research. Which means pure research remains in the hands of the lucky few.
In his book "The Essence of Neural Networks", Robert Callan in chapter one gives a brief set of rules that compromise a neural network. I give these here for two reasons. One, they are probably the most precise definition I have seen for a neural network and two, I can understand what they mean.
1. A Set Of Simple Processing Units.
A neural network is made up of neurons or nodes. These are meant to be simple processing units, though if you've seen the math behind some of them, you may well be wondering "simple to whom?". (I'm going to stick with calling them nodes to save confusion, as they are called neurons, nodes, and who knows what else in the literature, all interchangeably; so from now on, they are nodes and nothing else, and anything else is referring to something completely different.) A node, from a programmer's perspective, is a class that carries out a certain task or objective. As with any class, that task or objective is defined within the code, which in turn is defined by just what it is you want the class to do in the first place. For our purposes, and within the accompanying code, a Neuron will be a collection of nodes, comprising in its simplest form four nodes: two input nodes, a bias node, and the node that does the network's actual work. In the example code provided in part three, for instance, the Neuron will contain an Adaline node.
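To make that concrete, here is a minimal sketch of the idea. The class names here are invented purely for illustration; the real library classes are introduced in part two:

// Illustrative sketch only: a node is just a class with a job to do.
public class Node
{
    public double Value;          // the node's current output value

    public virtual void Run()     // carry out whatever this node's task is
    {
    }
}

// A neuron is simply a collection of nodes: in the simplest case, two
// input nodes, a bias node, and the node that does the actual work.
public class Neuron
{
    public System.Collections.ArrayList Nodes = new System.Collections.ArrayList();
}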
2. A Pattern of Connectivity
This is the way that the network is built and the way that the data flows through the network, which is also how networks get their names. For example, the Adaline network that we will be dealing with first contains two input nodes, a bias node, and an Adaline node. The Adaline node is the node that does all the work, by calling the run and learn functions. There is no set limit to the number of nodes that you can have within each neuron, and no restriction on the way that the data flows. The data originates in the input nodes, but once the networks get larger, the data going through one node can be passed forwards or backwards to another node for that node to process as it sees fit.
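Continuing the sketch from rule one (again, the Link class, its members, and the weights below are made up for illustration, and this fragment would live inside a class), the Adaline pattern of connectivity amounts to something like this:

// Illustrative sketch only: a link joins one node to another and
// carries the weight for that connection.
public class Link
{
    public Node Source;
    public Node Target;
    public double Weight;

    public Link( Node source, Node target, double weight )
    {
        Source = source;
        Target = target;
        Weight = weight;
    }
}

// Wiring the Adaline pattern: two input nodes and a bias node all
// feed into the single Adaline node that does the work.
static Link[] BuildAdalinePattern()
{
    Node inputOne = new Node();
    Node inputTwo = new Node();
    Node bias = new Node();
    Node adaline = new Node();

    return new Link[]
    {
        new Link( inputOne, adaline, 0.5 ),    // the weights here are
        new Link( inputTwo, adaline, -0.3 ),   // arbitrary placeholders
        new Link( bias, adaline, 0.1 )
    };
}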
3. A Rule for Propagating Signals Through the Network
This is merely common sense. Whatever type of network we are working with, there are certain results that we want to achieve, and these are only going to be achieved by processing the data the network is dealing with in a specific way. That way might be to pass the data forward to an output node, or back through the network for further processing, or even forward through the network for further processing. Either way, as with any other computer program, there is a set number of steps that we want to perform, and usually only one or two ways that we can go about getting the correct result at the end.
4. A Rule For Combining Input Signals
This is basically the action that we are going to carry out on the data coming into the neural network. At this point, it doesn't really matter that we know what the answer will be, just that we know what we want to do with the information in the first place. This could be a mathematical function or the comparison of strings or objects.
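For the networks in this series, the combination rule is a weighted sum: each input is multiplied by the weight on its link, and the results are added together. As a quick sketch of the idea (the values here are made up, and the real library code for this appears later in this article):

double[] inputs  = { 0.3, -0.7, 1.0 };   // the final 1.0 is the bias input
double[] weights = { 0.1, 0.4, -0.2 };   // one weight per link
double dTotal = 0.0;
for( int i = 0; i < inputs.Length; i++ )
{
    // Combine the input signals: the sum of input times link weight.
    dTotal += inputs[ i ] * weights[ i ];
}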
5. A Rule For Calculating An Output Signal
This isn't necessarily the final output of the program, but the output of that section of the code. If you think of it in terms of a function, the output value of a network node is the return value of the function. This is normally a numerical value, but there is absolutely no reason why it needs to be. For example, the Adaline network could quite easily return a Boolean true or false, and that would in itself have no bearing on whether the node worked correctly or not.
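To illustrate the point, a hypothetical Boolean version of the Adaline transfer function would be just as valid; this is not what the accompanying code actually does (its real transfer function is shown later), just a variation on the same rule:

// Hypothetical variant: report the node's decision as a bool rather
// than as -1.0 / 1.0; the node's logic is otherwise unchanged.
bool TransferFunction( double dValue )
{
    return dValue >= 0;
}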
6. A Learning Rule To Adapt the Weights.
A weight is a value given to the connection, or link, that helps in the learning process. This is updated on the fly by the learn function, and naturally there should be a rule behind the way that this is done. Thinking about it, though, seeing as the final goal of the network is to learn to generate the correct answers to the training data that is given to it, a perfectly good rule for updating the weights would be to just randomly assign values until something works. In theory, it should just take the network longer to learn than it would were an explicit rule programmed in.
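To make that thought concrete, the naive random rule would look something like the following sketch. TestAllPatterns is a hypothetical helper that would run the whole training set through the network; the real code uses the delta rule described in the next section:

// Naive "learning rule": re-roll every weight at random until the
// network happens to get all the training patterns right. This can
// work, just far more slowly than an explicit rule.
double[] weights = new double[ 3 ];
Random rand = new Random();
while( !TestAllPatterns( weights ) )   // hypothetical helper
{
    for( int i = 0; i < weights.Length; i++ )
    {
        weights[ i ] = rand.NextDouble() * 2.0 - 1.0;   // random in [-1, 1)
    }
}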
How Does The Network Learn?
The simple answer to this question is trial and error, but as usual, nothing is that simple. In order to look at this question, I'm going to talk about the Adaline network that you will see in part three of this series of articles. The following is a section of the output from a standard run of the Adaline 1 program.
Iteration number 172 produced 6 Good values out of 250
Learn called at number 5 Pattern value = 1 Neuron value = -1
Iteration number 173 produced 5 Good values out of 250
Learn called at number 6 Pattern value = 1 Neuron value = -1
Iteration number 174 produced 6 Good values out of 250
Learn called at number 5 Pattern value = 1 Neuron value = -1
Iteration number 175 produced 5 Good values out of 250
Learn called at number 6 Pattern value = 1 Neuron value = -1
Iteration number 176 produced 6 Good values out of 250
Learn called at number 5 Pattern value = 1 Neuron value = -1
Iteration number 177 produced 5 Good values out of 250
Learn called at number 6 Pattern value = 1 Neuron value = -1
Iteration number 178 produced 6 Good values out of 250
Learn called at number 7 Pattern value = 1 Neuron value = -1
Iteration number 179 produced 7 Good values out of 250
Learn called at number 6 Pattern value = 1 Neuron value = -1
Iteration number 180 produced 6 Good values out of 250
Learn called at number 32 Pattern value = 1 Neuron value = -1
Iteration number 181 produced 32 Good values out of 250
Learn called at number 5 Pattern value = 1 Neuron value = -1
Iteration number 182 produced 5 Good values out of 250
Learn called at number 5 Pattern value = 1 Neuron value = -1
Iteration number 183 produced 5 Good values out of 250
Learn called at number 32 Pattern value = 1 Neuron value = -1
Iteration number 184 produced 32 Good values out of 250
Learn called at number 5 Pattern value = 1 Neuron value = -1
Iteration number 185 produced 5 Good values out of 250
Learn called at number 5 Pattern value = 1 Neuron value = -1
Iteration number 186 produced 5 Good values out of 250
Learn called at number 32 Pattern value = 1 Neuron value = -1
Iteration number 187 produced 32 Good values out of 250
Learn called at number 5 Pattern value = 1 Neuron value = -1
Iteration number 188 produced 5 Good values out of 250
Learn called at number 5 Pattern value = 1 Neuron value = -1
Iteration number 189 produced 5 Good values out of 250
Iteration number 190 produced 250 Good values out of 250
As you can see from this example, the network took 190 attempts before it got all the answers right. The Adaline program basically compares two values that are within the range -1 to 1. These values are randomly generated into a file by the demonstration program provided with the third article in the series. Basically, the program works out whether the first number given to it is less than the second; if it is, then the network should output a 1, and if it isn't, the output should be -1. The technical way of stating this would be that the network sums up the inputs multiplied by the weights for each node and then runs the summation through the transfer function that returns the output for the node. The training data, when generated, also includes the right answer to the problem, and the network tests itself to see if it got the answer right or not. In this case, the network got, for the most part, about six correct answers with each run, yet on the 190th run, it got everything right. The program is written so that it keeps on going through the data until it gets them all correct, which it did on attempt 190. So, what is going on here? Well, when the program calls the run function in the Adaline 1 program, the run function is basically:
for( int i=0; i<nCount; i++ )
{
dTotal += ( ( BasicLink )this.InputLinks[ i ] ).WeightedInputValue( nID );
}
this.NodeValues[ nID ] = TransferFunction( dTotal );
This code cycles through the links to the node, gets the weighted input value of each, and adds it to the running dTotal variable. For now, all that you need to know is that the nID value is equal to the NodeValue constant stored in the Values class (== 0), and that the code is getting the first value in the input node being referred to in the InputLinks array by i. The important bit of the weighted input value looks like this:
dReturn = bnInputNode.GetValue( nID )
* ( ( double )arrayLinkValues[ Values.Weight ] );
which multiplies the value in the node by the weight value of the link. The weight value of the link is actually the first value in the link, which is set in the Adaline link constructor to:
arrayLinkValues[ Values.Weight ] = Values.Random( -1, 1 );
which, as you can see, is a random number between -1 and 1, meaning that the network's first stab at getting the correct answer is nothing more than a guess. But as you can see in the loop above, the run function cycles through and uses the weight value for its calculations, adding the totals to dTotal. The dTotal variable is then passed to the TransferFunction, which is another piece of simple code:
if( dValue < 0 )
return -1.0;
return 1.0;
which returns -1 if the value is less than 0, and 1 if the value of dTotal is 0 or greater. So, presume for a moment that we have a value in dTotal and that, according to the training set, the answer is -1, but the network returns a 1. The answer is wrong, and the program would print out one of the lines above, saying that it had gotten a certain number right up until this point but that it now had to call learn because this one was wrong. The learn function uses the delta rule, or Widrow-Hoff rule, which in programming terms is this:
// Set the node's error value: for -1/1 outputs, the error (desired
// minus actual) equals -2 times the actual value whenever the answer
// is wrong.
NodeErrors[ Values.NodeError ] =
    ( ( double )NodeValues[ Values.NodeValue ] ) * -2.0;
BasicLink link;
int nCount = InputLinks.Count;
double dDelta;
for( int i=0; i<nCount; i++ )
{
    link = ( BasicLink )InputLinks[ i ];
    // Delta = learning rate * the link's input value * the error.
    dDelta = ( ( double )NodeValues[ Values.LearningRate ] )
        * ( ( double )link.InputValue( Values.NodeValue ) )
        * ( ( double )NodeErrors[ Values.NodeError ] );
    // Adjust the link's weight by the calculated delta.
    link.UpdateWeight( dDelta );
}
First of all, the node's error value is set equal to the node value times -2.0, and then the code cycles through each input link to the current node and updates the weight value for the link by the dDelta value. This delta is the product of the node's learning rate (which is set to 0.45 when the Adaline node is created, though you can feel free to change this value to see how it affects the learning of the program), the link's input value, and the error value that was set at the start of the function. It should probably be mentioned here that this is a simple network, to enable people to learn and understand how it all works; things can get a lot more complicated than this, and you are just being eased into it gently. It should also be noticed that, although the example is simple, what you have here is the basis of a decision. As long as you know what the desired output is, this program can be modified to calculate it for you, and once it has been trained, it can search through huge amounts of data, checking for things that don't fall within its required parameters.
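To tie the run and learn steps together, here is a minimal, self-contained sketch of the same Adaline idea. This is not the article's library code; the class and member names are invented for this illustration, but the weighted sum, threshold, and delta-rule steps are the same ones described above:

using System;

// A minimal stand-alone Adaline sketch. It learns to output 1 when
// the first number is less than the second, and -1 otherwise.
class SimpleAdaline
{
    double[] weights = new double[ 3 ];   // two input weights plus a bias weight
    const double LearningRate = 0.45;     // the same default the article uses

    public SimpleAdaline( Random rand )
    {
        // As in the article, the first stab is nothing more than a guess.
        for( int i = 0; i < weights.Length; i++ )
            weights[ i ] = rand.NextDouble() * 2.0 - 1.0;   // random in [-1, 1)
    }

    // Run: sum the weighted inputs and push the total through the threshold.
    public double Run( double x1, double x2 )
    {
        double dTotal = weights[ 0 ] * x1
                      + weights[ 1 ] * x2
                      + weights[ 2 ];          // the bias input is fixed at 1
        return dTotal < 0 ? -1.0 : 1.0;
    }

    // Learn: the delta (Widrow-Hoff) rule. For -1/1 outputs, the error
    // (desired - actual) is -2 * actual whenever the answer is wrong,
    // which matches the error calculation in the library code above.
    public void Learn( double x1, double x2, double desired )
    {
        double error = desired - Run( x1, x2 );
        weights[ 0 ] += LearningRate * x1 * error;
        weights[ 1 ] += LearningRate * x2 * error;
        weights[ 2 ] += LearningRate * 1.0 * error;   // bias input = 1
    }

    static void Main()
    {
        Random rand = new Random();
        SimpleAdaline net = new SimpleAdaline( rand );

        // Build a random training set: the answer is 1 when the first
        // number is less than the second, and -1 otherwise.
        int nPatterns = 250;
        double[,] data = new double[ nPatterns, 3 ];
        for( int i = 0; i < nPatterns; i++ )
        {
            data[ i, 0 ] = rand.NextDouble() * 2.0 - 1.0;
            data[ i, 1 ] = rand.NextDouble() * 2.0 - 1.0;
            data[ i, 2 ] = data[ i, 0 ] < data[ i, 1 ] ? 1.0 : -1.0;
        }

        // Keep running through the data, learning on every wrong answer,
        // until an entire pass comes out correct.
        int nGood = 0, nIteration = 0;
        while( nGood < nPatterns )
        {
            nGood = 0;
            nIteration++;
            for( int i = 0; i < nPatterns; i++ )
            {
                if( net.Run( data[ i, 0 ], data[ i, 1 ] ) == data[ i, 2 ] )
                    nGood++;
                else
                    net.Learn( data[ i, 0 ], data[ i, 1 ], data[ i, 2 ] );
            }
            Console.WriteLine( "Iteration number {0} produced {1} Good values out of {2}",
                nIteration, nGood, nPatterns );
        }
    }
}

Run, this should produce output much like the log above: a long run of mostly wrong passes, then a jump to all 250 correct once the weights settle into the right region.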
Finally
The above is by no means all that I have to say on neural networks, but hopefully it will give the complete beginner an understanding of what the basics of a neural network are and, more importantly, how they go about learning to do what they do. In part two, I'll be describing the basic classes that are behind all the upcoming programs, and in part three, we shall return to the Adaline network and get up close and personal. One last point that should be understood is that all the code is officially experimental and changing; the only thing that you can be sure of is that the code that ships with each release will work. As such, I haven't made my mind up about backward compatibility yet. I will try to ensure that all previous programs work with the latest versions of the code and the library, and keeping the neural net tester program will help enforce this, but I already have some ideas that will require a complete rewrite of all previous code once I get to the point where I want to try them.
History
- 24 June 2003 :- Initial release.
- 19 October 2003 :- Review and edit for CP conformance.
References
- Tom Archer (2001), Inside C#, Microsoft Press
- Jeffrey Richter (2002), Applied Microsoft .NET Framework Programming, Microsoft Press
- Charles Petzold (2002), Programming Microsoft Windows With C#, Microsoft Press
- Robinson et al (2001), Professional C#, Wrox
- William R. Stanek (1997), Web Publishing Unleashed Professional Reference Edition, Sams.net
- Robert Callan (1999), The Essence Of Neural Networks, Prentice Hall
- Timothy Masters (1993), Practical Neural Network Recipes In C++, Morgan Kaufmann (Academic Press)
- Melanie Mitchell (1999), An Introduction To Genetic Algorithms, MIT Press
- Joey Rogers (1997), Object-Oriented Neural Networks in C++, Academic Press
- Simon Haykin (1999), Neural Networks: A Comprehensive Foundation, Prentice Hall
- Bernd Oestereich (2002), Developing Software With UML: Object-Oriented Analysis And Design In Practice, Addison Wesley
- R. Beale & T. Jackson (1990), Neural Computing: An Introduction, Institute Of Physics Publishing
Thanks
Special thanks go to everyone involved in TortoiseCVS, which was used for version control.
All UML diagrams were generated using Metamill version 2.2.