Scope
In this article, we'll see some of the basics of genetic algorithms, i.e. what they are, what they're good for, why we call them "genetic", and so on. This won't be a theoretical-only article, and we'll see how a simple genetic algorithm could be implemented, using Visual Basic .NET. Source code is provided to allow further experimentations.
Introduction
A genetic algorithm could be saw as a component og the artificial intelligence field. They are ultimately a search technique, which use metaheuristic to attain a particular goal, or search for a solution to a problem. We use the term "genetic" because, in order to solve a problem, we will use procedures inspired to the natural selection process. We don't want to simply find a solution: we want to create a code that could "grow", or "evolve" towards our goal, finding its way out by itself (well, almost).
Our natural world
In his «Principles of Biology», Herbert Spencer coined the expression "survival of the fittest", as a synonym for "most well adapted to the current environment". Darwin, in his «Origin of Species» used Spencer's quote alongside the concept of "natural selection", which could be intended as "better designed for an immediate, local environment". In any environment, the environment itself sets the standards to determine who is "fit".
For example, in a place with strong predators, a fundamental requirement to survive could be the ability to run away with great speed, or fight back with proficiency. In such a world, a fast runner or a strong fighter will be candidates for longer/better survival than those unprovided with those abilities. We could say the world poses a question like «how you will deal with those predators?», and those who can find a good solution to the riddle will be able to survive (that is an over-simplification, but we aren't biologists, after all!).
We can identify some parameters: a world to live in (or "a problem to solve"), a population that will face the world's challenges (or, individuals that will try to solve the problems posed by their natural conditions), a story regarding individuals which will pass on their experiences, their successful traits (and unsuccessful, too), their DNA, hoping their successors will grow better and better.
In computer science, we could take inspiration from biological approach to solve problems. Nature doesn't brute force solutions, it creates a multitude of possible solutions through recombinations of previous answers. Again, it's an over-simplification because in the real world challenges modify themselves, too, but the basis of the approach are still valid in a digital world. GAs (genetic algorithms) inherits some biology terms. We could identify: chromosome/individual, as a single solution to a given problem; population, as the array of possible solutions; gene, or a part (byte) of an individual; fitness, or the degree of adequacy of a certain solution; crossover/inbreeding, as the recombinations of solutions into new ones; mutation, or the random change which could occur into a solution.
Genetic algorithm overall
Let's consider a simple problem: we have a numeric target, let's say the number 5. This number can be saw as our world's challenge (or the solution domain). In our world we'll have inhabitants, and let's say the population is composed by a certain amount of mathematical operations.
We take for example 100 random operations for start. Each operation must confront itself with the challenge, trying to solve it. This match is called "fitness". Fitness could be defined, as we saw above regarding to biological beings, as the rate of adequacy in a given solution domain.
For example, the operation "0 + 0", will have a low fitness, because it cannot produce a value near to our target, but "0 + 5" will be a perfect valid answer. We must procede in evaluating every member of our population, calculating fitness for each of them. Through this process, we'll find the best (or "fittest") answers to our problems. At this point, we take some pairs of individuals amongst the best, "inbreeding" them to produce new elements that share characteristics from both parents, and repeat the process for our entire population (if we wish to mantain a specific number of elements, at the end of the process we must discard a number of the worst elements which equals the number of new elements created).
A new generation of solutions is born - nearer to our target, we hope - awaiting to be tested against our problem. Rinse and repeat until valid solutions are born. Living beings pass on their DNA, their genetic code, to their sons and daughters. In computer science, everything is composed by bytes, and we could consider them like a kind of digital DNA, a representation of what an individual is. In the above example, we could represent an operation using six bytes: the first and the last two to represent numbers between 0 and 255 (from 00 to FF), while the two bytes between them can represent the operator (for example 00 = +, 01 = -, etc.).
Through the breeding phase, we extract a part of those bytes from a parent, and a part from the other, and combining it we will have a new item. Like in a real environment, we could think of introducing a sort of "mutation", which can alter in some ways the genes passed on. In other words, GAs are about finding the best path to solve a particular problem, creating better and better solutions towards it. For further specific informations, check the Bibliography at the end of the article.
A pratical implementation
Words are good, but when it comes to programming, are pretty cheap, and a piece of code if worth thousands words. So, let's see how we can profit from GAs in a pratical way. In the following code, we'll discuss a graphical problem: given a target icon, and starting from a population composed by images made by random bytes, trying to "evolve" our random pixels to the target, hopefully reaching it completely.
Like i said in the opening, we'll use Visual Basic for this. The final user must be able to select an ICO file of 16x16 pixels, and the program will generate 100 random icons. For each of those icons, we will evaluate their fitness. This will be trivial: since an icon is an array of bytes, we'll simply match our target bytes with those of each population's element. The more they are alike, the better will be our image. For each generation of solutions, i will extract 4 element to inbreed, generating 4 new brand individuals, in which their parent's byte are reproduced, discarding the 4 worst element to mantain a constant populations of 100 elements.
Genetic image class
Our "genetic image" class can be written like this:
Public Class GAImage
Const _FIXED_BYTES As Integer = 42
Dim _targetBytes() As Byte
Dim _personalBytes() As Byte
Dim _isMutable() As Boolean
Public ReadOnly Property FixedBytes As Integer
Get
Return _FIXED_BYTES
End Get
End Property
Public Property PersonalBytes As Byte()
Get
Return _personalBytes
End Get
Set(value As Byte())
_personalBytes = value
End Set
End Property
Public ReadOnly Property Fitness As Single
Get
Dim _fitness As Single = 0
For ii As Integer = 0 To _targetBytes.Length - 1
If _targetBytes(ii) = _personalBytes(ii) Then
_fitness += 1
_isMutable(ii) = False
End If
Next
Return _fitness
End Get
End Property
Public ReadOnly Property FitnessPercent As Single
Get
Return (Me.Fitness * 100 / _targetBytes.Length).ToString("000.00")
End Get
End Property
Public Function isByteMutable(numByte As Integer) As Boolean
Return _isMutable(numByte)
End Function
Public Sub New(_target() As Byte, _code() As Byte, Optional randomInit As Boolean = True)
ReDim _personalBytes(_target.Length - 1)
ReDim _targetBytes(_target.Length - 1)
_targetBytes = _target
ReDim _isMutable(_target.Length - 1)
For ii As Integer = 0 To _isMutable.Length - 1
_isMutable(ii) = True
Next
For ii As Integer = 0 To _FIXED_BYTES
_personalBytes(ii) = _targetBytes(ii)
Next
If Not randomInit Then Exit Sub
Randomize()
For ii As Integer = _FIXED_BYTES + 1 To _targetBytes.Length - 1
_personalBytes(ii) = _code(Int(Rnd() * (_code.Length - 1)))
Next
End Sub
End Class
Its constructor asks for a copy of target's bytes, a genetic pool to extract from, and a flag to determine if a random initialization must take plase (we'll see why in few lines). The class exposes some properties: PersonalBytes allows access to the "genetic code" of our individual, returning an array of its bytes; Fitness calculates how much the current image and the target are alike.
With the help of _isMutable array, we will set if a particular byte is resistent to mutation (miming a solid genetic acquired trait); FitnessPercent simply expose the preious value in a percentual format; FixedBytes ha a technical-only utility: since the first 42 bytes represent the icon's header, those must be mantained from target to individuals, in order to produce a set of valid icons. In that class we have all it takes to produce full-fledged genetic icons.
We need an environment in which our play could take place. A simple form in which observe how our generations changes will do.
The playground
Let's take a look at the final outlook our Form will have :
The button "Choose target" allows the user to choose an icon, which will be the image we want to reach evolving our solutions. One hundred PictureBox are dinamically created to contain the output of each GAImage. Random GAImages are generated, and we can see none of them has apparent correlation with our target (lots of chaos in them!). Still, evaluating them, we could identify the better samples, and start our loop of breeding.
Let's see a typical execution, to discuss later each part of it.
I will skip entirely the parts related to control's creation or use, which the reader could see in the source code. What will be discussed here is related exclusively to the aspects of GA. Most of the code below will be incomplete for the sake of comprenhension. I suggest to download the source code to have the most possible complete overview.
Create a start solutions set
In opening our target image, we could access to its bytes. The bytes of the target image will be our ideal genetic code, and each byte represent an element of the genetic pool each created element draws by for its personal DNA.
Private Sub LoadImage(path As String)
Dim b As New Icon(path)
pbOriginal.Image = b.ToBitmap
Dim ms As New MemoryStream
b.Save(ms)
ReDim _targetBytes(ms.ToArray.Length - 1)
_targetBytes = ms.ToArray
_GACode = New List(Of Byte)
Dim _tmpCode As New List(Of Byte)
_tmpCode.AddRange(_targetBytes)
_GACode.AddRange(From bt As Byte In _tmpCode Distinct Select bt)
End Sub
The target icon is written in the pbOriginal PictureBox. After that, the image's bytes are accessed through a MemoryStream and written in the _targetBytes array. From that array, we select distinct values to populate a list of possible "genes" to be used when creating new icons. Think about this as our genetic code: DNA is made from adenine, guanine, cytosine, thymine. Our icons genetic code must follow a non completely random path: is the original set doesn't contain the byte AF, for example, our starting byte pool won't contain it either (i.e. it will be impossible to generate a specimen that contains a byte which is missed in the "standard" pool).
Private Sub InitializePopulation()
_GAPopulation = New List(Of GAImage)
For ii As Integer = 0 To _SAMPLE_NUMBER - 1
Dim _ga As New GAImage(_targetBytes, _GACode.ToArray)
_GAPopulation.Add(_ga)
Dim pb As PictureBox = CType(GroupBox3.Controls("pbSample" & ii.ToString("00")), PictureBox)
Try
pb.Image = Image.FromStream(New MemoryStream(_ga.PersonalBytes))
Catch
End Try
Next
End Sub
We populate a start list composed of GAImages, initializing them with the target byte array (which, as we saw, is useful for fitness calculation), and passing the pool from which our bytes could be drawn. Please refer to the GAImage code above for details on the initialization procedure.
The icon i've used in the above example is 318 bytes long: it's very unlikely a member of starting population will produce our final result.
It's necessary to loop through generations, combining the best items of each set.
A proposal for evolution
Let's see my proposal for a evolutive behaviour. In my code, it takes place in clicking Button2 control. Non GA relevant code is omitted (refer to source code for it).
Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
While _runFlag
Dim bestSamples As IEnumerable(Of GAImage) = From ga As GAImage In _GAPopulation Order By ga.Fitness Descending Take (4)
If bestSamples(0).FitnessPercent > 95 Then Button3_Click(sender, e)
Dim sample01 As GAImage = GenerateImage(bestSamples(0), bestSamples(1))
Dim sample02 As GAImage = GenerateImage(bestSamples(1), bestSamples(0))
Dim sample03 As GAImage = GenerateImage(bestSamples(2), bestSamples(3))
Dim sample04 As GAImage = GenerateImage(bestSamples(3), bestSamples(2))
_GAPopulation.Insert(0, sample01)
_GAPopulation.Insert(1, sample02)
_GAPopulation.Insert(2, sample03)
_GAPopulation.Insert(3, sample04)
Dim worstSamples As IEnumerable(Of GAImage) = From ga As GAImage In _GAPopulation Order By ga.Fitness Ascending Take (4)
_GAPopulation.Remove(worstSamples(0))
_GAPopulation.Remove(worstSamples(1))
_GAPopulation.Remove(worstSamples(2))
_GAPopulation.Remove(worstSamples(3))
End While
End Sub
Simply as this, we loop undefinitely through the list of GAImages (or, at least until 95% of result has been reached), taking each time the 4 fittest samples, and crossing their bytes. We insert the new created individuals on top of our list (supposing they are the bests), and removing an equal number of elements judged the worst in terms of fitness. It is possible, if something "goes wrong" during the generation phase, that a new created element is worse than a pre-existent one. In this case, it is discarded anyway.
Private Function GenerateImage(firstGA As GAImage, secondGA As GAImage) As GAImage
Dim _ga As New GAImage(_targetBytes, _GACode.ToArray, False)
For ii As Integer = _ga.FixedBytes To firstGA.PersonalBytes.Length - 1 Step 1
_ga.PersonalBytes(ii) = IIf(secondGA.isByteMutable(ii), Mutation(secondGA.PersonalBytes(ii), 2), Mutation(secondGA.PersonalBytes(ii), 20))
Next
Return _ga
End Function
The inbreeding is pretty simple: each accessible byte from the first element is looped, and the resultant byte is calculated as follows: if the same byte from the second element is mutation-resistant, we call on mutation routine with a probability of 1/20. Otherwise, we use a 1/2 probability rate.
Private Function Mutation(bt As Byte, rate As Byte) As Byte
Randomize()
Dim prob As Integer = Int(Rnd() * rate)
If prob Mod rate = 0 Then
Return _GACode(Int(Rnd() * (_GACode.Count - 1)))
Else
Return bt
End If
End Function
Mutation() takes place with a probability: if the mutation rate catches up, the specific byte is randomly extracted from the standard pool. Otherwise, the second element byte will be reproduced in the offspring. The loop through each generation/population for single generation will eventually evolve towards the final (and expected) result. This is only one of the methods that can be used to implement evolutive algorithms: each coder is the judge in defining what are the parameters to be called into question, and what are the general rules to solve a particular problem.
In our case, as long as it can bring valid results, other kinds of approaches are possible.
Sample video
A sample running session could be seen here: http://youtu.be/g6_JebAjtWQ
Download source code
Bibliography
History
- 2015-01-16: First Release for CodeProject