Contents
- Genetic Algorithm
- Base implementation, Template class GA<> and GA Selection classes
- Genome of Travel
- TSP Application
- Environment
- References
Disclaimer
I am not a GA guru and I have no degree in the field, so this article should not be used as a GA book or tutorial. It contains no mathematics, logic, or algebra about GA. It is only a programmer's view of genetic algorithms and an example of GA coding. Use it carefully! Any comments and criticism are highly appreciated.
Genetic Algorithm, Theory
There are many books and web resources about genetic algorithms. The best I can do is quote some good descriptions from my preferred sites.
Definition from Marek Obitko's Site:
"Genetic algorithms are a part of evolutionary computing, which is a rapidly growing area of artificial intelligence. As you can guess, genetic algorithms are inspired by Darwin's theory about evolution. Simply said, solution to a problem solved by genetic algorithms is evolved."
Explanation from Generation5.org:
"Genetic algorithms are not too hard to program or understand, since they are biological based. Thinking in terms of real-life evolution may help you understand. Here is the general algorithm for a GA:
Create a Random Initial State
An initial population is created from a random selection of solutions (which are analogous to chromosomes). This is unlike the situation for Symbolic AI systems, where the initial state in a problem is already given instead.
Evaluate Fitness
A value for fitness is assigned to each solution (chromosome) depending on how close it actually is to solving the problem (thus arriving to the answer of the desired problem). (These "solutions" are not to be confused with "answers" to the problem, think of them as possible characteristics that the system would employ in order to reach the answer.)
Reproduce (& Children Mutate)
Those chromosomes with a higher fitness value are more likely to reproduce offspring (which can mutate after reproduction). The offspring is a product of the father and mother, whose composition consists of a combination of genes from them (this process is known as "crossing over").
Next Generation
If the new generation contains a solution that produces an output that is close enough or equal to the desired answer then the problem has been solved. If this is not the case, then the new generation will go through the same process as their parents did. This will continue until a solution is reached."
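As a rough illustration of the four quoted steps, here is a minimal sketch in modern C++ (C++11) on a toy problem rather than the TSP: evolving a bit string toward all ones. The function name run_onemax_ga, the tournament-of-two parent choice, the single-point crossover, and the 5% mutation rate are assumptions made for this example only; they are not taken from the article's code.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Chromosome = std::vector<int>;  // each gene is 0 or 1

// Toy fitness: the number of 1 genes (higher is better).
int fitness(const Chromosome& c) {
    return static_cast<int>(std::count(c.begin(), c.end(), 1));
}

// Runs the GA loop until an all-ones chromosome appears or maxGenerations
// is reached. Returns the best fitness seen. Assumes length > 0, popSize > 0.
int run_onemax_ga(std::size_t length, std::size_t popSize,
                  int maxGenerations, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> bit(0, 1);
    std::uniform_int_distribution<std::size_t> pick(0, popSize - 1);
    std::uniform_int_distribution<std::size_t> cut(0, length - 1);
    std::uniform_real_distribution<double> chance(0.0, 1.0);

    // 1. Create a random initial state.
    std::vector<Chromosome> pop(popSize, Chromosome(length));
    for (auto& c : pop)
        for (auto& g : c) g = bit(rng);

    auto less_fit = [](const Chromosome& a, const Chromosome& b) {
        return fitness(a) < fitness(b);
    };

    int best = 0;
    for (int gen = 0; gen < maxGenerations; ++gen) {
        // 2. Evaluate fitness.
        const Chromosome& champion =
            *std::max_element(pop.begin(), pop.end(), less_fit);
        best = fitness(champion);
        // 4. Stop when the desired answer has been reached.
        if (best == static_cast<int>(length)) break;

        // 3. Reproduce: the fitter of two random candidates becomes a parent.
        auto select = [&]() -> const Chromosome& {
            const Chromosome& a = pop[pick(rng)];
            const Chromosome& b = pop[pick(rng)];
            return fitness(a) >= fitness(b) ? a : b;
        };
        std::vector<Chromosome> next;
        next.reserve(popSize);
        next.push_back(champion);  // keep the best individual unchanged
        while (next.size() < popSize) {
            const Chromosome& mom = select();
            const Chromosome& dad = select();
            Chromosome child(mom);
            // Crossing over: head from mom, tail from dad.
            const auto k = static_cast<std::ptrdiff_t>(cut(rng));
            std::copy(dad.begin() + k, dad.end(), child.begin() + k);
            // Children occasionally mutate: flip one random gene.
            if (chance(rng) < 0.05) child[cut(rng)] ^= 1;
            next.push_back(std::move(child));
        }
        pop.swap(next);
    }
    return best;
}
```

A call such as run_onemax_ga(16, 30, 500, 42u) evolves 30 chromosomes of 16 genes for at most 500 generations.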
In my opinion, a GA is easy to understand and easy to implement in C++. Its robustness is both its main advantage and its main disadvantage. Even if you implement some features incorrectly, the GA will continue to run and sooner or later will solve the problem (or find a local optimum). Such behavior can certainly cause trouble during debugging and tuning. Another interesting problem is choosing the best algorithms from the existing plethora of crossover, mutation, and gene representation methods.
See the References section for some useful links about GA theory.
Genetic Algorithm and Traveling Salesman Problem
About Traveling Salesman Problem
Again, a quotation, this time from http://www.math.princeton.edu/tsp/:
"The traveling salesman problem, or TSP for short, is this: given a finite number of 'cities' along with the cost of travel between each pair of them, find the cheapest way of visiting all the cities and returning to your starting point."
Genome and Algorithm
We can't use a traditional gene representation and algorithm for the TSP, because every city must appear exactly once in a gene and can't be duplicated.
For the gene representation, I used a sequential representation in which the cities are listed in the order they are visited. This is a common choice for a TSP genome.
Example: [9 3 4 0 1 2 5 7 6 8]
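The sequential representation can be sketched as a plain vector of city indices. The names below (City, Tour, city_distance, random_tour, tour_length, is_valid_tour) are hypothetical and chosen for this example; the article's own TSPGene class is not shown here. The sketch assumes cities are points in a plane with Euclidean distances.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

struct City { double x, y; };
using Tour = std::vector<std::size_t>;  // city indices in visiting order

double city_distance(const City& a, const City& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// A random tour is a random permutation of 0..n-1, so no city repeats.
Tour random_tour(std::size_t n, std::mt19937& rng) {
    Tour t(n);
    std::iota(t.begin(), t.end(), std::size_t{0});
    std::shuffle(t.begin(), t.end(), rng);
    return t;
}

// Cost of visiting the cities in order and returning to the start.
double tour_length(const Tour& t, const std::vector<City>& cities) {
    double total = 0.0;
    for (std::size_t i = 0; i < t.size(); ++i)
        total += city_distance(cities[t[i]], cities[t[(i + 1) % t.size()]]);
    return total;
}

// True when every index 0..n-1 appears exactly once.
bool is_valid_tour(const Tour& t) {
    std::vector<bool> seen(t.size(), false);
    for (std::size_t c : t) {
        if (c >= t.size() || seen[c]) return false;
        seen[c] = true;
    }
    return true;
}
```

The lower the tour length, the better the gene, so a GA built on this representation would minimize tour_length.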
For the crossover operation, after several tests and some research, I selected the Greedy Crossover by J. Grefenstette.
A citation from Sushil J. Louis:
"Greedy crossover selects the first city of one parent, compares the cities leaving that city in both parents, and chooses the closer one to extend the tour. If one city has already appeared in the tour, we choose the other city. If both cities have already appeared, we randomly select a non-selected city."
From my experience it's a very effective method.
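The quoted description of greedy crossover can be sketched as follows. This is an illustration written from the quotation, not the article's TSPGene(parent1, parent2) constructor. The names greedy_crossover and successor are hypothetical, and the distance matrix dist[a][b] is an assumed input. The successor lookup is a linear search, so this version costs O(n^2) per child; a real implementation would probably cache positions.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

using Tour = std::vector<std::size_t>;
using Matrix = std::vector<std::vector<double>>;  // dist[a][b]

// City visited right after `city` in `tour`, wrapping around at the end.
// Assumes `tour` is non-empty and contains `city`.
std::size_t successor(const Tour& tour, std::size_t city) {
    const auto pos = static_cast<std::size_t>(
        std::find(tour.begin(), tour.end(), city) - tour.begin());
    return tour[(pos + 1) % tour.size()];
}

// Builds one child from two parent tours over the same cities 0..n-1.
Tour greedy_crossover(const Tour& p1, const Tour& p2,
                      const Matrix& dist, std::mt19937& rng) {
    const std::size_t n = p1.size();
    Tour child;
    if (n == 0) return child;
    child.reserve(n);
    std::vector<bool> used(n, false);

    // Start with the first city of one parent.
    std::size_t current = p1.front();
    child.push_back(current);
    used[current] = true;

    while (child.size() < n) {
        // Cities leaving `current` in each parent.
        const std::size_t a = successor(p1, current);
        const std::size_t b = successor(p2, current);
        std::size_t next;
        if (!used[a] && !used[b]) {
            // Both are free: take the closer one.
            next = dist[current][a] <= dist[current][b] ? a : b;
        } else if (!used[a]) {
            next = a;
        } else if (!used[b]) {
            next = b;
        } else {
            // Both already appeared: pick a random unvisited city.
            std::vector<std::size_t> unvisited;
            for (std::size_t c = 0; c < n; ++c)
                if (!used[c]) unvisited.push_back(c);
            std::uniform_int_distribution<std::size_t> pick(0, unvisited.size() - 1);
            next = unvisited[pick(rng)];
        }
        child.push_back(next);
        used[next] = true;
        current = next;
    }
    return child;
}
```

Because every chosen city is marked as used, the child is always a valid permutation, which is the property ordinary bit-string crossover cannot guarantee for the TSP.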
Mutation
We can't flip bits in the gene as traditional mutation does. Instead, we must swap the order of cities in the path.
Example:
Before mutation [0 1 2 3 4 5 6]
After mutation [0 1 3 2 4 5 6]
There are many ways of doing such a swap. The easiest is a random swap. Unfortunately, this strategy is unable to reach an optimum quickly, but it can prevent convergence to a local optimum. Additionally, I used a greedy-swap mutation.
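A random swap takes only a few lines. The sketch below assumes a tour stored as a vector of city indices; the name random_swap_mutation is hypothetical.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Tour = std::vector<std::size_t>;

// Swaps the cities at two randomly chosen positions. The two positions
// may coincide, in which case the tour is left unchanged.
void random_swap_mutation(Tour& tour, std::mt19937& rng) {
    if (tour.size() < 2) return;
    std::uniform_int_distribution<std::size_t> pick(0, tour.size() - 1);
    const std::size_t i = pick(rng);
    const std::size_t j = pick(rng);
    std::swap(tour[i], tour[j]);
}
```

Since the operation only exchanges two entries, the result is still a permutation of the same cities.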
One more citation from Sushil J. Louis:
"The basic idea of greedy-swap is to randomly select two cities from one chromosome and swap them if the new (swapped) tour length is shorter than the old one"
While browsing the web, I discovered the GA research "A fast TSP solver using a genetic algorithm" by Hiroaki Sengoku and Ikuo Yoshihara. It's the fastest algorithm I have ever seen. Unfortunately, it's a Java implementation without any source. I could find only one PDF document describing the algorithm: arob98.pdf. After reading and studying it, I used the "Mutation by 2opt" idea in my code. This method is based on the same idea as greedy-swap mutation but is more expensive and more effective. After adding it to the code, the speed of my program improved greatly on small and medium sets (up to 200 cities). However, for big sets (1000 cities and more) this heuristic is very slow.
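To make the comparison concrete, here is a sketch of both ideas side by side. greedy_swap follows the quoted description directly. two_opt is the textbook 2-opt local search (reverse a segment whenever that replaces two edges with two shorter ones), which is my assumption of what "Mutation by 2opt" refers to; it is not taken from the Sengoku and Yoshihara paper or from the article's code. All names and the symmetric distance matrix dist[a][b] are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Tour = std::vector<std::size_t>;
using Matrix = std::vector<std::vector<double>>;  // dist[a][b], symmetric

double tour_length(const Tour& t, const Matrix& dist) {
    double total = 0.0;
    for (std::size_t i = 0; i < t.size(); ++i)
        total += dist[t[i]][t[(i + 1) % t.size()]];
    return total;
}

// Greedy swap: swap two random cities, keep the swap only if the tour
// gets shorter. Returns true when the swap was kept.
bool greedy_swap(Tour& t, const Matrix& dist, std::mt19937& rng) {
    if (t.size() < 2) return false;
    std::uniform_int_distribution<std::size_t> pick(0, t.size() - 1);
    const std::size_t i = pick(rng);
    const std::size_t j = pick(rng);
    const double before = tour_length(t, dist);
    std::swap(t[i], t[j]);
    if (tour_length(t, dist) < before) return true;
    std::swap(t[i], t[j]);  // not shorter: undo
    return false;
}

// 2-opt: for every pair of non-adjacent edges (t[i],t[i+1]) and
// (t[j],t[j+1]), reverse t[i+1..j] if that shortens the tour. Repeats
// until a full pass finds no improvement. Returns true if anything changed.
bool two_opt(Tour& t, const Matrix& dist) {
    const std::size_t n = t.size();
    if (n < 4) return false;
    bool changed = false;
    bool improved = true;
    while (improved) {
        improved = false;
        for (std::size_t i = 0; i + 2 < n; ++i) {
            for (std::size_t j = i + 2; j < n; ++j) {
                if (i == 0 && j == n - 1) continue;  // these edges share t[0]
                const std::size_t a = t[i], b = t[i + 1];
                const std::size_t c = t[j], e = t[(j + 1) % n];
                const double delta =
                    dist[a][c] + dist[b][e] - dist[a][b] - dist[c][e];
                if (delta < -1e-12) {
                    std::reverse(t.begin() + static_cast<std::ptrdiff_t>(i + 1),
                                 t.begin() + static_cast<std::ptrdiff_t>(j + 1));
                    improved = true;
                    changed = true;
                }
            }
        }
    }
    return changed;
}
```

Each 2-opt pass examines on the order of n^2 edge pairs, which is consistent with the slowdown on sets of 1000 cities and more.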
Selection
I implemented three selection methods: roulette rank, roulette cost, and tournament. Of course, I used elitism too.
Roulette Wheel Selection
Definition from Marek Obitko's Site:
"Cost Selection : Parents are selected according to their fitness. The better the chromosomes are, the more chances to be selected they have. Imagine a roulette wheel where are placed all chromosomes in the population, every has its place big accordingly to its fitness function
Rank Selection : The previous selection will have problems when the fitnesses differ very much. For example, if the best chromosome fitness is 90% of all the roulette wheel then the other chromosomes will have very few chances to be selected. Rank selection first ranks the population and then every chromosome receives fitness from this ranking. The worst will have fitness 1, second worst 2 etc. and the best will have fitness N (number of chromosomes in population)."
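Both roulette variants described in the quotation can share one wheel and differ only in the weights they put on it. The sketch below is an illustration with hypothetical names (spin, rank_weights, select_roulette_cost, select_roulette_rank); it is not the article's selection_roulette_cost<> or selection_roulette_rank<> classes. It assumes a non-empty vector of non-negative fitness values where higher is better, so a TSP travel cost would first have to be converted (for example, inverted) before being used as a weight.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Spins the wheel: returns index i with probability weights[i] / sum.
// Assumes `weights` is non-empty with a positive sum.
std::size_t spin(const std::vector<double>& weights, std::mt19937& rng) {
    const double total = std::accumulate(weights.begin(), weights.end(), 0.0);
    std::uniform_real_distribution<double> ball(0.0, total);
    double r = ball(rng);
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (r < weights[i]) return i;
        r -= weights[i];
    }
    return weights.size() - 1;  // guard against floating-point rounding
}

// Cost selection: the slice of the wheel is proportional to fitness.
std::size_t select_roulette_cost(const std::vector<double>& fitness,
                                 std::mt19937& rng) {
    return spin(fitness, rng);
}

// Rank weights: the worst chromosome gets 1, the second worst 2, and the
// best gets N, regardless of how far apart the raw fitness values are.
std::vector<double> rank_weights(const std::vector<double>& fitness) {
    std::vector<std::size_t> order(fitness.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return fitness[a] < fitness[b]; });
    std::vector<double> weights(fitness.size());
    for (std::size_t rank = 0; rank < order.size(); ++rank)
        weights[order[rank]] = static_cast<double>(rank + 1);
    return weights;
}

// Rank selection: spin the wheel on rank weights instead of raw fitness.
std::size_t select_roulette_rank(const std::vector<double>& fitness,
                                 std::mt19937& rng) {
    return spin(rank_weights(fitness), rng);
}
```

With fitness values {90, 5, 3, 2}, cost selection gives the best chromosome 90% of the wheel, while rank selection gives it 4 parts out of 10, which is the effect the quotation describes.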
Tournament Selection and Elitism
Definition from W. B. Langdon, University College, London:
"A mechanism for choosing individuals from a population. A group (typically between 2 and 7 individuals) are selected at random from the population and the best (normally only one, but possibly more) is chosen. An elitist genetic algorithm is one that always retains in the population the best individual found so far. Tournament selection is naturally elitist."
In my opinion, and after testing, roulette rank and tournament selection are slightly faster for the TSP case. For other problems, other selection algorithms may work best.
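Tournament selection and elitism, as defined in the quotation above, can be sketched in a few lines. The names tournament_select and elite_indices are hypothetical, and this is not the article's selection_tournament<> class. The sketch works on travel costs, so the lowest value wins, and it assumes a non-empty cost vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Picks `groupSize` individuals at random (with repetition) and returns
// the index of the one with the lowest cost.
std::size_t tournament_select(const std::vector<double>& cost,
                              std::size_t groupSize, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, cost.size() - 1);
    std::size_t best = pick(rng);
    for (std::size_t round = 1; round < groupSize; ++round) {
        const std::size_t rival = pick(rng);
        if (cost[rival] < cost[best]) best = rival;
    }
    return best;
}

// Elitism: indices of the `elite` cheapest individuals, best first.
// These would be copied unchanged into the next population.
std::vector<std::size_t> elite_indices(const std::vector<double>& cost,
                                       std::size_t elite) {
    std::vector<std::size_t> idx(cost.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    elite = std::min(elite, idx.size());
    std::partial_sort(idx.begin(),
                      idx.begin() + static_cast<std::ptrdiff_t>(elite),
                      idx.end(),
                      [&](std::size_t a, std::size_t b) { return cost[a] < cost[b]; });
    idx.resize(elite);
    return idx;
}
```

A larger group size raises the selection pressure: with a group of 2 the weaker individuals still win fairly often, while with a group of 7 they almost never do.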
Co-evolutions. Migrations
I didn't find many documents about these methods on the web. The basic idea is to allow several populations to evolve at the same time. A description from the Generation5.org site:
"Genetic algorithms are neat, but they do come with their own set of problems. One big problem is that genetic algorithms have a tendency to get stuck at local optima. In other words, they will find a reasonable solution, but not the best solution. There are several methods that have been devised to counter this problem, and the one we will look at is coevolution"
In addition, co-evolution makes it possible to use the SMP capability of WinNT/2k machines with multiple CPUs. We can easily run several GAs in separate threads without any penalty. To exchange data between different GAs, we can migrate the best genes between populations.
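One possible migration scheme is a ring: each population sends copies of its best genes to its neighbor, where they replace the worst genes. The sketch below is single-threaded and purely illustrative. The ring topology and the names Individual, Population, sort_best_first, and migrate are assumptions, not the article's migration method. In a threaded program, access to the populations would additionally need synchronization.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Individual {
    std::vector<std::size_t> tour;
    double cost;  // lower is better
};
using Population = std::vector<Individual>;

void sort_best_first(Population& p) {
    std::sort(p.begin(), p.end(),
              [](const Individual& a, const Individual& b) { return a.cost < b.cost; });
}

// Ring migration: copies of the best `count` individuals of population i
// replace the worst individuals of population (i + 1) % n.
// If `count` exceeds half of a population, migrants also overwrite some
// of the target's better individuals.
void migrate(std::vector<Population>& islands, std::size_t count) {
    const std::size_t n = islands.size();
    if (n < 2) return;
    for (auto& p : islands) sort_best_first(p);

    // Snapshot all emigrants first, so a migrant that has just arrived
    // is not forwarded again in the same round.
    std::vector<Population> emigrants(n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t take = std::min(count, islands[i].size());
        emigrants[i].assign(islands[i].begin(),
                            islands[i].begin() + static_cast<std::ptrdiff_t>(take));
    }
    for (std::size_t i = 0; i < n; ++i) {
        Population& target = islands[(i + 1) % n];
        const Population& incoming = emigrants[i];
        const std::size_t m = std::min(incoming.size(), target.size());
        std::copy(incoming.begin(),
                  incoming.begin() + static_cast<std::ptrdiff_t>(m),
                  target.end() - static_cast<std::ptrdiff_t>(m));
        sort_best_first(target);
    }
}
```

Migrating only a few genes lets each population keep its own diversity while still spreading good partial solutions.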
Base implementation
Template class GA<> and GA Selection classes
The GA<> class implements the base logic of genetic algorithms: recombination and mutation. The user can manage the gene population via the methods of the GA<> class. It holds a gene population and gene context, selection methods, and a method of randomization. The user can specify the behavior of this class via template parameters.
template <typename Traits, typename Selection> class GA { ... };
Template parameter Traits must define typedefs for the Gene class, the Random class, the Population container class, and the Thread Synchronize class.
Template parameter Selection must provide the GA selection algorithm. Currently there are three such classes: selection_tournament<>, selection_roulette_cost<>, and selection_roulette_rank<>.
GA<> interface
- init: initializes the population
- update: computes fitness values and returns the optimal gene or end()
- find_best: finds the gene with the best fitness
- epoch: makes the next population (selection, crossover, mutation, etc.)
- recombine: performs selection, produces new genes, removes non-elite parents, and removes twin genes
- mutate: attempts to mutate genes
- migration: exchanges the best genes between populations
- sort: orders the genes in the population by fitness value (moves the best to the beginning of the population)
- begin: returns an iterator to the first (best) gene of the population
- end: returns an iterator that points just beyond the end of the population
- size: returns the size of the population
An example of using the GA<> class:
typedef ga_traits<RandomCRT> traits;
typedef GA<traits, selection_tournament<traits> > tGA;

traits::Gene::Context context;
tGA ga(50, &context);

// update() returns end() until an optimal gene is found
tGA::iterator it = ga.update();
while (it == ga.end())
{
    ga.epoch();        // make the next population
    it = ga.update();  // recompute fitness values
}
traits::Gene* gnp = (*it);
Genome of Travel
class TSPData
The travel context holds the travel data, auxiliary data, and methods for the crossover operation.
struct TSPBase
The base gene class has a thread-specific memory pool for memory optimization. During computation, the GA creates and destroys a lot of dynamic gene objects, and the default memory allocation routine is very inefficient for this. Instead, I used a special memory pool class, which pre-allocates a large block of memory from the process heap and then caches freed blocks.
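The idea of such a pool can be sketched as a fixed-size block allocator with a free list. This is a generic illustration, not the article's pool class: the name FixedPool and its interface are hypothetical, and the sketch is not thread-safe by itself (one instance per thread would be needed to make it thread-specific).

```cpp
#include <algorithm>
#include <cstddef>
#include <new>
#include <vector>

// Hands out blocks of one fixed size. Memory is requested from the heap
// in large chunks, and freed blocks are cached on a free list for reuse.
class FixedPool {
    struct Node { Node* next; };  // link stored inside each free block

public:
    FixedPool(std::size_t blockSize, std::size_t blocksPerChunk)
        : blockSize_(round_up(std::max(blockSize, sizeof(Node)))),
          blocksPerChunk_(std::max<std::size_t>(blocksPerChunk, 1)) {}

    ~FixedPool() {
        for (void* chunk : chunks_) ::operator delete(chunk);
    }

    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // Returns a block of at least blockSize bytes.
    void* allocate() {
        if (free_ == nullptr) grow();
        Node* block = free_;
        free_ = block->next;
        return block;
    }

    // Puts a block obtained from allocate() back on the free list.
    // `p` must not be null.
    void deallocate(void* p) {
        free_ = new (p) Node{free_};
    }

private:
    // Rounds up so that every block in a chunk stays maximally aligned.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    // Pre-allocates one more chunk and threads its blocks onto the free list.
    void grow() {
        char* chunk = static_cast<char*>(::operator new(blockSize_ * blocksPerChunk_));
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < blocksPerChunk_; ++i)
            deallocate(chunk + i * blockSize_);
    }

    std::size_t blockSize_;
    std::size_t blocksPerChunk_;
    Node* free_ = nullptr;
    std::vector<void*> chunks_;
};
```

A gene class could route its class-level operator new and operator delete through such a pool, so that creating and destroying genes in every epoch rarely touches the general-purpose heap.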
class TSPGene<> : TSPBase
Gene implementation
Every gene holds a path (travel) of the salesman and the fitness value of this travel. Of course, the lower the cost of travel, the better the fitness of the gene. It has several constructors and methods for mutation and heuristics computation. The default constructor creates a gene with a random travel, and the crossover operation uses another constructor:
TSPGene* gnp = new TSPGene(parent1, parent2);
It creates offspring from the genetic material of the parents' genes.
TSP Application, GA thread
For every co-evolution, the _Main class creates a separate thread with an instance of the GA<> class. Depending on the user's settings, it creates a GA with one of the three selection methods and sets the population size, elite size, migration size, heuristics value, and mutation and crossover probabilities.
Then, every turn (or, in AI terms, "epoch"), it calls the recombination, mutation, and heuristics methods and updates the application's board with the result of the best gene. It also tries to prevent convergence to a local optimum by killing 50% of the population if the fitness stays unchanged for some time.
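The "kill 50% of the population" safeguard could look roughly like the sketch below. The article does not say how long the fitness must stay unchanged or what replaces the killed genes, so the patience parameter and the choice of re-randomizing the worst half are assumptions, as are the names StagnationGuard and cull_worst_half.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using Tour = std::vector<std::size_t>;

// Tracks the best cost across epochs and reports when it has stopped improving.
class StagnationGuard {
public:
    explicit StagnationGuard(int patience) : patience_(patience) {}

    // Call once per epoch with the current best cost (lower is better).
    // Returns true after `patience` consecutive epochs without improvement,
    // then starts counting again.
    bool stalled(double bestCost) {
        if (bestCost < best_) {
            best_ = bestCost;
            idle_ = 0;
            return false;
        }
        if (++idle_ < patience_) return false;
        idle_ = 0;
        return true;
    }

private:
    int patience_;
    int idle_ = 0;
    double best_ = std::numeric_limits<double>::infinity();
};

// Replaces the worst half of a population that is sorted best-first with
// random tours. Shuffling a valid tour yields another valid tour. The
// caller has to recompute the costs of the shuffled tours afterwards.
void cull_worst_half(std::vector<Tour>& population, std::mt19937& rng) {
    for (std::size_t i = population.size() / 2; i < population.size(); ++i)
        std::shuffle(population[i].begin(), population[i].end(), rng);
}
```

Inside the epoch loop, this would be used as: if (guard.stalled(bestCost)) cull_worst_half(population, rng);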
TSP Application, User Interface
The user can change the GA settings to tune it for the size and shape of the set of points.
- Co-evolution field: the number of co-evolutions (separate threads), from 1 to 16. The default value is the number of CPUs * 2.
- Population field: the population size per co-evolution, from 10 to 1000.
- Elite field: the elite size in the population, from 0 to the size of the population.
- Migration field: the number of migrated genes, from 0 to the size of the population.
- Heuristics field: the number of best genes improved via the heuristics method in every epoch. Note: with large populations, this can make the calculation extremely slow.
- Crossover field: probability of crossover, from 0 to 100.
- Mutation field: probability of mutation, from 0 to 100.
- Selection combo: selection method (roulette rank, roulette cost, or tournament).
- Remove Twins checkbox: enables a natural selection algorithm that eliminates similar genes to avoid premature convergence.
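As an illustration of the Remove Twins option, the sketch below treats two genes as twins only when their tours are identical. That is an assumption: the article speaks of "similar" genes without defining similarity, and a stricter check would also catch rotated or reversed copies of the same route. The names Individual, Population, and remove_twins are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Individual {
    std::vector<std::size_t> tour;
    double cost;  // lower is better
};
using Population = std::vector<Individual>;

// Sorts the population best-first and erases every individual whose tour
// duplicates an earlier one. Returns the number of individuals removed.
std::size_t remove_twins(Population& pop) {
    // Identical tours have identical costs, so sorting by (cost, tour)
    // places twins next to each other.
    std::sort(pop.begin(), pop.end(),
              [](const Individual& a, const Individual& b) {
                  if (a.cost != b.cost) return a.cost < b.cost;
                  return a.tour < b.tour;
              });
    const auto last = std::unique(pop.begin(), pop.end(),
                                  [](const Individual& a, const Individual& b) {
                                      return a.tour == b.tour;
                                  });
    const auto removed = static_cast<std::size_t>(pop.end() - last);
    pop.erase(last, pop.end());
    return removed;
}
```

The freed slots would then be refilled with new offspring or random genes, which keeps the population from collapsing onto a single tour.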
Environment
I used VC++ 6.0 SP5, Win2k SP2, and MS Platform SDK April 2001.
I tested on Win2k SP2 IE 6.0, Win2k SP1 IE 5.0, Win ME IE 5.5, and Win 98 SE IE 5.0.
References
- S. Hsiung and J. Matthews, Generation 5 - Genetic Algorithms and Genetic Programming
- Marek Obitko, Introduction to Genetic Algorithms
- Hiroaki Sengoku and Ikuo Yoshihara, A fast TSP solver using a genetic algorithm
- Sushil J. Louis and Rilun Tang,
- Sergey Isaev, Genetic Algorithm
- W. B. Langdon, Genetic Programming and Data Structures
- Solving Traveling Salesman Problem