Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / artificial-intelligence / machine-learning
Print

Lesson to Learn: Introduction to Genetic Algorithms

4.97/5 (15 votes)
15 Dec 2018MIT13 min read 22K   680  
An introduction to Genetic Algorithms with brief reference to biology and example of finding one solution for complex mathematical equation

C# and C++ examples were created using MS Visual Studio Community 2015. Java example was created using Eclipse neon.3. Source codes are compatible with different platforms.

Introduction

This article is an introduction to Genetic Algorithms that are widely used in such modern technologies as Machine Learning and Artificial Intelligence. The article is written for students starting to learn programming as additional reading to their main course to motivate them to study advanced subjects. It can also be useful for all who would like to improve their knowledge and skills.

The author tried to make the article and program source codes as simple as possible and hope the reader will enjoy reading the text and running the codes. The author will be glad to get any feedback from the readers to improve this reading.

Contents

Background

Motivation

Being a lecturer in Baku State University, the author observes that the students have different levels of the programming skills and talent. Education of young professionals require not only knowledge transfer, but also motivation to study and to self-develop. It becomes important to provide additional readings for the students to expand their vision and to understanding informational technologies. These additional readings are not obligatory for them, but are highly recommended. Talented and skilled students gain practical experience while others enrich their knowledge.

How to Read the Article

The article consists of two main parts: "Biological Background" and "Implementation of the Algorithm". Chapter "Biological Background" considers an example of getting a dog specimen with required quality. The example is very simple. Real biological processes are significantly complex. But the main purpose of the example is explanation of conceptions that are used in the Genetic Algorithms. The chapter could seem unnecessary and annoying for the experienced professionals, but for newbies, this introduction may be essential.

Chapter "Implementation of the Algorithm" explains how Genetic Algorithm can be programmed and run. The chapter describes implementation of the algorithm using three programming languages: C#, Java, and C++. Source code is available for download (see links at the top of the article).

Biological Background

The idea of the Genetic Algorithms was peeped from the nature. All living organisms can be changed from generation to generation that allow them to adapt to the changes in the environment. People have been using this aspect for breeding new varieties of animals and plants with desirable qualities. Let us consider a simple example. This example gives a clue about how the process works and what some terms mean.

Suppose there is a flock of gray dogs and it is required to get white ones. The picture below shows the existing flock.

Image 1

If studying the flock, the reader can find some specimens which are not totally gray but have small white spots. This is because there are mutations that have taken place at their reproduction (it is a law of nature). Terms "mutation" and "reproduction" are used in the Genetic Algorithms also. Thus, these conceptions will be used in future description. Term "population" will be also used instead of "flock" due to the same reason.

As the next step, let us select from the population the specimens with the white spots. "Selection" is also a term used in the Genetic Algorithms (as well as in biology and agriculture). Selected specimens are shown in the picture below.

Image 2

If selected specimens are reproduced, new population will be got. While qualities of new specimens (children) are combinations of qualities of their parents, casual changes in the qualities, named "mutations" take the place during reproduction (It is a law of nature. Scientists use radiation to amplify mutation). Because mutations are random, final population will have three types of specimens:

  • Specimens with less or no white spots
  • Specimens with the same size of white spots (Majority of specimens in most cases. We can speak these specimens were not mutated.)
  • Specimens with bigger white spots

The picture below shows new population. The reader can compare it with the first population to see the differences.

Image 3

Next step is selection of specimens with the bigger white spots. The picture below shows the selected specimens. The reader can also compare this selection with the previous one.

Image 4

And reproduce them to get the next population, that is shown in the picture below.

Image 5

Selection of "better" specimens, reproduction of them to new population and repeat this process again and again will lead to getting "better" and "better" specimens till fully white ("the best") ones appear. Pictures below show the progress.

Selection 3.

Image 6

Population, burn to the selection 3.

Image 7

Selection 4.

Image 8

Population, burn to the selection 4.

Image 9

Selection 5.

Image 10

Population, burn to the selection 5.

Image 11

For this population, some fully white specimens appear at last. The problem is solved. But if try to continue the process, "white" population will appear. But some species will have gray spots because of the mutation. Although continuation of the process can be useful in biology and agriculture, many practical Genetic Algorithms stop as soon as the solution is found. Only small amount of specific cases require Genetic Algorithms to continue (for example, AI solutions simulating continues processes).

"White" selection.

Image 12

"White" population.

Image 13

Summary for the Chapter

The summary for the chapter can be as follows:

  • Term "specimen" means one instance/item.
  • Term "population" means pack of specimens generally of the same generation.
  • Term "selection" means set of specimens, selected from the population according to the desired criteria for future reproduction.
  • Term "reproduction" means getting new population (new generation) from last selection to increase number of specimens with creating new specimens, combined from their parents.
  • Term "genes" means internal structures of specimens which are responsible to form the qualities of their owners.
  • Term "mutation" means casual changes in qualities of the specimens. In other words, mutation is casual changes in genes.
  • Continues Selection, Reproduction with casual Mutations lead to getting specimens with the desired qualities.

Implementation of the Algorithm

Clear task statement is essential before implementation of the Genetic Algorithm. After the problem is completely clear, the model must be build. The model must reflect the biological process adopted to computation. The model requires formulating the following:

  • What specimen is
  • What genes are
  • How specimens are reproduced to build the population
  • How mutation can be implemented
  • What size the population is
  • What size the selection is
  • How the specimen can be quantitative evaluated for affinity to the solution
  • When the specimen can be considered as the solution

For simple problems, the model can be coded directly into computer program. Complex problems might require mathematician modeling, analyses and design with special software tools. Considering example is simple and the reader can find elements of the model in the text and source files.

Task Statement

Let's consider the following equation:

19.39281272 * X5 + 7.82018991 * X4 + 35.12849546 * X3 - 28.09127103 * X2 + 3.30129489 * X = 20351.07006276

This is a fifth degree equation that is hard to solve by simple mathematical methods. It is a good example to start studying Genetic Algorithms because it is simple to understand, model, and program.

Implementation

Genetic Algorithm is implemented using Object Oriented Programming. Although three programming languages were used, main classes, properties and methods are quite similar. Description of the source code, that is stated below allows to understand the model in detail (source codes can be downloaded by the links at the top of the article).

Class Specimen

Class Specimen represents an instance of the equation. The required equation can be represented in the following view:

F = 19.39281272 * X5 + 7.82018991 * X4 + 35.12849546 * X3 - 28.09127103 * X2 + 3.30129489 * X - 20351.07006276

In this case, the equation is designated to be a specimen where distance of F to zero is the affinity of X to the solution. Because X is a variable, it is selected to be a gene.

Initial value for the gene is set to 10 (see source codes), because the value 10 powered to 5 is 100000 that is close to 20351.07006276 by the range. This value for the gene is not important - it influences only number of iterations and computation time (white dogs can be got even from flock of black dogs). For the current example, it can be set to any value instead of 0 (current Mutation function is simple and cannot mutate zero genes). The reader can set gene to different values, run the example, and observe how the algorithm works.

Class Specimen contains method CalculateAffinity() that implements calculation of the equation for current gene value. CalculateAffinity() sets calculated value to the member Affinity that is used for sorting the population in Class SpecimenWorkshop for future selection. Calculation of Affinity before sorting significantly reduces calculation time.

C++ version of Class Specimen contains method Clone() that makes the memory management code better readable. C# and Java have the Garbage Collector that eliminates the need to care about the memory management in our simple case (some memory management techniques can improve performance in multithreading high-load solutions even for C# or Java).

using System;

namespace L2L_I2GA
{
    // Class that represents the Specimen
    public class Specimen
    {
        // The genes
        public double[] Genes;

        // Affinity to the solution
        public double Affinity;

        // Constructor that creates the Specimen
        // instance with initial genes
        public Specimen()
        {
            // Assume one double value as the genes
            Genes = new double[1];

            // Set up initial value to the genes
            Genes[0] = 10;
        }

        // Calculates affinity of this instance to the solution
        // Contains the equation formula
        public void CalculateAffinity()
        {
            double s1 = Math.Pow(Genes[0], 5) * 19.39281272;
            double s2 = Math.Pow(Genes[0], 4) *  7.82018991;
            double s3 = Math.Pow(Genes[0], 3) * 35.12849546;
            double s4 = Math.Pow(Genes[0], 2) * 28.09127103;
            double s5 = Genes[0] * 3.30129489;

            double f = s1 + s2 + s3 - s4 + s5;

            // Need positive value to evaluate proximity
            // 0 is the best value
            Affinity = Math.Abs(f - 20351.07006276);
        }
    }
}

Class SpecimenWorkshop

Class SpecimenWorkshop is designed to manipulate by the specimens, their population, and selection. Reproduction and mutation of specimens are random processes. Thus, C# and Java versions of the class contain the random number generator. C++ version of the class uses function rand() instead. PopulationSize and SelectionSize represent sizes of population and selection accordingly. MaxMutationPercent is degree of gene change. It can be set to any value. MutationLikelyhoodPercent assigned likelihood of mutations that allows to have mutated only part of specimens while other part is kept intact. Epsilon is used to set accuracy when affinity compares with the target value (value that meaning finding the solution). Genetic algorithm has huge volume of computations. Increasing of accuracy that can be set by reducing the Epsilon can significantly increase computational time. The reader can change the value and observe how the example works.

The Class implements two methods GeneratePopulation. One method builds initial population. Mutation for all specimens for initial population produces the most-different specimens that increase the chance to get specimens which are close to the solution. Otherwise, the population will contain similar specimens that are useless for future calculations (selection of dogs from the population where all the specimens have white spots is better than from the population where majority of them will have no white spots).

The second method GeneratePopulation produces new population on the base of reproduction of pairs taken randomly from the selection. Reproduction algorithm used in this example is simple but works fine. It selects two parents randomly and creates a child for them. Another reproduction strategy could improve the algorithm. There are lots of different strategies that can be used. Reproduction of the first (the best) specimen in pairs with other ones in the selection, re-reproduction of the children can be examples of other strategies that can also be used. Keeping the specimens from the selection in new population or removing them are also two different strategies. Skilled reader can test different strategies.

Method ReproduceNew creates a new "child" specimen for two parents. It sets the child's gene to average of genes of both its parents. After creating, the gene mutates (or not) according to the values of MutationLikelyhoodPercent and MaxMutationPercent.

Method Select is implemented by performing two main actions. First of all, the method sorts the population in way the array contains the best specimens at the beginning and the worst specimens at the end. At the second step, the best specimens are being copied to selection for future calculations.

using System;

namespace L2L_I2GA
{
    // Class that performs main actions with the specimens and the population
    public static class SpecimenWorkshop
    {
        // Random generator
        static Random rnd = new Random();

        // Number of specimens in the population
        public static int PopulationSize;

        // Number of specimens in the selection 
        public static int SelectionSize;

        // Power of mutations
        // 0 - no mutations
        // 100 - mutations up to whole gene value
        // 200 - mutations up to twice gene value
        public static double MaxMutationPercent;

        // Likelihood of mutation
        // 0 - no mutations, 100 - mutations for all cases
        public static int MutationLikelyhoodPercent;

        // Maximal affinity that is considered 
        // for the solution found
        public static double Epsilon;

        // Generate initial population
        public static Specimen[] GeneratePopulation()
        {
            // Creates array representing the population
            Specimen[] p = new Specimen[PopulationSize];

            // Creates specimens
            // Mutation of all specimens in initial population
            // increases variance that increases chance to
            // get better instance.
            for (int i = 0; i < PopulationSize; i++)
            {
                p[i] = new Specimen();
                Mutate(p[i]);

                // Calculate Affinity for new specimens
                p[i].CalculateAffinity();
            }

            return p;
        }

        // Generate population by reproduction of selection
        public static Specimen[] GeneratePopulation(Specimen[] selection)
        {
            // Creates array representing the population
            Specimen[] p = new Specimen[PopulationSize];

            // Copy instances from the selection to keep them
            // in new generation of population
            for (int i = 0; i < SelectionSize; i++)
            {
                p[i] = selection[i];
            }

            // Creates new specimens by reproducing two parents
            // Parents are selected randomly from the selection.
            int chield_index = SelectionSize;
            int parent1_index;
            int parent2_index;

            while(chield_index < PopulationSize)
            {
                // Select two parents randomly in way
                // they are different instances
                do
                {
                    parent1_index = rnd.Next(selection.Length);
                    parent2_index = rnd.Next(selection.Length);
                } while (parent1_index == parent2_index);

                // Creates new specimen
                p[chield_index] = ReproduceNew(
                          selection[parent1_index], 
                          selection[parent2_index]);

                chield_index++;
            }

            return p;
        }

        // Reproduce new specimen on base of two parents
        public static Specimen ReproduceNew(Specimen a, Specimen b)
        {
            Specimen s= new Specimen();

            // Inherit genes as the average oh the parents' genes
            s.Genes[0] = (a.Genes[0] + b.Genes[0]) / 2;

            // Mutate if likelihood allows
            int ml = rnd.Next(101);
            if (ml <= MutationLikelyhoodPercent)
            {
                Mutate(s);
            }

            // Calculate Affinity for new specimen
            s.CalculateAffinity();

            return s;
        }

        // Select the best specimens from the population
        public static Specimen[] Select(Specimen[] population)
        {
            // Sort population by increasing the affinity
            // The best specimens are moving to start of the array
            Sort(population);

            // Create set of selected specimens
            Specimen[] selected = new Specimen[SelectionSize];

            // Copy best specimens into the selection
            for (int i = 0; i < SelectionSize; i++)
            {
                selected[i] = population[i];
            }

            return selected;
        }

        // Sort the population
        public static void Sort(Specimen[] population)
        {
            // Use standard procedure
            Array.Sort<specimen>(population, CompareSpecimens);
        }

        // Comparison function for the standard Sort procedure
        public static int CompareSpecimens(Specimen a, Specimen b)
        {
            if(a.Affinity < b.Affinity)
            {
                return -1;
            } else
            if (a.Affinity > b.Affinity)
            {
                return 1;
            }
            else
            {
                return 0;
            }
        }

        // Mutate the specimen
        public static void Mutate(Specimen sp)
        {
            // Calculate Mutation Factor that is random value between 0 and MaxMutationPercent
            // calculates as ratio between 0 and 1
            double MutationFactor = (MaxMutationPercent / 100.0) * (rnd.Next(1001) / 1000.0);

            // Set mutation to negative with 50% likelihood
            if (rnd.Next(10) < 5)
            {
                MutationFactor = -MutationFactor;
            }

            // Calculate new gene
            sp.Genes[0] = sp.Genes[0] * (1 + MutationFactor);
        }
    }
}

Class Solver

Class Solver represents the Genetic Algorithm at the highest abstraction level. Method Initialize() initializes the algorithm by setting up options and generating initial population. Method Run() runs the calculation loop until the solution is found. Calculation loop consists of two main steps - selection of the best specimens from the current population and renew the current population by reproduction of the selected specimens. The loop stops if affinity of the best specimen from the selection is close enough to the zero (for the given example). Gene of this specimen contains the solution for the equation. C++ version of the Class contains also code for memory management (deletion of old objects).

using System;

namespace L2L_I2GA
{
    // Class that implements the Genetic Algorithm
    // at the highest abstraction level
    public static class Solver
    {
        // Current Population
        static Specimen[] Current_Population;

        // Current Selection
        static Specimen[] Current_Selection;

        // Current Proximity
        static double Current_Proximity;

        // Current Iteration
        static int Current_Iteration;

        // Initialize the algorithm
        public static void Initialize()
        {
            // Set up options for the algorithm
            SpecimenWorkshop.PopulationSize = 10000;
            SpecimenWorkshop.SelectionSize = 1000;
            SpecimenWorkshop.MaxMutationPercent = 500;
            SpecimenWorkshop.MutationLikelyhoodPercent = 90;
            SpecimenWorkshop.Epsilon = 0.000000001;

            // Generate initial population
            Current_Population = SpecimenWorkshop.GeneratePopulation();
            Current_Iteration = 0;
        }

        // Run the algorithm
        public static void Run()
        {
            // Set Current_Proximity to the biggest value
            Current_Proximity = double.MaxValue;

            // Loop while Current_Proximity is not less than the Epsilon
            while (SpecimenWorkshop.Epsilon <= Current_Proximity)
            {
                // Select the best specimens
                Current_Selection = SpecimenWorkshop.Select(Current_Population);

                // Calculate proximity for the top-selected (the best) specimen
                Current_Proximity = Current_Selection[0].Affinity;

                // Report proximity and found solution for the current iteration
                Console.WriteLine(
                                         "{0}\t{1}\t{2}",
                                         Current_Iteration,
                                         Current_Proximity,
                                         Current_Selection[0].Genes[0]);

                // End the calculations if Current_Proximity is less than the Epsilon
                if (Current_Proximity < SpecimenWorkshop.Epsilon)
                {
                    break;
                }

                // Generate new population by reproducing specimens from the selection
                Current_Population =
                                     SpecimenWorkshop.GeneratePopulation(Current_Selection);

                Current_Iteration++;                
            }
        }
    }
}

Method Main

Method Main is quite simple. It just contains invocation of methods of class Solver and commands to close the console window after pressing Enter key.

using System;

namespace L2L_I2GA
{
    class Program
    {
        static void Main(string[] args)
        {
            Solver.Initialize();
            Solver.Run();

            Console.WriteLine("Press Enter to exit.");
            Console.ReadLine();
        }
    }
}

Output

Genetic Algorithm contains many random operations. Because of this fact, the output will be different for each run. Output of one of the runs looks like the picture below:

Image 14

Possible Drawbacks

Genetic Algorithm contains fuzzy and random calculations. Although it can solve very difficult problems, it can be unstable and falling down into infinite loop. Picture presented in chapter Output shows that iterations [1, 2, 3], [7, 8], [9, 10], [13, 14], [17, 18], [19, 20], and [25, 26] contain the same best solution (look like copies). This fact is evidence that for some cases, the reproduction and mutation did not produce any specimen that was better than the existing and did not the process go closer to the solution. In other words, 8 of 27 of reproduced populations were not useful and the calculation time for processing of these populations was lost.

By change of the algorithm options that are located in method Initialize() of Class Solver, the reader can study different modes of working of the example. Reducing of PopulationSize and SelectionSize for example, as well as reducing of MaxMutationPercent and MutationLikelyhoodPercent can put the algorithm into an infinite loop.

Fortunately, the values for the options can be set by trying running the example and observe how it works. Other real problems (other equations and so on) can be easier or harder to solve and require selection of other options accordingly. There are also problems impossible to solve. Genetic Algorithms are very flexible to be adapted for better solving specific kind of problems (see my other article "A Look into the Future - Source Code Generation by the Bots").

Mathematic methods can be used to evaluate solveability of the specific problem and adapt the algorithm for it. But this is outside the scope ot this article.

Homework

The name of this chapter may evoke a smile. I have just used the term from the education sphere. But this chapter can be considered as a reminder that the reader can not only read the text and run the examples, but also think of "How deep the rabbit hole is" and try to do some extra studies by himself.

First of all, the reader can review and find equalities between the source code and the biological processes. The author described the main conceptions, but Biology Processes and Genetic Algorithms are quite varied and deep for finding something interesting that was not mentioned.

As the second, the reader can run the sources with different options and for different equations. Such exercise helps him to get the practical experience.

Those who are skilled in programming can try other strategies of reproduction and mutations. They can also try to solve systems of equations with more than one variable, in other words, with more than one gene (the examples uses array of genes providing some scalability).

My article "A Look into the Future - Source Code Generation by the Bots" (opens in new window) describes the use of Genetic Algorithm for automatic generating of new function according to the requirements. Although the article seems quite advanced, it shows some interesting abilities of the Genetic Algorithms.

History

  • 15th December, 2018: Initial release

License

This article, along with any associated source code and files, is licensed under The MIT License