Homework 2

Exploring Genetic Algorithms


Due: Monday, Oct. 2, 10:00 AM


Disclaimer

Admittedly, this isn't the BEST example of a domain in which to use genetic algorithms because they are most often used in situations where we don't know what the "right" answer is. In this problem we will.

Having said that, this homework is very manageable in size and difficulty and therefore makes a good chance for you to explore the key ideas of Genetic Algorithms easily.

Introduction

There are several ways to encode the value of a particular color on the computer. One of these is the RGB method. In this method each of the three color channels (Red, Green, and Blue) is stored in a single byte or 8 bits. Since a byte stores 256 different values this means that each of the color channels is represented with a value from 0 to 255 representing none of that color at 0 to "full on" at 255.

Thus, Black is (0,0,0), pure red is (255,0,0) and medium gray would be (128,128,128).

If you want to learn a little bit more about this idea you can play with the simple applet at:

http://www.rapidtables.com/web/color/RGB_Color.htm

Simply change the values of R, G, and B to be in the range of 0-255 and see what color comes out. Alternately, play with the color picker tools (the slider and "pointer" on the left side) to see the RGB values of a given color.

 

These colors are actually stored as a length 24 bit string. 8 bits (one byte) for Red, 8 bits for Green, and 8 bits for Blue. Thus, Black would be stored as 000000000000000000000000, full on red would be stored as 111111110000000000000000 and medium gray would be stored as 100000001000000010000000.

This bit string is an ideal mechanism to explore a genetic algorithm as it is a very manageable yet meaningful chromesome.

 

A Fitness Function

In this assignment you will be attempting to build a target bit string (a target color) using a population of bit strings (other colors) and the genetic algorithm process.

One of the key elements of any genetic algorithm is a "fitness function" This is a function which provides a score that we use to assess the "quailty" of any given chromosome in the population. Of course, quality is an assesment of how "close" you are to the "goal." In many GA sitautions, where the actual goal isn't fully understood, this is some sort of score where a high score means you are performing better than other chromsomes. BUT, in this situation we have a known, target color. In this case we want to know how close we are to that color. There is actually a formula for determining the distance between two colors

Notice that this is simply a three dimensional extension of Euclidean distance formula (related to Pythagorean's theorem).

Thus, pure red (255,0,0) and medium gray (128,128,128) are a bit larger than 221 units apart. (Check my math and let me know if you disagree).

Also note that if we are storing our chromsomes as bit strings you will need to provide code that converts a length 24 bit string into it's three integer components (converts "1111111110000000010000000" to (255, 0, 128) to make these calculations).

 

Ways to generate new chromosome's from population to population

In the reading from your textbook you read about three genetic operators - methods for moving chromosomes from one generation to the next.

  1. Crossover - two DIFFERENT chromosomes produce two children through a process of "mating" A random cross point is selected and the front "half" of one of the pair is merged with the back "half" of the other one of the pair. Two pairs of examples are shown here
  2. Mutation - a chromosome is moved from generation N to generation N+1 with one or more of the elements of the chromosome are "mutated" (in our case, a bit is changed from 0 to 1 or 1 to 0). The mutation rate can be adjusted from low mutation (only 1 gene/bit is modified) to high mutation (up to every element of the bit string). We will start by only considering mutation rate of 1.
  3. Selection - a chromosome is moved from generation N to generation N+1 with no changes at all. This is actually just a special case of mutation with a mutation rate of 0.

The first two are almost always used. The third one is used in certain situations. I am also going to ask you to consider one additional operator:

  1. New Blood - This is really just an adaptation of Mutation with a mutation rate of length N.

 

Selecting a Chromosome to modify

In order to mimic nature and the concept of "survival" of the fittest" we need to come up with a technique which allows every chromosome at least a limited chance of being picked to undergo a genetic change through one of the methods explained above. But those chromosomes that are "fittest" (in our case, close to the target color) should have a much higher chance than those that aren't (those that are farther away).

For this particular assignment I would like you to use a technique known as "tournament selection" The concept is easy, although the implementation may be a bit harder.

To put this in an easy to understand context, let's consider five people who have just bowled a game of bowling. First, rank your bowlers (or, in the context of your GA, your chromosomes) based on their fitness. In this context the fitness is their score in the last round of the game.

Rank Bowler Score
1 Ringo 224
2 Paul 202
3 George 185
4 Yoko 171
5 John 155

Now, we want to select one of these bowlers, but we want to give preference for Ringo over John. The way we do this is:

  1. randomly pick a number from 1-N (in this case, 5)
  2. do it again, paying no attention to what number you picked the first time
  3. Select whichever bowler has the higher rank. If you actually picked the same bowler both times than you have your selection.

If you think about this, you have 25 possible pairs of bowlers (5x5) and among this 25 pairings each bowler is selected at least once, although Ringo is going to be selected 9 of the 25 times while John will only be selected once.

  First Bowler Picked
Ringo Paul George Yoko John
Second Bowler Picked Ringo R R R R R
Paul R P P P P
George R P G G G
Yoko R P G Y Y
John R P G Y John

 

You should use this technique for picking which chromosomes are used for selection and new blood. You should perform this techinique twice to see which two chromosomes are selected for crossover.

 

 

Instructions

For this homework you should write a program that fits the following requirements

  1. Asks the user for a length 24 bit string that represents the target color.
  2. Asks the user for the number of chromosomes in the population
  3. In THIS ORDER:
    1. Asks the user for the number of chromosomes that should undergo selection (mutation rate of 0) per generation
    2. Asks the user for the number of chromosomes that should undergo mutation (mutation rate of 1) per generation
    3. Asks the user for the number of "newblood:" chromsomes (mutation rate of 24) that should be introduced per generation
    4. Asks the user for the number of "crossover pairs" that should be introduced per generation (1 pair uses and produces two unique chromosomes).
  4. Asks the user for how many generations the genetic process should be run.
  5. Conducts the genetic search that was described:
  6. Prints the fitness function (color distance) value of the best member of the population after each generation.
  7. Prints the value of the best chromosome found after the process is completed.

 

For example, one run might look like this:

 

You must also produce a short "read me" file (readme.txt) which explains to me, the tester, how to use your program. 

 

Submitting Your Work

Prior to the deadline, use the homework submission system (https://www.cs.uni.edu/~schafer/submit/which_course.cgi) to upload any files you have produced.  Make sure you submit all of the files needed to use your code.

In addition please submit to me a paper copy of your read me file at the start of the class on the due date.