Disclaimer: I'll be taking a departure from my usual weight loss/running posts this month to talk about my latest Scala project. So those of you who are not interested in programming or neural networks can feel free to tune out.
My first post on neural networks and genetic algorithms explained the basics. The second one showed some code and talked a little about the implementation of the XOR function. For my final post, I've created a simple game and trained a neural network to play it.
The Game
The game is simple. It consists of a square area populated with the player and with targets at random locations. The objective is to move the player so that it "picks up" (touches) some targets while avoiding others. Picking up a good target awards the player 3 points, while touching a bad target subtracts 5 points. The player is judged on how many points he can accumulate in 100 turns.
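A minimal sketch of the scoring rule follows. It assumes that "touching" means the target's center lies within a hypothetical `reach` distance of the player's center. It also assumes each target can count only once, which fits the picked-up and touched markers on the board. The `Vec`, `Target`, and `Scoring` names are illustrative, not taken from the project.

```scala
// Minimal 2D vector for positions (illustrative helper).
final case class Vec(x: Double, y: Double) {
  def -(o: Vec): Vec = Vec(x - o.x, y - o.y)
  def length: Double = math.sqrt(x * x + y * y)
}

// A target is good (+3) or bad (-5); `touched` records whether it already counted.
final case class Target(pos: Vec, good: Boolean, touched: Boolean = false)

object Scoring {
  val GoodPoints = 3  // picking up a good target
  val BadPoints  = -5 // touching a bad target

  /** Marks every untouched target within `reach` of the player as touched and
    * returns the updated targets together with the points earned this turn.
    * Assumption: a target that was already touched never scores again. */
  def score(player: Vec, reach: Double, targets: Vector[Target]): (Vector[Target], Int) = {
    val updated = targets.map { t =>
      if (!t.touched && (t.pos - player).length <= reach) t.copy(touched = true) else t
    }
    // Only targets that changed from untouched to touched contribute points.
    val points = targets.zip(updated).collect {
      case (before, after) if !before.touched && after.touched =>
        if (after.good) GoodPoints else BadPoints
    }.sum
    (updated, points)
  }
}
```

Calling `score` once per turn and summing the returned points gives the player's total after 100 turns.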
The above image shows what the board looks like. The black circle is the player. The green circles are the "good" targets (+3 points) and the red circles are the bad ones (-5 points). The light green diamonds are the good targets that have been picked up and the orange stars are the bad targets that have been touched.
The AI
The AI for the player consists of a neural network trained by a genetic algorithm. The network has a number of inputs, including the player's position, the player's last move, the closest 4 good targets (the dark green circles), and the closest 4 bad targets (the dark red circles). There are 2 hidden layers of about 20 and 5 neurons respectively. The output layer has 2 neurons: one for horizontal movement and one for vertical movement.
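A network of this shape can be sketched as a plain feedforward pass. The sketch assumes fully connected layers, one bias weight per neuron, and tanh as the activation. The post only says the outputs are limited to -1.0 to 1.0, so tanh is an assumption. The `Inputs.encode` helper is hypothetical. It assumes the inputs are exactly the player position, the last move, and the (x, y) of the 4 closest good and 4 closest bad targets, which gives 20 values.

```scala
import scala.util.Random

// One fully connected layer: each row holds a neuron's input weights
// followed by its bias weight.
final case class Layer(weights: Vector[Vector[Double]]) {
  def forward(in: Vector[Double]): Vector[Double] =
    weights.map { row =>
      val sum = row.init.zip(in).map { case (w, x) => w * x }.sum + row.last
      math.tanh(sum) // assumed activation: squashes each output into [-1, 1]
    }
}

final case class Network(layers: Vector[Layer]) {
  // Feed the input through each layer in turn.
  def forward(in: Vector[Double]): Vector[Double] =
    layers.foldLeft(in)((acc, layer) => layer.forward(acc))
}

object Network {
  /** Random weights in [-1, 1) for consecutive layer sizes, e.g. Seq(20, 20, 5, 2). */
  def random(sizes: Seq[Int], rng: Random): Network =
    Network(sizes.zip(sizes.tail).map { case (nIn, nOut) =>
      Layer(Vector.fill(nOut)(Vector.fill(nIn + 1)(rng.nextDouble() * 2 - 1)))
    }.toVector)
}

// Hypothetical input encoding: 2 (position) + 2 (last move)
// + 4 good targets x 2 + 4 bad targets x 2 = 20 values.
object Inputs {
  type Point = (Double, Double)

  def encode(player: Point, lastMove: Point, good: Seq[Point], bad: Seq[Point]): Vector[Double] = {
    def flat(ps: Seq[Point]): Seq[Double] = ps.flatMap { case (x, y) => Seq(x, y) }
    (flat(Seq(player, lastMove)) ++ flat(good.take(4)) ++ flat(bad.take(4))).toVector
  }
}
```

In a genetic algorithm, all of these weights taken together form an individual's chromosome.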
The game itself constrains how far the player can move on a turn, which means that the output neurons mostly just give a direction vector rather than an actual offset position. However, if the magnitude of this direction vector is less than the maximum turn distance, the player uses that magnitude as its actual distance. This allows the player to make fine-tuned moves.
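The movement rule can be sketched as follows, assuming a hypothetical `maxStep` for the maximum turn distance. The two outputs form a vector. If it is longer than `maxStep`, it is rescaled to exactly `maxStep` while keeping its direction. Otherwise it is used as-is.

```scala
object Movement {
  /** Converts the two output neurons into an (dx, dy) move for this turn. */
  def move(outX: Double, outY: Double, maxStep: Double): (Double, Double) = {
    val len = math.hypot(outX, outY)
    if (len <= maxStep) (outX, outY) // short vector: use it directly (fine-tuned move)
    else {
      val scale = maxStep / len      // long vector: keep the direction, cap the distance
      (outX * scale, outY * scale)
    }
  }
}
```

For example, an output of (3.0, 4.0) with `maxStep = 1.0` becomes (0.6, 0.8), while (0.3, -0.4) passes through unchanged.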
The training algorithm ran for 1000 generations with a population size of about 500. The training data was a set of 4 randomly generated boards and 1 static board. The fitness of each individual is basically its score on each board plus a small factor based on how quickly the player accumulates that score. This selects primarily for the highest score, but secondarily for the speed at which an individual can find the good targets.
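Here is a minimal sketch of a fitness function along these lines. It assumes each board's result is recorded as the cumulative score after every turn, which is a hypothetical representation. It also assumes the speed factor is the average running score multiplied by a small, assumed `SpeedWeight`. With that choice, two players with the same final score are separated by how early they collected their points.

```scala
object Fitness {
  val SpeedWeight = 0.01 // assumed; kept small so speed is only a secondary selector

  /** Fitness on one board from the cumulative score after each turn. */
  def forBoard(cumulative: Seq[Int]): Double =
    if (cumulative.isEmpty) 0.0
    else {
      val finalScore = cumulative.last.toDouble
      // Points collected earlier stay in the running total for more turns,
      // so a higher average means the score was accumulated sooner.
      val averageRunningScore = cumulative.sum.toDouble / cumulative.size
      finalScore + SpeedWeight * averageRunningScore
    }

  /** Total fitness over all training boards (the post used 4 random + 1 static). */
  def apply(boards: Seq[Seq[Int]]): Double = boards.map(forBoard).sum
}
```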
The Results
The algorithm worked well. Here's a sample of the best individuals at the end of several generations:
This is basically the best individual out of a set of 500 randomly generated networks. As you can see it does a pretty good job of avoiding the bad targets, but it gets stuck on the left side pretty quickly, not knowing where to go.
By the 20th generation, the best individual is a little better at picking up the good targets. But towards the end, it gets a little confused, oscillating back and forth as it sees different targets.
Generation 100 has stopped caring about the bad targets so much. It seems to prefer moving in a single direction for a number of turns before making a change. This probably has to do with the fitness function's secondary selector, which is based on the speed at which the score is accumulated.
Here are links to generations 200 and 500. You can see the player getting better at quickly picking up the good targets.
By generation 1000 the player is almost able to get all of the good targets in the 100 turns. It is also reasonably good at avoiding the bad targets although there are some odd moments where it seems like it's deliberately picking them up.
Lessons Learned
You've probably noticed that the neural network only moves the player diagonally. This is largely because of the activation function, which limits the output to between -1.0 and 1.0: large positive inputs produce outputs near 1, and large negative inputs produce outputs near -1. Only a comparatively narrow band of inputs produces outputs between those extremes, so that band is a small target to hit. As a result, the <-1,-1>, <-1,1>, <1,-1>, and <1,1> moves are effectively selected for because they are the easiest for the network to attain. If I were to do it again, I'd probably drop the activation function on the outputs entirely and just use the output neurons as a direction vector.
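The effect is easy to see numerically. This sketch assumes tanh as the activation, since the post doesn't name the function. It compares the heading of a move built from raw weighted sums with the heading after each sum is squashed. The `Saturation` object is purely illustrative.

```scala
object Saturation {
  /** Output pair after squashing each neuron's weighted sum with tanh (assumed). */
  def squashed(sumX: Double, sumY: Double): (Double, Double) =
    (math.tanh(sumX), math.tanh(sumY))

  /** Heading of a move in degrees, measured from the +x axis toward +y
    * (45 means an exact diagonal). */
  def headingDegrees(x: Double, y: Double): Double =
    math.toDegrees(math.atan2(y, x))
}
```

For weighted sums of (4.0, 2.0), the raw heading is about 26.6 degrees. After tanh, both outputs are above 0.96 and the heading is roughly 44 degrees, almost exactly diagonal. Leaving the output layer linear would keep the 26.6-degree heading.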
You probably also noticed that there is a giant leap in ability from the first generation to the 20th and 100th generations, but only a smaller leap to the 500th and 1000th generations. This is because most of the improvement in a genetic algorithm happens early, with only small refinements in later generations. I actually had to tweak the size of the mutations so that smaller increments were possible as the generations increased.
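One way to allow smaller increments in later generations is to shrink the mutation step size over time. This is a minimal sketch that assumes Gaussian mutation of each weight and an exponential decay schedule. `startSigma`, `endSigma`, and the per-gene `rate` are hypothetical parameters, not the values used in the project.

```scala
import scala.util.Random

object Mutation {
  /** Step size that decays exponentially from startSigma to endSigma. */
  def sigmaAt(generation: Int, maxGenerations: Int,
              startSigma: Double = 0.5, endSigma: Double = 0.01): Double = {
    val t = generation.toDouble / maxGenerations // 0.0 at the start, 1.0 at the end
    startSigma * math.pow(endSigma / startSigma, t)
  }

  /** Perturbs each weight with probability `rate` by Gaussian noise of the current size. */
  def mutate(genome: Vector[Double], generation: Int, maxGenerations: Int,
             rate: Double, rng: Random): Vector[Double] = {
    val sigma = sigmaAt(generation, maxGenerations)
    genome.map(w => if (rng.nextDouble() < rate) w + rng.nextGaussian() * sigma else w)
  }
}
```

Early generations take large steps that explore the weight space. Late generations make small adjustments that refine an already good network.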
Finally, the entire training period took about 6 hours on my quad-core desktop PC. You might think that's a long time, but consider how long it might take to implement the decision logic by hand. I was able to do something else for those 6 hours while my PC worked tirelessly toward a solution. The real brainwork with a neural network and a genetic algorithm lies in mapping the problem at hand to inputs and outputs, and in figuring out how to measure an individual's fitness. Once you have those things, the algorithm finds the solution automatically. The results might not be perfect, but they can be pretty good.
Next Steps
I noticed a shortcoming of neural networks almost immediately. The outputs are a series of additions and multiplications. You can't divide, you can't take a square root, you can't loop over a dynamic-length list, and with an acyclic network, you can't store anything resembling a memory. You can get fairly good approximations for a lot of problems, but it can still be fairly limiting. My next area of research is something called Genetic Programming. It's the same process of evolving a population over a number of generations, but instead of representing a neural network, the "chromosome" is the source code itself: an actual runnable program that changes its structure and execution to improve on the original. And since it is just a program, it is not limited to the additions and multiplications that make up a neural network.
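Here is a tiny sketch of the genetic programming idea. It assumes a tree-shaped chromosome, which is a common GP representation but not necessarily the one I'll end up using. The node types (including division and square root), the "protected" conventions that keep evaluation from failing, and the mutation scheme are all illustrative assumptions.

```scala
import scala.util.Random

// A program is an expression tree; unlike a plain neural network it can divide and take roots.
sealed trait Expr
final case class Const(v: Double) extends Expr
final case class Var(name: String) extends Expr
final case class Add(a: Expr, b: Expr) extends Expr
final case class Mul(a: Expr, b: Expr) extends Expr
final case class Div(a: Expr, b: Expr) extends Expr
final case class Sqrt(a: Expr) extends Expr

object GP {
  def eval(e: Expr, env: Map[String, Double]): Double = e match {
    case Const(v)  => v
    case Var(n)    => env.getOrElse(n, 0.0)
    case Add(a, b) => eval(a, env) + eval(b, env)
    case Mul(a, b) => eval(a, env) * eval(b, env)
    case Div(a, b) =>
      // Protected division: a zero divisor yields 1.0 instead of Infinity/NaN.
      val d = eval(b, env)
      if (d == 0.0) 1.0 else eval(a, env) / d
    case Sqrt(a)   => math.sqrt(math.abs(eval(a, env))) // protected square root
  }

  /** Grows a random tree up to `depth` levels deep. */
  def randomExpr(depth: Int, vars: Vector[String], rng: Random): Expr =
    if (depth <= 0 || rng.nextDouble() < 0.3) {
      // Leaf: either a variable or a constant in [-1, 1).
      if (rng.nextBoolean()) Var(vars(rng.nextInt(vars.size))) else Const(rng.nextDouble() * 2 - 1)
    } else {
      def sub() = randomExpr(depth - 1, vars, rng)
      rng.nextInt(4) match {
        case 0 => Add(sub(), sub())
        case 1 => Mul(sub(), sub())
        case 2 => Div(sub(), sub())
        case _ => Sqrt(sub())
      }
    }

  /** Structural mutation: each node is replaced by a new random subtree with probability p. */
  def mutate(e: Expr, p: Double, vars: Vector[String], rng: Random): Expr =
    if (rng.nextDouble() < p) randomExpr(2, vars, rng)
    else e match {
      case Add(a, b) => Add(mutate(a, p, vars, rng), mutate(b, p, vars, rng))
      case Mul(a, b) => Mul(mutate(a, p, vars, rng), mutate(b, p, vars, rng))
      case Div(a, b) => Div(mutate(a, p, vars, rng), mutate(b, p, vars, rng))
      case Sqrt(a)   => Sqrt(mutate(a, p, vars, rng))
      case leaf      => leaf
    }
}
```

For example, `Div(Add(Var("x"), Const(1.0)), Sqrt(Var("y")))` computes (x + 1) / sqrt(y), which a network built only from weighted sums can only approximate.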
And that's all for now. We'll be returning you to your regularly scheduled fitness talk next time. Thanks for bearing with me as I strayed from the path a little bit.