Thursday, March 31, 2011

Blog<Programming> Neural Networks - Part 1


Disclaimer: I'll be taking a departure from my usual weight loss/running posts this month to talk about my latest Scala project. So those of you who are not interested in programming or neural networks can feel free to tune out.

Neural Networks

Motivation
Last year, I participated in a Google-sponsored AI contest. The point of the contest was to create a computer-controlled player in a competitive game based on Galcon (there's lots more info at the link I posted). The server played semi-random games against all the entries and ranked them using the Elo ranking system (used in chess). I ended up coming in 43rd in the fierce but friendly competition. It was an awesome experience, and I will definitely be competing again this year if I have the time.

One of the things I learned from the contest was that I knew absolutely nothing about artificial intelligence. Instead of teaching a program how to play the game, I basically studied it for strategies and implemented the logic directly in the program. From a few contestants' post-mortems, I would bet that most of the contestants did something similar. There were a few exceptions, such as the winner, and another entry based on a genetic algorithm (which I believe finished around 200th).

After the contest, I took a shallow dive into some AI programming techniques and came away with a desire to learn more about neural networks. Mostly because they are easy to understand and implement, but also because they can be used to solve some interesting and difficult problems. They are useful because they can learn to solve a problem from inputs with known solutions. Then, for inputs without a known solution, they can make predictions based on the previously learned behavior.

So what is a neural network?
Neural networks, as you might guess from the name, are based on the cellular structure of the brain and nervous system. A neuron is essentially a cell that acts as an on-off switch. It has a number of incoming connections called dendrites that carry an electric signal. Depending on the signals of all the dendrites, the neuron may or may not send a signal itself (turn on or off). The output of the neuron then connects to other dendrites, which will cause other neurons to turn on or off, thus forming a network structure. That's the simplified view, anyway.

In computer science, this structure is called an artificial neural network (ANN), but for the purposes of this article, when I say "neural network," I'm referring to the artificial (and not the biological) version.

In a neural network, the most basic component is the Neuron. It traditionally has a number of inputs, a weight for each input, and an activation function to determine when the neuron should output a value.

In the above diagram, the output is calculated by first multiplying each input by its corresponding weight, and then summing these weighted values over all the inputs. This sum is then passed into the activation function f(x). A higher weight value gives an input a greater importance for determining the output. Conversely, a lower weight means less importance.

The activation function basically takes a value and outputs another value. Generally it is used to normalize the summed input value to a range such as (-1,1) or (0,1). Sometimes this is a step function, meaning that if the input value is above a certain threshold, the output will be 1, otherwise it will be 0. Many neural networks use a sigmoid-shaped function such as the hyperbolic tangent (tanh) because it is continuous and normalizes any input to the range (-1,1). So, if the sum of the weighted inputs is something like 20, the activation function will output ~1. If the sum is close to zero, the output will be close to the original value.
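To make the weighted sum and activation concrete, here is a minimal sketch in Scala. The names (NeuronDemo, Neuron, activate) and the choice of tanh as the activation function are illustrative assumptions, not code from my project:

```scala
object NeuronDemo {
  // tanh squashes any sum into the range (-1, 1)
  def activate(x: Double): Double = math.tanh(x)

  // A neuron holds one weight per input
  case class Neuron(weights: Vector[Double]) {
    // multiply each input by its weight, sum the results,
    // then pass the sum through the activation function
    def output(inputs: Vector[Double]): Double =
      activate(inputs.zip(weights).map { case (i, w) => i * w }.sum)
  }
}
```

For example, a neuron with weights (0.5, -0.4, 0.1) given inputs (1, 1, 1) sums to 0.2 and outputs tanh(0.2), a value close to the sum itself, while a very large sum is squashed toward 1.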

A network is created by connecting the output of several neurons to the inputs of other neurons.
As you can see, the outputs of the neurons on the left (I1, I2, and I3) are fed into the inputs of the neurons on the right (O1 and O2). The final result is the output of the 2 neurons on the right.

This type of network is called a feedforward network because there are no circular loops connecting any of the neurons. There are many variations in how you can connect the neurons to form a network, but the most common addition is something called a hidden layer. This is basically a set of neurons that sits between the input and output. The hidden layer provides a degree of abstraction and additional weights that can aid in the learning process.

Learning
A neural network's purpose is to solve a problem. If you just create a random network of neurons and input weights, you won't get very good results. However, there are a number of techniques for teaching a network to provide a better solution to the problem. A network "learns" by computing a result for a given input, determining how close the result is to the desired answer, and then adjusting its weights to hopefully give a better result the next time. This process of calculate -> adjust weights -> calculate is repeated many times until the desired result is achieved.

Backpropagation
This is a fancy name that really just means determining the error of the result, and then working backward to reduce the error at every neuron in the network. In practice it's a somewhat complicated process, involving not-so-fun math (for the layperson anyway; I'm sure mathematicians get a kick out of it). Fortunately backpropagation was figured out a long time ago, so all of us hobbyist computer scientists have some conveniently tall giants to stand on.

Backpropagation is useful when you have a training set with known output values. For instance, the exclusive-or (XOR) function has two inputs and one output. Whenever one input has a value of 1 and the other is 0, the output is 1. Conversely, if the inputs are both 1 or both 0, the output is 0. Since we know the desired output values, we can easily determine a network's error by subtracting the generated result from the known output.
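As a sketch of what "determining a network's error" looks like, here is the XOR truth table as a training set, with a mean-squared-error measure over any candidate network (represented here simply as a function from inputs to an output). The names and the choice of mean squared error are illustrative assumptions:

```scala
object XorErrorDemo {
  // XOR truth table: (inputs, desired output)
  val trainingSet = Vector(
    (Vector(0.0, 0.0), 0.0),
    (Vector(0.0, 1.0), 1.0),
    (Vector(1.0, 0.0), 1.0),
    (Vector(1.0, 1.0), 0.0))

  // Mean squared error of a candidate network over the training set;
  // smaller is better, and a perfect network scores 0
  def error(net: Vector[Double] => Double): Double = {
    val errs = trainingSet.map { case (in, desired) =>
      val diff = desired - net(in)
      diff * diff
    }
    errs.sum / errs.size
  }
}
```

Backpropagation's job is then to push this error downward by adjusting each weight in proportion to how much it contributed to the error.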

Backpropagation can be a useful way to teach a neural network, but it is limited by a few issues. The first is that it can get stuck in a suboptimal solution (a so-called local minimum). The second is that it cannot train a network when the correct outputs for the training data are not known.

Take the Galcon game as an example of the second problem. On any given turn, you have a list of your planets, a list of enemy planets, a list of neutral planets, incoming fleets, planet growth rates, distances, etc... Every decision you make on this turn will affect future turns' decisions. There are so many variables that it would be nearly impossible to determine the perfect action to take for every situation. In problems like this, a good way to train a neural network is to directly compare it against another network. This is the idea behind the second way to teach a network: genetic algorithms.

Genetic Algorithms
Genetic algorithms are based on the natural evolutionary processes of selection (based on fitness), mating, crossover, and mutation. The basic process is to create a population of genetic sequences (chromosomes) that correspond to parts of the problem's solution. This population is then used to create a new generation, which is tested for fitness (how well it solves the problem). Over a number of generations, the top individuals will be able to provide very good solutions to the problem.

In the case of a neural network, a chromosome can be represented as the network's collection of input weights. Each chromosome (list of weights) is tested and assigned a fitness value. After all chromosomes are tested, they are sorted so that the ones with the highest fitness values are first. To create a new generation, these chromosomes are selected as "mates", with a preference given to the ones that appear highest in the list (the ones with the highest fitness). The two mates combine their chromosomes in a process called crossover to produce a "child" chromosome. The resulting "child" may be further changed by mutation.
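The selection, crossover, and mutation steps described above can be sketched like this, treating a chromosome as a flat vector of weights. The function names, the single-point crossover, the mutation rate, and the keep-the-fitter-half selection scheme are all illustrative assumptions:

```scala
import scala.util.Random

object GeneticDemo {
  val rng = new Random(42)

  // Single-point crossover: the child takes a prefix of one parent's
  // weights and the suffix of the other's
  def crossover(a: Vector[Double], b: Vector[Double]): Vector[Double] = {
    val point = rng.nextInt(a.size)
    a.take(point) ++ b.drop(point)
  }

  // Mutation: occasionally perturb a weight by a small random amount
  def mutate(c: Vector[Double], rate: Double = 0.1): Vector[Double] =
    c.map(w => if (rng.nextDouble() < rate) w + rng.nextGaussian() * 0.5 else w)

  // One generation: rank by fitness, keep the fitter half as parents,
  // and breed mutated children from randomly chosen parent pairs
  def nextGeneration(pop: Vector[Vector[Double]],
                     fitness: Vector[Double] => Double): Vector[Vector[Double]] = {
    val ranked = pop.sortBy(c => -fitness(c))
    val parents = ranked.take(pop.size / 2)
    val children = Vector.fill(pop.size - parents.size) {
      mutate(crossover(parents(rng.nextInt(parents.size)),
                       parents(rng.nextInt(parents.size))))
    }
    parents ++ children
  }
}
```

In practice the fitness function for a game-playing network would be something like its win rate against the rest of the population, which is what lets this approach work without known correct outputs.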

After a number of generations, the top chromosomes should provide good solutions for the network.

This type of learning also overcomes the problems with backpropagation, but it typically takes much longer to converge.

And that's the overview of what neural networks are. This post was pretty light on details, so if you are interested in creating your own, here are some more resources that I found useful:

In the next part, I'll provide some Scala source code for my implementation, as well as some examples of both backpropagation and genetic algorithm learning.