Disclaimer: I'll be taking a departure from my usual weight loss/running posts this month to talk about my latest Scala project. So those of you who are not interested in programming or neural networks can feel free to tune out.
...Continued
Last time, I talked about what neural networks are and what they might be used for. I also talked about a couple of the methods used to train them. If you haven't read it, I recommend you start there, because part 2 will focus heavily on the implementation.
Implementation
I've been playing around with Scala for about 2 years now. I've got a couple of unfinished projects that I've been using to learn my way around. It seemed like a natural fit, and it's more fun than Java, so that's what I went with.
I started by separating the project into 2 core concepts. The first was the neural network implementation. Basically, the collection of input neurons, the hidden layer, the output neurons, and the connections between neurons of sequential layers. The implementation of a neural network is actually pretty simple. It's basically a series of multiplications and sums. I tried to keep it simple so that it focuses solely on the calculation.
The second was the mechanism by which the neural network learns. For this, I implemented both backpropagation and a genetic algorithm. However, I soon realized that the genetic algorithm could be used for other purposes besides just training a neural network. So I pulled that out into its own module.
The final structure consists of 3 parts:
- The neural network.
- The genetic algorithm.
- The neural network learning mechanism
The learning mechanism can further be organized into:
- Backpropagation learning
- Genetic algorithm learning
Most everything is implemented as Scala traits, making it easy to mix and match different implementations.
The Code
So, that's the boring stuff, let's see some code (to see more, you can check out the repository on github.) The XOR function is used in just about every example out there on the web when looking for problems to solve with a neural network. It's well-understood, simple, and non-linear. And it's also easy to show example code in a blog. :)
This first example shows how to set up a neural network that learns how to calculate the XOR function via backpropagation:
object XORBackProp {
def main(args : Array[String]) : Unit = {
val inputKeys = Seq("1","2");
val outputKeys = Seq("out");
val hiddenLayers = Seq(4,2)
val testData = IndexedSeq(
(Map("1" -> 1.0, "2" -> 0.0),Map("out" -> 1.0)),
(Map("1" -> 0.0, "2" -> 0.0),Map("out" -> 0.0)),
(Map("1" -> 0.0, "2" -> 1.0),Map("out" -> 1.0)),
(Map("1" -> 1.0, "2" -> 1.0),Map("out" -> 0.0)))
val network = new Perceptron(inputKeys,outputKeys,hiddenLayers)
with BackPropagation[String] with StringKey;
//Initialize weights to random values
network.setWeights(for(i <- 0 until network.weightsLength) yield {3 * math.random - 1})
println(network.getWeights);
var error = 0.0
var i = 0
var learnRate = .3
val iterations = 10000
while(i == 0 || (error >= 0.0001 && i < iterations) ){
error = 0
var dataSet = if(i % 2 == 0) testData else testData.reverse
for(data <- testData){
val actual = network.calculate(data._1)("out")
error += math.abs(data._2("out") - actual)
network.train(data._2, learnRate)
}
if(i % 100 == 0){
println(i+" error -> "+error
+" - weights -> " + network.getWeights
+" - biases -> " + network.getBiases);
}
i+=1
}
println("\nFinished at: "+i)
println(network.toString)
for(data <- testData){
println(data._1.toString+" -> "+network.calculate(data._1)("out"))
}
}
}
The key is this line:
val network = new Perceptron(inputKeys,outputKeys,hiddenLayers)
with BackPropagation[String]
It sets up a simple neural network (Perceptron) with the inputs, outputs, and the number of neurons in each hidden layer. The BackPropagation trait gives it the ability to learn using backpropagation.
The rest of the network initialization is configuration for the backpropagation. "learnRate" determines the amount to change the weights based on the error from the test data. Finally, at the end, we are printing the results of the neural network when run against the test inputs:
Map(1 -> 1.0, 2 -> 0.0) -> 0.9999571451716337
Map(1 -> 0.0, 2 -> 0.0) -> 4.248112596677567E-5
Map(1 -> 0.0, 2 -> 1.0) -> 1.0000125509003892
Map(1 -> 1.0, 2 -> 1.0) -> 1.0286171998885596E-7
And here's a graph showing the error versus backpropagation iterations for a few different executions.
Notice that run 2 never reached an acceptable error. This is because of the local minimus problem with backpropagation. Fortunately, each of these runs took about a second, so it's relatively easy to just reset the training until you get an acceptably close solution. This may not be the case for every problem however.
This second example shows how to set up an XOR neural network that learns via a genetic algorithm:
object XORGeneticAlgorithm2 {
def main(args : Array[String]) : Unit = {
val xorTestData = IndexedSeq(
(Map("1" -> 1.0, "2" -> 0.0),Map("out" -> 1.0)),
(Map("1" -> 0.0, "2" -> 0.0),Map("out" -> 0.0)),
(Map("1" -> 0.0, "2" -> 1.0),Map("out" -> 1.0)),
(Map("1" -> 1.0, "2" -> 1.0),Map("out" -> 0.0)))
val popSize = 1000 //The number of individuals in a generation
val maxGen = 100 //number of generations
//Anonymous type that extends from PagedGANN which is an implentation of
//GANN (Genetic Algorithm Neural Network)
val gann = new PagedGANN[WeightBiasGeneticCode,String,Perceptron[String]]()
with ErrorBasedTesting[WeightBiasGeneticCode,String,Perceptron[String]]
with GAPerceptron[WeightBiasGeneticCode,String]{
override def getTestData = xorTestData
override val inputKeys = Seq("1","2");
override val outputKeys = Seq("out");
override val hiddenLayers = Seq(6,3);
override val populationSize = popSize
override def mutationRate:Double = { 0.25 }
override def mutationSize:Double = {0.025 + 2.0 * (math.max(0.0,(50.0 - getGeneration)/1000.0)) }
override def crossoverRate:Double = { 0.9 }
override def elitistPercentile = {0.02}
override def minNeuronOutput:Double = -0.1
override def maxNeuronOutput:Double = 1.1
override def concurrentPages = 4
override def setupNetworkForIndividual(network:Perceptron[String],individual:WeightBiasGeneticCode){
network.setWeights(individual.weights.geneSeq)
network.setBiases(individual.biases.geneSeq)
}
override def stopCondition():Boolean = {
val gen = getGeneration
val topFive = getPopulation(0,5)
val bottomFive = getPopulation(populationSize - 5)
println(gen+" -> "+topFive.map(_._2)+" -> "+bottomFive.reverse.map(_._2))
(gen >= maxGen || topFive.head._2 >= 1000000)
}
override def generateHiddenNeuronKey(layer:Int,index:Int):String = {
"Hidden-"+layer+"-"+index;
}
}
//Genetic Code is 2 chromosomes (1 for weights, 1 for biases)
val initialPop = for(i <- 0 until popSize) yield {
val wChrom = new ChromosomeDouble((0 until gann.weightsLength).map(i => 20.0 * math.random - 10.0).toIndexedSeq)
val bChrom = new ChromosomeDouble((0 until gann.biasesLength).map(i => 2.0 * math.random - 1.0).toIndexedSeq)
(new WeightBiasGeneticCode(wChrom,bChrom),0.0)
}
//Setup the genetic algorithm's initial population
gann.initPopulation(initialPop,0)
//Train the network
val network = gann.trainNetwork()
//Print the result
println(network.toString)
for(data <- xorTestData){
println(data._1.toString+" -> "+network.calculate(data._1)("out"))
}
}
}
Once again, the important part is the anonymous type that extends from PagedGANN. This is an extension of GANN, which is the marriage between the genetic algorithm and the neural network. The PagedGANN can take advantage of machines with multiple processors to calculate fitness and create each new generation.
The various overridden defs tweak the genetic algorithm slightly. For instance, mutationRate determines how frequently an individual might be mutated while creating a new generation. Likewise, mutationAmount determines the maximum change of a network weight if it is mutated. Here's what gets printed at the end of the learning phase:
Map(1 -> 1.0, 2 -> 0.0) -> 1.0014781670169086
Map(1 -> 0.0, 2 -> 0.0) -> 2.588504471438824E-5
Map(1 -> 0.0, 2 -> 1.0) -> 0.9994488547053212
Map(1 -> 1.0, 2 -> 1.0) -> -2.3519634709978643E-5
And here's a graph showing the error versus generations for a few different executions.
For the most part, they all reach an acceptable aproximation of the XOR function. Some take longer than others and some reach a better solution, but the random nature of a genetic algorithm can help to avoid the local minimus problem. It should be noted that it's possible that a genetic algorithm might not solve the problem at all, and that it can take some tweaking of the parameters to get it to produce a good solution.
Next Time
The XOR function is well and good for explaining the basics, but it's also boring as hell. I've cooked up a better example of a neural network in action, which I'll be posting very soon.
Stay tuned!
Kyle,
ReplyDeleteI don't get this, but I know YOU do, so it is awe-inspiring to see your brain at work!
You have an easy teaching style that would work well at a university if you ever chose to do so.
Keep it up, looks fascinating!