I haven’t blogged in over 4 years, so I’ve decided to kick things off again with a series on DeepLearning4J, which I’ve been using more and more recently. DL4J is great because there’s no need to translate the models to something else for production. You deploy them by using them inside a Tomcat web app or a Camel pipeline, right alongside whatever you’re deploying today. It’s also very ops friendly, since ops can monitor the apps with JMX like they’re used to.
Neural networks are a different kind of beast than other machine learning algorithms. If you’ve ever played around with them, chances are they almost always failed compared to other algorithms. There’s a lot of technique that goes into training a neural network effectively. Luckily, deep learning frameworks have been giving us access to these techniques by name. When you know the name of the thing you’re looking for, it’s much easier than doing the matrix math by hand, but it’s daunting to newcomers. The basics of the matrix math aren’t too bad, though, and it’s worth being familiar with them.
Let’s start this series in the spirit of hello-tensorflow (and also borrowing heavily from this great tutorial). I’m going to make a single neuron and try and train it to output 1 when I input 0 and output 0 when I input 1. Only instead of numpy or tensorflow, I’m going to use ND4J (the matrix library that DL4J is built on top of) so that I can use Scala.
Deep Learning is almost synonymous with GPUs but since we have only a single neuron it’ll run best on a CPU. First you need to create a build.sbt file and put it in an empty directory. It should look something like the following:
name := "hello-nd4j" version := "1.0" scalaVersion := "2.11.8" libraryDependencies ++= Seq( "org.nd4j" % "nd4j-native-platform" % "0.7.2", )
Next, create a main class. You can just put a Scala file in the same folder as build.sbt:
object HelloND4J extends App {
  println("hello world")
}
Now run sbt run and make sure it works.
Next add some imports:
import org.nd4j.linalg.api.ndarray.INDArray
import org.nd4j.linalg.factory.Nd4j._
import org.nd4j.linalg.ops.transforms.Transforms._
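With those imports in place, you should be able to do the same matrix operations that you can with Octave/Matlab and Numpy! Here’s a quick sanity check (a minimal sketch; the values are arbitrary):

val a = create(Array(Array(1.0, 2.0), Array(3.0, 4.0))) // 2x2 matrix
val b = create(Array(2.0, 0.5))                         // 1x2 row vector

println(a mmul b.transpose()) // matrix product: a 2x1 column vector
println(a mul 2)              // element-wise scaling
println(a.transpose())        // transpose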
Time to make some artificial neurons. First, let’s create the input data and the outputs we want to learn. (Note: I’m embedding the bias into the input arrays as a constant 1.0 in the last position.)
val input1 = create(Array(0.0, 1.0))
val input2 = create(Array(1.0, 1.0))
val inputs = Array(input1, input2)

val label1 = create(Array(1.0))
val label2 = create(Array(0.0))
val labels = Array(label1, label2)
Again, in this trivial example, we are trying to learn the NOT function using only multiply and add instead of boolean instructions or if statements. Let’s make the computer figure out what values it needs to make it work.
A neuron is usually defined as a function that takes an input, multiplies it by a set of weights, adds a bias, and outputs the result of passing that value through some sort of activation function. Let’s define the weights and bias:
var W = create(Array(0.8, 0.5))
Since there is only one input, we only need a single weight and a single bias. I’ve initialized them to arbitrary values.
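If you’d rather not hard-code the starting point, ND4J can generate random values for you (a sketch using Nd4j’s rand; the exact numbers will differ on every run):

var W = rand(1, 2) // 1x2 row vector: one weight plus the embedded bias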
Next, let’s define a function that does the computation I described above:
def forward(input: INDArray, w: INDArray) = sigmoid(input mmul w.transpose())
I’m using sigmoid as the activation function here, but it could be something else, and I encourage you to try swapping it out.
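For example, ND4J’s Transforms also provides tanh, so a hypothetical variant is a one-line change (just a sketch; note that tanh outputs values in (-1, 1), so you’d also want to rethink the 0/1 targets):

def forwardTanh(input: INDArray, w: INDArray) = tanh(input mmul w.transpose())

Either way, you can now use forward to make predictions like so: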
val prediction1 = forward(input1, W) // 0.62
val prediction2 = forward(input2, W) // 0.79
It gives us 0.62 when we input a 0, which is closer to 1 than 0, but 0.79 when we input a 1, which is not what we want. To get the computer to figure out what it needs to change, we need to measure how wrong it is. There are many different ways to do this, but for this example I’m going to use the simplest: the difference between the right answer and the prediction:
def loss(prediction: INDArray, label: INDArray) = label sub prediction
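Plugging in the predictions from above (numbers approximate, assuming the same initial W):

println(loss(prediction1, label1)) // 1.0 - 0.62 =  0.38
println(loss(prediction2, label2)) // 0.0 - 0.79 = -0.79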
Now we need a way to use the information about how wrong it is to find out in what direction we need to change W to get a more correct answer. Just like you can use derivatives in physics to get the velocity of a ball from a start and an end point, you can use the derivative of our forward function to figure out which way to move from our prediction toward the right answer. Tensorflow can figure out the derivatives for you, but since we are doing things manually we’ll just define it in a function, which for sigmoid is simply x * (1 - x):
// note: this computes out * (out - 1), the negative of x * (1 - x);
// the sign flip is cancelled by the subtraction in the update step below
def backward(out: INDArray) = out mul (out add (-1))
Calling backward gives you a delta that, combined with the error and the input, tells you how to adjust W toward a better answer. This delta is only for a single sample; the best W for the current input might not be the right W for the other inputs. For that reason, you usually multiply the delta by a small learning rate, so that you inch closer to the right answer for all the inputs.
Now you can put everything together to make a training routine by wrapping it in a for loop like so:
val learningRate = 0.1
for (i <- 0 until 10000) {
  val pred = forward(inputs(i % 2), W)
  val error = loss(pred, labels(i % 2))
  val deltaW = backward(pred)
  val updateW = error mul deltaW
  W = W sub ((inputs(i % 2).transpose() mmul updateW) mul learningRate)
}
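If you want to watch it converge, a hypothetical tweak is to print the error every thousand iterations (a sketch; error is a 1x1 array, so getDouble(0) pulls out the scalar):

// inside the for loop, right after computing error:
if (i % 1000 == 0) println(s"iteration $i, error ${error.getDouble(0)}")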
After running through each example about 5000 times, calling forward should give answers really close to what we want:
println(s"0.00 => ${forward(input1, W)}") // 0.00 => 0.94 println(s"1.00 => ${forward(input2, W)}") // 1.00 => 0.05
Now if we look at W, we can see the values you need to compute NOT without an if statement:
println(s"$W")
That should show you -5.67 and 2.72 if you’ve been following along.
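You can sanity-check those numbers by hand. With W ≈ [-5.67, 2.72], the input [0, 1] gives sigmoid(0 × -5.67 + 1 × 2.72) = sigmoid(2.72) ≈ 0.94, and the input [1, 1] gives sigmoid(-5.67 + 2.72) = sigmoid(-2.95) ≈ 0.05, matching the predictions above. The second weight is the embedded bias: it pushes the output toward 1, and the first weight only kicks in to drag it back down when the input is 1.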
In the next part of the series I’ll go over how to use DL4J’s higher level API to define more complex neural networks.