Hello ND4J

I haven’t blogged in over 4 years, so I’ve decided to kick things off again by doing a series on DeepLearning4J, which I’ve been using more and more lately. DL4J is great because there’s no need to translate your models to something else for production. You deploy them by just using them inside a Tomcat web app or a Camel pipeline, right alongside whatever you’re deploying today. It’s also very ops friendly, since ops can monitor the apps with JMX like they’re used to.

Neural networks are a different kind of beast from other machine learning algorithms. If you’ve ever played around with them, chances are they almost always did worse than the other algorithms. There’s a lot of technique that goes into training a neural network effectively. Luckily, Deep Learning frameworks give us access to these techniques by name. When you know the name of the thing you’re looking for, it’s much easier than doing the matrix math by hand, but it’s daunting to newcomers. The basic matrix math isn’t too bad, however, and it’s worth being familiar with.

Let’s start this series in the spirit of hello-tensorflow (and also borrowing heavily from this great tutorial). I’m going to make a single neuron and try to train it to output 1 when I input 0, and output 0 when I input 1. Only instead of NumPy or TensorFlow, I’m going to use ND4J (the matrix library that DL4J is built on top of) so that I can use Scala.

Deep Learning is almost synonymous with GPUs, but since we have only a single neuron, it’ll run best on a CPU. First you need to create a build.sbt file and put it in an empty directory. It should look something like the following:

name := "hello-nd4j"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.nd4j" % "nd4j-native-platform" % "0.7.2"
)

Next create a main class. You can just put a Scala file in the same folder as build.sbt:

object HelloND4J extends App {
  println("hello world")
}

Now run sbt run and make sure it prints hello world.

Next add some imports:

import org.nd4j.linalg.api.ndarray.INDArray
import org.nd4j.linalg.factory.Nd4j._
import org.nd4j.linalg.ops.transforms._

Now you should be able to do the same matrix operations that you can in Octave/MATLAB and NumPy! Time to make some artificial neurons. First, let’s create the input data and the outputs we want to learn. (Note: I’m embedding the bias into the input arrays as a constant 1.0 in the last position.)

val input1 = create(Array(0.0, 1.0))
val input2 = create(Array(1.0, 1.0))
val inputs = Array(input1, input2)

val label1 = create(Array(1.0))
val label2 = create(Array(0.0))
val labels = Array(label1, label2)

Again, in this trivial example, we are trying to learn the NOT function using only multiply and add instead of boolean instructions or if statements. Let’s make the computer figure out what values it needs to make it work.

A neuron is usually defined as a function that takes an input, multiplies it by a set of weights, adds a bias, and outputs the result of applying some sort of activation function. Let’s define the weights (with the bias folded in as the second element, to match the constant 1.0 we embedded in the inputs):

var W = create(Array(0.8, 0.5))

Since there is only one input, we only need a single weight and a single bias. I’ve initialized them to arbitrary values.

Next, let’s define a function that does the computation I mentioned above:

def forward(input: INDArray, w: INDArray) =
  sigmoid(input mmul w.transpose())

I’m using sigmoid as the activation function here but it could be something else. I encourage you to give it a try. You can now use this to make predictions like so:

val prediction1 = forward(input1, W) // 0.62
val prediction2 = forward(input2, W) // 0.79
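These numbers are easy to check by hand. Here is the same forward pass in plain Scala, without ND4J (my own sketch, just for verification):

```scala
// Forward pass by hand: W = (0.8, 0.5); the second input element is the constant 1.0 bias term.
def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

val p1 = sigmoid(0.0 * 0.8 + 1.0 * 0.5) // sigmoid(0.5) ≈ 0.62
val p2 = sigmoid(1.0 * 0.8 + 1.0 * 0.5) // sigmoid(1.3) ≈ 0.79

println(p1)
println(p2)
```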

It gives us 0.62 when we input a 0, which is closer to 1 than to 0, but 0.79 when we input a 1, which is not what we want. To get the computer to figure out what it needs to change, we first need to measure how wrong it is. There are many different ways to do this, but for this example I’m going to use the simplest: the difference between the right answer and the prediction:

def loss(prediction: INDArray, label: INDArray) =
  label sub prediction

Now we need a way to use the information about how wrong it is to figure out in which direction to change W to get a more correct answer. Just like you can use derivatives in physics to get the velocity of a ball from its positions over time, you can use the derivative of our forward function to see how the output changes as W changes. TensorFlow can figure out the derivatives for you, but since we are doing things manually we’ll just define it in a function, which for sigmoid is simply x * (1 − x):

def backward(out: INDArray) =
  out mul (out add (-1)) // out * (out - 1): the sigmoid derivative with its sign flipped; the sub in the update step flips it back
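If you want to convince yourself the formula is right, you can compare it against a finite-difference approximation of the derivative (plain Scala, no ND4J):

```scala
// The derivative of sigmoid at z should equal s * (1 - s), where s = sigmoid(z).
def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

val z = 0.5
val s = sigmoid(z)
val analytic = s * (1 - s)

val eps = 1e-6
val numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

println(analytic) // ≈ 0.235
println(numeric)  // should agree to several decimal places
```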

Calling backward gives you a gradient that, combined with the error and the input, tells you how to change W to get a better answer. This delta is only for a single sample, though: the perfect W for the current input might not be the right W for the other inputs. For that reason, you usually also multiply the delta by a small learning rate, so that you inch closer to the right answer for all the inputs.
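Here is one hand-computed update step in plain Scala (no ND4J), using the first input and the initial W from above, so you can see the prediction inch toward the right answer:

```scala
def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

val lr    = 0.1
val input = Array(0.0, 1.0) // input 0 plus the constant bias term
val label = 1.0
var w     = Array(0.8, 0.5) // the initial W from above

val pred  = sigmoid(w(0) * input(0) + w(1) * input(1)) // ≈ 0.62
val error = label - pred                               // how wrong we are
val delta = pred * (pred - 1)                          // backward(pred)

// Same update as the training loop: W = W - input * (error * delta) * learningRate
w = Array(w(0) - input(0) * error * delta * lr,
          w(1) - input(1) * error * delta * lr)

val newPred = sigmoid(w(0) * input(0) + w(1) * input(1))
println(newPred) // a bit closer to 1.0 than 0.62 was
```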

Now you can put everything together to make a training routine by wrapping it in a for loop like so:

val learningRate = 0.1
for (i <- 0 until 10000) {
  val pred = forward(inputs(i % 2), W)
  val error = loss(pred, labels(i % 2))
  val deltaW = backward(pred)
  val updateW = error mul deltaW

  W = W sub ((inputs(i % 2).transpose() mmul updateW) mul learningRate)
}

After running through each example about 5000 times, calling forward should give answers really close to the ones we want:

println(s"0.00 => ${forward(input1, W)}") // 0.00 => 0.94
println(s"1.00 => ${forward(input2, W)}") // 1.00 => 0.05

Now if we look at W we can see the values you need to do NOT without an if statement:

println(W)

This should show you approximately -5.67 and 2.72 if you’ve been following along.
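As a final check, here is the learned NOT function written out in plain Scala, with the weights hard-coded from the run above (yours may differ slightly):

```scala
def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

// NOT with only multiply and add: weight -5.67, bias 2.72 (learned values from above).
def not(x: Double): Double = sigmoid(-5.67 * x + 2.72)

println(not(0.0)) // close to 1
println(not(1.0)) // close to 0
```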

In the next part of the series I’ll go over how to use DL4J’s higher level API to define more complex neural networks.

How Cappuccino handles pass-by-reference APIs in Cocoa

I’ve been using Cappuccino at work recently and I came across a corner of Objective-J that I previously didn’t know about: how to use enumerateObjectsUsingBlock.

One of the nice things about Objective-C is that it has blocks.  Blocks allow you to make functions that take other functions as arguments, all while preserving scope. One of the most popular block-using APIs is the NSArray method enumerateObjectsUsingBlock. It takes a block that has three arguments: the current value of the iteration, the current index and a reference to a Boolean stop variable. If you set the variable to YES the loop will stop (much like the break keyword, which only works to break out of while and for loops.)

This is a problem for JavaScript because it passes primitives by value. Since it’s very cheap to create objects in JavaScript, you could just pass an object with a property you can set, but that means you have to remember the property name and the caller has to remember to copy the value back out. Instead, the Cappuccino developers use a function that closes over a variable in the parent scope, via a macro called AT_REF.
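The same trick works in any language with closures. Here is a rough sketch of the idea in Scala, my own illustration rather than Cappuccino’s code, using an Option argument in place of JavaScript’s arguments.length check:

```scala
// A "ref" is just a closure over a local variable:
// call it with Some(value) to set the variable, with None to read it.
var stop = false
val stopRef: Option[Boolean] => Boolean = {
  case Some(v) => stop = v; stop
  case None    => stop
}

stopRef(Some(true))    // set the captured variable through the ref
println(stopRef(None)) // read it back: true
```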

To use a Cappuccino AT_REF ref, you simply call it and pass in the new value you want to set. Like so:

[array enumerateObjectsUsingBlock:function (item, i, stop) {
    CPLog(item);
    if (i >= 2)
        stop(YES);  // calling the ref with an argument sets the underlying variable
}];

In the implementation of enumerateObjectsUsingBlock you’ll see something like the following:

var stop = NO;
for (var i = 0; i < length; i++) {
    block(arr[i], i, AT_REF(stop));
    if (stop)
        break;
}

The AT_REF macro expands to something like:

(function(val) {
    if (arguments.length)
        return (stop = val);
    return stop;
})
That way you can both set the value by calling it with an argument and get the value by calling it with no arguments.

It’s brilliant if you ask me. But then again, pretty obvious when you think about it.

In the future the devs would like to integrate this into the language using @ref() and @deref() (hence the name of the macro).

To define methods that use AT_REF in your own code you can simply @import <Foundation/Ref.h>. Clients of your API just need to know to call the argument to set and get its value.

What worries me about Scala

I’ve recently been rediscovering podcasts.  I’m particularly fond of the NodeUp podcast, but I also listen to podcasts about Rails, Clojure, and Scala.  The Scala podcast is by far the most academic; taking a shot every time they say the word “type” makes for a pretty nice drinking game.

I must admit I like academic and theoretical discussion as much as the next guy, but I worry about how little attention is paid to practical matters. NodeUp, on the other hand, is almost completely about practical things: libraries, frameworks, scaling, deployment, etc. I’ve found them all to be pretty informative. The Rails podcast is also fairly practical, but they do tend to cover object-oriented design patterns from time to time. I admit that JavaScript and Ruby aren’t exactly paradigm-shifting languages, but still, programming is about more than just the mathematics.

For example, the Scala world has two great web frameworks: Lift and Play. It would be nice to hear how they feel to actually use, not just how type safe they are. Granted, type safety is one of the main reasons I’m interested in Scala over Node/Ruby, but programming is about tradeoffs. Things like programmer-productivity anecdotes are really worth something to newcomers, and they’re a great fit for a discussion-centric medium like a podcast.

If Scala is going to become a mainstream language, there needs to be more discussion on these touchy-feely and practical subjects. If you know any, please let me know in the comments.  Podcasts about Scala are really hard to find.

Password Maker

With all the database compromises going on lately, I’ve decided to step my passwords up a notch.

My old “password scheme” was pretty simple: I had 4 different levels of passwords, so that if a lower one got compromised the more sensitive passwords would still be safe. The flaw in this scheme is that if one of your sensitive passwords is exposed, then all your other sensitive sites are too. I used to think that my banking websites would have pretty hardened systems, but that’s not necessarily the case.

The only safe way to handle passwords online is to have a unique password for every site you use. A lot of people complain that memorizing passwords is hard, but I actually find it pretty easy (I memorized thousands of Kanji, after all). The really hard part is remembering which password you used for which site. When you have 4 passwords, you can just iterate through them without getting locked out, but that doesn’t work when you have tens or hundreds of passwords to try.

Luckily there are a few companies with products to solve this problem. I looked at services like LastPass and 1Password. 1Password was way too expensive, and LastPass was hard to use, ugly, and for some inexplicable reason stored your passwords on a server. No matter what encryption they use today, it’ll be completely inadequate in 5 or 10 years.

Enter PasswordMaker, an open-source extension for Firefox that generates passwords for websites by hashing the URL of the site you are visiting with a master password. This is probably the only scalable solution to the password problem. It also does this without storing your password anywhere, which makes it the most secure solution I’ve found yet. Plus, the algorithm for generating passwords is simple, so if I’m ever in some kind of a pinch I can regenerate my passwords by hand (with the help of the md5 utility).
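As an illustration of the idea, not PasswordMaker’s actual algorithm or default settings, hashing the master password together with the site’s URL might look like this (the helper name and the 8-character truncation are my own choices):

```scala
import java.security.MessageDigest

// Illustrative sketch only: derive a deterministic per-site password
// from a master password and a URL, using MD5 and keeping 8 hex chars.
def sitePassword(master: String, url: String): String = {
  val md5    = MessageDigest.getInstance("MD5")
  val digest = md5.digest((master + url).getBytes("UTF-8"))
  digest.map(b => f"$b%02x").mkString.take(8)
}

println(sitePassword("my master password", "example.com"))
```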

The main problem with PasswordMaker is that it doesn’t work very well on mobile phones. There’s an Android port, but it’s about as bare-bones as it gets. It could use some UX love. For example, it could take advantage of Android’s Share intent to get rid of the whole URL-input dance that you have to do now.

Although this setup is working for me without too much pain, I still think that the whole username/password system is way too complicated. The fact that I need a program to manage my passwords is definitely a smell. I loved OpenID, but apparently it’s also too complicated for users, so now I’m really rooting for Mozilla’s recently announced BrowserID. As more and more software and services move to the cloud, BrowserID, or something like it, is pretty much going to be required to keep login creep at bay. It should help reduce the anxiety that people like my wife have when deciding whether or not to sign up for some new site.