If you read the papers[1] on RProp, it seems like a great algorithm that should dramatically reduce the time to convergence of a large neural network. I thought it was a no-brainer to apply it to the modern CNNs that were becoming so popular, like GoogLeNet, VGG, and ResNet. But when I went looking, it seemed like almost everyone was using SGD; there was almost no mention of RProp.
The promise of RProp is fast convergence with no hyperparameter tuning. The penalty is that it is designed to operate on the whole training set as one unit, not on mini-batches. Does it live up to its promise, and can we do anything about the penalty?
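For context, the core of RProp is a tiny per-weight update that looks only at the sign of successive gradients. Below is a minimal sketch of the variant without weight backtracking (Rprop-) in Torch, using the default constants quoted in the paper; the function and variable names are mine, not from any package.

require 'torch'

local etaPlus, etaMinus = 1.2, 0.5   -- step growth/shrink factors (paper defaults)
local stepMax, stepMin  = 50, 1e-6   -- bounds on the per-weight step sizes

-- w: weight tensor, g: current full-batch gradient.
-- state carries the previous gradient and the per-weight step sizes.
local function rpropUpdate(w, g, state)
   state.prevG = state.prevG or torch.zeros(g:size())
   state.step  = state.step  or torch.Tensor(g:size()):fill(0.1)

   -- Only sign agreement between successive gradients matters. This is
   -- why RProp wants the whole training set as one unit: mini-batch
   -- noise flips signs constantly and keeps shrinking the steps.
   local agree = torch.cmul(g, state.prevG)
   local mult = torch.ones(g:size())
   mult[agree:gt(0)] = etaPlus    -- same sign: speed up
   mult[agree:lt(0)] = etaMinus   -- sign flipped: back off
   state.step:cmul(mult):clamp(stepMin, stepMax)

   -- Move each weight by its own step size, in the downhill direction.
   w:add(-1, torch.cmul(torch.sign(g), state.step))
   state.prevG:copy(g)
end

Note that aside from the four constants above, which rarely need touching, there is nothing to tune.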
Monday, May 30, 2016
Wednesday, May 11, 2016
Using the optim and dp packages together in Torch 7
When I first started learning about Torch, I immediately wanted to use the dp package. It provides many common data sets and a nice framework for quickly running experiments. I was also interested in the optim package because it contains an implementation of rprop, which is what I wanted to experiment with.
With many things to learn, I was hoping to find a quick example of how to use these two packages together so that I could get started right away. I found plenty of examples of how to use each package on its own, but I couldn't find one that combined the two. It turns out to be pretty easy, but when you're just getting started, don't know Lua, and have a pile of source code to wade through to figure out how things interact, it isn't at all obvious.
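Roughly, the pattern is: let dp hand you the data tensors, build an ordinary nn model, flatten its parameters with getParameters(), and feed optim the usual closure. Here is a minimal sketch; the dp accessor names (dp.Mnist, ds:get with a view string) are my reading of the dp docs and may need adjusting for your version.

require 'dp'
require 'nn'
require 'optim'

-- 1. Let dp supply the data. The accessor names here are assumptions
--    based on the dp docs; check them against your version.
local ds = dp.Mnist{}
local inputs  = ds:get('train', 'inputs', 'bchw')
local targets = ds:get('train', 'targets', 'b')

-- 2. Build an ordinary nn model; dp is only providing the tensors.
local model = nn.Sequential()
   :add(nn.Reshape(28 * 28))
   :add(nn.Linear(28 * 28, 10))
   :add(nn.LogSoftMax())
local criterion = nn.ClassNLLCriterion()

-- 3. Flatten the parameters so optim can treat them as one vector.
local params, gradParams = model:getParameters()

-- 4. The closure optim expects: return loss and gradient at x.
local function feval(x)
   if x ~= params then params:copy(x) end
   gradParams:zero()
   local outputs = model:forward(inputs)
   local loss = criterion:forward(outputs, targets)
   model:backward(inputs, criterion:backward(outputs, targets))
   return loss, gradParams
end

-- 5. Step the optimizer. rprop follows the usual optim signature and
--    keeps its per-weight step sizes in `state` between calls.
local state = {}
for epoch = 1, 10 do
   local _, fs = optim.rprop(feval, params, state)
   print(epoch, fs[1])
end

The piece that ties the two packages together is model:getParameters(): it produces the single flattened parameter and gradient vectors that optim's closure interface expects, regardless of where the data came from.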