Saturday, December 13, 2014

Accelerate your LIBSVM classification models

I guess I finally have to admit it.  I'm not as clever as I thought.  The statistical classification algorithms I created, more or less from scratch, about nine years ago, are nowhere near as clever as a support vector machine (SVM).  And they may not even be as accurate either!

Yes, sad to say, LIBSVM is more accurate than libAGF.  I believe this is because it is maximizing the accuracy with respect to actual data whereas "adaptive Gaussian filtering" (AGF) is not.

About half a year ago I made a major addition to libAGF.  This was to generalize the pre-trained binary models for multi-class classifications.  I wrote a paper on my approach to the problem and submitted it to the Journal of Machine Learning Research (JMLR) as a "software paper."  Sadly it was rejected.  One of the objections the reviewer had was that the method couldn't be applied to binary classifiers other than the AGF "borders" classifier.  Actually it could, but only through the software interface, not the command-line interface, and not easily.

So I set out to remedy this in libAGF version "0.9.8".  The software can now be combined with the two binaries from LIBSVM, svm-train and svm-predict, or anything that looks like them.  Whereas LIBSVM offers exactly one multi-class classification method, libAGF lets you choose from slightly more than nc! of them, where nc is the number of classes.
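One way to picture that flexibility: every multi-class method is some arrangement of binary decision functions, for instance a tree that repeatedly splits the set of classes in two.  Here is a conceptual C++ sketch of that idea.  It is not libAGF's actual interface; the node structure and names are purely illustrative, and the binary decision function could be anything, including a call out to svm-predict.

#include <functional>
#include <memory>
#include <vector>

// Hypothetical node in a hierarchical multi-class scheme: each internal node
// wraps one pre-trained binary decision function and routes the test point to
// one of two sub-partitions of the classes; each leaf carries a class label.
struct PartitionNode {
  std::function<double(const std::vector<double> &)> binary;  // <0 goes left, >=0 goes right
  std::unique_ptr<PartitionNode> left, right;
  int label = -1;                                             // meaningful only at a leaf
  bool is_leaf() const { return !left && !right; }
};

// Descend the tree of binary classifiers to obtain a multi-class label.
int classify(const PartitionNode &node, const std::vector<double> &x) {
  if (node.is_leaf()) return node.label;
  return node.binary(x) < 0 ? classify(*node.left, x) : classify(*node.right, x);
}

Different ways of splitting and ordering the classes give different trees, hence the combinatorial number of possible methods.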

So let's say you've built your hybrid libAGF/LIBSVM model that's tailored exactly to the classification problem at hand.  Since we're ultimately still calling svm-predict to perform the classifications, you might find it a bit slow.

Why exactly is SVM so slow?  After all, in theory the method works by finding a linear discrimination border in a transformed space, so classifications should be O(1) (constant time) efficient!  Unfortunately, it's the caveat "transformed space" that destroys this efficiency.

The truly clever part about support vector machines is the so-called "kernel trick."  With the kernel trick, we implicitly generate extra, transformed variables by performing operations on the dot product.  Consider the squared dot product of two two-dimensional vectors:

[(x₁, x₂) • (y₁, y₂)]²
= x₁²y₁² + 2x₁y₁x₂y₂ + x₂²y₂²
= (x₁², √2 x₁x₂, x₂²) • (y₁², √2 y₁y₂, y₂²)

Clever, huh?  The trouble is, unlike in this example, you usually can't calculate these extra variables explicitly.  By the same token, the discrimination border, which is linear only in the transformed space, is also neither calculated nor stored explicitly.  Rather, a series of "support vectors", along with matching coefficients, are stored and used to generate each classification estimate.  These support vectors are simply training vectors, usually close to the border, that affect the solution.  The more training data, the more support vectors.
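To make the cost concrete, here is a small, self-contained C++ sketch.  It is not LIBSVM code; the function names (dot2, phi, decision) and the toy support vectors and coefficients are made up.  It checks the squared-dot-product identity above and then evaluates an SVM-style decision value, which requires one kernel evaluation per support vector, so prediction cost grows with the number of support vectors.

#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

static double dot2(const std::array<double,2>& a, const std::array<double,2>& b) {
    return a[0]*b[0] + a[1]*b[1];
}

// explicit feature map phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)
static std::array<double,3> phi(const std::array<double,2>& x) {
    return { x[0]*x[0], std::sqrt(2.0)*x[0]*x[1], x[1]*x[1] };
}

// SVM-style decision value: f(x) = sum_i coef_i * K(s_i, x) + b,
// with the quadratic kernel K(x,y) = (x.y)^2 -- one kernel evaluation
// per support vector, hence cost proportional to their number.
static double decision(const std::vector<std::array<double,2>>& sv,
                       const std::vector<double>& coef,   // alpha_i * y_i
                       double b,
                       const std::array<double,2>& x) {
    double f = b;
    for (size_t i = 0; i < sv.size(); ++i) {
        double k = dot2(sv[i], x);
        f += coef[i]*k*k;
    }
    return f;
}

int main() {
    std::array<double,2> x{1.0, 2.0}, y{3.0, -1.0};
    double lhs = dot2(x, y); lhs *= lhs;                   // (x.y)^2
    auto px = phi(x), py = phi(y);
    double rhs = px[0]*py[0] + px[1]*py[1] + px[2]*py[2];  // phi(x).phi(y)
    std::printf("kernel trick: %g == %g\n", lhs, rhs);     // both print 1

    // toy "model": two support vectors with made-up coefficients
    double f = decision({{1,0},{0,1}}, {0.5,-0.5}, 0.1, x);
    std::printf("decision value: %g\n", f);
    return 0;
}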

The data I'm using to test my "multi-borders" routines contain over 80 000 training samples.  Using LIBSVM with these data is very slow because the number of support vectors is some non-negligible fraction of this eighty thousand.  If I'm testing LIBSVM algorithms I rarely use the full complement, but rather some subset of it.

LibAGF multi-borders to the rescue!

The libAGF binary classifiers work by first generating a pair of samples on either side of the discrimination border, then zeroing the difference in conditional probabilities along the line between these two points.  In this way it is possible to generate a group of points lying directly on the discrimination border.  You can collect as many or as few of these points as you need; in practice, this is rarely more than 100.  In combination with the gradient vectors, it is straightforward, and very fast, to use these points for classification.
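Here is a C++ sketch of that idea, assuming we already have a function R(x) returning the difference in conditional probabilities, P(2|x) - P(1|x).  None of this is libAGF's actual code: the root-finder shown is plain bisection (libAGF may use something smarter), and the names and tolerances are illustrative.

#include <cmath>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Bisect along the segment between x1 (where R<0) and x2 (where R>0) until R
// is approximately zero; the result lies on the discrimination border.
Vec find_border_point(const std::function<double(const Vec&)>& R,
                      Vec x1, Vec x2, int maxiter = 50, double tol = 1e-6) {
    Vec mid(x1.size());
    for (int it = 0; it < maxiter; ++it) {
        for (size_t j = 0; j < mid.size(); ++j) mid[j] = 0.5*(x1[j] + x2[j]);
        double r = R(mid);
        if (std::fabs(r) < tol) break;
        if (r < 0) x1 = mid; else x2 = mid;
    }
    return mid;
}

// Classify by finding the nearest border sample and taking the sign of the
// projection of (x - border point) onto that point's gradient of R.
int classify_from_border(const Vec& x,
                         const std::vector<Vec>& border,   // points on the border
                         const std::vector<Vec>& grad) {   // gradient of R at each point
    size_t best = 0; double dmin = 1e300;
    for (size_t i = 0; i < border.size(); ++i) {
        double d = 0;
        for (size_t j = 0; j < x.size(); ++j) {
            double dx = x[j] - border[i][j];
            d += dx*dx;
        }
        if (d < dmin) { dmin = d; best = i; }
    }
    double proj = 0;
    for (size_t j = 0; j < x.size(); ++j)
        proj += (x[j] - border[best][j]) * grad[best][j];
    return proj < 0 ? 1 : 2;                               // class 1 lies on the R<0 side
}

Once the border points and gradients are stored, each classification costs only a search over those hundred-or-so samples plus one dot product, regardless of how much training data went into the original model.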

The crux of this algorithm is a function that returns estimates of the conditional probabilities.  You can plug in any function you want.  Since the idea here is to accelerate the LIBSVM classification stage, we plug in the LIBSVM estimates as generated by the svm-predict command along with the pre-trained SVM model.  Training that SVM model, incidentally, is still just as slow, so you'll have to live with that; training the new multi-borders model from it, however, is not.
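As a rough illustration, here is what plugging in svm-predict might look like, done the crude way with temporary files and a system call.  The model and file names, and the assumption that the two classes are labelled 1 and 2, are mine; the only real interface used is svm-predict's standard invocation with -b 1 on a model trained with probability estimates.  Calling the executable once per test point like this would be wasteful; in practice you would batch many points per call.

#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

double svm_probability_difference(const std::vector<double>& x,
                                  const std::string& model = "svm.model") {
    // 1. write the test point (dummy label 0) in LIBSVM's sparse "index:value" format
    std::ofstream test("query.txt");
    test << 0;
    for (size_t j = 0; j < x.size(); ++j) test << ' ' << j + 1 << ':' << x[j];
    test << '\n';
    test.close();

    // 2. call the stock LIBSVM predictor with probability estimates enabled
    int err = std::system(("svm-predict -b 1 query.txt " + model + " out.txt").c_str());
    if (err != 0) { std::fprintf(stderr, "svm-predict failed\n"); std::exit(1); }

    // 3. parse the output: header line "labels l1 l2", then one line
    //    "predicted_label p(l1) p(l2)" for our single test point
    std::ifstream out("out.txt");
    std::string header, line;
    std::getline(out, header);
    std::getline(out, line);
    std::istringstream hs(header), ls(line);
    std::string tag; int l1, l2; hs >> tag >> l1 >> l2;
    double pred, p1, p2; ls >> pred >> p1 >> p2;

    // return P(2|x) - P(1|x), assuming the two classes are labelled 1 and 2
    return (l1 == 1) ? p2 - p1 : p1 - p2;
}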

The gradient vectors are generated numerically.  Unfortunately, this produces a slight degradation in accuracy.  In principle, it should be possible, easy even, to produce these gradients analytically.  Later versions may allow this, but it will require modifying the LIBSVM executable.
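For reference, a minimal sketch of the numerical route, central differences on the probability-difference function R; the step size h is illustrative, and an analytic gradient would avoid the truncation error introduced here.

#include <functional>
#include <vector>

std::vector<double> numerical_gradient(
        const std::function<double(const std::vector<double>&)>& R,
        std::vector<double> x, double h = 1e-4) {
    std::vector<double> g(x.size());
    for (size_t j = 0; j < x.size(); ++j) {
        double x0 = x[j];
        x[j] = x0 + h; double fp = R(x);   // forward evaluation
        x[j] = x0 - h; double fm = R(x);   // backward evaluation
        x[j] = x0;                         // restore the coordinate
        g[j] = (fp - fm) / (2*h);          // central difference: O(h^2) error
    }
    return g;
}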

But the speed increases are monumental.  We have swapped an implicit discrimination border, specified through the support vectors, for an explicit one, represented directly by a small set of samples.

Check out the latest version of libAGF for all the juicy details.
