Saturday, December 13, 2014

Accelerate your LIBSVM classification models

I guess I finally have to admit it.  I'm not as clever as I thought.  The statistical classification algorithms I created, more or less from scratch, about nine years ago, are nowhere near as clever as a support vector machine (SVM).  And they may not even be as accurate either!

Yes, sad to say, LIBSVM is more accurate than libAGF.  I believe this is because it is maximizing the accuracy with respect to actual data whereas "adaptive Gaussian filtering" (AGF) is not.

About half a year ago I made a major addition to libAGF.  This was to generalize the pre-trained binary models for multi-class classifications.  I wrote a paper on my approach to the problem and submitted it to the Journal of Machine Learning Research (JMLR) as a "software paper."  Sadly it was rejected.  One of the objections the reviewer had was that the method couldn't be applied to other binary classifiers than the AGF "borders" classifier.  Actually it could, but only at the software level, not the command line, interface level, and not easily.

So I set out to remedy this in libAGF version "0.9.8."  The software can now be combined with the two binaries from LIBSVM, svm-train and svm-predict, or anything that looks like them.  Whereas in LIBSVM you can choose from exactly one multi-class classification method, libAGF lets you choose between slightly more than nc! where nc is the number of classes.

So lets say you've built your hybrid libAGF/LIBSVM model that's tailored exactly to the classification problem at hand.  Since we're ultimately still calling svm-predict to perform the classifications, you might find it a bit slow.

Why exactly is SVM so slow?  After all, in theory the method works by finding a linear discrimination border in a transformed space, so classifications should be O(1) (constant time) efficient!  Unfortunately, it's the caveat, "transformed space" that destroys this efficiency.

The truly clever part about support vector machines is the so-called "kernel trick."  With the kernel trick, we implicitly generate extra, transformed variables by performing operations on the dot product.  Consider a squared dot product of two, two-dimensional variables:

[(x1, x2) • (y1, y2)]
= x12y12 + 2x1y1x2y2 + x22y22
= (x12, √2x1x2, x22) • (y12, √2y1y2, y22)

Clever huh?  The trouble is, unlike in this example, you usually can't calculate these extra variables explicitly.  By the same token, the discrimination border, which is linear only in the transformed space, is also neither calculated nor stored explicitly.  Rather, a series of "support vectors", along with matching coefficients, are stored and used to generated each classification estimate.  These support vectors are simply training vectors, usually close to the border, that affect the solution.  The more training data, the more support vectors.

The data I'm using to test my "multi-borders" routines contain over 80 000 training samples.  Using LIBSVM with these data is very slow because the number of support vectors is some non-negligible fraction of this eighty thousand.  If I'm testing LIBSVM algorithms I rarely use the full compliment, rather some sub-set of it.

LibAGF multi-borders to the rescue!

The libAGF binary classifiers work by first generating a pair of samples on either side of the discrimination border, then zeroing the difference in conditional probabilities along the line between these two points.  In this way it is possible generate a group of points lying directly on the discrimination border.  You can collect as many or as few of these points as you need.  In practice,this is rarely more than 100.  In combination with the gradient vectors, it is straightforward, and very fast, to use these points for classification.

The crux of this algorithm is a function which returns estimates of the conditional probabilities. You can plug in any function you want.  Since the idea here is to accelerate the LIBSVM classification stage, we plug in the LIBSVM estimates as generated by the svm-predict command along with the pre-trained, SVM model.  Training this model, incidentally, is still just as slow, so you'll just have to live with that, although training the new model is not.

The gradient vectors are generated numerically.  Unfortunately, this produces a slight degradation in accuracy.  In principle, it should be possible, easy even, to produce these gradients analytically.  Later versions may allow this but it will require modifying the LIBSVM executable.

But speed increases are monumental.  We have swapped an implicit discrimination border, specified through the support vectors, with an explicit one, directly represented by a small set of samples.

Check out the latest version of libAGF for all the juicy details.

Friday, December 12, 2014

Cloud cover index

The other day I was out running after dark.  It was -8 degrees C and I was expecting it to be bitterly cold.  Instead, it felt quite mild, pleasant even.  It was overcast.

One of the things I learned while working on sea ice retrieval, is that air temperature is just one of many factors that determine how an object exchanges heat with its environment, and a relatively minor one at that.  We've all heard of the wind chill factor.  We feel of course, much colder during a windy day than a calm one, even if the air temperature is the same.  Another factor is humidity, hence the humidex. The one I'm going to discuss here is cloud cover.  I'm trying to design a temperature correction for cloud cover, much like that for wind chill and humidity.

Every warm object emits radiation which is why stove elements glow red and incandescent light bulb filaments glow white.  At room temperatures this radiation is too low in frequency to see with the naked eye.  When an object emits radiation, an equivalent amount of heat energy is lost.  When it absorbs radiation, the energy from the radiation goes towards warming it. On cold, calm days, radiative cooling is actually a much more significant source of heat loss than direct conduction into the air because air is a poor conductor.  This is why down is so warm: it's not the down itself that's a good insulator, it's actually the air spaces between.

Clouds reflect a lot of the long-wave radiation we emit, making overcast days feel warmer than clear days, especially during the evenings when the sun's rays are not there to compensate.  This of course is the well-worn explanation for climate change and the "greenhouse effect."

OK, so maybe you've heard all that.  Here is the more technical explanation.  But first, for the less technically inclined, lets go directly to the table:

Cloud cover
Air temp. 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
-20.0 -20.0 -19.8 -19.6 -19.2 -18.8 -18.2 -17.6 -16.8 -16.0 -15.0
-19.0 -19.0 -18.8 -18.6 -18.2 -17.8 -17.2 -16.6 -15.8 -14.9 -14.0
-18.0 -17.9 -17.8 -17.6 -17.2 -16.8 -16.2 -15.5 -14.8 -13.9 -12.9
-17.0 -17.0 -16.8 -16.5 -16.2 -15.7 -15.2 -14.5 -13.8 -12.9 -11.9
-16.0 -15.9 -15.8 -15.5 -15.2 -14.7 -14.2 -13.5 -12.7 -11.9 -10.9
-15.0 -14.9 -14.8 -14.5 -14.2 -13.7 -13.2 -12.5 -11.7 -10.8 -9.8
-14.0 -13.9 -13.8 -13.5 -13.2 -12.7 -12.1 -11.5 -10.7 -9.8 -8.8
-13.0 -12.9 -12.8 -12.5 -12.2 -11.7 -11.1 -10.5 -9.7 -8.8 -7.7
-12.0 -11.9 -11.8 -11.5 -11.2 -10.7 -10.1 -9.4 -8.6 -7.7 -6.7
-11.0 -10.9 -10.8 -10.5 -10.2 -9.7 -9.1 -8.4 -7.6 -6.7 -5.7
-10.0 -9.9 -9.8 -9.5 -9.2 -8.7 -8.1 -7.4 -6.6 -5.7 -4.6
-9.0 -8.9 -8.8 -8.5 -8.1 -7.7 -7.1 -6.4 -5.6 -4.6 -3.6
-8.0 -7.9 -7.8 -7.5 -7.1 -6.7 -6.1 -5.4 -4.5 -3.6 -2.6
-7.0 -6.9 -6.8 -6.5 -6.1 -5.6 -5.1 -4.3 -3.5 -2.6 -1.5
-6.0 -5.9 -5.8 -5.5 -5.1 -4.6 -4.0 -3.3 -2.5 -1.5 -0.5
-5.0 -4.9 -4.8 -4.5 -4.1 -3.6 -3.0 -2.3 -1.5 -0.5 0.6
-4.0 -3.9 -3.8 -3.5 -3.1 -2.6 -2.0 -1.3 -0.4 0.5 1.6
-3.0 -2.9 -2.8 -2.5 -2.1 -1.6 -1.0 -0.3 0.6 1.5 2.6
-2.0 -1.9 -1.8 -1.5 -1.1 -0.6 0.0 0.8 1.6 2.6 3.7
-1.0 -0.9 -0.8 -0.5 -0.1 0.4 1.0 1.8 2.6 3.6 4.7
0.0 0.1 0.2 0.5 0.9 1.4 2.0 2.8 3.7 4.6 5.7

In other words, the presence of cloud cover can make it seem several degrees warmer.  How does that work exactly?

The basic idea is to calculate the rate of heat loss for a cloudy atmosphere and then ask the question, "what would the clear sky air temperature need to be in order to produce the equivalent level of heat loss?"

Since the human body exchanges heat with the air using approximately the same mechnanisms, let me describe the thermodynamic models I used for sea ice.  These were applied for various purposes: the first one I wrote was actually for an ocean model.  Surface heat exchange is the number one mechanism forcing ocean circulation.  I took the same model and used it to predict the temperature profile of Antarctic icepack and also to model the growth rate and salinity profile of sea ice.

There are four  mechanism by which ice (and other things, including us) exchange heat with the environment: latent heat, sensible heat, shortwave radiation and longwave radiation.  Latent heat, also called evaporative cooling, is the heat lost through evaporation.  Since water has a very high heat of vapourization, when water evaporates, for instance from the surface of pack ice, from the surface of the ocean or from our bodies, it takes a lot of heat with it.  The main thing determining evaporative cooling is the difference in vapour pressure between the air and the thing being cooled.  High humidity will slow evaporation.

Sensible heat is direct heat transfer through conduction.  In still air, sensible heat is basicly nil.  As the wind picks up, it can get quite high as the wind chill index will tell us.

Shortwave radiative flux is just the sun shining on us.  This is determined partly by geometry: objects pointing directly at the sun will be heated faster than those at an angle.  Since the sun is lower in the sky earlier and later in the day and at higher latitude, ground heating is lower.  It is also determined by cloud cover.  In this model, we are deliberately ignoring solar heating: lets just assume it's night time.

Finally, the one that interests us the most is longwave flux.  This is heat that is radiated directly off us in the form of electromagnetic radiation.  The rate at which this occurs is proportional to the 4th power of temperature times the Stefan-Boltzmann constant: this is the Stefan-Boltzmann law.  This is the equation I used which also accounts for both clouds and air re-radiating the heat back:

 Qlw=-ε σ [0.39*(1-ccccccc)Ts4+Ts3(Ts-Ta)]

where ε is the emissivity coefficient, σ is the Stefan-Boltzmann constant, ccc is the "cloud-cover-coefficient," cc is cloud cover as a fraction between 0 and 1, Tis skin temperature and Ta is air temperature.  I just got this from a paper on surface heat flux, but you can clearly see the Stefan-Boltzmann law embedded in there.

In the sea ice growth simulation I used net heat flux to determine the surface temperature of the ice.
The difference in temperature between the ice and the water (which is at a constant freezing temperature) is given by the heat flux times the conductivity and divided by ice thickness. Since all of the flux calculations also involve a temperature difference, this time between skin (surface) temperature and air temperature in one form or another, we need to solve for the skin temperature.  In other words, we need to invert the equation for flux in order to answer the question, "what value of skin temperature will match heat conduction through the ice (between the ice-water interface and the ice surface) to heat flux between the ice and air?"

The cloud cover index was generated in a similar way, except in this case the skin temperature was held constant at 37 deg. C while it was the air temperature that was being solved for.  It was also much simpler in that I didn't assume any conductive interfaces: heat was lost directly from the skin to the outside air.

While it might be possible to generate an analytic solution, I just fed the whole thing to a numerical root-finding algorithm, in this case bisection.  I didn't account for latent heat since it's hard to say how high it would be: it depends very much on whether the person is sweating or not.  Leaving out sensible heat so that only longwave is present, however, gave very high values, so I set the wind speed at a moderate 4 m/s (14.4 km/h) to produce some sensible heat loss.  You could also assume that the person is wearing a coat and try to match air temperature to conductive heat loss, much as I did with the sea ice growth model.  This would achieve a similar result.

Here is the Python program if you want to mess around with it.