Thursday, February 14, 2013

Additional thoughts on software design: keeping it simple

There are some aspects of software design that are quite difficult to teach and even more difficult to learn, like elegance in design.  I remember helping a colleague with a subroutine to randomly permute a list--just like shuffling a deck of cards.  He had written this enormous subroutine that generated random numbers and then checked them against one another.  I just shook my head.  It's counter-intuitive--at least until you figure it out--but the best way I've found is to generate a list of random numbers, sort it, and use the resulting permutation of indices to reorder the list.  In the language we were working with (Interactive Data Language, or IDL) it's a one-liner.
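The sorting trick translates directly into any array language.  Here's a minimal sketch in Python with NumPy (the function name is my own invention; in IDL the equivalent would be a single expression combining its random-number and sort routines):

```python
import numpy as np

def random_permutation(items):
    """Shuffle a sequence by sorting random keys.

    Each element gets an independent uniform random key; sorting
    the keys yields a random permutation of the indices, which is
    then used to reorder the input.
    """
    items = np.asarray(items)
    keys = np.random.random(len(items))  # one random key per element
    order = np.argsort(keys)             # sorting the keys permutes the indices
    return items[order]
```

Because every ordering of distinct keys is equally likely, every permutation of the list is equally likely--no checking one random number against another required.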

I don't remember having trouble coming up with this, probably because I've worked with simulated random numbers for so long.  I've also used sorting for a more unusual application: the calculation of equivalent latitude.  Check out the Wikipedia page (which I wrote).  I think the algorithm for calculating equivalent latitude is the epitome of elegance, and I've since discovered a number of very similar tricks for quite different applications.
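As a rough sketch of the sorting-based algorithm (my own Python/NumPy rendering, assuming a regular lon-lat grid and a field whose maximum sits over the north pole): sort the grid points by field value, accumulate the area enclosed by each contour, and convert that area to the latitude of the polar cap with the same area.

```python
import numpy as np

def equivalent_latitude(field, lat):
    """Equivalent latitude of each grid point via sorting (a sketch).

    field : 2-D array on a regular lon-lat grid (lat varies along axis 0)
    lat   : 1-D array of grid latitudes in degrees

    The equivalent latitude of a value q is the latitude whose polar
    cap has the same area as the region where the field exceeds q.
    """
    # area weight of each grid cell on the sphere, proportional to cos(lat)
    w = np.cos(np.deg2rad(lat))
    weights = np.broadcast_to(w[:, None], field.shape).ravel()
    values = field.ravel()

    # sort grid points from highest to lowest field value
    order = np.argsort(values)[::-1]
    # cumulative enclosed area as a fraction of the whole sphere
    frac = np.cumsum(weights[order]) / weights.sum()
    # a polar cap covering fraction f of the sphere satisfies sin(lat) = 1 - 2f
    eqlat_sorted = np.rad2deg(np.arcsin(1.0 - 2.0 * frac))

    eqlat = np.empty(values.shape, dtype=float)
    eqlat[order] = eqlat_sorted
    return eqlat.reshape(field.shape)
```

The whole thing is a sort, a cumulative sum, and an arcsine--no contour tracing anywhere.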

That's not to say I haven't created my share of bloated code.  I remember once writing a piece of code to find local maxima in a gridded, two-dimensional field.  What I came up with involved dividing the data values into several ranges and applying a domain-filling algorithm (not of my own design).  It was not very accurate and even less fast.  A simple gradient-ascent algorithm is far faster (I know because I eventually rewrote it)--or what about just checking each point against its eight nearest neighbours?  I recall having tried both at the time, but I couldn't get either working.  That's the other lesson: sometimes it takes patience to ferret out all the bugs in a piece of code.  It also takes confidence that a particular approach will work.
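The eight-neighbour check really is just a few lines.  A naive Python/NumPy sketch (boundary points skipped, plateaus excluded; the function name is mine):

```python
import numpy as np

def local_maxima(field):
    """Find local maxima of a 2-D gridded field by checking each
    interior point against its eight nearest neighbours."""
    maxima = []
    ny, nx = field.shape
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            window = field[i - 1:i + 2, j - 1:j + 2]
            # a local maximum strictly exceeds all eight neighbours,
            # i.e. it is the unique largest value in its 3x3 window
            if field[i, j] == window.max() and np.sum(window == field[i, j]) == 1:
                maxima.append((i, j))
    return maxima
```

No range-splitting, no domain filling--one pass over the grid.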

Another particularly embarrassing piece of bloatware I've written involved binning measurement locations from satellite data while interpolating them to a regular grid.  The program did work quite well, but I'm pretty sure the binning actually made the algorithm slower--I can't be certain, since I haven't rewritten it yet.  Meanwhile, the indices it generated were gigantic.  (I'm still convinced that some type of binning can produce significant speed gains in nearest-neighbour-type problems, but I haven't yet been able to come up with the goods.)
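For what it's worth, one form the goods might take is a uniform cell hash searched ring by ring: keep expanding outward until no unsearched cell could possibly hold a closer point.  A sketch (my own names and structure, not the program described above; assumes at least one data point exists):

```python
import numpy as np
from collections import defaultdict

def build_bins(points, cell):
    """Hash 2-D points into square bins of side `cell`."""
    bins = defaultdict(list)
    for k, (x, y) in enumerate(points):
        bins[(int(np.floor(x / cell)), int(np.floor(y / cell)))].append(k)
    return bins

def nearest(points, bins, cell, q):
    """Index of the nearest neighbour of query point q.

    Cells are searched in rings of increasing Chebyshev distance r
    from the query's cell.  Any point in an unsearched ring lies at
    least r * cell away, so once the best distance found is within
    that bound the answer is exact--not merely 'nearest adjacent bin'.
    """
    qi = int(np.floor(q[0] / cell))
    qj = int(np.floor(q[1] / cell))
    best, best_d = None, np.inf
    r = 0
    while True:
        # visit all cells at Chebyshev distance exactly r
        for i in range(qi - r, qi + r + 1):
            for j in range(qj - r, qj + r + 1):
                if max(abs(i - qi), abs(j - qj)) != r:
                    continue
                for k in bins.get((i, j), []):
                    d = np.hypot(points[k][0] - q[0], points[k][1] - q[1])
                    if d < best_d:
                        best, best_d = k, d
        if best is not None and best_d <= r * cell:
            return best
        r += 1
```

When the points are roughly uniformly distributed and the cell size is matched to their density, each query touches only a handful of bins instead of every point--which is presumably the speed gain the binning was after.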
