Archive for March, 2008

Square root without sqrt

Monday, March 17th, 2008

A few months back, I was given an interesting project to work on in an interview test. The test required me to build a small particle system for the Nintendo DS, using the homebrew tool chain. Well I don’t know if I was stupid or just plain blind, but I was unable to find a sqrt in their standard math.h include file. Posting questions on a forum resulted in no replies and time was running short. I had only about a week to make my submission. So I decided to write my own implementation. After all I remember doing this in high scool.

The old school method…

As usual I headed over to google and after a little searching I found it on this website. Well this was exactly what I was looking for, but unfortunately it required too much guessing at every step. Now as we all know computers are not good at guessing. All hopes of implementing a square root function quickly evaporated.

Next day in office, with some help from my coleague Ashwini Kumar, we came up with a very promising solution.

We found something interesting about the patterns a square root takes

sqrt(1.0) = 1.0
sqrt(10.0) = 3.1622
sqrt(100.0) = 10.0
sqrt(1000.0) = 31.622

Now for those of you who have not realized yet, sqrt(1000.0) = sqrt(10.0) * 10.0

Well this could work pretty well and all we needed to do then is just generate a look up table of the first 100 numbers and then just do a multiplication based on it’s 10th power. So I came up with this code…

// LUT for square roots below 100. Takes 396 bytes

const static float SQRT_LUT[] =
{
       1.000000f,      1.414214f,      1.732051f,      2.000000f,
       2.236068f,      2.449490f,      2.645751f,      2.828427f,
       3.000000f,      3.162278f,      3.316625f,      3.464102f,
       3.605551f,      3.741657f,      3.872983f,      4.000000f,
       4.123106f,      4.242640f,      4.358899f,      4.472136f,
       4.582576f,      4.690416f,      4.795832f,      4.898980f,
       5.000000f,      5.099020f,      5.196152f,      5.291502f,
       5.385165f,      5.477226f,      5.567764f,      5.656854f,
       5.744563f,      5.830952f,      5.916080f,      6.000000f,
       6.082763f,      6.164414f,      6.244998f,      6.324555f,
       6.403124f,      6.480741f,      6.557438f,      6.633250f,
       6.708204f,      6.782330f,      6.855655f,      6.928203f,
       7.000000f,      7.071068f,      7.141428f,      7.211102f,
       7.280110f,      7.348469f,      7.416198f,      7.483315f,
       7.549834f,      7.615773f,      7.681146f,      7.745967f,
       7.810250f,      7.874008f,      7.937254f,      8.000000f,
       8.062258f,      8.124039f,      8.185352f,      8.246211f,
       8.306623f,      8.366600f,      8.426149f,      8.485281f,
       8.544003f,      8.602325f,      8.660254f,      8.717798f,
       8.774964f,      8.831760f,      8.888194f,      8.944272f,
       9.000000f,      9.055386f,      9.110434f,      9.165152f,
       9.219544f,      9.273619f,      9.327379f,      9.380832f,
       9.433981f,      9.486833f,      9.539392f,      9.591663f,
       9.643651f,      9.695360f,      9.746795f,      9.797959f,
       9.848858f,      9.899495f,      9.949874f,
} ;

float SqrtLUT(const float fNum)
{
     int      nIdx ;
     int      nApproxLog = 0 ;
     float    fCopy = fNum ;
     float    fMultiplier = 1.0f, fDivider = 0.1f ;

     if (fNum < 100.0f && fNum >= 1.0f) // Use LUT

     {
         nIdx = static_cast<int>(fNum) ;
         if (nIdx)
         {
             --nIdx ;
             return SQRT_LUT[nIdx] ;
         }
     }

     // Number is above 100, bring it down and just multiply the answer

     if (fNum >= 100.0f)
     {
         while (fCopy > 1.0f)
         {
             fCopy *= 0.1f ;

             if ((nApproxLog & ~3) && (nApproxLog & 1))
             {
                 fMultiplier *= 10.0f ;
                 fDivider *= fDivider ;
             }

             ++nApproxLog ;
         }
     }
     // Number is below 1. Bring it with in the 1-100 range

     else
     {
         // ...
         // Something very similar to the if block above
         // ...
     }

     return GuessSqrt(fNum * fDivider) * fMultiplier ;
}

This worked really well for numbers under 10^3, but as the numbers grew larger the errors started growing larger and larger. Even a LUT of doubles instead of floats just offset the errors to a slightly larger numbers.

Time to look for a new solution…

The Babylonian Method

A quick recap of old college textbooks revealed a suprising simple method. This method only produces an estimate, but the accuracy of the estimate grows rapidly on every iteration. For those of you who need a quick recap of the algorithm, here is a recap from http://pballew.net/oldsqrt.htm

1) Guess a number for the square root
2) Divide the number by the guess
3) Average the original guess and the new guess
4) make this average value your new guess and
5) Go back to step 2 if the accuracy of the result is not satifactory…

The problem with this method is that it still requires a guess like the old scool method, but this requires a guess only once and the rest of the guesses are relatively small calculations which don’t need any sort of loop. Besides with our LUT method we can now provde a fairly accurate guess relatively fast. So the initial implementation looked something like this…

const static int gIterations 20 ;

float mSqrt(const float fNum)
{
 float x1 ;

 x1 = GuessSqrt(fNum) ; // Just uses the LUT function mentioned above

 for (int i=0; i<gIterations; ++i)
 {
 x1 = ( ( fNum / x1 ) + x1 ) * 0.5f ;
 }
}

After a little trial and error I decided the the number of iterations are best kept at 12.

The end result

Turns out it aint that bad. On my old 1.6Ghz system this implementation is only 0.002 seconds (approximately) slower than sqrt() when calculating 10000 square roots.

My implementation (available in the attachment below) has a much more complicated implentation that does a loop unwind using C++ templates. How else can I justify spending large amounts of time spent trying to understand Andrei’s book, Modern C++ design . As of now that book has become my latest programming fad. :)

You can download the VS.Net 2005 project of my implementation (Attached at the end of this blog entry). The .cpp file has code to check the speed and accuaracy of the results against the standard CRT sqrt function. Most of the code that you should be interested in, is in main.cpp. If you want to use the function in your own project, just copy everything except the main function in main.cpp

Until next time

Download HBO’s new series ‘In Treatment’ from outside the USA

Sunday, March 9th, 2008

For the impatient… head over directly to the direct download links

For those of you who don't know, HBO recently put the first 15 episodes of their new show 'In Treatment' online. That was good news so the first thing I did was head over there and checkout their 2 min preview. Looked quite interesting, except for the part where the full episodes can't be watched from my country (India), or for that matter any country other than USA [Haven't confirmed this yet].

Now this has got to be one of the lamest attempts to block people from outside USA watching your shows. This is the probably one of the biggest reasons thepiratebay is doing so well. Besides this goes directly against one of my most fundamental beliefs that once information is online anyone in the world should be able to access it. Sure I could catch the episodes on youtube, but then youtube has never been my favorite for offering such low quality videos without giving you a choice. HBOs online stream was much bigger and was on the border line of acceptable. So this is what I wanted and this is what I was going to get.

The first step is to find the .flv files they were linking to. I thought that if I get a direct link to the .flv file I may not have to even watch it online.

Unfortunately I realized that the .flv file was actually a redirect script, to redirect the person based on his IP address. This is another lame attempt to block people. So I just went and found a proxy in USA and managed to get the direct links to the actual .flv file. Turns out that once you have the direct links to the files, you don't even need to be behind a proxy. Might as well make HBO pay for the bandwidth.

In fact even if you are in USA, this will probably be a more convenient method of watching the show, as you can just queue up all the episodes in your favorite download manager and then watch it in your favorite media player.

And before I forget, probably the reason you came here, the…

Direct download links

Laura:
Week 1
Week 2
Week 3

Alex:
Week 1
Week 2
Week 3

Sophie:
Week 1
Week 2
Week 3

Jake and amy:
Week 1
Week 2
Week 3

Paul and Gina:

Week 1
Week 2
Week 3