A few months back, I was given an interesting project to work on in an interview test. The test required me to build a small particle system for the Nintendo DS, using the homebrew tool chain. Well I don’t know if I was stupid or just plain blind, but I was unable to find a sqrt in their standard math.h include file. Posting questions on a forum resulted in no replies and time was running short. I had only about a week to make my submission. So I decided to write my own implementation. After all I remember doing this in high scool.
The old school method…
As usual I headed over to google and after a little searching I found it on this website. Well this was exactly what I was looking for, but unfortunately it required too much guessing at every step. Now as we all know computers are not good at guessing. All hopes of implementing a square root function quickly evaporated.
Next day in office, with some help from my coleague Ashwini Kumar, we came up with a very promising solution.
We found something interesting about the patterns a square root takes
sqrt(1.0) = 1.0
sqrt(10.0) = 3.1622
sqrt(100.0) = 10.0
sqrt(1000.0) = 31.622
Now for those of you who have not realized yet, sqrt(1000.0) = sqrt(10.0) * 10.0
Well this could work pretty well and all we needed to do then is just generate a look up table of the first 100 numbers and then just do a multiplication based on it’s 10th power. So I came up with this code…
// LUT for square roots below 100. Takes 396 bytes
const static float SQRT_LUT[] =
{
1.000000f, 1.414214f, 1.732051f, 2.000000f,
2.236068f, 2.449490f, 2.645751f, 2.828427f,
3.000000f, 3.162278f, 3.316625f, 3.464102f,
3.605551f, 3.741657f, 3.872983f, 4.000000f,
4.123106f, 4.242640f, 4.358899f, 4.472136f,
4.582576f, 4.690416f, 4.795832f, 4.898980f,
5.000000f, 5.099020f, 5.196152f, 5.291502f,
5.385165f, 5.477226f, 5.567764f, 5.656854f,
5.744563f, 5.830952f, 5.916080f, 6.000000f,
6.082763f, 6.164414f, 6.244998f, 6.324555f,
6.403124f, 6.480741f, 6.557438f, 6.633250f,
6.708204f, 6.782330f, 6.855655f, 6.928203f,
7.000000f, 7.071068f, 7.141428f, 7.211102f,
7.280110f, 7.348469f, 7.416198f, 7.483315f,
7.549834f, 7.615773f, 7.681146f, 7.745967f,
7.810250f, 7.874008f, 7.937254f, 8.000000f,
8.062258f, 8.124039f, 8.185352f, 8.246211f,
8.306623f, 8.366600f, 8.426149f, 8.485281f,
8.544003f, 8.602325f, 8.660254f, 8.717798f,
8.774964f, 8.831760f, 8.888194f, 8.944272f,
9.000000f, 9.055386f, 9.110434f, 9.165152f,
9.219544f, 9.273619f, 9.327379f, 9.380832f,
9.433981f, 9.486833f, 9.539392f, 9.591663f,
9.643651f, 9.695360f, 9.746795f, 9.797959f,
9.848858f, 9.899495f, 9.949874f,
} ;
float SqrtLUT(const float fNum)
{
int nIdx ;
int nApproxLog = 0 ;
float fCopy = fNum ;
float fMultiplier = 1.0f, fDivider = 0.1f ;
if (fNum < 100.0f && fNum >= 1.0f) // Use LUT
{
nIdx = static_cast<int>(fNum) ;
if (nIdx)
{
--nIdx ;
return SQRT_LUT[nIdx] ;
}
}
// Number is above 100, bring it down and just multiply the answer
if (fNum >= 100.0f)
{
while (fCopy > 1.0f)
{
fCopy *= 0.1f ;
if ((nApproxLog & ~3) && (nApproxLog & 1))
{
fMultiplier *= 10.0f ;
fDivider *= fDivider ;
}
++nApproxLog ;
}
}
// Number is below 1. Bring it with in the 1-100 range
else
{
// ...
// Something very similar to the if block above
// ...
}
return GuessSqrt(fNum * fDivider) * fMultiplier ;
}
This worked really well for numbers under 10^3, but as the numbers grew larger the errors started growing larger and larger. Even a LUT of doubles instead of floats just offset the errors to a slightly larger numbers.
Time to look for a new solution…
The Babylonian Method
A quick recap of old college textbooks revealed a suprising simple method. This method only produces an estimate, but the accuracy of the estimate grows rapidly on every iteration. For those of you who need a quick recap of the algorithm, here is a recap from http://pballew.net/oldsqrt.htm
1) Guess a number for the square root
2) Divide the number by the guess
3) Average the original guess and the new guess
4) make this average value your new guess and
5) Go back to step 2 if the accuracy of the result is not satifactory…
The problem with this method is that it still requires a guess like the old scool method, but this requires a guess only once and the rest of the guesses are relatively small calculations which don’t need any sort of loop. Besides with our LUT method we can now provde a fairly accurate guess relatively fast. So the initial implementation looked something like this…
const static int gIterations 20 ;
float mSqrt(const float fNum)
{
float x1 ;
x1 = GuessSqrt(fNum) ; // Just uses the LUT function mentioned above
for (int i=0; i<gIterations; ++i)
{
x1 = ( ( fNum / x1 ) + x1 ) * 0.5f ;
}
}
After a little trial and error I decided the the number of iterations are best kept at 12.
The end result
Turns out it aint that bad. On my old 1.6Ghz system this implementation is only 0.002 seconds (approximately) slower than sqrt() when calculating 10000 square roots.
My implementation (available in the attachment below) has a much more complicated implentation that does a loop unwind using C++ templates. How else can I justify spending large amounts of time spent trying to understand Andrei’s book, Modern C++ design . As of now that book has become my latest programming fad.
You can download the VS.Net 2005 project of my implementation (Attached at the end of this blog entry). The .cpp file has code to check the speed and accuaracy of the results against the standard CRT sqrt function. Most of the code that you should be interested in, is in main.cpp. If you want to use the function in your own project, just copy everything except the main function in main.cpp
Until next time