Square root without sqrt

March 17th, 2008

A few months back, I was given an interesting project to work on in an interview test. The test required me to build a small particle system for the Nintendo DS, using the homebrew tool chain. Well I don’t know if I was stupid or just plain blind, but I was unable to find a sqrt in their standard math.h include file. Posting questions on a forum resulted in no replies and time was running short. I had only about a week to make my submission. So I decided to write my own implementation. After all I remember doing this in high scool.

The old school method…

As usual I headed over to google and after a little searching I found it on this website. Well this was exactly what I was looking for, but unfortunately it required too much guessing at every step. Now as we all know computers are not good at guessing. All hopes of implementing a square root function quickly evaporated.

Next day in office, with some help from my coleague Ashwini Kumar, we came up with a very promising solution.

We found something interesting about the patterns a square root takes

sqrt(1.0) = 1.0
sqrt(10.0) = 3.1622
sqrt(100.0) = 10.0
sqrt(1000.0) = 31.622

Now for those of you who have not realized yet, sqrt(1000.0) = sqrt(10.0) * 10.0

Well this could work pretty well and all we needed to do then is just generate a look up table of the first 100 numbers and then just do a multiplication based on it’s 10th power. So I came up with this code…

// LUT for square roots below 100. Takes 396 bytes

const static float SQRT_LUT[] =
{
       1.000000f,      1.414214f,      1.732051f,      2.000000f,
       2.236068f,      2.449490f,      2.645751f,      2.828427f,
       3.000000f,      3.162278f,      3.316625f,      3.464102f,
       3.605551f,      3.741657f,      3.872983f,      4.000000f,
       4.123106f,      4.242640f,      4.358899f,      4.472136f,
       4.582576f,      4.690416f,      4.795832f,      4.898980f,
       5.000000f,      5.099020f,      5.196152f,      5.291502f,
       5.385165f,      5.477226f,      5.567764f,      5.656854f,
       5.744563f,      5.830952f,      5.916080f,      6.000000f,
       6.082763f,      6.164414f,      6.244998f,      6.324555f,
       6.403124f,      6.480741f,      6.557438f,      6.633250f,
       6.708204f,      6.782330f,      6.855655f,      6.928203f,
       7.000000f,      7.071068f,      7.141428f,      7.211102f,
       7.280110f,      7.348469f,      7.416198f,      7.483315f,
       7.549834f,      7.615773f,      7.681146f,      7.745967f,
       7.810250f,      7.874008f,      7.937254f,      8.000000f,
       8.062258f,      8.124039f,      8.185352f,      8.246211f,
       8.306623f,      8.366600f,      8.426149f,      8.485281f,
       8.544003f,      8.602325f,      8.660254f,      8.717798f,
       8.774964f,      8.831760f,      8.888194f,      8.944272f,
       9.000000f,      9.055386f,      9.110434f,      9.165152f,
       9.219544f,      9.273619f,      9.327379f,      9.380832f,
       9.433981f,      9.486833f,      9.539392f,      9.591663f,
       9.643651f,      9.695360f,      9.746795f,      9.797959f,
       9.848858f,      9.899495f,      9.949874f,
} ;

float SqrtLUT(const float fNum)
{
     int      nIdx ;
     int      nApproxLog = 0 ;
     float    fCopy = fNum ;
     float    fMultiplier = 1.0f, fDivider = 0.1f ;

     if (fNum < 100.0f && fNum >= 1.0f) // Use LUT

     {
         nIdx = static_cast<int>(fNum) ;
         if (nIdx)
         {
             --nIdx ;
             return SQRT_LUT[nIdx] ;
         }
     }

     // Number is above 100, bring it down and just multiply the answer

     if (fNum >= 100.0f)
     {
         while (fCopy > 1.0f)
         {
             fCopy *= 0.1f ;

             if ((nApproxLog & ~3) && (nApproxLog & 1))
             {
                 fMultiplier *= 10.0f ;
                 fDivider *= fDivider ;
             }

             ++nApproxLog ;
         }
     }
     // Number is below 1. Bring it with in the 1-100 range

     else
     {
         // ...
         // Something very similar to the if block above
         // ...
     }

     return GuessSqrt(fNum * fDivider) * fMultiplier ;
}

This worked really well for numbers under 10^3, but as the numbers grew larger the errors started growing larger and larger. Even a LUT of doubles instead of floats just offset the errors to a slightly larger numbers.

Time to look for a new solution…

The Babylonian Method

A quick recap of old college textbooks revealed a suprising simple method. This method only produces an estimate, but the accuracy of the estimate grows rapidly on every iteration. For those of you who need a quick recap of the algorithm, here is a recap from http://pballew.net/oldsqrt.htm

1) Guess a number for the square root
2) Divide the number by the guess
3) Average the original guess and the new guess
4) make this average value your new guess and
5) Go back to step 2 if the accuracy of the result is not satifactory…

The problem with this method is that it still requires a guess like the old scool method, but this requires a guess only once and the rest of the guesses are relatively small calculations which don’t need any sort of loop. Besides with our LUT method we can now provde a fairly accurate guess relatively fast. So the initial implementation looked something like this…

const static int gIterations 20 ;

float mSqrt(const float fNum)
{
 float x1 ;

 x1 = GuessSqrt(fNum) ; // Just uses the LUT function mentioned above

 for (int i=0; i<gIterations; ++i)
 {
 x1 = ( ( fNum / x1 ) + x1 ) * 0.5f ;
 }
}

After a little trial and error I decided the the number of iterations are best kept at 12.

The end result

Turns out it aint that bad. On my old 1.6Ghz system this implementation is only 0.002 seconds (approximately) slower than sqrt() when calculating 10000 square roots.

My implementation (available in the attachment below) has a much more complicated implentation that does a loop unwind using C++ templates. How else can I justify spending large amounts of time spent trying to understand Andrei’s book, Modern C++ design . As of now that book has become my latest programming fad. :)

You can download the VS.Net 2005 project of my implementation (Attached at the end of this blog entry). The .cpp file has code to check the speed and accuaracy of the results against the standard CRT sqrt function. Most of the code that you should be interested in, is in main.cpp. If you want to use the function in your own project, just copy everything except the main function in main.cpp

Until next time


Download HBO’s new series ‘In Treatment’ from outside the USA

March 9th, 2008

For the impatient… head over directly to the direct download links

For those of you who don't know, HBO recently put the first 15 episodes of their new show 'In Treatment' online. That was good news so the first thing I did was head over there and checkout their 2 min preview. Looked quite interesting, except for the part where the full episodes can't be watched from my country (India), or for that matter any country other than USA [Haven't confirmed this yet].

Now this has got to be one of the lamest attempts to block people from outside USA watching your shows. This is the probably one of the biggest reasons thepiratebay is doing so well. Besides this goes directly against one of my most fundamental beliefs that once information is online anyone in the world should be able to access it. Sure I could catch the episodes on youtube, but then youtube has never been my favorite for offering such low quality videos without giving you a choice. HBOs online stream was much bigger and was on the border line of acceptable. So this is what I wanted and this is what I was going to get.

The first step is to find the .flv files they were linking to. I thought that if I get a direct link to the .flv file I may not have to even watch it online.

Unfortunately I realized that the .flv file was actually a redirect script, to redirect the person based on his IP address. This is another lame attempt to block people. So I just went and found a proxy in USA and managed to get the direct links to the actual .flv file. Turns out that once you have the direct links to the files, you don't even need to be behind a proxy. Might as well make HBO pay for the bandwidth.

In fact even if you are in USA, this will probably be a more convenient method of watching the show, as you can just queue up all the episodes in your favorite download manager and then watch it in your favorite media player.

And before I forget, probably the reason you came here, the…

Direct download links

Laura:
Week 1
Week 2
Week 3

Alex:
Week 1
Week 2
Week 3

Sophie:
Week 1
Week 2
Week 3

Jake and amy:
Week 1
Week 2
Week 3

Paul and Gina:

Week 1
Week 2
Week 3


Quick fix for asynchronous subtitle (.srt) files

February 10th, 2008

So, yesterday I was getting down to watch a relaxing movie and I realized that (yet again), that the SubRip file was not properly synced with the video. I decided that I've had enough of this. It's happened far too many times. So I opened up the .srt file in TextPad and realized everything is right there in plain text. That's probably because M$ or the MPAA did not have a say in the matter. Anyway, so instead of spending 30 mins searching for a new subtitle file on DivxSubtitles or OpenSubtitles I decided to write my own app that will offset the timings in an srt file.

For those of you in a similar situation, I'll save you 30 mins of searching or writing a quick script.
Just click here and follow the instructions on the following page. You should be done in under a minute.


Compile time assertion

January 18th, 2008

Yesterday I finally found a use for compile time checking. I remembered reading this up in Andrei's book 'Modern C++ Design'. So I got down to implementing it in VS.NET 2005.

His code (code below) seemed to work well in theory, but knowing MS, that's as far it goes.

template<bool> struct CompileTimeChecker
{
     CompileTimeChecker(...);
};
template<> struct CompileTimeChecker<false> { };
#define STATIC_CHECK(expr, msg)
     {
          class ERROR_##msg {};
          (void)sizeof(CompileTimeChecker<(expr) != 0>((ERROR_##msg())));
     }

Turns out that MS VS.Net had some complaints about the code above…

error C2066: cast to function type is illegalerror C2070: 'CompileTimeChecker (main::ERROR_ONE_NOT_EQUAL_TO_ONE (__cdecl *)(void))': illegal sizeof operand

At first I thought I’d take the easy way out and just throw an error using a #pragma. But then that would beat the very purpose of the code being compiler independent. After some 30 mins of trial and error, managed to get the same code (code below) working with a few minor modifications.

template<bool> struct CompileTimeChecker
{
     CompileTimeChecker(...){};
};

template<> struct CompileTimeChecker<false> { };

#define STATIC_CHECK(expr, msg)
{
     class ERROR_##msg {};
     ERROR_##msg y ;
     (void)sizeof(CompileTimeChecker<(expr) != 0>((y)));
}

Now calling something like…

STATIC_CHECK(sizeof(char)>sizeof(int), SIZEOF_CHAR_NOT_GT_SIZEOF_INT) ;

will result in a compile time error that looks something like this…

error C2440: '<function-style-cast>' : cannot convert from 'main::ERROR_SIZEOF_CHAR_NOT_GT_SIZEOF_INT' to 'CompileTimeChecker'> No constructor could take the source type, or constructor overload resolution was ambiguous

Until next time, happy coding!


Random thought: Some advice from experts before you become that dark hacker

December 8th, 2007

I always find it hard as to why there are not so many genius hackers out there making a gazillion dollars a day hacking high profile government servers. And then I read this Q&A session with Bruce Schneier . Well basically he says…

"If I make a computer security mistake in a book, for a consulting client, at BT it’s a mistake. It might be expensive, but I learn from it and move on. As a criminal, a mistake likely means jail time time I can’t spend earning my criminal living. For this reason, it’s hard to improve as a criminal. And this is why there are more criminal masterminds in the movies than in real life."

I highly recommend that you read the Q&A session . It's about the past, present and future of computer security, selecting the right passwords and digital security for the ignorant common man like you and me.

PS: For those of you who don't know Bruce Schneier is one of the most respected computer security experts. I take his advice pretty seriously.


Trekking at the Sayhadri range

November 30th, 2007

STA_7767-STO_7781
Just came back from a trek in the Sayhadri range in Maharashtra, India. Took a 10 day break from the routine. This is after a two year break from trekking. I almost forgot what it was like to be out in the wilderness. It was a mind blowing experience, sleeping under the stars, breathing unpolluted air and visiting caves and forts built in 200BC. I thought that such stuff existed only in video games. All in all, it was an easy trek. It's one that I have wanted to do for some time now and one that I’d like to do again in the monsoons. The grass is greener (literally) and the landscape is ten times more beautiful. As you might guess there is a photo album where you can check out the pics.

For those of you interested, the trek was organized by YHAI Malad unit. The trek schedule was something like the one below, but we deviated from plan here and there, but below is what happened during the trek.

1st Day (17th Nov 2007) Base Camp Karjat
Arrival at Karjat.

2nd Day – Rajmachi Camp.
The journey started with a half an hour auto ride. The actual adventure began from Kondivade with with a walk with awe-inspiring views of Ulhas Valley. Along the way to Rajmachi we visited Kondhana Caves. In the evening there was a sight seeing program of lake and temple. We even took a dip in the lake.

3rd Day – Rajmachi Darshan.
In the morning there was a sightseeing program of twin forts of Rajmachi. In the evening we were kept busy with a treasure hunt.

4th Day – Ekvira Camp.
There was a rather flat walk through the Valvan village on the bank of Shirota Lake. Even though it was flat I would say it was the toughest route in the trek as we had very limited water and it was blazing hot. Then after a gradual climb we reached the top of Ekvira mountain. From there we were able to see the Valvan dam and Lonavala city. After that we climbed down to the famous Karle caves. After seeing the Ekvira caves and Ekvira temple we reached Ekvira camp.

5th Day – Lohagad
The trek started towards Village Bhaje via a tempo ride. After visiting beautiful Bhaje caves group we climbed Fort Visapur and then reached Lohagad wadi camp.

6th Day – Back To Civilization.
In the morning we visited Fort Lohgad. After this we had a certificate distribution program after which everyone split up and went their way

After the trek, I spent a few days in Bombay. Bombay reiterated it's coolness. That is where things happen and where you live your life to the fullest. I will be uploading a few pics from that trip as well.

In the mean while I uploaded an old album of an office party to the site. It’s one of those never ending jobs where I need to sync up all the albums from my HDD to my site.


Coding best practices: Minimizing the use of macros

November 30th, 2007

This coding ‘best practice’ is the first in the series of many posts that I will be making. Many of them have been introduced to me only recently, so it’s unlikely you will see this in my old code. I am making these posts as I think they are essential to every C++ programmer.

Lets get started on the first one…
Some may call this an opinion, but I call this a good programming practice.
Don’t declare a macro unless you cannot do without them. Below I have presented some problems plaguing macros and some workarounds.

1. Macros modify the code before the compiler sees it. This makes it harder for you to debug your program, step into code and find bugs.

2. Macros don't obey scoping rules. So the following example

// The macro below could very well be in another header file
// that you never seen or totally forgot about
#define width 20

// And then you name a variable the same name,
// forgetting there was a macro with the same name
// It’s than you think to forget, especially if the macro is
// hidden in one of those header files
void RenderRect(int height, int width) // …

Workarounds
Below are some workarounds for some common uses you might have for macros

// 1 //// Global strings
// macro:
#define SZ_HELLO "Hello world"
// workaround:
char const szHello[] = "Hello world";

// 2 //// Global setting
// macro:
#define NMAX 5
// workaround:
const int nMax = 5;

// 3 //// Generic functions
// macro:
#define SQUARE( x ) ((x)*(x))
// workaround:
template<typename T>
T inline square(T &x)
{
  return x * x ;
}

End note

Of course there are some places where only a macro can get the job done. In these cases go ahead and use a macro, but follow a strict naming convention for your macros so you will not confuse them with a standard variable later on.


My experience with optimizing a game engine

October 26th, 2007

"You're bound to be unhappy if you optimize everything"
Donald Knuth

As with every new programmer I would spend endless hours trying to squeeze optimization out of every small piece of code on my engine . I wanted it to be the fastest. I always thought that writing every line of code in assembly would make a super fast engine, not knowing that the compiler would probably optimize my code better. I would spend long hours studying SSE instructions. Well one advantage is that I am no longer afraid of ASM , but I was generally unhappy that my engine was not be super optimized and nearly as fast as a commercial game, even though it was probably doing only 25% of the tasks. Colleagues (just as inexperienced as me at the time) would come up to me and say 'Hey, you know this guy, he wrote a whole OS in assembly'. And we would all think that this guy wrote the best OS in the world. But hey, wait a minute. Then why are over 80% of the desktops still using Windows?


Restricted access to albums

August 26th, 2007

I had some free time this weekend so I spent the time learning Drupal API and rolled out a restricted access module. Most images in my albums with pictures of my family, friends and me will have restricted access. Writing a drupal module was not as hard as I thought it would be and took me only one day to write with other dependencies.

This was a much needed module. I was never happy with the fact that some unknown person out there could have such a close eye on my private life. I'm not so sure my friends would have liked it as well. My friends/family will get URLs that will log them in automatically. This way i'm keeping the hastle down to a minimum.


Code is a tricky friend…

June 7th, 2007

Well, it's been a long time since I have added anything more than pictures to my blog.

A lot has happened since I first started this blog.

The most important realization being that my old programming style is utterly junky.
The main problem is that I always sit down and start coding without any design and planning what so ever. And finally when I hit a road block I change the code. In fact the engine that I wrote went through so many changes, it's impossible to say it follows any solid programming style.
To a large extent I knew my programming style was quite crappy, but just didn't know any alternate way. I was always against C++ inheritance and was always for a static program structure, but really didn't know any proper implementation technique. Luckily for me, I discovered a whole new world of templates. Boy are they great!!! I think inheritance should be left at college level just so that we pass our exams.
Anyway this means that my precious project is a whole pile of binary junk. And now I have to start over. Oh well. Better late than never.
And then I learn that a whole bunch of other techniques are actually a big no-no. Some of which I am still quite hesitant to give up.

Luckily for me I met a really talented guy and he explained some good programming practices and I soon realized I was breaking most of them. There are one or two I don't agree with him on, but those don't make such a big difference in any case IMHO. On the other hand I was quite happy to realize that my basic priciples like having static code and getting the compiler to do as much work for me were the best principles to follow. It's just that I had no idea or techniue of how to do these things. (I'm actually making excuses for bad programming here)
I plan to write out the good programming practices that were told to me, so that everyone can start writing better code in future.

I also seem to have a decent amount of exposure to Unreal Engine 2.0 and learned a lot of new tricks in the trade. One thing I was surprised to learn is how well they followed most/all the good programming practices. It's amazing. I always knew that they were doing it right. This also means that my precious project is junk overnight and I have to start over. I have no idea if I am going to and if so when. I am thinking of keeping it small and simple and making just a rendering framework to test shaders and other things I write. Besides I still have another project that I'd like to release. I really need to open this one up and see how badly it has been contaminated by my bad planning and poor programming practices.

Other than that I put up an article that I wrote a long time back. I tried submitting it to DevMaster.net and gamedev.net. Both still haven't put it even after nearly 6 months after initial submission was made. DevMater.net showed interest in putting it up strangely. I got tired of waiting and put it up on my own site. At least it is out there, and boy I am happy I did not put any code in to the article. In any case part 2 will be coming soon. Just need to get some formatting done on it.