The speed freak
November 9th, 2006
When I first started coding the Replica framework, I wanted it to be the fastest, meanest framework out there. Due to some misconceptions, I used to spend hours and hours optimizing things like the Win32 message loop and writing code in ASM. At one point I even thought I'd write the entire engine in ASM to get the most out of the CPU. I spent almost 2 months in R&D on optimal vector maths and matrix multiplication.
Looking back, I now realize that these are not the things that define the speed of an engine. Yes, they are important, but not as important as an improved graphics technique. Writing the whole framework in ASM might have given me just 1 FPS more (if that), whereas a simple improvement in a graphics technique will give me much more. Not to mention the inconvenience I would have had to go through managing ASM code. Don't get me wrong, an optimized vector class is important, but something like this is fairly straightforward and definitely did not need months of R&D. Just run it through VTune, fix the hotspots it shows, and you will have about as optimized a vector class as you need.
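To show how little there is to it, here is a minimal sketch of the kind of plain C++ vector code I mean. The `Vec3` struct and the `dot`, `length` and `normalized` helpers are hypothetical names made up for this example; this is not the Replica code or anything taken from the Quake source.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal 3D vector, for illustration only.
struct Vec3 {
    float x, y, z;
};

// Dot product of two vectors.
inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Euclidean length of a vector.
inline float length(const Vec3& v) {
    return std::sqrt(dot(v, v));
}

// Returns a unit-length copy of v. The caller must not pass a zero vector.
inline Vec3 normalized(const Vec3& v) {
    const float len = length(v);
    assert(len > 0.0f && "cannot normalize a zero-length vector");
    const float inv = 1.0f / len;  // one divide, three multiplies
    return Vec3{v.x * inv, v.y * inv, v.z * inv};
}
```

Small inline functions like these are easy for a compiler to optimize, so it makes sense to profile first and only reach for hand-written ASM if the profiler actually points here.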
Every time the latest Quake source code came out, I'd go running to the vector class to see how they did it… and then be majorly disappointed. For a long time, I even thought that they had replaced the actual optimized code and put lame C code in its place. I was most disappointed when I started writing a mod for UT2K4 and realized that they were doing vector normalization in the .uc scripts. Man, what were they thinking!!!
Now, after reading several presentations and articles and looking at some of the world's best engines, I'd say there is a looong list of optimizations that are needed, but none of them are anywhere near the vector class or the Win32 message loop, as these are things we will never really be able to change.
IMO, a major share of the time is spent shoveling data into and out of memory, and just by using memory wisely you can improve the performance of any engine. And be warned, this is not just on the CPU. Data shoveling takes place on the GPU too. Just try switching on mipmaps and see what a huge improvement in speed you get, since distant surfaces then sample the smaller texture levels and pull far less data through memory.
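On the CPU side, a rough sketch of what "using memory wisely" can mean is below. Both functions add up the same grid and return the same answer; the only difference is the order in which they walk memory. The function names and the flat row-major layout are assumptions made for this example, not code from any particular engine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a rows x cols grid stored row-major in one contiguous buffer.
// The inner loop visits consecutive addresses, so every cache line that
// gets fetched is fully used before moving on.
long long sumRowMajor(const std::vector<int>& grid,
                      std::size_t rows, std::size_t cols) {
    assert(grid.size() == rows * cols);
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += grid[r * cols + c];
    return total;
}

// Same arithmetic, but the inner loop jumps `cols` elements per step.
// On a large grid each access tends to land on a different cache line.
long long sumColumnMajor(const std::vector<int>& grid,
                         std::size_t rows, std::size_t cols) {
    assert(grid.size() == rows * cols);
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += grid[r * cols + c];
    return total;
}
```

On typical cached CPUs the row-major walk is usually the faster of the two once the grid no longer fits in cache, even though the instruction count is the same. How big the gap is depends on the hardware, so time both on your own machine rather than taking my word for it.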