One thing that stood out in the profile was that compute_checksum, a two-line function called once for each loaded gpi, is responsible for 12% of the total time. It scans every byte of the gpi (in this case that includes my 27 MB GPCMacOSAll gpi, once for each of the 250 units in my project), which adds up to around 43 seconds of the 6-minute rebuild time.
static gpi_int compute_checksum_original (unsigned char *buf, gpi_int size)
{
  gpi_int sum = 0, n = 0;
  for (n = 0; n < size; n++)
    sum += n * buf[n];
  return sum;
}
It appears that gpi_int is a 64-bit int on my system.
I timed several possible improvements:
time= 42.1  sum= 46473731586140096  compute_checksum_original
time= 35.0  sum= 46473731586140096  compute_checksum_unrolled
time= 18.8  sum=  8551919141529536  compute_checksum_native
time= 12.2  sum= -2963523148067389  compute_checksum_shift
time=  7.8  sum=        -852473248  compute_checksum_add
(time is roughly the time in seconds gpc is taking just calling compute_checksum on the GPCMacOSAll.gpi in my complete rebuild).
The unrolled version is pretty trivial; it chops about 7 seconds off and has no consequences, since the checksum value is unchanged.
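To illustrate what I mean by unrolling, here is a minimal sketch. It is not the exact routine from testchecksum.c; the name checksum_unrolled4, the unroll factor of 4, and the typedef of gpi_int as long long are assumptions made for the example. The point is only that the arithmetic is the same as the original loop, so the resulting checksum is identical.

```c
#include <stddef.h>

typedef long long gpi_int;  /* assumption: 64-bit signed, as it appears to be on my system */

/* Reference loop, same arithmetic as compute_checksum_original. */
static gpi_int checksum_reference(const unsigned char *buf, gpi_int size)
{
    gpi_int sum = 0, n;
    for (n = 0; n < size; n++)
        sum += n * buf[n];
    return sum;
}

/* Hypothetical 4-way unrolled variant; returns the same value. */
static gpi_int checksum_unrolled4(const unsigned char *buf, gpi_int size)
{
    gpi_int sum = 0, n = 0;

    /* Main body: four bytes per iteration. */
    for (; n + 4 <= size; n += 4) {
        sum += n * buf[n];
        sum += (n + 1) * buf[n + 1];
        sum += (n + 2) * buf[n + 2];
        sum += (n + 3) * buf[n + 3];
    }

    /* Tail: the remaining 0 to 3 bytes. */
    for (; n < size; n++)
        sum += n * buf[n];

    return sum;
}
```

Because the result is bit-for-bit the same, existing gpi files stay valid with this change alone.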
Switching to native int instead of gpi_int affects the value of the checksum, reducing its precision somewhat, but given what the checksum is used for, I don't think that would matter. This drops the time by a further 16 seconds. The downside is that it changes the existing gpi checksum, meaning all gpis would have to be recompiled.
Switching to shift instead of multiply drops the time a further 6 seconds. However, the PowerPC is very good at shifts (and quite good at multiplies, for that matter), so this should be tested at least on an Intel system to see how it compares.
Switching to just basic addition drops the time a further 4 seconds, but at the loss of quite a lot of bits in the checksum.
I would recommend that the loop be unrolled (trivial and no consequences), and then that one of the native checksum routines, or something like it, be used. The code can then use the new routine for new gpis, and when reading existing gpis first check whether the checksum matches the new value or, failing that, whether it matches the old value. That way users will never notice the change. A possible patch for this is included below. I implemented this and it dropped the time to build from 6 to 5 minutes.
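The read-side check described above can be sketched roughly as follows. This is not the patch itself; checksum_new is only a stand-in for whichever fast routine ends up being chosen, and the names gpi_checksum_ok, checksum_old and the typedef of gpi_int are assumptions for the example.

```c
#include <stddef.h>

typedef long long gpi_int;  /* assumption: 64-bit signed */

/* Existing algorithm, kept only so that old gpi files still validate. */
static gpi_int checksum_old(const unsigned char *buf, gpi_int size)
{
    gpi_int sum = 0, n;
    for (n = 0; n < size; n++)
        sum += n * buf[n];
    return sum;
}

/* Stand-in for the new routine: native unsigned arithmetic, so
   wraparound is well defined. Any fast routine could go here. */
static gpi_int checksum_new(const unsigned char *buf, gpi_int size)
{
    unsigned int sum = 0;
    gpi_int n;
    for (n = 0; n < size; n++)
        sum = sum * 31u + buf[n];
    return (gpi_int) sum;
}

/* Returns 1 if the stored checksum matches either routine.
   The fast routine is tried first, so only old gpi files pay
   for the slow one. */
static int gpi_checksum_ok(const unsigned char *buf, gpi_int size,
                           gpi_int stored)
{
    if (stored == checksum_new(buf, size))
        return 1;
    return stored == checksum_old(buf, size);
}
```

Once a unit is recompiled its gpi carries the new checksum, so the slow path gradually stops being taken.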
I've attached the testchecksum.c file I used to time the various checksum routines. It could be used to compare the speed of the different routines on an Intel platform, to see how their relative speeds compare, so that a generally good solution can be chosen.
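For anyone who wants to time a routine of their own, a harness along these lines should be enough. This is a rough sketch and not the contents of testchecksum.c; time_checksum, checksum_fn, checksum_mul and checksum_add are names made up for the example, and gpi_int is assumed to be long long.

```c
#include <stddef.h>
#include <time.h>

typedef long long gpi_int;  /* assumption: 64-bit signed */
typedef gpi_int (*checksum_fn)(const unsigned char *, gpi_int);

/* Same arithmetic as the original routine. */
static gpi_int checksum_mul(const unsigned char *buf, gpi_int size)
{
    gpi_int sum = 0, n;
    for (n = 0; n < size; n++)
        sum += n * buf[n];
    return sum;
}

/* Plain byte sum in native unsigned arithmetic. */
static gpi_int checksum_add(const unsigned char *buf, gpi_int size)
{
    unsigned int sum = 0;
    gpi_int n;
    for (n = 0; n < size; n++)
        sum += buf[n];
    return (gpi_int) sum;
}

/* Calls fn reps times over the buffer, stores the last checksum in
   *result, and returns the CPU time used in seconds. */
static double time_checksum(checksum_fn fn, const unsigned char *buf,
                            gpi_int size, int reps, gpi_int *result)
{
    clock_t start = clock();
    gpi_int sum = 0;
    int r;

    for (r = 0; r < reps; r++)
        sum = fn(buf, size);

    *result = sum;
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

To approximate my rebuild, load a large gpi into memory and use reps = 250. Be aware that an optimizing compiler may be able to hoist the repeated calls out of the loop, so it is worth checking that the times scale with reps.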
Please note, I don't have any attachment to the particular checksum routines included in testchecksum.c; they were just some quick routines I wrote to try out various solutions and see the effect they would have. Any checksum routine that was fast would be fine. A table-driven CRC might work, for example.
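As an illustration of the table-driven CRC idea, here is a sketch of the standard byte-at-a-time CRC-32 (reflected polynomial 0xEDB88320, the variant used by zlib and Ethernet). It is not one of the routines I timed, so I can't say how it compares in speed; the names crc32_init and crc32_checksum are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];
static int crc_table_ready = 0;

/* Build the 256-entry lookup table once. */
static void crc32_init(void)
{
    uint32_t i, c;
    int k;

    for (i = 0; i < 256; i++) {
        c = i;
        for (k = 0; k < 8; k++)
            c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crc_table[i] = c;
    }
    crc_table_ready = 1;
}

/* One table lookup, one shift and two XORs per input byte. */
static uint32_t crc32_checksum(const unsigned char *buf, size_t size)
{
    uint32_t c = 0xFFFFFFFFu;
    size_t n;

    if (!crc_table_ready)
        crc32_init();

    for (n = 0; n < size; n++)
        c = crc_table[(c ^ buf[n]) & 0xFF] ^ (c >> 8);

    return c ^ 0xFFFFFFFFu;
}
```

A CRC costs a table lookup per byte, so it would need to be timed against the simple routines above, but it detects changes far better than a plain sum.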
Enjoy, Peter.