compute_checksum speed

5 Aug 2005


      One thing that stood out in the profile was that compute_checksum, a 
two-line function that is called once for each loaded gpi, is 
responsible for 12% of the total time.  It scans through every byte of 
the gpi (in this case that includes my 27 MB GPCMacOSAll gpi, once for 
each of the 250 units in my project), which adds up to around 43 
seconds of the 6 minute rebuild time.

static gpi_int
compute_checksum_original (unsigned char *buf, gpi_int size)
{
  gpi_int sum = 0, n = 0;
  for (n = 0; n < size; n++)
    sum += n * buf[n];
  return sum;
}

It appears that gpi_int is a 64 bit int on my system.
I timed a few possible improvements:
time=  42.1   sum=   46473731586140096   compute_checksum_original
time=  35.0   sum=   46473731586140096   compute_checksum_unrolled
time=  18.8   sum=    8551919141529536   compute_checksum_native
time=  12.2   sum=   -2963523148067389   compute_checksum_shift
time=   7.8   sum=          -852473248   compute_checksum_add
(time is roughly the number of seconds gpc spends just calling 
compute_checksum on GPCMacOSAll.gpi during my complete rebuild).
The unrolled version is pretty trivial, chops about 7 seconds off and 
has no consequences (the checksum value is unchanged).
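
As a rough illustration of what unrolling means here, the sketch below processes four bytes per loop iteration and handles the leftover bytes in a tail loop. It is not the routine from testchecksum.c: the name compute_checksum_unrolled4, the unroll factor of four, and the int64_t typedef for gpi_int are all assumptions made for the example.

```c
#include <stdint.h>

/* Assumption: gpi_int is a 64-bit signed integer, as observed above. */
typedef int64_t gpi_int;

/* Hypothetical 4-way unrolled variant of the original loop.  It adds
   the same terms n * buf[n], so the result matches the original. */
static gpi_int
compute_checksum_unrolled4 (const unsigned char *buf, gpi_int size)
{
  gpi_int sum = 0, n = 0;

  /* Main loop: four bytes per iteration. */
  for (; n + 4 <= size; n += 4)
    sum += n * buf[n]
           + (n + 1) * buf[n + 1]
           + (n + 2) * buf[n + 2]
           + (n + 3) * buf[n + 3];

  /* Tail loop: up to three leftover bytes. */
  for (; n < size; n++)
    sum += n * buf[n];

  return sum;
}
```

Because the terms are only regrouped, existing gpi checksums stay valid with a change of this kind.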
Switching to native int instead of gpi_int affects the value of the 
checksum, reducing its precision somewhat, but given what the 
checksum is used for, I don't think that would matter.  This 
drops the time by a further 16 seconds.  The downside is that it 
changes the existing gpi checksum, meaning all gpis would have to be 
recompiled.
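
A minimal sketch of the native-width idea, assuming that unsigned long is the machine's natural word size and that the same position-weighted sum is kept. The name checksum_native_sketch is hypothetical, and this is not necessarily what compute_checksum_native in testchecksum.c does.

```c
#include <stddef.h>

/* Hypothetical variant: all arithmetic is done in unsigned long.
   Unsigned overflow wraps around, so the result is the weighted sum
   modulo 2^(number of bits in unsigned long).  On a system where
   unsigned long is narrower than gpi_int, the value differs from the
   original checksum for large files. */
static unsigned long
checksum_native_sketch (const unsigned char *buf, size_t size)
{
  unsigned long sum = 0;
  size_t n;

  for (n = 0; n < size; n++)
    sum += (unsigned long) n * buf[n];

  return sum;
}
```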
Switching to using shift instead of multiply drops the time a further 
6 seconds.  However, the PowerPC is very good at shifts (and quite 
good at multiplies for that matter), so this would be something that 
should be tested at least on an Intel system to see how it compares.
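
The shift-based routine itself is not shown in this message, so the following is only one hypothetical way to make a checksum depend on byte position with a shift rather than a multiply: rotate the running sum left by one bit, then add the next byte. The name checksum_shift_sketch and the 32-bit width are assumptions for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rotate-and-add checksum; not the routine timed above. */
static uint32_t
checksum_shift_sketch (const unsigned char *buf, size_t size)
{
  uint32_t sum = 0;
  size_t n;

  for (n = 0; n < size; n++)
    {
      /* Rotate left by one bit so that the order of the bytes affects
         the result, then add the next byte. */
      sum = (sum << 1) | (sum >> 31);
      sum += buf[n];
    }

  return sum;
}
```

With this scheme the bytes 1, 2, 3 and the bytes 3, 2, 1 give different results, which a plain byte sum would not.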
Switching to just basic addition drops the time a further 4 seconds, 
but at the loss of quite a lot of bits in the checksum.
I would recommend that the loop be unrolled (trivial and no 
consequences), and then that one of the native checksum routines, or 
something like it, be used.  The code can then use the new routine 
for new gpis and, when reading existing gpis, first check whether the 
checksum matches the new value or, failing that, whether it matches 
the old value.  That way users will never notice the change.  A possible 
patch for this is included below.  I implemented this and it dropped 
the time to build from 6 to 5 minutes.
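
The new-then-old check described above could look roughly like the sketch below. This is an illustration only, not the patch itself: checksum_old, checksum_new and gpi_checksum_ok are hypothetical names, gpi_int is assumed to be a 64-bit integer, and a plain byte sum stands in for whichever fast routine is eventually chosen.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: gpi_int is a 64-bit signed integer. */
typedef int64_t gpi_int;

/* The existing position-weighted checksum. */
static gpi_int
checksum_old (const unsigned char *buf, size_t size)
{
  gpi_int sum = 0;
  size_t n;
  for (n = 0; n < size; n++)
    sum += (gpi_int) n * buf[n];
  return sum;
}

/* Stand-in for the new, faster routine (here: a plain byte sum). */
static gpi_int
checksum_new (const unsigned char *buf, size_t size)
{
  gpi_int sum = 0;
  size_t n;
  for (n = 0; n < size; n++)
    sum += buf[n];
  return sum;
}

/* Returns 1 if the stored checksum matches either routine.  The old,
   slower routine is only run when the new one does not match, so gpi
   files written with the new checksum never pay for the old one. */
static int
gpi_checksum_ok (const unsigned char *buf, size_t size, gpi_int stored)
{
  if (stored == checksum_new (buf, size))
    return 1;
  return stored == checksum_old (buf, size);
}
```

One consequence of this design is that accepting two possible values makes the check slightly weaker than accepting one.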
I've attached the testchecksum.c file I used to time the various 
checksum routines.  It could be used to compare the speed of the 
different routines on an Intel platform to see how their relative 
speeds compare, so that a generally good solution could be chosen.
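
For readers without the attachment, a timing comparison of this kind can be sketched with the standard clock() function. The harness below is an assumption about the general approach, not the contents of testchecksum.c; time_checksum, checksum_fn and checksum_add_sketch are hypothetical names.

```c
#include <stddef.h>
#include <time.h>

/* Common signature for the routines being compared (an assumption). */
typedef unsigned long (*checksum_fn) (const unsigned char *, size_t);

/* Example routine to time: a plain byte sum. */
static unsigned long
checksum_add_sketch (const unsigned char *buf, size_t size)
{
  unsigned long sum = 0;
  size_t n;
  for (n = 0; n < size; n++)
    sum += buf[n];
  return sum;
}

/* Calls fn on the buffer reps times, stores the last result in
   *result, and returns the CPU time used in seconds. */
static double
time_checksum (checksum_fn fn, const unsigned char *buf, size_t size,
               int reps, unsigned long *result)
{
  /* Calling through a volatile pointer keeps the compiler from
     collapsing the repeated calls into a single one. */
  checksum_fn volatile f = fn;
  unsigned long sum = 0;
  clock_t start = clock ();
  int i;

  for (i = 0; i < reps; i++)
    sum = f (buf, size);

  *result = sum;
  return (double) (clock () - start) / CLOCKS_PER_SEC;
}
```

Running each candidate over the same large buffer with the same repetition count gives figures that can be compared across platforms.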
Please note, I don't have any attachment to the particular checksum 
routines included in testchecksum.c; they were just some quick 
routines I wrote to try out various solutions and see the effect they 
would have.  Any checksum routine that was fast would be fine - a 
table driven CRC might work for example.
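
For reference, a table-driven CRC could be sketched as follows. This uses the common reflected CRC-32 polynomial 0xEDB88320 as an assumption; the message does not name a particular CRC, and the names crc32_init and crc32_sketch are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];
static int crc_table_ready = 0;

/* Build the 256-entry lookup table once. */
static void
crc32_init (void)
{
  uint32_t i, c;
  int k;

  for (i = 0; i < 256; i++)
    {
      c = i;
      for (k = 0; k < 8; k++)
        c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
      crc_table[i] = c;
    }
  crc_table_ready = 1;
}

/* One table lookup, one shift and two XORs per input byte. */
static uint32_t
crc32_sketch (const unsigned char *buf, size_t size)
{
  uint32_t c = 0xFFFFFFFFu;
  size_t n;

  if (!crc_table_ready)
    crc32_init ();

  for (n = 0; n < size; n++)
    c = crc_table[(c ^ buf[n]) & 0xFF] ^ (c >> 8);

  return c ^ 0xFFFFFFFFu;
}
```

For the nine ASCII bytes "123456789" this variant returns the standard check value 0xCBF43926. Whether it is faster than the simple routines above would need to be measured.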
Enjoy,
    Peter.
-- 
http://www.stairways.com/  http://download.stairways.com/
