Re: compute_checksum speed

7 Nov 2005


Waldek Hebisch wrote:
...
Peter N Lewis wrote:
...
One thing that stood out in the profile was that compute_checksum, a
two-line function that is called once for each loaded gpi, is
responsible for 12% of the total time.  It scans through all the bytes
of each gpi (in this case including my 27Meg GPCMacOSAll gpi, once for
each of the 250 units in my project), which adds up to around 43
seconds of the 6 minute rebuild time.
static gpi_int
compute_checksum_original (unsigned char *buf, gpi_int size)
{
  gpi_int sum = 0, n = 0;
  for (n = 0; n < size; n++)
    sum += n * buf[n];
  return sum;
}
It appears that gpi_int is a 64-bit int on my system.
I did some timing of some improvements:
time=  42.1   sum=   46473731586140096   compute_checksum_original
time=  35.0   sum=   46473731586140096   compute_checksum_unrolled
time=  18.8   sum=    8551919141529536   compute_checksum_native
time=  12.2   sum=   -2963523148067389   compute_checksum_shift
time=   7.8   sum=          -852473248   compute_checksum_add
(time is roughly the time in seconds gpc is taking just calling
compute_checksum on the GPCMacOSAll.gpi in my complete rebuild).
I did a little test, also trying a few other checksums. Remarks:

- I do not know if we need a checksum at all.
- Loading interfaces should not be a bottleneck: we should be
  able to compile many modules in a single run, loading
  interfaces just once in the whole run.
- On 32-bit AMD, when computing a 64-bit checksum, the bottleneck is
  the lack of registers, while the fastest checksum is probably
  limited by DRAM speed.
- I am slightly surprised by the Mac results: a G5 Mac can (and should)
  use 64-bit arithmetic even for 32-bit applications, and the
  Mac also has many registers.
- On 64-bit machines the current checksum seems to be reasonably fast.
- The current checksum can be computed using only additions
  (compute_checksum_ladd), and on AMD 64 it is the second fastest.
- On 32-bit machines the current checksum can be computed using
  mostly 32-bit operations (compute_checksum_short and
  compute_checksum_sadd).
- If we want a checksum but do not care which one we use,
  then summing 32-bit words (compute_checksum_lladd) may be a
  good solution.
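
To illustrate the remark that the current checksum can be computed using only additions, here is a minimal sketch. It is not the actual compute_checksum_ladd code, which is not shown in this thread; the names checksum_mul and checksum_add_only are made up for the example, and an unsigned 64-bit accumulator is assumed so that wraparound is well defined in C. The idea is to walk the buffer backwards while keeping a running sum of the bytes seen so far, so that byte k ends up being added k times.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference: same formula as the quoted compute_checksum_original,
   sum of n * buf[n], but with an unsigned 64-bit accumulator
   (an assumption of this sketch). */
static uint64_t checksum_mul(const unsigned char *buf, size_t size)
{
    uint64_t sum = 0;
    for (size_t n = 0; n < size; n++)
        sum += (uint64_t)n * buf[n];
    return sum;
}

/* Additions only: walk backwards with a running suffix sum.
   buf[k] enters `suffix` at step k and is then added to `sum`
   once per remaining step (k, k-1, ..., 1), i.e. k times in total. */
static uint64_t checksum_add_only(const unsigned char *buf, size_t size)
{
    uint64_t sum = 0, suffix = 0;
    for (size_t n = size; n-- > 1; ) {
        suffix += buf[n];
        sum += suffix;
    }
    return sum;
}
```

For the bytes {5, 1, 2, 3} both functions return 14 (0*5 + 1*1 + 2*2 + 3*3). Whether the backward walk is actually faster than the multiply depends on the machine, as the timings above suggest.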

Two more notes:
1. compute_checksum_lladd is not endianness-neutral (but that may not
(yet) be a problem).
2. gpidump.pas must also be changed, e.g. to match
compute_checksum_lladd:
    {$local R-, X+, W-}
    function ComputeChecksum (const Buf: array of Byte) = Sum: GPIInt;
    var
    	i, iLoopCount: GPIInt;
    	p: PCInteger;
    begin
    	Sum := 0;
    	iLoopCount := ( High( Buf) + 1) div SizeOf( CInteger);
    	p := PCInteger( @Buf);
    	for i:= 1 to iLoopCount do
    	begin
    		Sum := Sum + p^;
    		p := p + 1
    	end
    end;
    {$endlocal}
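
To make the endianness point in the first note concrete, here is a rough C sketch of a word-summing checksum in two flavours. This is not the real compute_checksum_lladd; the function names are hypothetical, and a 32-bit accumulator is assumed. The first version reads native 32-bit words, so its result depends on the host byte order; the second assembles each word from bytes in a fixed (little-endian) order and gives the same result on any host. Like the Pascal loop above, both ignore trailing bytes that do not fill a whole word.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sum the buffer as native 32-bit words: result depends on byte order. */
static uint32_t checksum_words_native(const unsigned char *buf, size_t size)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 4 <= size; i += 4) {
        uint32_t w;
        memcpy(&w, buf + i, sizeof w);  /* safe for unaligned buffers */
        sum += w;
    }
    return sum;
}

/* Byte-order-neutral variant: each word is assembled as little-endian,
   so big-endian and little-endian hosts compute the same value. */
static uint32_t checksum_words_le(const unsigned char *buf, size_t size)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 4 <= size; i += 4)
        sum += (uint32_t)buf[i]
             | (uint32_t)buf[i + 1] << 8
             | (uint32_t)buf[i + 2] << 16
             | (uint32_t)buf[i + 3] << 24;
    return sum;
}
```

On a little-endian host the two functions agree; on a big-endian host they generally differ, which is the situation where a gpi written on one kind of machine would fail the check on the other. The neutral variant costs a few shifts per word, so it may give back part of the speed gain.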
What I don't understand is that gpidump.pas uses the MedCard type,
which is four bytes on Mac OS X:
type
      GPIInt = MedCard;
whereas gpc.h uses HOST_WIDE_INT, which is eight bytes on Mac OS X:
typedef HOST_WIDE_INT gpi_int;
Using gpidump (with an unchanged gpc-200521104) returns "invalid
endianness marker".
Regards,
Adriaan van Os
