Waldek Hebisch wrote:
Peter N Lewis wrote:
One thing that stood out in the profile was that compute_checksum, a two-line function called once for each loaded gpi, is responsible for 12% of the total time. It scans every byte of the gpi (in this case including my 27 MB GPCMacOSAll gpi, once for each of the 250 units in my project), which adds up to around 43 seconds of the 6-minute rebuild time.
static gpi_int compute_checksum_original (unsigned char *buf, gpi_int size)
{
  gpi_int sum = 0, n = 0;
  for (n = 0; n < size; n++)
    sum += n * buf[n];
  return sum;
}
It appears that gpi_int is a 64-bit int on my system.
I did some timing of some improvements:
time= 42.1 sum= 46473731586140096 compute_checksum_original
time= 35.0 sum= 46473731586140096 compute_checksum_unrolled
time= 18.8 sum= 8551919141529536 compute_checksum_native
time= 12.2 sum= -2963523148067389 compute_checksum_shift
time= 7.8 sum= -852473248 compute_checksum_add
(time is roughly the time in seconds gpc is taking just calling compute_checksum on the GPCMacOSAll.gpi in my complete rebuild).
I did a little test, also trying a few other checksums. Remarks:
- I do not know if we need a checksum at all
- loading interfaces should not be a bottleneck, we should be able to compile many modules in a single run, loading interfaces just once in the whole run
- On 32-bit AMD, when computing a 64-bit checksum the bottleneck is the lack of registers, while the fastest checksum is probably limited by DRAM speed
- I am slightly surprised by the Mac results: a G5 Mac can (and should) use 64-bit arithmetic even for 32-bit applications, and the Mac also has many registers
- On 64-bit machines the current checksum seems to be reasonably fast
- The current checksum can be computed using only additions (compute_checksum_ladd), and on AMD 64 it is the second fastest
- On 32-bit machines current checksum can be computed using mostly 32-bit operations (compute_checksum_short and compute_checksum_sadd)
- If we want a checksum but do not care which one we use, then summing 32-bit words (compute_checksum_lladd) may be a good solution
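To illustrate the remark that the current checksum can be computed using only additions, here is a minimal sketch. It is not necessarily how compute_checksum_ladd is written. The names checksum_t, checksum_mul, checksum_add_only and checksum_selfcheck are made up for this example, and an unsigned 64-bit type is assumed in place of gpi_int so that overflow is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for gpi_int; the unsigned 64-bit width is an assumption. */
typedef uint64_t checksum_t;

/* Reference version: sum of n * buf[n], as in compute_checksum_original. */
static checksum_t checksum_mul(const unsigned char *buf, size_t size)
{
    checksum_t sum = 0;
    for (size_t n = 0; n < size; n++)
        sum += (checksum_t)n * buf[n];
    return sum;
}

/* Additions only: walk backwards keeping a running suffix sum.
   buf[n] enters `suffix` at step n and is then added to `sum` once
   per step from n down to 1, i.e. n times in total. */
static checksum_t checksum_add_only(const unsigned char *buf, size_t size)
{
    checksum_t suffix = 0, sum = 0;
    for (size_t n = size; n-- > 1; ) {
        suffix += buf[n];
        sum += suffix;
    }
    return sum;
}

/* Usage: both versions agree on a small buffer. */
static void checksum_selfcheck(void)
{
    const unsigned char buf[] = { 3, 1, 4, 1, 5 };
    assert(checksum_mul(buf, sizeof buf) == checksum_add_only(buf, sizeof buf));
}
```

The backward walk replaces each multiplication by two additions, which is the kind of trade that may help where 64-bit multiplies are expensive.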
Two more notes:
1. compute_checksum_lladd is not endianness-neutral (but that may not (yet) be a problem).
2. gpidump.pas must be changed also, e.g. to match compute_checksum_lladd
{$local R-, X+, W-}
function ComputeChecksum (const Buf: array of Byte) = Sum: GPIInt;
var
  i, iLoopCount: GPIInt;
  p: PCInteger;
begin
  Sum := 0;
  iLoopCount := (High (Buf) + 1) div SizeOf (CInteger);
  p := PCInteger (@Buf);
  for i := 1 to iLoopCount do
    begin
      Sum := Sum + p^;
      p := p + 1
    end
end;
{$endlocal}
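For comparison, a C sketch of the same word-summing idea, together with a variant that does not depend on host byte order. This is only an illustration of the endianness note above, not the actual compute_checksum_lladd. The function names are hypothetical, and unsigned 32-bit words summed into an unsigned 64-bit total are an assumption (the Pascal code reads CInteger values).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sum the buffer as 32-bit words in host byte order. Trailing bytes
   that do not fill a whole word are ignored, as in the Pascal loop. */
static uint64_t checksum_words_native(const unsigned char *buf, size_t size)
{
    uint64_t sum = 0;
    for (size_t i = 0; i + 4 <= size; i += 4) {
        uint32_t w;
        memcpy(&w, buf + i, sizeof w);   /* safe for unaligned buffers */
        sum += w;
    }
    return sum;
}

/* Same sum, but each word is assembled explicitly as little-endian,
   so the result is identical on big- and little-endian hosts. */
static uint64_t checksum_words_le(const unsigned char *buf, size_t size)
{
    uint64_t sum = 0;
    for (size_t i = 0; i + 4 <= size; i += 4) {
        uint32_t w = (uint32_t)buf[i]
                   | (uint32_t)buf[i + 1] << 8
                   | (uint32_t)buf[i + 2] << 16
                   | (uint32_t)buf[i + 3] << 24;
        sum += w;
    }
    return sum;
}

/* Usage: a word whose bytes read the same in both directions gives
   the same result from both functions on any host. */
static void checksum_words_selfcheck(void)
{
    const unsigned char buf[] = { 1, 2, 2, 1 };
    assert(checksum_words_native(buf, sizeof buf)
           == checksum_words_le(buf, sizeof buf));
}
```

The explicit byte assembly costs a few shifts per word, so it trades some speed for a checksum that a gpi reader on any host can reproduce.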
What I don't understand is that gpidump.pas uses the MedCard type, which is four bytes on Mac OS X.
type GPIInt = MedCard;
where gpc.h uses HOST_WIDE_INT, which is eight bytes on Mac OS X.
typedef HOST_WIDE_INT gpi_int;
Using gpidump (with an unchanged gpc-200521104) returns "invalid endianness marker".
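As an illustration of why a four-byte GPIInt and an eight-byte gpi_int cannot read each other's fields, here is a minimal sketch. Whether this is what actually triggers the gpidump error is not established here. The helpers store_be and load_be and the marker value 0x12345678 are hypothetical, and big-endian byte order is assumed, as on a PowerPC Mac.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v as `width` bytes (width <= 8), most significant byte first. */
static void store_be(unsigned char *out, uint64_t v, size_t width)
{
    for (size_t i = 0; i < width; i++)
        out[i] = (unsigned char)(v >> (8 * (width - 1 - i)));
}

/* Load `width` bytes (width <= 8), most significant byte first. */
static uint64_t load_be(const unsigned char *in, size_t width)
{
    uint64_t v = 0;
    for (size_t i = 0; i < width; i++)
        v = (v << 8) | in[i];
    return v;
}

/* Usage: a writer that emits an 8-byte field and a reader that
   expects 4 bytes see different values at the same offset. */
static void width_mismatch_demo(void)
{
    unsigned char field[8];
    store_be(field, 0x12345678u, 8);            /* hypothetical marker */
    assert(load_be(field, 8) == 0x12345678u);   /* matching width */
    assert(load_be(field, 4) == 0);             /* high half only */
}
```

With big-endian storage the four-byte reader sees only the zero high half of the eight-byte value, and every later field offset is shifted as well.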
Regards,
Adriaan van Os