Frank Heckenbach wrote:
Waldek Hebisch wrote:
Frank Heckenbach wrote:
Yes (both frontend and backend version). Much more in GPI files depends on endianness (besides checksums). We could detect and convert at runtime, but of course, it would slow down things even more (which I'm sure you wouldn't like too much, Adriaan). The only benefit would be to people who cross-compile *and* can't recompile on each host system for some strange reason ...
Well, I believe that we can make the GPI reader/writer both faster and more portable. For example, we could bulk-convert integers between endiannesses, so that only folks who need compatibility will pay for it. But ATM I am looking for low-hanging fruit ...
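To illustrate the idea, here is a minimal sketch of such a bulk conversion (the names `swap32` and `gpi_swap_words` and the flat word-array layout are made up for illustration; this is not GPC's actual GPI code):

```c
#include <stdint.h>
#include <stddef.h>

/* Byte-swap a single 32-bit word. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Bulk-convert an array of 32-bit words in place.  A reader on a
   same-endian host skips this call entirely, so only those who need
   cross-endian compatibility pay for the conversion. */
static void gpi_swap_words(uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++)
        words[i] = swap32(words[i]);
}
```

The point of doing it in bulk is that the endianness test happens once per file, not once per integer read.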
Yes, it's not so trivial, as the nodes don't consist only of integers, but also of bytes/bit fields and strings. So at least some parsing effort would be required, comparable in size to store_node_fields and load_node, and it would have to be kept in sync with them (=> more maintenance effort for future changes).
As far as I'm concerned, feel free to change the checksums. I don't insist on the current algorithm (I think the comment in module.c doesn't really suggest I do ;-). But I wouldn't like to abandon checksums. AFAICS, they do catch some cases which would otherwise lead to obscure bugs. Perhaps GP will avoid such cases in the future (when all GP bugs are fixed :-), but even after `--automake' is phased out, such problems can still arise with hand-made make rules (which will probably always be used).
IIUC the most likely bug avoided by checksums is reading an inconsistent GPI file. That can be detected by putting a random number (a stamp) in the GPI header and checking at the end of reading (or when the reader finds something wrong) that the stamp is still the same.
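A minimal sketch of such a stamp check (the function name and file layout are hypothetical, chosen only to illustrate the scheme; here the writer puts a copy of the header stamp at the very end of the file, as suggested further below):

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Compare the random stamp at the start of the GPI file with the
   copy at its end.  If the writer was interrupted, or the file was
   rewritten while we were reading it, the two copies disagree. */
static bool gpi_stamps_match(FILE *f)
{
    uint32_t head_stamp, tail_stamp;

    if (fseek(f, 0L, SEEK_SET) != 0
        || fread(&head_stamp, sizeof head_stamp, 1, f) != 1)
        return false;
    if (fseek(f, -(long) sizeof tail_stamp, SEEK_END) != 0
        || fread(&tail_stamp, sizeof tail_stamp, 1, f) != 1)
        return false;
    return head_stamp == tail_stamp;
}
```

Unlike a checksum, this costs nothing per byte of content; it only guards against the file changing under the reader, not against the bytes themselves being corrupted.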
This wouldn't protect against corruption of the file's contents (which could happen on bad media, after system crashes, etc.). You must be thinking of an entirely different problem if a number in the header can change while the file is being read, IIUYC!? Are you thinking about two simultaneous processes writing to the same file, or something like that?
About writing to a GPI file during compilation: it is caused by multiple recompilations during a run, and it probably makes the problems with multiple recompilations much worse. Yes, a random stamp does not detect bad media. It can protect against crashes (if you write a copy of the stamp at the end of the file).
But anyway, there are several uses of checksums; see the notes in internals.texi. Perhaps we've been talking about different things all the time. Protection against inconsistent GPI files is just one of them, and IMHO the least important one. The more important one is to protect against inconsistent imports, including indirect imports.
AFAICS checksums buy you only one advantage over random stamps: you can verify that the checksum you wrote out agrees with the content you read in. In other words, checksums protect against bad media (a buggy OS counts as bad media too). For all other uses, random stamps work just as well.
I am not sure how important protection against bad media is: most external media use ECC, so the probability of an error is quite low, and when an undetected error does occur, it is likely to be so serious that we detect the inconsistency on reading anyway.
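For contrast with the stamp scheme, here is a deliberately trivial content checksum (a simple polynomial hash, not GPC's actual algorithm). Because it depends on every byte, flipping any byte changes the result, which is exactly the bad-media case a random stamp cannot catch:

```c
#include <stdint.h>
#include <stddef.h>

/* Example content checksum: unlike a random stamp, any single-byte
   corruption of the buffer changes the result, so damage from bad
   media is (probabilistically) detected on reading. */
static uint32_t gpi_checksum(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = sum * 31 + buf[i];   /* simple polynomial hash */
    return sum;
}
```

The trade-off under discussion is precisely that this touches every byte of the file, while a stamp comparison does not.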
Concerning GP versus `--automake': I think we need to fix the main automake problems if we want GP to work well. Basically:
Just to clarify what we're talking about:
If you mean by "automake problems" problems with the current `--automake' implementation, I can't see why we need to fix them in order to make GP work well, as GP is there to replace automake.
The main automake problems, to me, are the difficulties handling indirect recompilation needs. -- That's obvious from the way it does things: automake only has a local view, so the best it can do is try to rescue things at the last minute, i.e. recompiling other modules when reading their GPI files has already started. Add indirect requirements and cyclic dependencies to that, and you get all the problems we have with automake. GP (just like make), by contrast, has a global view and can do things in the right order from the beginning.
My point is: cyclic dependencies are an artifact which will vanish when we properly distinguish implementations from interfaces. That is a fundamental problem, common to both automake and GP. Neither make nor automake will work well with cyclic dependencies.
- to have a separate GPI file for the implementation (so that the interface
GPI stays consistent during compilation)
This would enable a `-j' option to compile the implementation of A while another process compiles a B that uses A. IMHO that would be a minor optimization; otherwise I can't see it as a big problem.
ATM GP wants to recompile the same file over and over again. You think that a more clever check in GP will solve this. Maybe. But the traditional method, which works quite well, is: compare timestamps. If the timestamps of the dependencies are earlier than that of the target, then the target is up to date. That works well if the dependencies form a DAG.
Mixing implementation with interface breaks timestamps. I understand that in GP you want to do better than timestamps allow. But I find it slightly inconsistent that you want checksums in GPI files (here you want redundancy for better error checking), while in the case of GP you believe that you can ignore the possibility of using timestamps as a sanity check.
We have written about this multiple times without a conclusion. So a simple question: would you object if I added a new GPI file, say `module-imp.gpi', and put all the implementation info there?
- compile interface and implementation separately (even if in the same
file)
Now referring to your 1% above ... ;-) I think this adds to compilation time (e.g., by having to load all imported GPIs twice), and I'd guess it would be a little more than 1%. Therefore I'd prefer (and that's what GP does) to do so only when needed, i.e. with cyclic imports, and not in the normal case.
Sure, you can try to optimize. But the process should be equivalent; otherwise you will have spurious recompilations and lose more time.
For make (and possibly GP):
- have option to print _all_ dependencies (includes + imports)
It might have some advantages, but also drawbacks. In particular, AFAICS, we'd need a do-nothing parse run in GPC, i.e., basically a nop-flag check in every nontrivial action.
We already have `-fsyntax-only'. In fact, the preferred way is to compile the file and find its dependencies at the same time -- then make uses the old dependencies; if they are OK (most of the time) it goes on, and if not, make recompiles the dependencies and then compiles the file again.