Silvio a Beccara wrote:
I ran some tests that make extensive use of large (up to 10000 elements) arrays, up to three dimensions, both in GPC and in Fortran (Intel). Even with optimisations turned on, GPC is at least five times slower than Fortran in completing the tests. Can anyone tell me why, and possibly suggest some way to make GPC recover some of this lag?
There is one issue which is frequently overlooked when comparing matrix computations in Pascal or C versus Fortran: Fortran stores matrices column-wise, while the others store matrices row-wise. Only a matter of taste? Not quite.
The problem is with cache misses inside the CPU (the first- and second-level internal caches), not the page faults reported by Linux, which pertain to multitasking issues.
To reduce such cache misses, one needs to access matrix elements as consecutively as possible. When algorithms are taken from well-optimized Fortran and translated into C or Pascal line by line, as in the popular "Numerical Recipes" book, one frequently needs to reverse the order of nested loops, and even to rewrite the algorithms somewhat, in order to access the elements in the right order.
Sometimes more subtle changes are very useful. Look, for example, at the algorithms for diagonalizing symmetric matrices, which proceed in two steps: tridiagonalization (Tred2) and solution of the tridiagonal form (Tqli). One "wise improvement" exploits the fact that, by symmetry, only half the matrix is really needed, and stores in the unused half the first step's results for use in the second. But one side effect is that some matrix elements are then accessed in a reversed-L fashion instead of pure column (or row) order. It is then vastly better to discard the intermediate results (a loss of order N^2) and to recompute them in the Tqli step (N^2 recomputations in an order-N^3 algorithm for eigenvectors), so that everything is accessed in the right order.
The improvement when computing eigenvectors is a factor of two or three with respect to the original "Numerical Recipes" algorithms, whether in C or in Pascal. Exact mileage depends on the version of the CPU (in the x86 series), on the system (DJGPP, MinGW, W98, XP...), and above all, of course, on the sizes of the caches and of the matrices, but it always remains significant.
To give an example: on my machine (Pentium 4, 256 KB L2 cache, 1 GB main memory, running DJGPP in a W98 DOS box, with the gcc 3.2.3 compilers and the latest GPC (20041017)), I diagonalized random 1000x1000 double-precision matrices, computing eigenvalues and eigenvectors, always with options -march=pentium4 -O3. I first used the "Numerical Recipes" routines, taken from Fortran and translated (by the authors), as unmodified as possible, to C and Pascal. Each number is an average of 4 runs to evaluate the effect of different randomization algorithms; the dispersion of the results is quite small.
C: 143 sec.
F77: 60 sec.
GPC: 120 sec.
GPC with the two modifications indicated (loop orders swapped and the "wise trick" eliminated): 41 sec. !!!
So making meaningful speed comparisons is trickier than you seem to suspect.
Maurice