Silvio a Beccara wrote:
I ran some tests that make extensive use of large (up to 10000 elements) arrays, up to three dimensions, both in GPC and in Fortran (Intel). Even with optimisations turned on, GPC is at least five times slower than Fortran in completing the tests. Can anyone tell me why, and possibly suggest some way to make GPC recover some of this lag?
There is one issue which is frequently overlooked when comparing matrix computations in Pascal or C versus Fortran: Fortran stores matrices column-wise, while the others store matrices row-wise. Only a matter of taste? Not quite.
The problem is with cache misses inside the CPU (the first- and second-level internal caches), not the page faults reported by Linux, which pertain to multitasking issues.
To reduce such cache misses, one needs to access matrix elements as consecutively as possible. When algorithms are taken from well-optimized Fortran and translated into C or Pascal line by line, as in the popular "Numerical Recipes" book, one frequently needs to reverse the order of nested loops, and even to rewrite the algorithms somewhat, in order to access the elements in the right order.
Sometimes more subtle changes are very useful. Look, for example, at the algorithms for diagonalizing symmetric matrices, which proceed in two steps: tridiagonalization (Tred2) and solution of the tridiagonal form (Tqli). One "wise improvement" exploits the fact that, by symmetry, only half the matrix is really needed, and stores in the unused half the first step's results for use in the second. But one side effect is that some matrix elements are then accessed in a reversed-L fashion instead of pure column (or row) order. It is then vastly better to discard the intermediate results (a loss of order N^2) and to recompute them in the Tqli step (N^2 recomputations in an order-N^3 algorithm for eigenvectors), so that everything is accessed in the right order.
The improvement when computing eigenvectors is a factor of two or three with respect to the original "Numerical Recipes" algorithms, whether in C or in Pascal. Exact mileage depends on the version of the CPU (in the x86 series), on the system (DJGPP, MinGW, W98, XP...), and above all, of course, on the sizes of the caches and of the matrices, but it always remains significant.
To give an example: on my machine (Pentium 4, 256 KB L2 cache, 1 GB main memory, running DJGPP in a W98 DOS box, with the gcc 3.2.3 compilers and the latest GPC (20041017)), I diagonalized random 1000x1000 double-precision matrices, computing eigenvalues and eigenvectors, always with options -march=pentium4 -O3. I first used the "Numerical Recipes" routines, taken from Fortran and translated (by the authors), as unmodified as possible, to C and Pascal. Each number is an average of 4 runs to evaluate the effect of different randomization algorithms; the dispersion of the results is quite small.
C: 143 sec.
F77: 60 sec.
GPC: 120 sec.
GPC with the two modifications indicated (loop orders swapped and the "wise trick" eliminated): 41 sec. !!!
So making meaningful speed comparisons is trickier than you seem to suspect.
Maurice