I've built GPC against the GCC 3.4.6 and 4.3.6 backends on Debian
Stretch (64-bit). Running benchmarks on a Perlin Noise routine, I get:
3.4.6    5.6 secs
4.3.6   11.0 secs   !!
The difference seems to be mainly down to a failure of 4.3.6 to inline.
The function calls themselves may not take much time, but I believe the
real problem is that, without the subroutines inlined, the main Noise
function is no longer a leaf function. Its live values then have to
survive each call, which puts it under register pressure and forces
them onto the stack.
.....
      a := Lerp (sx, u10, v10);
      b := Lerp (sx, u11, v11);
      d := Lerp (sy,  a,  b);
      Noise3 := Lerp (sz, c, d);
End;
With 3.4.6, this code produces:
	{ Hard to know where it starts without line numbers ! }
	addsd	%xmm9, %xmm7
	addsd	%xmm3, %xmm2
	subsd	%xmm0, %xmm1
	subsd	%xmm7, %xmm2
	mulsd	%xmm14, %xmm1
	mulsd	%xmm14, %xmm2
	addsd	%xmm0, %xmm1
	addsd	%xmm7, %xmm2
	subsd	%xmm1, %xmm2
	mulsd	%xmm2, %xmm12
	movsd	%xmm12, %xmm0
	addsd	%xmm1, %xmm0
	ret
which is pretty lean & mean.
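Reading that listing, each Lerp call seems to have been expanded in
place into a subsd/mulsd/addsd triple, which matches a + t * (b - a).
As a rough sketch only: the Lerp routine itself is not shown in this
post, so the parameter names, the Real type and the LerpDemo program
around it are assumptions made for illustration.

```pascal
program LerpDemo;

{ Hypothetical reconstruction: the real Lerp is not shown in the post.
  The formula is inferred from the sub/mul/add pattern in the 3.4.6
  output; parameter names and the Real type are assumptions. }
function Lerp (t, a, b: Real): Real; attribute (inline);
begin
  Lerp := a + t * (b - a)
end;

begin
  { Interpolate halfway between 2.0 and 4.0 }
  WriteLn (Lerp (0.5, 2.0, 4.0) : 0 : 2)
end.
```

A routine this small is exactly the kind you would expect to be
inlined, since the call overhead is comparable to the three arithmetic
instructions it replaces.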
However, with 4.3.6 I get
	movsd	112(%rsp), %xmm2
	movsd	104(%rsp), %xmm1
	movsd	48(%rsp), %xmm0
	call	_p__M0_S5_Lerp
	movsd	128(%rsp), %xmm2
	movsd	%xmm0, 152(%rsp)
	movsd	120(%rsp), %xmm1
	movsd	48(%rsp), %xmm0
	call	_p__M0_S5_Lerp
	movsd	152(%rsp), %xmm1
	movapd	%xmm0, %xmm2
	movsd	56(%rsp), %xmm0
	call	_p__M0_S5_Lerp
	movsd	40(%rsp), %xmm1
	movapd	%xmm0, %xmm2
	movsd	64(%rsp), %xmm0
	call	_p__M0_S5_Lerp
	addq	$240, %rsp
	popq	%rbx
	popq	%rbp
	popq	%r12
	popq	%r13
	popq	%r14
	popq	%r15
	ret
(Stack spills tend to be expensive on [my] AMD processor, as the level
0/1 cache isn't that fast.)
Maybe I have broken my build of 4.3.6. Can anyone else confirm the
status of inlining on Linux x86 with 4.x.x compilers? The simple
example from the info file
    program InlineDemo;

    function Max (x, y: Integer): Integer; attribute (inline);
    begin
      if x > y then
        Max := x
      else
        Max := y
    end;

    begin
      WriteLn (Max (42, 17), ' ', Max (-4, -2))
    end.
also does not work for me with 4.3.6. It still produces a call instruction:
	call	_p__M0_S0_Max
I recall also that inlining did not work with the official 4.1 Debian
package. I was thinking of reporting this as a (Debian) bug a while
back, but GPC was then removed from the archive, which made that moot.
Going forward, I'm wondering which GCC version to base my builds on.
4.3.6 potentially supports a few more architectures,
ARMel
PowerPC
SH4
and supports the -m32 switch, but a 100% slowdown on the CPU-intensive
stuff I use the compiler for is too much of a penalty for me.
From further tinkering around, I notice that 3.4.6 often inlines even
when not asked to do so, whereas 4.3.6 very rarely if ever inlines.
Neither compiler seems to obey the inline attribute!
Anyone any thoughts on this?
Hoping, of course, that it's an easy-to-fix, typo-type bug...
Regards,
Peter B