Re: GPC, CGI units, and program size

23 Aug 2006

      Prof. A Olowofoyeku (The African Chief) wrote:
...

> Is it possible to have reduced functionality versions of libgpc
> (perhaps produced with a switch when building the compiler?)? If so,
> is it possible to choose which features shall be built into it?
> (e.g., via a simple text configuration file, of the kind you have
> when building the linux kernel, or busybox, etc.).

If so, then configure options might be the obvious way to go.
...
> What I mean is something like (please note, this is off the top of
> my head, and is not properly thought through - it may even be
> impossible or the necessary features may not be in libgpc at all,
> but rather in the compiler):

They're in both of them which doesn't help. I.e., if you built an
RTS without string support, some operations that don't look like RTS
calls (e.g. "+" for strings) would lead to undefined linker
references.

One could add explicit checks in the compiler, but (slightly
tangential to the topic) I'm thinking of a different route here: So
far, the compiler creates RTS calls based on "magic" linker names
("_p_Set_Union" etc.) and makes implicit declarations for them which
have to match the RTS declarations. Now we could instead properly
import the RTS declarations from GPI files. (Previously, due to lack
of qualified identifiers and selective import, this would have
created namespace conflicts, but now that these features are
available, it can be done.) The compiler would then call the RTS
based on (still "magic") Pascal-level declarations, so when a
version of the RTS omits them, the compiler will simply notice the
absence of those declarations in the RTS GPI files and could emit
somewhat clearer errors.

There would be some strange effects, though. E.g., comparing two
strings requires an RTS call, but comparing one string against ''
does not because it can be optimized to a comparison of its length
against 0. Removing the respective RTS routine would mean that the
latter would still work, but the former wouldn't. Of course, one
could explicitly forbid the former as well when the RTS routine
isn't found (not sure if rather useful or annoying).
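
To illustrate the asymmetry described here, a minimal sketch (not taken from the compiler sources; the program and variable names are made up for the example, and the output statements themselves of course use the RTS):

```pascal
program StringCompareDemo;

var
  s, t: String (80);  { GPC syntax for a string with capacity 80 }

begin
  s := '';
  t := 'abc';

  { Comparison against the empty string: according to the text, this
    can be optimized to a comparison of the length against 0, so it
    would not need an RTS string comparison routine. }
  if s = '' then
    WriteLn ('s is empty');

  { Comparison of two string variables: according to the text, this
    requires an RTS call, so it would fail to link (or be rejected)
    with an RTS that omits the respective routine. }
  if s = t then
    WriteLn ('s and t are equal')
  else
    WriteLn ('s and t are different')
end.
```

With a reduced RTS, only the second comparison would be affected, although both look the same at the source level.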
...
> # enable support for pascal strings
> STRINGS=y
> # Pascal file I/O
> FILES=y

The problem is that parts of them depend on each other. E.g., most
file, some string, and many more routines can generate runtime
errors. Runtime error handling uses strings ... and files ... etc.
... So omitting either strings or files would be difficult, and
removing both of them would mean replacing the runtime error
management with a version that doesn't use (Pascal) strings and
files. So you quickly arrive at a very bare-bones RTS (which some
people use for special cases, indeed, but which e.g. the CGI unit
would not find easy to use -- e.g., it obviously uses strings quite
a lot, as well as files, for POST uploads, output buffering, runtime
error mailing, etc.).

One idea I have in mind WRT the RTS is to reorganize its units, so
we'd have a (clearly visible) core of routines that are interrelated
and provide the basic support, and put this in the lowest-level RTS
unit. This would include runtime errors, and the necessary amount of
string and file capabilities to support them, whereas e.g.
additional string and file features not strictly needed here would
be one level higher.

In particular, this should get rid of cyclic dependencies in the
RTS. Currently there are a few explicit ones, but many more implicit
ones, via magic compiler calls. E.g., an RTS routine does a file
operation, and the compiler translates it to a call of an RTS file
routine that it just assumes exists, although it's in a unit that
will be compiled later and probably uses the current unit. By
implementing my above plan (Pascal-level imports), such dependencies
would become visible, in this example forcing the RTS unit to import
the respective RTS file routines, and thus (at first) create a lot
of cyclic dependencies in the RTS. By resolving them manually (by
reorganizing the RTS units), we'd get close to the unit structure I
described.

In such a setting, one could ideally omit whole RTS units that are
not needed. (But, of course, declaration-level smart-linking would
still give somewhat better results, so I still have it on my list
...)
...

> What things can be done in GPC without libgpc - for example, if
> one produced an include file of libc exports and doesn't use units or
> Pascal strings or objects or file I/O at all?

Basically yes, though there isn't an "official" list of which
internals require RTS calls (and this might change slightly over
time), so one can only try, looking at the linker errors (undefined
references).

There are a few routines that always must be provided as they're
called by automatic initialization etc. This is the list I used in a
small standalone project last year. (You can omit the range check
error stuff if you disable range checking, of course; OTOH you might
need other runtime error routines when linker errors tell you so.)
The RTS version (here, 20050331) has to match the version of the RTS
being replaced, and the list of declarations and their parameters
may change slightly with new GPC versions, so the code isn't exactly
maintenance-free (which is the main reason for requiring the RTS
version check).

{ Symbol for the RTS version check; the date must match the version
  of the RTS being replaced. }
var
  VersionCheck: Integer; attribute (name = '_p_GPC_RTS_VERSION_20050331');

{ Entry point for RTS initialization; an empty stub here. }
procedure Initialize (ArgumentCount: CInteger;
                      Arguments, StartEnvironment: PCStrings;
                      Options: CInteger); attribute (name = '_p_initialize');
begin
end;

procedure DoInitProc; attribute (name = '_p_DoInitProc');
begin
end;

{ Counterpart for finalization; an empty stub here. }
procedure Finalize; attribute (name = '_p_finalize');
begin
end;

{ libc's exit, imported by its linker name. }
procedure CExit (Status: CInteger); external name 'exit';

{ Range check failure: no error message, just exit with status 42. }
procedure RangeCheckError; attribute (name = '_p_RangeCheckError');
begin
  CExit (42)
end;
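
As a rough sketch of how a program might then call libc directly, assuming stubs like those above are linked in place of the RTS: the names LibcOnly, CPutS and Ignored are hypothetical, and it is an assumption (not confirmed here) that passing a string constant to a CString parameter needs no RTS support. Which constructs actually pull in RTS routines still has to be found out from the linker errors.

```pascal
program LibcOnly;

var
  Ignored: CInteger;  { receives the result of puts, which is not used }

{ libc routines imported by their linker names, in the same way as
  CExit in the list above. }
function CPutS (s: CString): CInteger; external name 'puts';
procedure CExit (Status: CInteger); external name 'exit';

begin
  { No Pascal strings, files or units are used; output goes through
    libc only. }
  Ignored := CPutS ('hello from libc');
  CExit (0)
end.
```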
...

> This follows from #2 - how can one write a different libgpc? Is
> there a special thing that has to be done to make it work (i.e., how
> is it different from any bog-standard .a or .so file?).

I hope the above answers this. For the most part, it isn't very
special, except for the explicit linker names. (But when we change
it as described, any RTS replacement also needs changing, of course,
e.g. using magic Pascal names then. Also, the parameters of some RTS
routines change occasionally, in accordance with compiler calling
changes.)

As you can see, the C part of the RTS is quite small now (one file,
rts.c, plus interfaces in rtsc.pas), and not fundamentally
different from C code and interfaces called by other Pascal units
(except that it uses more C headers and has many more portability
conditionals, mostly using autoconf settings, than is typical
otherwise, due to its purpose of interfacing to different libc's).

The library building part (.a or .so) is nothing special in the RTS.
You could link the list of .o files instead (manually) if you
wanted.

gpc.pas is a bit "magical" in that it's generated by a script from
the interfaces of the other units, excluding parts enclosed in
"{@internal}" .. "{@endinternal}" comments. (These are just the
parts with the magic linker names, more precisely those which are
meant to be called only by the compiler, not from user code directly
via gpc.pas. With my suggestion above, this would change, and the
"{@internal}" comments probably disappear. gpc.pas could then
probably switch to proper re-exporting instead of being
script-generated.)
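
The kind of filtering described here can be sketched as follows. This is not the actual script used to build gpc.pas, just a minimal illustration under some assumptions: the markers appear on lines of their own, lines are at most 255 characters long, and the names StripInternal, StartMarker, EndMarker and Skipping are made up for the example.

```pascal
program StripInternal;

const
  StartMarker = '{@internal}';
  EndMarker = '{@endinternal}';

var
  Line: String (255);
  Skipping: Boolean;

begin
  Skipping := False;
  { Copy standard input to standard output, dropping the marker lines
    and everything between a start marker and the next end marker. }
  while not EOF do
    begin
      ReadLn (Line);
      if Pos (StartMarker, Line) > 0 then
        Skipping := True
      else if Pos (EndMarker, Line) > 0 then
        Skipping := False
      else if not Skipping then
        WriteLn (Line)
    end
end.
```

Fed with a unit interface on standard input, this would print only the parts meant for user code.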
Another special thing is the units' name-attributes which are just
there to avoid namespace conflicts with user-units of the same name
(which are perfectly valid, of course, so they must not break).

You have to be careful of unintended recursion due to internal
compiler calls. E.g. doing file I/O from a routine to implement file
I/O is a recursion though it doesn't look like one ordinarily -- it
might be OK if it's to a different file, and your routines (with
according data structure) are reentrant, but in general you have to
be careful there.

Also, during the initialization and finalization of the RTS, RTS
services may not be available as expected, so you have to be very
careful of the order of doing things. E.g., obviously before the
memory manager is initialized, dynamic memory allocation won't work;
this includes all uses of New and GetMem, of course, and also RTS
routines that do them (which are under your control then, of course;
e.g. in the current RTS some file routines).

Initialization is started via _p_initialize (see above) which has to
call the RTS units' initializers (as needed). In the default RTS
that's the strange "InitInit" call in init.pas which calls the
implicit initializer of init.pas, which in turn calls all the other
initializers automatically (in the regular way) as init.pas uses all
the other units.
...

> Would there be any mileage in producing a libc standard unit?

libc is a rather vague term here. Such a unit could be anything from
a non-portable interface of the 6 most important libc calls (open,
close, read, write, fork, exec, according to Linus ;-) to a fully
portable interface to all known libc's on this planet, with
interface to all functions supported by any of them plus
emulations/errors where not supported ...

The RTS's rts.c file (plus rtsc.pas interfaces) is somewhat closer
to the latter extreme (though there are still many areas of libc not
covered yet). It actually makes available many functions in "C
style" (e.g. OpenHandle etc., visible in gpc.pas), besides being
used in the RTS to implement the higher-level routines. So to some
extent this is such a unit already.

I suppose you're more thinking of a rather minimal unit. A problem
is that different programmers (and even different projects by the
same programmer) will often disagree just how minimal it should be.
In the end, a bigger unit may fare better when automatically
removing the unused parts. Yes, I know, we need smart linking ...

Frank
-- 
Frank Heckenbach, f.heckenbach@fh-soft.de, http://fjf.gnu.de/, 7977168E
GPC To-Do list, latest features, fixed bugs:
http://www.gnu-pascal.de/todo.html
GPC download signing key: ACB3 79B2 7EB2 B7A7 EFDE  D101 CD02 4C9D 0FE0 E5E8
