According to Frank Heckenbach:
Peter, what about making virtual constructors soon? Should be relatively easy (I hope), just "combining" the properties of constructors and virtual methods? ;-)
It's even more trivial: At the moment, I artificially forbid constructors to be virtual. To enable them, I would just have to take out the error message.
The only problem with this: As I wrote, I am initializing the object *outside* the constructor's body. If I enable virtual constructors, this must not change any more.
What can be assumed about data fields after FillChar'ing them with 0? I suppose, for all ordinary types, one can assume the value with "ord 0", right? For real numbers, probably nothing can be assumed!? What about sets, can [] be assumed? Pointers=NIL? ...
Okay for ordinal types, sets and pointers, not sure about Reals. Strings initialized to zero this way are BROKEN because they get a capacity of zero which makes them useless. :-( (That's one reason why I would like to have `ShortString's in GPC.)
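A small C sketch of the zero-fill question, with `memset` standing in for `FillChar`. The `CapString` and `Demo` types are hypothetical and only model a string that stores its own capacity; they are not GPC's actual layouts. Note that ISO C itself does not promise that all-bits-zero is a null pointer or 0.0, although mainstream platforms with IEEE 754 reals behave that way.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a string that stores its own capacity.
   This is NOT GPC's actual string layout. */
typedef struct {
    unsigned capacity;   /* maximum number of characters */
    unsigned length;     /* current number of characters */
    char     chars[16];  /* storage */
} CapString;

/* Hypothetical object with one field of each kind discussed above. */
typedef struct {
    int       count;     /* ordinal type */
    unsigned  flags;     /* stands in for a small set, one bit per element */
    void     *link;      /* pointer */
    double    ratio;     /* real number */
    CapString name;      /* capacity-carrying string */
} Demo;

/* Zero-fill the whole object, like FillChar (Obj, SizeOf (Obj), 0). */
void zero_fill(Demo *d)
{
    assert(d != NULL);
    memset(d, 0, sizeof *d);
}

/* Append one character; returns 0 when the capacity is exhausted. */
int cap_string_append(CapString *s, char c)
{
    if (s->length >= s->capacity || s->length >= sizeof s->chars)
        return 0;
    s->chars[s->length++] = c;
    return 1;
}
```

After `zero_fill`, `count` is 0 and `flags` is the empty set, but `name.capacity` is 0 as well, so every `cap_string_append` fails until the capacity is set explicitly.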
... and if it doesn't add misfeatures to other people, or the misfeatures can be turned off completely, as with:
`--store-object-names', and switched ON in Delphi compatibility mode.
... and OFF otherwise!
Of course! That's what I meant. :-)
I'm not sure if this notion really helps. Doing it your way, the compiler has to distinguish between regular object types and interfaces. And I think the code gets clearer if interfaces are clearly recognizable as such. I don't see an advantage of your way: interfaces can't be instantiated, anyway (since they're abstract), and a (regular) object type that should be "inherited" from an interface can simply implement this interface and inherit from no type.
Internally, objects are ordinary records anyway, so this makes no difference in hacking GPC - except that if I introduce an `interface' type I have to remember that something *is* an interface for no other purpose than outputting error messages. In this sense, I would prefer not to introduce interfaces as another data type but to allow MI for some special cases which I would have to check anyway. (See below for a qualification of this.)
However, the above was not meant as a suggestion how to implement interfaces into GPC. It was just my attempt to understand what they are.
[...]
Correct. :-)
(-: Great! It seems that I have got it! This increases the chance that I will be able to implement all this into GPC. :-)
(Of course, there can be conflicting method identifiers with interfaces that are implemented, but these should simply generate "duplicate identifier" errors.)
(Agreed.)
As I said above, I'd vote for the second idea ("interface" is a keyword anyway). AFAICS, the Delphi syntax as shown in the example by David looks like we could adopt most of it -- as I said, I'm not sure about the IUnknown bit, but I think it can be optional (just declared as an empty interface for compatibility reasons).
Agreed. If a language dialect already exists which (i) supports what we want and (ii) does it in a clean way, we should adopt it rather than inventing a new method.
Also, I think Delphi's syntax
IWhatever = interface(...) [ID];
looks like a convenient way to declare the ObjID, also for regular object types.
To sum up (again) what I now think about ObjIDs:
ObjID (or whatever it will be called) is an object constant of every object type and interface type.
Its type is a 64/128 bit integer.
Really? In David's example it's a string constant!
const SIID_IActiveScriptSiteWindow = '{D10F6761-83E9-11cf-8F20-00805F2CD064}';
type IActiveScriptSiteWindow = interface(IUnknown) [SIID_IActiveScriptSiteWindow]
[...]
This removes all needs for class registration, and perhaps solves some problems with interfaces. I think I like that!
I agree, *and* having that we could claim more compatibility to Delphi thus making GPC more attractive for a lot of possible users.
(* Hmm ... the above rules for "careful use of MI" could be useful for C++ programmers ... perhaps we should tell them? :*)
Hmm ... I think I know some more rules for "careful use of C[++]". Should we tell them? Would they listen to us? Would they laugh at us? ...
Who knows ...
AFAICS, the only thing that really makes problems are variables (or parameters) of interface types.
What's the problem? An instance of an interface would be an empty object, containing nothing besides the VMT pointer.
No! There aren't any instances of interfaces!
Only if we explicitly forbid them. There is no technical reason why they shouldn't be instantiated. (And there is no practical reason why they should be, so it's safe to forbid it.;)
A variable of a pointer-to-an-interface type must point to the actual object (which can be of any type that implements that interface), and (somehow) give the information where in this type's VMT the methods of that interface are located.
Then the interface must appear as an additional VMT field either in each instance of the object or in its VMT. In the first case, the assignment
PointerToInterface:= PointerToObject;
would add some number to the value of `PointerToObject', so `PointerToInterface' will point to the VMT of the interface, not that of the object. In the second case, the same assignment would implicitly dereference `PointerToObject', look up the VMT of the interface within the VMT of the object and let `PointerToInterface' point to that ... ah - no! This would not work, because then
PointerToInterface^.SomeMethod;
would have no chance to locate the implicit `Self' parameter. Okay, so forget about the second idea above; each object gets an additional VMT field for each interface it inherits. (Even then I am not sure that the above mechanism will work in all cases ... :-/ )
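The first idea (one additional VMT field per interface in each instance) might be sketched in C roughly as follows. The thread does not say how `Self' is recovered from the adjusted pointer; this sketch assumes, purely for illustration, that the interface VMT built for each implementing type records the offset of its field (`self_offset`). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* VMT of a hypothetical interface with one method.  The self_offset
   member is an assumption of this sketch: it records where the
   interface's VMT field sits inside the implementing object. */
typedef struct {
    size_t self_offset;
    int  (*my_method)(void *self);
} IfaceVmt;

/* Ordinary VMT of the object type, also with one method. */
typedef struct {
    int (*my_method)(void *self);
} ObjVmt;

typedef struct {
    const ObjVmt   *vmt;        /* ordinary VMT field */
    const IfaceVmt *iface_vmt;  /* additional VMT field for the interface */
    int             foo, bar;
} MyObj;

static int my_obj_my_method(void *self)
{
    MyObj *o = self;
    return o->foo + o->bar;
}

static const ObjVmt   my_obj_vmt = { my_obj_my_method };
static const IfaceVmt my_obj_iface_vmt = {
    offsetof(MyObj, iface_vmt), my_obj_my_method
};

void my_obj_init(MyObj *o, int foo, int bar)
{
    o->vmt = &my_obj_vmt;
    o->iface_vmt = &my_obj_iface_vmt;
    o->foo = foo;
    o->bar = bar;
}

/* PointerToInterface := PointerToObject;
   the result is the object's address plus a constant. */
const IfaceVmt **to_interface(MyObj *o)
{
    return &o->iface_vmt;
}

/* PointerToInterface^.MyMethod;
   Self is recovered by stepping back from the interface field. */
int call_my_method(const IfaceVmt **p)
{
    const IfaceVmt *v;
    assert(p != NULL);
    v = *p;
    return v->my_method((char *)p - v->self_offset);
}
```

The pointer to the interface keeps the ordinary one-word format; the price is one extra word per instance for each implemented interface.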
Do you mean: If an object implements an interface (in Java sense) it must always be accessed through a pointer? If so, why?
No, if type T implements interface I, there can be a variable V of type T, no problem. But V can't be of type I (since interfaces can't be instantiated). You can, however, declare a variable P of type ^I and assign @V to P (since V has all the properties that I demands).
This is no special rule, it follows from the fact that interfaces can't be instantiated. The same holds for abstract object types. Assuming TObject is abstract, there can't be a variable of type TObject, but there can be variables of type PObject, and there can be VAR parameters of type TObject.
Ah - now I understand. :-) I tend to believe now that the above mechanism (assigning the address of an additional VMT field to `PointerToInterface') can work ...
What about this: Instead of an additional VMT field, the interface is represented in each instance of the object as an integer field which holds the offset of itself inside the object:
    Type
      MyObj = object ( MyInterface )  (* No "primary ancestor" *)
        foo, bar: Integer;
      end (* MyObj *);
is represented as
              [vmt field]   [interface offset]   [foo]   [bar]
    byte#     0             4                    8       12
    value     @vmt_MyObj    4                    foo     bar
              ^             ^
              |             |
    `PointerToObject'       `PointerToInterface' is pointing here and can
    is pointing here.       look up the address of the whole object by
                            subtracting the integer value pointed to.
Now, how to call a virtual method of the interface? The following steps are needed to perform the call "PointerToInterface^.MyMethod" (each step corresponds essentially to one instruction on a CISC processor like the iX86 - or to one "tree node" passed from the GPC front-end to the GNU compiler's back-end):
- Dereference `PointerToInterface' and get one integer.
- Subtract the integer from the pointer and get the address of the object.
- Pass the new pointer as the `Self' parameter.
- Dereference the new pointer and get the address of the VMT.
- Find the address of the VMT of the interface in the VMT of the object at an offset which is derived from the integer we got in the first step.
- Find the address of the virtual method at a fixed place in the VMT of the interface, and do the call.
In contrast, the following steps are needed to call an "ordinary" virtual method "PointerToObject^.MyMethod":
- Dereference `PointerToObject' and get the address of the VMT.
- Pass `PointerToObject' as the `Self' parameter.
- Find the address of the virtual method at a fixed place in the VMT, and do the call.
This implies:
- Calling a virtual method inherited through an interface roughly takes twice the time of calling an "ordinary" virtual method, but it's still O(1) (no real search required).
- Each instance of each object gets one additional integer field for each interface the object inherits from.
- Each VMT gets additional pointer fields pointing to the VMTs of the interfaces the object inherits from.
- Pointers to interfaces have the same format as all other pointers, but they don't point to the beginning of the object but to an integer field inside the object. An explicit pointer conversion from "pointer to object" to "pointer to interface" does actually change the value of the pointer.
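The two call sequences above might look like this in C. This is only a sketch under assumptions: the thread does not spell out how the slot in the object's VMT is "derived from the integer", so the code assumes that the interface offset fields directly follow the VMT field and that the slot number is computed from the offset. All type and function names are hypothetical, and offsets are taken from `offsetof` rather than the 4-byte layout of the diagram.

```c
#include <assert.h>
#include <stddef.h>

#define IFACE_COUNT 1

typedef int (*Method)(void *self);

/* VMT of the interface: its method sits at a fixed place. */
typedef struct { Method my_method; } IfaceVmt;

/* VMT of the object type: own methods plus pointers to the VMTs of
   the interfaces the object inherits from. */
typedef struct {
    Method          my_method;
    const IfaceVmt *ifaces[IFACE_COUNT];
} ObjVmt;

typedef struct {
    const ObjVmt *vmt;           /* [vmt field] */
    size_t        iface_offset;  /* [interface offset]: its own offset */
    int           foo, bar;
} MyObj;

static int my_obj_my_method(void *self)
{
    MyObj *o = self;
    return o->foo + o->bar;
}

static const IfaceVmt my_iface_vmt = { my_obj_my_method };
static const ObjVmt   my_obj_vmt = { my_obj_my_method, { &my_iface_vmt } };

void my_obj_init(MyObj *o, int foo, int bar)
{
    o->vmt = &my_obj_vmt;
    o->iface_offset = offsetof(MyObj, iface_offset);
    o->foo = foo;
    o->bar = bar;
}

/* Pointer conversion: the value of the pointer really changes. */
size_t *to_interface(MyObj *o) { return &o->iface_offset; }

/* PointerToInterface^.MyMethod, following the six steps above. */
int call_via_interface(size_t *p)
{
    size_t off = *p;                                    /* step 1 */
    void *self = (char *)p - off;                       /* steps 2 and 3 */
    const ObjVmt *vmt = *(const ObjVmt *const *)self;   /* step 4 */
    /* Step 5; assumption: interface fields follow the VMT field. */
    size_t slot = (off - sizeof(const ObjVmt *)) / sizeof(size_t);
    assert(slot < IFACE_COUNT);
    return vmt->ifaces[slot]->my_method(self);          /* step 6 */
}

/* PointerToObject^.MyMethod, the "ordinary" three steps. */
int call_direct(MyObj *o)
{
    return o->vmt->my_method(o);
}
```

The interface call needs a few more memory accesses than the ordinary one, but there is no search anywhere, which is what keeps it O(1).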
A "pointer" to an interface variable consists of two parts: the actual pointer to the variable, and the VMT offset of the first method (or, alternatively, directly the address of the first method in the VMT).
Disadvantage: The pointer gets twice as big. The difference must be considered when assigning it to another pointer (this could be an untyped pointer or a pointer of one of the "parent" interfaces - in the latter case the VMT offset has to be adjusted).
I'm afraid we can forget about this for that reason.
Why? Is there a rule carved in stone that a pointer must consist only of a memory address?
Yes. Here's the stone (info -f standards -n Portability):
You can assume that all pointers have the same format, regardless of the type they point to, and that this is really an integer. There are some weird machines where this isn't true, but they aren't important; don't waste time catering to them. Besides, eventually we will put function prototypes into all GNU programs, and that will probably make your program work even on weird machines.
This means that we mustn't introduce another format for pointers - and that we can safely assume that you can cast a pointer to an integer (see my mail about 8-byte pointers on the DEC Alpha ... @#*!).
(* BTW, they also say:
As for systems that are not like Unix, such as MSDOS, Windows, the Macintosh, VMS, and MVS, supporting them is usually so much work that it is better if you don't.
which I strongly recommend to ignore! I do not like at all some well-known "operating system" of a well-known company, but just ignoring it is a nice method to commit social suicide among computer users. *)
Actually, I'm going to take this a bit further (I don't think Java has this, but why not):
Let T be any object type and In be some interfaces.
The following variable declarations could all be legal:
    T
    ^T
    ^I1
    ^I1 T
    ^I1 I2
    ^I1 I2 T
    ...
You mean: legal types for a variable? A list of types separated by spaces?
In general: P can be a variable of type pointer of (n interfaces I1 .. In and optionally one object type T).
Legal assignments to P are objects of any type that implements all I1 .. In (and is T or a descendant of T, if T is given).
The internal representation of P consists of the actual address of the object and n addresses that point to the first method of each of the n interfaces inside the VMT of the actual type of the object.
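For n = 1, the two-part pointer described above might be sketched in C as follows. The VMT is modelled as a plain array of method addresses, and the interface, the two object types and all function names are hypothetical. The point of the example is that the interface's method sits at a different place in each type's VMT, and the second half of the pointer remembers where.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*Method)(void *self);

/* Two unrelated hypothetical object types; vmt points to an array of
   method addresses. */
typedef struct { const Method *vmt; int value; } Counter;
typedef struct { const Method *vmt; int a, b; } Pair;

static int counter_get(void *self) { return ((Counter *)self)->value; }
static int pair_first(void *self)  { return ((Pair *)self)->a; }
static int pair_get(void *self)    { Pair *p = self; return p->a + p->b; }

/* The interface has one method ("get"): Counter keeps it in VMT
   slot 0, Pair keeps it in slot 1. */
static const Method counter_vmt[] = { counter_get };
static const Method pair_vmt[]    = { pair_first, pair_get };

void counter_init(Counter *c, int value) { c->vmt = counter_vmt; c->value = value; }
void pair_init(Pair *p, int a, int b)    { p->vmt = pair_vmt; p->a = a; p->b = b; }

/* The two-part "pointer to interface". */
typedef struct {
    void         *object;        /* actual address of the object */
    const Method *first_method;  /* address of the interface's first
                                    method inside the object's VMT */
} IfacePtr;

IfacePtr counter_as_iface(Counter *c)
{
    IfacePtr p;
    p.object = c;
    p.first_method = &c->vmt[0];
    return p;
}

IfacePtr pair_as_iface(Pair *q)
{
    IfacePtr p;
    p.object = q;
    p.first_method = &q->vmt[1];
    return p;
}

/* Calling the interface's first method through the two-part pointer. */
int iface_get(IfacePtr p)
{
    assert(p.object != NULL);
    return p.first_method[0](p.object);
}
```

The call itself is as cheap as an ordinary virtual call, but `IfacePtr` is twice the size of a plain pointer, and assigning it to an untyped pointer keeps only the `object` half.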
I hope this was understandable so far -- if not, I can try to explain again.
I am not sure that I have understood it. Further explanation cannot hurt.
How would the virtual method calls
    PointerToInterface:= @MyObject;
    PointerToInterface^.MyMethod;
and
    PointerToObject:= @MyObject;
    PointerToObject^.MyMethod;
work with this representation?
I am sceptical about this "multiple pointer" representation because there are *many* places in the GPC front-end relying on the fact that all pointers have the same format. I don't even know all of them.
That's the problem why I initially asked about MI.
The problem would not be any easier with MI. [Proof: You showed above that interfaces are just a special case of MI. ;-]
I meant: It is easier with interfaces than with MI because they are just a special case.
Does anybody know how C++ solves that problem? Or Java? Or Delphi?
No, but from Delphi's interface IDs I gather it uses something like the second way. (And since it runs under Windoze, efficiency doesn't matter, anyway... ;-)
One day, I will compile some Delphi stuff and look at the generated code in order to figure this out. (But I wouldn't mind if somebody were faster than I and would tell me the result ... ;-)
But this is quite an interesting problem - not a technical one, how to implement this-or-that without interfering with that-or-this syntax from another dialect. Here we have a problem where it is not even clear that a fast (i.e. O(1)) solution exists. :-(-:
There is an O(1) solution -- the first one!
Now we have two of them: The solution with offsets stored in the object is O(1) as well. (-: Implement both? ;-)
It increases the memory needed for (pointer-to-)interface type variables/parameters, but this might just be the price we have to pay. It's O(1) in size and speed, and it takes more space than now only when one actually uses interfaces. Doesn't seem too bad to me!
What I am really afraid of is how much of GPC would break when we change the size of some pointers ... :-/
So with the first solution, AFAICS, the ObjIDs for interfaces are not needed (contrary to what I said above) -- they can be accepted in Delphi compatibility mode, but I see no need for them...
Agreed. (With both solutions discussed above.)
If you used the address everywhere you use the ID now, you would know, wouldn't you?
[...]
Yes, but what do you need to do with IDs?
The unique ID can be stored in a stream; the address cannot.
A valid point! But I think the IDs should be generated within the storing routines, and resolved (to pointers) within the loading routines. This can be done quite efficiently, O(n log n), perhaps O(n^2) worst case. No need to keep the IDs during the (regular) operation, wasting memory and time.
There may be some types that need a persistent ID (i.e. one that cannot simply be regenerated with each storing, for whatever reason), but then again, ID should be a field of these special types only.
I have *lots* of such types, but I agree that only those should have that ID. There's no need to equip *all* objects with an ID.
Use: Think of a tree of objects holding numerical data. A method of an object somewhere in that tree wants to calculate something. For this purpose it needs some data stored elsewhere in the tree. Then the unique ID can be used to locate that other data object.
For this purpose, I'd use a pointer to that other object.
The whole purpose of the ID is to *get* that pointer.
When storing the whole thing to a stream, the pointers can be converted to (numerical) IDs that are unique to this data structure in this stream at this time. While loading the stream, the IDs can be converted back to pointers. (This takes some programming effort, but it's a one-time job! I think I could program these conversions if necessary.)
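A minimal C sketch of such a conversion, assuming the whole data structure is available as a table of nodes when it is stored. The node layout and function names are hypothetical. The ID is simply the node's index in the table, so it is unique only within this stream; the linear search makes storing O(n^2), and a sorted table of addresses would bring it down to O(n log n).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory node: refers to another node by pointer. */
typedef struct Node {
    int          value;
    struct Node *link;
} Node;

/* Hypothetical stream record: refers to another node by ID. */
typedef struct {
    int  value;
    long link_id;   /* index into the stream, -1 stands for nil */
} StoredNode;

/* Pointer -> ID: position of p in the table.  Pointers to objects
   outside the table are stored as nil in this sketch. */
static long id_of(Node *const *nodes, size_t n, const Node *p)
{
    for (size_t i = 0; p != NULL && i < n; i++)
        if (nodes[i] == p)
            return (long)i;
    return -1;
}

/* Storing: every pointer is replaced by an ID valid in this stream. */
void store_nodes(Node *const *nodes, size_t n, StoredNode *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i].value = nodes[i]->value;
        out[i].link_id = id_of(nodes, n, nodes[i]->link);
    }
}

/* Loading: every ID is resolved back to a pointer into `out'. */
void load_nodes(const StoredNode *in, size_t n, Node *out)
{
    for (size_t i = 0; i < n; i++) {
        assert(in[i].link_id < (long)n);
        out[i].value = in[i].value;
        out[i].link = in[i].link_id < 0 ? NULL : &out[in[i].link_id];
    }
}
```

No ID is kept in the objects during regular operation; the IDs exist only inside the stored records.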
My problem is that I need a numerical value from some object somewhere in the application where I don't even know whether it already exists. Having IDs, I can do a search for the other object. If it exists, I get a pointer to it; otherwise I get `Nil' and can do some action to make the other object appear ...
All this is done in a generalized `HandleEvent' method which will be in my `BaseObj'. Methods don't use up memory per instance, only per VMT which can be neglected, so this does not harm performance.
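The lookup described above might be sketched in C like this; the `DataObj` type and its fixed-size child table are hypothetical and only illustrate "search for the ID, get a pointer or nil". The ID lives only in this one type, not in every object.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical data object in a tree; only this type carries an ID. */
typedef struct DataObj {
    long            id;
    double          value;
    size_t          child_count;
    struct DataObj *children[MAX_CHILDREN];
} DataObj;

/* Depth-first search for an ID: returns a pointer to the object,
   or NULL if no such object exists (yet). */
DataObj *find_by_id(DataObj *root, long id)
{
    if (root == NULL)
        return NULL;
    if (root->id == id)
        return root;
    assert(root->child_count <= MAX_CHILDREN);
    for (size_t i = 0; i < root->child_count; i++) {
        DataObj *hit = find_by_id(root->children[i], id);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}
```

A caller that gets NULL back knows that the other object does not exist yet and can take some action to make it appear.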
Where do you get the SelfIDs from? Perhaps a list of IDs stored in a parent object? You could put the addresses there instead, couldn't you?
The IDs must be arranged in a way that you can read off them what kind of object we have.
??? Now I think you lost me!
I assume, by "kind of object" you mean its type, right?
Yes.
But the type information is already there (through the VMT link), isn't it? Any procedure can check the VMT link (together with the "IS" operator) to examine the type of any object it has a pointer to -- and usually, type distinctions should not be made by the caller at all, but by the called object (through virtual methods).
This example is one way to implement the `is' operator. Another one - which seems the most practical to me and will probably be the way to go - is to store inheritance information in each VMT.
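One way to store inheritance information in each VMT is a link to the parent type's VMT; `is' then walks that chain. The following C sketch assumes exactly that, with hypothetical type names; it says nothing about how GPC actually lays out its VMTs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical VMT that records its parent type. */
typedef struct Vmt {
    const struct Vmt *parent;   /* NULL for a root type */
    const char       *name;
} Vmt;

/* Every object starts with its VMT link. */
typedef struct { const Vmt *vmt; } Object;

const Vmt object_vmt = { NULL,        "TObject" };
const Vmt view_vmt   = { &object_vmt, "TView"   };
const Vmt button_vmt = { &view_vmt,   "TButton" };
const Vmt stream_vmt = { &object_vmt, "TStream" };

/* obj is type: true if the object's type is `type' or a descendant. */
int object_is(const void *obj, const Vmt *type)
{
    const Vmt *v;
    assert(obj != NULL);
    for (v = ((const Object *)obj)->vmt; v != NULL; v = v->parent)
        if (v == type)
            return 1;
    return 0;
}
```

The cost is proportional to the depth of the hierarchy, not to the number of types, and no per-object ID is involved.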
And calls to `Foo' in instances of `MyObj' would yield a run-time error?
No -- there can't be any instances of MyObj, the compiler should check this. That's the main goal of the whole thing: to make these checks at compile-time, not at run-time.
This is how Delphi behaves?
The OOP way to do this is: if something doesn't suit you, derive a new class, and apply all modifications you want to the new class.
Hmm ... I essentially rewrote Turbo Vision to run in graphics mode, while introducing many extensions. (* I am calling the result "BO4" - Benutzeroberflaeche, 4. Versuch (German), which means "user interface, 4th attempt". *) I paid 600DM (~$400) to get the source because I couldn't stand to derive a new class from just *everything*, re-implementing in each new class the same extensions which would have been placed most naturally in `tView'. (* Some weeks later, Borland reduced the price of the source to 50DM. *):
[IDs ...]
And what kind of things would you do with an unique integer ID?
Inter-process communication and message passing for one.
No problem with pointers! (Assuming the objects reside in some kind of shared memory, but otherwise an integer ID would be quite useless as well.)
I think the point is to get the address of an object when you are not even sure that it is already in memory. All this can be done with a common *method* of all objects (which doesn't waste space, see above). A unique ID in each object can be useful in the implementation of that method, but it does waste space, so I would vote against having it in the "mother of all objects". But since we won't restrict GPC to have exactly one class library, that's a matter of taste, IMHO. (Just my 2Pf.)
[... snip ...]
If things are done as I suggested, you could do things like:
    type
      t = object
        const c: integer = 2;   {stored in VMT of x}
        var cv: integer = 3;    {class variable; stored in VMT of x; syntax???}
        v: integer;             {stored in data area of o}
      end;

    var
      o: t;
    ...
    o.v := o.cv + o.v;
    ...
This syntax seems reasonable to me. However: Other suggestions?
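In terms of memory layout, the suggestion amounts to something like the following C sketch: the constant and the class variable live once per type in the VMT, and only `v' lives in each instance. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Per-type data: exists once, shared by all instances of t. */
typedef struct {
    const int c;    /* object constant c = 2 */
    int       cv;   /* class variable cv, initially 3 */
} TVmt;

/* Per-instance data. */
typedef struct {
    TVmt *vmt;
    int   v;        /* ordinary field, stored in the data area */
} T;

static TVmt t_vmt = { 2, 3 };

void t_init(T *o, int v)
{
    assert(o != NULL);
    o->vmt = &t_vmt;
    o->v = v;
}

/* o.v := o.cv + o.v;  the class variable is reached through the VMT. */
void t_add_cv(T *o)
{
    o->v = o->vmt->cv + o->v;
}

/* Assigning to cv through one instance changes it for all of them. */
void t_set_cv(T *o, int value)
{
    o->vmt->cv = value;
}
```

Each instance pays only for `v' and the VMT link, however many constants and class variables the type declares.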
[...]
One of the things I hate about some frameworks that will remain nameless is the multitude of ancestors. It makes it really tedious when you are trying to find out what is really going on in an object (sometimes having to plough through myriads of objects in myriads of units). Sometimes inheritance can be taken too far.
You seem to be doing the other extreme. The former might be tedious and confusing to use, especially at first sight, and it requires good documentation, but the latter can lead to real problems if code from different sources doesn't fit together.
I agree that some way "in between" is the way to go. However, there are many points in Borland's object hierarchies which seem to me as if they were designed with the goal to avoid the exchange of source code. Not considering myself as a "free software extremist" I state that you can forget about many "hooks" in an object hierarchy (e.g. in Turbo Vision which I know best) if you are not worrying about someone else reading your source code (which is still the best documentation you can have for any library anyway). (After having studied TV's source code extensively I understand why Borland first didn't want everybody to read it ... ;-)
Phew! That was a long e-mail ...
Later,
Peter
Dipl.-Phys. Peter Gerwinski, Essen, Germany, free physicist and programmer
peter.gerwinski@uni-essen.de - http://home.pages.de/~peter.gerwinski/ [970201]
maintainer GNU Pascal [970510] - http://home.pages.de/~gnu-pascal/ [970125]
On Sun, 1 Jun 1997, Peter Gerwinski wrote:
It's even more trivial: At the moment, I artificially forbid constructors to be virtual. To enable them, I would just have to take out the error message.
To those that were thinking about removing the Init constructor... We'd now need an ancestor...
Okay for ordinal types, sets and pointers, not sure about Reals. Strings initialized to zero this way are BROKEN because they get a capacity of zero which makes them useless. :-( (That's one reason why I would like to have `ShortString's in GPC.)
Ok, right, so FillChar() is a BAD idea... Ok! :-)
Internally, objects are ordinary records anyway, so this makes no difference in hacking GPC - except that if I introduce an `interface' type I have to remember that something *is* an interface for no other purpose than outputting error messages. In this sense, I would prefer not to introduce interfaces as another data type but to allow MI for some special cases which I would have to check anyway. (See below for a qualification of this.)
But interfaces are there just for that! They are like type information, they are used only by the compiler at compile time to generate error messages... Say you have a class X with methods A, B and C. If you compile the thing, then change the source to add an interface Y that has B and C and make the class implement the Y interface, and recompile, it should generate roughly the same code. The only difference is that class X is now compatible with all other classes implementing the Y interface.
As I said above, I'd vote for the second idea ("interface" is a keyword anyway). AFAICS, the Delphi syntax as shown in the example by David looks like we could adopt most of it -- as I said, I'm not sure about the IUnknown bit, but I think it can be optional (just declared as an empty interface for compatibility reasons).
Oh, BTW, the IUnknown thing is like TObject, it is a "parent interface"...
This removes all needs for class registration, and perhaps solves some problems with interfaces. I think I like that!
I agree, *and* having that we could claim more compatibility to Delphi thus making GPC more attractive for a lot of possible users.
This "registration number" is a very Windowy thing. BTW, do you have a way to avoid collision of those numbers for precompiled units that don't have sources available? With something like TV registration, if a registration error comes up during registration, I just register the class by hand with a different number, and my problems are gone!
Pierre Phaneuf
"The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offense." - Edsger W. Dijkstra.