On Sat, 28 Jan 2006, Frank Heckenbach wrote:
How do they implement PROT_EXEC disable without a hardware NX bit, I wonder? I've heard of emulations, but I never understood how they work. Probably I should do my homework ;-)
Why without? AFAIK, PROT_EXEC is (roughly speaking) the software side of hardware NX.
Could be on some platforms. AFAIR, there was a platform (Linux? it's so dim in my memory) which did not implement PROT_EXEC protection. AFAIK, Linux kernels generally did not until the 2.6 versions. But you must have known this, and this is a backend issue, I suppose.
BTW, according to http://en.wikipedia.org/wiki/NX_Bit:
: Although this sort of mechanism has been around for years in various
: other processor architectures such as Sun's SPARC, Alpha, IBM's
: PowerPC, and even Intel's IA-64 architecture (also known as Itanium
: or Merced processor), the term is actually a name created by AMD for
: use by its AMD64 line of processors, such as the Athlon 64 and
: Opteron. It seems to have now become a common term used to
: generically describe similar technologies in other processors.
: Intel's x86 processors included this feature since 80286 processor,
: but that memory model is treated as obsolete by modern processors
: and operating systems. De facto it could not be used by modern
: programs, and AMD re-implemented the feature for the Flat memory
: model used now.
So if you thought this was a brand-new hardware feature, it really isn't (apart from the name). Just Intel had screwed up for too long ...
Agreed. Some of the things Intel gets away with are beyond my comprehension. Thank Gawd for AMD :-) who brought NX bit back ...
I've seen an article that explains a heap overrun exploit in detail, and how it was made impossible by heap randomization in Windows XP SP2. But it is slightly off-topic, unless we decide to introduce a better heap allocator option for the GPC RTS.
Randomization might help to spoil particular attacks, or make them less likely to succeed, but cannot provide perfect protection. BTW, this should also be possible with a plug-in MM replacement.
I trust your experience and expertise on that.
Do you know of an example of memory mapping holes being deallocated? In fact, statistically most of the holes will be less than a page in size, won't they?
I don't really have statistics. I suppose when deallocating a list, some pages would be freed completely and be available either for returning to the OS, or at least for reuse within the same process (possibly with other chunk sizes, as needed). I suppose typical MMs do at least the latter.
Agreed. OTOH, if we run, for example, a database, insertions/deletions to/from the heap may be seemingly random. I trust a lecture I heard from my college professor, and I think he had some references with extensive simulations at least. IMHO a general-purpose allocator should not rely on "burst" allocations/deallocations, but should assume a more stochastic pattern of memory manager use.
I suppose GPC uses the default libc's malloc/calloc/free, and AFAIK the libc malloc team is also thinking of improving the default malloc in libc, as this would give an immediate improvement even to already-linked programs if ABI compatibility is preserved.
This requires indirect pointers. Then all pointers into a certain memory area could share a base pointer, and IMHO it is neither a big complication nor a big speed penalty compared to the enhancements we get!
It is! Any solution that changes the ABI or something like that, and thus at least requires recompilation of all libraries, is a big deal in practical terms (and IMHO should not even be attempted in a single language on its own, unless that language operates in an isolated world, sandbox or whatever).
OK, I see the point. I will have to do some serious study on this. It won't hurt to learn more about GPC internals and how it uses libc.
For example, all base pointers could fit on a few memory pages that would easily fit in the processor caches, and they would stay cached since they are used often. So, this form of indirection does not appear to be costly in terms of raw performance.
I'm skeptical. First, the cache used is not available for other purposes. Second, it takes additional instructions which take time to execute and consume cache for the code. And since the change is pervasive, you don't even have the option to disable it in tight inner loops; you can only hope the optimizer is smart enough to move as many indirections as possible out of the loop.
True.
(Frankly, range checking is costly also, isn't it?)
Yes, but you can avoid it, e.g. by typing your counters and index variables with appropriate subranges, thus moving the necessary checks outside of critical inner loops (without needing any compiler options to turn them off temporarily, and not relying on the optimizer, but guaranteed by standard language features). Chuck Falconer has written about this more than once on this list.
I see the difference.
But the idea of memory compaction and defragmentation still seems very good to me, even though it looks like SF now. Probably the best approach would be to try to implement it as a library, instead of a change to the compiler or RTS, as a first step, right?
Yes, I think so.
So we agree on something.
I understand the power-of-two heaps idea, but it still seems to me that there is a vast space for defragmentation.
Power of two is probably not the final word on this issue. But you might want to run some actual tests with real-world programs and current memory managers (say, that of glibc) to get significant numbers.
I am putting it on my TO-DO list. It is a very interesting issue in general, for all operating systems that use paged virtual memory. (OTOH, coming back to releasing unused holes: unused pages will probably be swapped out and never reloaded, since they are not used; the problem is stochastic allocation/deallocation of relatively small fragments of memory. For example, if the average size of records is between 512 and 1023 bytes, this will waste a lot of space on a 2^10 heap, and after allocations/deallocations reach an asymptotically stable state there will be a roughly normal distribution of used and unused areas on each physical memory page. This means that the allocated physical pages of the heap may eventually double the program's memory needs. I may look for literature; right now I am speaking from memory of those lectures about Unix processes.)
Some of these issues have been discussed here ( http://www.javaworld.com/javaworld/jw-08-1996/jw-08-gc-p3.html ):
[...]
(Sorry if this is too long of a paste)
GC is really a science of its own (literally), and IMHO it's a bit off-topic here.
OK. I will try to stay on focus.
What I had in mind in the beginning is essentially the third strategy: "registered pointers". The new type of pointers would have to be registered, and the heap manager would have to update all registered pointers to reflect the heap area copy-and-move. As it would be done on the fly, and all pointers would point to the same byte of data after the move is made, the defragmentation would be transparent to any program.
But it's also pervasive, i.e. it affects all libraries (since they may copy pointers). And it would have to take care of pointers on the stack, in registers, etc.
I see. I realize adding security measures drastically impacts performance (such as making all pointers "volatile" variables which cannot be kept in registers), but having an important system brought to its knees by an undetected buffer overrun in an application will hurt me more, both as a system administrator and as a software developer, than a 20% decrease in program speed. IMHO.
OK, thanks for pointing that out. I am trying to study the Boehm papers. Perhaps everything I proposed has already been done :-(
It's surely been discussed at length, and there are experts in this area, which neither of us is, and this list is not really the place to discuss it ...
I have understood your argument. I am trying to stick to those issues that pertain to greater security of Pascal programs, as it can be enforced by the language.
IMHO, the canary ought to be checked on free(). This would catch a number of errors, since the most common error, alongside buffer overruns, is probably an off-by-one error in a loop.
This would serve as a debugging aid (similar to efence), not as attack prevention, as an attack can occur before free().
True. However, several attack scenarios rely on smashing the allocator's list pointers and overwriting an arbitrary location in memory. A canary could prevent that, if checked prior to evaluating the pointers that follow it. :-)
Then again, if GPC depends on libc malloc internally, I guess it is a libc issue, not a GPC issue.
Obviously, the source code. I haven't been very clear about the obvious fact that changing pointer semantics and/or size into an indirect pointer, or a pointer with upper and lower boundaries, would inevitably imply at least recompilation of libraries, assuming that the indirection would be implemented transparently to existing source. And having in mind that GPC uses libc extensively, this may not be feasible in terms of the necessary wrappers.
Actually it means that the only possibly successful way is probably doing it all in the backend, so it would work the same for all languages. Perhaps you can get such an option into the backend. Then you basically only have to compile two versions of your whole system ... ;-)
Rightly said: StackGuard does exactly that! I could try to apply the StackGuard patch once GPC has accepted the 4.* backends.
IMHO, I see it rather as a separate project. GPC is not really short of features (existing and wishlist), and due to the considerations above, it seems easily separable (unless you really want to add compiler-supported checking which you don't seem to want, according to the previous paragraph).
The problem is: why are all those fancy heap protection mechanisms like libefence not used more widely? Simply because they are not handy, and few people know of them. Having such a mechanism seamlessly distributed with the compiler, or as a language option, might make people consider using it.
IMHO it would be considerably less work to create and distribute an integrated environment containing the existing tools and making them easily available than writing something entirely new just for this reason (i.e., unless it has other advantages).
It could be much less work, agreed. I did not insist on writing everything from scratch myself ;-) no matter how much I enjoy mucking with internals. I realize, however, the problem of code maturity, which my new code would inevitably lack.
The basic inspiration came from my study of a month-and-a-half-long virus invasion and the recent network and system intrusions I faced helplessly in the last six months or so as a system administrator.
If the buffer overrun protection isn't elegant, seamless and nearly mandatory, programmers might not use it I'm afraid.
Can we do something in this direction?
I'm afraid we probably cannot solve The Computer Security Problem within the next two weeks. ;-)
;-)
And, BTW, this is also an area of its own, with its experts. This does not mean we should not care here, but one should really first study existing work and state of the art. If implementation of some techniques require compiler support, we can discuss them here, but this is not really the place to design new techniques (which most likely will have been discussed by the experts already).
I guess you want to tell me I need to do more homework before raising similar issues, so I will try to do so next time. However, it is hard to become an expert in a month, so I was relying on your experience ;-) May I be forgiven.
Thank you for your time, and I will now try to do some research.
Mirsad