Jan Vroonhof <vroonhof(a)math.ethz.ch> writes:
> Gunnar Evermann <ge204(a)eng.cam.ac.uk> writes:
>
> > I seem to remember that there were some problems with zombie
> > subprocesses on Solaris before. I couldn't find anything in the
> > list-archives. Was this ever understood and fixed?
>
> Something is eating the SIGCHLD's. Back then it seemed related
> to tooltalk+gcc (disabling tooltalk/workshop support has always solved it for
> me).
Excellent hint -- thanks, Jan!
I can now reproduce the problem from a vanilla xemacs, if no ttsession
process is running on the machine.
truss/dbx show a couple of sigaction() calls from deep inside libtt
(apparently it forks and tries to execute something or other; I didn't
check in detail). These calls do NOT happen if a ttsession is running.
All this is triggered by loading the tooltalk and the sunpro stuff and
executing sunpro-startup.
The original sigaction done by xemacs is:
TRUSS> sigaction(SIGCLD, 0xEFFFF430, 0x0064DE3C) = 0
TRUSS> new: hand = 0xEF1B7D9C mask = 0 0 0 0 flags = 0x0000
TRUSS> old: hand = 0x00000000 mask = 0 0 0 0 flags = 0x20000
The calls from inside libtt are:
TRUSS> sigaction(SIGCLD, 0xEFFFD790, 0xEFFFD890) = 0
TRUSS> new: hand = 0x00000000 mask = 0 0 0 0 flags = 0x20012
TRUSS> old: hand = 0xEF1B7D9C mask = 0 0 0 0 flags = 0x0000
TRUSS> Received signal #18, SIGCLD, in waitid() [default]
TRUSS> siginfo: SIGCLD CLD_EXITED pid=11826 status=0x007F
TRUSS> sigaction(SIGCLD, 0xEFFFD790, 0xEFFFD890) = 0
TRUSS> new: hand = 0xEF1B7D9C mask = 0 0 0 0 flags = 0x20012
TRUSS> old: hand = 0x00000000 mask = 0 0 0 0 flags = 0x20000
I think it's pretty obvious that libtt doesn't restore the old action
correctly: it puts the xemacs handler (0xEF1B7D9C) back, but with its
own flags (0x20012) instead of the ones xemacs had originally
installed (0x0000).
From /usr/include/sys/signal.h, these flags are:
#define SA_NOCLDSTOP 0x00020000 /* don't send job control SIGCLD's */
#define SA_NODEFER 0x00000010
#define SA_RESETHAND 0x00000002
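If I read this right, SA_RESETHAND means the handler is reset to the
default action as soon as the first SIGCLD is delivered. After that,
xemacs would never be told about exiting children again, which would
explain the zombies.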
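To double-check the decoding of 0x20012 from the truss output, here is a
small sketch. The SOL_* constants are hard-coded from the header lines
quoted above (the numeric values differ on other systems), and
decode_sa_flags() is a hypothetical helper written for this mail, not
anything from xemacs or libtt.

```c
#include <string.h>

/* Flag values as quoted above from Solaris /usr/include/sys/signal.h.
   Hard-coded on purpose: other systems use different numbers. */
#define SOL_SA_RESETHAND 0x00000002u
#define SOL_SA_NODEFER   0x00000010u
#define SOL_SA_NOCLDSTOP 0x00020000u

/* Hypothetical helper: write the names of the known bits set in `flags`
   into buf, separated by '|'. Unknown bits are silently skipped. */
static const char *decode_sa_flags(unsigned int flags, char *buf, size_t len)
{
    static const struct { unsigned int bit; const char *name; } known[] = {
        { SOL_SA_NOCLDSTOP, "SA_NOCLDSTOP" },
        { SOL_SA_NODEFER,   "SA_NODEFER"   },
        { SOL_SA_RESETHAND, "SA_RESETHAND" },
    };
    size_t i;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (flags & known[i].bit) {
            if (buf[0] != '\0')
                strncat(buf, "|", len - strlen(buf) - 1);
            strncat(buf, known[i].name, len - strlen(buf) - 1);
        }
    }
    return buf;
}
```

Calling decode_sa_flags(0x20012, buf, sizeof buf) gives
"SA_NOCLDSTOP|SA_NODEFER|SA_RESETHAND", and 0x20000 (the "old" flags
before xemacs installed its handler) gives just "SA_NOCLDSTOP".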
I think this machine is not really up to date on patches, so maybe
this was fixed by Sun at some point. (Are any Sun employees reading
this who can peek at the libtt source to confirm or deny this?)
So I suggest we reinstall our SIGCLD handler if the tt_open() call
in tooltalk.c has failed. Any comments?
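Roughly what I have in mind, as a standalone sketch rather than a patch
against tooltalk.c. All names here are made up for illustration:
sigchld_handler() stands in for the real xemacs handler, clobbering_open()
only simulates what the truss output shows libtt doing (same handler put
back, but with flags 0x20012), and the "returns 0 on failure" convention
is an assumption, not the real tt_open() interface.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for sigaction() and the SA_* flags */
#endif
#include <signal.h>
#include <string.h>

/* Stand-in for the real SIGCHLD (SIGCLD on Solaris) handler. */
static void sigchld_handler(int signo)
{
    (void) signo;
}

/* Install the handler with flags 0, as in the first truss trace. */
static int install_sigchld_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Simulation only: keeps whatever handler is installed but replaces the
   flags, like the last sigaction() call in the trace, then reports
   failure by returning 0. */
static int clobbering_open(void)
{
    struct sigaction sa;

    if (sigaction(SIGCHLD, NULL, &sa) != 0)
        return 0;
    sa.sa_flags = SA_NOCLDSTOP | SA_NODEFER | SA_RESETHAND;
    (void) sigaction(SIGCHLD, &sa, NULL);
    return 0;
}

/* Hypothetical wrapper: call open_fn and, if it reports failure,
   reinstall our own handler in case the library changed the action. */
static int open_and_repair(int (*open_fn)(void))
{
    int ok = open_fn();

    if (!ok)
        (void) install_sigchld_handler();
    return ok;
}
```

After open_and_repair(clobbering_open), querying the current action with
sigaction(SIGCHLD, NULL, &cur) should show our handler again without
SA_RESETHAND set. Reinstalling unconditionally after the call would also
work and might be safer, since we don't know that libtt only touches the
signal setup on the failure path.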
Gunnar
P.S.: I haven't checked my HPUX builds yet, but maybe we have the same
problem there. Has anybody ever actually USED tooltalk on HPUX?
--
Gunnar Evermann
Speech, Vision & Robotics Group
Engineering Department
Cambridge University