I have been at the XEmacs-with-gpm problem again (with a debugger and
a -g build of libgpm this time) and I found the problem. It is a bug
in the way libgpm tries to handle relaying to the old signal handler.
I debugged against 1.13 (that's what I have at home) but the code did
not change in 1.14.
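Note that even a simple program like
#include <stdio.h>
#include <signal.h>
#include <gpm.h>
int main(int argc, char *argv[])
{
  Gpm_Connect conn;
  conn.eventMask = ~0; conn.defaultMask = 0;  /* assumed setup: report all events */
  conn.minMod = 0;     conn.maxMod = ~0;
  Gpm_Open(&conn, 0);
  puts("Here we go..");
  killpg(0, SIGTSTP);                         /* suspend our own process group */
  puts("Back again");
  Gpm_Close();
  return 0;
}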
will hang when run on a Linux console (but not in an xterm).
Likewise, running the 'mev' example program on a Linux console and sending
it SIGTSTP makes it hang too.
The problem is here, in liblow.c:
> #if (defined(SIGTSTP))
> /* itz: support for SIGTSTP */
First comment: compare this handler with the way SIGWINCH is
relayed just above.
> /* Old SIGTSTP handler. */
>
> static __sighandler_t gpm_saved_suspend_hook;
>
> static void gpm_suspend_hook (int signum)
Note: signal handlers installed with signal() (BSD semantics, which is
what glibc gives by default) are called with the signal that triggered
them blocked. It is unblocked again when the handler returns.
> {
> Gpm_Connect gpm_connect;
> sigset_t old_sigset;
> sigset_t new_sigset;
> int success;
>
> sigemptyset (&new_sigset);
> sigaddset (&new_sigset, SIGTSTP);
> sigprocmask (SIG_BLOCK, &new_sigset, &old_sigset);
Not necessary: the signal is already blocked.
> /* Open a completely transparent gpm connection */
> [code snipped]
>
> /* take the default action, whatever it is (probably a stop :) */
> sigprocmask (SIG_SETMASK, &old_sigset, 0);
> signal (SIGTSTP, gpm_saved_suspend_hook);
Reinstall old signal handler.
> kill (getpid (), SIGTSTP);
Send the signal to ourselves. The idea here is that the old signal
handler will now get called. However, as the signal is currently
blocked, it is not delivered; it just stays pending.
> /* in bardo here */
No, we aren't!
> /* Reincarnation. Prepare for another death early. */
> signal (SIGTSTP, gpm_suspend_hook);
The gpm handler is installed again.
> /* Pop the gpm stack by closing the useless connection */
> /* but do it only when we know we opened one.. */
> if (success) {
> Gpm_Close ();
[*] Suppose the old SIGTSTP handler does get called. How can we be sure
we get back here? What if the old handler longjmps somewhere else?
> } /*if*/
> }
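Here the signal handler returns. SIGTSTP is unblocked and any pending
signals are delivered, including the one we sent ourselves. Since
gpm_suspend_hook has been reinstalled, it is the one that runs, and it
promptly sends itself another SIGTSTP. Bingo! We now have a signal loop.
The process hangs, consuming 100% of the CPU.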
> #endif /* SIGTSTP */
This code was obviously written for a system where signal handlers
were not called with the signal blocked. Did Linux change its
behaviour sometime in the past?
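The loop can be reproduced without libgpm. The following is a minimal sketch, not the libgpm code: `buggy_hook`, `old_handler`, `install` and `demo_relay_loop` are hypothetical names, the gpm connection handling is left out, a counting function stands in for the old handler (really SIG_DFL, which would stop the test process), and a counter breaks the loop after five rounds. The handler is installed with sigaction() and `sa_flags = 0` to get the BSD signal() semantics described above reliably; depending on feature-test macros, glibc's signal() can also give System V semantics.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t hook_calls = 0;
static volatile sig_atomic_t old_calls = 0;
static volatile sig_atomic_t blocked_in_hook = 0;

/* Stand-in for the previously installed handler (really SIG_DFL). */
static void old_handler(int signum) { (void)signum; old_calls++; }

/* Install fn for SIGTSTP with flags 0, i.e. BSD signal() semantics:
   SIGTSTP is blocked while fn runs and unblocked when it returns. */
static void install(void (*fn)(int))
{
    struct sigaction sa;
    sa.sa_handler = fn;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGTSTP, &sa, NULL);
}

/* The relay pattern of gpm_suspend_hook, without the gpm calls. */
static void buggy_hook(int signum)
{
    sigset_t cur;

    sigprocmask(SIG_BLOCK, NULL, &cur);   /* query the current mask */
    if (sigismember(&cur, signum))
        blocked_in_hook = 1;
    if (++hook_calls >= 5)                /* safety valve: libgpm has none */
        return;
    install(old_handler);                 /* "reinstall old handler" */
    kill(getpid(), signum);               /* blocked, so it only becomes pending */
    install(buggy_hook);                  /* hook is back before it is delivered */
}   /* on return SIGTSTP is unblocked and the pending one re-enters buggy_hook */

/* Sends one SIGTSTP and returns how often the hook ran in response. */
int demo_relay_loop(void)
{
    struct sigaction original;

    hook_calls = 0;
    old_calls = 0;
    sigaction(SIGTSTP, NULL, &original);  /* save the caller's disposition */
    install(buggy_hook);
    kill(getpid(), SIGTSTP);
    sigaction(SIGTSTP, &original, NULL);
    return hook_calls;
}
```

Calling `demo_relay_loop()` should report five hook invocations and zero calls to `old_handler`: every resent SIGTSTP lands on the reinstalled hook, never on the handler it was meant for.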
How to fix
1. Explicitly unblocking the signal between reinstalling the old
   handler and sending the signal may work.
   [Last-minute addition: yes, this must work; it is the way Emacs
   relays signals to their default handlers.]
2. Do what the SIGWINCH handler just above does and call the old
   signal handler directly. That obviously works. May I suggest,
   however, that a more complete prototype be used for the signal
   handler.
3. Install the signal handler using a more general interface than
   BSD signal(), such as sigaction(), which allows setting the handler
   flags so that the signal is NOT blocked while the handler runs
   (SA_NODEFER). This, however, leads to race conditions.
Solutions 1 and 2 have the advantage that they are relatively minor
changes and do not require the libgpm user to be aware of the signal
issues. However, because the user is not aware of them, this leads to
problems such as [*]. For XEmacs this is not relevant, since it does
not have its own SIGTSTP handler. This might change in the future,
though (for instance, to fix the 'gnuclient on the same tty' problem).
4. The Debian bug tracking system contains patches that allow
   processes to specify before and after hooks that get called by
   the libgpm signal handler instead of the original handler. This
   is better, but it still does not solve [*]. Moreover, we still
   want to call the SIG_DFL handler to actually do the suspending
   (or can we just call pause()?). I am not sure how that is handled
   correctly, or how this particular patch does it.
5. Alternatively, one can take the dual approach to 4: a hook that
   says the process will take care of signal handling itself, with
   libgpm providing functions gpm_before_suspend and gpm_after_suspend
   that the program must call.
I would like a combination of 1 and 5, with 5 being low priority so
that it can wait until there actually is a client that needs it.
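A combination of 1 and 5 could look roughly like the sketch below. It is a sketch under assumptions, not a patch against liblow.c: `gpm_before_suspend()` and `gpm_after_suspend()` are the hypothetical option-5 names and here only count calls instead of opening and closing the transparent connection, and the old handler is kept as a `struct sigaction` rather than a `__sighandler_t`. The hook follows fix 1: it reinstalls the saved action, unblocks SIGTSTP, resends it, and only then restores the mask and itself, so nothing is left pending when it returns. The hypothetical `demo_suspend_cycle()` exercises the SIG_DFL case (the XEmacs case) in a child process, moves the child into its own process group (Linux discards SIGTSTP stops for orphaned process groups), and resumes it with SIGCONT.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical option-5 entry points; libgpm would open/close the
   transparent connection here. In this sketch they only count calls. */
static volatile sig_atomic_t before_calls = 0;
static volatile sig_atomic_t after_calls = 0;
void gpm_before_suspend(void) { before_calls++; }
void gpm_after_suspend(void)  { after_calls++; }

static struct sigaction gpm_saved_suspend_action;  /* found at install time */

/* Fix 1: reinstall the old action, UNBLOCK the signal, then resend it. */
static void gpm_suspend_hook(int signum)
{
    struct sigaction self;
    sigset_t unblock, old_mask;

    gpm_before_suspend();
    sigaction(signum, &gpm_saved_suspend_action, &self);
    sigemptyset(&unblock);
    sigaddset(&unblock, signum);
    sigprocmask(SIG_UNBLOCK, &unblock, &old_mask);
    kill(getpid(), signum);                     /* delivered now: stop or old handler */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);  /* blocked again until we return */
    sigaction(signum, &self, NULL);             /* nothing pending, so no loop */
    gpm_after_suspend();
}

/* Suspends and resumes a child whose previous SIGTSTP action is SIG_DFL.
   Returns 0 if it stopped, resumed and ran both hooks exactly once. */
int demo_suspend_cycle(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        sigset_t set;

        setpgid(0, 0);                  /* own, non-orphaned process group */
        sa.sa_handler = SIG_DFL;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigaction(SIGTSTP, &sa, NULL);
        sa.sa_handler = gpm_suspend_hook;
        sigaction(SIGTSTP, &sa, &gpm_saved_suspend_action);
        sigemptyset(&set);
        sigaddset(&set, SIGTSTP);
        sigprocmask(SIG_UNBLOCK, &set, NULL);
        kill(getpid(), SIGTSTP);
        _exit(before_calls == 1 && after_calls == 1 ? 0 : 1);
    }
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
        return -2;
    kill(pid, SIGCONT);
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -3;
    return WEXITSTATUS(status);
}
```

A client with its own SIGTSTP handler (option 5) would call the same two functions around its own suspend code instead of relying on the library hook.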
Jan