Hello David,
Sorry for taking so long to reply. It's the end of another semester,
which means crunch time for faculty and students alike.
David Evers <extsw(a)appliedgenerics.com> wrote:
We noticed this problem after upgrading to JDEE 2.3.4. The symptom was
that JDE's (jde-open-class-at-point) function would hang for 30 seconds,
then show the target file in a state only partially processed by JDE.
Tracing the JDE reveals that the hang is in its (bsh-eval) function,
which calls (process-send-string), then awaits a reply using
(accept-process-output) with a 30-second timeout. The string it's
sending is rather long: about 5000 bytes consisting of a single
beanshell command to tell the inferior bsh about project settings.
There is only one newline, at the end. Although (process-send-string)
has supposedly sent this string, the beanshell has not replied after 30
seconds.
In JDEE 2.3.4, communication with the bsh is over pipes: it binds
(process-connection-type nil) when starting the inferior process.
(JDE 2.3.2 did not do this, so it communicated using ptys instead.)
The output pipe from xemacs to the inferior bsh is in non-blocking
mode, apparently as of -r1.50 of process-unix.c.
Adding a bit of tracing to process-unix.c:unix_send_process() and
running under strace revealed a problem with the handling of EAGAIN on
the output pipe. For this particular combination of circumstances
(sending a string of just the right length to a slowish inferior
process), it is only the write(2) system call under the final last-gasp
Lstream_flush() in unix_send_process() that fails with EAGAIN. All the
earlier writes succeeded, so the flushing loop guarded by
Lstream_was_blocked_p() was never entered. The upshot is that the final
431-byte chunk of the string remains queued. The inferior bsh never
sees the newline, so it never executes the command, and never sends the
reply that (accept-process-output) is waiting for.
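The failure mode described above can be reproduced outside XEmacs. The
following is only a minimal sketch, not the code in process-unix.c: the
helper name write_no_retry() is hypothetical, and it assumes a POSIX
system with a pipe whose write end is in O_NONBLOCK mode. It stops at the
first EAGAIN and reports how many bytes were left behind, which is the
situation the final chunk of the string ends up in.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not an XEmacs function): write as much of buf as
   the non-blocking descriptor fd will take, and give up on the first
   EAGAIN instead of waiting and retrying.  Returns the number of bytes
   left unwritten, or (size_t)-1 on any other error. */
static size_t write_no_retry(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n > 0) {
            done += (size_t)n;          /* partial writes are normal */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                   /* interrupted, try again */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;                      /* pipe full: the tail stays queued */
        return (size_t)-1;              /* real error (or unexpected 0) */
    }
    return len - done;
}
```

If the caller treats a nonzero return as "done anyway", the reader never
sees the trailing newline, just as the inferior bsh does not here.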
The patch attached arranges that the last chunk of the string is always
flushed through the DATA_OUTSTREAM _before_ the Lstream_was_blocked_p()
loop, so that that loop can ensure it does actually get written. I've
retained the existing last-gasp Lstream_flush() on the assumption that
part of its function is to clear the _input_ side of the
p->coding_outstream etc., even in the case that we longjmp() out of the
SIGPIPE handler. Someone more familiar with the code may well be able
to achieve the effect of this patch more cleanly.
With this patch, our JDE hang goes away. The patch does not seem to
break communications over ptys either (tested by going back to JDE
2.3.2).
Although this patch is against XEmacs 21.4.14 from Fedora Core 1, the
bug seems to be present in all newer versions I can see in CVS.

I wonder if this is related to a problem in the 21.5 series that I've
been chasing for some time. The lstream code has changed somewhat from
21.4 to 21.5, but I wonder if the second hunk of this patch:
http://list-archive.xemacs.org/xemacs-beta/200410/msg00100.html
is attempting to fix the same problem. Briefly, Lstream_close can lose
data if the process on the other end of the stream doesn't read it
fast enough. The referenced patch attempts to loop and push data into
the stream until it either encounters an error or succeeds in flushing
everything out. Does that approach also solve your problem? If so,
perhaps we can make common cause on finding a final solution to this
problem.
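To make the "loop until error or fully flushed" idea concrete, here is a
rough sketch in plain POSIX C. It is not the lstream code from either
patch; flush_fully() is a hypothetical name, and using poll() to wait for
the descriptor to become writable is my assumption about one reasonable
way to avoid spinning on EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not an XEmacs function): keep pushing buf into fd
   until every byte has been written or a real error occurs.  On EAGAIN,
   wait with poll() until fd is writable again.  Returns 0 on success,
   -1 on error.  Note: if the reader has gone away, write() raises
   SIGPIPE unless the caller has ignored or handled that signal. */
static int flush_fully(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                   /* interrupted, try again */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            struct pollfd p = { .fd = fd, .events = POLLOUT, .revents = 0 };
            if (poll(&p, 1, -1) < 0 && errno != EINTR)
                return -1;              /* poll itself failed */
            continue;                   /* writable (or error): retry write */
        }
        return -1;                      /* real error (or unexpected 0) */
    }
    return 0;
}
```

The point of the sketch is only that the last chunk goes through the same
retry loop as everything before it, so there is no final unguarded write.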
Thanks,
--
Jerry James
http://www.ittc.ku.edu/~james/