[Bug: 21.4.14] unix_send_process can fail to flush last chunk
james at xemacs.org
Mon Dec 13 12:56:54 EST 2004
Sorry for taking so long to reply. It's the end of another semester,
which means crunch time for faculty and students alike.
David Evers <extsw at appliedgenerics.com> wrote:
> We noticed this problem after upgrading to JDEE 2.3.4. The symptom was
> that JDE's (jde-open-class-at-point) function would hang for 30 seconds,
> then show the target file in a state only partially processed by JDE.
> Tracing the JDE reveals that the hang is in its (bsh-eval) function,
> which calls (process-send-string), then awaits a reply using
> (accept-process-output) with a 30-second timeout. The string it's
> sending is rather long: about 5000 bytes consisting of a single
> beanshell command to tell the inferior bsh about project settings.
> There is only one newline at the end. Having been supposedly sent this
> string by (process-send-string), the beanshell has not replied after 30
> seconds.
> In JDEE 2.3.4, communication with the bsh is over pipes: it binds
> (process-connection-type nil) when starting the inferior process.
> (JDE 2.3.2 did not do this, so it communicated using ptys instead.)
> The output pipe from xemacs to the inferior bsh is in non-blocking
> mode, apparently as of -r1.50 of process-unix.c.
> Adding a bit of tracing to process-unix.c:unix_send_process() and
> running under strace revealed a problem with the handling of EAGAIN on
> the output pipe. For this particular combination of circumstances
> (sending a string of just the right length to a slowish inferior
> process), it is only the write(2) system call under the final last-gasp
> Lstream_flush() in unix_send_process() that fails with EAGAIN. All the
> earlier writes succeeded, so the flushing loop guarded by
> Lstream_was_blocked_p() was never entered. The upshot is that the final
> 431-byte chunk of the string remains queued. The inferior bsh never
> sees the newline, so it never executes the command, and never sends the
> reply that (accept-process-output) is waiting for.
> The patch attached arranges that the last chunk of the string is always
> flushed through the DATA_OUTSTREAM _before_ the Lstream_was_blocked_p()
> loop, so that that loop can ensure it does actually get written. I've
> retained the existing last-gasp Lstream_flush() on the assumption that
> part of its function is to clear the _input_ side of the
> p->coding_outstream etc., even in the case that we longjmp() out of the
> SIGPIPE handler. Someone more familiar with the code may well be able
> to achieve the effect of this patch more cleanly.
> With this patch, our JDE hang goes away. The patch does not seem to
> break communications over ptys either (tested by going back to JDE
> Although this patch is against XEmacs 21.4.14 from Fedora Core 1, the
> bug seems to be present in all newer versions I can see in CVS.
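
To make the failure mode described above concrete, here is a minimal
sketch in C. It is not the XEmacs code: struct out_buf, out_buf_queue(),
out_buf_flush_once() and set_nonblocking() are hypothetical stand-ins for
a buffered output stream over a non-blocking descriptor. It only shows
that a single flush attempt can hit EAGAIN and leave the tail queued, and
that the "blocked" flag has to be checked after that final flush rather
than only before it.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical buffered output stream; NOT the XEmacs Lstream API. */
struct out_buf {
    int fd;            /* descriptor in non-blocking mode */
    char data[8192];   /* bytes queued but not yet written */
    size_t len;        /* number of queued bytes */
    int was_blocked;   /* set when the last flush attempt hit EAGAIN */
};

/* Put a descriptor into non-blocking mode. Returns 0 on success. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Append bytes to the queue. Returns -1 if they do not fit. */
static int out_buf_queue(struct out_buf *b, const char *p, size_t n)
{
    if (n > sizeof b->data - b->len)
        return -1;
    memcpy(b->data + b->len, p, n);
    b->len += n;
    return 0;
}

/* One flush attempt. Returns 0 if nothing went badly wrong, even when
   bytes remain queued because the descriptor would block; the caller
   must look at was_blocked and len afterwards. Returns -1 on any other
   write error. */
static int out_buf_flush_once(struct out_buf *b)
{
    b->was_blocked = 0;
    while (b->len > 0) {
        ssize_t n = write(b->fd, b->data, b->len);
        if (n > 0) {
            /* Drop the bytes that were accepted, keep the rest queued. */
            memmove(b->data, b->data + n, b->len - (size_t)n);
            b->len -= (size_t)n;
        } else if (n < 0 && errno == EINTR) {
            continue;              /* interrupted, just retry */
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            b->was_blocked = 1;    /* tail stays queued */
            return 0;
        } else {
            return -1;             /* hard error, e.g. EPIPE */
        }
    }
    return 0;
}
```

If the pipe is already full when the last 431 bytes are queued, a lone
out_buf_flush_once() call returns 0 with len still 431 and was_blocked
set. A caller that tested was_blocked only before that call would never
retry, which is the behaviour the report describes.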
I wonder if this is related to a problem in the 21.5 series that I've
been chasing for some time. The lstream code has changed somewhat from
21.4 to 21.5, but I wonder if the 2nd hunk of this patch:
is attempting to fix the same problem. Briefly, Lstream_close can lose
data if the process on the other end of the stream doesn't read stuff
fast enough. The referenced patch attempts to loop and push data into
the stream until it either encounters an error or succeeds in flushing
everything out. Does that approach also solve your problem? If so,
perhaps we can make common cause on finding a final solution to this
problem.