OK, I think I understand, at least partly, what's going on here. At the very
least, I can certainly see what goes wrong with the nil argument. It doesn't
explain my periodic M-x grep problems with pdump, though.
Basically, the key problem is that the error is getting signalled INSIDE THE
CHILD. That's bad. Even though we don't use vfork(), so the two are separate
processes, they still share some file descriptors, in particular the X
connection. So when the error occurs, the child proceeds on its way, signalling
the error and ultimately trying to display it on the screen. At that point you
have two processes talking over the same X connection, so of course the
parent's Xlib gets confused: the request sequence numbers it is tracking no
longer match what the server sends back.
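One common way to keep errors out of the child is for the child to do nothing but exec, and, if the exec fails, to hand errno back to the parent over a close-on-exec pipe and call _exit(). The error is then signalled in the parent, which owns the X connection. This is a minimal sketch under that assumption; spawn_or_report() is a hypothetical helper, not XEmacs code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not XEmacs code): fork and exec PROG with ARGV.
   Returns 0 on success (child pid stored in *PID), otherwise an errno
   value.  On exec failure the child only writes errno to a pipe and
   calls _exit(), so it never signals an error or touches the display. */
static int spawn_or_report (const char *prog, char *const argv[], pid_t *pid)
{
  int fds[2];
  if (pipe (fds) < 0)
    return errno;
  /* Close-on-exec: a successful exec closes the write end, so the
     parent's read() below sees EOF. */
  if (fcntl (fds[1], F_SETFD, FD_CLOEXEC) < 0)
    {
      int err = errno;
      close (fds[0]);
      close (fds[1]);
      return err;
    }

  pid_t child = fork ();
  if (child < 0)
    {
      int err = errno;
      close (fds[0]);
      close (fds[1]);
      return err;
    }
  if (child == 0)
    {
      close (fds[0]);
      execvp (prog, argv);
      /* Only reached if exec failed: report errno and leave at once. */
      int err = errno;
      ssize_t ignored = write (fds[1], &err, sizeof err);
      (void) ignored;
      _exit (127);
    }

  close (fds[1]);
  int child_errno = 0;
  ssize_t n;
  do
    n = read (fds[0], &child_errno, sizeof child_errno);
  while (n < 0 && errno == EINTR);
  close (fds[0]);

  if (n == (ssize_t) sizeof child_errno)
    {
      /* Exec failed: reap the child; the caller signals the error here,
         in the parent. */
      waitpid (child, NULL, 0);
      return child_errno;
    }
  *pid = child;
  return 0;
}
```

With this shape, a bad argument such as the `nil' in Michael's example would surface as an error return in the parent rather than as Lisp error handling running inside the child.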
I assume all such problems ultimately boil down to I/O on the socket by the
child. This could conceivably come from QUIT checking, for example: if
something in the child-process setup code invokes QUIT while there is input
pending on the socket, it will do I/O trying to figure out whether C-g was
pressed.
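To see why a QUIT check in the child is dangerous, here is a small sketch in which a socketpair stands in for the X connection and sketch_check_quit() is a hypothetical, much-simplified QUIT-style poll (the real QUIT machinery is far more involved). Because parent and child share one kernel socket, anything the child reads is gone for the parent as well.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical QUIT-style check (not XEmacs code): if input is already
   pending on FD, read it looking for QUIT_CHAR.  Returns 1 if seen. */
static int sketch_check_quit (int fd, char quit_char)
{
  struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
  char buf[256];
  int quit = 0;

  /* Zero timeout: only consume data that is already waiting. */
  while (poll (&p, 1, 0) > 0 && (p.revents & POLLIN))
    {
      ssize_t n = read (fd, buf, sizeof buf);
      if (n <= 0)
        break;
      for (ssize_t i = 0; i < n; i++)
        if (buf[i] == quit_char)
          quit = 1;
    }
  return quit;
}

/* Bytes queued before fork() are read by the child's check, so the
   parent never sees them.  Returns 1 if the parent can still read the
   data, 0 if it is gone, -1 on setup failure. */
static int demo_parent_still_sees_input (void)
{
  int sv[2];
  if (socketpair (AF_UNIX, SOCK_STREAM, 0, sv) < 0)
    return -1;
  int result = -1;
  /* Stand-in for an event the server has already sent. */
  if (write (sv[0], "abc", 3) == 3)
    {
      pid_t pid = fork ();
      if (pid == 0)
        _exit (sketch_check_quit (sv[1], 'x'));  /* "child setup" code */
      if (pid > 0)
        {
          waitpid (pid, NULL, 0);
          struct pollfd p = { .fd = sv[1], .events = POLLIN, .revents = 0 };
          result = poll (&p, 1, 0) > 0;
        }
    }
  close (sv[0]);
  close (sv[1]);
  return result;
}
```

demo_parent_still_sees_input() returns 0: the parent's end is empty. On a real X connection the stolen bytes could be part of an event or reply, which is one plausible route to the parent's Xlib losing track of the sequence.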
Another possibility: somewhere along the way someone added the following to the
end of child_setup():
  /* I can't think of any reason why child processes need any more
     than the standard 3 file descriptors.  It would be cleaner to
     close just the ones that need to be, but the following brute
     force approach is certainly effective, and not too slow.
   */
  {
    int fd;
    for (fd = 3; fd <= 64; fd++)
      retry_close (fd);
  }
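The loop above stops at a hard-coded 64. A less arbitrary variant asks the system for the per-process descriptor limit with sysconf(_SC_OPEN_MAX). This is a sketch under that assumption; close_fds_from() is a hypothetical name, and the 1024 fallback is an arbitrary choice for when the limit is reported as indeterminate.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Hypothetical sketch (not XEmacs code): close every descriptor from
   FIRST up to the per-process limit.  Meant to run in the child
   between fork() and exec(). */
static void close_fds_from (int first)
{
  long limit = sysconf (_SC_OPEN_MAX);
  if (limit < 0)
    limit = 1024;  /* assumption: fallback when the limit is indeterminate */

  /* Valid descriptors run from 0 to limit - 1, hence "<" rather than
     "<=".  Errors (mostly EBADF for slots never opened) are ignored. */
  for (long fd = first; fd < limit; fd++)
    close ((int) fd);
}
```

When the limit is very large (hundreds of thousands of descriptors), this loop itself becomes slow; some platforms provide closefrom() or close_range() for exactly this job.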
Besides the fact that the check for <= 64 is obviously wrong and should be
< 64, there is already code earlier on to close all the open descriptors that
XEmacs owns, i.e. those of all processes and of any files currently being
loaded. I don't understand Unix very well: could closing a socket (e.g. the X
socket) in the child possibly cause some flushing of the buffers, even though
the parent process still holds another descriptor for the same socket?
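Under ordinary POSIX semantics, close() only drops the calling process's reference: the socket stays open until the last descriptor referring to it is closed, and close() knows nothing about data sitting in a user-space buffer such as Xlib's output queue. The hypothetical experiment below checks this with a socketpair standing in for the X connection.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork(), the child closes its copies of both ends; the parent's
   copies must still carry data.  Returns 1 if they do, 0 if not,
   -1 on setup failure. */
static int demo_child_close_is_harmless (void)
{
  int sv[2];
  if (socketpair (AF_UNIX, SOCK_STREAM, 0, sv) < 0)
    return -1;

  pid_t pid = fork ();
  if (pid < 0)
    {
      close (sv[0]);
      close (sv[1]);
      return -1;
    }
  if (pid == 0)
    {
      /* Child: the brute-force cleanup, reduced to these two fds. */
      close (sv[0]);
      close (sv[1]);
      _exit (0);
    }
  waitpid (pid, NULL, 0);

  /* Parent: the connection is still alive in both directions. */
  char c = 0;
  int ok = write (sv[0], "p", 1) == 1
           && read (sv[1], &c, 1) == 1
           && c == 'p';
  close (sv[0]);
  close (sv[1]);
  return ok;
}
```

The contrast is shutdown(), which acts on the socket itself and therefore would affect the parent too; a plain close() in the child should be harmless.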
My bet is that it has to do with QUIT checking. In particular, any call that
does DFC-type text conversion (e.g. qxe_open(), qxe_chdir(), etc.) can trigger
a QUIT check. This is especially likely if, e.g., you move the mouse after
starting the process, or if there's a key release waiting to be processed. The
reason this may have started appearing more often in 21.5 (even though you can
still get the problem using the `nil' trick in 21.4) is that the filename
translation was added in 21.5.
The solution is to wrap everything the child does before exec in
begin_dont_check_for_quit(). You don't need to bother ending this, because once
we execvpe(), everything's wiped out anyway. Michael, can you try this and see
if it fixes your problems (other than the nil problem, which needs a different
fix)?
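The shape of that fix can be sketched with simplified stand-ins. All the sketch_* names below are hypothetical; the real begin_dont_check_for_quit() may well differ, for instance by recording an unwind-protect so the inhibition can be undone. The point illustrated is that the child sets the inhibition first and never clears it, so a DFC-style conversion that calls QUIT no longer polls the connection.

```c
/* Hypothetical stand-ins, NOT the real XEmacs definitions. */
static int sketch_dont_check_for_quit;
static int sketch_quit_checks_performed;

static void sketch_begin_dont_check_for_quit (void)
{
  sketch_dont_check_for_quit = 1;
}

/* Stand-in for QUIT: skip the check entirely while inhibited. */
static void sketch_quit (void)
{
  if (sketch_dont_check_for_quit)
    return;
  sketch_quit_checks_performed++;  /* real code would poll the X socket */
}

/* Stand-in for a DFC-converting wrapper such as qxe_chdir(): the
   conversion may call QUIT along the way. */
static int sketch_convert_and_chdir (const char *dir)
{
  sketch_quit ();
  (void) dir;  /* conversion and chdir() omitted in this sketch */
  return 0;
}

/* Child side after fork(): inhibit QUIT first and never re-enable it,
   since a successful exec replaces the whole process image. */
static int sketch_child_setup (const char *dir)
{
  sketch_begin_dont_check_for_quit ();
  /* ...close descriptors and execvp() would follow here... */
  return sketch_convert_and_chdir (dir);
}
```

Not ending the inhibition is safe only because exec (or _exit on failure) follows; in the parent the same call would need a matching end.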
ben
----- Original Message -----
From: "Michael Sperber [Mr. Preprocessor]"
<sperber(a)informatik.uni-tuebingen.de>
To: <xemacs-beta(a)xemacs.org>; "Ben Wing" <ben(a)xemacs.org>
Sent: Friday, May 31, 2002 2:17 AM
Subject: Reliable way to lose X sequence
In the latest beta, do:
(start-process "foo" (get-buffer "*scratch*") "ls" nil)
This gets you:
Xlib: sequence lost (0x10000 > 0x3e5) in reply type 0x1!
... and XEmacs becomes unusable. I don't see this in 21.4.4.
Now, I and others have been seeing process-related sequence-lost
problems for *a long* time, so maybe this is related. (I know the nil
is invalid, but XEmacs still shouldn't crash.)
--
Cheers =8-} Mike
Peace, international understanding, and all that, blah blah