Glynn Clements <glynn.clements(a)virgin.net> writes:
Note: the backtrace which Stephen queried *does* have _XRead() in it.
Also, *all* of the backtraces have a NULL device argument, which is
just too improbable (not only should it not be able to happen, but if
it did, XEmacs should segfault almost instantly, before it gets down
into Xlib).
Hmmm...
> Sorry for the bad backtrace -- I'm pretty sure that I just pointed at
> the wrong binary when I generated that one.
I don't think so. The odd one out (the second one) appears to differ
from the other two only in that there aren't any symbols for the Xlib
portion. Most of the addresses match.
I don't know... I am now running it with -synchronous to see whether
that either changes things or helps me attach a debugger and
investigate more thoroughly when it's deadlocked.
> Thanks very much, guys, for investigating this.... I'm hating the
> thought of using GNU Emacs again, but I'm losing work regularly
> because of these hangs (and so are other folks in the office).
The thing is, everything points towards problems at a lower level than
XEmacs itself, i.e. in the communication between Xlib and the X server
(X Errors mentioning obviously bogus resource IDs, Xlib's "unexpected
async reply" message, crashes in XRead()).
I agree; I'd more strongly suspect lower-level issues if even just
one other program had problems. Perhaps XEmacs is just pushing
the envelope more.
Assuming that neither the X server nor Xlib is outright broken, the
next most likely explanations include Xlib not being compatible with
other key components (e.g. libc, or the Xlib headers), or something
dumping garbage into the socket.
I'm running RH9 RPMs including XFree86-4.3.0-2, XFree86-libs-4.3.0-2,
NVIDIA_kernel-2.4.20-9-1.0-4363gg4 (I think that's a local build) and
glibc-2.3.2-27.9. Numerous XEmacs users are all having
this issue... we do have nearly identical installs and often similar
hardware (including the video card).
This is just a complete shot in the dark, but can you check the output
from "ls -l /proc/<pid>/fd", where <pid> is the PID of the XEmacs
process? I'm curious as to whether the X socket is getting assigned to
one of the standard descriptors (0, 1 or 2).
No, 0,1,2 are all attached to /dev/pts/* devices.
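In case it is useful for repeating the check on the other machines
here, the same inspection can be scripted. This is only a rough Python
sketch: it assumes a Linux /proc, where a socket descriptor reads back
as "socket:[inode]", and the function names are made up for
illustration.

```python
import os
import sys


def std_fd_targets(fd_dir):
    """Map descriptors 0, 1 and 2 to the target of their fd symlink.

    fd_dir is a directory laid out like /proc/<pid>/fd (an assumption:
    one symlink per open descriptor, named after the descriptor number).
    """
    targets = {}
    for fd in (0, 1, 2):
        try:
            targets[fd] = os.readlink(os.path.join(fd_dir, str(fd)))
        except OSError:
            # Descriptor is closed, or we may not read this process.
            targets[fd] = None
    return targets


def sockets_on_std_fds(fd_dir):
    """Return the standard descriptors whose target looks like a socket."""
    return [fd for fd, target in std_fd_targets(fd_dir).items()
            if target is not None and target.startswith("socket:")]


if __name__ == "__main__":
    # Usage: python check_fds.py <pid>   (defaults to this process)
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    fd_dir = "/proc/%s/fd" % pid
    for fd, target in std_fd_targets(fd_dir).items():
        print(fd, "->", target)
    print("sockets on 0/1/2:", sockets_on_std_fds(fd_dir))
```

An empty list on the last line would match what I saw by hand (all
three on /dev/pts/*); anything else would mean a socket has landed on
a standard descriptor.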
But that makes me think of something else that I found strange... it
was happening a bit earlier today and seemed to happen most around the
same time as I was having truly horrible issues with XEmacs hanging
(about every 20 minutes, sometimes twice in 5 minutes).
Using zsh, I would start an xemacs like so:
xemacs &
or
xemacs &|
(the `&|' suffix is like `&' but disowns the process immediately, i.e.,
it is not put in the job table of the parent shell).
When I did that, I got this error:
temacs can only be run in -batch mode.
and it exited with value 255. Retries would often work, especially if
I didn't background the process initially (maybe that was just a
coincidence, though).
I cannot reproduce it now...
Any further help you can provide is greatly appreciated!
Thanks,
Greg