>>>> "Glynn" == Glynn Clements
<glynn.clements(a)virgin.net>:
Glynn> Jan Rychter wrote:
The common recurring theme is the two messages that appear on the
console that XEmacs was started from:
X Error of failed request: BadWindow (invalid Window parameter)
  Major opcode of failed request: 25 (X_SendEvent)
  Resource id in failed request: 0xe009ba
  Serial number of failed request: 593927
  Current serial number in output stream: 594059
Xlib: unexpected async reply (sequence 0x9108b)!
xemacs-21.4.14: X Error of failed request: BadWindow (invalid Window parameter)
  Major opcode of failed request: 18 (X_ChangeProperty)
  Resource id in failed request: 0xe009ba
  Serial number of failed request: 593926
  Current serial number in output stream: 594059
> [umpteen or so crashes later, having lost some data along the way]
>
> Can I do anything else to help resolving this bug?
Glynn> The following may produce useful information, but they may
Glynn> involve more inconvenience than you can accept:
[...]
Thanks -- I'll try doing that. It might take me a while, though.
> Is this a known problem, or is it just me?
Glynn> You certainly aren't the only person to report it.
> Is anyone working on this?
Glynn> I don't think so.
> I have a feeling this could be related to the resolver.
Glynn> How so?
I think I'm seeing two separate (but possibly related) issues here. One
is that XEmacs hangs. The other is the BadWindow messages.
I've just spent a long time trying to reproduce the problems, which
seemed to occur at random. And I think I found a case that shows where
to look for the problems.
Tcpdump says:
192.168.1.82.32807 > 10.11.53.10.domain: 17472+ A? news-server.san.rr.com. (40) (DF)
[repeated 5 more times]
192.168.1.82.32809 > 10.11.53.10.domain: 17473+[|domain] (DF)
[repeated 5 more times]
192.168.1.82.32811 > 10.11.53.10.domain: 17474+ AAAA? news-server.san.rr.com. (40) (DF)
[repeated 5 more times]
192.168.1.82.32813 > 10.11.53.10.domain: 17475+[|domain] (DF)
[repeated 5 more times]
192.168.1.82.32815 > 10.11.53.10.domain: 17476+ A? news-server.san.rr.com. (40) (DF)
[repeated 5 more times]
192.168.1.82.32813 > 10.11.53.10.domain: 17477+[|domain] (DF)
[repeated 5 more times]
... at which point XEmacs becomes totally unresponsive and stays in
that state forever (as far as I can tell). There is no other network
activity occurring.
Now, I know _why_ it's trying to do a DNS lookup and failing. It seems
this particular XEmacs process was started when I was connected to some
Wi-Fi hotspot and the resolver took the (valid at the time) contents of
/etc/resolv.conf, remembered it and made it its Bible forever. Of
course I have no way to reach 10.11.53.10 now, and I also have no way to
tell XEmacs that resolv.conf has changed (is there any solution to that
problem?).
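One quick way to confirm this kind of mismatch is to check whether the server address seen in the tcpdump output is still listed in the current /etc/resolv.conf. The following is only a minimal sketch in Python, not anything XEmacs does; the function names `parse_nameservers` and `resolver_looks_stale` are hypothetical helpers introduced for illustration, and the sketch assumes the usual "nameserver <address>" line format.

```python
def parse_nameservers(resolv_conf_text):
    """Return the addresses from 'nameserver <address>' lines, in order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Comment lines begin with '#' or ';', so their first field never
        # equals "nameserver" and they are skipped automatically.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def resolver_looks_stale(queried_server, resolv_conf_text):
    """True if the server a process is querying is no longer configured."""
    return queried_server not in parse_nameservers(resolv_conf_text)
```

For the case above one would call something like `resolver_looks_stale("10.11.53.10", open("/etc/resolv.conf").read())`; a result of True suggests the process is still using a configuration it read earlier.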
What I don't understand is why the lookup doesn't just fail. XEmacs
shouldn't hang forever, no matter what happens.
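To illustrate what "just fail" could look like, here is a minimal sketch in Python of bounding a blocking name lookup with a timeout. It is not how XEmacs or open-network-stream is implemented; `lookup_with_timeout` and its `resolver` parameter are hypothetical names, and the parameter exists only so the sketch can be exercised without a network. The only real library call assumed is the standard `socket.getaddrinfo`.

```python
import queue
import socket
import threading


def lookup_with_timeout(host, port, timeout, resolver=socket.getaddrinfo):
    """Resolve host in a helper thread; raise TimeoutError after `timeout` s."""
    results = queue.Queue(maxsize=1)

    def worker():
        try:
            results.put(("ok", resolver(host, port)))
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            results.put(("error", exc))

    # Daemon thread: a lookup that never returns cannot keep the process alive.
    threading.Thread(target=worker, daemon=True).start()

    try:
        status, value = results.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"lookup of {host!r} exceeded {timeout}s") from None
    if status == "error":
        raise value
    return value
```

The caller gets either a result, the resolver's own error, or a TimeoutError after a bounded wait, so the user interface never blocks indefinitely. The stuck lookup itself is abandoned rather than cancelled, which is a limitation of this approach.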
And I don't understand how the BadWindow messages are related, but they
seem to appear after I switch (using Alt-TAB in WindowMaker) to another
window (and possibly back).
For what it's worth (not much, I'm afraid), here's the lisp backtrace
produced when I kill (-INT) this hanging xemacs:
Lisp backtrace follows:
open-network-stream-internal("nntpd" #<buffer " *server news-server.san.rr.com nntp *nntpd**"> "news-server.san.rr.com" "nntp" nil)
# bind (coding-system-for-read coding-system-for-write cs-r cs-w protocol service host buffer name)
open-network-stream("nntpd" #<buffer " *server news-server.san.rr.com nntp *nntpd**"> "news-server.san.rr.com" "nntp")
# bind (buffer)
nntp-open-network-stream(#<buffer " *server news-server.san.rr.com nntp *nntpd**">)
# bind (coding-system-for-read coding-system-for-write)
byte-code("..." [pbuffer nntp-open-connection-function coding-system-for-read coding-system-for-write nntp-coding-system-for-write nntp-coding-system-for-read] 2)
# (condition-case ... . ((error) (quit (byte-code "ÁÂ!¨!¨ÄÅÆ\"¨Æ§" ... 3))))
# bind (timer pbuffer buffer)
nntp-open-connection(#<buffer " *nntpd*">)
# bind (connectionless defs server)
nntp-open-server("news-server.san.rr.com" nil)
byte-code("..." [gnus-command-method gnus-get-function open-server] 3)
# (condition-case ... . ((error (byte-code "ÁÂÃÄ!\"\"¨Æ§" ... 6)) (quit (gnus-message 1 "Quit trying to open server") nil)))
# bind (elem gnus-command-method)
gnus-open-server((nntp "news-server.san.rr.com"))
byte-code("..." [result method gnus-open-server] 2)
# (condition-case ... . ((quit (message "Quit gnus-check-server") nil)))
# bind (method result method silent method active method dont-check scan group)
gnus-activate-group("nntp+news-server.san.rr.com:comp.emacs.xemacs" scan)
# bind (retrieve-groups method active group info scanned-methods foreign-level level newsrc level)
gnus-get-unread-articles(nil)
# bind (gnus-inhibit-demon nnmail-fetched-sources arg)
gnus-group-get-new-news(nil)
# bind (command-debug-status)
call-interactively(gnus-group-get-new-news)
# (condition-case ... . error)
# (catch top-level ...)
[1] Terminated xemacs-21.4.14
235.510u 6.680s 5:07:47.12 1.3% 0+0k 0+0io 17739pf+0w
So, it seems my hangs are directly related to the resolver (no DNS
servers being reachable) and should be easily reproducible.
--J.