Jan Rychter wrote:
  >> I have a feeling this could be related to the resolver.
  Glynn> How so?
 
 I think I'm seeing two separate (but possibly related) issues here. One
 is that XEmacs hangs. The other is the BadWindow messages.
Sorry, I was focused on the BadWindow errors. The hanging does appear
to be related to the resolver.
 I've just spent a long time trying to reproduce the problems, which
 seemed to be spurious. And I think I found a case that shows where to
 look for the problems.
 
 Tcpdump says:
 
 192.168.1.82.32807 > 10.11.53.10.domain:  17472+ A? news-server.san.rr.com. (40) (DF)
 [repeated 5 more times]
 192.168.1.82.32809 > 10.11.53.10.domain:  17473+[|domain] (DF)
 [repeated 5 more times]
 192.168.1.82.32811 > 10.11.53.10.domain:  17474+ AAAA? news-server.san.rr.com. (40) (DF)
 [repeated 5 more times]
 192.168.1.82.32813 > 10.11.53.10.domain:  17475+[|domain] (DF)
 [repeated 5 more times]
 192.168.1.82.32815 > 10.11.53.10.domain:  17476+ A? news-server.san.rr.com. (40) (DF)
 [repeated 5 more times]
 192.168.1.82.32813 > 10.11.53.10.domain:  17477+[|domain] (DF)
 [repeated 5 more times]
 
 ... at which point XEmacs becomes totally unresponsive and stays in
 that state forever (as far as I can tell). There is no other network
 activity occurring.
 
 Now, I know _why_ it's trying to do a DNS lookup and failing. It seems
 this particular XEmacs process was started when I was connected to some
 Wi-Fi hotspot and the resolver took the (valid at the time) contents of
 /etc/resolv.conf, remembered it and made it its Bible forever. Of
 course I have no way to reach 10.11.53.10 now, and I also have no way to
 tell XEmacs that resolv.conf has changed (is there any solution to that
 problem?). 
To fix the running process, you would need to attach a debugger to the
process and manually modify libc's internal state.
To eliminate the problem in future, run a caching-only named, and have
resolv.conf point to 127.0.0.1. Then, you can just restart named if
you change the DNS server.
 What I don't understand is why the lookup doesn't just fail. XEmacs
 shouldn't hang forever, no matter what happens.
Well, unless gethostbyname/getaddrinfo hang forever. AFAICT, XEmacs
relies upon those functions either succeeding or failing eventually;
it doesn't appear to try to pre-empt them.
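
To illustrate what "pre-empting" a blocking lookup could look like, here
is a minimal sketch in Python (not XEmacs code, and not how XEmacs does
anything today). The name resolve_with_timeout and its resolver
parameter are hypothetical, introduced only for this example. The lookup
runs in a daemon thread and the caller gives up after a deadline instead
of waiting for the resolver indefinitely.

```python
import queue
import socket
import threading


def resolve_with_timeout(host, timeout, resolver=socket.getaddrinfo):
    """Return resolver(host, None), or None if no answer arrives within
    `timeout` seconds. `resolver` is injectable so the sketch can be
    exercised without a network."""
    results = queue.Queue(maxsize=1)

    def worker():
        try:
            results.put(("ok", resolver(host, None)))
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            results.put(("error", exc))

    # Daemon thread: a lookup that never returns cannot block process exit.
    threading.Thread(target=worker, daemon=True).start()

    try:
        status, value = results.get(timeout=timeout)
    except queue.Empty:
        # Deadline passed. The worker is abandoned, not cancelled; it may
        # still be stuck inside the resolver.
        return None
    if status == "error":
        raise value
    return value
```

Note that this only stops the caller from waiting; the underlying lookup
keeps running in the background until the resolver itself gives up.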
 And I don't understand how the BadWindow messages are related, but they
 seem to appear after I switch (using Alt-TAB in WindowMaker) to another
 window (and possibly back).
I'm not sure that they are related. Unless it's due to a bug in the X
error handler. In general, error handling code is notorious for not
getting particularly well tested by typical usage.
 So, it seems my hangs are directly related to the resolver (no DNS
 servers being reachable) and should be easily reproducible. 
I wouldn't be so sure; in particular, what happens after XEmacs calls
gethostbyname or getaddrinfo is likely to be highly system-specific
(i.e. it depends upon what's in nsswitch.conf, whether you're running
nscd, etc.).
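
As a rough way to compare systems when trying to reproduce this, the
following Python sketch lists which sources the "hosts:" line of an
nsswitch.conf names, in order. The function name hosts_sources is
hypothetical, and the parsing is deliberately simplified (it drops
comments and bracketed action items such as [NOTFOUND=return]).

```python
import re


def hosts_sources(nsswitch_text):
    """Return the lookup sources on the `hosts:` line, in listed order.
    Returns an empty list if there is no `hosts:` line."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("hosts:"):
            # Remove bracketed action items, e.g. [NOTFOUND=return].
            entry = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
            return entry.split()
    return []
```

Feeding it the contents of /etc/nsswitch.conf from two machines shows
quickly whether they consult the same sources before reaching DNS.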
-- 
Glynn Clements <glynn.clements(a)virgin.net>