On Mon, 2002-08-05 at 01:00, Reuben Thomas wrote:
I've just been struggling for some time (my fault, I'm not enough of
an XEmacs/PSGML expert) with the problem that, when I load an HTML
file without a doctype declaration, I always get "external entity html
not found".
Yeah, I can confirm this, annoying...
I created a CATALOG of my own with a doctype entry for html, although
I later found there was one anyway in
.../xemacs/xemacs-packages/etc/psgml/CATALOG, with the entry
DOCTYPE HTML html.dtd
but it still didn't work.
Much puzzling over elisp later, I found the answer: the names of
doctypes found in catalog files are upcased, but sgml-catalog-lookup
doesn't upcase the name it tries to match against. The following patch
cures that:
[patch snipped]
I have a strongish feeling that this patch is not correct. psgml is
also used with XML documents, and the root element in XML is case
sensitive [1].
It is not very evident from the spec text, but think of it this way:
"the word following DOCTYPE must be equal to the root element name, and
element names in XML are case sensitive". I don't think that's the case
with SGML, though.
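To illustrate the concern, here is a minimal Python sketch. It is not
psgml's elisp, and parse_catalog and lookup_doctype are hypothetical
names made up for the illustration. It assumes a simplified catalog with
one unquoted "DOCTYPE name system-id" entry per line, and it folds case
only for SGML documents while comparing exactly for XML:

```python
def parse_catalog(text):
    """Collect DOCTYPE entries as (name, system_id) pairs, keeping case as written.

    Simplification: only unquoted three-field "DOCTYPE name file" lines are
    recognised; comments and other entry types are skipped.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "DOCTYPE":
            entries.append((fields[1], fields[2]))
    return entries


def lookup_doctype(entries, name, xml):
    """Return the system id for a doctype name, or None if nothing matches.

    XML names are compared exactly; SGML names are compared after upcasing
    both sides (assumption: case-insensitive names, as with HTML).
    """
    for entry_name, system_id in entries:
        if xml:
            if entry_name == name:
                return system_id
        elif entry_name.upper() == name.upper():
            return system_id
    return None


# Hypothetical catalog fragment mixing an SGML and an XML doctype.
entries = parse_catalog(
    "DOCTYPE HTML html.dtd\n"
    "DOCTYPE xhtml xhtml1-transitional.dtd\n"
)
sgml_hit = lookup_doctype(entries, "html", xml=False)
xml_miss = lookup_doctype(entries, "XHTML", xml=True)
xml_hit = lookup_doctype(entries, "xhtml", xml=True)
```

Upcasing unconditionally on both sides would make the second lookup
succeed as well, which is exactly what should not happen for XML.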
It also seems sensible for the same reason to change the line in
psgml-html.el/html-mode so that it reads
sgml-default-doctype-name "HTML"
(it's unnecessary, since it will be upcased anyway, but it seems
sensible to have all literal doctypes in upper case so as to avoid
confusion (and possible bugs)).
I don't think it's unnecessary at all; I will commit this one. Debian
also distributes a modified version of our psgml-html.el for GNU Emacs
with this change in place. It fixes the problem you described, most
likely with no side effects.
OTOH, the stuff at the end of etc/CATALOG should probably be modified,
to something like this:
Index: CATALOG
===================================================================
RCS file: /pack/xemacscvs/XEmacs/packages/xemacs-packages/psgml/etc/CATALOG,v
retrieving revision 1.4
diff -a -u -a -u -r1.4 CATALOG
--- CATALOG 2000/03/31 07:19:48 1.4
+++ CATALOG 2002/08/06 22:04:24
@@ -185,6 +185,6 @@
-- Subdocument doctypes --
DOCTYPE HTML html.dtd
DOCTYPE BOOK docbook.dtd
-DOCTYPE XHTML xhtml1-transitional.dtd
-DOCTYPE SCHEMA structures.dtd
-DOCTYPE MATH mathml.dtd
+DOCTYPE xhtml xhtml1-transitional.dtd
+DOCTYPE schema structures.dtd
+DOCTYPE math mathml.dtd
...since XHTML, XML Schemas and MathML are XML.
In addition, we should probably also have "HTML", "html", "BOOK", "Book"
and "book" here, since HTML is not case sensitive, and neither is the
old v3.1 DocBook, I think. Of course, if we only had the DocBook 4.1 XML
DTDs in etc/, "book" would be enough.
Now it works fine: you can load un-doctyped HTML (such as that
produced by Google) into XEmacs (I have Opera set up to use XEmacs as
its source viewer) and happily play around, even with
auto-activate-dtd on.
Confirmed, it works after changing only sgml-default-doctype-name in
psgml-html, neat!
I'm somewhat surprised no-one fixed this before, but I suppose most
people are happy to obey the command in the PSGML info that "you must
have a doctype declaration", even though the defaulting mechanisms
suggest that this is not strictly true.
Yep. But if the stuff I've written above is true, this is a hairy
subject; if we don't have a proper DOCTYPE declaration, we must guess...
Thanks for a good bug report!
[1] <http://www.w3.org/TR/REC-xml#NT-doctypedecl>
Cheers,
--
\/ille Skyttä
ville.skytta at xemacs.org