Re: encoding of etc/HELLO
13 years, 4 months
Stephen J. Turnbull
Uwe Brauer writes:
> >> Regarding Re: encoding of etc/HELLO; "Stephen J. Turnbull" <stephen(a)xemacs.org> adds:
>
> >> And Mule is plenty fast enough these days,
>
> > No, it's not. VM can tie up XEmacs for minutes when
> > saving a 50MB folder. This includes autosaves, which
> > means that XEmacs can drop dead without warning any
> > time I'm reading mail.
>
>
> This was one of the reasons why I switched from VM to gnus
> (or wl, which is in the same league).
>
> However I am most likely the last person on this list to
> suggest upgrading.... :-D
I wouldn't listen anyway. :-) I used Gnus for two or three years
because it was the quickest way to crash XEmacs. I use VM out of
respect for Kyle. :-)
_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://lists.xemacs.org/mailman/listinfo/xemacs-beta
Re: encoding of etc/HELLO
Aidan Kehoe
On the second day of July, Stephen J. Turnbull wrote:
> > > The buffer is corrupted by no-Mule, though, because the information
> > > in the high bits is lost. This is a bad bug, issue780.
> >
> > Trashing the user’s non-Latin-1 data is the fundamental design
> > decision of no-Mule; I don’t think it’s constructive to report that as
> > a bug.
>
> Huh? No, it was and is quite feasible[1] to edit the ASCII portions
> of files in UTF-8 or UTF-16, using the raw-text coding systems.
Right, that doesn’t contradict what I said.
> It was and still is quite usable to edit all of the file in a unibyte
> coding system if you have a legacy X11 font with that registry, too.
‘Unibyte coding system’ has no meaning in XEmacs.
If a user is editing KOI8-R under non-Mule and has set a buffer’s font to
use a KOI8-R font, XEmacs corrupts data when Cyrillic is pasted in from
another app; XEmacs will not assign Cyrillic text read from UTF-8 files on
disk to code points between #x80 and #x100; and upcasing will leave
ф as ф and will change Ц to ц.
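To make the upcasing failure concrete, here is a small Python sketch. It assumes, as the paragraph above describes, that non-Mule treats every buffer byte as a Latin-1 character and applies Latin-1 case rules to it. The helper names are hypothetical, and this models only the byte-level effect, not XEmacs's actual case tables.

```python
def latin1_upcase_byte(b: int) -> int:
    """Upcase one byte using Latin-1 rules, as a byte-oriented editor might."""
    up = chr(b).upper()  # a byte value is equal to its Latin-1 code point
    # Leave the byte alone if the uppercase form does not fit in one Latin-1
    # byte (for example 'ß' -> 'SS' or 'ÿ' -> 'Ÿ').
    if len(up) == 1 and ord(up) < 0x100:
        return ord(up)
    return b


def latin1_upcase_koi8(text: str) -> str:
    """Hypothetical model: upcase KOI8-R text by treating its bytes as Latin-1."""
    raw = text.encode("koi8_r")
    upcased = bytes(latin1_upcase_byte(b) for b in raw)
    return upcased.decode("koi8_r")


print(latin1_upcase_koi8("фЦ"))  # prints фц (byte-level Latin-1 upcasing)
print("фЦ".upper())               # prints ФЦ (Unicode-aware upcasing)
```

KOI8-R puts lowercase Cyrillic at 0xC0-0xDF and uppercase at 0xE0-0xFF, the reverse of Latin-1. So ф (0xC6, Latin-1 Æ) is already "uppercase" and stays put, while Ц (0xE3, Latin-1 ã) is "upcased" to 0xC3, which KOI8-R reads as ц.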
If a user opens an iso-2022-7 file with Japanese text in non-Mule and
manipulates it as ASCII (say, M-x replace-string $B RET $A RET), any
Japanese characters whose 7-bit byte pairs overlap the search string will be
trashed.
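The collision can be reproduced with Python's `iso2022_jp` codec standing in for XEmacs's iso-2022-7. The two are assumed to behave alike for this case but are not identical. The hiragana ぢ is JIS X 0208 code point 0x2442, whose two 7-bit bytes are literally the ASCII string `$B`, the same bytes that end the designation `ESC $ B`. A byte-level replace therefore rewrites both the designation and the character. The helper names below are hypothetical.

```python
ORIGINAL = "\u3062"  # HIRAGANA LETTER DI (ぢ), JIS X 0208 code point 0x2442


def replace_as_ascii(data: bytes, old: bytes, new: bytes) -> bytes:
    """Hypothetical model of M-x replace-string on a buffer that only knows bytes."""
    return data.replace(old, new)


def try_decode(data: bytes, codec: str = "iso2022_jp"):
    """Decode, or return None if the bytes are no longer valid in that codec."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


raw = ORIGINAL.encode("iso2022_jp")
print(raw)  # ESC $ B, then the byte pair "$B", then ESC ( B back to ASCII
edited = replace_as_ascii(raw, b"$B", b"$A")
print(edited)
print(try_decode(edited))  # the original character does not survive
```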
If a user types ƒ in non-Mule under X11, digit 2 appears in the buffer. If a
user types „, | appears in the buffer. If a user types ф, some ASCII
character appears. This is unusable for inputting those characters.
Trashing the user’s non-Latin-1 data is the fundamental design decision of
no-Mule. If someone cares about non-Latin-1 data, that person should be
using Mule. Mule is plenty fast enough these days, and since I’ve put years of
work into fixing bugs in it, it’s stable too.
--
‘Iodine deficiency was endemic in parts of the UK until, through what has been
described as “an unplanned and accidental public health triumph”, iodine was
added to cattle feed to improve milk production in the 1930s.’
(EN Pearce, Lancet, June 2011)
Re: encoding of etc/HELLO
Aidan Kehoe
On the second day of July, Stephen J. Turnbull wrote:
> Jeff Sparkes writes:
>
> > The new HELLO is based on the one in Emacs. XEmacs opens the Emacs
> > HELLO and sets the buffer encoding to iso-2022-7. Opening the
> > XEmacs HELLO gets the buffer encoding set to raw-text which doesn't
> > display properly.
>
> Right. I haven't done a careful analysis, but I bet this is because
> the XEmacs coding systems are unable to detect this abominable mix.
> UTF-8-encoded segments do not conform to either the 7-bit ISO-2022
> format or the 8-bit ISO-2022 format. We would need to special-case
> these ISO UCS coded character set invocation sequences, and in general
> stop detection there.
UTF-8-encoded segments *do* conform to ISO 2022. We happen to ignore that in
our detection algorithms, which is fine, since such text is basically not
encountered in the wild in contexts where we need to detect it.
> Aidan's choice of coding system doesn't make sense to me; I would just
> have said "to hell with the ISO-2022-JP rule, we're going UTF-8."[1]
> Alternatively, use the DOCS sequence ESC % G, which has a length
> specification in octets, making it reasonably easy to find the rest of
> the 3000 octets normally used for coding detection (although I'm not
> sure that we actually use that information in detection, it would be
> easier to implement than trying to deal with the many ISO-IR sequences
> for Unicode).
(ISO-IR 196 describes exactly the DOCS sequence ESC % G; that DOCS does not
have an associated length specification in octets.)
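Because the DOCS sequence carries no length, a reader has to scan forward for the return sequence `ESC % @` to find where a UTF-8 segment ends. Below is a minimal Python sketch of that scan. It assumes well-formed input and ignores every other ISO 2022 control function; `split_docs_segments` is a hypothetical name, not anything in XEmacs.

```python
DOCS_UTF8 = b"\x1b%G"    # designate other coding system: UTF-8 (ISO-IR 196)
DOCS_RETURN = b"\x1b%@"  # return to ISO 2022


def split_docs_segments(data: bytes) -> list:
    """Split bytes into ("iso2022", bytes) and ("utf-8", str) segments.

    Since ESC % G has no length field, the end of each UTF-8 segment is
    found by scanning for ESC % @ (or the end of the data).
    """
    segments = []
    pos = 0
    while pos < len(data):
        start = data.find(DOCS_UTF8, pos)
        if start == -1:  # no more UTF-8 segments
            segments.append(("iso2022", data[pos:]))
            break
        if start > pos:
            segments.append(("iso2022", data[pos:start]))
        body = start + len(DOCS_UTF8)
        end = data.find(DOCS_RETURN, body)
        if end == -1:  # no return sequence: UTF-8 runs to the end of the data
            end = len(data)
        segments.append(("utf-8", data[body:end].decode("utf-8")))
        pos = end + len(DOCS_RETURN)
    return segments


sample = b"Hello, \x1b%G" + "Привет".encode("utf-8") + b"\x1b%@ and goodbye"
print(split_docs_segments(sample))
```

Scanning for `ESC % @` is safe inside UTF-8 because multi-byte UTF-8 sequences never contain bytes below 0x80. Only a literal U+001B in the text could end a segment early.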
> Aidan, were there specific reasons not to just convert the file to
> UTF-8?
Yes, the file distinguishes between the various national Han characters, and
it’s useful to have a file with these differences available as long as we
don’t unify them in our internal encoding.
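Here is a short Python sketch of what unification throws away. It uses Python's `iso2022_jp_2` codec as a stand-in, assuming it accepts both the JIS X 0208 designation `ESC $ B` and the GB 2312 designation `ESC $ A`. The same character 中 carries a different charset tag and different bytes in each form, yet both decode to the single code point U+4E2D. A UTF-8 copy of the file therefore could no longer tell them apart.

```python
HAN = "\u4e2d"  # 中, present in both JIS X 0208 and GB 2312

# Japanese form: ISO-2022-JP designates JIS X 0208 with ESC $ B.
jp_bytes = HAN.encode("iso2022_jp")

# Chinese form: take the EUC-CN (GB 2312) bytes, strip the high bit to get the
# 7-bit code, then wrap it in ESC $ A (GB 2312) and ESC ( B (back to ASCII).
gb_7bit = bytes(b & 0x7F for b in HAN.encode("gb2312"))
cn_bytes = b"\x1b$A" + gb_7bit + b"\x1b(B"

# The ISO 2022 forms keep the national charset distinction...
print(jp_bytes != cn_bytes)
# ...but after decoding to Unicode both are the same code point.
print(jp_bytes.decode("iso2022_jp_2") == cn_bytes.decode("iso2022_jp_2") == HAN)
```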
> > Should the encoding be set in the variables at the end of the file?
> > And what would that be? I've tried utf-8 and iso-2022-7[.]
>
> iso-2022-8 seems to work for me, but I think we need Aidan's input.
iso-2022-8 and iso-2022-7 are both fine. (I’m not sure why the latter didn’t
work for Jeff; it worked fine for me.) But all the places where HELLO is
used programmatically specify its encoding explicitly, so adding the coding
cookie would be helpful but is not necessary.
> BTW, your knowledge may be meager, but your guesser is working
> fine. :-) It just didn't quite suffice this time.
>
> Footnotes:
> [1] Note that that's a rule that I personally instituted before
> XEmacs could handle Unicode coding systems at all. It made sense at
> that time, but now even no-Mule 21.5 XEmacsen can read UTF-8 files.
> Especially if we decide to use Unicode inside soon, this rule should
> be revisited (and in fact I think Ben Wing has already converted some
> files to UTF-8).
>
> The buffer is corrupted by no-Mule, though, because the information in
> the high bits is lost. This is a bad bug, issue780.
Trashing the user’s non-Latin-1 data is the fundamental design decision of
no-Mule; I don’t think it’s constructive to report that as a bug.