XEmacs-Beta July 2011

xemacs-beta@xemacs.org

19 participants
43 discussions

Stephen J. Turnbull

Uwe Brauer writes:

 > >> Regarding Re: encoding of etc/HELLO; "Stephen J. Turnbull" <stephen(a)xemacs.org> adds:
 >
 > >> And Mule is plenty fast enough these days,
 >
 > > No, it's not. VM can tie up XEmacs for minutes when
 > > saving a 50MB folder. This includes autosaves, which
 > > means that XEmacs can drop dead without warning any
 > > time I'm reading mail.
 >
 > This was one of the reasons why I switched from VM to gnus
 > (or wl which is in the same league).
 >
 > However I am most likely the last person on this list to
 > suggest upgrading.... :-D

I wouldn't listen anyway. :-) I used Gnus for two or three years because it was the quickest way to crash XEmacs. I use VM out of respect for Kyle. :-)

_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://lists.xemacs.org/mailman/listinfo/xemacs-beta


Re: encoding of etc/HELLO

Aidan Kehoe

On the second day of July, Stephen J. Turnbull wrote:

 > > > The buffer is corrupted by no-Mule, though, because the information
 > > > in the high bits is lost. This is a bad bug, issue780.
 > >
 > > Trashing the user’s non-Latin-1 data is the fundamental design
 > > decision of no-Mule, I don’t think it’s constructive to report that as
 > > a bug.
 >
 > Huh? No, it was and is quite feasible[1] to edit the ASCII portions
 > of files in UTF-8 or UTF-16, using the raw-text coding systems.

Right, that doesn’t contradict what I said.

 > It was and still is quite usable to edit all of the file in a unibyte
 > coding system if you have a legacy X11 font with that registry, too.

‘Unibyte coding system’ has no meaning in XEmacs.

If a user is editing KOI8-R under non-Mule and has set a buffer’s font to use a KOI8-R font, then XEmacs corrupts data when Cyrillic is pasted in from another app; XEmacs will not assign Cyrillic text read from UTF-8 files on disk to code points between #x80 and #x100; and upcasing will leave ф as ф and will change Ц to ц.

If a user opens an iso-2022-7 file with Japanese text in non-Mule and manipulates it as ASCII (say, M-x replace-string $B RET $A RET), any Japanese characters will be trashed if their code points overlap.

If a user types ƒ in non-Mule under X11, digit 2 appears in the buffer. If a user types „, | appears in the buffer. If a user types ф, some ASCII character appears. This is unusable for inputting those characters.

Trashing the user’s non-Latin-1 data is the fundamental design decision of no-Mule. If someone cares about non-Latin-1 data, that person should be using Mule. And Mule is plenty fast enough these days, and I’ve put years of work into fixing bugs in it; it’s stable too.
--
‘Iodine deficiency was endemic in parts of the UK until, through what has been described as “an unplanned and accidental public health triumph”, iodine was added to cattle feed to improve milk production in the 1930s.’ (EN Pearce, Lancet, June 2011)
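Two of the failure modes described in the message above can be reproduced outside XEmacs. The following is a minimal Python sketch, not XEmacs code. It assumes that a Latin-1-only editor can be modelled by decoding the file's bytes as Latin-1, and that Python's `koi8_r` and `iso2022_jp` codecs stand in for the coding systems under discussion; the helper name `upcase_as_latin1` is hypothetical.

```python
def upcase_as_latin1(text: str, real_encoding: str) -> str:
    """Upcase `text` as an editor would if it believed the bytes were Latin-1.

    Assumption: the editor only knows Latin-1 case pairs, so characters whose
    uppercase form falls outside Latin-1 (or is longer than one character)
    are left untouched.
    """
    raw = text.encode(real_encoding)
    out = []
    for ch in raw.decode("latin-1"):
        up = ch.upper()
        out.append(up if len(up) == 1 and ord(up) < 0x100 else ch)
    # Reinterpret the modified bytes in the encoding the user really meant.
    return "".join(out).encode("latin-1").decode(real_encoding)


# KOI8-R: ф is byte 0xC6 (Latin-1 Æ, already uppercase); Ц is byte 0xE3
# (Latin-1 ã, which upcases to Ã = 0xC3, which is ц in KOI8-R).
print(upcase_as_latin1("ф", "koi8_r"))  # ф
print(upcase_as_latin1("Ц", "koi8_r"))  # ц

# ISO-2022-JP: ESC $ B designates JIS X 0208, but the two-byte code for ぢ
# is also the ASCII pair "$B", so a byte-level replace hits both.
encoded = "ぢ".encode("iso2022_jp")
edited = encoded.replace(b"$B", b"$A")
print(encoded, edited)
```

The Cyrillic result matches the behaviour described in the message: the lowercase letter is not upcased, and the uppercase letter is turned into a lowercase one. In the ISO-2022-JP case, the replacement rewrites the character's own bytes along with the escape sequence, which is the overlap the message warns about.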


Re: encoding of etc/HELLO

Aidan Kehoe

On the second day of July, Stephen J. Turnbull wrote:

 > Jeff Sparkes writes:
 >
 > > The new HELLO is based on the one in Emacs. XEmacs opens the Emacs
 > > HELLO and sets the buffer encoding to iso-2022-7. Opening the
 > > XEmacs HELLO gets the buffer encoding set to raw-text which doesn't
 > > display properly.
 >
 > Right. I haven't done a careful analysis, but I bet this is because
 > the XEmacs coding systems are unable to detect this abominable mix.
 > UTF-8-encoded segments do not conform to either the 7-bit ISO-2022
 > format or the 8-bit ISO-2022-format. We would need to special case
 > these ISO UCS coded character set invocation sequences, and in general
 > stop detection there.

UTF-8-encoded segments *do* conform to ISO 2022. We happen to ignore that in our detection algorithms, which is fine, since such text is basically not encountered in the wild in contexts where we need to detect it.

 > Aidan's choice of coding system doesn't make sense to me, I would just
 > have said "to hell with the ISO-2022-JP rule, we're going UTF-8."[1]
 > Alternatively, use the DOCS sequence ESC % G, which has a length
 > specification in octets, making it reasonably easy to find the rest of
 > the 3000 octets normally used for coding detection (although I'm not
 > sure that we actually use that information in detection, it would be
 > easier to implement than trying to deal with the many ISO-IR sequences
 > for Unicode).

(ISO IR 196 describes exactly the DOCS sequence ESC % G; that DOCS does not have an associated length specification in octets.)

 > Aidan, were there specific reasons not to just convert the file to
 > UTF-8?

Yes, the file distinguishes between the various national Han characters, and it’s useful to have a file with these differences available as long as we don’t unify them in our internal encoding.

 > > Should the encoding be set in the variables at the end of the file?
 > > And what would that be? I've tried utf-8 and iso-2022-7[.]
 >
 > iso-2022-8 seems to work for me, but I think we need Aidan's input.

iso-2022-8 and iso-2022-7 are both fine. (I’m not sure why the latter didn’t work for Jeff; it worked fine for me.) But all the places where HELLO is used programmatically specify its encoding explicitly, so adding the coding cookie would be helpful but is not necessary.

 > BTW, your knowledge may be meager, but your guesser is working
 > fine. :-) It just didn't quite suffice this time.
 >
 > Footnotes:
 > [1] Note that that's a rule that I personally instituted before
 > XEmacs could handle Unicode coding systems at all. It made sense at
 > that time, but now even no-Mule 21.5 XEmacsen can read UTF-8 files.
 > Especially if we decide to use Unicode inside soon, this rule should
 > be revisited (and in fact I think Ben Wing has already converted some
 > files to UTF-8).
 >
 > The buffer is corrupted by no-Mule, though, because the information in
 > the high bits is lost. This is a bad bug, issue780.

Trashing the user’s non-Latin-1 data is the fundamental design decision of no-Mule, I don’t think it’s constructive to report that as a bug.
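As an illustration of the mixed file format discussed above, here is a minimal Python sketch that splits a byte stream into ISO 2022 and UTF-8 segments at the DOCS sequence ESC % G. This is not the XEmacs detection code. It assumes that UTF-8 segments are terminated by the standard return sequence ESC % @, which the thread does not mention, and the function name `split_segments` is hypothetical.

```python
ESC_UTF8 = b"\x1b%G"    # DOCS sequence ESC % G (ISO IR 196): switch to UTF-8
ESC_RETURN = b"\x1b%@"  # assumed here: standard return to ISO 2022


def split_segments(data: bytes):
    """Return a list of (kind, payload) pairs; kind is 'iso2022' or 'utf8'."""
    segments = []
    kind = "iso2022"  # assumption: the stream starts in ISO 2022 mode
    pos = 0
    while pos < len(data):
        # Look for the sequence that ends the current kind of segment.
        marker = ESC_UTF8 if kind == "iso2022" else ESC_RETURN
        nxt = data.find(marker, pos)
        end = len(data) if nxt == -1 else nxt
        if end > pos:
            segments.append((kind, data[pos:end]))
        if nxt == -1:
            break
        kind = "utf8" if kind == "iso2022" else "iso2022"
        pos = nxt + len(marker)
    return segments


sample = b"Hello " + ESC_UTF8 + "Γειά σου".encode("utf-8") + ESC_RETURN + b" bye"
for kind, payload in split_segments(sample):
    print(kind, payload)
```

Because ESC % G carries no length field, as noted in the message, a scanner of this kind has to search for the closing sequence rather than skip a known number of octets.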
