Annoyance with coding systems

Fabrice Popineau Fabrice.Popineau at supelec.fr
Thu Jun 9 03:37:13 EDT 2005


* Stephen J Turnbull <stephen at xemacs.org> writes:

    Fabrice>  (coding-priority-list) (no-conversion
    Fabrice>  utf-16-little-endian-bom utf-16-bom utf-8-bom iso-7 utf-8
    Fabrice>  iso-8-1 iso-8-2 iso-8-designate iso-lock-shift shift-jis
    Fabrice>  big5 utf-16-little-endian utf-16 ucs-4)

    > OK, something is broken here.  iso-8-1 (which basically means some
    > page of ISO 8859, which one is determined by
    > current-language-environment, in "French" of course it's either
    > ISO 8859-1 or ISO 8859-15) comes before Big5.  So for some reason
    > Big5 is getting very high likelihood ratings, despite the fact
    > that there are few potential multibyte characters in the buffer.

Well, maybe I'll try to trace that, but this stuff is too hairy for me at
the moment.

Currently, I have these two options:

(when (featurep 'mule)
  (setq buffer-file-coding-system-for-read 
	(car (get-language-info current-language-environment 'coding-system))))
	  
;(when (featurep 'mule)
;  (set-default-buffer-file-coding-system 
;   (get-coding-system-from-locale (current-locale)))
;  (set-buffer-file-coding-system-for-read 
;   (get-coding-system-from-locale (current-locale))))

I tend to favour the first one, which results in 'iso-8859-1 for me
(living in a "French" locale). I assume it will work for other people
too as long as their locale is detected.

The most unnatural part of this stuff is that when you change the coding
system of your buffer, you don't _see_ any effect. Only when the file is
written may you lose information. This is annoying. I did some tests,
such as inserting a string made of chars 0 to 255 and writing it to
files with different coding systems. It seems that MSW-MB even loses
chars where iso-8859-1 does not. That's weird, because you should not
lose any information when writing things out.
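A rough way to reproduce the 0-to-255 test outside XEmacs is sketched below in Python rather than XEmacs Lisp. It assumes that MSW-MB resolves to the Windows ANSI code page, taken here to be cp1252 as on a French Windows system; that mapping is an assumption, not something established in this thread. The sketch round-trips the 256 code points U+0000..U+00FF through each codec and reports which ones fail to come back.

```python
# Sketch only: cp1252 stands in for MSW-MB, which is an ASSUMPTION
# (Windows ANSI code page on a French system), not XEmacs's definition.

# The 256 code points U+0000..U+00FF, like the "chars 0 to 255" test.
ALL_LATIN1 = "".join(chr(i) for i in range(256))


def lost_chars(codec):
    """Return the code points that do not survive an encode/decode round trip."""
    # errors="replace" turns each unencodable char into a single b"?",
    # so the decoded string stays aligned with the original.
    encoded = ALL_LATIN1.encode(codec, errors="replace")
    decoded = encoded.decode(codec, errors="replace")
    return [ord(a) for a, b in zip(ALL_LATIN1, decoded) if a != b]


if __name__ == "__main__":
    for codec in ("latin-1", "cp1252"):
        lost = lost_chars(codec)
        print(f"{codec}: {len(lost)} of 256 chars lost")
    # latin-1: 0 of 256 chars lost
    # cp1252: 32 of 256 chars lost
```

The cp1252 losses are exactly U+0080..U+009F: that code page reuses bytes 0x80-0x9F for printable characters such as the euro sign, so the C1 control characters in that range have no encoding in it, while ISO 8859-1 maps all 256 values one-to-one.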

Fabrice




More information about the XEmacs-Beta mailing list