On the sixth day of September, David Kastrup wrote:
Didier Verna <didier(a)xemacs.org> writes:
> Stephen J. Turnbull wrote:
>
>> It's annoying, for sure, but all I can suggest is "Don't do that,
>> If you want to work on it, be my guest, but IMO there are a lot
>> more important things to do.
>
> Well, sure. I was just taking the time to report something I
> noticed for the first time. You have such a nice way of encouraging
> people to get involved again :-) :-)
You say you want a re-e-volution, we-ell you kno-o-ow...
Since you’re around, David, can I have your input on this API for handling
error sequences in (especially) UTF-8? As you know, we don’t distinguish binary
and latin-1, which hasn’t been a problem for us in general, but it does mean
we can’t take the GNU approach to invalid sequences on disk.
I’ve changed our Unicode coding systems to generate “Unicode” code points
in the range 0x200000-0x2000FF for invalid octets instead, and a Lisp API to deal with them
follows. Is there anything you’d like included beyond that for your AUCTeX
use case?

(defvar unicode-error-default-translation-table [omitted]
  "Table mapping Unicode error octets to ASCII, control-1 and latin-1 chars.

To transform XEmacs Unicode error sequences to the ASCII, control-1 and
latin-iso8859-1 characters that correspond to the octets on disk, you can
use this variable.")

(defvar unicode-error-sequence-regexp-range [omitted]
  "Regular expression range to match Unicode error sequences in XEmacs.

Invalid Unicode sequences on input are represented as XEmacs
characters with values stored as the keys in
`unicode-error-default-translation-table', one character for each
invalid octet.  You can use this variable (with `re-search-forward' or
`skip-chars-forward') to search for such characters; see also
`unicode-error-translate-region'.")

(defun unicode-error-translate-region (begin end &optional buffer table)
  "Translate the Unicode error sequences in BUFFER between BEGIN and END.

The error sequences are transformed, by default, into the ASCII,
control-1 and latin-iso8859-1 characters with the numeric values
corresponding to the incorrect octets encountered.  This is achieved
by using `unicode-error-default-translation-table' (which see) for
TABLE; you can change this by supplying another character table,
mapping from the error sequences to the desired characters.

You can call `map-char-table' on `unicode-error-default-translation-table'
to collect the actual XEmacs characters corresponding to Unicode errors on
disk."
  [implementation omitted])

(defun frob-unicode-errors-region (frob-function begin end &optional buffer)
  "Call FROB-FUNCTION on the Unicode error sequences between BEGIN and END.

Optional argument BUFFER specifies the buffer that should be examined for
such sequences."
  [implementation omitted])
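
To make the octet-to-error-character correspondence concrete, here is a
minimal sketch on plain integers. It assumes that an invalid octet N is
represented by the code point 0x200000 + N; that offset is only a reading of
the range given above, since the actual table contents are omitted. The
`sketch-' names are hypothetical and not part of XEmacs.

```elisp
;; Sketch only: assumes invalid octet N maps to code point #x200000 + N.
;; All sketch- names are hypothetical illustrations, not XEmacs API.
(defconst sketch-unicode-error-base #x200000
  "Assumed first code point of the Unicode error range.")

(defun sketch-octet-to-error-code (octet)
  "Return the assumed error code point for the invalid OCTET (0-255)."
  (if (and (>= octet 0) (<= octet #xFF))
      (+ sketch-unicode-error-base octet)
    (error "Not an octet: %S" octet)))

(defun sketch-error-code-to-octet (code)
  "Return the octet that CODE stands for, or nil if CODE is not in the
assumed error range."
  (let ((offset (- code sketch-unicode-error-base)))
    (if (and (>= offset 0) (<= offset #xFF))
        offset
      nil)))
```

Under that assumption, a stray #xE9 octet in a UTF-8 file would be read as
code point #x2000E9, and translating it back would yield the octet #xE9,
which is the numeric value of the latin-1 character the default table is
described as mapping to.
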
--
On the quay of the little Black Sea port, where the rescued pair came once
more into contact with civilization, Dobrinton was bitten by a dog which was
assumed to be mad, though it may only have been indiscriminating. (Saki)
_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta