The attached file defines a wrapper `safer-decode-coding-region' that
preserves most markers and extents across calls to
`decode-coding-region'. It also contains some example code which
shows how `decode-coding-region' is normally broken.

It works by temporarily setting the endpoints of all extents abutting
the region to `open', and placing "fenceposts" at each end of the
region to ensure that markers get pushed and pulled in the right
directions by operations acting entirely within the region.
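
To make the fencepost half of that concrete, here is a minimal sketch
that handles markers only. It is not the attached code. The function
names are made up for illustration, and `my-rewrite-region' is a
hypothetical stand-in that upcases the text by deleting and
reinserting it, rather than a real call to `decode-coding-region'.
Extents are left out entirely, and START < END is assumed.

```elisp
;; Hypothetical stand-in for a delete-then-reinsert rewrite.
(defun my-rewrite-region (start end)
  "Replace the text between START and END with an upcased copy.
The old text is deleted first and the new text inserted afterwards."
  (let ((text (upcase (buffer-substring start end))))
    (delete-region start end)
    (goto-char start)
    (insert text)))

(defun my-rewrite-region-with-fenceposts (start end)
  "Rewrite START..END, keeping markers at START and END in place.
Assumes START < END.  Markers only; extents are not handled."
  (save-excursion
    (let (end-post)
      ;; End fencepost: `insert-before-markers' pushes any marker
      ;; sitting at END past the fencepost, out of the region.
      (goto-char end)
      (insert-before-markers "|")
      (setq end-post (point-marker))
      ;; Start fencepost: plain `insert' leaves markers sitting at
      ;; START in front of the fencepost (default insertion type).
      (goto-char start)
      (insert "|")
      ;; The region proper has shifted right by one character.
      (my-rewrite-region (1+ start) (1+ end))
      ;; Remove the fenceposts; the boundary markers settle back onto
      ;; the ends of the rewritten text.
      (delete-region (1- (marker-position end-post))
                     (marker-position end-post))
      (delete-region start (1+ start))
      (set-marker end-post nil))))
```

In a buffer containing "abcdef" with markers at 2 and 5, evaluating
`(my-rewrite-region-with-fenceposts 2 5)' leaves "aBCDef" with both
markers where they were. A marker strictly inside the region still
collapses to 2, which matches the limitation described below.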

I understood Hrvoje to say that `decode-coding-region' totally breaks
markers and extents, but I haven't found that; my experience is that
only markers and extent endpoints within the closure of the region
being decoded are at risk. You can see that the "whole-extent" and
the external endpoint of "overlap-extent" in the example code are
fine. If somebody has bugs that involve external markers or extent
endpoints, and recipes to replicate them, I'd love to have a look at
them.

`safer-decode-coding-region' is safer only for markers and extent
endpoints at the boundaries. Strictly interior markers end up at the
beginning of the decoded region; I don't know the general rule for
extent endpoints, but in the example the extent endpoint moves in the
opposite direction, to the end of the decoded region.

I don't see much reason to care about this in the case of
`decode-coding-region'; setting markers in the middle of an
externally-encoded region is semantically dubious at best.
Unfortunately, that's not true for `encode-coding-region', and I
suspect it's probably not true for any functions that work the way
`*code-coding-region' does (feeding buffer text to an lstream,
deleting the text, and inserting the output of the lstream into the
buffer).
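
The effect of that delete-then-insert pattern on markers can be seen
without any coding system at all. The following is only a sketch:
`my-marker-after-rewrite' is a hypothetical helper, and it reinserts
the same text rather than the output of an lstream.

```elisp
(defun my-marker-after-rewrite (text pos start end)
  "Return where a marker at POS ends up in a buffer holding TEXT
after the region START..END is deleted and then reinserted."
  (with-temp-buffer
    (insert text)
    (let ((m (copy-marker pos))
          (chunk (buffer-substring start end)))
      ;; Deleting the region collapses every marker inside it, and
      ;; the marker at END, onto START.
      (delete-region start end)
      ;; Reinserting at START leaves those markers in front of the
      ;; new text, so they never return to their old positions.
      (goto-char start)
      (insert chunk)
      (marker-position m))))
```

For the text "abcdef" and the region 2..5, a marker at 4 comes back
at 2, and so does a marker at the end boundary 5. A marker at 6,
outside the closure of the region, comes back at 6.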

I have figured out, but not implemented, a hack that can make a pretty
good guess at where an interior extent endpoint belongs after the
region is en/decoded (at least for non-modal and ISO-2022-conformant
modal encodings). I don't see any way to handle markers, though,
because according to Info there's no way to get a list of markers. In
any case it probably ought to be possible to handle extents much more
easily in the C code for F*code-coding-region.

Probably some of this (the hacking at the boundary of the region)
should be done at the level of lstream.c, since nothing I do is
specific to `decode-coding-region'. In fact the insight came after
considering an RMS comment about the ordering of block insertions and
deletions in wid-edit.el, but this doesn't work with extents because
of their flexible closure properties. Unfortunately that level of code
is beyond me at the moment.

Here's the code. You need a Mule XEmacs (of course), and a Japanese
font and a color-capable terminal for the visuals. You can't just use
`load-file' on it because that automatically translates the Japanese
from ISO-2022-JP to Mule internal encoding, ruining the experiment.
It's probably easiest to load it into a buffer, raw, and then do
`eval-buffer' on it.
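
One way to do that from Lisp, as a sketch only: `my-eval-file-raw' is
a hypothetical name, and I am assuming that binding
`coding-system-for-read' to `binary' is enough to suppress the
conversion when the file is visited.

```elisp
(defun my-eval-file-raw (file)
  "Visit FILE without coding-system conversion, then evaluate it."
  ;; Assumption: with `coding-system-for-read' bound to `binary',
  ;; the ISO-2022-JP bytes reach the buffer untranslated.
  (let ((coding-system-for-read 'binary))
    (find-file file))
  ;; `find-file' has made the visited buffer current.
  (eval-buffer))
```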
--
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences Tel/fax: +81 (298) 53-5091
__________________________________________________________________________
__________________________________________________________________________
What are those two straight lines for? "Free software rules."