Very long. But if you get the idea that I am dumb, and sufficiently
so as to discount your skills, I'd better be _very_ explicit.
>>>> "Ben" == Ben Wing <ben(a)666.com> writes:
"Stephen J. Turnbull" wrote:
Eistring (outline);
eicpy_c (outline, "\033$B");
eicpy_c (outline, "<+J,$G$7$g$&!#"); /* this should come from an
lstream or something */
eicpy_c (outline, "\033(B\n");
Ben> Eistrings are supposed to hold Mule-internal data inside
Ben> them. They're definitely not supposed to hold random string
Ben> stuff in them. So I'm not quite sure of what you're trying
Ben> to do. Could you explain it in words? Then I'll tell you
Ben> how you could do it best, using Eistrings and/or other
Ben> interfaces.
What I want to do (in this example), basically, is to implement
"encode-coding-region". (I stated that, explicitly but in the wrong
place, in the message you responded to.) I have some Japanese text, I
want to translate it to external format ISO-2022-JP.
This requires translating the Japanese from internal encoding to 7-bit
JIS X 0208 (presumably not the responsibility of an Eistring), then
adding appropriate escape sequences (which are octets, not in any
character set, if you take the ISO 2022 document seriously) before and
after the Japanese (and the newline of course is ASCII, not JIS).
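
To make the byte-level picture concrete, here is a minimal sketch in
plain C that does not touch Eistrings at all. The helper name
wrap_iso2022jp and its buffer-based signature are hypothetical, not
part of XEmacs. It assumes the Japanese text has already been converted
to 7-bit JIS X 0208 octets, and only adds the two escape sequences and
the ASCII newline.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an XEmacs function: wrap already-converted
   7-bit JIS X 0208 octets in the ISO-2022-JP escape sequences and
   append an ASCII newline.  Returns the number of octets written to
   `out`, or 0 if `out_size` is too small. */
static size_t
wrap_iso2022jp (const char *jis, size_t jis_len, char *out, size_t out_size)
{
  static const char kanji_in[]  = "\033$B";  /* designate JIS X 0208 to G0 */
  static const char kanji_out[] = "\033(B";  /* designate ASCII to G0 */
  const size_t in_len  = sizeof (kanji_in) - 1;
  const size_t out_len = sizeof (kanji_out) - 1;
  const size_t total   = in_len + jis_len + out_len + 1;  /* +1: newline */
  char *p = out;

  assert (jis != NULL && out != NULL);
  if (total > out_size)
    return 0;

  memcpy (p, kanji_in, in_len);   p += in_len;   /* raw octets, no charset */
  memcpy (p, jis, jis_len);       p += jis_len;  /* 7-bit JIS X 0208 */
  memcpy (p, kanji_out, out_len); p += out_len;  /* raw octets, no charset */
  *p = '\n';                                     /* ASCII, not JIS */
  return total;
}
```

Called on the 14 octets "<+J,$G$7$g$&!#", this writes 21 octets: the
kanji-in sequence, the JIS octets untouched, the kanji-out sequence,
and the newline. Nothing in it is "Japanese" as far as the buffer is
concerned.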
Ben> If you want to copy or concatenate JIS format text, you need
Ben> to mark it as JIS. For example, you could imagine
Ben> Eistring (outline);
Ben> eicpy_ext (outline, "\033$B", Qjapanese_jis_0208);
This is not at all what I have in mind; that string is the ISO-2022
registered escape sequence for "designating JIS X 0208 to G0 and
invoking G0 to GL." I assumed you'd recognize one of those; if you
don't, why should I just "trust you" to design an interface that will
be used to implement handlers for them? It is not itself JIS X 0208,
and in fact that usage should cause XEmacs to barf() and abort().
Ben> eicat_ext (outline, "<+J,$G$7$g$&!#", Qjapanese_jis_0208);
I don't know what this code means. That's not Japanese, that's a
stream of octets for squirting out on some wire. I.e., Qbinary. I have
no guess at what you think I'm thinking, here.
Ben> eicat_ext (outline, "\033(B\n", Qjapanese_jis_0208);
Ben> But the real problem here is that the data conversion that
Ben> goes on inside the Eistrings is EXTREMELY SIMPLE and
Ben> stateless. In fact, it just uses TO_INTERNAL_FORMAT() and
Ben> TO_EXTERNAL_FORMAT() for all conversions, which assume a
Ben> complete encoding block.
I was assuming (and said so) that the conversion to desired format of
the second string was already done (e.g., by a "decoding lstream"); I
just wanted to add some literals (the kanji-in and kanji-out escape
sequences).
Ben> In your example above, you'd need to concatenate the Japanese
Ben> strings together separately, using some other mechanism, and
Ben> then feed the whole thing to eicat_ext(), so that it would
Ben> correctly encode it into Japanese.
No, there's no Japanese there ("Japanese" doesn't really exist in a
coherent way at the level Eistrings work at, IMO) ...
Ben> Or alternatively, you might be wanting to feed the bytes
Ben> directly into the string?
... as Lucy van Pelt would say, "THAT'S IT!!!!"
Ben> Then you can go ahead, but use Qbinary, e.g.
Ben> Eistring (outline);
Ben> eicpy_ext (outline, "\033$B", Qbinary);
Ben> eicat_ext (outline, "<+J,$G$7$g$&!#", Qbinary);
Ben> eicat_ext (outline, "\033(B\n", Qbinary);
Um, that's exactly what I wrote in the first place. Except that I
miswrote "cpy" for "cat". And used your "convenience function". Not
very convenient, after all, since my usage is illegal, it seems. YMMV.
(And if ESC is arguably ASCII, the related usage for inserting the C1
(== 8-bit control) characters SS3 and SS2 (used as prefixes for JIS X
0212 and 0201 characters in EUC-JP) would certainly be forbidden.)
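
For the EUC-JP case, a rough sketch of what inserting those prefixes
looks like at the octet level. In EUC-JP the single shift SS2 (0x8E)
precedes a JIS X 0201 katakana octet and SS3 (0x8F) precedes a JIS X
0212 pair with the high bits set. The function names below are
hypothetical and nothing here is XEmacs code; it only shows that the
prefixes are bare C1 octets, not characters in any charset.

```c
#include <assert.h>
#include <stddef.h>

enum { SS2 = 0x8E, SS3 = 0x8F };  /* C1 single-shift control octets */

/* Hypothetical helper: emit one JIS X 0201 katakana octet
   (0xA1..0xDF) as EUC-JP.  Writes 2 octets to `out`. */
static size_t
eucjp_put_kana (unsigned char kana, unsigned char *out)
{
  assert (kana >= 0xA1 && kana <= 0xDF);
  out[0] = SS2;
  out[1] = kana;
  return 2;
}

/* Hypothetical helper: emit one JIS X 0212 character, given as a
   7-bit pair (each octet 0x21..0x7E), as EUC-JP.  Writes 3 octets. */
static size_t
eucjp_put_jisx0212 (unsigned char b1, unsigned char b2, unsigned char *out)
{
  assert (b1 >= 0x21 && b1 <= 0x7E);
  assert (b2 >= 0x21 && b2 <= 0x7E);
  out[0] = SS3;
  out[1] = (unsigned char) (b1 | 0x80);  /* set the high bit */
  out[2] = (unsigned char) (b2 | 0x80);
  return 3;
}
```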
Ben> but from the Eistring's perspective, you don't have Japanese
Ben> in here; you have gobbledygook which could be decoded into
Ben> Japanese. But that's not part of the design of the Eistring
Ben> -- it's supposed to entirely keep internally-formatted text
Ben> in it, and provide routines to manipulate this
Ben> internally-formatted text.
Right! Yes! Exactly! For example, implementing the Lisp functions
that a MIME-capable mailer would use. Or the lstream functions that
those functions would call. Or is that not what you have in mind?
Ben> It seems that you're trying to rather ad-hoc-ish extend the
Ben> design to do something else, and then you start complaining
Ben> when there are problems!
I'm trying to use the interface as my best guess suggests you intend
it to be used.
Ben> IMHO, my design is extremely well thought out.
I've noticed. But-but-but ...
Ben> Take a look at the example function I just posted to Hrvoje
Ben> showing how this all should be used.
... I did.
What is the specification of eiextdata()? I'd guess it just defaults
the coding_system argument for eito_external(). But I can't think of
a sensible way to do it, not even in the usage in your example. For
Unix, it has to involve getting the current filesystem coding-system,
but that need not be correct (it would be quite possible to have a
directory structure .../dir/, .../dir/JP/, .../dir/TW/ where the
filenames under JP are encoded in EUC-JP and those under TW are in
Big5). And I'm not even sure that Win32 guarantees such a filesystem
can't be constructed (say, using the Linux vfat filesystem---I know
some users who would do that, and I sympathize with their goals,
although not that implementation, they should be using Unicode ;),
maybe even under Windows 9x by dual-booting Japanese and Taiwanese.
And I assume that the documentation for mswindows_get_files specifies
that the programmer _must_ ensure that "dirfile" contains only
characters that Win32 can handle (we already have stock XEmacsen that
can[1] handle all 2^31 UCS-4 characters, you know---AFAIK, Win32 does
not, at best it handles UTF-16), since presumably Eistrings don't know
about that?
The point not being that Eistrings should nursemaid Win32 APIs; of
course they shouldn't. Rather that it's still very easy to write code
that may be "mule-correct" and "mule-safe" as those terms are defined
in Winglish, but is quite capable of aiding or abetting undefined
behavior (I've given two plausible examples in your sample function).
Which is neither "correct" nor "safe", at least not in plain English.
So can you define what "correct" and "safe" mean so that we can decide
where the interface is going to help those goals by making intuitive
distinctions between what we do and don't need to worry about? (I
think that "checking what's in dirfile" is easily seen to be not
Eistring's job. Good design!) Or on the other hand, is it going to
make it easy to write unobviously buggy code? (I suspect that is true
of eiextdata(), per the example above. Doubleplusungood!)
Footnotes:
[1] Trusting Morioka's off-hand remark. If not all of them, at least
say 2^24 or 2^28 of them. Lots more than the 17*2^16 in UTF-16.
--
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences Tel/fax: +81 (298) 53-5091
_________________ _________________ _________________ _________________
What are those straight lines for? "XEmacs rules."