Now that I've created chains of coding systems and gotten them to work, the
issue comes up: when specifying the chain, does it make more sense to take a
decoding or encoding perspective?
e.g. imagine i have data that is a base64 wrapping of a gzip of euc-jp of CRLF text
[coding system convert-eol-crlf, which assumes the EOL processing has been
extracted into a separate coding system, which i'm in the process of doing].
once you specify a coding system chain to do this transformation, you can use it
both ways: decoding to get the actual text, encoding to get some base64 stuff
[or vice versa if you reversed the order of the chain]. as you can see,
everything is symmetrical and reversible, and it's easy to get mixed up unless
there's a clear standard, hopefully the most intuitive one.
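
to make the symmetry concrete, here is a minimal python sketch of such a chain.
this is not XEmacs code: the names `CHAIN`, `encode` and `decode` are made up
for illustration, python's stdlib codecs stand in for the real coding systems,
and the steps are listed in encode order purely as an assumption of the sketch
[that ordering is exactly what is up for discussion].

```python
import base64
import gzip

# Each step is (name, encode_fn, decode_fn). The steps are listed in ENCODE
# order; that is an assumption of this sketch, not a settled convention.
CHAIN = [
    ("convert-eol-crlf",
     lambda s: s.replace("\n", "\r\n"),    # text with LF -> text with CRLF
     lambda s: s.replace("\r\n", "\n")),   # text with CRLF -> text with LF
    ("euc-jp",
     lambda s: s.encode("euc_jp"),         # text -> EUC-JP bytes
     lambda b: b.decode("euc_jp")),        # EUC-JP bytes -> text
    ("gzip", gzip.compress, gzip.decompress),
    ("base64", base64.b64encode, base64.b64decode),
]

def encode(text, chain=CHAIN):
    """Run every step forward, first to last: text -> base64 bytes."""
    data = text
    for _name, enc, _dec in chain:
        data = enc(data)
    return data

def decode(data, chain=CHAIN):
    """Run every step backward, last to first: base64 bytes -> text."""
    for _name, _enc, dec in reversed(chain):
        data = dec(data)
    return data
```

the same list serves both directions; `decode(encode(text))` gives back `text`
as long as the text uses bare LF line endings and only characters that euc-jp
can represent.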
so, what do people think: given that for the moment we are talking about
processes where the result of *decoding* is just the text, should we consider as
basic the steps required to decode:
base64 | gunzip | euc-jp | convert-eol-crlf,
or the steps required to encode:
convert-eol-crlf | euc-jp | gzip | base64
Note that I changed the name of the gzip process between gzip and gunzip so as
not to bias things. In some ways decode makes sense because that's what the
end-user is more likely to be doing, and they may get confused having to specify
encode-centric steps. [encoding will typically just use what's been decoded.]
In some ways encode makes sense because it makes more sense to attach to a
file a description of how it was created, rather than how to decode it. [and indeed
this is how the world works -- utilities are generically called "tar" and
"gzip", implying encoding.]
this also affects the naming. e.g. i have a coding system that converts between
unicode and multibyte; when decoding multibyte, the steps are [multibyte-data] |
multibyte-to-unicode | unicode | convert-eol-crlf, and you get raw text. when
encoding, the steps are reversed. do i call this coding system
`multibyte-to-unicode' [emphasizing what it does when decoding] or
`unicode-to-multibyte' [emphasizing what it does when encoding]? as usual,
names tend to emphasize their encoding process, so i called it
unicode-to-multibyte; but that gets very confusing if you're specifying a
decoding chain, since the operation will be multibyte-to-unicode but you'll
still have to give the opposite name.
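
the naming mismatch can be sketched in python as well. everything here is
hypothetical: `CodingStep` and `describe` are illustration-only names, and
utf-16-le and cp932 are just stand-ins i picked for "unicode" and "multibyte".
the step carries one name, chosen from the encoding point of view, and the
helper shows what the operation effectively is when the step runs backward.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class CodingStep:
    name: str          # named for what the step does when ENCODING (assumption)
    encode: Callable   # forward operation
    decode: Callable   # reverse operation

# Stand-in codecs: utf-16-le plays "unicode", cp932 plays "multibyte".
unicode_to_multibyte = CodingStep(
    name="unicode-to-multibyte",
    encode=lambda b: b.decode("utf-16-le").encode("cp932"),
    decode=lambda b: b.decode("cp932").encode("utf-16-le"),
)

def describe(step, direction):
    """Return the name the operation effectively has in the given direction."""
    if direction == "encode":
        return step.name
    # Decoding runs the step backward, so flip "a-to-b" into "b-to-a".
    src, sep, dst = step.name.partition("-to-")
    return f"{dst}-to-{src}" if sep else f"undo-{step.name}"
```

here `describe(unicode_to_multibyte, "decode")` returns `"multibyte-to-unicode"`
even though the step you had to name in the chain is `unicode-to-multibyte`,
which is the confusion described above.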
sorry if this seems very confusing. the basic point is:
for naming, we need to settle on one of {encoding, decoding} as the "basic"
operation, and the other as the "undoing" operation. all naming, all chains,
etc. have to take the point of view of the "basic" operation, and when doing the
other operation, you need to consciously say to yourself, "i'm doing a reverse
operation" and choose or specify the "forward" operation, knowing it will go
backward.
which of {encoding,decoding} is the "basic", "forward" operation?
ben