Re: yet another XEmacs fork;-)

Sunday, 16 November 2008

Julian Bradfield wrote:

...
 How do systems deal with the problem that in some encodings (any
 ISO2022 that allows general character sets) there are many
 octet-strings that encode the same abstract text string? 
For the most part, they don't. Unicode fans pretend that anything
which causes problems for Unicode doesn't exist (or is "obsolete";
apparently, they get to decide that this is the case). Anyone who
actually needs to use such encodings typically avoids Unicode like the
plague (I've yet to see a Japanese game for Windows which uses the
Unicode API rather than the codepage-based API).

If you want to retrieve a filename from the OS then pass it back at a
later point, you need to retain the raw data. If you can't get at the
raw data, you lose.
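
As a rough illustration of why the raw data matters, here is a minimal Python sketch. The filename bytes and the function names are made up for the example; the name is assumed to have been written in Latin-1 by a system that is now being read as UTF-8. Decoding with replacement characters loses the original octets, while keeping the bytes (or, in Python specifically, decoding with the `surrogateescape` error handler) lets you hand the same name back later.

```python
# Hypothetical filename as raw octets: "café.txt" in Latin-1, which is
# not a valid UTF-8 sequence.
raw = b"caf\xe9.txt"


def lossy_round_trip(name: bytes) -> bytes:
    """Decode to text with replacement, then encode again.

    The invalid octet becomes U+FFFD, so the original bytes are gone.
    """
    return name.decode("utf-8", errors="replace").encode("utf-8")


def escaped_round_trip(name: bytes) -> bytes:
    """Decode with surrogateescape, which smuggles undecodable octets
    through the text form so they can be restored on encoding."""
    text = name.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    print(lossy_round_trip(raw) == raw)    # False: the name no longer matches
    print(escaped_round_trip(raw) == raw)  # True: the raw data survived
```

The lossy version produces a name that the OS never gave you, so passing it back will not find the file.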

...
 Come to that, how do UTF-8 based filesystems (Windows, Mac) behave
 when faced with a filename that is invalid - or are the OSes
 sufficiently well written to validate filename on creation? 
Mostly.

A more significant point is that Unicode strings aren't strings of
"characters", but of Unicode code points. The conversions between
Unicode and abstract characters suffer from many of the same problems
as with traditional encodings.

E.g. an accented letter can often be represented either as a single
code representing the composed character or as a sequence of the base
letter and a combining accent (Windows and Linux typically use the
former, while MacOSX uses the latter).

NTFS will happily let you have files whose names represent identical
text but differ in the exact sequence of codepoints.
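
The composed/decomposed distinction can be sketched in Python with the standard `unicodedata` module. The helper name `same_text` is an assumption for the example, and comparing after NFC normalization is only one possible definition of "identical text":

```python
import unicodedata

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (composed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(composed == decomposed)           # False: different code points
    print(len(composed), len(decomposed))   # 1 2
    print(same_text(composed, decomposed))  # True: same abstract text
```

A filesystem that compares names code point by code point, without normalizing, treats these two strings as different names.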

...
 (Many years ago, we had a Pyramid Unix system, which had a network
 file system interface to the Vaxen. This interface did so little
 checking of filenames that it was possible, from a Vax, to create a
 Unix file on the Pyramid with a '/' in its name! Of course, the only
 way to remove it, or access it in any way, was from a Vax.)
A slightly similar situation exists on Windows, at least for registry
keys (I'm not sure about filenames). The "native" NT API represents
strings using an explicit length, while the Win32 API uses NUL
termination. Using the native API, you can create registry keys which
contain embedded NUL characters.

It's impossible to specify such keys via the Win32 API. They will show
up in RegEdit, minus the first NUL and anything following it. 
Attempting to examine the key's subkeys or value will result in a "key
not found" error.
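
The mismatch between counted and NUL-terminated strings can be modelled in a few lines of Python. This is only a sketch of the general effect, not of the registry APIs themselves: the key name is made up, and plain bytes are used here for brevity where the NT API actually uses counted UTF-16 strings.

```python
import ctypes

# Hypothetical key name held as a counted string: the length is carried
# separately, so the embedded NUL is just another character.
counted_name = b"Key\x00Hidden"


def as_nul_terminated(name: bytes) -> bytes:
    """Return what a NUL-terminated reader would see of the same buffer."""
    buf = ctypes.create_string_buffer(name)
    return buf.value  # .value stops at the first NUL


if __name__ == "__main__":
    print(len(counted_name))                # 10: the full counted name
    print(as_nul_terminated(counted_name))  # b'Key': everything after the NUL is lost
```

A lookup made with the truncated name is a lookup for a different key, which is consistent with the "key not found" behaviour described above.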

-- 
Glynn Clements <glynn(a)gclements.plus.com>

_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta
