goofy charset selection for Unicode pastes

Jamie Zawinski jwz at jwz.org
Fri Aug 20 22:27:09 EDT 2004


Glynn Clements wrote:
> 
> None of the ISO-8859-* family have em-dash (ISO-8859-1 doesn't and,
> AFAIK, the rest of them all have essentially the same set of
> "punctuation" characters). The first one which *does* have an em-dash
> is chinese-cns11643-1.
> 
> Notes:
> 
> 1. By default, XEmacs isn't set up to use the *-iso10646-1 fonts (and
> I don't think that it can; if displaying Unicode was as simple as
> selecting a Unicode font, I don't think that we'd be using mule-ucs).
> 
> 2. mule-ucs doesn't understand the windows-125x encodings (and, if it
> wasn't for those, I doubt that many people would be using — in
> the first place).

I'm almost afraid to ask, but how does Mozilla end up displaying that
mdash properly?  

Really, I just wish mdash (and ldquo, and all that other Windows crap)
got turned into the roughly corresponding Latin1 characters on paste...
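
Short of fixing the paste path itself, one way to get that effect is to clean up a region after pasting. Below is a minimal sketch, assuming GNU Emacs, where (decode-char 'ucs N) turns a Unicode code point into a character. XEmacs 21.4 with mule-ucs may need a different conversion call. The names my-smart-punctuation-alist and my-asciify-punctuation are hypothetical, as is the mapping table. Latin1 has no dashes or curly quotes, so the stand-ins are plain ASCII:

```elisp
;; Hypothetical table: Unicode "smart" punctuation -> ASCII stand-in.
;; Assumes GNU Emacs, where decode-char maps a UCS code point to a char.
(defvar my-smart-punctuation-alist
  (list (cons (decode-char 'ucs #x2014) "--")    ; em dash
        (cons (decode-char 'ucs #x2013) "-")     ; en dash
        (cons (decode-char 'ucs #x2018) "'")     ; left single quote
        (cons (decode-char 'ucs #x2019) "'")     ; right single quote
        (cons (decode-char 'ucs #x201C) "\"")    ; left double quote
        (cons (decode-char 'ucs #x201D) "\"")    ; right double quote
        (cons (decode-char 'ucs #x2026) "..."))  ; horizontal ellipsis
  "Hypothetical mapping from Windows-style punctuation to ASCII.")

(defun my-asciify-punctuation (start end)
  "Replace each character in `my-smart-punctuation-alist' between START and END."
  (interactive "r")
  (save-excursion
    (save-restriction
      ;; Narrowing keeps the searches inside the region; the restriction
      ;; grows automatically when a replacement is longer than its match.
      (narrow-to-region start end)
      (dolist (pair my-smart-punctuation-alist)
        (goto-char (point-min))
        (while (search-forward (char-to-string (car pair)) nil t)
          ;; FIXEDCASE and LITERAL both t: insert the stand-in verbatim.
          (replace-match (cdr pair) t t))))))
```

Run it as M-x my-asciify-punctuation right after a paste (the pasted text is the active region). Hanging it off the yank commands themselves would be a separate, more invasive change.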

As long as we're on the topic -- how do I search a buffer for
"problematic" characters?  I used to do

   (re-search-forward "[\000-\010\013-\037\177-\377]")

but that does not match unicrud.  The best guess I've been able to
come up with is

   (while (and (not (eobp))
               (eq 'ascii (charset-after (point))))
     (forward-char 1))

but that really doesn't smell right.
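
The loop above can also be written as one skip-chars-forward call. That call steps over a whole set of allowed characters at once, rather than testing charset-after one character at a time. The allowed set here is tab, newline, and space through tilde. This is exactly the complement of the old regexp within the 0-255 range, and it also stops on any MULE character outside that range. The function name my-next-problem-char is hypothetical. The sketch assumes GNU Emacs semantics for skip-chars-forward. It assumes, but does not verify, that XEmacs behaves the same on MULE characters:

```elisp
(defun my-next-problem-char ()
  "Move point to the next character that is not tab, newline, or printable ASCII.
Return that position, or nil (leaving point where it was) if there is none."
  (interactive)
  (let ((start (point)))
    ;; Skip the allowed set in one call: TAB, LF, and SPC through `~'.
    ;; Anything else (control chars, DEL, Latin1, unicrud) stops the skip.
    (skip-chars-forward "\t\n -~")
    (if (eobp)
        (progn (goto-char start) nil)
      (point))))
```

Point is left on the offending character, so a second call returns the same position; step past it with forward-char before looking for the next one. A regexp search for the negated class "[^\t\n -~]" should find the same characters, with point left just after the match instead of before it.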

-- 
Jamie Zawinski
jwz at jwz.org             http://www.jwz.org/
jwz at dnalounge.com       http://www.dnalounge.com/




More information about the XEmacs-Beta mailing list