goofy charset selection for Unicode pastes
Jamie Zawinski
jwz at jwz.org
Fri Aug 20 22:27:09 EDT 2004
Glynn Clements wrote:
>
> None of the ISO-8859-* family have em-dash (ISO-8859-1 doesn't and,
> AFAIK, the rest of them all have essentially the same set of
> "punctuation" characters). The first one which *does* have an em-dash
> is chinese-cns11643-1.
>
> Notes:
>
> 1. By default, XEmacs isn't set up to use the *-iso10646-1 fonts (and
> I don't think that it can; if displaying Unicode was as simple as
> selecting a Unicode font, I don't think that we'd be using mule-ucs).
>
> 2. mule-ucs doesn't understand the windows-125x encodings (and, if it
> wasn't for those, I doubt that many people would be using — in
> the first place).
I'm almost afraid to ask, but how does Mozilla end up displaying that
mdash properly?

Really, I just wish mdash (and ldquo, and all that other Windows crap)
got turned into the roughly-corresponding Latin1 characters on paste...

As long as we're on the topic -- how do I search a buffer for
"problematic" characters? I used to do

    (re-search-forward "[\000-\010\013-\037\177-\377]")

but that does not match unicrud. The best guess I've been able to
come up with is

    (while (and (not (eobp))
                (eq 'ascii (charset-after (point))))
      (forward-char 1))

but that really doesn't smell right.
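
One possible alternative, offered as a sketch rather than a tested
answer: negate the set of acceptable characters instead of listing the
bad ones. Anything that is not tab, newline, or printable ASCII then
matches, multibyte characters included. This assumes the regexp engine
lets a negated character class match non-ASCII characters (GNU Emacs
does; XEmacs with Mule is assumed here to behave the same way). The
function name is hypothetical.

```elisp
;; Hypothetical helper, not part of XEmacs or mule-ucs.
(defun my-next-problem-char ()
  "Move point to the next character that is not tab, newline, or
printable ASCII.  Return its position, or nil if there is none."
  (interactive)
  ;; "[^\t\n\040-\176]" is the complement of: tab, newline, and the
  ;; range from space (octal 040) through tilde (octal 176).
  (if (re-search-forward "[^\t\n\040-\176]" nil t)
      ;; Leave point on the offending character, not after it.
      (goto-char (match-beginning 0))
    nil))
```

Within the 8-bit range the negated class covers the same characters as
the old regexp (\000-\010, \013-\037, \177-\377), so the only intended
difference is that characters outside that range match too.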
--
Jamie Zawinski
jwz at jwz.org http://www.jwz.org/
jwz at dnalounge.com http://www.dnalounge.com/