problems with (decode-coding-region 'utf-8: ' gives &#39

Stephen J. Turnbull stephen at xemacs.org
Thu May 7 14:39:43 EDT 2009


Uwe Brauer writes:

 > It turns out that XEmacs (21.4.21 or 21.5.18) cannot deal correctly
 > with the ' any more. (While GNU Emacs can!)
 > 
 > Look at the following example:
 > Ich war heute  in Paris 
 > --->
 > J&#39;étais aujourd&#39;hui à Paris
 > instead of 
 > J'étais aujourd'hui à Paris
 > 
 > The relevant code line is 
 >            (decode-coding-region (point-min) (point-max) 'utf-8)

That has nothing to do with it.

Those are HTML character entities,
they are composed of perfectly valid ASCII characters,
Google put them there, not XEmacs, and
the utf-8 coding system must leave them alone.
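
To see that decoding leaves such a sequence alone, here is a minimal
sketch.  The sample byte string and the name my-decode-sample are made
up for illustration, and it assumes an Emacs where the utf-8 coding
system is available.

```elisp
;; Hypothetical helper, not part of babel.el: decode a made-up string
;; of UTF-8 bytes that also contains the ASCII sequence "&#39;".
(defun my-decode-sample ()
  "Return a sample UTF-8 byte string decoded with the utf-8 coding system."
  ;; \303\251 are the two UTF-8 bytes of e-acute.
  (decode-coding-string "J&#39;\303\251tais" 'utf-8))

;; Non-nil if "&#39;" is still present after decoding.
(string-match "&#39;" (my-decode-sample))
```

The two bytes become one character, but the five ASCII characters of
the entity should pass through unchanged, which is what a coding system
is supposed to do.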

The problem is that babel.el isn't doing anything about translating
them to characters.  Try adding

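                         ;; w3-parse comes with the w3 package.
                         (require 'w3-parse)
                         (goto-char (point-min))
                         ;; Move to each "&" in turn and let w3 expand
                         ;; the entity there, if it is one.
                         (while (progn
                                  (skip-chars-forward "^&")
                                  (not (eobp)))
                           (w3-expand-entity-at-point-maybe))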

somewhere appropriate in babel.el.  Right after the
decode-coding-region form might be a good guess.

You'll need the w3 package installed.
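
If installing w3 is not an option, something like the following rough
sketch might do for the decimal case only.  It is not what babel.el or
w3 does; the function name my-expand-decimal-char-refs is hypothetical,
it ignores named entities such as &amp; and hexadecimal references, and
it assumes (decode-char 'ucs ...) is available (GNU Emacs, or an XEmacs
with Mule).

```elisp
;; Hypothetical stand-in for the w3 loop: handles only decimal
;; references of the form &#NNN; and leaves everything else alone.
(defun my-expand-decimal-char-refs ()
  "Replace decimal character references in the current buffer."
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "&#\\([0-9]+\\);" nil t)
      ;; Map the number to a character; nil means no such character.
      (let ((ch (decode-char 'ucs (string-to-number (match-string 1)))))
        (when ch
          ;; Literal, case-preserving replacement of the whole match.
          (replace-match (char-to-string ch) t t))))))
```

Called right after the decode-coding-region form, this should turn
J&#39;étais into J'étais in the translated text.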



