Mats Lidell writes:

> There is a tool, godef, that for example, can print info about an
> identifier in a go program. You can select the identifier by giving an
> offset for its location in the file given in bytes. Not characters,
> bytes. Godef can also read the program on stdin and then the offset
> for the interesting identifier is also in bytes.

That's a horrible API. Boy, am I glad I don't work on that code!
*Both* godef *and* GNU Emacs!
In any case, `position-bytes' cannot work in XEmacs (ever!) because
XEmacs uses an internal encoding that is used by nothing else in the
world. The same caveat applies in principle to Emacs (its internal
encoding is not strictly Unicode compatible), but `position-bytes'
should be good enough almost all the time when a language defines its
input as UTF-8, since Emacs's internal encoding is a superset of UTF-8.
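
A minimal sketch of the calculation, for illustration only: encode the
text before the position and count the result, rather than trusting the
internal representation. The name `my-utf-8-byte-offset' is
hypothetical, and the sketch assumes that `encode-coding-string'
returns one element per encoded byte, which should hold in GNU Emacs
and in a Mule-enabled XEmacs.

```elisp
(defun my-utf-8-byte-offset (pos)
  "Return the zero-based UTF-8 byte offset of buffer position POS.
Computed by encoding the text before POS, so it does not depend on
the internal buffer representation."
  (length (encode-coding-string
           (buffer-substring (point-min) pos) 'utf-8)))

;; In GNU Emacs the result should agree with `position-bytes' for
;; ordinary Unicode text (`position-bytes' is one-based, hence `1-').
(with-temp-buffer
  (insert "naïve := 1")
  (let ((pos 6))                        ; just after "naïve"
    (list (my-utf-8-byte-offset pos)
          (and (fboundp 'position-bytes)
               (1- (position-bytes pos))))))
;; => (6 6) in GNU Emacs: five characters, six bytes
```

Encoding the whole prefix on every query is linear in the buffer size,
so this is only attractive when queries are infrequent.
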

> Or we could adjust to the world outside and implement some way for
> lisp to know, calculate!?, the underlying byte implementation for each
> encoding.

We already know how to do that. It has to be calculated, of course.
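
  (defun xemacs-call-godef (buffer bytepos)
    "Implements the subprocess stuff to call godef program.
  Stream BUFFER with 'binary encoding to godef."
    ...)

  (defun xemacs-godef (&optional point-before-identifier)
    "Call this function in the source buffer."
    ;; `let*' because each binding depends on the previous ones.
    (let* ((pbi (or point-before-identifier (point)))
           (work (clone-buffer (current-buffer)))
           ;; Encodes in place in WORK and returns the encoded length,
           ;; i.e. the zero-based byte offset of PBI.
           (pbi-bytes (encode-coding-region (point-min) pbi 'utf-8 work)))
      ;; The unencoded remainder now starts just past the encoded prefix.
      (encode-coding-region (+ (point-min work) pbi-bytes) (point-max work)
                            'utf-8 work)
      (xemacs-call-godef work pbi-bytes)))
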
is the basic idea. Possibly it would be a better idea to keep the
work buffer around for multiple queries, but that depends on how
frequent the queries are compared to editing the source buffer.
Trying to keep the two synchronized would be horrible, so if the
source buffer is edited you pretty much have to redo from scratch. I
suppose we could also implement like this:

  (defun xemacs-call-godef (bytepos &optional buffer)
    "Implements the subprocess stuff to call godef program.
  Stream BUFFER with 'utf-8 encoding to godef."
    ...)

  (defun xemacs-godef (&optional point-before-identifier)
    "Call this function in the source buffer."
    (let* ((pbi (or point-before-identifier (point)))
           (work (clone-buffer (current-buffer)))
           ;; Only the prefix is encoded, just to measure its byte length.
           (pbi-bytes (encode-coding-region (point-min) pbi 'utf-8 work)))
      (xemacs-call-godef pbi-bytes)))

It's not clear how much work this saves. The first version encodes
the whole source buffer to the work buffer using 'utf-8, then encodes
the work buffer to the subprocess using 'binary. The second encodes
the first part of the source buffer to the work buffer using 'utf-8,
then does it again using 'utf-8 to encode the whole buffer to the
subprocess. UTF-8 encoding is slightly more expensive than binary
encoding.
If we implement fixed width buffers (see PEP 393[1] for Python's
implementation), then encoding binary will be a no-op, and UTF-8
encoding will be much more expensive.
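
For reference, the PEP 393 idea is that each string picks the narrowest
fixed width that fits its widest character: 1 byte per character if
everything is below U+0100, 2 bytes if below U+10000, otherwise 4. A
rough sketch of that selection rule follows. `my-fixed-width-for' is a
hypothetical name, and the code assumes character codes are Unicode
code points, as in GNU Emacs (XEmacs character codes are internal and
would need mapping first).

```elisp
(defun my-fixed-width-for (string)
  "Return the bytes per character a PEP 393 style layout picks for STRING."
  (let ((max-char 0))
    ;; Find the largest character code in STRING.
    (mapc (lambda (ch) (setq max-char (max max-char ch))) string)
    (cond ((< max-char #x100) 1)        ; Latin-1 range
          ((< max-char #x10000) 2)      ; Basic Multilingual Plane
          (t 4))))                      ; everything else

(my-fixed-width-for "godef")            ; => 1
(my-fixed-width-for "日本語")           ; => 2
```

With such a layout a UTF-8 byte offset can no longer be read off the
buffer and has to be computed by encoding, which is where the extra
cost comes from.
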
Steve
Footnotes:
[1] http://www.python.org/dev/peps/pep-0393/