Re: GNU Emacs position-bytes?

Monday, 11 March 2013

        Mats Lidell writes:

...
 There is a tool, godef, that for example, can print info about an
 identifier in a go program. You can select the identifier by giving an
 offset for its location in the file given in bytes. Not characters,
 bytes. Godef can also read the program on stdin and then the offset
 for the interesting identifier is also in bytes.

That's a horrible API.  Boy, am I glad I don't work on that code!
*Both* godef *and* GNU Emacs!

In any case, `position-bytes' cannot work in XEmacs (ever!) because
XEmacs uses an internal encoding that is used by nothing else in the
world.  This is true in principle for Emacs too (its internal encoding
is not strictly Unicode-conformant), but it should be good enough
almost all the time when a language defines its input as UTF-8, since
Emacs's internal encoding is a superset of UTF-8.

...
 Or we could adjust to the world outside and implement some way for
 lisp to know, calculate!?, the underlying byte implementation for each
 encoding.

We already know how to do that.  It has to be calculated, of course.

(defun xemacs-call-godef (buffer bytepos)
  "Implements the subprocess stuff to call godef program.
Stream BUFFER with 'binary encoding to godef."
  ...)

(defun xemacs-godef (&optional point-before-identifier)
  "Call this function in the source buffer."
  ;; `let*' because WORK and PBI-BYTES refer to earlier bindings.
  (let* ((pbi (or point-before-identifier (point)))
         (work (clone-buffer (current-buffer)))
         (pbi-bytes (encode-coding-region (point-min) pbi 'utf-8 work)))
    (encode-coding-region pbi-bytes (point-max) 'utf-8 work)
    (xemacs-call-godef work pbi-bytes)))

is the basic idea.  Possibly it would be a better idea to keep the
work buffer around for multiple queries, but that depends on how
frequent the queries are compared to editing the source buffer.
Trying to keep the two synchronized would be horrible, so if the
source buffer is edited you pretty much have to redo from scratch.  I
suppose we could also implement it like this:

(defun xemacs-call-godef (bytepos &optional buffer)
  "Implements the subprocess stuff to call godef program.
Stream BUFFER with 'utf-8 encoding to godef."
  ...)

(defun xemacs-godef (&optional point-before-identifier)
  "Call this function in the source buffer."
  ;; `let*' because WORK and PBI-BYTES refer to earlier bindings.
  (let* ((pbi (or point-before-identifier (point)))
         (work (clone-buffer (current-buffer)))
         (pbi-bytes (encode-coding-region (point-min) pbi 'utf-8 work)))
    (xemacs-call-godef pbi-bytes)))

It's not clear how much work this saves.  The first version encodes
the whole source buffer to the work buffer using 'utf-8, then encodes
the work buffer to the subprocess using 'binary.  The second encodes
the first part of the source buffer to the work buffer using 'utf-8,
then does it again using 'utf-8 to encode the whole buffer to the
subprocess.  UTF-8 encoding is slightly more expensive than binary
encoding.
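
If only the byte offset is wanted, it may be possible to skip the work
buffer and encode the prefix as a string instead.  The following is a
minimal sketch, not a tested implementation: the name
`godef-utf8-byte-offset' is hypothetical, and it assumes that
`encode-coding-string' returns a string with one element per encoded
byte (which is how GNU Emacs unibyte strings behave).

```elisp
;; Hypothetical helper; not part of XEmacs, GNU Emacs, or godef.
(defun godef-utf8-byte-offset (&optional pos)
  "Return the number of UTF-8 bytes between `point-min' and POS.
POS defaults to point.  The result is a zero-based byte offset."
  ;; Assumption: the encoded string has one element per byte, so
  ;; `length' counts bytes rather than characters.
  (length (encode-coding-string
           (buffer-substring (point-min) (or pos (point)))
           'utf-8)))
```

The cost is the same as encoding the first part of the buffer, plus
the string allocation, so this is a convenience rather than a speedup.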

If we implement fixed width buffers (see PEP 393[1] for Python's
implementation), then encoding binary will be a no-op, and UTF-8
encoding will be much more expensive.
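
To make that cost concrete: with fixed-width storage the UTF-8 byte
offset of a position is no longer related to the buffer
representation, so it has to be computed by scanning every character
before it.  A rough sketch of that scan follows.  The function names
are hypothetical, and it assumes characters are plain integer code
points (as in GNU Emacs; XEmacs would need a conversion to integer
first).

```elisp
;; Hypothetical names, for illustration only.
(defun sketch-utf8-width (code)
  "Return the number of bytes UTF-8 uses for code point CODE."
  (cond ((< code #x80) 1)
        ((< code #x800) 2)
        ((< code #x10000) 3)
        (t 4)))

(defun sketch-utf8-offset (string n)
  "Return the UTF-8 byte length of the first N characters of STRING."
  (let ((bytes 0)
        (i 0))
    ;; Linear scan: every character before N contributes its width.
    (while (< i n)
      (setq bytes (+ bytes (sketch-utf8-width (aref string i))))
      (setq i (1+ i)))
    bytes))
```

The scan is linear in the number of characters before the position,
which is the extra work a fixed-width representation would add to
every such query.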

Steve

Footnotes: 
[1]  http://www.python.org/dev/peps/pep-0393/

_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://lists.xemacs.org/mailman/listinfo/xemacs-beta
