alastair, this looks great! please continue.
as for lstreams, originally i wanted them not to escape because there were no
lisp accessors, and there may be [or might have been, conceivably] primitives
that could take lstreams as arguments and might [conceivably ...?] not work if
some strange lstream were passed in.
but i've actually been thinking of creating an lstream interface myself, for use
in creating arbitrary lisp coding systems [i want to extend the coding-system
interface to work not just with international encodings but to be able to handle
gzip, base64, md5, etc.].
[unfortunately, what you're working on now doesn't fit into this system because
the latter only deals with strings/streams of text or binary data and not
arbitrary lisp objects; although i can certainly see the usefulness of an
arbitrary lisp object converter, and it looks like that's exactly what you're
working on here. the coding system stuff would still be useful because it
includes various optimizations for working specifically with streams; but
eventually i would really like to see the interfaces merged. e.g. why couldn't
you `find-file' using a coding system that generated sound and image objects
mixed in with the text? that's exactly what modern html browsers do, in
essence. i suppose i should extend the coding system interface to allow text
marked up with extents; still ...]
i'm appending a rather raw writeup of my proposed lstream interface, with some
bits on extending the coding system mechanism. [this comes out of a massive
document of such proposals that martin and i sent to japan a few months ago as
part of the contract that he and i are getting from them. most of this is stuff
he transcribed from scribbled notes i faxed to him, since i can't type too well
any more but can still write more or less; the rest of it i dictated to a
professional transcriptionist [with no technical knowledge, of course!], and it
was cleaned up by martin. that's why it's so messy.]
if you're interested in implementing this lstream interface or something like
it, please go ahead! i've got my hands busy with mule work and merging of
existing workspaces into the code base for quite some time now.
btw when you have time you might want to extend your 'string' encoding to allow
encoding/decoding using a coding system, which would almost certainly be
required when the string contains non-ascii characters. [e.g. when sending text
to an x selection, `ctext' is required, and for windows, `mswindows-tstr'.]
also, you might consider adding funs that allow creating a user-defined
"encoding", instead of specifying the conversion functions directly.
ben
- Lisp Stream API
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Expose XEmacs internal lstreams to Lisp as stream objects. (In
addition to the functions given below, each stream object has
properties that can be associated with it using the standard put, get
etc. API. For GNU Emacs, where put and get have not been extended to
be general property functions, but work only on symbols, we would have
to create functions set-stream-property, stream-property,
remove-stream-property, and stream-properties. These provide the same
functionality as the generic put, get, remprop, and object-plist
functions under XEmacs.)
(Implement properties using a hash table, and *generalize* this so
that it is extremely easy to add a property interface onto any kind
of object)
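The hash-table idea above can be modelled in a few lines. This is only a sketch in Python, used here as executable pseudocode: the class name `PropertyTable`, the stand-in `Stream` class, and the choice of a weak-keyed table (so that properties disappear when the object is garbage collected) are all assumptions, not part of the proposal.

```python
import weakref


class PropertyTable:
    """Side table giving put/get/remprop/object-plist to arbitrary objects."""

    def __init__(self):
        # Weak keys: an object's properties vanish once the object is collected.
        self._table = weakref.WeakKeyDictionary()

    def put(self, obj, prop, value):
        self._table.setdefault(obj, {})[prop] = value
        return value

    def get(self, obj, prop, default=None):
        return self._table.get(obj, {}).get(prop, default)

    def remprop(self, obj, prop):
        props = self._table.get(obj)
        if props is None or prop not in props:
            return False
        del props[prop]
        return True

    def object_plist(self, obj):
        # Flatten to PROP VALUE PROP VALUE ... order, like a Lisp plist.
        plist = []
        for prop, value in self._table.get(obj, {}).items():
            plist.extend([prop, value])
        return plist


class Stream:
    """Hypothetical stand-in for a stream object; it knows nothing of properties."""


stream_properties = PropertyTable()
s = Stream()
stream_properties.put(s, "buffer-index", 0)
```

Because the table lives outside the objects, adding a property interface to a new kind of object needs no change to that object's definition.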
(write-stream STREAM STRING)
Write the STRING to the STREAM. This will signal an error if all the
bytes cannot be written.
(read-stream STREAM &optional N SEQUENCE)
Reads data from STREAM. N specifies the number of bytes or
characters, depending on the stream. SEQUENCE specifies where to
write the data into. If N is not specified, data is read until end of
file. If SEQUENCE is not specified, the data is returned as a string.
If SEQUENCE is specified, the SEQUENCE must be large enough to hold
the data.
(push-stream-marker STREAM)
returns ID, probably a stream marker object
(pop-stream-marker STREAM)
backs up stream to last marker
(unread-stream STREAM STRING)
STREAM must be an input stream. The data in STRING is pushed back and
will be read ahead of all other data. In
general, there is no limit to the amount of data that can be unread or
the number of times that unread-stream can be called before another
read.
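A small model of the intended read/unread semantics, written in Python as executable pseudocode. The class `InputStream` and its in-memory byte source are assumptions made for illustration; only the behaviour (unread data is returned ahead of all other data, and unread may be called repeatedly) comes from the description above.

```python
class InputStream:
    """Toy input stream over in-memory bytes with unlimited pushback."""

    def __init__(self, data):
        self._data = data
        self._pos = 0
        self._pushback = b""  # unread data, served before anything else

    def read(self, n=None):
        """Return up to n bytes, or everything remaining if n is None."""
        out = self._pushback if n is None else self._pushback[:n]
        self._pushback = self._pushback[len(out):]
        if n is None:
            chunk = self._data[self._pos:]
        else:
            chunk = self._data[self._pos:self._pos + (n - len(out))]
        self._pos += len(chunk)
        return out + chunk

    def unread(self, data):
        # The most recently unread data comes back first.
        self._pushback = data + self._pushback


stream = InputStream(b"hello world")
first = stream.read(5)
stream.unread(b"lo")      # two calls to unread before the next read
stream.unread(b"hel")
again = stream.read(7)
rest = stream.read()
```

Here `first` is `b"hello"`, `again` is `b"hello w"` and `rest` is `b"orld"`.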
(stream-available-chars STREAM)
This returns the number of characters (or bytes) that can definitely
be read from the stream without an error. This can be useful, for
example, when dealing with non-blocking streams when an attempt to
read too much data will result in a blocking error.
(stream-seekable-p STREAM)
Returns true if the stream is seekable. If false, operations such as
seek-stream and stream-position will signal an error. However, the
functions set-stream-marker and seek-stream-marker will still succeed
for an input stream.
(stream-position STREAM)
If STREAM is a seekable stream, returns a position which can be passed
to seek-stream.
(seek-stream STREAM N)
If STREAM is a seekable stream, move to the position indicated by N,
otherwise signal an error.
(set-stream-marker STREAM)
If STREAM is an input stream, create a marker at the current position,
which can later be moved back to. The stream does not need to be a
seekable stream. In this case, all successive data will be buffered
to simulate the effect of a seekable stream. Therefore use this
function with care.
(seek-stream-marker STREAM MARKER)
Move the stream back to the position that was stored in the marker
object. (This is generally an opaque object of type stream-marker.)
(delete-stream-marker MARKER)
Destroy the stream marker. If the stream is non-seekable and no other
stream marker points to an earlier position, this also frees up the
associated buffering information.
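The buffering that makes markers work on a non-seekable stream can be sketched as follows. This is a Python model under stated assumptions: `MarkableStream`, `Marker`, and the `buffered` helper are hypothetical names, and the source is any iterator of byte values that cannot be rewound.

```python
import itertools


class Marker:
    """Opaque marker; records an absolute stream position."""

    def __init__(self, pos):
        self.pos = pos


class MarkableStream:
    def __init__(self, source):
        self._source = iter(source)  # yields byte values; cannot rewind
        self._buf = bytearray()      # data kept for live markers
        self._base = 0               # absolute position of _buf[0]
        self._pos = 0                # absolute current read position
        self._markers = []

    def _trim(self):
        # Keep only data at or after the earliest position still needed.
        keep = min([m.pos for m in self._markers] + [self._pos])
        del self._buf[:keep - self._base]
        self._base = keep

    def read(self, n):
        start = self._pos - self._base
        out = bytearray(self._buf[start:start + n])  # replay buffered data
        fresh = bytes(itertools.islice(self._source, n - len(out)))
        if self._markers:
            self._buf += fresh  # remember it so a marker can return here
        out += fresh
        self._pos += len(out)
        self._trim()
        return bytes(out)

    def set_marker(self):
        marker = Marker(self._pos)
        self._markers.append(marker)
        return marker

    def seek_marker(self, marker):
        if marker not in self._markers:
            raise ValueError("marker was deleted or belongs to another stream")
        self._pos = marker.pos

    def delete_marker(self, marker):
        self._markers.remove(marker)
        self._trim()  # may free buffered data

    def buffered(self):
        return len(self._buf)
```

Nothing is buffered until the first marker is set, which is why the text advises using markers with care: every byte read after the earliest live marker stays in memory until that marker is deleted.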
(delete-stream STREAM N)
(delete-stream-marker STREAM ID)
(close-stream STREAM)
Writes any remaining data to the stream and closes it and the object
to which it's attached. This also happens automatically when the
stream is garbage collected.
(getchar-stream STREAM)
Return a single character from the stream. (This may be a single byte
depending on the nature of the stream). This is actually a macro with
an extremely efficient implementation (as efficient as you can get in
Emacs Lisp), so that this can be used without fear in a loop. The
implementation works by reading a large amount of data into a vector
and then simply using the function AREF to read characters one by one
from the vector. Because AREF is one of the primitives handled
specially by the byte interpreter, this will be very efficient. The
actual implementation may in fact use the function
call-with-condition-handler to avoid the necessity of checking for
overflow. Its typical implementation is to fetch the vector
containing the characters as a stream property, as well as the index
into that vector. Then it retrieves the character and increments the
value and stores it back in the stream. As a first implementation, we
check on each read whether the index would be out of range. If so, we
read another 4096 characters into the same vector, set the index back
to the beginning, and then proceed with the rest of the getchar
algorithm.
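The getchar algorithm described above, modelled in Python as executable pseudocode. `CharReader`, `read_chunk`, and the `chunk` parameter are assumptions; `getchar` is the "first implementation" with an explicit range check, and `getchar_no_check` mirrors the condition-handler variant by letting the out-of-range access raise and refilling in the handler.

```python
import io


class CharReader:
    def __init__(self, read_chunk, chunk=4096):
        self._read_chunk = read_chunk  # callable(n) -> str of up to n chars
        self._chunk = chunk
        self._buf = ""
        self._index = 0

    def _refill(self):
        # Read the next block and reset the index to the beginning.
        self._buf = self._read_chunk(self._chunk)
        self._index = 0
        return bool(self._buf)

    def getchar(self):
        """Explicit range check on every call; returns None at end of data."""
        if self._index >= len(self._buf) and not self._refill():
            return None
        ch = self._buf[self._index]
        self._index += 1
        return ch

    def getchar_no_check(self):
        """Same result, but the common path performs no range check."""
        try:
            ch = self._buf[self._index]
        except IndexError:
            if not self._refill():
                return None
            ch = self._buf[0]
        self._index += 1
        return ch


def read_all(reader, getter):
    chars = []
    while (ch := getter(reader)) is not None:
        chars.append(ch)
    return "".join(chars)


text = "abcdefghij"
checked = read_all(CharReader(io.StringIO(text).read, chunk=4), CharReader.getchar)
unchecked = read_all(CharReader(io.StringIO(text).read, chunk=4),
                     CharReader.getchar_no_check)
```

The small `chunk=4` in the usage lines only forces several refills on a short input; the notes propose 4096.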
(putchar-stream STREAM CHAR)
This is similar to getchar-stream but it writes data instead of
reading data.
Function make-stream
There are actually two stream-creation functions, which are:
(make-input-stream TYPE PROPERTIES)
(make-output-stream TYPE PROPERTIES)
These can be used to create a stream that reads data, or writes data,
respectively. PROPERTIES is a property list and the allowable
properties in it are defined by the type. Possible types are:
(1) `file' (this reads data from a file or writes to a file)
Allowable properties are:
:file-name (the name of the file)
:create (for output streams only, creates the file if it doesn't
already exist)
:exclusive (for output streams only, fails if the file already
exists)
:append (for output streams only; starts appending to the end
of the file rather than overwriting the file)
:offset (position in bytes in the file where reading or writing
should begin. If unspecified, defaults to the beginning of the
file or to the end of the file when :append is specified)
:count (for input streams only, the number of bytes to read from
the file before signaling "end of file". If nil or omitted, the
number of bytes is unlimited)
:non-blocking (if true, reads or writes will fail if the operation
would block. This only makes sense for non-regular files).
(2) `process' (For output streams only, send data to a process.)
Allowable properties are:
:process (the process object)
(3) `buffer' (Read from or write to a buffer.)
Allowable properties are:
:buffer (the name of the buffer or the buffer object.)
:start (the position to start reading from or writing to. If nil,
use the buffer point. If true, use the buffer's point and move
point beyond the end of the data read or written.)
:end (only for input streams, the position to stop reading at. If
nil, continue to the end of the buffer.)
:ignore-accessible (if true, the defaults for :start and :end
ignore any narrowing of the buffer.)
(4) `string' (read from or write to a lisp string)
Allowable properties are:
:string (the string object)
:offset (the position to begin reading from or writing to)
:length (for input streams only, the amount of data to read,
defaulting to the rest of the data in the string.)
:resize-string (for output streams only; if true, the string is
resized as necessary to accommodate data written off the end,
otherwise the writes will fail.)
(5) `memory' (For output only, writes data to an internal memory
buffer. This is more lightweight than using a Lisp buffer. The
function memory-stream-string can be used to convert the memory
into a string.)
(6) `debugging' (For output streams only, write data to the debugging
output.)
(7) `stream-device' (During non-interactive invocations only, read
from or write to the initial stream terminal device.)
(8) `function' (For output streams only, send data by calling a
function, exactly as with the STREAM argument to the print
primitive.)
Allowable Properties are:
:function (the function to call. The function is called with one
argument, the stream.)
(9) `marker' (Write data to the location pointed to by a marker and
move the marker past the data.)
Allowable properties are:
:marker (the marker object.)
(10) `decoding' (As an input stream, reads data from another stream and
decodes it according to a coding system. As an output stream,
decodes the data written to it according to a coding system and
then writes the results to another stream.)
Properties are:
:coding-system (the symbol or coding system object that defines the
decoding.)
:stream (the stream on the other end.)
(11) `encoding' (As an input stream, reads data from another stream and
encodes it according to a coding system. As an output stream,
encodes the data written to it according to a coding system and
then writes the results to another stream.)
Properties are:
:coding-system (the symbol or coding system object that defines the
encoding.)
:stream (the stream on the other end.)
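The `decoding' and `encoding' types are streams layered on top of another stream. A minimal Python model of that layering, using the standard `codecs` incremental coders in place of XEmacs coding systems; the class names and the `io.BytesIO` stand-ins for "the stream on the other end" are assumptions.

```python
import codecs
import io


class EncodingOutputStream:
    """Output side of an `encoding' stream: text in, encoded bytes out."""

    def __init__(self, coding_system, stream):
        self._encoder = codecs.getincrementalencoder(coding_system)()
        self._stream = stream  # the stream on the other end

    def write(self, text):
        self._stream.write(self._encoder.encode(text))

    def close(self):
        # Flush whatever state the encoder is still holding.
        self._stream.write(self._encoder.encode("", final=True))


class DecodingInputStream:
    """Input side of a `decoding' stream: reads bytes, returns text."""

    def __init__(self, coding_system, stream):
        self._decoder = codecs.getincrementaldecoder(coding_system)()
        self._stream = stream

    def read(self, n=-1):
        raw = self._stream.read(n)
        # A short read may end inside a multibyte sequence; the decoder
        # keeps the partial bytes and returns them with the next read.
        return self._decoder.decode(raw, final=(n < 0 or not raw))


sink = io.BytesIO()
out = EncodingOutputStream("utf-8", sink)
out.write("na\u00efve")
out.close()

source = io.BytesIO(sink.getvalue())
inp = DecodingInputStream("utf-8", source)
decoded = "".join(inp.read(3) for _ in range(3))
```

The three-byte reads split the two-byte sequence for the accented character across calls, which is exactly the case where the layered stream has to carry state between reads.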
Consider
(define-stream-type 'type
:read-function
:write-function
:rewind-
:seek-
:tell-
(?:buffer)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
- Generalized Coding Systems
- Lisp API for Defining Coding Systems
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
User-defined coding systems.
(define-coding-system-type 'type
:encode-function FUN
:decode-function FUN
:detect-function FUN
:buffering (number = buffer at least this many chars
            line   = buffer up to end of line
            regexp = buffer until this regexp is found in the
                     source data; match data will be appropriate
                     when fun is called))
encode fun is called as
(encode INSTREAM OUTSTREAM)
The function should read data from instream and write the converted
result onto outstream. It can leave some data in instream; that data
will reappear next time. Generally, there is a finite amount of data in
instream, and further attempts to read lead to would-block errors or
retvals. It can use instream properties to record state, and may use
read-stream functionality to read everything into a vector or string.
->Need vectors + string exposed to resizing of Lisp implementation
where necessary.
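A sketch of an encode function that leaves data in the stream, as described above, using base64 as the example conversion. It is a Python model only: `Instream`, its `feed` method, the list used as the outstream, and the `at_eof` flag are assumptions added so the example is self-contained. The proposal itself specifies just the two-argument call.

```python
import base64


class Instream:
    """Toy input stream: data arrives in pieces and can be pushed back."""

    def __init__(self):
        self._pending = b""

    def feed(self, data):
        self._pending += data

    def read(self):
        data, self._pending = self._pending, b""
        return data

    def unread(self, data):
        self._pending = data + self._pending


def encode_base64(instream, outstream, at_eof=False):
    """Convert what is available; leave an incomplete group for next time."""
    data = instream.read()
    # base64 works on 3-byte groups, so only encode whole groups unless
    # this is the final call.
    usable = len(data) if at_eof else len(data) - len(data) % 3
    outstream.append(base64.b64encode(data[:usable]))
    instream.unread(data[usable:])  # reappears on the next call


instream = Instream()
outstream = []
instream.feed(b"hell")
encode_base64(instream, outstream)   # encodes b"hel", leaves b"l"
instream.feed(b"o!")
encode_base64(instream, outstream)   # encodes b"lo!"
result = b"".join(outstream)
```

`result` equals the base64 encoding of the whole input `b"hello!"`, even though the data arrived split at an awkward boundary.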
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
"Alastair J. Houghton" wrote:
Why does it say
/* #define CHECK_LSTREAM(x) CHECK_RECORD (x, lstream)
Lstream pointers should never escape to the Lisp level, so
functions should not be doing this. */
in lstream.h? The reason I'm wondering is that I'd like to
make my encode-binary and decode-binary functions work
with arbitrary output sinks/input sources, so the obvious
implementation is to create a suitable Lstream within the
Lisp-visible functions. The trouble is that I want the
interface to the functions to include the facility to add
user-defined conversions, which means that a Lisp function
may have to accept an Lstream parameter... so I'm wondering
whether there was any reason for this comment ;-)
Just in case you're interested, here's the interface I'm
proposing (there'll be an additional encode-binary-string
function that works in an efficient way). The STREAM parameter
could accept any Lisp object for which an Lstream can be
created.
DEFUN ("encode-binary-stream", Fencode_binary_stream, 3, 3, 0, /*
Encode the sequence DATA into a binary STREAM using the specified
binary FORMAT vector. Each element of the FORMAT vector should either
be a symbol, or a list of the form (SYMBOL PARAMETER...). SYMBOL may be
one of
binary string bit-vector integer float space vector
or alternatively the name of a Lisp function that will be called with the
remaining data, the output stream and a list of PARAMETER values as its
arguments. i.e. its declaration should look something like the following
(defun my-conversion (data stream parameter-list) ... )
and it will be called using
(my-conversion data stream parameter-list)
Such a function should return the remaining data after it has consumed
whatever it required.
The built-in encodings support the following parameters:
Encoding Parameters
binary :length
string :length :pad :terminator
bit-vector :length :direction
integer :length :signed :direction
float :length :format :direction
space :length
vector FORMAT-ELT :length :pack
where
FORMAT-ELT is anything that could be an element of the FORMAT parameter.
:length is followed by a length in bytes (or in bits for bit-vector).
:pad is followed by a character used to pad the string to the
specified length.
:terminator is followed by a character used to terminate the string.
:direction is followed by one of `big-endian', `little-endian', `host'
or `network'.
:signed is followed by t or nil.
:format is followed by `native'. Conversions to other floating point
formats are currently not supported.
:pack is followed by an integer specifying the vector stride
(e.g. the format [(vector (integer :length 2) :pack 4)]
represents an array of 16-bit integers, but with a gap
of 2 bytes between successive elements).
The function returns a string containing the raw binary data. */
(format, data, stream))
DEFUN ("decode-binary-stream", Fdecode_binary_stream, 2, 2, 0, /*
Decode the specified STREAM using the binary FORMAT vector. See
`encode-binary-stream' for more information on the FORMAT vector;
note however that user-defined conversion functions should be declared
as
(defun my-conversion (stream parameter-list) ...)
and should return the data they have converted. */
(format, stream))
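To make the `integer' and `vector' parameters concrete, here is a small Python model of how :length, :signed, :direction and :pack could map onto bytes. The function names are hypothetical. Two details are assumptions: the gap introduced by :pack is filled with zero bytes, and no gap follows the last element.

```python
import sys

# :direction values from the docstring, mapped to Python byte-order names.
_BYTE_ORDER = {
    "big-endian": "big",
    "network": "big",
    "little-endian": "little",
    "host": sys.byteorder,
}


def encode_integer(value, length, signed=False, direction="host"):
    """Encode one integer as LENGTH bytes."""
    return value.to_bytes(length, _BYTE_ORDER[direction], signed=signed)


def decode_integer(raw, signed=False, direction="host"):
    """Inverse of encode_integer; LENGTH is implied by len(raw)."""
    return int.from_bytes(raw, _BYTE_ORDER[direction], signed=signed)


def encode_integer_vector(values, length, pack=None, **options):
    """Encode integers with a stride of PACK bytes between element starts."""
    stride = length if pack is None else pack
    if stride < length:
        raise ValueError("pack must be at least the element length")
    gap = bytes(stride - length)  # zero fill between elements (assumed)
    return gap.join(encode_integer(v, length, **options) for v in values)


# [(vector (integer :length 2) :pack 4)] applied to the data (1 2):
packed = encode_integer_vector([1, 2], 2, pack=4, direction="network")
```

With these assumptions `packed` is six bytes long: two 16-bit big-endian integers with a 2-byte gap between them.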
Kind Regards,
Alastair.
____________________________________________________________
Alastair Houghton ajhoughton(a)lineone.net
--
Ben
In order to save my hands, I am cutting back on my mail. I also write
as succinctly as possible -- please don't be offended. If you send me
mail, you _will_ get a response, but please be patient, especially for
XEmacs-related mail. If you need an immediate response and it is not
apparent in your message, please say so. Thanks for your understanding.
See also
http://www.666.com/ben/chronic-pain/