[PATCH] Add #'parse-integer to data.c

Saturday, 14 March 2015

I wrote the below because I am encountering the non-ASCII decimal digits in my
other work, and the CL function provides a useful way to make them available
to my code. (The non-ASCII decimal digits are actually in a separate patch.) I
didn’t go straight ahead and commit it because, well, not everyone likes the
CL spec as much as I do, maybe I’m going overboard.

That said, I notice GNU now has a cl-parse-integer, written to the CL
definition. And our code is full of (string-to-number (match-string 1 string)
16), consing where that shouldn’t be necessary.

# HG changeset patch
# User Aidan Kehoe <kehoea(a)parhasard.net&gt;
# Date 1424864832 0
#      Wed Feb 25 11:47:12 2015 +0000
# Node ID 750fab17b299ae7f163b2c3b3f6a954d9f0f1320
# Parent  83e5c3cd6be65c951cba14988d62752977104a30
Make #'parse-integer Lisp-visible, extend it, allowing non-ASCII digits.

src/ChangeLog addition:

2015-02-25  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

	* lread.c (read_atom): Use the new calling convention for
	parse_integer().
	* lisp.h: Change the declaration of parse_integer ().
	* number.h (bignum_set_emacs_int, make_bignum_emacs_uint):
	New #defines, used in data.c.
	* lread.c (read_integer): Ditto.
	* lread.c (read1): Ditto.
	* data.c (find_highest_value): New.
	* data.c (fill_ichar_array): New.
	* data.c (build_fixnum_to_char_map): New.
	* data.c (Fset_digit_fixnum_map): New.
	* data.c (Fdigit_char_p): Moved from cl-extra.el.
	* data.c (Fdigit_char): Moved from cl-extra.el.
	* data.c (parse_integer): Moved from lread.c.
	* data.c (Fparse_integer): Made available to Lisp.
	* data.c (syms_of_data): Make the new subrs available.
	* data.c (vars_of_data): Make the new vars available.
	Expose parse_integer to Lisp, make it follow the Common Lisp API
	(with some extensions, to allow us to support non ASCII digit
	characters).

lisp/ChangeLog addition:

2015-02-25  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

	* cl-extra.el (digit-char-p): Moved to data.c.
	* cl-extra.el (digit-char): Moved to data.c.

tests/ChangeLog addition:

2015-02-25  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

	* automated/lisp-tests.el:
	parse_integer(), used in #'read, now signals invalid-argument
	rather than invalid-read-syntax, check for that.
	* automated/lisp-tests.el:
	Check #'parse-integer now it's available to Lisp, check
	#'digit-char, #'digit-char-p and the congruence in behaviour,
	check the XEmacs-specific RADIX-TABLE argument behaviour.

diff -r 83e5c3cd6be6 -r 750fab17b299 lisp/ChangeLog
--- a/lisp/ChangeLog	Sat Jan 10 19:43:28 2015 +0900
+++ b/lisp/ChangeLog	Wed Feb 25 11:47:12 2015 +0000
＠＠ -1,3 +1,8 ＠＠
+2015-02-25  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* cl-extra.el (digit-char-p): Moved to data.c.
+	* cl-extra.el (digit-char): Moved to data.c.
+	
 2014-12-31  Michael Sperber  <mike(a)xemacs.org&gt;

 	* simple.el (line-move): Add `noerror' optional argument, as in
diff -r 83e5c3cd6be6 -r 750fab17b299 lisp/cl-extra.el
--- a/lisp/cl-extra.el	Sat Jan 10 19:43:28 2015 +0900
+++ b/lisp/cl-extra.el	Wed Feb 25 11:47:12 2015 +0000
＠＠ -746,32 +746,6 ＠＠
     (char>= . "Return t if the character arguments are monotonically \
 nonincreasing.")))

-(defun* digit-char-p (character &optional (radix 10))
-  "Return non-nil if CHARACTER represents a digit in base RADIX.
-
-RADIX defaults to ten.  The actual non-nil value returned is the integer
-value of the character in base RADIX."
-  (check-type character character)
-  (check-type radix integer)
-  (if (<= radix 10)
-      (and (<= ?0 character (+ ?0 radix -1)) (- character ?0))
-    (or (and (<= ?0 character ?9) (- character ?0))
-	(and (<= ?a character (+ ?a (setq radix (- radix 11))))
-	     (+ character (- 10 ?a)))
-	(and (<= ?A character (+ ?A radix))
-	     (+ character (- 10 ?A))))))
-
-(defun* digit-char (weight &optional (radix 10))
-  "Return a character representing the integer WEIGHT in base RADIX.
-
-RADIX defaults to ten.  If no such character exists, return nil."
-  (check-type weight integer)
-  (check-type radix integer)
-  (and (natnump weight) (< weight radix)
-       (if (< weight 10)
-	   (int-char (+ ?0 weight))
-	 (int-char (+ ?A (- weight 10))))))
-
 (defun alpha-char-p (character)
   "Return t if CHARACTER is alphabetic, in some alphabet.

diff -r 83e5c3cd6be6 -r 750fab17b299 src/ChangeLog
--- a/src/ChangeLog	Sat Jan 10 19:43:28 2015 +0900
+++ b/src/ChangeLog	Wed Feb 25 11:47:12 2015 +0000
＠＠ -1,3 +1,26 ＠＠
+2015-02-25  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* lread.c (read_atom): Use the new calling convention for
+	parse_integer(). 
+	* lisp.h: Change the declaration of parse_integer ().
+	* number.h (bignum_set_emacs_int, make_bignum_emacs_uint):
+	New #defines, used in data.c.
+	* lread.c (read_integer): Ditto.
+	* lread.c (read1): Ditto.
+	* data.c (find_highest_value): New.
+	* data.c (fill_ichar_array): New.
+	* data.c (build_fixnum_to_char_map): New.
+	* data.c (Fset_digit_fixnum_map): New.
+	* data.c (Fdigit_char_p): Moved from cl-extra.el.
+	* data.c (Fdigit_char): Moved from cl-extra.el.
+	* data.c (parse_integer): Moved from lread.c.
+	* data.c (Fparse_integer): Made available to Lisp.
+	* data.c (syms_of_data): Make the new subrs available.
+	* data.c (vars_of_data): Make the new vars available.
+	Expose parse_integer to Lisp, make it follow the Common Lisp API
+	(with some extensions, to allow us to support non ASCII digit
+	characters).
+
 2015-01-08  Stephen J. Turnbull  <stephen(a)xemacs.org&gt;

 	Fix progress bar crashes.
diff -r 83e5c3cd6be6 -r 750fab17b299 src/data.c
--- a/src/data.c	Sat Jan 10 19:43:28 2015 +0900
+++ b/src/data.c	Wed Feb 25 11:47:12 2015 +0000
＠＠ -31,6 +31,7 ＠＠
 #include "gc.h"
 #include "syssignal.h"
 #include "sysfloat.h"
+#include "syntax.h"

 Lisp_Object Qnil, Qt, Qlambda, Qunbound;
 Lisp_Object Qerror_conditions, Qerror_message;
＠＠ -65,6 +66,9 ＠＠

 Lisp_Object Qerror_lacks_explanatory_string;
 Lisp_Object Qfloatp;
+Lisp_Object Q_junk_allowed,  Q_radix, Q_radix_table;
+
+Lisp_Object Vdigit_fixnum_map, Vfixnum_to_char_map;

 Fixnum Vmost_negative_fixnum, Vmost_positive_fixnum;

＠＠ -1432,8 +1436,583 ＠＠
     }
 #endif /* HAVE_BIGNUM */
 }
+
+static int
+find_highest_value (struct chartab_range * range, Lisp_Object UNUSED (table),
+                    Lisp_Object val, void *extra_arg)
+{
+  Lisp_Object *highest_pointer = (Lisp_Object *) extra_arg;
+  Lisp_Object max_seen = *highest_pointer;
+
+  CHECK_FIXNUM (val);
+  if (range->type != CHARTAB_RANGE_CHAR)
+    {
+      invalid_argument ("Not an appropriate char table range", Qunbound);
+    }
+
+  if (XFIXNUM (max_seen) < XFIXNUM (val))
+    {
+      *highest_pointer = val;
+    }
+
+  return 0;
+}
+
+static int
+fill_ichar_array (struct chartab_range *range, Lisp_Object UNUSED (table),
+                  Lisp_Object val, void *extra_arg)
+{
+  Ichar *cctable = (Ichar *) extra_arg;
+  EMACS_INT valint = XFIXNUM (val);
+
+  /* Save the value if it hasn't been seen yet. */
+  if (-1 == cctable[valint])
+    {
+      cctable[valint] = range->ch;
+    }
+  else
+    {
+      /* Otherwise, save it if the existing value is not uppercase, and this
+	 one is. Use the standard case table rather than any buffer-specific
+	 one because a) this can be called early before current_buffer is
+	 available and b) it's better to have these independent of particular
+	 buffer case tables. */
+      if (current_buffer != NULL && UPCASE (0, range->ch) == range->ch
+          && UPCASE (0, cctable[valint]) != cctable[valint])
+	{
+	  cctable[valint] = range->ch;
+	}
+      /* Maybe our own case infrastructure is not available yet. Use the C
+         library's. */
+      else if (isupper (range->ch) && !isupper (cctable[valint]))
+        {
+	  cctable[valint] = range->ch;          
+        }
+      /* Otherwise, save it if this character has a numerically lower value
+         (preferring ASCII over fullwidth Chinese and so on). */
+      else if (range->ch < cctable[valint])
+	{
+	  cctable[valint] = range->ch;
+	}
+    }
+
+  return 0;
+}
+
+static Lisp_Object
+build_fixnum_to_char_map (Lisp_Object radix_table)
+{
+  Lisp_Object highest_value, result;
+  struct chartab_range ctr = { CHARTAB_RANGE_ALL, 0, Qnil, 0 };
+  Ichar *cctable;
+  EMACS_INT ii, cclen;
+  Ibyte *data;
+
+  /* What's the greatest fixnum value seen? In passing, check all the char
+     table values are fixnums. */
+  CHECK_FIXNUM (XCHAR_TABLE (radix_table)->default_);
+  highest_value = XFIXNUM (XCHAR_TABLE (radix_table)->default_);
+  map_char_table (radix_table, &ctr, find_highest_value, &highest_value);
+  cclen = XFIXNUM (highest_value) + 1;
+
+  cctable = malloc (sizeof (Ichar) * cclen);
+  if (cctable == NULL)
+    {
+      out_of_memory ("Could not allocate data for `digit-char'",
Qunbound);
+    }
+
+  for (ii = 0; ii < cclen; ++ii)
+    {
+      cctable[ii] = (Ichar) -1;
+    }
+
+  map_char_table (radix_table, &ctr, fill_ichar_array, cctable);
+
+  for (ii = 0; ii < cclen; ++ii)
+    {
+      if (cctable[ii] < 0)
+	{
+	  free (cctable);
+	  invalid_argument ("No digit specified for weight", make_fixnum (ii));
+	}
+    }
+
+  result = Fmake_string (make_fixnum (cclen * MAX_ICHAR_LEN), make_char (0));
+
+  data = XSTRING_DATA (result);
+  for (ii = 0; ii < cclen; ii++)
+    {
+      set_itext_ichar (data + (MAX_ICHAR_LEN * ii), cctable[ii]);
+    }
+
+  init_string_ascii_begin (result);
+  bump_string_modiff (result);
+  sledgehammer_check_ascii_begin (result);
+
+  free (cctable);
+
+  return result;
+}
+
+DEFUN ("set-digit-fixnum-map", Fset_digit_fixnum_map, 1, 1, 0, /*
+Set the value of `digit-fixnum-map', which see.
+
+Also check that RADIX-TABLE is well-formed from the perspective of
+`parse-integer' and `digit-char-p', and create an internal inverse mapping
+for `digit-char', so that all three functions behave consistently.
+
+RADIX-TABLE itself is not saved, a read-only copy of it is made and returned.
+*/
+       (radix_table))
+{
+  Lisp_Object ftctable = Qnil;
+
+  CHECK_CHAR_TABLE (radix_table);
+
+  /* Create a table for `digit-char', checking the consistency of
+     radix_table while doing so. */
+  ftctable = build_fixnum_to_char_map (radix_table);
+
+  Vdigit_fixnum_map = Fcopy_char_table (radix_table);
+  LISP_READONLY (Vdigit_fixnum_map) = 1;
+  Vfixnum_to_char_map = ftctable;
+
+  return Vdigit_fixnum_map;
+}
+
+DEFUN ("digit-char-p", Fdigit_char_p, 1, 3, 0, /*
+Return non-nil if CHARACTER represents a digit in base RADIX.
+
+RADIX defaults to ten.  The actual non-nil value returned is the integer
+value of the character in base RADIX.
+
+RADIX-TABLE, if non-nil, is a character table describing characters' numeric
+values. See `parse-integer' and `digit-fixnum-map'.
+*/
+       (character, radix, radix_table))
+{
+  Lisp_Object got = Qnil;
+  EMACS_INT radixing, val;
+  Ichar cc;
+
+  CHECK_CHAR (character);
+  cc = XCHAR (character);
+
+  if (!NILP (radix))
+    {
+      check_integer_range (radix, Qzero,
+                           NILP (radix_table) ?
+                           /* If we are using the default radix table, the
+                              maximum possible value for the radix is
+                              available to us now. */
+                           make_fixnum
+                           (XSTRING_LENGTH (Vfixnum_to_char_map)
+                            / MAX_ICHAR_LEN)
+                           /* Otherwise, calculating that is expensive. Check
+                              at least that the radix is not a bignum, the
+                              maximum count of characters available will not
+                              exceed the size of a fixnum. */
+: make_fixnum (MOST_POSITIVE_FIXNUM));
+      radixing = XFIXNUM (radix);
+    }
+  else
+    {
+      radixing = 10;
+    }
+
+  if (NILP (radix_table))
+    {
+      radix_table = Vdigit_fixnum_map;
+    }
+
+  got = get_char_table (cc, radix_table);
+  CHECK_FIXNUM (got);
+  val = XFIXNUM (got);
+
+  if (val < 0 || val >= radixing)
+    {
+      return Qnil;
+    }
+
+  return make_fixnum (val);
+}
+
+DEFUN ("digit-char", Fdigit_char, 1, 3, 0, /*
+Return a character representing the integer WEIGHT in base RADIX.
+
+RADIX defaults to ten.  If no such character exists, return nil. `digit-char'
+prefers an upper case character if available.  RADIX must be a non-negative
+integer of value less than the maximum value in RADIX-TABLE.
+
+RADIX-TABLE, if non-nil, is a character table describing characters' numeric
+values. It defaults to the value of `digit-fixnum-map'; see the documentation
+for that variable and for `parse-integer'. This is not specified by Common
+Lisp, and using a value other than the default in `digit-char' is expensive,
+since the inverse map needs to be calculated.
+*/
+       (weight, radix, radix_table))
+{
+  EMACS_INT radixing = 10, weighting;
+  Lisp_Object fixnum_to_char_table = Qnil;
+  Ichar cc;
+
+  CHECK_NATNUM (weight);
+
+  if (!NILP (radix_table) && !EQ (radix_table, Vdigit_fixnum_map))
+    {
+      CHECK_CHAR_TABLE (radix_table);
+      /* The result of this isn't GCPROd, but the rest of this function
+	 won't GC and continue. */
+      fixnum_to_char_table = build_fixnum_to_char_map (radix_table);
+    }
+  else
+    {
+      fixnum_to_char_table = Vfixnum_to_char_map;
+    }
+
+  if (!NILP (radix))
+    {
+      check_integer_range (radix, Qzero,
+                           make_fixnum (XSTRING_LENGTH (fixnum_to_char_table)
+                                        / MAX_ICHAR_LEN));
+      radixing = XFIXNUM (radix);
+    }
+
+  /* If weight is in its canonical form (and there's no reason to think it
+     isn't), Vfixnum_to_char_map can't be long enough to handle
+     this. */
+  if (BIGNUMP (weight))
+    {
+      return Qnil;
+    }
+
+  weighting = XFIXNUM (weight);
+
+  if (weighting < radixing)
+    {
+      cc = itext_ichar (XSTRING_DATA (fixnum_to_char_table)
+			+ MAX_ICHAR_LEN * weighting);
+      return make_char (cc);
+    }
+
+  return Qnil;
+}
+
+Lisp_Object
+parse_integer (const Ibyte *buf, Ibyte **buf_end_out, Bytecount len,
+               EMACS_INT base, Boolint junk_allowed, Lisp_Object radix_table)
+{
+  const Ibyte *lim = buf + len, *p = buf;
+  EMACS_UINT num = 0, onum = (EMACS_UINT) -1;
+  EMACS_UINT fixnum_limit = MOST_POSITIVE_FIXNUM;
+  EMACS_INT cint = 0;
+  Boolint negativland = 0;
+  Ichar c = -1;
+  Lisp_Object result = Qnil, got = Qnil;
+
+  if (NILP (radix_table))
+    {
+      radix_table = Vdigit_fixnum_map;
+    }
+
+  /* This function ignores the current buffer's syntax table.
+     Respecting it will probably introduce more bugs than it fixes. */
+  update_mirror_syntax_if_dirty (XCHAR_TABLE (Vstandard_syntax_table)->
+                                 mirror_table);
+
+  /* Ignore leading whitespace, if that leading whitespace has no
+     numeric value. */
+  while (p < lim)
+    {
+      c = itext_ichar (p);
+      if (!(((got = get_char_table (c, radix_table), FIXNUMP (got))
+             && ((cint = XFIXNUM (got), cint < 0) || cint >= base))
+            && (SYNTAX (XCHAR_TABLE (Vstandard_syntax_table)->mirror_table,
+                        c) == Swhitespace)))
+        {
+          break;
+        }
+
+      INC_IBYTEPTR (p);
+    }
+
+  /* Drop sign information if appropriate. */
+  if (c == '-')
+    {
+      negativland = 1;
+      fixnum_limit = - MOST_NEGATIVE_FIXNUM;
+      INC_IBYTEPTR (p);
+    }
+  else if (c == '+')
+    {
+      got = get_char_table (c, radix_table);
+      cint = FIXNUMP (got) ? XFIXNUM (got) : -1;
+      /* If ?+ has no integer weight, drop it. */
+      if (cint < 0 || cint >= base)
+        {
+          INC_IBYTEPTR (p);
+        }
+    }
+
+  while (p < lim)
+    {
+      c = itext_ichar (p);
+      
+      got = get_char_table (c, radix_table);
+      if (!FIXNUMP (got))
+        {
+          goto loser;
+        }
+
+      cint = XFIXNUM (got);
+
+      if (cint < 0 || cint >= base)
+        {
+          goto loser;
+        }
+
+      onum = num;
+      num *= base;
+      if (num > fixnum_limit)
+        {
+          goto overflow;
+        }
+
+      num += cint;
+      if (num > fixnum_limit)
+        {
+          goto overflow;
+        }
+
+      INC_IBYTEPTR (p);
+    }
+
+  if (onum == (EMACS_UINT) -1)
+    {
+      /* No digits seen, we may need to error. */
+      goto loser;
+    }
+
+  if (negativland)
+    {
+      result = make_fixnum (- (EMACS_INT) num);
+    }
+  else
+    {
+      result = make_fixnum (num);
+    }
+
+  *buf_end_out = (Ibyte *) p;
+  return result;
+
+ overflow:
+#ifndef HAVE_BIGNUM
+  return Fsignal (Qinvalid_argument,
+                  list3 (build_msg_string ("Integer constant overflow"),
+                         make_string (buf, len), make_fixnum (base)));
+
+#else /* HAVE_BIGNUM */
+  result = make_bignum_emacs_uint (onum);
+
+  bignum_set_emacs_int (scratch_bignum, base);
+  bignum_set_emacs_int (scratch_bignum2, cint);
+  bignum_mul (XBIGNUM_DATA (result), XBIGNUM_DATA (result), scratch_bignum);
+  bignum_add (XBIGNUM_DATA (result), XBIGNUM_DATA (result), scratch_bignum2);
+  INC_IBYTEPTR (p);
+
+  assert (!bignum_fits_emacs_int_p (XBIGNUM_DATA (result))
+          || (fixnum_limit
+              < (EMACS_UINT) bignum_to_emacs_int (XBIGNUM_DATA (result))));
+
+  while (p < lim)
+    {
+      c = itext_ichar (p);    
+
+      got = get_char_table (c, radix_table);
+      if (!FIXNUMP (got))
+	{
+	  goto loser;
+	}
+
+      cint = XFIXNUM (got);
+      if (cint < 0 || cint >= base)
+	{
+	  goto loser;
+	}
+
+      bignum_set_emacs_int (scratch_bignum2, cint);
+      bignum_mul (XBIGNUM_DATA (result), XBIGNUM_DATA (result),
+                  scratch_bignum);
+      bignum_add (XBIGNUM_DATA (result), XBIGNUM_DATA (result),
+                  scratch_bignum2);
+
+      INC_IBYTEPTR (p);
+    }
+
+  if (negativland)
+    {
+      bignum_set_long (scratch_bignum, -1L);
+      bignum_mul (XBIGNUM_DATA (result), XBIGNUM_DATA (result),
+                  scratch_bignum);
+    }
+
+  *buf_end_out = (Ibyte *)  p;
+  return result;
+#endif /* HAVE_BIGNUM */
+ loser:
+
+  if (p < lim && !junk_allowed)
+    {
+      /* JUNK-ALLOWED is zero. If we have stopped parsing because we
+	 encountered whitespace, then we need to check that the rest if the
+	 string is whitespace and whitespace alone if we are not to error.
+
+         Perhaps surprisingly, if JUNK-ALLOWED is zero, the parse is regarded
+         as including the trailing whitespace, so the second value returned is
+         always the length of the string. */
+      while (p < lim)
+	{
+	  c = itext_ichar (p);
+	  if (!(SYNTAX (XCHAR_TABLE (Vstandard_syntax_table)->mirror_table, c)
+                == Swhitespace))
+	    {
+	      break;
+	    }
+
+	  INC_IBYTEPTR (p);
+	}
+    }
+
+  *buf_end_out = (Ibyte *) p;
+
+  if (junk_allowed || (p == lim && onum != (EMACS_UINT) -1))
+    {
+
+#ifdef HAVE_BIGNUM
+      if (!NILP (result))
+        {
+          /* Bignum terminated by whitespace or by non-digit. */
+          return Fcanonicalize_number (result);
+        }
+#endif
+
+      if (onum == (EMACS_UINT) -1)
+        {
+          /* No integer digits seen, but junk allowed, so no indication to
+             error. Return nil. */
+          return Qnil;
+        }
+
+      if (negativland)
+        {
+          assert ((- (EMACS_INT) num) >= MOST_NEGATIVE_FIXNUM);
+          result = make_fixnum (- (EMACS_INT) num);
+        }
+      else
+        {
+          assert ((EMACS_INT) num <= MOST_POSITIVE_FIXNUM);
+          result = make_fixnum (num);
+        }
+    
+      return result;
+    }
+    
+  return Fsignal (Qinvalid_argument,
+                  list3 (build_msg_string ("Invalid integer syntax"),
+                         make_string (buf, len), make_fixnum (base)));
+}
+
+DEFUN ("parse-integer", Fparse_integer, 1, MANY, 0, /*
+Parse and return the integer represented by STRING using RADIX.
+
+START and END are bounding index designators, as used in `remove*'.  START
+defaults to 0 and END defaults to nil, meaning the end of STRING.
+
+If JUNK-ALLOWED is nil, error if STRING does not consist in its entirety of
+the representation of an integer, with or without surrounding whitespace
+characters.
+
+If RADIX-TABLE is non-nil, it is a char table mapping from characters to
+fixnums used with RADIX. Otherwise, `digit-fixnum-map' provides the
+correspondence to use.
+
+RADIX must always be a non-negative fixnum.  RADIX-TABLE constrains its
+possible values further, and the maximum RADIX available is always the largest
+positive value available RADIX-TABLE.
+
+arguments: (STRING &key (START 0) end (RADIX 10) junk-allowed radix-table)
+*/
+       (int nargs, Lisp_Object *args))
+{
+  Lisp_Object string = args[0], result;
+  Charcount starting = 0, ending = MOST_POSITIVE_FIXNUM + 1, ii = 0;
+  Bytecount byte_len;
+  Ibyte *startp, *cursor, *end_read, *limit, *saved_start;
+  EMACS_INT radixing;
+
+  PARSE_KEYWORDS (Fparse_integer, nargs, args, 5,
+                  (start, end, radix, junk_allowed, radix_table),
+                  (start = Qzero, radix = make_fixnum (10)));
+
+  CHECK_STRING (string);
+  CHECK_NATNUM (start);
+  starting = BIGNUMP (start) ? 1 + MOST_POSITIVE_FIXNUM : XFIXNUM (start); 
+  if (!NILP (end))
+    {
+      CHECK_NATNUM (end);
+      ending = BIGNUMP (end) ? 1 + MOST_POSITIVE_FIXNUM : XFIXNUM (end);
+    }
+
+  if (!NILP (radix_table))
+    {
+      CHECK_CHAR_TABLE (radix_table);
+    }
+  else
+    {
+      radix_table = Vdigit_fixnum_map;
+    }
+
+  check_integer_range (radix, Qzero,
+                       EQ (radix_table, Vdigit_fixnum_map) ?
+                       make_fixnum (XSTRING_LENGTH (Vfixnum_to_char_map)
+                                    / MAX_ICHAR_LEN)
+                       /* Non-default radix table; calculating the upper limit
+                          is is expensive. Check at least that the radix is
+                          not a bignum, the maximum count of characters
+                          available in our XEmacs will not exceed the size of
+                          a fixnum. */
+: make_fixnum (MOST_POSITIVE_FIXNUM));
+  radixing = XFIXNUM (radix);
+
+  startp = cursor = saved_start = XSTRING_DATA (string);
+  byte_len = XSTRING_LENGTH (string);
+  limit = startp + byte_len;
+
+  while (cursor < limit && ii < ending)
+    {
+      INC_IBYTEPTR (cursor);
+      if (ii < starting)
+        {
+          startp = cursor;
+        }
+      ii++;
+    }
+
+  if (ii < starting || (ii < ending && !NILP (end)))
+    {
+      check_sequence_range (string, start, end, Flength (string));
+    }
+
+  result = parse_integer (startp, &end_read, cursor - startp, radixing,
+                          !NILP (junk_allowed), radix_table);
+
+  /* This code hasn't been written to handle relocating string data. */ 
+  assert (saved_start == XSTRING_DATA (string));
+
+  return values2 (result, make_fixnum (string_index_byte_to_char
+                                       (string, end_read - saved_start)));
+}

-
 DEFUN ("+", Fplus, 0, MANY, 0, /*
 Return sum of any number of arguments.
 The arguments should all be numbers, characters or markers.
＠＠ -3539,6 +4118,10 ＠＠
   DEFSYMBOL_MULTIWORD_PREDICATE (Qweak_listp);
   DEFSYMBOL (Qfloatp);

+  DEFKEYWORD (Q_radix);
+  DEFKEYWORD (Q_junk_allowed);
+  DEFKEYWORD (Q_radix_table);
+
   DEFSUBR (Fwrong_type_argument);

 #ifdef HAVE_RATIO
＠＠ -3595,6 +4178,10 ＠＠

   DEFSUBR (Fnumber_to_string);
   DEFSUBR (Fstring_to_number);
+  DEFSUBR (Fset_digit_fixnum_map);
+  DEFSUBR (Fdigit_char_p);
+  DEFSUBR (Fdigit_char);
+  DEFSUBR (Fparse_integer);
   DEFSUBR (Feqlsign);
   DEFSUBR (Flss);
   DEFSUBR (Fgtr);
＠＠ -3659,6 +4246,53 ＠＠
 */);
   Vmost_positive_fixnum = MOST_POSITIVE_FIXNUM;

+  DEFVAR_CONST_LISP ("digit-fixnum-map", &Vdigit_fixnum_map /*
+Table used to determine a character's numeric value when parsing.
+
+This is a character table with fixnum values. A value of -1 indicates this
+character does not have an assigned numeric value. See `parse-integer',
+`digit-char-p', and `digit-char'.
+*/);
+  Vdigit_fixnum_map = Fmake_char_table (Qgeneric);
+  set_char_table_default (Vdigit_fixnum_map, make_fixnum (-1));
+  {
+    int ii = 0;
+
+    for (ii = 0; ii < 10; ++ii)
+      {
+        XCHAR_TABLE (Vdigit_fixnum_map)->ascii['0' + ii] = make_fixnum(ii);
+      }
+
+    for (ii = 10; ii < 36; ++ii)
+      {
+        XCHAR_TABLE (Vdigit_fixnum_map)->ascii['a' + (ii - 10)]
+          = make_fixnum(ii);
+        XCHAR_TABLE (Vdigit_fixnum_map)->ascii['A' + (ii - 10)]
+          = make_fixnum(ii);
+      }
+  }
+  {
+    Ascbyte *fixnum_tab = alloca_ascbytes (36 * MAX_ICHAR_LEN), *ptr;
+    int ii;
+    Ichar cc;
+    memset ((void *)fixnum_tab, 0, 36 * MAX_ICHAR_LEN);
+
+    /* The whole point of fixnum_to_character_table is access as an array,
+       avoid O(N) issues by giving every character MAX_ICHAR_LEN of
+       bytes.  */
+    for (ii = 0, ptr = fixnum_tab; ii < 36; ++ii, ptr += MAX_ICHAR_LEN)
+      {
+	cc = ii < 10 ? '0' + ii : 'A' + (ii - 10);
+	set_itext_ichar ((Ibyte *) ptr, cc);
+      }
+
+    /* Sigh, we can't call build_fixnum_to_char_map() on Vdigit_fixnum_map,
+       this is too early in the boot sequence to map across a char table. Do
+       it by hand. */
+    Vfixnum_to_char_map = build_ascstring (fixnum_tab);
+    staticpro (&Vfixnum_to_char_map);
+  }
+
 #ifdef DEBUG_XEMACS
   DEFVAR_BOOL ("debug-issue-ebola-notices", &debug_issue_ebola_notices /*
 If non-zero, note when your code may be suffering from char-int confoundance.
diff -r 83e5c3cd6be6 -r 750fab17b299 src/lisp.h
--- a/src/lisp.h	Sat Jan 10 19:43:28 2015 +0900
+++ b/src/lisp.h	Wed Feb 25 11:47:12 2015 +0000
＠＠ -4594,6 +4594,10 ＠＠
 Lisp_Object word_to_lisp (unsigned int);
 unsigned int lisp_to_word (Lisp_Object);

+Lisp_Object parse_integer (const Ibyte *buf, Ibyte **buf_end_out,
+			   Bytecount len, EMACS_INT base,
+			   Boolint junk_allowed, Lisp_Object base_table);
+
 extern Lisp_Object Qarrayp, Qbitp, Qchar_or_string_p, Qcharacterp,
     Qerror_conditions, Qerror_message, Qinteger_char_or_marker_p,
     Qinteger_or_char_p, Qinteger_or_marker_p, Qlambda, Qlistp, Qnatnump,
diff -r 83e5c3cd6be6 -r 750fab17b299 src/lread.c
--- a/src/lread.c	Sat Jan 10 19:43:28 2015 +0900
+++ b/src/lread.c	Wed Feb 25 11:47:12 2015 +0000
＠＠ -1922,8 +1922,6 ＠＠
   return Lstream_byte_count (XLSTREAM (Vread_buffer_stream)) - 1;
 }

-static Lisp_Object parse_integer (const Ibyte *buf, Bytecount len, int base);
-
 static Lisp_Object
 read_atom (Lisp_Object readcharfun,
            Ichar firstchar,
＠＠ -1958,23 +1956,15 ＠＠
 	    p1++;
           if (p1 == p)
             {
+	      Ibyte *buf_end;
               /* It is an integer. */
 	      if (p1[-1] == '.')
-		p1[-1] = '\0';
-#if 0
-	      {
-		int number = 0;
-		if (sizeof (int) == sizeof (EMACS_INT))
-		  number = atoi (read_buffer);
-		else if (sizeof (long) == sizeof (EMACS_INT))
-		  number = atol (read_buffer);
-		else
-		  ABORT ();
-		return make_fixnum (number);
-	      }
-#else
-              return parse_integer ((Ibyte *) read_ptr, len, 10);
-#endif
+                {
+                  len -= 1;
+                }
+
+              return parse_integer ((Ibyte *) read_ptr, &buf_end, len, 10,
+                                    0, Qnil);
 	    }
 	}
 #ifdef HAVE_RATIO
＠＠ -2011,97 +2001,16 ＠＠
   }
 }

-
-static Lisp_Object
-parse_integer (const Ibyte *buf, Bytecount len, int base)
-{
-  const Ibyte *lim = buf + len;
-  const Ibyte *p = buf;
-  EMACS_UINT num = 0;
-  int negativland = 0;
-
-  if (*p == '-')
-    {
-      negativland = 1;
-      p++;
-    }
-  else if (*p == '+')
-    {
-      p++;
-      /* GMP deals with a leading plus sign, badly, make sure it doesn't see
-	 it. */
-      buf++;
-    }
-
-  if (p == lim)
-    goto loser;
-
-  for (; (p < lim) && (*p != '\0'); p++)
-    {
-      int c = *p;
-      EMACS_UINT onum;
-
-      if (isdigit (c))
-	c = c - '0';
-      else if (isupper (c))
-	c = c - 'A' + 10;
-      else if (islower (c))
-	c = c - 'a' + 10;
-      else
-	goto loser;
-
-      if (c < 0 || c >= base)
-	goto loser;
-
-      onum = num;
-      num = num * base + c;
-      if (num < onum)
-	goto overflow;
-    }
-
-  {
-    EMACS_INT int_result = negativland ? - (EMACS_INT) num : (EMACS_INT) num;
-    Lisp_Object result = make_fixnum (int_result);
-    if (num && ((XFIXNUM (result) < 0) != negativland))
-      goto overflow;
-    if (XFIXNUM (result) != int_result)
-      goto overflow;
-    return result;
-  }
- overflow:
-#ifdef HAVE_BIGNUM
-  {
-    bignum_set_string (scratch_bignum, (const char *) buf, base);
-    return make_bignum_bg (scratch_bignum);
-  }
-#else
-  return Fsignal (Qinvalid_read_syntax,
-                  list3 (build_msg_string
-			 ("Integer constant overflow in reader"),
-                         make_string (buf, len),
-                         make_fixnum (base)));
-#endif /* HAVE_BIGNUM */
- loser:
-  return Fsignal (Qinvalid_read_syntax,
-                  list3 (build_msg_string
-			 ("Invalid integer constant in reader"),
-                         make_string (buf, len),
-                         make_fixnum (base)));
-}
-
-
 static Lisp_Object
 read_integer (Lisp_Object readcharfun, int base)
 {
   /* This function can GC */
   int saw_a_backslash;
+  Ibyte *buf_end;
   Bytecount len = read_atom_0 (readcharfun, -1, &saw_a_backslash);
   return (parse_integer
 	  (resizing_buffer_stream_ptr (XLSTREAM (Vread_buffer_stream)),
-	   ((saw_a_backslash)
-	    ? 0 /* make parse_integer signal error */
-	    : len),
-	   base));
+           &buf_end, len, base, 0, Qnil));
 }

 static Lisp_Object
＠＠ -2700,6 +2609,7 ＠＠
 	    /* Reader forms that can reuse previously read objects.  */
 	    {
 	      Lisp_Object parsed, found;
+	      Ibyte *buf_end;

 	      Lstream_rewind (XLSTREAM (Vread_buffer_stream));

＠＠ -2718,10 +2628,10 ＠＠

 	      parsed
 		= parse_integer (resizing_buffer_stream_ptr
-				 (XLSTREAM (Vread_buffer_stream)),
+				 (XLSTREAM (Vread_buffer_stream)), &buf_end,
 				 Lstream_byte_count (XLSTREAM
 						     (Vread_buffer_stream))
-				 - 1, 10);
+				 - 1, 10, 0, Qnil);

 	      found = assoc_no_quit (parsed, Vread_objects);
 	      if (c == '=')
diff -r 83e5c3cd6be6 -r 750fab17b299 src/number.h
--- a/src/number.h	Sat Jan 10 19:43:28 2015 +0900
+++ b/src/number.h	Wed Feb 25 11:47:12 2015 +0000
＠＠ -101,12 +101,18 ＠＠
 #if SIZEOF_EMACS_INT == SIZEOF_LONG
 # define bignum_fits_emacs_int_p(b) bignum_fits_long_p(b)
 # define bignum_to_emacs_int(b) bignum_to_long(b)
+# define bignum_set_emacs_int bignum_set_long
+# define make_bignum_emacs_uint(b) make_bignum_un(b)
 #elif SIZEOF_EMACS_INT == SIZEOF_INT
 # define bignum_fits_emacs_int_p(b) bignum_fits_int_p(b)
 # define bignum_to_emacs_int(b) bignum_to_int(b)
+# define bignum_set_emacs_int bignum_set_long
+# define make_bignum_emacs_uint(b) make_bignum_un(b)
 #else
 # define bignum_fits_emacs_int_p(b) bignum_fits_llong_p(b)
 # define bignum_to_emacs_int(b) bignum_to_llong(b)
+# define bignum_set_emacs_int bignum_set_llong
+# define make_bignum_emacs_uint(b) make_bignum_ull(b)
 #endif

 extern Lisp_Object make_bignum (long);
diff -r 83e5c3cd6be6 -r 750fab17b299 tests/ChangeLog
--- a/tests/ChangeLog	Sat Jan 10 19:43:28 2015 +0900
+++ b/tests/ChangeLog	Wed Feb 25 11:47:12 2015 +0000
＠＠ -1,3 +1,13 ＠＠
+2015-02-25  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* automated/lisp-tests.el:
+	parse_integer(), used in #'read, now signals invalid-argument
+	rather than invalid-read-syntax, check for that.
+	* automated/lisp-tests.el:
+	Check #'parse-integer now it's available to Lisp, check
+	#'digit-char, #'digit-char-p and the congruence in behaviour,
+	check the XEmacs-specific RADIX-TABLE argument behaviour.
+
 2014-10-11  Stephen J. Turnbull  <stephen(a)xemacs.org&gt;

 	* automated/keymap-tests.el:
diff -r 83e5c3cd6be6 -r 750fab17b299 tests/automated/lisp-tests.el
--- a/tests/automated/lisp-tests.el	Sat Jan 10 19:43:28 2015 +0900
+++ b/tests/automated/lisp-tests.el	Wed Feb 25 11:47:12 2015 +0000
＠＠ -1473,8 +1473,8 ＠＠
     (progn
       (Check-Error wrong-type-argument (format "%u" most-negative-fixnum))
       (Check-Error wrong-type-argument (format "%u" -1)))
-  (Check-Error invalid-read-syntax (read (format "%u" most-negative-fixnum)))
-  (Check-Error invalid-read-syntax (read (format "%u" -1))))
+  (Check-Error invalid-argument (read (format "%u" most-negative-fixnum)))
+  (Check-Error invalid-argument (read (format "%u" -1))))

 ;; Check all-completions ignore element start with space.
 (Assert (not (all-completions "" '((" hidden" .
"object")))))
＠＠ -3451,4 +3451,126 ＠＠
   (test-write-string write-string :sequences-too nil)
   (test-write-string write-line :worry-about-newline t :sequences-too nil))

+;;-----------------------------------------------------
+;; Test #'parse-integer and friends.
+;;-----------------------------------------------------
+
+(Check-Error wrong-type-argument (parse-integer 123456789))
+(Check-Error wrong-type-argument (parse-integer "123456789" :start -1))
+(if (featurep 'bignum)
+    (progn
+      (Check-Error args-out-of-range
+                   (parse-integer "123456789" :start (1+
most-positive-fixnum)))
+      (Check-Error args-out-of-range
+                   (parse-integer "123456789" :end (1+
most-positive-fixnum))))
+  (Check-Error wrong-type-argument
+               (parse-integer "123456789" :start (1+ most-positive-fixnum)))
+  (Check-Error wrong-type-argument
+               (parse-integer "123456789" :end (1+ most-positive-fixnum))))
+
+(Check-Error args-out-of-range (parse-integer "123456789" :radix -1))
+(Check-Error args-out-of-range
+             (parse-integer "123456789" :radix (1+ most-positive-fixnum)))
+(Check-Error wrong-number-of-arguments
+             (parse-integer "123456789" :junk-allowed))
+(Check-Error invalid-keyword-argument
+             (parse-integer "123456789" :no-such-keyword t))
+
+;; Next two paragraphs of tests from GNU, thank you Leo Liu.
+(Assert (eql (digit-char-p ?3) 3))
+(Assert (eql (digit-char-p ?a 11) 10))
+(Assert (eql (digit-char-p ?w 36) 32))
+(Assert (not (digit-char-p ?a)))
+(Check-Error args-out-of-range (digit-char-p ?a 37))
+(Assert (not (digit-char-p ?a 1)))
+
+(Assert (equal (multiple-value-list (parse-integer "	-123  ")) '(-123 7)))
+(Assert (equal (multiple-value-list
+                   (parse-integer "-efz" :radix 16 :junk-allowed t))
+               '(-239 3)))
+(Assert (equal (multiple-value-list (parse-integer "zzef" :radix 16 :start 2))
+               '(239 4)))
+(Assert (equal (multiple-value-list
+                   (parse-integer "0123456789" :radix 8 :junk-allowed t))
+               '(342391 8)))
+(Assert (equal (multiple-value-list (parse-integer "" :junk-allowed t))
+               '(nil 0)))
+(Assert (equal (multiple-value-list (parse-integer "abc" :junk-allowed t))
+               '(nil 0)))
+(Check-Error invalid-argument (parse-integer "0123456789" :radix 8))
+(Check-Error invalid-argument (parse-integer "abc"))
+(Check-Error invalid-argument (parse-integer "efz" :radix 16))
+
+;; We don't allow a trailing decimal point, as the Lisp reader does.
+(Check-Error invalid-argument (parse-integer "12348."))
+
+;; In contravention of Common Lisp, we allow both 0 and 1 as values for RADIX,
+;; useless as that is.
+(Assert (equal (multiple-value-list (parse-integer "00000" :radix 1)) '(0
5))
+        "checking 1 is allowed as a value for RADIX")
+(Assert (equal (multiple-value-list
+                   (parse-integer "" :radix 0 :junk-allowed t))
+               '(nil 0))
+        "checking 0 is allowed as a value for RADIX")
+
+(let ((binary-table
+       (copy-char-table #s(char-table :type generic :default -1 :data ()))))
+  (loop for fixnum from 00 to #xff
+        do (put-char-table (int-char fixnum) fixnum binary-table))
+  (Assert (eql most-positive-fixnum
+               (parse-integer
+                (concatenate 'string "\x3f"
+                             (make-string
+                              (/ (- (integer-length most-positive-fixnum)
+                                    (integer-length #x3f)) 8)
+                              ?\xff))
+:radix-table binary-table :radix #x100))
+          "checking parsing text using base 256 (big endian binary) works")
+  (Assert (equal
+           (multiple-value-list 
+               (parse-integer " \1\7\1\7 " :radix-table binary-table))
+           '(1717 6))
+          "checking whitespace treated as such when it is not < radix")
+  (Assert (equal
+           (multiple-value-list 
+               (parse-integer " \1\7\1\7 " :radix-table binary-table
+:junk-allowed t))
+           '(1717 5))
+          "checking whitespace treated as junk when it is not < radix")
+  (Check-Error invalid-argument
+               (parse-integer "1234" :radix-table binary-table))
+  (Assert (equal
+           (multiple-value-list
+               (parse-integer "--" :radix-table binary-table :radix #x100))
+           '(-45 2))
+          "checking ?- always treated as minus sign initially")
+  (Assert (equal
+           (multiple-value-list
+               (parse-integer "+20" :radix-table binary-table :radix #x100))
+           '(2830896 3))
+          "checking ?+ not dropped initially if it has integer weight")
+  (Assert (eql #xff (digit-char-p ?� #x100 binary-table))
+          "checking `digit-char-p' behaves correctly with base 256")
+  (Assert (eql ?\xff (digit-char #xff #x100 binary-table))
+          "checking `digit-char' behaves correctly with base 256")
+  (Assert (eql (parse-integer " " :radix-table binary-table :radix #x100)
+               #x20)
+          "checking whitespace not treated as such when it has fixnum weight")
+  (Assert (null (digit-char-p ?0 nil binary-table))
+          "checking `digit-char-p' reflects RADIX-TABLE, ?0")
+  (Assert (null (digit-char-p ?9 nil binary-table))
+          "checking `digit-char-p' reflects RADIX-TABLE, ?9")
+  (Assert (null (digit-char-p ?a 16 binary-table))
+          "checking `digit-char-p' reflects RADIX-TABLE, ?a")
+  (Assert (eql ?� (digit-char #xff #x100 binary-table))
+          "checking `digit-char' reflects RADIX-TABLE, #xff")
+  (Assert (eql ?a (digit-char #x61 #x100 binary-table))
+          "checking `digit-char' reflects RADIX-TABLE, #x61")
+  (Assert (null (digit-char #xff nil binary-table))
+          "checking `digit-char' reflects RADIX-TABLE, #xff, base 10")
+  (Assert (eql ?\x0a (digit-char 10 16 binary-table))
+          "checking `digit-char' reflects RADIX-TABLE, 10, base 16")
+  (Assert (eql ?\x09 (digit-char 9 nil binary-table))
+          "checking `digit-char' reflects RADIX-TABLE, 9, base 10"))
+
 ;;; end of lisp-tests.el

-- 
‘Tramadol is further fed to cattle […] when working them […] (as draft
animals) so that the animals do not get tired quickly. …’
	— Angewandte Chemie, Sept 2014, describing the social context of
	(synthetic) tramadol having been found in Cameroon tree roots.

_______________________________________________
XEmacs-Patches mailing list
XEmacs-Patches(a)xemacs.org
http://lists.xemacs.org/mailman/listinfo/xemacs-patches

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

[PATCH] Add #'parse-integer to data.c