----------------------------
revision 1.171
date: 2001/01/25 01:44:54; author: monnier; state: Exp; lines: +1 -0
(mutually_exclusive_p): Add missing `break' at the end of `charset' processing.
----------------------------
revision 1.170
date: 2001/01/24 23:11:40; author: monnier; state: Exp; lines: +19 -6
(mutually_exclusive_p): Don't blindly handle `charset_not' as if it was a `charset'.
----------------------------
revision 1.169
date: 2000/10/30 15:20:17; author: monnier; state: Exp; lines: +40 -41
(re_iswctype, re_wctype_to_bit): Fix braino.
(regex_compile): Catch bogus \(\1\).
----------------------------
revision 1.168
date: 2000/10/27 13:29:36; author: monnier; state: Exp; lines: +14 -13
(POP_FAILURE_REG_OR_COUNT, re_match_2_internal)
(re_match_2_internal, re_match_2_internal, re_match_2_internal):
Giving in to popular pressure to shut up the compiler with casts.
----------------------------
revision 1.167
date: 2000/10/26 00:45:01; author: monnier; state: Exp; lines: +165 -195
More `unsigned char' -> `re_char' changes.
Also change several `int' into `re_wchar_t'.
(PATTERN_STACK_EMPTY, PUSH_PATTERN_OP, POP_PATTERN_OP): Remove.
(PUSH_FAILURE_POINTER): Don't cast any more.
(POP_FAILURE_REG_OR_COUNT): Remove the cast that strips `const'.
We want GCC to complain, since this piece of code makes
re_match non-reentrant, which *should* be fixed.
(GET_BUFFER_SPACE): Use size_t rather than unsigned long.
(EXTEND_BUFFER): Use RETALLOC.
(SET_LIST_BIT): Don't cast.
(re_wchar_t): New type.
(re_iswctype, re_wctype_to_bit): Make it crystal clear to GCC
that those two functions will always properly return.
(IMMEDIATE_QUIT_CHECK): Cast to void.
(analyse_first): Use recursion rather than an explicit stack.
(re_compile_fastmap): Can't fail anymore.
(re_search_2): Don't check re_compile_fastmap for failure.
(PUSH_NUMBER): Renamed from PUSH_FAILURE_COUNT.
Now also sets the new value (passed in a new argument).
(re_match_2_internal): Use it.
Also, use a new var `reg' of type size_t when looping through regs
rather than reuse the inappropriate `mcnt'.
----------------------------
revision 1.166
date: 2000/10/24 14:00:55; author: andrewi; state: Exp; lines: +13 -8
(IMMEDIATE_QUIT_CHECK): New macro, which does QUIT on NT-Emacs only.
(re_match_2_internal): Use IMMEDIATE_QUIT_CHECK instead of QUIT, so
that re_search functions only quit when callers expect them to.
----------------------------
revision 1.165
date: 2000/10/24 08:27:34; author: handa; state: Exp; lines: +1 -1
(regex_compile): Fix previous change.
----------------------------
revision 1.164
date: 2000/10/24 08:10:27; author: handa; state: Exp; lines: +8 -7
(regex_compile): Change the way of handling a range from a char less
than 256 to a char not less than 256.
----------------------------
revision 1.163
date: 2000/10/17 06:55:04; author: jbailey; state: Exp; lines: +0 -4
Remove warning that no one noticed anyway.
----------------------------
revision 1.162
date: 2000/10/15 16:44:45; author: monnier; state: Exp; lines: +3 -3
(WIDE_CHAR_SUPPORT): Define if _LIBC as well.
Mostly, just a test of the CVS repository.
----------------------------
revision 1.161
date: 2000/09/19 15:47:02; author: jbailey; state: Exp; lines: +4 -0
Add warning to top of source files.
----------------------------
revision 1.160
date: 2000/09/04 04:24:00; author: monnier; state: Exp; lines: +155 -111
(WIDE_CHAR_SUPPORT): New macro.
(btowc, iswctype, wctype) [_LIBC]: Redefine to __.
(BIT_ALPHA, BIT_ALNUM, BIT_ASCII, BIT_NONASCII, BIT_GRAPH, BIT_PRINT)
(BIT_UNIBYTE): Remove.
(re_match_2_internal): Delete corresponding code and streamline the
BIT_MULTIBYTE case to not bother checking ISUNIBYTE.
(CHAR_CLASS_MAX_LENGTH) [!WIDE_CHAR_SUPPORT]: Set to 9 rather than 6.
(re_wctype_t): New type.
(re_wctype, re_iswctype, re_wctype_to_bit): New functions.
(regex_compile): Use them and fix handling of overly long char classes.
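Revision 1.160 raises CHAR_CLASS_MAX_LENGTH to 9 and fixes the handling of overly long char class names. A plausible reading, assumed here rather than taken from regex.c, is that 9 is the length of the longest class name, "multibyte", while the POSIX names stop at 6 ("xdigit"). The sketch below is a hypothetical lookup (class_names, CLASS_NAME_MAX and lookup_char_class are illustrative names, and the class list is assumed): it rejects an over-long name before copying it into a buffer sized by that limit.

```c
#include <assert.h>
#include <string.h>

/* Assumed maximum: strlen ("multibyte") == 9; the POSIX names stop at 6. */
#define CLASS_NAME_MAX 9

/* Assumed list: the POSIX classes plus the Emacs-specific ones whose
   BIT_* flags appear in the log. */
static const char *const class_names[] = {
  "alnum", "alpha", "blank", "cntrl", "digit", "graph", "lower",
  "print", "punct", "space", "upper", "xdigit",
  "ascii", "nonascii", "unibyte", "multibyte"
};

/* Hypothetical lookup for the LEN bytes at NAME, which need not be
   NUL-terminated (e.g. the text between "[:" and ":]").  Returns a
   1-based index, or 0 for an unknown or over-long name. */
static int lookup_char_class (const char *name, size_t len)
{
  char buf[CLASS_NAME_MAX + 1];

  if (len > CLASS_NAME_MAX)
    return 0;                   /* too long to be a class: reject before copying */
  memcpy (buf, name, len);
  buf[len] = '\0';
  for (size_t i = 0; i < sizeof class_names / sizeof class_names[0]; i++)
    if (strcmp (class_names[i], buf) == 0)
      return (int) i + 1;       /* 1-based so that 0 can mean "unknown" */
  return 0;
}
```

For example, lookup_char_class ("xdigit:]", 6) finds "xdigit" because only the first 6 bytes are examined, while a 10-byte name is refused without touching the buffer.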
----------------------------
revision 1.159
date: 2000/08/31 17:19:13; author: monnier; state: Exp; lines: +95 -52
* regex.h (RE_NO_NEWLINE_ANCHOR): New syntax flag.
(struct re_pattern_buffer): Remove newline_anchor.
* regex.c: Keep namespace clean for GNU libc by renaming
to __ and using `weak_alias (__, )'.
(re_max_failures, fail_stack): Use size_t rather than unsigned.
(regex_compile): For ^ and $, choose between buffer and line (beg|end)
depending on the new RE_NO_NEWLINE_ANCHOR syntax flag.
(print_compiled_pattern, re_search_2, mutually_exclusive_p)
(re_match_2_internal, re_compile_pattern, re_comp, regcomp):
Get rid of references to newline_anchor.
(regcomp): Allocate and precompute a fastmap.
----------------------------
revision 1.158
date: 2000/08/30 18:31:17; author: monnier; state: Exp; lines: +197 -142
Merge some changes from GNU libc.
Add prototypes.
(bcopy, bcmp, REGEX_REALLOCATE, re_match_2_internal):
Use memcmp and memcpy instead of bcopy and bcmp.
(init_syntax_once): Use ISALNUM.
(PUSH_FAILURE_POINT, re_match_2_internal): Remove failure_id.
(REG_UNSET_VALUE): Remove. Use NULL instead.
(REG_UNSET, re_match_2_internal): Use NULL.
(SET_HIGH_BOUND, MOVE_BUFFER_POINTER, ELSE_EXTEND_BUFFER_HIGH_BOUND):
New macros.
(EXTEND_BUFFER): Use them (to work with BOUNDED_POINTERS).
(GET_UNSIGNED_NUMBER): Don't use ISDIGIT.
(regex_compile): In handle_interval, return an error rather than
try to unfetch the interval if we can't find the closing brace.
Obey the RE_NO_GNU_OPS syntax bit.
(TOLOWER): New macro.
(regcomp): Use it.
(regexec): Allocate regs.start and regs.end as one block.
----------------------------
revision 1.157
date: 2000/08/28 00:37:22; author: monnier; state: Exp; lines: +285 -297
* regex.c: Indent cpp directives and remove parens after `defined'.
(PTR_TO_OFFSET, POS_AS_IN_BUFFER): Move to a better place.
(ISDIGIT, ISCNTRL, ISXDIGIT) [!emacs]: Remove duplicate definition.
(regex_compile): Use RE_FRUGAL instead of RE_ALL_GREEDY.
(re_compile_pattern): Use size_t for length.
(init_syntax_once): Move to a better place.
* regex.h: Merge changes from GNU libc. Indent cpp directives.
(RE_FRUGAL): Replaces RE_ALL_GREEDY (inverted meaning).
----------------------------
revision 1.156
date: 2000/08/25 14:35:12; author: monnier; state: Exp; lines: +49 -25
(PUSH_FAILURE_COUNT): New macro.
(POP_FAILURE_REG_OR_COUNT): Renamed from POP_FAILURE_REG.
Handle popping of a register's or a counter's data.
(POP_FAILURE_POINT): Use the new name.
(re_match_2_internal): Push counter data on the stack for succeed_n,
jump_n and set_number_at and remove misleading dead code in succeed_n.
----------------------------
revision 1.155
date: 2000/08/11 01:56:59; author: handa; state: Exp; lines: +15 -2
(regex_compile): Pay attention to multibyteness.
(analyse_first): Set up the fastmap correctly for eight-bit-control characters.
----------------------------
revision 1.154
date: 2000/06/20 16:48:05; author: monnier; state: Exp; lines: +10 -3
(re_match, re_match_2): Protect calls to alloca (0).
(re_comp): Cast gettext return value to avoid complaints when
!HAVE_LIBINTL.
----------------------------
revision 1.153
date: 2000/06/10 08:04:33; author: handa; state: Exp; lines: +17 -11
(MAKE_CHAR) [!emacs]: Dummy macro for non-Emacs env.
(regex_compile): Fix the code for handling the case of single byte
char and multibyte char being mixed in a range within [...].
----------------------------
revision 1.152
date: 2000/05/30 02:58:58; author: monnier; state: Exp; lines: +20 -9
(PREFETCH_NOLIMIT): New function.
(re_match_2_internal): Use it and adjust the end_match_2 logic.
----------------------------
revision 1.151
date: 2000/05/25 16:30:40; author: monnier; state: Exp; lines: +6 -1
(at_begline_loc_p): Also recognize the \\(?:^ case of an anchor at
the beginning of a shy-group.
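Revision 1.151 makes at_begline_loc_p treat a `^' that opens a shy group, as in \(?:^foo\), as a beginning-of-line anchor, alongside the existing group and alternative cases. The sketch below is a simplified, hypothetical stand-in (caret_is_anchor and ends_with_at are not regex.c names): it assumes Emacs's backslash-style syntax, where groups are written \( \) and alternation \|, and it ignores syntax bits such as RE_NO_BK_PARENS as well as escaped backslashes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if the bytes just before index I of PAT spell LIT. */
static bool ends_with_at (const char *pat, size_t i, const char *lit)
{
  size_t len = strlen (lit);
  return i >= len && memcmp (pat + i - len, lit, len) == 0;
}

/* Hypothetical simplification: is the `^' at index I of PAT in a spot
   where it anchors to the beginning of a line?  Assumes backslash-style
   groups and alternation; an escaped backslash before `(' is not
   distinguished from a real group opener. */
static bool caret_is_anchor (const char *pat, size_t i)
{
  assert (pat[i] == '^');
  return i == 0                            /* start of the pattern */
         || ends_with_at (pat, i, "\\(")   /* start of a group */
         || ends_with_at (pat, i, "\\|")   /* start of an alternative */
         || ends_with_at (pat, i, "\\(?:"); /* start of a shy group */
}
```

In \(?:^a\) the `^' sits at index 4 right after \(?:, so it counts as an anchor; in a^b it does not.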
----------------------------
revision 1.150
date: 2000/04/19 21:39:18; author: monnier; state: Exp; lines: +34 -21
(re_match_2_internal): Don't shorten the strings anymore; instead,
define end_match(1|2) more carefully.
Use GET_CHAR_BEFORE_2 for `begline'.
----------------------------
revision 1.149
date: 2000/04/02 23:56:45; author: monnier; state: Exp; lines: +85 -131
* regex.c (PTR_TO_OFFSET) [!emacs]: Remove.
(RE_MULTIBYTE_P, RE_STRING_CHAR_AND_LENGTH): New macros.
(GET_CHAR_BEFORE_2): Moved from charset.h plus fixed minor bug when
we are between str1 and str2.
(MAX_MULTIBYTE_LENGTH, CHAR_STRING) [!emacs]: Provide trivial default.
(PATFETCH): Use `TRANSLATE'.
(PATFETCH_RAW): Fetch multibyte char if applicable.
(PATUNFETCH): Remove.
(regex_compile): Rely on PATFETCH to do most of the multibyte magic.
When writing a char, write it directly into the pattern buffer rather
than going needlessly through a temp char-array.
(re_match_2_internal): Similarly, rely on RE_STRING_CHAR to do the
multibyte magic and remove the useless `#ifdef emacs'.
(bcmp_translate): Don't compare as multibyte chars when in a unibyte
buffer.
* regex.h (struct re_pattern_buffer): Make field `multibyte'
conditional on `emacs'.
* charset.h (GET_CHAR_BEFORE_2): Moved to regex.c.
----------------------------
revision 1.148
date: 2000/03/29 04:01:25; author: monnier; state: Exp; lines: +86 -63
(analyse_first): New function obtained by ripping out most of
re_compile_fastmap and generalizing it a little bit so that it can
also just return whether a given (sub)pattern can match the empty
string or not.
(regex_compile): Use `analyse_first' to decide whether the loop-check
needs to be done or not for *, +, *? and +? (the loop check is costly
for non-greedy repetition).
(re_compile_fastmap): Delegate the actual work to `analyse_first'.
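Revision 1.148 uses analyse_first to ask whether a loop body can match the empty string: only such a body can make *, +, *? or +? iterate without consuming input, so only then is the costly loop check needed. regex.c answers the question by walking compiled bytecode; the sketch below swaps in a small hypothetical syntax tree (node, matches_empty and loop_needs_check are illustrative names, not regex.c code) to show the same recursive rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical regex syntax tree; regex.c itself analyses bytecode. */
typedef enum { N_CHAR, N_EMPTY, N_CONCAT, N_ALT, N_STAR, N_PLUS, N_GROUP } node_kind;

typedef struct node {
  node_kind kind;
  const struct node *left, *right;  /* RIGHT is NULL for unary nodes */
} node;

/* Can the (sub)pattern N match the empty string? */
static bool matches_empty (const node *n)
{
  switch (n->kind)
    {
    case N_CHAR:   return false;
    case N_EMPTY:  return true;
    case N_STAR:   return true;                  /* zero iterations */
    case N_PLUS:   return matches_empty (n->left);
    case N_GROUP:  return matches_empty (n->left);
    case N_CONCAT: return matches_empty (n->left) && matches_empty (n->right);
    case N_ALT:    return matches_empty (n->left) || matches_empty (n->right);
    }
  return false;
}

/* A loop body that can match "" may repeat without consuming input,
   so only such bodies need the runtime loop check. */
static bool loop_needs_check (const node *body)
{
  return matches_empty (body);
}
```

With a as a literal character, a loop whose body is a* (or a\|a*) keeps the loop check, while a loop around plain a or a+ does not.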
----------------------------
revision 1.147
date: 2000/03/27 22:26:37; author: monnier; state: Exp; lines: +102 -121
(REGEX_FREE_STACK, RESET_FAIL_STACK): Make them usable as an expression.
(enum re_opcode_t): Update description of succeed_n.
(PATFETCH): Always define.
(regex_compile): Use lookahead rather than PATUNFETCH (for repetition
operators, char classes, shy-groups and intervals).
Optimize special cases of intervals so as to only use succeed_n and
jump_n when really needed.
(re_compile_fastmap): Simplify handling of jump_n and succeed_n now
that we don't have to handle the special cases any more.
Simplify on_failure_jump handling as well.
----------------------------
revision 1.146
date: 2000/03/26 23:05:51; author: monnier; state: Exp; lines: +50 -6
(enum re_opcode_t): New opcode on_failure_jump_nastyloop.
(print_partial_compiled_pattern, re_compile_fastmap): Handle new opcode.
(regex_compile): Use on_failure_jump_nastyloop for non-greedy loops.
(re_match_2_internal): Add code for on_failure_jump_nastyloop when
executing it as well as when popping it off the stack to find infinite
loops in non-greedy repetition operators.
----------------------------
revision 1.145
date: 2000/03/23 04:36:14; author: monnier; state: Exp; lines: +4 -8
(enum syntaxcode): Provide default for non-Emacs.
(re_compile_fastmap, re_match_2_internal): Undo Dave's previous fix.
----------------------------
revision 1.144
date: 2000/03/22 14:25:38; author: fx; state: Exp; lines: +2 -2
(re_compile_fastmap, re_match_2_internal): Fix cast to re_opcode_t.
----------------------------
revision 1.143
date: 2000/03/22 04:17:32; author: monnier; state: Exp; lines: +113 -307
(CHAR_CHARSET, CHARSET_LEADING_CODE_BASE): Add default definitions
for non-Emacs compilation.
(enum re_opcode_t): Remove (not)wordchar and move (not)syntaxspec
outside of `#ifdef emacs'.
(print_partial_compiled_pattern): Update.
(regex_compile): Use (not)syntaxspec(Sword) instead of (not)wordchar.
(re_compile_fastmap): Merge handling of charset and charset_not (for
emacs and non-emacs compilation as well).
Similarly for (not)categoryspec and (not)syntaxspec.
Don't use the fastmap when reaching `anychar' since the added
complexity is not justified.
(re_match_2_internal): Merge (not)wordchar (emacs and non-emacs) and
(not)syntaxspec.
Merge (not)categoryspec.
----------------------------
revision 1.142
date: 2000/03/19 23:22:06; author: monnier; state: Exp; lines: +218 -200
(RE_STRING_CHAR): New macro.
(GET_CHAR_AFTER_2): Remove.
(RE_TRANSLATE, RE_TRANSLATE_P): New macros moved from regex.h.
(enum re_opcode_t): Remove on_failure_jump_exclusive.
(print_partial_compiled_pattern, re_compile_fastmap)
(re_match_2_internal): Remove on_failure_jump_exclusive.
(regex_compile): Turn optimizable P+ loops into PP*, so that the
optimization only needs to work for * (i.e. can use of_keep_string_jump).
Remove the special case for .*\n since it is now covered by the
general optimization.
(re_search_2): Don't bother with `room'.
(skip_one_char): New function.
(skip_noops): Simplify since `memory' is not needed any more.
(mutually_exclusive_p): Restructure slightly to use `switch' and add
handling for "all" remaining cases.
(re_match_2_internal): Change on_failure_jump_smart to use
on_failure_keep_string_jump (and redirect the end-of-loop jump) rather
than on_failure_jump_exclusive.
----------------------------
revision 1.141
date: 2000/03/16 02:53:38; author: monnier; state: Exp; lines: +210 -182
(re_match_2): Fix string shortening (to fit `stop') to make sure
POINTER_TO_OFFSET gives the same value before and after PREFETCH.
Use `dfail' to guarantee "atomic" matching.
(PTR_TO_OFFSET): Use POINTER_TO_OFFSET.
(debug): Now only active if > 0 rather than if != 0.
(DEBUG_*): Update for the new meaning of `debug'.
(print_partial_compiled_pattern): Add missing `succeed' case.
Use CHARSET_* macros in the charset(_not) branch.
Fix off-by-two bugs in `succeed_n', `jump_n' and `set_number_at'.
(store_op1, store_op2, insert_op1, insert_op2)
(at_begline_loc_p, at_endline_loc_p): Add prototypes.
(group_in_compile_stack): Move to after its arg's types are declared
and add a prototype.
(PATFETCH): Define in terms of PATFETCH_RAW.
(GET_UNSIGNED_NUMBER): Add the usual `do { ... } while(0)' wrapper.
(QUIT): Redefine as a nop except for NTemacs.
(regex_compile): Handle an interval {,M} as if it were {0,M}.
Fix indentation of the greedy-op and shy-group code.
(at_(beg|end)line_loc_p): Fix argument types.
(re_compile_fastmap): Ifdef out failure_stack_ptr to shut up gcc.
(re_search_2): Use POS_AS_IN_BUFFER. Simplify `room' computation.
(MATCHING_IN_FIRST_STRING): Remove.
(re_match_2): Use POS_AS_IN_BUFFER.
Ifdef out failure_stack_ptr to shut up gcc.
Use FIRST_STRING_P and POINTER_TO_OFFSET.
Use QUIT unconditionally.
----------------------------
revision 1.140
date: 2000/03/14 00:27:57; author: monnier; state: Exp; lines: +84 -114
* regex.c: Declare a new type `re_char' used throughout the code for
the string char type. It's `const unsigned char' to match the rest
of Emacs. Consistently make sure all pointers to strings use it and
make sure all pointers into the pattern use `unsigned char'.
(re_match_2_internal): Use `PREFETCH+STRING_CHAR' instead of
GET_CHAR_AFTER_2.
Also merge wordbound and notwordbound to reduce code duplication.
* charset.h (GET_CHAR_AFTER_2): Remove.
(GET_CHAR_BEFORE_2): Use unsigned chars, like everywhere else.
----------------------------
revision 1.139
date: 2000/03/08 23:25:41; author: monnier; state: Exp; lines: +621 -1217
This is a big redesign of failure-stack and register handling,
prompted by bugs revealed when trying to add shy-groups.
Overall, what happened is that loops are now structured a little
differently, groups can be shy and the code is a little simpler.
(enum re_opcode_t): Remove jump_past_alt, maybe_pop_jump,
push_dummy_failure and dummy_failure_jump.
Add on_failure_jump_(exclusive, loop and smart).
Also fix the comment for (start|stop)_memory since they now only take
one argument (the second has become unnecessary).
(print_partial_compiled_pattern): Adjust for changes in re_opcode_t.
(print_compiled_pattern): Use %ld to printf long ints and flush to make
debugging a little easier.
(union fail_stack_elt): Make the integer unsigned.
(struct fail_stack_type): Add a `frame' element.
(INIT_FAIL_STACK): Init `frame' as well.
(POP_PATTERN_OP): New macro for re_compile_fastmap.
(DEBUG_PUSH, DEBUG_POP): Remove.
(NUM_REG_ITEMS): Remove.
(NUM_NONREG_ITEMS): Adjust.
(FAILURE_PAT, FAILURE_STR, NEXT_FAILURE_HANDLE, TOP_FAILURE_HANDLE):
New macros for the cycle detection.
(ENSURE_FAIL_STACK): New macro for PUSH_FAILURE_(REG|POINT).
(PUSH_FAILURE_REG, POP_FAILURE_REG, CHECK_INFINITE_LOOP): New macros.
(PUSH_FAILURE_POINT): Don't push registers any more.
The pattern address pushed is not the destination of the jump but
the source of it instead.
(NUM_FAILURE_ITEMS): Remove.
(POP_FAILURE_POINT): Adapt to the new stack structure (i.e. pop
registers before the actual failure point).
Don't hardcode any meaning for str==NULL anymore.
(union register_info_type, REG_MATCH_NULL_STRING_P, IS_ACTIVE)
(MATCHED_SOMETHING, EVER_MATCHED_SOMETHING, SET_REGS_MATCHED): Remove.
(REG_UNSET_VALUE): Use NULL (why not?).
(compile_range): Remove declaration since it doesn't exist.
(struct compile_stack_elt_t): Remove inner_group_offset.
(old_reg(start|end), reg_info, reg_dummy, reg_info_dummy): Remove.
(regex_grow_registers): Remove dead code.
(FIXUP_ALT_JUMP): New macro.
(regex_compile): Add shy-groups.
Change loops to use on_failure_jump_smart&jump instead of
on_failure_jump&maybe_pop_jump.
Change + loops to eliminate the initial (dummy_failure_)jump.
Remove c1_base (looks like an unused variable to me).
Use `jump' instead of `jump_past_alt' and don't bother with
push_dummy_failure in alternatives since it is now unnecessary.
Use FIXUP_ALT_JUMP.
Eliminate a useless `#ifdef emacs' for (re)allocating the stack.
(re_compile_fastmap): Remove dead variables i and num_regs.
Exit from loop when bufp->can_be_null rather than jumping to `done'.
Avoid jumping backwards so as to ensure termination.
Use PATTERN_STACK_EMPTY and POP_PATTERN_OP.
Improved handling of backreferences.
Remove dead code in handling of `anychar'.
(skip_noops, mutually_exclusive_p): New functions taken from the
handling of `maybe_pop_jump' in re_match_2_internal.
Slightly improve mutually_exclusive_p to handle ".+\n".
((lowest|highest)_active_reg, NO_(LOWEST|HIGHEST)_ACTIVE_REG): Remove.
(re_match_2_internal): Use %p instead of 0x%x when printf'ing ptrs.
Don't SET_REGS_MATCHED anymore. Remove many dead variables.
Push register (in `start_memory') on the stack rather than storing it
in old_reg(start|end).
Remove the cycle detection from `stop_memory', replaced by the use of
on_failure_jump_loop for greedy loops.
Add code for the new on_failure_jump_.
Remove ad-hoc code in `on_failure_jump' to push more registers in the
case of a loop.
Take out code from `maybe_pop_jump' into separate functions and adapt
it to the semantics of `on_failure_jump_smart'.
Remove jump_past_alt, dummy_failure_jump and push_dummy_failure.
Remove dummy_failure handling and handling of `failures to jump to
on_failure_jump' (this last one was already dead code, it seems).
((group|alt|common_op)_match_null_string_p): Remove.
----------------------------
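Roughly, mutually_exclusive_p checks whether a successful match by one piece of pattern implies that another piece fails at the same position; when a loop body and what follows it are exclusive in this sense, the loop needs less backtracking. Revision 1.139 extended it to ".+\n", and revision 1.170 stopped treating `charset_not' as if it were `charset'. The sketch below assumes a simplified model (a 256-bit bitmap per set, single-byte characters only; byteset and both test functions are hypothetical names, not regex.c code) to show why the two opcodes need different tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a set of single-byte characters as a 256-bit
   bitmap (not regex.c's actual charset encoding). */
typedef struct { uint8_t bits[32]; } byteset;

static void byteset_add (byteset *s, unsigned char c)
{
  s->bits[c / 8] |= (uint8_t) (1u << (c % 8));
}

/* [A] versus [B]: no byte matches both iff A and B are disjoint. */
static bool charset_vs_charset_exclusive (const byteset *a, const byteset *b)
{
  for (int i = 0; i < 32; i++)
    if (a->bits[i] & b->bits[i])
      return false;
  return true;
}

/* [A] versus [^B]: a byte matches both iff it is in A but not in B,
   so they are exclusive iff A is a subset of B. */
static bool charset_vs_charset_not_exclusive (const byteset *a, const byteset *b)
{
  for (int i = 0; i < 32; i++)
    if (a->bits[i] & ~b->bits[i])
      return false;
  return true;
}
```

Modelling `.' as [^\n] covers the ".+\n" case: {\n} is a subset of itself, so \n and [^\n] can never match the same byte. Applying the disjointness test to [a] and [^b], as if [^b] were [b], would wrongly report them exclusive even though `a' matches both.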