[2/2]
A rerun of the second line skip test:
scheme@(guile-user)> (irregex-replace/all 'bol "\n\n\n\n\n\n" "start")
bol ("\n\n\n\n\n\n" 0 6) 0 -> yes
bol ("\n\n\n\n\n\n" 0 6) 1 -> yes
bol ("\n\n\n\n\n\n" 0 6) 2 -> yes
bol ("\n\n\n\n\n\n" 0 6) 3 -> yes
bol ("\n\n\n\n\n\n" 0 6) 4 -> yes
bol ("\n\n\n\n\n\n" 0 6) 5 -> yes
$2 = "start\nstart\nstart\nstart\nstart\nstart\n"
There is no match at the very end of the input, for the same reason as in >>139. Previously there was one, because irregex-search/backtrack allows looking for matches past the last character, which some branches of sre->procedure use as a signal to try to acquire more input.
https://github.com/ashinn/irregex/blob/ac27338c5b490d19624c30d787c78bbfa45e1f11/irregex.scm#L1955
A rerun of the double match test:
scheme@(guile-user)> (irregex-replace/all "(?=a)" "---a---" "*")
look-ahead ("---a---" 0 7) 0 -> no
look-ahead ("---a---" 0 7) 1 -> no
look-ahead ("---a---" 0 7) 2 -> no
look-ahead ("---a---" 0 7) 3 -> yes
look-ahead ("---a---" 0 7) 4 -> no
look-ahead ("---a---" 0 7) 5 -> no
look-ahead ("---a---" 0 7) 6 -> no
look-ahead ("---a---" 0 7) 7 -> no
$1 = "---*---"
The match positions are correct but the output lost the 'a'. This happens because the kons lambda passed by irregex-replace/all to irregex-fold/fast also fails to detect empty matches correctly.
https://github.com/ashinn/irregex/blob/ac27338c5b490d19624c30d787c78bbfa45e1f11/irregex.scm#L3880
(define (irregex-replace/all irx str . o)
  (if (not (string? str)) (error "irregex-replace/all: not a string" str))
  (irregex-fold/fast
   irx
   (lambda (i m acc)
     (let* ((m-start (%irregex-match-start-index m 0))
            (res (if (>= i m-start)
                     (append (irregex-apply-match m o) acc)
                     (append (irregex-apply-match m o)
                             (cons (substring str i m-start) acc)))))
       ;; include the skipped char on empty matches
       (if (= i (%irregex-match-end-index m 0))
           (cons (substring str i (+ i 1)) res)
           res)))
   '()
   str
   (lambda (i acc)
     (let ((end (string-length str)))
       (string-cat-reverse (if (>= i end)
                               acc
                               (cons (substring str i end) acc)))))))
Once again the match end is compared to the search start instead of the match start. The fix:
$ TZ=GMT diff -u irregex.scm irregex2.scm
--- irregex.scm 2020-07-13 20:23:49.195645124 +0000
+++ irregex2.scm 2020-07-14 11:52:27.010199230 +0000
@@ -3888,8 +3889,9 @@
                      (append (irregex-apply-match m o)
                              (cons (substring str i m-start) acc)))))
        ;; include the skipped char on empty matches
-       (if (= i (%irregex-match-end-index m 0))
-           (cons (substring str i (+ i 1)) res)
+       (if (and (= m-start (%irregex-match-end-index m 0))
+                (< m-start (string-length str)))
+           (cons (substring str m-start (+ m-start 1)) res)
            res)))
    '()
    str
A rerun of the double match test with the fix applied:
scheme@(guile-user)> (irregex-replace/all "(?=a)" "---a---" "*")
look-ahead ("---a---" 0 7) 0 -> no
look-ahead ("---a---" 0 7) 1 -> no
look-ahead ("---a---" 0 7) 2 -> no
look-ahead ("---a---" 0 7) 3 -> yes
look-ahead ("---a---" 0 7) 4 -> no
look-ahead ("---a---" 0 7) 5 -> no
look-ahead ("---a---" 0 7) 6 -> no
look-ahead ("---a---" 0 7) 7 -> no
$1 = "---*a---"
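For anyone who wants to poke at the logic without loading irregex, here is a toy model of the kons step. It is only a sketch: replace-all-model is a made-up name, the matches are passed in by hand as (start . end) pairs instead of coming from irregex-fold/fast, and the driver is assumed to step one character past an empty match. The fixed? flag switches between the original test and the patched one.

```scheme
;; Toy model of the kons step in irregex-replace/all. This is NOT irregex:
;; MATCHES is a hand-written list of (start . end) pairs in ascending order,
;; standing in for the match objects irregex-fold/fast would supply.
;; FIXED? selects the patched empty-match test instead of the original one.
(define (replace-all-model str matches replacement fixed?)
  (let ((len (string-length str)))
    (let loop ((i 0) (ms matches) (acc '()))
      (if (null? ms)
          ;; finish step: append whatever follows the last match
          (apply string-append
                 (reverse (if (>= i len)
                              acc
                              (cons (substring str i len) acc))))
          (let* ((m-start (car (car ms)))
                 (m-end (cdr (car ms)))
                 (empty-match? (= m-start m-end))
                 ;; unmatched text before the match, then the replacement
                 (res (if (>= i m-start)
                          (cons replacement acc)
                          (cons replacement
                                (cons (substring str i m-start) acc))))
                 ;; original: match end vs. search start i
                 ;; patched:  match end vs. match start, guarded at end of input
                 (keep-char? (if fixed?
                                 (and empty-match? (< m-start len))
                                 (= i m-end)))
                 (k (if fixed? m-start i)))
            ;; assumption: the driver skips one char after an empty match
            (loop (if empty-match? (+ m-end 1) m-end)
                  (cdr ms)
                  (if keep-char?
                      (cons (substring str k (+ k 1)) res)
                      res)))))))

(display (replace-all-model "---a---" '((3 . 3)) "*" #f)) (newline) ; ---*---
(display (replace-all-model "---a---" '((3 . 3)) "*" #t)) (newline) ; ---*a---
```

With the original test the search start is 0 and the match end is 3, so the skipped 'a' is never put back. The patched test compares the match end to the match start, sees the empty match at 3, and restores the character.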
I expect the other kons lambdas to have the same bug. With these patches, wrap-end-chunker works without the hack from >>137. Also, irregex-fold/chunked/fast (>>135) has the same bugs as irregex-fold/fast, plus a few more. This entire irregex thing seems to have a worrying number of undiscovered bugs.