[ prog / sol / mona ]

prog


What are you working on?

141 2020-07-14 13:39

[2/2]
A rerun of the second line skip test:

scheme@(guile-user)> (irregex-replace/all 'bol "\n\n\n\n\n\n" "start")
bol ("\n\n\n\n\n\n" 0 6) 0 -> yes
bol ("\n\n\n\n\n\n" 0 6) 1 -> yes
bol ("\n\n\n\n\n\n" 0 6) 2 -> yes
bol ("\n\n\n\n\n\n" 0 6) 3 -> yes
bol ("\n\n\n\n\n\n" 0 6) 4 -> yes
bol ("\n\n\n\n\n\n" 0 6) 5 -> yes
$2 = "start\nstart\nstart\nstart\nstart\nstart\n"

There is no match at the very end of the input for the same reason as >>139. Previously there was because irregex-search/backtrack allows looking for matches past the last character, which is used by some branches of sre->procedure as a signal to try to acquire more input.
https://github.com/ashinn/irregex/blob/ac27338c5b490d19624c30d787c78bbfa45e1f11/irregex.scm#L1955
A rerun of the double match test:

scheme@(guile-user)> (irregex-replace/all "(?=a)" "---a---" "*")
look-ahead ("---a---" 0 7) 0 -> no
look-ahead ("---a---" 0 7) 1 -> no
look-ahead ("---a---" 0 7) 2 -> no
look-ahead ("---a---" 0 7) 3 -> yes
look-ahead ("---a---" 0 7) 4 -> no
look-ahead ("---a---" 0 7) 5 -> no
look-ahead ("---a---" 0 7) 6 -> no
look-ahead ("---a---" 0 7) 7 -> no
$1 = "---*---"

The match positions are correct but the output lost the 'a'. This happens because the kons lambda passed by irregex-replace/all to irregex-fold/fast is also incompetent at detecting empty matches.
https://github.com/ashinn/irregex/blob/ac27338c5b490d19624c30d787c78bbfa45e1f11/irregex.scm#L3880

(define (irregex-replace/all irx str . o)
  (if (not (string? str)) (error "irregex-replace/all: not a string" str))
  (irregex-fold/fast
   irx
   (lambda (i m acc)
     (let* ((m-start (%irregex-match-start-index m 0))
            (res (if (>= i m-start)
                     (append (irregex-apply-match m o) acc)
                     (append (irregex-apply-match m o)
                             (cons (substring str i m-start) acc)))))
       ;; include the skipped char on empty matches
       (if (= i (%irregex-match-end-index m 0))
           (cons (substring str i (+ i 1)) res)
           res)))
   '()
   str
   (lambda (i acc)
     (let ((end (string-length str)))
       (string-cat-reverse (if (>= i end)
                               acc
                               (cons (substring str i end) acc)))))))

Once again the match end is compared to the search start instead of the match start. The fix:

$ TZ=GMT diff -u irregex.scm irregex2.scm
--- irregex.scm	2020-07-13 20:23:49.195645124 +0000
+++ irregex2.scm	2020-07-14 11:52:27.010199230 +0000
@@ -3888,8 +3889,9 @@
                      (append (irregex-apply-match m o)
                              (cons (substring str i m-start) acc)))))
        ;; include the skipped char on empty matches
-       (if (= i (%irregex-match-end-index m 0))
-           (cons (substring str i (+ i 1)) res)
+       (if (and (= m-start (%irregex-match-end-index m 0))
+                (< m-start (string-length str)))
+           (cons (substring str m-start (+ m-start 1)) res)
            res)))
    '()
    str

A rerun of the double match test:

scheme@(guile-user)> (irregex-replace/all "(?=a)" "---a---" "*")
look-ahead ("---a---" 0 7) 0 -> no
look-ahead ("---a---" 0 7) 1 -> no
look-ahead ("---a---" 0 7) 2 -> no
look-ahead ("---a---" 0 7) 3 -> yes
look-ahead ("---a---" 0 7) 4 -> no
look-ahead ("---a---" 0 7) 5 -> no
look-ahead ("---a---" 0 7) 6 -> no
look-ahead ("---a---" 0 7) 7 -> no
$1 = "---*a---"

I expect that the other kons lambdas will have the same bug. These patches allow wrap-end-chunker to work without the hack from >>137. Also, irregex-fold/chunked/fast >>135 has the bugs of irregex-fold/fast and a few more. This entire irregex thing seems to have a worrying amount of undiscovered bugs.

199


VIP:

do not edit these