What are you working on?

154 2020-07-17 20:09

Here is the irregex-split fix, consistent with the irregex-fold/fast fix >>140 and the irregex-replace/all fix >>141. The accumulator is a vector because that is ashinn's preferred structure for records throughout irregex.scm. Where an edge case could reasonably be decided either way, the existing 0.9.8 behavior on searching for an empty regex >>139 was used as guidance.

(define (irregex-split irx str . o)
  (if (not (string? str)) (error "irregex-split: not a string" str))
  (let ((start (if (pair? o) (car o) 0))
        (end (if (and (pair? o) (pair? (cdr o))) (cadr o) (string-length str))))
    (irregex-fold/fast
     irx
     (lambda (i m a)
       (let* ((msi   (%irregex-match-start-index m 0))
              (mei   (%irregex-match-end-index m 0))
              (empty (= msi mei))
              (lst   (vector-ref a 0))
              (pos   (vector-ref a 1)))
         (cond ((not empty)
                  (vector-set! a 0 (cons (substring str pos msi) lst))
                  (vector-set! a 1 mei)
                  (vector-set! a 2 #f))
               ((< pos msi)
                  (vector-set! a 0 (cons (substring str pos msi) lst))
                  (vector-set! a 1 msi)
                  (vector-set! a 2 #t))
               ((> pos start)
                  (vector-set! a 0 (cons "" lst))
                  (vector-set! a 2 #t))
               (else
                  (vector-set! a 2 #t)))
         a))
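     ;; Accumulator slots: 0 = pieces so far (reversed), 1 = split position,
     ;; 2 = #t unless the most recent match was non-empty.
     (let ((acc (make-vector 3)))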
       (vector-set! acc 0 '())
       (vector-set! acc 1 start)
       (vector-set! acc 2 #t)
       acc)
     str
     (lambda (i a)
       (let* ((lst   (vector-ref a 0))
              (pos   (vector-ref a 1))
              (empty (vector-ref a 2)))
         (reverse (cond
           ((< pos end)
              (cons (substring str pos end) lst))
           (empty lst)
           (else  (cons "" lst))))))
     start
     end)))

The key is to decouple the match search position used internally by irregex-fold/fast from the split position used internally by irregex-split. The failing tests from >>143 are corrected:

$ guile --no-auto-compile -l irregex2.scm
GNU Guile 2.2.3
[...]
scheme@(guile-user)> (irregex-split "-" "-a-")
$1 = ("" "a" "")
scheme@(guile-user)> (irregex-split "a" "---a")
$2 = ("---" "")
scheme@(guile-user)> (irregex-split "a" "---aa")
$3 = ("---" "" "")
scheme@(guile-user)> (irregex-split "a" "aaa")
$4 = ("" "" "" "")
scheme@(guile-user)> (irregex-split 'any "aaa")
$5 = ("" "" "" "")

The 'bol test >>152 is also corrected:

scheme@(guile-user)> (irregex-split 'bol "\n\n\n\n\n\n")
$6 = ("\n" "\n" "\n" "\n" "\n" "\n")

And some more edge cases:

scheme@(guile-user)> (irregex-split "" "")
$7 = ()
scheme@(guile-user)> (irregex-split "" "aaa")
$8 = ("a" "a" "a")
scheme@(guile-user)> (irregex-split 'bos "aaa")
$9 = ("aaa")
scheme@(guile-user)> (irregex-split 'eos "aaa")
$10 = ("aaa")
scheme@(guile-user)> (irregex-split "(?<=a)" "aaa")
$11 = ("a" "a" "a")
scheme@(guile-user)> (irregex-split "(?=a)" "aaa")
$12 = ("a" "a" "a")

The three fixes can be applied as a group, and the 0.9.8 hack to wrap-end-chunker >>137 can be reverted.
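
For anyone who wants to poke at the split logic without loading irregex, here is a minimal sketch of the same accumulator rules run over a precomputed list of match spans. The name split-by-spans and the (start . end) span representation are made up for illustration and are not part of irregex. The spans stand in for whatever the fixed irregex-fold/fast reports, so the empty-regex example assumes it reports one empty match at each index.

```scheme
;; Hypothetical helper, not part of irregex.scm.
;; SPANS is a list of (match-start . match-end) pairs in increasing order,
;; standing in for the matches that irregex-fold/fast would hand to the kons.
(define (split-by-spans str spans start end)
  (let loop ((spans spans)
             (lst   '())      ; pieces collected so far, reversed
             (pos   start)    ; split position, independent of the search position
             (empty #t))      ; #t unless the most recent match was non-empty
    (if (null? spans)
        ;; Finalizer: same three cases as the last lambda of irregex-split.
        (reverse (cond ((< pos end) (cons (substring str pos end) lst))
                       (empty lst)
                       (else (cons "" lst))))
        (let ((msi (caar spans))
              (mei (cdar spans)))
          (cond ((not (= msi mei))
                 ;; Non-empty match: emit the piece before it, jump past it.
                 (loop (cdr spans) (cons (substring str pos msi) lst) mei #f))
                ((< pos msi)
                 ;; Empty match ahead of the split position: cut there.
                 (loop (cdr spans) (cons (substring str pos msi) lst) msi #t))
                ((> pos start)
                 ;; Empty match at the split position, past the start.
                 (loop (cdr spans) (cons "" lst) pos #t))
                (else
                 ;; Empty match at the very start: emit nothing.
                 (loop (cdr spans) lst pos #t)))))))

;; "-" against "-a-": non-empty matches at 0-1 and 2-3.
(split-by-spans "-a-" '((0 . 1) (2 . 3)) 0 3)                   ; ("" "a" "")
;; "" against "aaa", ASSUMING one empty match per index 0..3.
(split-by-spans "aaa" '((0 . 0) (1 . 1) (2 . 2) (3 . 3)) 0 3)   ; ("a" "a" "a")
```

The search position never appears here at all: only pos, the split position, is tracked, which is the decoupling described above.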

>>153

> just get to the point

Already answered in >>145.

> journals

1. Already answered in >>145.
2. You may not be familiar with the opening post of the thread you choose to visit of your own free will:

> This is the type of thread that you can write in every time you visit the site! You can do anything from use this as a personal log for your project, to sharing what you're hacking on at the moment, or to critique something some else did in the thread. I'm looking forward to hearing what you guys are up to!
