[ prog / sol / mona ]

prog


How can I run my own instance of this

103 2020-03-01 11:20

[4/5]
By contrast, irregex-search's driver for sre->procedure is irregex-search/backtrack:

(define (irregex-search/backtrack irx cnk init src i matches)
  (let ((matcher (irregex-nfa irx))
        (str ((chunker-get-str cnk) src))
        (end ((chunker-get-end cnk) src))
        (get-next (chunker-get-next cnk)))
    (if (flag-set? (irregex-flags irx) ~searcher?)
        (matcher cnk init src str i end matches (lambda () #f))
        (let lp ((src2 src)
                 (str str)
                 (i i)
                 (end end))
          (cond
           ((matcher cnk init src2 str i end matches (lambda () #f))
            (irregex-match-start-chunk-set! matches 0 src2)
            (irregex-match-start-index-set! matches 0 i)
            matches)
           ((< i end)
            (lp src2 str (+ i 1) end))
           (else
            (let ((src2 (get-next src2)))
              (if src2
                  (lp src2
                      ((chunker-get-str cnk) src2)
                      ((chunker-get-start cnk) src2)
                      ((chunker-get-end cnk) src2))
                  #f))))))))

If there was no match, it advances one character or acquires more input. But if there was a match, it is immediately returned. What is completely missing is any backtracking to attempt to find a longer match at the current position, to achieve "leftmost, longest". This procedure performs backtracking in the same way that the DPRK is democratic, all of the backtracking is in the procedure name, none in the code. According to git blame, this way of doing backtracking is from the "http-url fix to support multiple directories in the path" commit by ashinn on "Aug 15, 2012".
https://github.com/ashinn/irregex/commit/3de590ee0b517ab786a42a6a75922890e7972acf
Here is an example to illustrate the underlying issue. Consider 5-groups and 3-groups on an input of 9:

scheme@(guile-user)> (imsim '(** 1 1 (+ (or "aaaaa" "aaa"))) "aaaaaaaaa")
trying aaaaa at 0
trying aaaaa at 5
trying aaa at 5
trying aaaaa at 8
trying aaa at 8
retry by irregex-match/chunked
retry by irregex-match/chunked
trying aaa at 0
trying aaaaa at 3
trying aaaaa at 8
trying aaa at 8
retry by irregex-match/chunked
trying aaa at 3
trying aaaaa at 6
trying aaa at 6
trying aaaaa at 9
trying aaa at 9
$1 = "aaaaaaaaa"
scheme@(guile-user)> 

In order to get the longest overall match, the 5-group needs to back off twice, at 0 and at 3. The (or) itself did nothing wrong here, it provided the longest match it could at each step, due to the order of alternatives. It just so happens that in this case those are not the paths that lead to overall success. In the general case, proper backtracking is not a gentle suggestion but a hard requirement. As this example shows, and as one might recall from paying attention in college, the crux of the matter is this: regex matching can possess global maxima that are assembled entirely from local minima.

301


VIP:

do not edit these