[1/3]
The final markup bug for this group of posts involves m3 of between-code? >>27. The following properties hold: the m3 scan is unconditional in between-code?, it is a linear scan potentially to the end of s2 and it is called in a loop by string->sxml-rec. As a result, the overall behavior of m3 within string->sxml is quadratic. This is a bad idea because it invites a stress test on the loop count, which can be controlled by packing spoilers tightly together. Calling append via append-element in a loop also yields quadratic behavior, which should be replaced by the usual backward consing followed by reverse, but unlike the case of m3 the counts involved are insufficient to become a problem. Here is a subset of markup procedures that can be used to exercise Bitdiddle's m3:
$ cat test-m3.scm
(define (timeit proc)
(with-timings proc
(lambda (run-time gc-time real-time)
(write (internal-time/ticks->seconds run-time))
(write-char #\space)
(write (internal-time/ticks->seconds gc-time))
(write-char #\space)
(write (internal-time/ticks->seconds real-time))
(newline))))
(define (string->sxml markup s)
(define (string->sxml-rec s res)
(let ((match (irregex-search (regex markup) s)))
(cond ((string-null? s)
res)
((not match)
(append-element res s))
(else
(let* ((start (irregex-match-start-index match))
(end (irregex-match-end-index match))
(substr (irregex-match-substring match))
(s1 (substring s 0 start))
(s2 (substring s end (string-length s))))
(if (string-null? s1)
(string->sxml-rec
s2
(append-element res ((transform markup) substr)))
(if (and (eq? (name markup) 'del) ;; exception to escape spoiler inside code
(between-code? s1 s2))
(string->sxml-rec "" (append-element res (string-append s1 substr s2)))
(string->sxml-rec
s2
(append-element res s1 ((transform markup) substr))))))))))
(string->sxml-rec s '()))
(define (append-element l . e)
(append l e))
;; edge false positive (between-code? "==code== ==code==" "==")
;; could add another pass of spoiler, but ok good-enough
(define (between-code? s1 s2)
(let ((m1 (irregex-search (irregex ".*==$|.*==[^ ]") s1)) ;opening code in s1
(m2 (irregex-search (irregex ".*[^ ]==") s1)) ;closing code in s1
(m3 (irregex-search (irregex "^==|.*?[^ ]==") s2)) ;closing code in s2
(imei irregex-match-end-index))
(if (and m1 m3 (or (not m2) (>= (imei m1) (imei m2))))
#t
#f)))
(define (transform-rule name regex transform)
(define (dispatch op)
(cond ((eq? op 'name) name)
((eq? op 'regex) regex)
((eq? op 'transform) transform)))
dispatch)
(define (transform markup) (apply markup '(transform)))
(define (regex markup) (apply markup '(regex)))
(define (name markup) (apply markup '(name)))
(define code
(transform-rule
'code
(irregex "==[^ ].*?[^ ]==|==[^ ]==")
(lambda (sub) `(code ,(substring sub 2 (- (string-length sub) 2))))))
(define del
(transform-rule
'del
(irregex "~~[^ ].*?[^ ]~~|~~[^ ]~~")
(lambda (sub) `(del ,(substring sub 2 (- (string-length sub) 2))))))
Here are some timings for input sizes much smaller than the original post size limit:
$ mit-scheme --load deps/irregex.scm --load test-m3.scm
[...]
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41
LIAR/x86-64 4.118 || Edwin 3.116
;Loading "deps/irregex.scm"... done
;Loading "test-m3.scm"... done
1 ]=> (timeit (lambda () (string->sxml del (apply string-append (make-list 20 "a~~xx~~"))) 'ok))
1.31 .02 1.33
;Value: ok
1 ]=> (timeit (lambda () (string->sxml del (apply string-append (make-list 30 "a~~xx~~"))) 'ok))
3.98 .03 4.008
;Value: ok
1 ]=> (timeit (lambda () (string->sxml del (apply string-append (make-list 40 "a~~xx~~"))) 'ok))
9.71 .13 9.744
;Value: ok
1 ]=> (timeit (lambda () (string->sxml del (apply string-append (make-list 50 "a~~xx~~"))) 'ok))
17.34 .2 17.538
;Value: ok
At a count of 40 spoiler processing already takes ten seconds, and much higher runtimes can be obtained by increasing the count. The solution is to never rescan a section of between-code?:s2 that has already been covered, which brings the order of growth down to linear. However, since the four types of markup bugs covered in this series are interrelated, they require changes to the same parts of lib/markup.scm, so here is a combined fix for all four: