SchemeBBS [part 2]

29 2020-08-02 13:31

lib/markup.scm:bold

(define bold
  (transform-rule
    'bold
    (irregex  "\\*\\*[^ ].*?[^ ]\\*\\*|\\*\\*[^ ]\\*\\*")
    (lambda (sub) `(b ,(substring sub 2 (- (string-length sub) 2))))))

(define italic
  (transform-rule
    'italic
    (irregex  "__[^ ].*?[^ ]__|__[^ ]__")
    (lambda (sub) `(i ,(substring sub 2 (- (string-length sub) 2))))))

(define code
  (transform-rule
    'code
    (irregex  "==[^ ].*?[^ ]==|==[^ ]==")
    (lambda (sub) `(code ,(substring sub 2 (- (string-length sub) 2))))))

(define del
  (transform-rule
    'del
    (irregex "~~[^ ].*?[^ ]~~|~~[^ ]~~")
    (lambda (sub) `(del ,(substring sub 2 (- (string-length sub) 2))))))

This was obviously replicated through copypasting, so the error in handling single-character content is shared by all four transform-rules. The first group below is the input and the second is what it actually renders to:

**M**agneto**H**ydro**D**ynamics
__M__agneto__H__ydro__D__ynamics
==M==agneto==H==ydro==D==ynamics
~~M~~agneto~~H~~ydro~~D~~ynamics

M**agnetoHydroD**ynamics
M__agnetoHydroD__ynamics
M==agnetoHydroD==ynamics
M~~agnetoHydroD~~ynamics

The source of the bug is that the branch intended for at least two characters can run over a match intended for the other branch: on **M**agneto** the opening [^ ] takes M, the lazy .*? grows to **agnet, and the closing [^ ] takes o, so the content becomes M**agneto. A solution that depends neither on the order of alternation nor on irregex's mercurial leftmost-longest semantics is to exclude the intersection of the two branches using negative lookahead.

$ guile --no-auto-compile -l deps/irregex.scm 
GNU Guile 2.2.3
[...]
scheme@(guile-user)> (irregex-match-substring (irregex-search "==[^ ].*?[^ ]==|==[^ ]==" "==a==b c==d=="))
$1 = "==a==b c=="
scheme@(guile-user)> (irregex-match-substring (irregex-search "==[^ ](?!==).*?[^ ]==|==[^ ]==" "==a==b c==d=="))
$2 = "==a=="
scheme@(guile-user)> 

The same fix applies to all four transform-rules above.
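
A quick way to check that claim for all four delimiters at once is sketched below. It assumes irregex is already loaded as in the session above (guile -l deps/irregex.scm). fixed-pattern and first-match are hypothetical helpers made up for this check, not part of lib/markup.scm, and the delimiter has to be passed in already escaped for the regex.

```scheme
;; Assumes irregex is loaded, e.g. guile --no-auto-compile -l deps/irregex.scm
;; fixed-pattern and first-match are hypothetical helpers for this check only.

;; Build the two-branch pattern with the negative lookahead inserted.
;; delim must already be regex-escaped, e.g. "\\*\\*" for **.
(define (fixed-pattern delim)
  (string-append delim "[^ ](?!" delim ").*?[^ ]" delim
                 "|" delim "[^ ]" delim))

;; Return the first matching substring, or #f when nothing matches.
(define (first-match pattern str)
  (let ((m (irregex-search pattern str)))
    (and m (irregex-match-substring m))))

;; Print the first match for each of the four failing inputs.
(for-each
  (lambda (delim+input)
    (display (first-match (fixed-pattern (car delim+input))
                          (cdr delim+input)))
    (newline))
  '(("\\*\\*" . "**M**agneto**H**ydro**D**ynamics")
    ("__"     . "__M__agneto__H__ydro__D__ynamics")
    ("=="     . "==M==agneto==H==ydro==D==ynamics")
    ("~~"     . "~~M~~agneto~~H~~ydro~~D~~ynamics")))
```

If the lookahead does its job, each line printed should be the single-character span (**M**, __M__, ==M==, ~~M~~) rather than a span running on to the next delimiter. In the real rules the same (?!...) would go right after the first [^ ] of each irregex string.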
