What are you working on?

138 2020-07-13 22:49

Here is a related bug. We might want to add 'start' to the start of every line, and we might have a file with some number of empty lines.

$ for k in $(seq 1 6); do echo ""; done > emptylines.txt
$ hd emptylines.txt 
00000000  0a 0a 0a 0a 0a 0a                                 |......|
00000006

We might use sed:

$ sed -e 's/^/start/' emptylines.txt 
start
start
start
start
start
start
$ 

We might use regular expressions in python:

$ python3
[...]
>>> s = "\n" * 6
>>> s
'\n\n\n\n\n\n'
>>> import re
>>> re.sub ("(?m)^", "start", s)
'start\nstart\nstart\nstart\nstart\nstart\nstart'
>>> print (_)
start
start
start
start
start
start
start
>>> 
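
We might also check where Python finds its matches. This is a minimal sketch, assuming Python 3.7 or later: it lists the offsets matched by the multiline ^, then tries a variant that refuses a match at the very end of the input. The (?!\Z) guard is an addition for illustration and is not part of the session above.

```python
import re

s = "\n" * 6

# Offsets where the multiline ^ matches: the start of the string and
# the position just after every newline, including the last one.
positions = [m.start() for m in re.finditer(r"(?m)^", s)]
print(positions)

# Illustrative variant: reject the match at the very end of the input,
# so a final newline ends the last line instead of opening a new one.
sed_like = re.sub(r"(?m)^(?!\Z)", "start", s)
print(repr(sed_like))
```

With the guard there are six replacements, the same count sed produces.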

Sed doesn't have a match at the very end of the input because it follows the text file convention, where a final newline terminates the last line instead of starting a new, empty one. We might also use irregex:

$ guile --no-auto-compile -l irregex.scm 
GNU Guile 2.2.3
[...]
scheme@(guile-user)> (irregex-replace/all 'bol "\n\n\n\n\n\n" "start")
$1 = "start\n\nstart\nstart\nstart\nstart\nstart"
scheme@(guile-user)> (display $1)
start

start
start
start
start
startscheme@(guile-user)> 

There is no match at the start of the second line, when there clearly should be. We'll add the usual verbosity to see what happens:

$ TZ=GMT diff -u irregex.scm irregex2.scm
--- irregex.scm	2020-07-13 20:23:49.195645124 +0000
+++ irregex2.scm	2020-07-13 20:15:27.347747147 +0000
@@ -3419,11 +3419,14 @@
                (fail))))
         ((bol)
          (lambda (cnk init src str i end matches fail)
+           (simple-format #t "bol ~S ~A" src i)
            (if (or (and (eq? src (car init)) (eqv? i (cdr init)))
                    (and (> i ((chunker-get-start cnk) src))
                         (eqv? #\newline (string-ref str (- i 1)))))
-               (next cnk init src str i end matches fail)
-               (fail))))
+               (begin (display " -> yes\n")
+                      (next cnk init src str i end matches fail))
+               (begin (display " -> no\n")
+                      (fail)))))
         ((bow)
          (lambda (cnk init src str i end matches fail)
            (if (and (if (> i ((chunker-get-start cnk) src))

Rerunning the test:

$ guile --no-auto-compile -l irregex2.scm 
GNU Guile 2.2.3
[...]
scheme@(guile-user)> (irregex-replace/all 'bol "\n\n\n\n\n\n" "start")
bol ("\n\n\n\n\n\n" 0 6) 0 -> yes
bol ("\n\n\n\n\n\n" 1 6) 1 -> no
bol ("\n\n\n\n\n\n" 1 6) 2 -> yes
bol ("\n\n\n\n\n\n" 2 6) 2 -> no
bol ("\n\n\n\n\n\n" 2 6) 3 -> yes
bol ("\n\n\n\n\n\n" 3 6) 3 -> no
bol ("\n\n\n\n\n\n" 3 6) 4 -> yes
bol ("\n\n\n\n\n\n" 4 6) 4 -> no
bol ("\n\n\n\n\n\n" 4 6) 5 -> yes
bol ("\n\n\n\n\n\n" 5 6) 5 -> no
bol ("\n\n\n\n\n\n" 5 6) 6 -> yes
$1 = "start\n\nstart\nstart\nstart\nstart\nstart"
scheme@(guile-user)> 

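There is no match at index 1, and every location from 2 to 5 is retested after success. In each rejected test the index equals the start offset recorded in src, so the newline clause of bol cannot apply. Both of these bugs are in irregex-fold/fast, which completely mishandles empty matches.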
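
For comparison, here is one conventional way to handle empty matches in a replace-all loop, sketched in Python rather than Scheme. It is not irregex's code, and replace_all is a hypothetical name. The two points it illustrates are that each search runs against the whole string from an offset, so an anchor can still see the character before that offset, and that after an empty match one character is copied and the position advances past it.

```python
import re

def replace_all(pattern, repl, text):
    """Replace every match of pattern in text, stepping past empty matches."""
    rx = re.compile(pattern)
    out = []
    pos = 0
    while pos <= len(text):
        # Search the full string starting at pos instead of a truncated copy,
        # so the character before pos stays visible to anchors such as ^.
        m = rx.search(text, pos)
        if m is None:
            break
        out.append(text[pos:m.start()])
        out.append(repl)
        if m.end() == m.start():
            # Empty match: copy one character and move past it, so the
            # same position is neither retested nor matched forever.
            out.append(text[m.end():m.end() + 1])
            pos = m.end() + 1
        else:
            pos = m.end()
    # Copy whatever follows the last match.
    out.append(text[pos:])
    return "".join(out)

print(repr(replace_all(r"(?m)^", "start", "\n" * 6)))
```

On the six-newline input this gives the same seven-line result as re.sub in the Python session above, with a match at index 1 and no position tested twice.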
