Here is a related bug. We might want to add 'start' to the start of every line, and we might have a file with some number of empty lines.
$ for k in $(seq 1 6); do echo ""; done > emptylines.txt
$ hd emptylines.txt
00000000 0a 0a 0a 0a 0a 0a |......|
00000006
We might use sed:
$ sed -e 's/^/start/' emptylines.txt
start
start
start
start
start
start
$
We might use regular expressions in python:
$ python3
[...]
>>> s = "\n" * 6
>>> s
'\n\n\n\n\n\n'
>>> import re
>>> re.sub ("(?m)^", "start", s)
'start\nstart\nstart\nstart\nstart\nstart\nstart'
>>> print (_)
start
start
start
start
start
start
start
>>>
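For comparison, a line-oriented version in Python gives the six-line result that sed prints. This is only a sketch and not part of the session above; `prefix_lines` is a made-up helper name, and it relies on `str.splitlines(keepends=True)` treating a final newline as the end of the last line rather than the start of a new one.

```python
def prefix_lines(text, prefix):
    """Put `prefix` in front of every line of `text`.

    A trailing newline ends the last line; it does not start a new,
    empty line (the usual text file convention).
    """
    # keepends=True keeps each line's own terminator, so joining the
    # pieces back together reproduces the original line endings.
    return "".join(prefix + line for line in text.splitlines(keepends=True))


if __name__ == "__main__":
    print(prefix_lines("\n" * 6, "start"), end="")
```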
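Python finds seven matches because, in multiline mode, ^ also matches at the empty position after the final newline. Sed finds six because it follows the text file convention that the final newline ends the last line rather than starting a new one. We might also use irregex: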
$ guile --no-auto-compile -l irregex.scm
GNU Guile 2.2.3
[...]
scheme@(guile-user)> (irregex-replace/all 'bol "\n\n\n\n\n\n" "start")
$1 = "start\n\nstart\nstart\nstart\nstart\nstart"
scheme@(guile-user)> (display $1)
start
start
start
start
start
startscheme@(guile-user)>
There is no match at the start of the second line (index 1), where there clearly should be one. We'll add the usual verbosity to see what happens:
$ TZ=GMT diff -u irregex.scm irregex2.scm
--- irregex.scm 2020-07-13 20:23:49.195645124 +0000
+++ irregex2.scm 2020-07-13 20:15:27.347747147 +0000
@@ -3419,11 +3419,14 @@
(fail))))
((bol)
(lambda (cnk init src str i end matches fail)
+ (simple-format #t "bol ~S ~A" src i)
(if (or (and (eq? src (car init)) (eqv? i (cdr init)))
(and (> i ((chunker-get-start cnk) src))
(eqv? #\newline (string-ref str (- i 1)))))
- (next cnk init src str i end matches fail)
- (fail))))
+ (begin (display " -> yes\n")
+ (next cnk init src str i end matches fail))
+ (begin (display " -> no\n")
+ (fail)))))
((bow)
(lambda (cnk init src str i end matches fail)
(if (and (if (> i ((chunker-get-start cnk) src))
Rerunning the test:
$ guile --no-auto-compile -l irregex2.scm
GNU Guile 2.2.3
[...]
scheme@(guile-user)> (irregex-replace/all 'bol "\n\n\n\n\n\n" "start")
bol ("\n\n\n\n\n\n" 0 6) 0 -> yes
bol ("\n\n\n\n\n\n" 1 6) 1 -> no
bol ("\n\n\n\n\n\n" 1 6) 2 -> yes
bol ("\n\n\n\n\n\n" 2 6) 2 -> no
bol ("\n\n\n\n\n\n" 2 6) 3 -> yes
bol ("\n\n\n\n\n\n" 3 6) 3 -> no
bol ("\n\n\n\n\n\n" 3 6) 4 -> yes
bol ("\n\n\n\n\n\n" 4 6) 4 -> no
bol ("\n\n\n\n\n\n" 4 6) 5 -> yes
bol ("\n\n\n\n\n\n" 5 6) 5 -> no
bol ("\n\n\n\n\n\n" 5 6) 6 -> yes
$1 = "start\n\nstart\nstart\nstart\nstart\nstart"
scheme@(guile-user)>
Index 1 is tested once and rejected, so there is no match there, and every index from 2 to 5 is tested again (and rejected) after it has already matched. Both of these bugs are in irregex-fold/fast, which completely mishandles empty matches.
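One way to handle empty matches in a replace-all loop can be sketched in Python. This is an illustration only: it is not irregex's code and not a proposed patch, and `replace_all` is a hypothetical name. It shows two things. The search always runs against the whole string from an offset, so a check on the previous character still works. After an empty match the next search starts one character further on, so no position is tested twice.

```python
import re


def replace_all(pattern, text, replacement):
    """Replace every match of compiled `pattern` in `text`, empty ones included."""
    out = []
    pos = 0      # offset at which the next search starts
    copied = 0   # text[:copied] has already been written to `out`
    while pos <= len(text):
        # Search the whole string starting at `pos` instead of slicing it,
        # so the character before `pos` stays visible to anchors such as ^.
        m = pattern.search(text, pos)
        if m is None:
            break
        out.append(text[copied:m.start()])
        out.append(replacement)
        copied = m.end()
        if m.end() == m.start():
            # Empty match: step past this position so it is not found again.
            pos = m.end() + 1
        else:
            pos = m.end()
    out.append(text[copied:])
    return "".join(out)


if __name__ == "__main__":
    print(repr(replace_all(re.compile("(?m)^"), "\n" * 6, "start")))
```

With the six-newline input this returns the same string as the re.sub call in the Python session above, with a match at every index from 0 to 6.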