Adding the checks recommended in >>16 >>18 >>20 is trivial, but fixing string-split takes more than a line or two. Here is how string-split currently operates:
$ mit-scheme --load lib/utils.scm
[...]
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41
LIAR/x86-64 4.118 || Edwin 3.116
;Loading "lib/utils.scm"... done
1 ]=> (string-split "" #\-)
;Value: ()
1 ]=> (string-split "-a-b--c-d-" #\-)
;Value 14: ("a" "b" "c" "d")
Its sibling string-split* has the right idea for tco
;;; this version of string-split doesn't trim the leading separators
;;; (string-split "/usr/local/bin") => ("" "usr" "local" "bin")
(define (string-split* sep str)
  (define (f cs xs) (cons (list->string (reverse cs)) xs))
  (let loop ((ss (string->list str)) (cs '()) (xs '()))
    (cond ((null? ss) (reverse (if (null? cs) xs (f cs xs))))
          ((char=? (car ss) sep) (loop (cdr ss) '() (f cs xs)))
          (else (loop (cdr ss) (cons (car ss) cs) xs)))))
but has an asymmetry bug whereby it keeps a leading empty piece but drops a trailing one:
1 ]=> (string-split* #\- "")
;Value: ()
1 ]=> (string-split* #\- "-a-b--c-d-")
;Value 15: ("" "a" "b" "" "c" "d")
This is fine for its intended use in lib/markup.scm:block-scanner, but it makes string-split* incorrect as a general split function: splitting a reversed input should yield the split of the original input with both the pieces and the order of the pieces reversed. String-split* violates this:
1 ]=> (string-split* #\- "-ab-cd-")
;Value 13: ("" "ab" "cd")
1 ]=> (string-split* #\- (reverse-string "-ab-cd-"))
;Value 14: ("" "dc" "ba")
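The reversal property can be written down as a predicate. This is only a minimal sketch: the names string-rev and reversal-property-holds? are made up for illustration and are not part of lib/utils.scm, and string-rev is a portable stand-in so the check does not depend on any particular built-in.

```scheme
;; Hypothetical helper: reverse a string via a character list.
(define (string-rev s)
  (list->string (reverse (string->list s))))

;; split is any procedure taking (str char) and returning a list of strings.
;; True when splitting the reversed input equals the original split with
;; both the pieces and the order of the pieces reversed.
(define (reversal-property-holds? split str char)
  (equal? (split (string-rev str) char)
          (reverse (map string-rev (split str char)))))
```

Since string-split* takes the separator first, it needs a wrapper: (reversal-property-holds? (lambda (s c) (string-split* c s)) "-ab-cd-" #\-) compares ("" "dc" "ba") against ("dc" "ba" "") and so returns #f.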
In addition, string-split* uses string->list to break the string up into individual characters, even within runs of non-separators, and it calls char=? on every character. Both are easily avoided by using the built-in procedures for character-based string search:
http://web.mit.edu/scheme_v9.2/doc/mit-scheme-ref/Searching-Strings.html#index-substring_002dfind_002dnext_002dchar-982
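For reference, a short sketch of how substring-find-next-char from that page behaves, as I understand the documentation: it takes the string, a start index, an end index and a character, and returns the index of the first match in that range or #f. The sample string is made up.

```scheme
;; "ab-cd-" has separators at indices 2 and 5.
(substring-find-next-char "ab-cd-" 0 6 #\-)   ; => 2
(substring-find-next-char "ab-cd-" 3 6 #\-)   ; => 5
(substring-find-next-char "ab-cd-" 0 2 #\-)   ; => #f, no separator in [0, 2)
```

Each call jumps straight to the next separator, so a split only has to call substring once per piece.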
Here is the asymmetry and efficiency fix for string-split*:
(define (string-split-tco str char)
  (define (iter result now end)
    (if (= now end)
        (cons "" result)
        (let ((pos (substring-find-next-char str now end char)))
          (if pos
              (iter (cons (substring str now pos) result) (+ pos 1) end)
              (cons (substring str now end) result)))))
  (reverse (iter '() 0 (string-length str))))
Operation:
1 ]=> (string-split-tco "" #\-)
;Value 15: ("")
1 ]=> (string-split-tco "-a-b--c-d-" #\-)
;Value 16: ("" "a" "b" "" "c" "d" "")
1 ]=> (string-split-tco "-ab-cd-" #\-)
;Value 17: ("" "ab" "cd" "")
1 ]=> (string-split-tco (reverse-string "-ab-cd-") #\-)
;Value 18: ("" "dc" "ba" "")
This is easily adapted into a fixed version of string-split that doesn't overflow the stack:
(define (string-split-tco-noempty str char)
  (define (iter result now end)
    (if (= now end)
        result
        (let ((pos (substring-find-next-char str now end char)))
          (if pos
              (iter (if (= now pos)
                        result
                        (cons (substring str now pos) result))
                    (+ pos 1)
                    end)
              (cons (substring str now end) result)))))
  (reverse (iter '() 0 (string-length str))))
Operation:
1 ]=> (string-split-tco-noempty "" #\-)
;Value: ()
1 ]=> (string-split-tco-noempty "-a-b--c-d-" #\-)
;Value 19: ("a" "b" "c" "d")
The -tco and -tco-noempty versions can be unified via a filtering lambda, but the cost is a small drop in efficiency, and for a tool as basic as splitting I didn't feel that the loss was justified.
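For the curious, the unified version might look roughly like this. It is only a sketch: string-split-with, keep-all and non-empty? are hypothetical names, not anything in lib/utils.scm, and it relies on MIT Scheme's substring-find-next-char like the versions above.

```scheme
;; keep? decides whether a piece goes into the result.
(define (string-split-with keep? str char)
  (define (add piece result)
    (if (keep? piece) (cons piece result) result))
  (define (iter result now end)
    ;; When now = end the search range is empty, pos is #f and the
    ;; final piece is "", which keep? may accept or reject.
    (let ((pos (substring-find-next-char str now end char)))
      (if pos
          (iter (add (substring str now pos) result) (+ pos 1) end)
          (add (substring str now end) result))))
  (reverse (iter '() 0 (string-length str))))

(define (keep-all piece) #t)
(define (non-empty? piece) (> (string-length piece) 0))

(string-split-with keep-all "-a-b--c-d-" #\-)    ; => ("" "a" "b" "" "c" "d" "")
(string-split-with non-empty? "-a-b--c-d-" #\-)  ; => ("a" "b" "c" "d")
```

The extra cost is visible here: every piece, including each empty one, is allocated by substring and passed through keep? before it can be dropped, whereas string-split-tco-noempty skips empty pieces with a single integer comparison.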
Stack check of string-split-tco-noempty on long inputs:
1 ]=> (define (test n) (string-split-tco-noempty (apply string-append (make-list n "&x=x")) #\&) 'ok)
;Value: test
1 ]=> (test 50000)
;Value: ok
1 ]=> (test 100000)
;Value: ok
Even though string-split-tco-noempty gets rid of the stack overflow, I still recommend the 'body' length check from >>20 as well.