SchemeBBS [part 2]

21 2020-07-03 03:45

Adding the checks recommended in >>16 >>18 >>20 is trivial, but fixing string-split takes more than a line or two. Here is how string-split operates:

$ mit-scheme --load lib/utils.scm 
[...]
  Release 9.1.1     || Microcode 15.3 || Runtime 15.7 || SF 4.41
  LIAR/x86-64 4.118 || Edwin 3.116
;Loading "lib/utils.scm"... done

1 ]=> (string-split "" #\-)
;Value: ()
1 ]=> (string-split "-a-b--c-d-" #\-)
;Value 14: ("a" "b" "c" "d")

Its sibling string-split* has the right idea for TCO:

;;; this version of string-split doesn't trim the leading separators
;;; (string-split "/usr/local/bin") => ("" "usr" "local" "bin")
(define (string-split* sep str)
  (define (f cs xs) (cons (list->string (reverse cs)) xs))
  (let loop ((ss (string->list str)) (cs '()) (xs '()))
    (cond ((null? ss) (reverse (if (null? cs) xs (f cs xs))))
          ((char=? (car ss) sep) (loop (cdr ss) '() (f cs xs)))
          (else (loop (cdr ss) (cons (car ss) cs) xs)))))

but has an asymmetry bug whereby it keeps a leading empty piece but drops a trailing one:

1 ]=> (string-split* #\- "")
;Value: ()
1 ]=> (string-split* #\- "-a-b--c-d-")
;Value 15: ("" "a" "b" "" "c" "d")

This is fine for its intended use in lib/markup.scm:block-scanner but makes it an incorrect general-purpose split function: splitting a reversed input should yield the split of the original input with each piece reversed and the order of the pieces reversed. string-split* violates this:

1 ]=> (string-split* #\- "-ab-cd-")
;Value 13: ("" "ab" "cd")
1 ]=> (string-split* #\- (reverse-string "-ab-cd-"))
;Value 14: ("" "dc" "ba")

In addition, string-split* uses string->list to break the string up into individual characters, even within runs of non-separators, and it calls char=? on every character. Both of these are easily avoided by taking advantage of the built-in procedures for character-based string search:
http://web.mit.edu/scheme_v9.2/doc/mit-scheme-ref/Searching-Strings.html#index-substring_002dfind_002dnext_002dchar-982
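
For reference, a quick sketch of how substring-find-next-char behaves, assuming the MIT Scheme 9.x signature (string start end char) documented at the link above. The search covers indices start (inclusive) to end (exclusive) and yields #f when the character is absent. The sample string is only an illustration.

```scheme
;; "ab-cd-" has separators at indices 2 and 5.
(substring-find-next-char "ab-cd-" 0 6 #\-)   ; first separator at or after index 0
(substring-find-next-char "ab-cd-" 3 6 #\-)   ; resume the search just past the first hit
(substring-find-next-char "ab-cd-" 0 2 #\-)   ; no separator in [0, 2), so the result is #f
```
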
Here is the asymmetry and efficiency fix for string-split*:

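(define (string-split-tco str char)
   ;; result accumulates the pieces in reverse order; now is the scan position
   (define (iter result now end)
      (if (= now end)
          ;; the input is empty or ends with a separator: emit the final empty piece
          (cons "" result)
          (let ((pos (substring-find-next-char str now end char)))
             (if pos
                 ;; tail call: keep the piece before the separator, resume after it
                 (iter (cons (substring str now pos) result) (+ pos 1) end)
                 ;; no separator left: the rest of the string is the last piece
                 (cons (substring str now end) result)))))
   (reverse (iter '() 0 (string-length str))))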

Operation:

1 ]=> (string-split-tco "" #\-)
;Value 15: ("")
1 ]=> (string-split-tco "-a-b--c-d-" #\-)
;Value 16: ("" "a" "b" "" "c" "d" "")

1 ]=> (string-split-tco "-ab-cd-" #\-)
;Value 17: ("" "ab" "cd" "")
1 ]=> (string-split-tco (reverse-string "-ab-cd-") #\-)
;Value 18: ("" "dc" "ba" "")
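
The reversal property can be written down as a predicate. This is a minimal sketch: the names rev-string and reversal-law-holds? are made up for illustration, and rev-string is a portable stand-in for the reverse-string used above. The split argument is assumed to take (str char) in that order.

```scheme
;; Portable stand-in for reverse-string (hypothetical helper name).
(define (rev-string s)
  (list->string (reverse (string->list s))))

;; True when splitting the reversed input equals the original split with
;; every piece reversed and the order of the pieces reversed.
(define (reversal-law-holds? split str char)
  (equal? (split (rev-string str) char)
          (reverse (map rev-string (split str char)))))
```

With the definitions from this post, (reversal-law-holds? string-split-tco "-ab-cd-" #\-) should give #t, while string-split* wrapped as (lambda (str char) (string-split* char str)) should give #f on the same input.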

string-split-tco can easily be adapted to get a fixed version of string-split that doesn't overflow the stack:

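(define (string-split-tco-noempty str char)
   ;; same loop as string-split-tco, except that empty pieces are never collected
   (define (iter result now end)
      (if (= now end)
          result
          (let ((pos (substring-find-next-char str now end char)))
             (if pos
                 ;; (= now pos) means an empty piece: skip it
                 (iter (if (= now pos)
                           result
                           (cons (substring str now pos) result))
                       (+ pos 1)
                       end)
                 ;; no separator left and now < end: the remainder is non-empty
                 (cons (substring str now end) result)))))
   (reverse (iter '() 0 (string-length str))))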

Operation:

1 ]=> (string-split-tco-noempty "" #\-) 
;Value: ()
1 ]=> (string-split-tco-noempty "-a-b--c-d-" #\-)
;Value 19: ("a" "b" "c" "d")

The -tco and -tco-noempty versions can be unified via a filtering lambda, but the cost is a small drop in efficiency, and for a tool as basic as splitting I didn't feel that the efficiency loss was justified.
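
For illustration only, here is roughly what such a unified version could look like. It is a sketch, not code from lib/utils.scm: string-split-with, string-split-all, string-split-nonempty and the keep? predicate are hypothetical names. keep? receives the start and end indices of each piece, so rejected pieces are never allocated.

```scheme
;; Hypothetical unified splitter: keep? decides whether the piece
;; spanning [start, end) is collected.
(define (string-split-with str char keep?)
  (define (collect result start end)
    (if (keep? start end)
        (cons (substring str start end) result)
        result))
  (define (iter result now end)
    ;; skip the search once the whole string has been consumed
    (let ((pos (and (< now end)
                    (substring-find-next-char str now end char))))
      (if pos
          (iter (collect result now pos) (+ pos 1) end)
          (reverse (collect result now end)))))
  (iter '() 0 (string-length str)))

;; Keeps every piece, empty ones included.
(define (string-split-all str char)
  (string-split-with str char (lambda (start end) #t)))

;; Drops empty pieces.
(define (string-split-nonempty str char)
  (string-split-with str char (lambda (start end) (< start end))))
```

The extra call to keep? for every piece is the efficiency cost mentioned above.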

1 ]=> (define (test n) (string-split-tco-noempty (apply string-append (make-list n "&x=x")) #\&) 'ok)
;Value: test

1 ]=> (test 50000)
;Value: ok
1 ]=> (test 100000)
;Value: ok

Even though string-split-tco-noempty gets rid of the stack overflow, I still recommend the 'body' length check from >>20 as well.
