Let's keep the questions about installation and code reviews in one thread.
previous thread http://textboard.org/prog/39
Thanks a lot, I will probably add these worthwhile optimizations. I was thinking about changing the redirection to an absolute path too because, sadly, patching http-syntax.scm has been an exceedingly high barrier entry for many persons who have shown their interest. Updating that file (and the broken httpio.scm) to follow the new RFC might still be possible in future releases of MIT Scheme 10, but I have only recently tried porting SchemeBBS to the newer version. (The 10.1.10 dist you could download from the official site was broken for a year, I've just noticed it's been replaced by 10.1.11, 2020/06/04)
I have a cute idea for some really fun and unique feature that should be very easy to implement but no spoiler.
Have you sent your patches upstream?
https://textboard.org/prog/39#t39p126
On the other hand, a lot of functions are undocumented in MIT Scheme, and you have to read the source code if you want to use them. And there isn't exactly a huge community of users outside of the academic world. I wouldn't know where to ask for help.
... and I'm stuck.
https://textboard.org/prog/39#t39p208
There's a stub for an antispam system that was never implemented because I had problems with mit-scheme cryptographic functions.
The contexts are usable as shown in mcrypt-encrypt-port, but they are only needed for efficiency in tight loops or where mcrypt-encrypt-port is unsuitable. Otherwise mcrypt-encrypt-port is more convenient than manual contexts, with one caveat. While the function is forgiving on the IV string's length, it is strict on the input size of block modes, so padding responsibility falls on the caller. Here is a small demo with NUL padding appropriate for text.
$ cat test.scm
(define (simple-mcrypt algo mode padunit ivlen)
(define (pad s)
(let* ((slen (string-length s))
(mod (modulo slen padunit)))
(if (= mod 0)
s
(string-pad-right s (+ slen (- padunit mod)) #\NUL))))
(define trimcs (char-set-invert (char-set #\NUL)))
(lambda (text key encrypt?)
(let ((out (with-input-from-string
(if encrypt?
(pad text)
text)
(lambda ()
(call-with-output-string
(lambda (port)
(mcrypt-encrypt-port
algo
mode
(current-input-port)
port
key
(make-string ivlen #\NUL)
encrypt?)))))))
(if encrypt?
out
(string-trim-right out trimcs)))))
Usage example with basic cbc AES:
$ mit-scheme --load test.scm
[...]
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41
LIAR/x86-64 4.118 || Edwin 3.116
;Loading "test.scm"... done
1 ]=> (define aes (simple-mcrypt "rijndael-128" "cbc" 16 16))
;Value: aes
1 ]=> (define key (md5-string "secret key"))
;Loading "/usr/lib/x86_64-linux-gnu/mit-scheme/lib/prmhash.so"... done
;Value: key
1 ]=> (define text "How can I run my own instance of this")
;Value: text
1 ]=> (mcrypt-available?)
;Loading "/usr/lib/x86_64-linux-gnu/mit-scheme/lib/prmcrypt.so"... done
;Value: #t
1 ]=> (define cytext (aes text key #t))
;Value: cytext
1 ]=> (md5-sum->hexadecimal cytext)
;Value 13: "7944266105c58cacba328bb8aa98859ba4eabf4880d4f715890b1b9d8a0237a00445293cbcbcc0f205db1aa0f0d63c73"
1 ]=> (define text2 (aes cytext key #f))
;Value: text2
1 ]=> text2
;Value 14: "How can I run my own instance of this"
1 ]=> (string=? text text2)
;Value: #t
1 ]=>
To encrypt arbitrary input a proper padding scheme is needed instead of this small demo with NULs. Files can be processed in the obvious way by substituting call-with-binary-input-file and call-with-binary-output-file.
Here is an upgrade of >>3 with selectable padding scheme and the addition of PKCS#7 padding.
$ cat test.scm
(define (simple-mcrypt algo mode padunit ivlen padin padout)
(lambda (text key encrypt?)
(let ((out (with-input-from-string
(if encrypt?
(padin text padunit)
text)
(lambda ()
(call-with-output-string
(lambda (port)
(mcrypt-encrypt-port
algo
mode
(current-input-port)
port
key
(make-string ivlen #\NUL)
encrypt?)))))))
(if encrypt?
out
(padout out padunit)))))
(define (test-simple-mcrypt engine text key)
(let* ((hidden (engine text key #t))
(revealed (engine hidden key #f)))
(string=? text revealed)))
(define (padin-nul s unit)
(let* ((slen (string-length s))
(mod (modulo slen unit)))
(if (= mod 0)
s
(string-pad-right s (+ slen (- unit mod)) #\NUL))))
(define (padout-nul)
(define trimcs (char-set-invert (char-set #\NUL)))
(lambda (s unit)
(string-trim-right s trimcs)))
(define (padin-pkcs7 s unit)
(let* ((slen (string-length s))
(mod (modulo slen unit))
(used (- unit mod)))
(string-pad-right s (+ slen used) (ascii->char used))))
(define (padout-pkcs7 s unit)
(let* ((slen (string-length s))
(used (char->ascii (string-ref s (- slen 1)))))
(string-head s (- slen used))))
Usage example with cbc AES-256. Strings of NULs are no problem with PKCS#7.
$ mit-scheme --load test.scm
[...]
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41
LIAR/x86-64 4.118 || Edwin 3.116
;Loading "test.scm"... done
1 ]=> (mcrypt-available?)
;Loading "/usr/lib/x86_64-linux-gnu/mit-scheme/lib/prmcrypt.so"... done
;Value: #t
1 ]=> (define aes (simple-mcrypt "rijndael-256" "cbc" 16 16 padin-nul (padout-nul)))
;Value: aes
1 ]=> (define key (string-append (md5-string "secret key") (md5-string "part two")))
;Loading "/usr/lib/x86_64-linux-gnu/mit-scheme/lib/prmhash.so"... done
;Value: key
1 ]=> (define text "How can I run my own instance of this")
;Value: text
1 ]=> (test-simple-mcrypt aes text key)
;Value: #t
1 ]=> (define aes (simple-mcrypt "rijndael-256" "cbc" 16 16 padin-pkcs7 padout-pkcs7))
;Value: aes
1 ]=> (test-simple-mcrypt aes text key)
;Value: #t
1 ]=> (define text (make-string 55 #\NUL))
;Value: text
1 ]=> text
;Value 13: "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
1 ]=> (test-simple-mcrypt aes text key)
;Value: #t
1 ]=>
>>2
I believe it's too late for 9.2 which is an old archived realease. I'll try to submit them for 10.11+ and do it the right way.
>>3,4
That's a game changer, thank you so much for this complete example! I'm very happy someone knowledgeable in MIT Scheme found this place. I did try to feed the correct string size to the functions, I tried padding with null bytes... but to no avail. I don't have a lot of free time these days and I plan to slap that new cool feature on SchemeBBS before everything else. Frankly, had I known before, I would have implemented tripcodes in Scheme. I might stick with MIT Scheme for further devlopments, if I manage to port SchemeBBS to v10.
I also need to get rid of the buggy irregex.scm
. With Anon's invaluable help we found a workaround and that's another bug which wasn't brought to the attention of the maintainer (Alex Shinn), even though Anon thoroughly dug into it. I didn't plan to use the irregex dependency at the beginning. The specs specifically stated "no regex for the parser, no regex anywhere" but I had to finish the project before I found myself in no position to work on it anymore and regexes were the fast lane for some part of the markup line parser.
Hope you reported the irregex bug.
I don't have a lot of free time these days and I plan to slap that new cool feature on SchemeBBS before everything else.
If the issue is free time the performance upgrade you were already given diffs for https://textboard.org/prog/39#t39p291 might be even simpler and faster to add.
and that's another bug which wasn't brought to the attention of the maintainer (Alex Shinn)
From https://textboard.org/prog/39#t39p84
It just needs nice a PR with those reproducible steps.
From https://textboard.org/prog/39#t39p91
I think this might be worth telling the shinnoid. I cannot see any obvious contact info on
http://synthcode.com/scheme/irregex/
but his git commits on
https://github.com/ashinn/irregex
are by "Alex Shinn <alexshinn@gmail.com>". Well, at least he's not one of those protonmail people. If someone could drop him a line, that would be great.
From https://textboard.org/prog/39#t39p104
As I have already asked in >>91 perhaps somebody could bring the shinnoid up to speed.
From https://textboard.org/prog/39#t39p106
Let's file a PR then, maybe it's worth linking to this thread?
From https://textboard.org/prog/39#t39p108
In my opinion he might be sent at least the minimal failure case and the DOS. From there he can figure things out himself faster than I did, since he knows his own library. As for a PR however, I do not care for sites that require any of: accounts, verification and enabling remote code execution, so I will not be interacting with his github myself.
It would appear from https://github.com/ashinn/irregex/issues that Anons who already have github accounts and who do not have any issues with accounts or logins -- like https://textboard.org/prog/39#t39p259 -- did not open an irregex issue for ashinn with a simple link to the bughunt in the intervening months.
[1/2]
Here is an upgrade of >>4. When a file is too large to reasonably fit into mit-scheme's memory, and cannot be temporarily padded in place such as because it is located on read-only media, >>4 cannot handle it. The solution is to apply the padding at the port level, by creating a wrapper port that appends the padding on-the-fly once the wrapped port has been exhausted. Port documentation is at:
http://web.mit.edu/scheme_v9.2/doc/mit-scheme-ref/Port-Primitives.html
(define (port-padder iport suffix)
(let* ((sflen (string-length suffix))
(sent #f)
(iread (lambda (ip buf start end)
(let* ((want (- end start))
(got (input-port/read-substring! iport buf start end))
(room (- want got)))
(cond ((= got want) got)
(sent (if (= got 0) 0 'error-zero))
((> sflen room) 'error-fit)
(else (substring-move-left! suffix 0 sflen buf (+ start got))
(set! sent #t)
(+ got sflen))))))
(ops `((read-substring ,iread)))
(itype (make-port-type ops (port/type iport))))
(make-port itype (port/state iport))))
This is a wrapper specifically aimed at crypto.scm:mcrypt-encrypt-port rather than a general one. With file content being arbitrary only the PKCS#7 padding is kept.
; string to append
(define (padadd-pkcs7 slen unit)
(let ((used (- unit (modulo slen unit))))
(make-string used (ascii->char used))))
; length to delete
(define (paddel-pkcs7 stail unit)
(let ((slen (string-length stail)))
(char->ascii (string-ref stail (- slen 1)))))
Here is how to truncate a file in place to remove padding after decryption. This information can be extracted from io.scm and port.scm.
(define (portdo port opsym . rest)
(apply (port/operation port opsym) port rest))
(define (truncate port length)
(channel-file-truncate (portdo port 'output-channel) length))
(define (truncate-padding path paddel unit)
(let* ((size (file-length path))
(want 512)
(take (min size want))
(pos (- size take))
(tail (make-string take))
(f (open-binary-i/o-file path)))
(portdo f 'set-position! pos)
(input-port/read-string! f tail)
(let* ((del (paddel tail unit))
(pos (- size del)))
(cond ((> del 0)
(portdo f 'set-position! pos)
(truncate f pos))))
(close-port f)))
Here is the new engine. It can encrypt and decrypt from a file or string to a file or string. The input and output specifiers are as follows:
- The file specifier for both input and output, and for both encryption and decryption, is a list of two elements holding the symbol 'file followed by the filepath.
- The string input specifier for both encryption and decryption is a list of two elements holding the symbol 'string followed by the string.
- The string output specifier for encryption is a list of two elements holding the symbol 'string followed by an element that is ignored.
- The string output specifier for decryption is a list of two elements holding the symbol 'string followed by a boolean value that controls whether padding should be stripped off. The length of the unpadded output will be returned anyway, so when the result can be used with the padding included, such as via write-substring, this can avoid making a copy of essentially the entire output string.
The return value is as follows:
- For file output the output specifier is returned.
- For string output a list of three elements is returned holding the symbol 'string, followed by the output string and a length.
- For string output encryption the output string is the cyphertext and the length is the cyphertext length.
- For string output decryption the output string is the plaintext with padding stripped off if the boolean value in the second position of the output specifier was true and present otherwise. In either case the length is the unpadded plaintext length.
[2/2]
; in `(file ,path)
; in `(string ,s)
; out `(file ,path)
; out '(string unpad?)
; ret file: outspec
; ret string: `(string ,s ,length)
(define (medium-mcrypt algo mode padunit ivlen padadd paddel)
(lambda (inspec outspec key encrypt?)
(let* ((inwhat (car inspec))
(outwhat (car outspec))
(indata (cadr inspec))
(insize (case inwhat
((file) (file-length indata))
((string) (string-length indata))
(else 'error-inwhat)))
(mep (lambda (outport)
(mcrypt-encrypt-port
algo
mode
(if encrypt?
(port-padder
(current-input-port)
(padadd insize padunit))
(current-input-port))
outport
key
(make-string ivlen #\NUL)
encrypt?)))
(doout (case outwhat
((file) (lambda () (call-with-output-file (cadr outspec) mep)))
((string) (lambda () (call-with-output-string mep)))
(else 'error-outwhat)))
(doin (case inwhat
((file) (lambda () (with-input-from-file indata doout)))
((string) (lambda () (with-input-from-string indata doout)))
(else 'error-inwhat)))
(ret (doin)))
(if encrypt?
(case outwhat
((file) outspec)
((string) `(string ,ret ,(string-length ret))))
(case outwhat
((file) (truncate-padding (cadr outspec) paddel padunit)
outspec)
((string) (let* ((rlen (string-length ret))
(del (paddel ret padunit))
(used (- rlen del)))
(if (cadr outspec)
`(string ,(string-head ret used) ,used)
`(string ,ret ,used)))))))))
Here is an AES cbc example with strings:
$ mit-scheme --load test.scm
[...]
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41
LIAR/x86-64 4.118 || Edwin 3.116
;Loading "test.scm"... done
1 ]=> (mcrypt-available?)
;Loading "/usr/lib/x86_64-linux-gnu/mit-scheme/lib/prmcrypt.so"... done
;Value: #t
1 ]=> (define aes (medium-mcrypt "rijndael-128" "cbc" 16 16 padadd-pkcs7 paddel-pkcs7))
;Value: aes
1 ]=> (define key (md5-string "secret key"))
;Loading "/usr/lib/x86_64-linux-gnu/mit-scheme/lib/prmhash.so"... done
;Value: key
1 ]=> (define text "How can I run my own instance of this")
;Value: text
1 ]=> (define enc (aes `(string ,text) '(string #f) key #t))
;Value: enc
1 ]=> (car enc)
;Value: string
1 ]=> (caddr enc)
;Value: 48
1 ]=> (md5-sum->hexadecimal (cadr enc))
;Value 13: "7944266105c58cacba328bb8aa98859ba4eabf4880d4f715890b1b9d8a0237a0434d5e497aaee9d0d9ea114437367dc5"
1 ]=> (define dec (aes `(string ,(cadr enc)) '(string #t) key #f))
;Value: dec
1 ]=> dec
;Value 14: (string "How can I run my own instance of this" 37)
1 ]=> (string=? text (cadr dec))
;Value: #t
1 ]=>
And here is an example from file to string to file:
1 ]=> (define enc (aes '(file "bbs.scm") '(string #f) key #t))
;Value: enc
1 ]=> (car enc)
;Value: string
1 ]=> (caddr enc)
;Value: 16592
1 ]=> (define dec (aes `(string ,(cadr enc)) '(file "bbs.scm.dec") key #f))
;Value: dec
1 ]=> dec
;Value 15: (file "bbs.scm.dec")
1 ]=> (string=? (md5-file "bbs.scm") (md5-file "bbs.scm.dec"))
;Value: #t
1 ]=>
$ cmp bbs.scm.dec bbs.scm
$
This removes the need to manually fiddle with paddings and ports.
Correction: an older version of medium-mcrypt >>9 slipped through. Just as truncate-padding >>8 uses binary mode, the same needs to be used in medium-mcrypt, of course.
http://web.mit.edu/scheme_v9.2/doc/mit-scheme-ref/File-Ports.html
Any file can be opened in one of two modes, normal or binary. Normal mode is for accessing text files, and binary mode is for accessing other files. Unix does not distinguish these modes, but Windows do: in normal mode, their file ports perform newline translation, mapping between the carriage-return/linefeed sequence that terminates text lines in files, and the #\newline that terminates lines in Scheme. In binary mode, such ports do not perform newline translation. Unless otherwise mentioned, the procedures in this section open files in normal mode.
(doout (case outwhat
((file) (lambda () (call-with-binary-output-file (cadr outspec) mep)))
((string) (lambda () (call-with-output-string mep)))
(else 'error-outwhat)))
(doin (case inwhat
((file) (lambda () (with-input-from-binary-file indata doout)))
((string) (lambda () (with-input-from-string indata doout)))
(else 'error-inwhat)))
(ret (doin)))
Here is a compatibility experiment between mit-scheme encryption via medium-mcrypt >>10 and java's Cipher. First bbs.scm is encrypted to a file with medium-mcrypt:
$ mit-scheme --load test.scm
[...]
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41
LIAR/x86-64 4.118 || Edwin 3.116
;Loading "test.scm"... done
1 ]=> (mcrypt-available?)
;Loading "/usr/lib/x86_64-linux-gnu/mit-scheme/lib/prmcrypt.so"... done
;Value: #t
1 ]=> (define key (md5-string "secret key"))
;Loading "/usr/lib/x86_64-linux-gnu/mit-scheme/lib/prmhash.so"... done
;Value: key
1 ]=> (md5-sum->hexadecimal key)
;Value 13: "a7656fafe94dae72b1e1487670148412"
1 ]=> (define aes (medium-mcrypt "rijndael-128" "cbc" 16 16 padadd-pkcs7 paddel-pkcs7))
;Value: aes
1 ]=> (define enc (aes '(file "bbs.scm") '(file "bbs.scm.enc") key #t))
;Value: enc
1 ]=> enc
;Value 13: (file "bbs.scm.enc")
1 ]=>
The same parameters are used in java to decrypt the file:
$ cat Test.java
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.math.BigInteger;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
public class Test {
public static void transfer (InputStream is, OutputStream os) throws IOException {
byte [] buf = new byte [65536];
int got;
while ((got = is.read (buf)) != -1) {
os.write (buf, 0, got);
}
}
static void work (String keyseed, String inpath, String outpath) throws GeneralSecurityException, IOException {
MessageDigest md = MessageDigest.getInstance ("MD5");
byte [] key = md.digest (keyseed.getBytes ());
System.out.printf ("key %032x%n", new BigInteger (1, key));
Cipher c = Cipher.getInstance ("AES/CBC/PKCS5Padding");
SecretKeySpec sks = new SecretKeySpec (key, "AES");
IvParameterSpec iv = new IvParameterSpec (new byte [16]);
c.init (Cipher.DECRYPT_MODE, sks, iv);
try (
FileInputStream fis = new FileInputStream (inpath);
FileOutputStream fos = new FileOutputStream (outpath);
BufferedOutputStream bos = new BufferedOutputStream (fos);
CipherOutputStream cos = new CipherOutputStream (bos, c);
) {
transfer (fis, cos);
}
System.out.printf ("%s -> %s%n", inpath, outpath);
}
public static void main (String [] argv) {
try {
work ("secret key", "bbs.scm.enc", "bbs.scm.dec");
} catch (GeneralSecurityException e) {
e.printStackTrace ();
} catch (IOException e) {
e.printStackTrace ();
}
}
}
After decryption the original file has been recovered:
$ javac Test.java
$ java Test
key a7656fafe94dae72b1e1487670148412
bbs.scm.enc -> bbs.scm.dec
$ cmp bbs.scm.dec bbs.scm
$
Here is the reverse compatibility experiment >>11 between libgcrypt https://gnupg.org/related_software/libgcrypt/ and mit-scheme encryption via medium-mcrypt >>10. First bbs.scm is encrypted with libgcrypt:
$ cat src/testgcrypt.c
#include <stdio.h>
#include <string.h>
#include <gcrypt.h>
void transfer (gcry_cipher_hd_t ctx, FILE *fin, FILE *fout, size_t block) {
unsigned char buf [65536];
size_t want = 65536, got;
size_t k, toadd;
int added = 0;
while ((got = fread (buf, 1, want, fin))) {
if (got == want) {
} else {
toadd = block - (got % block);
for (k = 0; k < toadd; k++) {
buf [got + k] = (unsigned char) toadd;
}
got += toadd;
added = 1;
}
gcry_cipher_encrypt (ctx, buf, got, NULL, 0);
fwrite (buf, 1, got, fout);
}
if (!added) {
for (k = 0; k < block; k++) {
buf [k] = (unsigned char) block;
}
gcry_cipher_encrypt (ctx, buf, block, NULL, 0);
fwrite (buf, 1, block, fout);
}
}
void work (char *keyseed, char *inpath, char *outpath) {
unsigned char key [16];
unsigned char iv [16];
size_t k;
gcry_cipher_hd_t ctx;
FILE *fin, *fout;
gcry_md_hash_buffer (GCRY_MD_MD5, key, keyseed, strlen (keyseed));
printf ("%s ", "key");
for (k = 0; k < 16; k++) {
printf ("%02x", key [k]);
iv [k] = 0;
}
printf ("\n");
gcry_cipher_open (&ctx, GCRY_CIPHER_AES128, GCRY_CIPHER_MODE_CBC, 0);
gcry_cipher_setkey (ctx, key, 16);
gcry_cipher_setiv (ctx, iv, 16);
fin = fopen (inpath, "rb");
fout = fopen (outpath, "wb");
transfer (ctx, fin, fout, 16);
fclose (fin);
fclose (fout);
gcry_cipher_close (ctx);
printf ("%s -> %s\n", inpath, outpath);
}
int main (void) {
if (!gcry_check_version (GCRYPT_VERSION)) {
printf ("%s\n", "libgcrypt version mismatch");
return 1;
}
gcry_control (GCRYCTL_DISABLE_SECMEM, 0);
gcry_control (GCRYCTL_INITIALIZATION_FINISHED, 0);
work ("secret key", "bbs.scm", "bbs.scm.enc");
return 0;
}
This is a quick hack for a test, which is why error checking is skipped.
$ gcc -Wall -o bin/testgcrypt src/testgcrypt.c -lgcrypt
$ bin/testgcrypt
key a7656fafe94dae72b1e1487670148412
bbs.scm -> bbs.scm.enc
$
The libgcrypt output is then decrypted with medium-mcrypt and the same parameters.
$ mit-scheme --load test.scm
[...]
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41
LIAR/x86-64 4.118 || Edwin 3.116
;Loading "test.scm"... done
1 ]=> (mcrypt-available?)
;Loading "/usr/lib/x86_64-linux-gnu/mit-scheme/lib/prmcrypt.so"... done
;Value: #t
1 ]=> (define key (md5-string "secret key"))
;Loading "/usr/lib/x86_64-linux-gnu/mit-scheme/lib/prmhash.so"... done
;Value: key
1 ]=> (md5-sum->hexadecimal key)
;Value 13: "a7656fafe94dae72b1e1487670148412"
1 ]=> (define aes (medium-mcrypt "rijndael-128" "cbc" 16 16 padadd-pkcs7 paddel-pkcs7))
;Value: aes
1 ]=> (define dec (aes '(file "bbs.scm.enc") '(file "bbs.scm.dec") key #f))
;Value: dec
1 ]=> dec
;Value 14: (file "bbs.scm.dec")
1 ]=> (string=? (md5-file "bbs.scm.dec") (md5-file "bbs.scm"))
;Value: #t
1 ]=>
$ cmp bbs.scm bbs.scm.dec
$
After decryption the original file has been recovered.
https://github.com/alyssa-p-hacker/SchemeBBS/commit/16329f8b70e07b85e1b08d86e37ee7ab5b298a00
fix a bug in blankpost detection
Blankpost test follows.
OK, then. >>14
In post-message:
(frontpage (lookup-def 'frontpage params))
(message (decode-formdata (lookup-def 'epistula params)))
From lookup-def:
((lookup-def key alist)
(let ((nkey key) (nalist alist)) ; evaluate them only once
(let ((res (assq nkey nalist)))
(if res
(let ((res (cdr res)))
(cond
((not (pair? res)) res)
((null? (cdr res)) (car res))
(else res)))
(error "Failed to find " nkey " in " nalist)))))
As a result, posting without frontpage or epistula gives:
HTTP/1.1 502 Bad Gateway
Server: nginx/1.18.0
Date: Wed, 01 Jul 2020 10:35:57 GMT
Content-Type: text/html
Content-Length: 496
Connection: keep-alive
ETag: "5ecc391b-1f0"
Both lookups need a default followed by validation if 502 Bad Gateway is to be avoided.
From "display the list of boards on top of pages" on Jun 11:
(define (make-board-list)
`((p (@ (class "boardlist")) "[ " ,@(list-intersperse
(map (lambda (board) (list 'a `(@ (href ,(string-append "/" board "/") )) board))
*board-list*)
" | ") " ]")))
From "Allow to quote every single posts as comma separated values. Deduplicate regex code" on Jun 11:
(define *board-list* (map pathname-name (cddr (directory-read "data/sexp/*"))))
From https://textboard.org/sexp/
..
mona/
prog/
sandbox/
sol/
From top of the page:
[ mona / prog / sol ]
The same >>16 applies to message/epistula and headline/titulus in post-thread, of course.
Thank you for the code review and fixes. There are much needed security checks to be done on user inputs. The POST server went down with a forged request of thousands of query strings. Spam bots are also more and more aware of this place. I saw some familiar textboard spam ``Am..ox..i..cil.l.in O..n/line Wi&t32hout Pre@scr^ipt%ion'' slipping through. I need to decouple /sandbox/ from the other boards and automate the relaunching of the server asap (for that, a small script will do).
My apologies for not integrating your code yet, I'm mostly afk and offline these days (waiting for a provider to install broadband Internet at my place as a testament to nomadism. I've also dumpster-dived into a worn up but solid and comfortable gamer chair and a classy Directoire style wooden desk. The antique and modern styles don't exactly match but we'll call that a battlestation. With the sound-proof wall of books that I built against the wall behind which dwells another human, we're nearing perfection as far as working conditions are concerned.
I have a mock-up for a little proof of concept feature that I'd like to showcase but this one could be a security nightmare. Even if it's already sandboxed it needs at least to be a jailed process with limits on CPU, RAM and execution time.
-- yours truly
lib/parameters.scm:parameters->alist
(define (parameters->alist p)
(map parameter->pair (string-split p #\&)))
lib/utils.scm:string-split
; Splits the input string 'str into a list of strings
; based on the delimiter character 'ch
; © (Doug Hoyte, hcsw.org)
(define (string-split str ch)
(let ((len (string-length str)))
(letrec
((split
(lambda (a b)
(cond
((>= b len) (if (= a b) '() (cons (substring str a b) '())))
((char=? ch (string-ref str b))
(if (= a b)
(split (+ 1 a) (+ 1 b))
(cons (substring str a b) (split b b))))
(else (split a (+ 1 b)))))))
(split 0 0))))
Because of the split-in-cons line this is recursive rather than iterative. This means that it can be stress tested on the stack depth. In the REPL of MIT/GNU Scheme 9.1.1:
$ mit-scheme --load lib/utils.scm --load lib/parameters.scm
[...]
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41
LIAR/x86-64 4.118 || Edwin 3.116
;Loading "lib/utils.scm"... done
;Loading "lib/parameters.scm"... done
1 ]=> (define (test n) (parameters->alist (apply string-append (make-list n "&x=x"))) 'ok)
;Value: test
1 ]=> (test 42308)
;Value: ok
1 ]=> (test 42309)
;Aborting!: maximum recursion depth exceeded
1 ]=>
With an appropriately crafted wget post request textboard.org yields:
HTTP/1.1 502 Bad Gateway
Server: nginx/1.18.0
Date: Thu, 02 Jul 2020 00:41:44 GMT
Content-Type: text/html
Content-Length: 496
Connection: keep-alive
ETag: "5ecc391b-1f0"
The length of 'body' should be validated in post-message and post-thread, before computing 'params'. Something like (* 2 *max-post-size*) should be more than enough. And string-split should be made iterative, in the style of string-split*.
The POST server went down with a forged request of thousands of query strings.
...
nomadism
That sounds cool but please try to stay safe from pandemic stuff.
Adding the checks recommended in >>16 >>18 >>20 is trivial but fixing string-split takes more than a line or two. Here is how string-split operates:
$ mit-scheme --load lib/utils.scm
[...]
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41
LIAR/x86-64 4.118 || Edwin 3.116
;Loading "lib/utils.scm"... done
1 ]=> (string-split "" #\-)
;Value: ()
1 ]=> (string-split "-a-b--c-d-" #\-)
;Value 14: ("a" "b" "c" "d")
Its sibling string-split* has the right idea for tco
;;; this version of string-split doesn't trim the leading separators
;;; (string-split "/usr/local/bin") => ("" "usr" "local" "bin")
(define (string-split* sep str)
(define (f cs xs) (cons (list->string (reverse cs)) xs))
(let loop ((ss (string->list str)) (cs '()) (xs '()))
(cond ((null? ss) (reverse (if (null? cs) xs (f cs xs))))
((char=? (car ss) sep) (loop (cdr ss) '() (f cs xs)))
(else (loop (cdr ss) (cons (car ss) cs) xs)))))
but has an asymmetry bug whereby it keeps a leading empty piece but drops a trailing one:
1 ]=> (string-split* #\- "")
;Value: ()
1 ]=> (string-split* #\- "-a-b--c-d-")
;Value 15: ("" "a" "b" "" "c" "d")
This is fine for its intended use in lib/markup.scm:block-scanner but makes it an incorrect split function. This in turn is because the split of a reversed input should be the split of the original input with the pieces and the order of the pieces reversed. String-split* violates this:
1 ]=> (string-split* #\- "-ab-cd-")
;Value 13: ("" "ab" "cd")
1 ]=> (string-split* #\- (reverse-string "-ab-cd-"))
;Value 14: ("" "dc" "ba")
In addition string-split* uses string->list to break the string up into individual characters, even within runs of non-separators, and it calls char=? on every character. Both of these are easily avoided by taking advantage of the built-in procedures for character based string search:
http://web.mit.edu/scheme_v9.2/doc/mit-scheme-ref/Searching-Strings.html#index-substring_002dfind_002dnext_002dchar-982
Here is the asymmetry and efficiency fix for string-split*:
(define (string-split-tco str char)
(define (iter result now end)
(if (= now end)
(cons "" result)
(let ((pos (substring-find-next-char str now end char)))
(if pos
(iter (cons (substring str now pos) result) (+ pos 1) end)
(cons (substring str now end) result)))))
(reverse (iter '() 0 (string-length str))))
Operation:
1 ]=> (string-split-tco "" #\-)
;Value 15: ("")
1 ]=> (string-split-tco "-a-b--c-d-" #\-)
;Value 16: ("" "a" "b" "" "c" "d" "")
1 ]=> (string-split-tco "-ab-cd-" #\-)
;Value 17: ("" "ab" "cd" "")
1 ]=> (string-split-tco (reverse-string "-ab-cd-") #\-)
;Value 18: ("" "dc" "ba" "")
This can be easily used to get a fixed version of string-split that doesn't overflow the stack:
(define (string-split-tco-noempty str char)
(define (iter result now end)
(if (= now end)
result
(let ((pos (substring-find-next-char str now end char)))
(if pos
(iter (if (= now pos) result (cons (substring str now pos) result)) (+ pos 1) end)
(cons (substring str now end) result)))))
(reverse (iter '() 0 (string-length str))))
Operation:
1 ]=> (string-split-tco-noempty "" #\-)
;Value: ()
1 ]=> (string-split-tco-noempty "-a-b--c-d-" #\-)
;Value 19: ("a" "b" "c" "d")
The -tco and -tco-noempty versions can be unified via a filtering lambda, but the cost is a small drop in efficiency and for such a basic tool like splitting I didn't feel that the efficiency loss was justified.
1 ]=> (define (test n) (string-split-tco-noempty (apply string-append (make-list n "&x=x")) #\&) 'ok)
;Value: test
1 ]=> (test 50000)
;Value: ok
1 ]=> (test 100000)
;Value: ok
Even though string-split-tco-noempty gets rid of the stack overflow, I still recommend the 'body' length check from >>20 as well.
It would appear that there are two classes of 502s served by textboard.org. When deps/server.scm responds with
(define (500-handler req params err)
`(500 () ,(string-append
"Something went wrong: "
;; TODO: should probably not leak this in production:
(condition/report-string err))))
nginx rewrites this to a 502 but nothing bad like aborting the running instance actually happens. This is why the frontpageless posting >>16 test could be followed up by the explanatory post without any trouble. This class of 502 errors is inelegant but mostly harmless.
The stress tests from the previous thread and the string-split attack >>20 do abort the running instance, and nginx makes a 502 from scratch. These are not quite as harmless, which is why explanations and solutions were posted but not the detailed wget reproduction.
Since the first class of 502s is mostly harmless, there are plenty of them that can be found in MIT/GNU Scheme's HTTP libraries.
Meant to post https://textboard.org/prog/34#t34p132 in this thread. >>21
What happened that even the fossil page and the paster were down?
Operation details:
We have encountered an electrical issue in the rack where your server is located.
Here's the task linked to this incident: http://travaux.ovh.net/?do=details&id=45663
Action and result:
Electrical reboot of the server.
The server is booted on disk and is on the login screen. Ping OK and services are up.
>>25
Cool, thanks.
{french people joke goes here}
lib/markup.scm:string->sxml
(define (string->sxml markup s)
(define (string->sxml-rec s res)
(let ((match (irregex-search (regex markup) s)))
(cond ((string-null? s)
res)
((not match)
(append-element res s))
(else
(let* ((start (irregex-match-start-index match))
(end (irregex-match-end-index match))
(substr (irregex-match-substring match))
(s1 (substring s 0 start))
(s2 (substring s end (string-length s))))
(if (string-null? s1)
(string->sxml-rec
s2
(append-element res ((transform markup) substr)))
(if (and (eq? (name markup) 'del) ;; exception to escape spoiler inside code
(between-code? s1 s2))
(string->sxml-rec "" (append-element res (string-append s1 substr s2)))
(string->sxml-rec
s2
(append-element res s1 ((transform markup) substr))))))))))
(string->sxml-rec s '()))
;; edge false positive (between-code? "==code== ==code==" "==")
;; could add another pass of spoiler, but ok good-enough
(define (between-code? s1 s2)
(let ((m1 (irregex-search (irregex ".*==$|.*==[^ ]") s1)) ;opening code in s1
(m2 (irregex-search (irregex ".*[^ ]==") s1)) ;closing code in s1
(m3 (irregex-search (irregex "^==|.*?[^ ]==") s2)) ;closing code in s2
(imei irregex-match-end-index))
(if (and m1 m3 (or (not m2) (>= (imei m1) (imei m2))))
#t
#f)))
(define (lines->sxml markup l)
(append-map (lambda (e)
(cond ((string? e)
(string->sxml markup e))
((eq? (car e) 'del)
`(,(cons 'del (lines->sxml markup (cdr e)))))
(else `(,e))))
l))
Besides the various types of false positives in between-code?, the "exception to escape spoiler inside code" is also broken in another way. As soon as a spoiler exception is found, all subsequent spoilers on that line are ignored, including those that are completely outside code.
~~one~~ two ~~three~~ ==ab ~~cd~~ ef== gh ~~four~~ ij
one two three ab ~~cd~~ ef
gh ~~four~~ ij
This happens because of the
(string->sxml-rec "" (append-element res (string-append s1 substr s2)))
line which exempts the entire s2 from further spoiler processing. The solution is to also return the end position of the closing code tag from between-code?, and use this information to recurse on the s2 portion after that position in string->sxml-rec.
>>27
k
lib/markup.scm:bold
(define bold
(transform-rule
'bold
(irregex "\\*\\*[^ ].*?[^ ]\\*\\*|\\*\\*[^ ]\\*\\*")
(lambda (sub) `(b ,(substring sub 2 (- (string-length sub) 2))))))
(define italic
(transform-rule
'italic
(irregex "__[^ ].*?[^ ]__|__[^ ]__")
(lambda (sub) `(i ,(substring sub 2 (- (string-length sub) 2))))))
(define code
(transform-rule
'code
(irregex "==[^ ].*?[^ ]==|==[^ ]==")
(lambda (sub) `(code ,(substring sub 2 (- (string-length sub) 2))))))
(define del
(transform-rule
'del
(irregex "~~[^ ].*?[^ ]~~|~~[^ ]~~")
(lambda (sub) `(del ,(substring sub 2 (- (string-length sub) 2))))))
This was obviously replicated through copypasting so the error in handling single-character content is shared by all four transform-rules:
**M**agneto**H**ydro**D**ynamics
__M__agneto__H__ydro__D__ynamics
==M==agneto==H==ydro==D==ynamics
~~M~~agneto~~H~~ydro~~D~~ynamics
M**agnetoHydroD**ynamics
M__agnetoHydroD__ynamics
M==agneto
Hydro
D==ynamics
M~~agnetoHydroD~~ynamics
The source of the bug is that the branch intended for at least two characters can run over a match intended for the other branch. A solution that does not depend on the order of alternation nor on irregex's mercurial leftmost longest semantics is to exclude the intersection of the two branches using negative lookahead.
$ guile --no-auto-compile -l deps/irregex.scm
GNU Guile 2.2.3
[...]
scheme@(guile-user)> (irregex-match-substring (irregex-search "==[^ ].*?[^ ]==|==[^ ]==" "==a==b c==d=="))
$1 = "==a==b c=="
scheme@(guile-user)> (irregex-match-substring (irregex-search "==[^ ](?!==).*?[^ ]==|==[^ ]==" "==a==b c==d=="))
$2 = "==a=="
scheme@(guile-user)>
The same fix applies to all four transform-rules above.
The comment of between-code? >>27 identifies one case that yields false positives. It relies on a closing code marker immediately followed by a spoiler, which causes both m1 and m2 to match, and the 'or' will give m1 precedence over m2 in any dispute. False positives can also be obtained by putting any non-space character, such as punctuation, after a code segment:
==one==, ~~two~~, ==three==
one
, ~~two~~, three
This happens because both m1 and m2 look at the last potential marker but ignore everything before it, so they do not have enough information to decide whether what looks like an opening or closing marker actually is one. The solution is to replace m1 and m2 with a scan through s1 using irregex-search and (regex code) to skip valid code segments, then look for an opening marker after the last one.
[1/3]
The final markup bug for this group of posts involves m3 of between-code? >>27. The following properties hold: the m3 scan is unconditional in between-code?, it is a linear scan potentially to the end of s2 and it is called in a loop by string->sxml-rec. As a result, the overall behavior of m3 within string->sxml is quadratic. This is a bad idea because it invites a stress test on the loop count, which can be controlled by packing spoilers tightly together. Calling append via append-element in a loop also yields quadratic behavior, which should be replaced by the usual backward consing followed by reverse, but unlike the case of m3 the counts involved are insufficient to become a problem. Here is a subset of markup procedures that can be used to exercise Bitdiddle's m3:
$ cat test-m3.scm
(define (timeit proc)
(with-timings proc
(lambda (run-time gc-time real-time)
(write (internal-time/ticks->seconds run-time))
(write-char #\space)
(write (internal-time/ticks->seconds gc-time))
(write-char #\space)
(write (internal-time/ticks->seconds real-time))
(newline))))
(define (string->sxml markup s)
(define (string->sxml-rec s res)
(let ((match (irregex-search (regex markup) s)))
(cond ((string-null? s)
res)
((not match)
(append-element res s))
(else
(let* ((start (irregex-match-start-index match))
(end (irregex-match-end-index match))
(substr (irregex-match-substring match))
(s1 (substring s 0 start))
(s2 (substring s end (string-length s))))
(if (string-null? s1)
(string->sxml-rec
s2
(append-element res ((transform markup) substr)))
(if (and (eq? (name markup) 'del) ;; exception to escape spoiler inside code
(between-code? s1 s2))
(string->sxml-rec "" (append-element res (string-append s1 substr s2)))
(string->sxml-rec
s2
(append-element res s1 ((transform markup) substr))))))))))
(string->sxml-rec s '()))
(define (append-element l . e)
(append l e))
;; edge false positive (between-code? "==code== ==code==" "==")
;; could add another pass of spoiler, but ok good-enough
(define (between-code? s1 s2)
(let ((m1 (irregex-search (irregex ".*==$|.*==[^ ]") s1)) ;opening code in s1
(m2 (irregex-search (irregex ".*[^ ]==") s1)) ;closing code in s1
(m3 (irregex-search (irregex "^==|.*?[^ ]==") s2)) ;closing code in s2
(imei irregex-match-end-index))
(if (and m1 m3 (or (not m2) (>= (imei m1) (imei m2))))
#t
#f)))
(define (transform-rule name regex transform)
(define (dispatch op)
(cond ((eq? op 'name) name)
((eq? op 'regex) regex)
((eq? op 'transform) transform)))
dispatch)
(define (transform markup) (apply markup '(transform)))
(define (regex markup) (apply markup '(regex)))
(define (name markup) (apply markup '(name)))
(define code
(transform-rule
'code
(irregex "==[^ ].*?[^ ]==|==[^ ]==")
(lambda (sub) `(code ,(substring sub 2 (- (string-length sub) 2))))))
(define del
(transform-rule
'del
(irregex "~~[^ ].*?[^ ]~~|~~[^ ]~~")
(lambda (sub) `(del ,(substring sub 2 (- (string-length sub) 2))))))
Here are some timings for input sizes much smaller than the original post size limit:
$ mit-scheme --load deps/irregex.scm --load test-m3.scm
[...]
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41
LIAR/x86-64 4.118 || Edwin 3.116
;Loading "deps/irregex.scm"... done
;Loading "test-m3.scm"... done
1 ]=> (timeit (lambda () (string->sxml del (apply string-append (make-list 20 "a~~xx~~"))) 'ok))
1.31 .02 1.33
;Value: ok
1 ]=> (timeit (lambda () (string->sxml del (apply string-append (make-list 30 "a~~xx~~"))) 'ok))
3.98 .03 4.008
;Value: ok
1 ]=> (timeit (lambda () (string->sxml del (apply string-append (make-list 40 "a~~xx~~"))) 'ok))
9.71 .13 9.744
;Value: ok
1 ]=> (timeit (lambda () (string->sxml del (apply string-append (make-list 50 "a~~xx~~"))) 'ok))
17.34 .2 17.538
;Value: ok
At a count of 40 spoiler processing already takes ten seconds, and much higher runtimes can be obtained by increasing the count. The solution is to never rescan a section of between-code?:s2 that has already been covered, which brings the order of growth down to linear. However, since the four types of markup bugs covered in this series are interrelated, they require changes to the same parts of lib/markup.scm, so here is a combined fix for all four:
[2/3]
--- lib/markup.scm 2020-06-18 10:33:06.355741000 +0000
+++ lib/markup-fix.scm 2020-08-04 13:36:44.240919125 +0000
@@ -94,40 +94,58 @@
(define (string->sxml markup s)
- (define (string->sxml-rec s res)
- (let ((match (irregex-search (regex markup) s)))
- (cond ((string-null? s)
- res)
- ((not match)
- (append-element res s))
- (else
- (let* ((start (irregex-match-start-index match))
- (end (irregex-match-end-index match))
+ (define (string->sxml-iter s res isdel bc-close-failed)
+ (if (string-null? s)
+ res
+ (let ((match (irregex-search (regex markup) s)))
+ (if (not match)
+ (string->sxml-iter "" (cons s res) isdel bc-close-failed)
+ (let* ((start (irregex-match-start-index match))
+ (end (irregex-match-end-index match))
(substr (irregex-match-substring match))
- (s1 (substring s 0 start))
- (s2 (substring s end (string-length s))))
- (if (string-null? s1)
- (string->sxml-rec
- s2
- (append-element res ((transform markup) substr)))
- (if (and (eq? (name markup) 'del) ;; exception to escape spoiler inside code
- (between-code? s1 s2))
- (string->sxml-rec "" (append-element res (string-append s1 substr s2)))
- (string->sxml-rec
- s2
- (append-element res s1 ((transform markup) substr))))))))))
- (string->sxml-rec s '()))
+ (s1 (substring s 0 start))
+ (s2 (substring s end (string-length s))))
+ (cond ((string-null? s1)
+ (string->sxml-iter s2 (cons ((transform markup) substr) res) isdel bc-close-failed))
+ ((and isdel (not bc-close-failed)) ;; exception to escape spoiler inside code
+ (let ((pos (between-code? s1 s2)))
+ (if (< pos 0)
+ (string->sxml-iter s2 (cons ((transform markup) substr) (cons s1 res)) isdel (= pos -2))
+ (string->sxml-iter (string-tail s2 pos) (cons (string-append s1 substr (string-head s2 pos)) res) isdel #f))))
+ (else
+ (string->sxml-iter s2 (cons ((transform markup) substr) (cons s1 res)) isdel bc-close-failed))))))))
+ (reverse (string->sxml-iter s '() (eq? (name markup) 'del) #f)))
-;; edge false positive (between-code? "==code== ==code==" "==")
-;; could add another pass of spoiler, but ok good-enough
(define (between-code? s1 s2)
- (let ((m1 (irregex-search (irregex ".*==$|.*==[^ ]") s1)) ;opening code in s1
- (m2 (irregex-search (irregex ".*[^ ]==") s1)) ;closing code in s1
- (m3 (irregex-search (irregex "^==|.*?[^ ]==") s2)) ;closing code in s2
- (imei irregex-match-end-index))
- (if (and m1 m3 (or (not m2) (>= (imei m1) (imei m2))))
- #t
- #f)))
+ (if (between-code-scan-open s1)
+ (if (and (>= (string-length s2) 2)
+ (substring=? s2 0 2 "==" 0 2))
+ 2
+ (let ((m (irregex-search between-code-irx-close s2)))
+ (if (irregex-match-data? m)
+ (irregex-match-end-index m)
+ -2)))
+ -1))
+
+(define between-code-irx-open
+ (irregex "==[^ ]"))
+
+(define between-code-irx-close
+ (irregex "[^ ]=="))
+
+(define (between-code-scan-open s)
+ (define (iter pos end irx)
+ (let ((match (irregex-search irx s pos end)))
+ (if (not match)
+ pos
+ (iter (irregex-match-end-index match)
+ end irx))))
+ (let* ((slen (string-length s))
+ (pos (iter 0 slen (regex code)))
+ (open (irregex-search between-code-irx-open s pos slen)))
+ (or (irregex-match-data? open)
+ (and (>= (- slen pos) 2)
+ (substring=? s (- slen 2) slen "==" 0 2)))))
(define (lines->sxml markup l)
(append-map (lambda (e)
@@ -152,25 +170,25 @@
(define bold
(transform-rule
'bold
- (irregex "\\*\\*[^ ].*?[^ ]\\*\\*|\\*\\*[^ ]\\*\\*")
+ (irregex "\\*\\*[^ ](?!\\*\\*).*?[^ ]\\*\\*|\\*\\*[^ ]\\*\\*")
(lambda (sub) `(b ,(substring sub 2 (- (string-length sub) 2))))))
(define italic
(transform-rule
'italic
- (irregex "__[^ ].*?[^ ]__|__[^ ]__")
+ (irregex "__[^ ](?!__).*?[^ ]__|__[^ ]__")
(lambda (sub) `(i ,(substring sub 2 (- (string-length sub) 2))))))
(define code
(transform-rule
'code
- (irregex "==[^ ].*?[^ ]==|==[^ ]==")
+ (irregex "==[^ ](?!==).*?[^ ]==|==[^ ]==")
(lambda (sub) `(code ,(substring sub 2 (- (string-length sub) 2))))))
(define del
(transform-rule
'del
- (irregex "~~[^ ].*?[^ ]~~|~~[^ ]~~")
+ (irregex "~~[^ ](?!~~).*?[^ ]~~|~~[^ ]~~")
(lambda (sub) `(del ,(substring sub 2 (- (string-length sub) 2))))))
(define quotelink
[3/3]
This fix is only for the four issues above while staying within the rules inferred from the existing code, not a complete rewrite with new rules. New rules are also possible but those are outside the scope of this fix. Here are the new timings:
1 ]=> (timeit (lambda () (string->sxml del (apply string-append (make-list 20 "a~~xx~~"))) 'ok))
.01 0. .01
;Value: ok
1 ]=> (timeit (lambda () (string->sxml del (apply string-append (make-list 30 "a~~xx~~"))) 'ok))
.01 0. .01
;Value: ok
1 ]=> (timeit (lambda () (string->sxml del (apply string-append (make-list 40 "a~~xx~~"))) 'ok))
.01 0. .011
;Value: ok
1 ]=> (timeit (lambda () (string->sxml del (apply string-append (make-list 50 "a~~xx~~"))) 'ok))
.01 0. .011
;Value: ok
1 ]=> (timeit (lambda () (string->sxml del (apply string-append (make-list 1000 "a~~xx~~"))) 'ok))
.12 0. .129
;Value: ok
The runtime for a count of 40 goes down from ten seconds to one hundredth of a second, for a speedup factor of roughly one thousand. The speedup factor increases with the loop count, and 40 is well below what fits within the original post size limit. The post-code spoilers >>27 work:
1 ]=> (string->sxml del "~~one~~ two ~~three~~ ==ab ~~cd~~ ef== gh ~~four~~ ij")
;Value 13: ((del "one") " two " (del "three") " ==ab ~~cd~~ ef==" " gh " (del "four") " ij")
The single-character content >>29 works:
1 ]=> (string->sxml del "~~M~~agneto~~H~~ydro~~D~~ynamics")
;Value 14: ((del "M") "agneto" (del "H") "ydro" (del "D") "ynamics")
And the two types of false positives >>30 are gone:
1 ]=> (string->sxml del "==code== ==code==~~spoiler~~==")
;Value 15: ("==code== ==code==" (del "spoiler") "==")
1 ]=> (string->sxml del "==one==, ~~two~~, ==three==")
;Value 16: ("==one==, " (del "two") ", ==three==")
The other two upgrades pending integration that others running their own instances may wish to apply are the performance enhancement using incremental HTML generation https://textboard.org/prog/39#t39p291 and the fix and additional checks >>21 for the string-split exploit >>20.
I see that in addition to the nginx hack for request size >>23 there is also one for paragraph length. This is not a problem for those of us not running our own instances, but if these are private then it seems the docker people do not have them available.
https://fossil.textboard.org/docker-schemebbs-nginx/artifact/6f4a819fea1ea1fa
There also doesn't seem to be a good reason to disallow posts filled up entirely with one paragraph.
>>31-33
What now?
>>35
Not sure what you mean.
>>36
He wrote an essay, about something that should be fixed or could be improved (i guess), and I want to know what he plans to do with this information now.
HTML5 and semantic elements would be kinda nice for SchemeBBS. Does it use ISO HTML for historical reasons?
>>35 >>37
See the posts-range/filter-func bugfix from a few months ago for how the development cycle works here.
https://textboard.org/prog/39#t39p113
Don't get me wrong, I've applied the patch, even pushed it to the repo. Better code is always welcome.
https://github.com/alyssa-p-hacker/SchemeBBS/commit/68bb8ea833185750f2e18c4d6453e3b5ed3efc8e
The subsequent commit
Allow to quote every single posts as comma separated values. Deduplicate regex code
Jun 11 2020
https://fossil.textboard.org/schemebbs/info/a47867c42f34156a
depends on it because the admin's posts-range aborted the running instance at a count of 183 out of 300.
https://textboard.org/prog/39#t39p112
1 ]=> (timeit (lambda () (posts-range (stress 183))))
;Aborting!: out of memory
;GC #45: took: 0.20 (100%) CPU time, 0.10 (100%) real time; free: 16769613
;GC #46: took: 0.10 (100%) CPU time, 0.10 (93%) real time; free: 16769646
1 ]=> (timeit (lambda () (posts-range64 (stress 183))))
.01 0. .017
;Value 25: #(#f #t #t [...] #t)
For the quadratic spoilers fix those running their own instances, such as the docker people >>34, have the option of waiting for the admin to have enough C**opious **Free Time or temporarily applying the patch themselves. Those of us using this site will wait for the admin, since while the current parsing is buggy and slow it is also usable enough.
>>38 i believe HTML 4.01 is used for aesthetic reasons, as a kind of subtle revolt against modern web. it's a somewhat popular thing to do, of course it's in part nostalgia, part creative restriction, but it also has technical merit: the result works in a really large variety of text browsers, reduced functionality browsers, retro machine browsers, accessible browsers, etc.
accessible browsers
You realize that the HTML5 standard introduces a lot of semantic tags specifically designed to aid accessibility tools interpreting web pages?
E. g. <nav> advises screen readers to omit the content of this tag in the initial rendering of the page to offer it later for further navigation etc.
I guess older versions of CSS are creative restriction but HTML5 really only removes old, unnecessary stuff like <blink> <marquee> and introduces more semantic tags like <section> <main> which have no special functionality other than indicate the type of their contents.
>>40,41
There honestly aren’t too many faults with HTML 5 in comparison to older versions of the standard, the only things that come to mind is the addition of the <canvas> tag, and the switch from W3C to WHATWG, and with the end of the standardization effort. In general XML and HTML especially are a bit nasty in that they’re super wasteful formats, and they aren’t as simple to parse as they should be. If you listen to verisimilitudes even text formats are inefficient and should be replaced, and he’s probably right.
>>41
A noble intention, but how widely are semantic tags actually used by websites and implemented by accessibility tools?
I don't have a disability, but in my limited experience accessibility follows simplicity.
>>41
You can just filter useless emls with a proxy. I don't know what to say about accessibility, it's a disaster like internationalisation.
>>43
accessibility follows simplicity
No.
The discussion over which HTML standard to use is also kinda pointless: modern browser don't really require any specific standard they just will render anything that can be remotely interpreted as a valid HTML page and just ignore the most glaring XML syntax errors etc.
If you listen to verisimilitudes even text formats are inefficient and should be replaced
This doesn't even make sense. What is UTF-8 encoded text at the end of the day than a binary format for… text?
>>46
If I remember correctly the basic idea is that most of the time characters aren’t the most efficient encoding of what we’re writing because we are writing in words. So you could encode to some standard dictionary with a way to reference different standard language dictionaries and a backup system for encoding things not in the dictionaries. It’s been a moment since I’ve read these so if you want more information you’ll have to go back to the source: http://verisimilitudes.net/2020-06-18 http://verisimilitudes.net/2020-06-18
might be more space efficient, but is inflexible (new words, abbreviations, missspellings, even deliberate ones appear every day) and more expensive to process (imagine how huge the lookup table has to be for any application looking to process text in multiple languages.
Just look at how huge Unicode is already just with the aim of representing all character-like symbols we have in use.
The approach might be interesting for archival or something, but you can also just compress text…
>>48
It's not inflexible because if something can't be encoded in a dictionary there is just a fall back to a more typical variable length encoding based on character set, it's additive. Whether or not it would be costly to process seems to depend on the application. Any sort of manipulations of words such as spell checking and word based search would be faster, as would transmission over a network. The size of the lookup table would not matter for anything but embedded machines, which shouldn't be doing text processing anyway. Character based manipulation would be slower, but in most cases it probably shouldn't be used anyway. If you have efficient structured binary encodings of source code, logging, and protocols what would you need efficient character based manipulation for?
>>41
legacy a11y software. this is my experiencing adding a11y feature to a large web based education technology solution: outside of u.s. there's a lot of national packages that are either stuck in pre html5 world, or else they've long adopted a strategy of mistrust, so html5 semantic is bolted on as a hint system, rather then "semantic first". a lot of a11y software sees the world through lynx-like eyes, rather than semantic web.
a11y software not trusting web developers. and for good reason, nerds are good at ocding over taxonomies ("i put <article> in <section> and there's a <summary>, i just orgasmed at how neat everything is"), but then fail to run the page through a screenreader. and we're talking about well-meaning people, nefarious agents might stick html5 elements, because pagerank will somehow go up, or to misdirect bots, or whatever.
semantic mismatch. html5 has incomplete, arbitrary and very much "we came up with this during a meeting" ontology, which fails to represent wide range of web experiences. there's no entities for text boards for example, there's no "post", etc. there's uncertainty about scope. what does "author" apply to for example? containing "article"? the whole page? the effect of this ontology on screen readers can be only partially reasoned from spec (you basically reason back from ARIA), so you have to run your app through a bunch of readers anyway and see the effects. and then you have a constant push/pull between "visual intent" and "semantic intent", and without your developers being constantly aware of the need for later, the later suffers. this subject has been discussed to death.
what we discovered works best is basically to go ARIA first: we encode intent (using our own ontology, expressed as react elements), which is then rendered into tagsoup with aria attributes. "navigation" role (nav in your example) requires more treatment than just "plop it here". it requires further extensive annotation as to order, labels, details, actions, etc of it's actionable elements.
tldr is that oftentimes "simpler html" does translate into "more accessibility". if you can read it in lynx, you can read it with a very wide range of screenreaders. but if you actually want to add accessibility to your pages, it's best to sprinkle aria annotations, that are explicitly about making the experience better for the people who need it.
this is gradually changing, as things like ChromeVox become dominant across the world. poor people move away from donated machines towards cheap chinese chrome machines, but then it's not really "HTML5 semantics!!" it becomes "do what google tells you to do"
Yeah, sounds sane. It also doesn't really hurt using HTML5 tags if they are appropriate, since they are mostly just a tagged <span> or <div>
Can you install this on github.io pages?
>>52
From what I've seen in the past any sort of github.io dynamic content has to be done client side using javascript, and this is a traditional web application with a database, and server side rendering, so it should not work.
>>52, 53
yeah github.io only serves static files, so you won't be able to run SchemeBBS
>>53,54
You can run SQL
https://sql-js.github.io/sql.js/examples/GUI/
A better idea:
https://medium.com/pan-labs/dynamic-web-apps-on-github-pages-for-free-ffac2b776d45
you guys, this is a textboard, written for _MIT Scheme_, that uses html4 for markup with tags in all caps, it's an aesthetic pose. hosting it on github completely defeats the concept. it's kind of like buying an db5 aston martin and then asking if you can use it as a daily commuter. you probably could figure out how, but why bother?
>>57
It's mirrored on Github, AFAIK the main repo is the fossil version: https://fossil.textboard.org/schemebbs/home
Everytime some technology is added to github someone comes up with some use for it which is definitely not intended and just reinvents existing stuff with the worst possible tech stack.
Newest instance of this is all the kind of stupid things people are doing with github actions.
>>59
Github is definitely not the intended use of git.
someone comes up with some use for it which is definitely not intended and just reinvents existing stuff with the worst possible tech stack.
Also called ``hacking''. Did someone get Doom to run on it yet?
>>61
Would using llvm to convert some doom implementation into a single file emcascript, then use view raw count?
>>61 it's like a "hackathon" where you get to build a shitty product using the sponsoring company's api definition of "hacking".
you can play multiplayer Diablo with github pages
but you can't have a BBS
>but you can't have a BBS
You could use git to store json, then make the users browser load the json over http, then do git requests over http for posting, if you were that much of a webshit hipster hacker. Don't forget to disregard the gaping modification hole, fellow hackers.
>>65
Not him, but
then do git requests over http for posting
how would you do git requests over http? I though GH pages had no way to run backend code.
>>61 Doom is simpler than Diablo which does run(its essentially a magically themed FPS with roguelike elements) >>64
>>66
They don't the users browser does, hence the gaping hole.
So are there any othee SchemeBBS instances by now?
Is it possible to edit/delete posts?
>>70
You can restore from a backup after a raid. You can also edit the sexp files directly and then regenerate, but while the admin has done this in the past this is also the road to censorship, so reserve it for extreme cases. Either way you have to use tools external to schemebbs.
tr:hover {
background-color: yellow;
}
Should be...
tbody > tr:hover {
background-color: yellow;
}
So the table header is not highlighted on hover.
I started using the "classic" theme which is ok even though the bricks are too small. Now I am not bothered by the flashing bright yellow table header.
Thread 174 has ended now, and has shown that this board needs spam and troll protection.
Two popular methods for this are anchoring and server-side word filters. I might look into implementing them, would there be any interest to merge them?
Hey lads, can SchemeBBS run on a Raspberry Pi?
I am trying to build a patched mit-scheme 9.2 and it throws an error:
checking build system type... armv7l-unknown-linux-gnu
...
checking for native-code support... configure: error: unable to determine native-code type
Does that mean Arm is not supported by Scheme?
Btw I really miss RSS feed on that site, it'd be a very handy feature
As in https://textboard.org/prog/201#t201p17 replacing the home page Emacs mode link href with https://fossil.textboard.org/sbbs/file?name=sbbs.el&ci=tip will always link users to the latest version.
po[,,
>>75
Scheme isn't MIT Scheme, so saying
Does that mean Arm is not supported by Scheme?
isn't correct, but since SchemeBBS depends on MIT Scheme, and it says (https://www.gnu.org/software/mit-scheme/documentation/stable/mit-scheme-user/Unix-Installation.html#Unix-Installation)
We will use as an example the installation for GNU/Linux. The installation for other unix systems is similar. There are several references to ARCH below; these refer to the computer architecture that Scheme is compiled for: *either ‘i386’ ‘x86-64’, or ‘svm1’.*
But there's also https://www.gnu.org/software/mit-scheme/documentation/stable/mit-scheme-user/Portable-C-Installation.html#Portable-C-Installation, which "should run with little or no trouble on most modern architectures and operating systems". If it that works for the Pi, you could use that.
>>75,79
You can also look into if any arm distributions package mit scheme, then pull in the patches from there if you still want native support.
A bit of trivia. Lynx uses HTTP 1.0 by default and textboard.org replies by closing the connection. After setting the HTTP protocol to HTTP 1.1 on lynx's Options screen things seem to work.
>>81
Spambots.
Is there a race condition in SchemeBBS when two people try to submit a reply at the same time?
>>83
Requests are handled sequentially in deps/server.scm:create-server:listen with a do loop:
(printf "Listening to port: ~A" tcp-port)
(dynamic-wind
(lambda () unspecific)
(lambda ()
(do () ((channel-closed? socket))
(let ((port (tcp-server-connection-accept socket #t #f)))
(dynamic-wind
(lambda () unspecific)
(lambda () (ignore-errors (lambda () (serve-request port))))
(lambda () (ignore-errors (lambda () (close-port port))))))))
(lambda () (channel-close socket)))))
>>84
Does this mean the the server can only handle one request at a time?
>>85
Yes.
The tcp-server-connection-accept for the next request will only be attempted after the current request's close-port has been run.
Please add thread headlines to the webpage title.
--- a/bbs.scm
+++ b/bbs.scm
@@ -111,9 +111,14 @@
(define (title board)
(string-append "/" board "/ - SchemeBBS"))
+(define (thread-title board headline)
+ (string-append "/" board "/ - " headline " - SchemeBBS"))
+
;;; views
(define (thread-template board thread posts headline filter-func)
- (main-template (title board) (thread-view board thread posts headline filter-func) "thread"))
+ (main-template (thread-title board headline)
+ (thread-view board thread posts headline filter-func)
+ "thread"))
(define (list-template board threads)
(main-template (title board) (list-view board threads)))
@@ -125,7 +130,9 @@
(main-template (title board) (preferences-view board query-string-list)))
(define (retry-thread-template board headline message flash)
- (main-template (title board) (make-thread-form board headline message flash) "thread"))
+ (main-template (thread-title board headline)
+ (make-thread-form board headline message flash)
+ "thread"))
(define (retry-post-template board thread frontpage? message flash)
(main-template (title board) `(dl ,(make-post-form board thread frontpage? message flash)) "thread"))
You might be able to get rid of patch-runtime_http-syntax.scm
by patching MIT Scheme at runtime instead. Here's an untested idea:
;;;; Patch MIT Scheme at runtime
(ge '(runtime http-syntax))
(load-option '*PARSER)
;; Redefine Location header to allow relative URIs, which RFC 7231 allows
(define-header "Location"
(direct-parser
(*parser
(alt parse-absolute-uri
parse-relative-uri)))
(lambda (value)
(and (uri? value)
(not (uri-fragment value))))
write-uri)
(ge '(user)) ; Go back to the default REPL environment
>>88
It works! I have tested it using MIT Scheme 9.2 in FreeBSD. SchemeBBS no longer requires a custom installation of MIT Scheme! Simply add the code in >>88 to a file named patches.scm
and add (load "patches.scm")
to the beginning of bbs.scm
.
Would it be possible to add auto-refresh as a built-in for schemebbs?
>>90
Implement it in a Userscript.
>>90
How would you implement it without JavaScript?
Implementing it using HTML <meta http-equiv=”refresh” content=”60" />
is not a good idea.
What if I am writing a post when a refresh happens? I could lose what I wrote.
>>92
xul maybe
I think the URL regex in SchemeBBS should be changed to exclude .== and ==)
that appear at the end of the URL. There are so many broken URLs on textboard.org because the regex includes the .== and ==)
as part of the address.
>>94
Wow. The parsing of inline code blocks is broken.
Anyway, this is what I meant to say:
I think the URL regex in SchemeBBS should be changed to exclude "." and ")" that appear at the end of the URL. There are so many broken URLs on textboard.org because the regex includes the "." and ")" as part of the address.
Wow. The parsing of inline code blocks is broken.
Explanation and solution in >>29 2020-08-02.
the URL regex in SchemeBBS should be changed to exclude "." and ")" that appear at the end of the URL
The closing paren cannot be naively excluded because it would break wikipedia links. Also, the regex library used by SchemeBBS is broken in numerous http://textboard.org/prog/39#t39p100 https://textboard.org/prog/34/133-164 ways, so it's best to keep regexps as simple as possible. Just put a space or a newline after your URLs.
TEXTBOARD WILL NEVAR BE JPGBOARD
>>97
It'd be cool to have an imageboard without any text whatsoever
it IS cool to get deez dubz
100 GET
How do I run SchemeBBS's test suite? It does not seem to be included in the source repository.
>>98
We used to do that at the bus stop, it fancied itself as a textboard but had an option to upload images. The admin did not like that people would just upload images without text and added a /lost-and-found/ board where posts without text would end up, but we just ended up deliberately posting images there until the admin banned us for it.
http://paste.textboard.org/ --> home page
What's up with the paster, B?
>>103 Broken like everything else on this site.
The old pastes appear to be back but http://paste.textboard.org/create gives
500 Internal Server Error
nginx/1.20.2
>>105
It's fixed. It wasn't configured for the changes in the spam filter.
>>106 Much obliged.
It's fixed
http doesn't redirect to https
Lol.
>>106
home page: A pastebin service for longer text files (<1 MB)
pasting http://0x0.st/opn7.txt 836860 bytes: 413 Request Entity Too Large
>>109
Ah yes, that's 1966013 bytes once it's URL-encoded, the file is not binary uploaded.
2022/09/04 16:49:16 [error] 92627#100776: *292567 client intended to send too large body: 1966013 bytes, client: X.X.X.X, server: paste.textboard.org, request: "POST /create HTTP/1.1", host: "paste.textboard.org", referrer: "https://paste.textboard.org/new"
anyway:
server {
server_name paste.textboard.org;
client_max_body_size 4M;
}
That was worth it, nice pattern: http://paste.textboard.org/80ff3804
The site is still in DEFCON mode but maybe the nazi kids got bored?
the file is not binary uploaded
OK. I just naively assumed.
nice pattern: http://paste.textboard.org/80ff3804
Thanks. Every cell-based renderer needs a Sierpinski carpet.