Say I want to make a text board using this script. How would I go about making more boards and customizing the index to my liking?
A request to /someboard/ will have nginx try to read the cache in /data/html/someboard/index
and if it doesn't exist, the backend will generate it from Scheme code in /data/sexp/someboard/index
So if you want to have board1 and board2 in your textboard, you need to (manually) create those directories
/data/sexp/board1/
/data/html/board1/
/data/sexp/board2/
/data/html/board2/
There's no automated index of boards but that's trivial to implement. Neither is there an automated procedure to generate those directories. I had a very short time ahead of me to finish the program and really halted the development as soon as I had something working. (I'm living a weird life right now, with very few opportunities to use an internet connection at all)
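A minimal sketch of the missing automation, written in Python rather than Scheme and not part of SchemeBBS. It assumes the data/sexp and data/html layout described above; the function names create_board_dirs and list_boards are made up for illustration, and treating every directory under data/sexp as a board is an assumption.

```python
from pathlib import Path


def create_board_dirs(prefix, board):
    """Create data/sexp/<board> and data/html/<board> under prefix."""
    created = []
    for kind in ("sexp", "html"):
        path = Path(prefix) / "data" / kind / board
        # exist_ok makes the call safe to repeat for an existing board
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def list_boards(prefix):
    """Assumption: every directory under data/sexp is one board."""
    root = Path(prefix) / "data" / "sexp"
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

Calling create_board_dirs once per board name, then list_boards, would give the raw material for a board index page.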
The relevant parts of nginx.conf would be:
upstream http_backend {
    keepalive 20;
    server 127.0.0.1:8080;
}
server {
    listen 80;
    server_name textboard.org www.textboard.org;
    location / {
        root $prefix/data/html;
        default_type text/html;
        index index;
        try_files $uri $uri/index @schemebbs;
    }
    location @schemebbs {
        proxy_pass http://http_backend;
    }
}
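To make the lookup order concrete, here is a rough Python sketch of what the try_files line does for a request. It is a simplification: real nginx handling of directories and of the index directive is more involved. The function name resolve is hypothetical.

```python
from pathlib import Path


def resolve(root, uri):
    """Approximate `try_files $uri $uri/index @schemebbs`.

    Returns the cached file to serve, or the name of the fallback
    location when no cached file exists.
    """
    rel = uri.lstrip("/")
    for candidate in (Path(root) / rel, Path(root) / rel / "index"):
        if candidate.is_file():
            return str(candidate)
    # Nothing cached: the request is handed to the Scheme backend.
    return "@schemebbs"
```

With root set to the data/html directory, a request for /someboard/ resolves to the cached someboard/index if it exists and otherwise falls through to the backend.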
>>2
Thanks for the auto overflow.
>>3
You're welcome, sorry for the delay.
Thanks for the tip, man. I will be working on this as soon as I get back home in April. Also, is advertising your board allowed here?
>>5
Really nginx does a lot of the job, it serves html pages directly without asking Scheme for instance. A lot of the routing too. That's how things get reasonably optimized.
I did a lot of research for a fast Scheme web server, Guile was the fastest by far but still a bit behind what could be done.
Chez Scheme is disappointing there. There's a nice Chinese project with bindings for libuv but it's awfully unstable: https://github.com/guenchi/Igropyr
>>6
Did you run a benchmark? What did you find?
Is it possible to get a self-signed SSL certificate working with this board? Well, you run it behind a web server, so I think it's possible. Of course, you don't really want to use a self-signed certificate; you want to get a valid one.
What are those 2 boxes at the bottom that say "do not edit"? Is there a possibility to remove those?
>>9
Those are in a fieldset hidden by css:
<FIELDSET class="comment"><LEGEND>do not edit these</LEGEND>
<P><INPUT type="text" name="name" class="name" size="11"><BR>
<TEXTAREA name="message" class="message" rows="1" cols="11"></TEXTAREA></P></FIELDSET>
fieldset.comment {
display: none;
}
If you see them in your viewer it doesn't respect that rule. The input is "name" and the textarea is "message", but their actual function is something the board owner might elucidate.
>>7
I did run extensive benchmarks for every web server in Scheme, old and new. I was planning to publish them but I don't have them right now.
>>10
It's a simple honeypot for spambots, better than captchas and other annoyances and good enough for a small textboard. I don't know how you can remove them with emacs, CSS rules do that for web browsers.
>>9-11
grep and other tools could do it on the webpage so you can probably hide these honeypot fields with emacs. I'd be interested in a solution.
>>2
New anon here. Cool stuff. I've (somewhat) gotten it working on my local server. Sorry for the stupid question, but is nginx all I would need to make this public-facing?
>>13
Nevermind, answered my own question. Pretty much works now. Once I've gotten a more permanent setup I'll post a link.
One of my main confusions as of now is the hash file. The file needs to exist for posts to be made, and its value is present in the html for the input field. I just made a file called hash and gave it arbitrary contents and things started working, but I'm guessing the value is supposed to be changed dynamically (once per post, maybe)? So, what purpose does the file serve?
>>14
The hash file is read by get-form-hash in bbs.scm. It is called for the value of the ornamentum field by make-post-form and make-thread-form in templates.scm. The ornamentum field is read by validate-form in bbs.scm as the hash of the let. Validate-form then proceeds to do precisely nothing with the hash value. After the size checks it only cares that the message and name fields are empty. Therefore it appears that the hash/ornamentum is intended as another antispam measure that is yet to be implemented, and is currently a no-op.
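The honeypot check described above can be sketched in a few lines of Python. This is an illustration of the idea, not the code of validate-form: the field names "name" and "message" come from the hidden fieldset, while the function name and the "body" field in the usage note are hypothetical.

```python
HONEYPOT_FIELDS = ("name", "message")  # the hidden "do not edit these" inputs


def looks_like_spam(form):
    """form is a dict of submitted field values.

    A browser that honours the CSS never shows the honeypot fields, so
    a human leaves them empty. Anything non-empty suggests a bot.
    """
    return any(form.get(field, "") != "" for field in HONEYPOT_FIELDS)
```

For example, looks_like_spam({"name": "", "message": "", "body": "hello"}) is false, while a submission with "name" filled in is flagged.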
>>15
Thanks, that explains why arbitrary values worked.
I haven't properly looked into this, but whenever posts are submitted there is an error about an invalid http header. Despite that, the posts are still accepted, so I haven't focused on that. Any idea as to why such an error would always be thrown?
> whenever posts are submitted there is an error about an invalid http header
> an error would always be thrown
I'm certain the error is not so secret that you can't paste it into the thread.
test
Looking at the sources of these two pages:
https://textboard.org/prog/
https://textboard.org/prog/?css=2ch
It's clear that the main difference between them is that a different stylesheet has been requested, and that all internal links preserve the value of the css variable.
When testing this feature on my own instance, this feature doesn't work, and the hard-coded value of default.css in the main-template procedure is always requested.
I was considering changing main-template to take the css parameter as an additional argument, but that won't help with the internal links. Does anyone know how the internal links are modified on this site?
>>19
In bbs.scm:route the query-string is only passed on to set-preferences, post-thread and post-message. The first one ends up in templates.scm:preferences-view where it is used to initialize the radio button selection. The other two are only used for the final redirection calls. So at first blush it would appear that the functionality in question is not present in the gitlab repo.
There is at least one known instance of the gitlab repo not being updated to reflect the deployed site. There was a request for a change in an overflow setting which the admin graciously granted, but that change did not make it into the gitlab repo.
$ diff schemebbs/static/styles/default.css ~/Downloads/default.css
121,122c121
< overflow: hidden;
< text-overflow: ellipsis;
---
> overflow: auto;
If the same holds for the internal link rewriting functionality then it appears schemebbs is not quite as open source as it claims to be. Perhaps the admin could rectify that.
>>20
Thanks for confirming my suspicions. I didn't want to say it outright, but I was really just posting because I couldn't find anything in the gitlab repo that would change the links or even display the CSS in accordance with the parameter.
It's been a while since the repo's been updated. Has something happened to the admin?
> Has something happened to the admin?
I have no way of knowing, but there was this slightly worrying message back in February:
https://textboard.org/prog/1#t1p46
On the other hand, the first year of domain registration expired in September and was renewed for another year, and since that requires some money changing hands, he couldn't have fallen off the face of the planet entirely.
>>22
Thanks for the info. I'll remain hopeful that the admin sees this at some point. In the meantime, I've been trying to understand how the mit-scheme httpio library handles outgoing get/post requests, and I've been confusing myself tremendously.
I wanted to try having the backend make a request to an external website, and incorporate the result into a page that it serves, but I'm finding that it doesn't directly support https requests.
I've successfully made requests to http pages, but not https pages. I'm using http-get and http-post. To do this I did have to modify http-client-request so that it uses the value of uri verbatim rather than constructing a new one. There's been some other small changes as I worked through this but I haven't kept track of them very well.
In http-client.scm:call-with-http-client-socket, port 80 is specified as the fallback value. I haven't bothered with uri-authority-port yet, but I have tried just changing the fallback to 443. This results in the program hanging upon execution of read-http-response in http-client-exchange.
Using port 443, the program is at least able to make it as far as completing write-http-request in http-client-exchange.
I haven't verified that write-http-request actually sent out the request, or that the website received it.
I'm not sure where to go from here. It seems that the hang takes place in read-line (or in the procedure it calls), but when I try to redefine read-line, it seems to lose scope of a procedure that it calls. That being optional-input-port, which is unbound when I check it in a repl.
I guess this isn't directly relevant to the thread, but I don't think it really warrants a thread of its own. Sorry about that. Maybe I should just have the server use curl.
>>23
I do not understand the overall thrust of your plan. Either SSL/TLS support is present in the library or it isn't. If it isn't, it will not magically materialize as a consequence of telling plain http to use port 443. That will just cause the other end of the connection to think that your client is spewing nonsense. If you want to add SSL/TLS functionality yourself, the amount of work involved will dwarf the entire vanilla http library. The natural first choice is to look for libwget or libcurl bindings.
> I guess this isn't directly relevant to the thread, but I don't think it really warrants a thread of its own.
We have a "What are you working on?" thread for this sort of thing.
>>24
That makes perfect sense. I hadn't even considered that it wouldn't have any way to process SSL/TLS. I guess this is what happens when I merely look for the source of an error without putting any thought into why the error occurs. I'd gotten too used to thinking of HTTPS as being "that encryption that gets done for you when you add an `s' to your link and use port 443".
Well, I've got some things to rethink, but I'll check out bindings for both of those before I get back to working on this.
>>19
This feature was intended to avoid any use of cookies and is actually a hack done in nginx.conf
Here's the relevant part.
#replace css and add query string to all internal links
subs_filter_bypass $bypass;
subs_filter '<LINK href="/static/styles/(.*?).css"' '<LINK href="/static/styles/$arg_css.css"' or;
subs_filter '<A href="((?!http).*?)(#.*?)?"' '<A href="$1$is_args$args$2"' gr;
subs_filter '<FORM action="(.*?post)"' '<FORM action="$1$is_args$args"' gr;
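For anyone without the substitutions module, the effect of the three filters can be approximated in Python as below. This is only a sketch of the rewriting, not what the site runs: the function name rewrite_page is made up, bypassing when the query string is empty is an assumption about $bypass, and the stylesheet is left alone when no css argument is present.

```python
import re


def rewrite_page(html, args):
    """args is the raw query string (nginx $args), e.g. 'css=2ch'."""
    if not args:
        return html  # assumed bypass: nothing to propagate
    params = dict(p.split("=", 1) for p in args.split("&") if "=" in p)
    css = params.get("css")
    if css:
        # First filter: swap the stylesheet once (flag 'o').
        html = re.sub(
            r'<LINK href="/static/styles/(.*?)\.css"',
            lambda m: '<LINK href="/static/styles/%s.css"' % css,
            html,
            count=1,
        )
    # Second filter: append the query string to internal links only,
    # keeping any #fragment at the end.
    html = re.sub(
        r'<A href="((?!http).*?)(#.*?)?"',
        lambda m: '<A href="%s?%s%s"' % (m.group(1), args, m.group(2) or ""),
        html,
    )
    # Third filter: same for the post form target.
    html = re.sub(
        r'<FORM action="(.*?post)"',
        lambda m: '<FORM action="%s?%s"' % (m.group(1), args),
        html,
    )
    return html
```

External links starting with http are left untouched by the negative lookahead, which matches the behaviour seen on the two pages compared earlier in the thread.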
>>19,26
More query string preferences were planned, such as an optional display of the list of the 40 latest threads at the top of the page to make the whole thing look like the original 2ch, but were never implemented.
>>26
Not #t39p19, but thanks for the css info.
The nginx module for the admin's css "hack done in nginx.conf" seems to be:
https://github.com/yaoweibin/ngx_http_substitutions_filter_module
In bbs.scm:view-thread you are recomputing the unchanging (posts-range range) for every single invocation of the lambda that is assigned to filter-func. Just save (posts-range range) in the let* and use it in the lambda.
And a test for lib/markup.scm:quotelink + bbs.scm:posts-range:foo
>>30---,,,---,,,---30
>>30
As expected, bbs.scm:posts-range:foo bites the dust.
error 500
internal server error
You need to tighten up the irregex in lib/markup.scm:quotelink, or the parsing in bbs.scm:posts-range:foo, or preferably both.
Here is a regular grammar for quotelink:
number -> [1-9][0-9]*
range -> number "-" number
numberorrange -> number | range
list -> numberorrange ("," numberorrange)*
The resulting regex, with extra parens for pair highlighting:
(([1-9][0-9]*)|(([1-9][0-9]*)-([1-9][0-9]*)))(,(([1-9][0-9]*)|(([1-9][0-9]*)-([1-9][0-9]*))))*
^^  number   ^ ^^  number   ^ ^  number   ^^^  ^^  number   ^ ^^  number   ^ ^  number   ^^^
|+-----------+ |+-----------+ +-----------+||  |+-----------+ |+-----------+ +-----------+||
|              |           range           ||  |              |           range           ||
|              +---------------------------+|  |              +---------------------------+|
|               numberorrange               |  |               numberorrange               |
+-------------------------------------------+  +-------------------------------------------+
This can be "simplified" by making the range-closing optional, but such "simplifications" tend to be counterproductive for long-term maintainability. And you should still add strict sanity checks in bbs.scm:posts-range.
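The grammar can be checked outside the board with a short Python sketch that assembles the same regex from its named parts. Python's re is used here as a stand-in for irregex, so this only illustrates the grammar; the helper name is_valid_range_list is made up.

```python
import re

NUMBER = r"[1-9][0-9]*"
RANGE = NUMBER + "-" + NUMBER
# The range alternative is tried first; fullmatch backtracks as needed.
NUMBER_OR_RANGE = "(?:" + RANGE + "|" + NUMBER + ")"
LIST = NUMBER_OR_RANGE + "(?:," + NUMBER_OR_RANGE + ")*"

QUOTELINK = re.compile(LIST)


def is_valid_range_list(text):
    """True only when the whole string matches the list grammar."""
    return QUOTELINK.fullmatch(text) is not None
```

With this, "1,2-5,8,11-14,30-40" is accepted while "30---,,,---,,,---30" is rejected, because fullmatch requires the entire string to fit the grammar.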
I just realized that bbs.scm:posts-range:foo calls iota directly on user-controlled inputs. A small test:
>>1-64001001001
The test result I get is:
error 504
gateway timeout
You should probably validate the range to between 1 and *max-posts* or similar.
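A sketch of that validation in Python, for illustration only. MAX_POSTS = 300 is an assumed value standing in for *max-posts*, and clamp_range is a hypothetical name. The point is to clamp both ends before anything is expanded, so the amount of work is bounded by the post limit and not by the user's input.

```python
MAX_POSTS = 300  # assumed stand-in for *max-posts*


def clamp_range(low, high, max_posts=MAX_POSTS):
    """Post numbers in [low, high], restricted to 1..max_posts.

    Returns a lazy range object, so even a huge requested upper bound
    never allocates more than max_posts items when it is consumed.
    """
    low = max(1, low)
    high = min(high, max_posts)
    # If low > high after clamping, the range is simply empty.
    return range(low, high + 1)
```

Under these assumptions, the request 1-64001001001 from the test above collapses to posts 1 through 300.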
>>30-34
Thanks! Will fix asap.
Serious bug:
;Aborting!: out of memory
;GC #6: took: 0.10 (47%) CPU time, 0.30 (90%) real time; free: 16631589
;GC #7: took: 0.20 (67%) CPU time, 0.30 (93%) real time; free: 16631721
;GC #8: took: 0.20 (100%) CPU time, 0.30 (96%) real time; free: 16631707
>>30
There's really a function named foo? That's saying something about it.
>>30---,,,---,,,---30
>>36
I sincerely apologize for triggering an abort on the live instance. I thought the test result would be at most a clean 500, as happened with the previous test, after which I would post the result and a proposed solution without further incident. I like this site and have no intention of causing any damage, my intention is simply to contribute to improving the robustness of its code base.
>>37
At the time of the test:
$ grep -C 1 -ne 'define *(foo' -r .
./bbs.scm-154-(define (posts-range range)
./bbs.scm:155: (define (foo x)
./bbs.scm-156- (cond ((> (length x) 1)
>>39
Don't worry about that. I'm sincerely grateful you found that severe bug, took the time to read the code and offered a way to fix the issue (I used your regex unmodified and checked post numbers against *max-posts* as you suggested)
It's true that it's the first reboot of the bbs image since it was launched, but it's also the first time I edited a file. Well, I had to start somewhere one day, now it's done and thank you.
I realize how the code is messy or incomplete at places (a function named foo probably means I tried some temporary thing with the intent to change it later and totally forgot about it because it seemed to work). I also remember how quickly everything was hacked together on a cheap netbook while couchsurfing in Austria. Sweet memory. Mind you, I was in such a hurry that I wrote everything in vi vi vi the editor of the beast (I gave up on edwin for some reasons and believed I had no time to install and configure from scratch emacs, paredit, mit-scheme mode...)
> I used your regex unmodified and checked post numbers against *max-posts* as you suggested
While I cannot yet pull the new code since it has not yet made its way into the gitlab repo, I would like to reiterate that, as I tried to convey in the irregex posts, bbs.scm:posts-range needs its own strict parsing validation regardless of the changes to lib/markup.scm:quotelink. This is because quotelink only protects against user content that goes through the post form, while the range argument of posts-range comes from the match path in route, where path is just a split of the request uri without the qstring. Therefore, the range can be directly crafted with wget, curl and the like completely bypassing quotelink, at which point >>30 applies:
$ wget --user-agent="Mozilla/5.0 Firefox/66.0" --server-response -O test.html 'https://textboard.org/prog/39/30---,,,---,,,---30'
I actually had a longer series of stress tests planned for quotelink+posts-range rather than just those two, where a series of incremental changes would be suggested after each was justified by a test, instead of dumping a batch of suggested changes as an amorphous blob pulled out of my pineal gland. That regex was merely a first candidate. But after causing downtime I changed my mind and won't perform them on the live site when I expect the result to be a small error. Instead I think I'll take some time out and see whether there is some sufficiently hassle-free way to perform tests in a local MIT Scheme REPL without having to go full nginx, which I'd rather avoid.
> vi vi vi the editor of the beast
Since your views align with rms on this point, one wonders why you chose the MIT license over (A)GPL for SchemeBBS.
> While I cannot yet pull the new code since it has not yet made its way into the gitlab repo
It's a quickfix, just to get the instance running. I don't have a local instance of SchemeBBS anymore (SSD failure) but I will commit the changes anyway. I'm setting everything up right now to work locally. I'll probably face the same problems that you will in order to have everything running. It seems that porting to MIT/GNU Scheme 10.1 won't be that easy: a lot of things have been removed.
> bbs.scm:posts-range needs its own strict parsing validation regardless of the changes to lib/markup.scm:quotelink
No more user input fed to iota at least. You'll still get 500 errors if you try to forge something like https://textboard.org/prog/39/30---,,,---,,,---30
There's no validation in the router, except that the thread part must be a number:
((,board ,thread ,posts) (integer? (string->number thread)) (view-thread board thread posts))
I'm working on the 500 errors from bbs.scm:posts-range.
> I actually had a longer series of stress tests planned
We badly need those. Now is the time. I have to deploy SchemeBBS myself, so I can help you with that, too.
> in a local MIT Scheme REPL without having to go full nginx, which I'd rather avoid
At one point it worked entirely without nginx, but the server.scm is so primitive that you definitely cannot expose it directly (I had to fix an awful bug in it that allowed anyone to kill the instance very easily). Also, for performance reasons nginx serves directly all the static files.
> vi vi vi the editor of the beast
> Since your views align with rms on this point, one wonders why you chose the MIT license over (A)GPL for SchemeBBS.
The vi thing was more a joke than anything else. I'm more productive in emacs for big projects (once it is all setup) but to edit a small file on a server nothing can beat vi in speed. I happily use both editors.
I could really change the license in honor of Richard Stallman. I didn't put much thought in that choice: MIT Scheme, MIT License. But let's not forget it's actually MIT/GNU Scheme. The thing is that I'm not very knowledgeable on licenses and their mutual compatibility. Some files that are not mine are needed to run scheme.bbs (the files in /deps, and I made changes to some of them). Would the AGPL license allow me to redistribute them for instance?
I don't think there would be any problems from that. If anything, the current situation is more questionable, since you depend on GPL code. However, keep in mind that if you are to use AGPL (as you should), you will need to make available the source code to the version of SchemeBBS that the users interact with. Secretly patching the code and then withholding the changes for an indefinite time, like you do now, would not be permitted. Although, technically, since you are the sole copyright owner at the moment, you could just re-license it to yourself in secret. Please don't do that.
Sadly the AGPL is mostly based on trust, see this thread for example: https://wirechan.org/g/res/206.html
> There's no validation in the router, except that the thread part must be a number
Given the string-split -> match -> string->number sequence, I wonder what happens with:
$ wget --user-agent="Mozilla/5.0 Firefox/66.0" --server-response -O test.html 'https://textboard.org/prog//1'
although I will not be triggering it myself on the live instance.
> The thing is that I'm not very knowledgeable on licenses and their mutual compatibility. Some files that are not mine are needed to run scheme.bbs (the files in /deps, and I made changes to some of them). Would the AGPL license allow me to redistribute them for instance?
While IANAL, my understanding is that the license you choose applies to your work and your diffs on the deps/*. The original deps/* keep their existing licenses which you cannot switch out, and these already allow redistribution with suitable copyright notices. It is common for a project to have externally sourced subparts under a different source license. You could also simply dual-license your project. However, my question was one of curiosity as to rationale, I do not presume to tell you what license you should apply to your own work.
> Secretly patching the code and then withholding the changes for an indefinite time, like you do now, would not be permitted.
That was presumptuous and rude. Bitdiddle has explained the logistical difficulties that cause the current situation.
It should be fixed and committed real soon, I just have to check if the range is correctly formed in the url dispatch, which is trivial (something like 1-lol also triggers a 500). The fix yesterday was just to relaunch the image. I have no testing environment yet and restarting the world is not exactly a good workflow.
I'm trying to install SchemeBBS locally, but the port to MIT Scheme 10.1 is not going to happen yet. Everything seems to be broken, just look at this:
MIT/GNU Scheme running under GNU/Linux
Type `^C' (control-C) followed by `H' to obtain information about interrupts.
Copyright (C) 2019 Massachusetts Institute of Technology
This is free software; see the source for copying conditions. There is NO warranty; not even for
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Image saved on Saturday August 10, 2019 at 6:28:48 PM
Release 10.1.10 || Microcode 15.3 || Runtime 15.7 || SF 4.41 || LIAR/x86-64 4.118
1 ]=> (vector 1 2 3)
;Value:
;Unbound variable: nmv-header?
;To continue, call RESTART with an option number:
; (RESTART 2) => Define nmv-header? to a given value.
; (RESTART 1) => Return to read-eval-print level 1.
I spotted it while playing with irregex. 10.1 is far from stable.
Patches are no longer secret: https://gitlab.com/naughtybits/schemebbs
Things like https://textboard.org/prog/39/30---,,,---,,,---30 or https://textboard.org/prog/39/1-lol will no longer trigger a 500.
Thank you again for discovering that awful bug. I believe your stress tests will hit the nginx cache, so they should be safe.
>>45,46
Thanks for the repo update.
> so they should be safe
OK, then. I see that string->number returns #f on invalid inputs, so >>44 and similar are harmless 404s. A resource utilization test for string->number:
>>> ">>1-" + "9" * (4096 - 4)
Since irregex claims to support PCRE ranges
http://synthcode.com/scheme/irregex/#SECTION_3.3
if this causes a spike it can be mitigated by switching the range regex numbers from [1-9][0-9]* to [1-9][0-9]{0,2} with the limit raised when *max-posts* gains digits.
I also have a nerd rant about filter-func/posts-range design for another post.
A second resource utilization test, aimed at delete-duplicates:
>>> ">>1-300" + ",1-300" * ((4096 - 7) // 6)
if this causes a spike it can be mitigated by switching the final iteration of the range regex from (,numberorrange)* to (,numberorrange){0,11} or whatever upper limit the admin finds reasonable. However, this becomes less important if the rant is implemented.
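Both bounded refinements can be tried against the two test strings with a Python sketch. Python's re stands in for irregex here, the ">>" prefix is left off, and the limits (three digits, eleven extra intervals) are the example values from the posts above, not settled choices.

```python
import re

NUMBER = r"[1-9][0-9]{0,2}"  # at most three digits
ITEM = "(?:" + NUMBER + "-" + NUMBER + "|" + NUMBER + ")"
MAX_EXTRA_ITEMS = 11  # example upper limit on ",numberorrange" repeats
BOUNDED = re.compile(ITEM + "(?:," + ITEM + "){0," + str(MAX_EXTRA_ITEMS) + "}")


def is_valid_bounded(text):
    """True only when the whole string fits the bounded grammar."""
    return BOUNDED.fullmatch(text) is not None


# The two resource tests from the posts above, without the ">>" prefix.
digit_test = "1-" + "9" * (4096 - 4)
interval_test = "1-300" + ",1-300" * ((4096 - 7) // 6)
```

Under these limits both digit_test and interval_test are rejected by the regex before any range expansion can happen, while ordinary quotelinks such as "1,3,5,7,290-300" still match.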
Tough luck. Computer ded. Editing source files with ConnectBot on a mobile phone is going to be fun. ;_;
Hey you have to work with what you have. This is really a temp fix with magic numbers everywhere. Also the admin seems to have forgotten to implement an automated way of relaunching the app.
>>1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,
1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300
>>1,2-5,8,11-14,30-40
>>1-10
error 502
bad gateway
Not again. In case that was my stress test and not just some freaky coincidence, I'm really sorry but you said it should be safe and encouraged me twice and said that the stress tests are needed and that now is the time so... I'm really sorry.
> Computer ded.
Hope that wasn't me. Sorry if it was.
OK, please excuse me while I rant all over this post. See >>47 and >>49.
Independently of the validation required and now included in bbs.scm:posts-range, there is something I don't get. The only consumer of posts-range is filter-func in view-thread, and the only consumer of filter-func is templates.scm:format-thread with:
(filter-map (lambda (p) (and (filter-func p) (format-post board thread p))) posts)
Therefore, filter-func's false return is acted upon while its non-false return is discarded by the 'and'. It might as well return false/true for selecting posts. The question then becomes, how on Earth was the current design of filter-func/posts-range chosen, with delete-duplicates, sort and member? Since *max-posts* is a small positive integer, the design that naturally offers itself is a vector of booleans of length (+ *max-posts* 1) initialized to all false. To compute posts-range once in the let* of view-thread, simply fold your flattened iotas, or your sequence of iota-producing lazy lambdas, with a pointwise vector true-setter. This completely eliminates the delete-duplicates+sort part which was quadratic, since duplicate elimination is now implicit in the construction. Filter-func comes down from linear via member to constant, since it simplifies to a vector-ref of a car. There must be some other piece I'm missing, because without it the current design of filter-func/posts-range seems very inefficient.
While I'm at it:
https://www.gnu.org/software/mit-scheme/
> Note that you cannot build a working system from the source unless you have a working MIT/GNU Scheme compiler to do the compilation.
Someone, somewhere decided that this was a perfectly reasonable thing to do.
>>58
OK, this got me.
>>58-59
The citation is truncated
> Note that you cannot build a working system from the source unless you have a working MIT/GNU Scheme compiler to do the compilation. (This doesn't apply to the portable C source, which requires only a C compiler.)
>>56
No, how could it be your fault? I tried to replace the battery pack of my netbook but it wasn't working properly. I foolishly updated the bios and the new version had no support for MBR partition table...
Compiling compilers with themselves has been the standard since the original discovery of LISP. In theory at least, in practice I am unsure when it became commonplace, but Thompson's famous ``Trusting Trust'' talk was about it, and that was in 1984. There have been recent attempts at breaking the cycle, most notably by GNU Mes for Scheme and GNU Guix for its packages. See: https://bootstrappable.org/
>>61
I see, that's sort of a relief but still bad that you're having trouble. I look forward to the repo being updated to the deployed code whenever you have the time, whether or not you consider >>57.
[1/2]
Since your BIOS is being a Basic Inducer Of Suffering, here is an implementation of >>57 to eliminate delete-duplicates, in the MIT/GNU Scheme 9.1.1 from the Ubuntu LTS, while keeping as much of your structure as possible:
(define (posts-range range)
  (define (expand-range x)
    (cond ((> (length x) 1)
           (let* ((a (string->number (car x)))
                  (b (string->number (cadr x)))
                  (low (if (> a *max-posts*) *max-posts* a))
                  (high (if (> b *max-posts*) *max-posts* b))
                  (count (+ (- high low) 1)))
             (if (> high low)
                 (lambda () (iota count low))
                 (lambda () (list low)))))
          (else (let* ((a (string->number (car x)))
                       (low (if (> a *max-posts*) *max-posts* a)))
                  (lambda () (list low))))))
  (define (invoke-loop-set vector lamb)
    (for-each (lambda (e) (vector-set! vector e #t))
              (lamb)))
  (let* ((r1 (string-split range #\,))
         (r2 (map (lambda (x) (string-split x #\-)) r1))
         (r3 (map expand-range r2))
         (vec (make-vector (+ *max-posts* 1) #f)))
    (for-each (lambda (e) (invoke-loop-set vec e))
              r3)
    vec))
Here are some tests, keeping in mind that posts-range runs after the regex match:
1 ]=> (posts-range "1,3,5,7,290-300")
;Value 13: #(#f #t #f #t #f #t #f #t #f #f #f #f [...] #f #f #f #f #t #t #t #t #t #t #t #t #t #t #t)
1 ]=> (posts-range "1-9999999999")
;Value 14: #(#f #t #t #t #t #t #t #t #t #t #t #t [...] #t #t #t #t #t #t #t #t #t #t #t #t #t #t #t)
1 ]=> (define fulltest (apply string-append (cons "1-300" (make-list (quotient (- 4096 7) 6) ",1-300"))))
At this point fulltest is >>50.
1 ]=> (posts-range fulltest)
;Value 17: #(#f #t #t #t #t #t #t #t #t #t #t #t [...] #t #t #t #t #t #t #t #t #t #t #t #t #t #t #t)
1 ]=> (define (timeit proc)
        (with-timings proc
          (lambda (run-time gc-time real-time)
            (write (internal-time/ticks->seconds run-time))
            (write-char #\space)
            (write (internal-time/ticks->seconds gc-time))
            (write-char #\space)
            (write (internal-time/ticks->seconds real-time))
            (newline))))
1 ]=> (timeit (lambda () (posts-range "1,3,5,7,290-300")))
0. 0. 0.
1 ]=> (timeit (lambda () (posts-range fulltest)))
.04 0. .044
[2/2]
The time for fulltest fluctuates between 4 and 5 centiseconds, but since the homepage has:
Runs well on "el cheapo" VPS
I recommend both regex refinements as well. To integrate with filter-func, here is a diff against the current gitlab version, which is the last commit of "20 Feb, 2020". This one is untested because I do not yet have a local instance, but it is simple enough that it should work.
$ TZ=GMT diff -u schemebbs/bbs.scm edit/bbs.scm
--- schemebbs/bbs.scm 2020-02-20 15:17:38.682224678 +0000
+++ edit/bbs.scm 2020-02-22 14:41:02.198743998 +0000
@@ -138,10 +138,12 @@
(let* ((t (call-with-input-file path read))
(headline (lookup-def 'headline t))
(posts (lookup-def 'posts t))
- (filter-func (if (default-object? range)
- identity
- (lambda (e) (member (car e) (posts-range range))))))
- (cond ((default-object? range)
+ (norange (default-object? range))
+ (rangeonce (if norange "unused" (posts-range range)))
+ (filter-func (if norange
+ (lambda (e) #t)
+ (lambda (e) (vector-ref rangeonce (car e))))))
+ (cond (norange
(if (not (file-exists? cache))
(write-and-serve cache (thread-template board thread posts headline filter-func))
(begin (display "reverse proxy miss") (serve-file cache)))) ;; we shouldn't go here, reverse proxy fetches the page itself
Here are the two regex refinements for digit count >>47 and interval count >>49. The SYNCs are there for grep. The irregex PCRE ranges are linked in >>47.
$ TZ=GMT diff -u schemebbs/bbs.scm edit/bbs.scm
--- schemebbs/bbs.scm 2020-02-20 15:17:38.682224678 +0000
+++ edit/bbs.scm 2020-02-22 15:40:23.041388774 +0000
@@ -153,7 +153,9 @@
(define (range? posts)
- (irregex-match "([1-9][0-9]*|([1-9][0-9]*)-([1-9][0-9]*))(,([1-9][0-9]*|([1-9][0-9]*-[1-9][0-9]*)))*" posts))
+ (irregex-match "(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))(,(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))){0,11}" posts))
+ ; SYNC lib/markup.scm:quotelink
+ ; SYNC digit count of *max-posts*
(define (posts-range range)
(define (expand-range x)
$ TZ=GMT diff -u schemebbs/lib/markup.scm edit/lib/markup.scm
--- schemebbs/lib/markup.scm 2020-02-20 15:17:38.682224678 +0000
+++ edit/lib/markup.scm 2020-02-22 15:38:37.134770741 +0000
@@ -182,7 +182,8 @@
(define quotelink
(transform-rule
'quotelink
- (irregex ">>([1-9][0-9]*|([1-9][0-9]*)-([1-9][0-9]*))(,([1-9][0-9]*|([1-9][0-9]*-[1-9][0-9]*)))*")
+ (irregex ">>(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))(,(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))){0,11}")
+ ; SYNC bbs.scm:range?
(lambda (sub) `(a (@ (href ,(string-append
"/" *board*
"/" *thread*
With all of the above applied, the maximum stress that can be put on posts-range is:
1 ]=> (define maxtest (apply string-append (cons "1-300" (make-list 11 ",1-300"))))
1 ]=> (posts-range maxtest)
which runs in zero time.
>>1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300,1-300
If the server machine is an honest-to-God potato, the 11 can be further reduced.
>>64
Is there a performance reason for expand-range to have such convoluted logic?
(define (expand-range x)
  (let ((first (min *max-posts* (string->number (car x))))
        (last (min *max-posts* (string->number
                                (if (null? (cdr x)) (car x) (cadr x))))))
    (lambda ()
      (iota
       (if (> last first) (- last -1 first) 1)
       first))))
I would also recommend using the name ``thunk'' instead of ``lamb''.
>>65
Do you usually use these ``SYNC'' annotations in your projects? They seem prone to human error. I see no reason why the digit length of *max-posts* couldn't be computed at startup and added to the regex string. If you extract the string from range? into a variable, you could also easily reuse it in quotelink.
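A minimal sketch of that suggestion, under stated assumptions: the names digit-count, make-range-pattern, *max-intervals* and *range-pattern* are hypothetical and not part of SchemeBBS, and the pattern shape (a post number with an optional "-post" suffix) is one possible way to write the range grammar, not necessarily the one the admin would pick. Only *max-posts* comes from the codebase. Note that bounding the digit count only limits numbers to 999 when *max-posts* is 300, so posts-range would still have to clamp.

```scheme
;; Hypothetical helper names; only *max-posts* comes from the thread.
(define *max-posts* 300)
(define *max-intervals* 12)   ; assumed cap: first piece plus {0,11} repeats

;; Number of decimal digits in a positive integer.
(define (digit-count n)
  (string-length (number->string n)))

;; Build the range pattern once; callers prepend ">>" where needed.
(define (make-range-pattern max-posts max-intervals)
  (let* ((extra (number->string (- (digit-count max-posts) 1)))
         (post  (string-append "[1-9][0-9]{0," extra "}"))
         (piece (string-append post "(-" post ")?")))
    (string-append piece
                   "(," piece "){0,"
                   (number->string (- max-intervals 1))
                   "}")))

;; Single source of truth for both users of the pattern.
(define *range-pattern*
  (make-range-pattern *max-posts* *max-intervals*))
```

With something like this, range? could match against *range-pattern* directly and quotelink against (string-append ">>" *range-pattern*), so the two could no longer drift apart.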
>>64-67
Wow! Thanks for the patches. I can't express my gratitude enough in those apocalyptic times. (I just managed to have a working system again, I'm installing MIT Scheme 9.2 and will detail the long overdue installation of SchemeBBS)
While the regex seems correct when I try it at the REPL, there's a weird bug with the transform rule, as in >>65:
>>1,10-20,30
I have to sort this out.
>>47,64-65,67
It seems that I may have solved the regex problem. There was some bug probably due to the combination of quantifiers, greediness and backtracking or simply a bug in irregex.
This one seems to work:
(irregex ">>[1-9][0-9]{0,2}(-[1-9][0-9]{0,2})?(,[1-9][0-9]{0,2}(-[1-9][0-9]{0,2})?){0,11}")
If you extract the string from range? into a variable, you could also easily reuse it in quotelink
That would indeed avoid the use of a magic number.
Thanks for the patches. I can't express my gratitude enough in those apocalyptic times.
As the dumbass who stress tested on the live instance, I felt I should.
I just managed to have a working system again
For future Anons who might face the same issue, what did you do with an MBR-partitioned hard disk and a BIOS that refuses to boot MBR?
and will detail the long overdue installation of SchemeBBS
It would be tremendously helpful if you included a fully functional nginx.conf in your guide.
the regex seems correct when I try it at the REPL
there's a weird bug with the transform rule
I have to sort this out.
Did you undo all of your "temp fix with magic numbers everywhere" from >>52?
Also, thank you for AGPL. You did the right thing.
>>66
[1/2]
Is there a performance reason for expand-range to have such convoluted logic?
There is not. This structure is from the admin's "check if posts range are valid" commit on "Feb 20, 2020". The structure was kept as much as possible, as indicated in >>64. The reason it was kept is that this is a critical performance fix, and I do not believe in piggybacking code organization improvements onto performance fixes. They are logically separate issues so they should be separate patches/commits.
As for your proposed code, unifying the lambdas is a good idea. Switching the singleton lists to singleton iotas is not a problem. Combining the parsing and the min is also a good idea, because the raw number is not used for anything else. The admin had to give the raw number a name to avoid recomputation in his inlined min. However, I will note here that while the deps use min the rest of the codebase does not, so avoiding it might simply be the admin's preference, and if that is the case we don't get to complain because as the admin he gets to make the decisions.
$ grep -wne 'min' -r . | grep -ve '^[^:]*.js:'
./deps/irregex.scm:1652: (if lo2 (min lo2 lo3) lo3)
./deps/irregex.scm:1673: (return (+ lo2 (min lo3 lo4))
./deps/irregex.scm:3718:(define (min-char a b)
./deps/irregex.scm:3739: (min-char (cdr a-range) (cdr b-range)))))
./deps/httpio.scm:322: (let ((m (read-substring! buffer 0 (min n len) port)))
[2/2]
There are however two issues with your code. First, because
programs must be written for people to read, and only incidentally for machines to execute
keeping part of the iota count computation as (+ (- last first) 1) is the right thing to do, instead of having to pause for a second to satisfy oneself that the -1 is indeed right because it is inside the subtraction. The second issue is that your code actually introduces a performance downgrade. Because you have switched to a vanilla 'let', on the consequent branch of 'last' you are redoing the entire parsing and limiting work of 'first' from scratch for non-interval quote pieces. While expand-range is not in the innermost loop of posts-range, it is still in a loop, that of 'map'. As such, if I had control over what goes into SchemeBBS, which I don't, I would not accept such gratuitous duplication of the parsing and limiting work of the most frequent type of quote piece, when it is easily avoided. You need to switch back to let*, move the 'if' to the top level of 'last' and immediately reuse 'first' in the consequent.
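A sketch of what those instructions could look like when applied to the proposed expand-range, assuming *max-posts* is 300 and that x is the list of one or two digit strings produced by splitting a quote piece on the hyphen. This is one reading of the description above, not code from the repository.

```scheme
(define *max-posts* 300)   ; assumed value, as used elsewhere in the thread

;; x is ("n") for a single post or ("n" "m") for an interval.
(define (expand-range x)
  (let* ((first (min *max-posts* (string->number (car x))))
         ;; 'if' at the top level of 'last': a single post reuses 'first'
         ;; instead of parsing and limiting the same string again.
         (last (if (null? (cdr x))
                   first
                   (min *max-posts* (string->number (cadr x))))))
    (lambda ()
      (iota (if (> last first)
                (+ (- last first) 1)   ; interval length, inclusive
                1)                     ; single post or inverted interval
            first))))
```

For example, ((expand-range '("290" "300"))) yields the eleven posts 290 through 300, and ((expand-range '("5"))) yields (5).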
I would also recommend using the name ``thunk'' instead of ``lamb''.
This is a very good idea.
Do you usually use these ``SYNC'' annotations in your projects?
I do not, because in code that I control I avoid having two sources of truth. The copypasting between quotelink and range? comes from the admin's same "check if posts range are valid" commit of "Feb 20, 2020". While I cannot speak for Bitdiddle, it is virtually certain that it was done this way because a quick fix was needed for a class of 500 errors in the previous posts-range, as you can see further up the thread. This is also why the assembled regex was included in >>32 in addition to the rules to assemble it from. The structure was kept in the performance fix as per the piggybacking explanation above. If you wish to submit a procedure to assemble the regex once, using the computed digit count, and use the result in range? and quotelink, you are welcome to do so. It would improve maintainability and clean up the code.
If you are interested in code organization improvements, you may wish to take a look at lib/markup.scm:line-scanner and the obvious copypasting going on there. That function is ripe for refactoring by extracting the repeated part. Here is one possible path:
https://textboard.org/prog/49#t49p4
There was some bug probably due to the combination of quantifiers, greediness and backtracking or simply a bug in irregex.
It's great that it works now, but we are programmers here and this is far too hand-wavy. Can you offer any example in the REPL where the >>65 regex fails, so I can see for myself?
Because on my end, with >>65 and your quote from >>68 it works:
;Loading "irregex.scm"... done
1 ]=> (irregex-match ">>(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))(,(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))){0,11}" ">>47,64-65,67")
;Value 13: #(*irregex-match-tag* #(#[compound-procedure 14] #[compiled-procedure 15 ("list" #x1) #x1a #x4c7052] #[compiled-procedure 16 ("list" #x42) #x1a #x4cc47a] #[compiled-procedure 17 ("list" #x48) #x1a #x4ccc42] #[compound-procedure 18] #f) () (">>47,64-65,67" 0 13) 0 (">>47,64-65,67" 0 13) 13 (">>47,64-65,67" 0 13) 2 (">>47,64-65,67" 0 13) 4 (">>47,64-65,67" 0 13) 2 (">>47,64-65,67" 0 13) 4 #f #f #f #f #f #f #f #f #f #f #f #f (">>47,64-65,67" 0 13) 10 (">>47,64-65,67" 0 13) 13 (">>47,64-65,67" 0 13) 11 (">>47,64-65,67" 0 13) 13 (">>47,64-65,67" 0 13) 11 (">>47,64-65,67" 0 13) 13 (">>47,64-65,67" 0 13) 5 (">>47,64-65,67" 0 13) 10 (">>47,64-65,67" 0 13) 5 (">>47,64-65,67" 0 13) 7 (">>47,64-65,67" 0 13) 8 (">>47,64-65,67" 0 13) 10 #f #f #f #f)
And here is your quote from >>67 with >>65:
;Loading "irregex.scm"... done
1 ]=> (irregex-match ">>(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))(,(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))){0,11}" ">>1,10-20,30")
;Value 13: #(*irregex-match-tag* #(#[compound-procedure 14] #[compiled-procedure 15 ("list" #x1) #x1a #x4c7052] #[compiled-procedure 16 ("list" #x42) #x1a #x4cc47a] #[compiled-procedure 17 ("list" #x48) #x1a #x4ccc42] #[compound-procedure 18] #f) () (">>1,10-20,30" 0 12) 0 (">>1,10-20,30" 0 12) 12 (">>1,10-20,30" 0 12) 2 (">>1,10-20,30" 0 12) 3 (">>1,10-20,30" 0 12) 2 (">>1,10-20,30" 0 12) 3 #f #f #f #f #f #f #f #f #f #f #f #f (">>1,10-20,30" 0 12) 9 (">>1,10-20,30" 0 12) 12 (">>1,10-20,30" 0 12) 10 (">>1,10-20,30" 0 12) 12 (">>1,10-20,30" 0 12) 10 (">>1,10-20,30" 0 12) 12 (">>1,10-20,30" 0 12) 4 (">>1,10-20,30" 0 12) 9 (">>1,10-20,30" 0 12) 4 (">>1,10-20,30" 0 12) 6 (">>1,10-20,30" 0 12) 7 (">>1,10-20,30" 0 12) 9 #f #f #f #f)
1 ]=>
Did you undo all of your "temp fix with magic numbers everywhere" from >>52?
They're not related to posting and the parser. They only prevented abusive requests like http://textboard.org/prog/1/1-300,1-300,... (I'm actually running two instances of SchemeBBS, one for POST, one for GET: el cheapo multithreading and a fast way to make the board read-only.)
>>73
It also worked fine for me in the REPL with calling irregex-match on all the tests but however it wasn't working with calls to irregex compilation in a running instance of SchemeBBS. Just look at all the failures in this thread. Anyway I believe our two regexes are equivalent. I'll commit that and will now take a closer look at the most interesting optimization you made, in the posts-range function. And I do keep in mind your previous rant about the filter-func. I can't remember the reason why I implemented it like that, but it does feel hacky.
What I really need now is a working local instance, I can't go on like that, live testing on the server. And you also need it, if you'd like to test along. I'll publish the necessary nginx.conf which does a lot of caching for improving speed.
Btw, I took some time to read about the compatibility of MIT/BSD and GPL licenses and found that it made more sense to include MIT licensed code in a GPL product than the other way round. After all, there are patches I had to make in MIT/GNU Scheme, httpio.scm and most importantly server.scm, which was totally buggy. I might redistribute the whole bundle at some point, including MIT Scheme's customized runtime (besides, the only BSD-licensed dependency is irregex, SRFIs being public domain).
I had to give up on many planned features at the time, ambitious things like hot code swapping, so there is no real need to run SchemeBBS in an interpreter with its verbose console output. It could use a logger and compiled Scheme files. Error handling and a supervisor would be a nice improvement too.
Anyway I believe our two regexes are equivalent.
This is irrelevant. It is the bug hunt I'm after. I already know that the regex in >>65 is correct, otherwise I wouldn't have posted it.
It also worked fine for me in the REPL with calling irregex-match on all the tests
OK, so >>65 cannot be made to fail in the REPL. This is my result as well.
but however it wasn't working with calls to irregex compilation in a running instance of SchemeBBS
All right, provide any firm statement that names a string that fails this way.
Just look at all the failures in this thread. Anyway [...]
No, this is not an answer. Name the exact quote attempt in this thread that failed with >>65. The only quote attempts after >>65 was posted, that are not fully links, are ">>64-67" and ">>1,10-20,30" from >>67. They both stop being links at the hyphen. Do these fail with >>65 and "irregex compilation in a running instance of SchemeBBS"? Yes or no. This is programming, not religion. I need any firm assertion that I can then verify myself as being true or false.
Are you aware that clicking any of the quotelinks lead to the internal server error page?
The second issue is that your code actually introduces a performance downgrade.
Compared to what? How did you measure it? I doubt parsing three digits is such a cardinal sin, but I am eager to see your numbers, as, in good LISP programmer tradition, I am ignorant of the costs.
Are you aware that clicking any of the quotelinks lead to the internal server error page?
This will make single links work:
https://textboard.org/prog/34#t34p98
Compared to what?
I have explained in detail how you are gratuitously replacing N operations with 2N operations by duplication, when this is easily avoided. I have also provided instructions that achieve this. If it is not clear to you how 2N is a performance downgrade from N, I am not sure how else to help you.
How did you measure it? I doubt parsing three digits is such a cardinal sin, but I am eager to see your numbers
If you will take a few minutes to read through the stress tests and the fix in detail, you will see that the entire point of the new vector-based posts-range is that on inputs that pass the regex from >>65 it "runs in zero time" as stated in the same post.
>>77
Damn it. There's no ">>" to match in the URL.
This is irrelevant. It is the bug hunt I'm after.
All right, provide any firm statement that names a string that fails this way.
No, this is not an answer. Name the exact quote attempt in this thread that failed with >>65. The only quote attempts after >>65 was posted, that are not fully links, are ">>64-67" and ">>1,10-20,30" from >>67. They both stop being links at the hyphen. Do these fail with >>65 and "irregex compilation in a running instance of SchemeBBS"? Yes or no. This is programming, not religion. I need any firm assertion that I can then verify myself as being true or false.
It's actually quite easy to check in the REPL that the regex in >>65 will fail at the first hyphen, as you noted:
]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))){0,11}" "11,20,25,34-80" "fails")
;Value 13: "fails-80"
1 ]=> (irregex-replace "[1-9][0-9]{0,2}(-[1-9][0-9]{0,2})?(,[1-9][0-9]{0,2}(-[1-9][0-9]{0,2})?){0,11}" "11,20,25,34-80" "works")
;Value 14: "works"
The bug is in irregex, not in your regex. There is an inconsistency between irregex-match and irregex-replace, or something I don't get. Anyway, rewriting the regex seems to circumvent it.
>>79
Sorry
1 ]=> (irregex-replace "(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))(,(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))){0,11}" "11,20,25,34-80" "fails")
;Value 13: "fails-80"
1 ]=> (irregex-replace "[1-9][0-9]{0,2}(-[1-9][0-9]{0,2})?(,[1-9][0-9]{0,2}(-[1-9][0-9]{0,2})?){0,11}" "11,20,25,34-80" "works")
;Value 14: "works"
1 ]=> (irregex-match "(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))(,(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))){0,11}" "11,20,25,34-80")
Two times zero is still zero.
>>79
Thank you for the new information. Based on it, I have reproduced the behavior using the irregex-search used by lib/markup.scm:string->sxml.
$ mit-scheme --load irregex.scm
MIT/GNU Scheme running under GNU/Linux
Type `^C' (control-C) followed by `H' to obtain information about interrupts.
Copyright (C) 2011 Massachusetts Institute of Technology
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Image saved on Tuesday February 6, 2018 at 6:31:25 PM
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41
LIAR/x86-64 4.118 || Edwin 3.116
;Loading "irregex7.scm"... done
;Loading "test.scm"... done
1 ]=> (irregex-match-substring (irregex-search ">>(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))(,(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))){0,11}" ">>64,1,2,3,55,56-62"))
;Value 13: ">>64,1,2,3,55,56"
1 ]=> (irregex-match-substring (irregex-match ">>(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))(,(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))){0,11}" ">>64,1,2,3,55,56-62"))
;Value 14: ">>64,1,2,3,55,56-62"
1 ]=>
As you say, the bug is evidently in irregex. It is present both in the "0.9.6: 2016/12/05" version from SchemeBBS and in the latest "0.9.7: 2019/12/31" from
http://synthcode.com/scheme/irregex/
The high-level problem is that irregex-search violates "the POSIX leftmost, longest semantics" guaranteed by the documentation:
Matching follows the POSIX leftmost, longest semantics, when searching. That is, of all possible matches in the string, irregex-search will return the match at the first position (leftmost). If multiple matches are possible from that same first position, the longest match is returned.
I will investigate this further.
And I am still interested in the outstanding question from >>70. For future Anons who might face the same issue, what did you do with an MBR-partitioned hard disk and a BIOS that refuses to boot MBR?
Can someone please explain to me why people in this thread are trying to parse a context-free language using regular expressions?
As you say, the bug is evidently in irregex. It is present both in the "0.9.6: 2016/12/05" version from SchemeBBS and in the latest "0.9.7: 2019/12/31"
Yes, I have tested against this version as well and the bug is still there. But at least Alex Shinn is still maintaining irregex. It just needs a nice PR with those reproducible steps.
what did you do with an MBR-partitioned hard disk and a BIOS that refuses to boot MBR?
It came a few weeks after an SSD failure, my system wasn't even fully configured yet. It happened 2 days after I finally installed MIT/GNU Scheme to work locally on SchemeBBS. I wiped the disk and quickly reinstalled Linux, a couple of dotfiles, sent the faulty battery pack back, and I'm back to where I was before the incident. Relaunching SchemeBBS from the mobile phone wasn't really fun. I felt I didn't have time to investigate how one can boot from MBR on a UEFI-locked BIOS. I had no documents to lose, so...
My laptop is falling apart, I'm looking for a used Thinkpad. There are some good deals and they're rock solid. The perfect machine for someone like me.
Give me another few days, kind anon, and I'll write a full documentation for the SchemeBBS installation. I have to do it myself, and I will take notes along the way. I'll share the nginx.conf file too, I don't think it's too much of a security issue.
>>83
The quotelink language is composed of iterations, concatenations and alternations of atoms, ergo it is regular.
But at least Alex Shinn is still maintaining irregex. It just needs a nice PR with those reproducible steps.
I think we can produce a much smaller test case to demonstrate the issue.
I wiped the disk
OK, I guess it was naive of me to expect a magic solution.
Give me another few days, kind anon, and I'll write a full documentation for the SchemeBBS installation.
No rush. I'm focusing on the irregex bug now.
I'll share the nginx.conf file too, I don't think it's too much of a security issue.
Whatever you strip out, please make it fully functional. For example, in your css answer >>26 you did not share the replacement variables' computation.
Can someone please explain to me why people in this thread are trying to parse a context-free language using regular expressions?
Parsing is done in Scheme, not with regexes, they're only used for substitutions of inline elements which (with the exception of spoilers) are not composable.
That minimalist typographically sound markup bold italics is utter nonsense was designed almost a decade ago (without spoilers). First implemented in C, then in Scheme, then in Erlang. The Erlang textboard was 95% finished when w4ch's /prog/ got captchas but the progrider's admin quickly hosted a tablecat alternative and I eventually gave up on writing that alternative textboard software (it was announced in a shelter thread at tablecat, if anyone remembers that). I then ported the Erlang parser, which was a really nice one-pass parser for binary strings, to MIT Scheme. In the very limited time frame I had to write SchemeBBS, I had to make some trade-offs to "finish" the project in time. Regexes weren't supposed to be used at all and it's very plausible that I'll rewrite the whole parser one day. I find it to be too slow.
Here is a minimal set of dependencies for lib/markup.scm:string->sxml that, together with deps/irregex.scm, allow string->sxml to be exercised in the REPL, with the smallest number of moving parts and without needing a local instance. The append-element function is from lib/utils.scm and the rest from lib/markup.scm. The quotelink is as it was after "20 Feb, 2020 4 commits" and quotelink2 is the same with the regex from >>65.
$ cat test.scm
(define (append-element l . e)
  (append l e))
(define *board* "prog")
(define *thread* "39")
(define (transform-rule name regex transform)
  (define (dispatch op)
    (cond ((eq? op 'name) name)
          ((eq? op 'regex) regex)
          ((eq? op 'transform) transform)))
  dispatch)
(define (transform markup) (apply markup '(transform)))
(define (regex markup) (apply markup '(regex)))
(define (name markup) (apply markup '(name)))
(define (string->sxml markup s)
  (define (string->sxml-rec s res)
    (let ((match (irregex-search (regex markup) s)))
      (cond ((string-null? s)
             res)
            ((not match)
             (append-element res s))
            (else
             (let* ((start (irregex-match-start-index match))
                    (end (irregex-match-end-index match))
                    (substr (irregex-match-substring match))
                    (s1 (substring s 0 start))
                    (s2 (substring s end (string-length s))))
               (if (string-null? s1)
                   (string->sxml-rec
                    s2
                    (append-element res ((transform markup) substr)))
                   (if (and (eq? (name markup) 'del) ;; exception to escape spoiler inside code
                            (between-code? s1 s2))
                       (string->sxml-rec "" (append-element res (string-append s1 substr s2)))
                       (string->sxml-rec
                        s2
                        (append-element res s1 ((transform markup) substr))))))))))
  (string->sxml-rec s '()))
;; edge false positive (between-code? "==code== ==code==" "==")
;; could add another pass of spoiler, but ok good-enough
(define (between-code? s1 s2)
  (let ((m1 (irregex-search (irregex ".*==$|.*==[^ ]") s1)) ;opening code in s1
        (m2 (irregex-search (irregex ".*[^ ]==") s1)) ;closing code in s1
        (m3 (irregex-search (irregex "^==|.*?[^ ]==") s2)) ;closing code in s2
        (imei irregex-match-end-index))
    (if (and m1 m3 (or (not m2) (>= (imei m1) (imei m2))))
        #t
        #f)))
(define quotelink
  (transform-rule
   'quotelink
   (irregex ">>([1-9][0-9]*|([1-9][0-9]*)-([1-9][0-9]*))(,([1-9][0-9]*|([1-9][0-9]*-[1-9][0-9]*)))*")
   (lambda (sub) `(a (@ (href ,(string-append
                                "/" *board*
                                "/" *thread*
                                "/" (string-tail sub 2))))
                     ,sub))))
(define quotelink2
  (transform-rule
   'quotelink
   (irregex ">>(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))(,(([1-9][0-9]{0,2})|(([1-9][0-9]{0,2})-([1-9][0-9]{0,2})))){0,11}")
   (lambda (sub) `(a (@ (href ,(string-append
                                "/" *board*
                                "/" *thread*
                                "/" (string-tail sub 2))))
                     ,sub))))
The string->sxml equivalents of >>82:
$ mit-scheme --load irregex.scm --load test.scm
[...]
;Loading "irregex.scm"... done
;Loading "test.scm"... done
1 ]=> (string->sxml quotelink ">>1,3,5,111-222,300")
;Value 13: ((a (@ (href "/prog/39/1,3,5,111-222,300")) ">>1,3,5,111-222,300"))
1 ]=> (string->sxml quotelink2 ">>1,3,5,111-222,300")
;Value 14: ((a (@ (href "/prog/39/1,3,5,111")) ">>1,3,5,111") "-222,300")
1 ]=>
The outer iteration and the digit iteration are used, but the alternation is not, in violation of "leftmost, longest".
Here is something peculiar. If we switch >>65 so ranges are placed as alternates before single posts, irregex-search works too:
1 ]=> (define re65sw ">>((([1-9][0-9]{0,2})-([1-9][0-9]{0,2}))|([1-9][0-9]{0,2}))(,((([1-9][0-9]{0,2})-([1-9][0-9]{0,2}))|([1-9][0-9]{0,2}))){0,11}")
;Value: re65sw
1 ]=> (irregex-match-substring (irregex-match re65sw ">>1,3,5,111-222,300"))
;Value 15: ">>1,3,5,111-222,300"
1 ]=> (irregex-match-substring (irregex-search re65sw ">>1,3,5,111-222,300"))
;Value 16: ">>1,3,5,111-222,300"
1 ]=>
This gives me an ides.
*idea
obviously
>>88
That's what I tried, before rewriting the regex differently. Can't remember what the problem was. I haven't triple checked: I usually remove some of your parentheses. I could have miswritten it because it does seem to work in the REPL.
Here are the two conditions necessary for irregex-search to break its "leftmost, longest" contract:
1. Range iteration instead of star iteration.
2. An alternation where the first branch is a prefix of the second branch.
A minimal example:
$ mit-scheme --load irregex.scm
[...]
;Loading "irregex.scm"... done
1 ]=> (define (imsis re str) (irregex-match-substring (irregex-search re str)))
;Value: imsis
1 ]=> (imsis "(a|ab){0,3}" "abab")
;Value 35: "a"
Irregex-match keeps working:
1 ]=> (define (imsim re str) (irregex-match-substring (irregex-match re str)))
;Value: imsim
1 ]=> (imsim "(a|ab){0,3}" "abab")
;Value 36: "abab"
If only one condition is met, irregex-search works too:
1 ]=> (imsis "(ab|a){0,3}" "abab")
;Value 34: "abab"
1 ]=> (imsis "(a|ab)*" "abab")
;Value 33: "abab"
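For a bug report it may help to have an independent statement of what "leftmost, longest" should return in these cases. The following is a brute-force sketch that does not use irregex at all; the names string-prefix-at?, longest-repeat and expected-match are hypothetical. It only handles the shape (** lo hi (or alt ...)) with literal string alternatives, and only matches starting at position 0, which is enough for the examples above.

```scheme
;; Does ALT occur in STR starting at index I?
(define (string-prefix-at? alt str i)
  (let ((end (+ i (string-length alt))))
    (and (<= end (string-length str))
         (string=? alt (substring str i end)))))

;; End index of the longest match at position 0 of
;; (** LO HI (or ALTS ...)) against STR, or #f if there is none.
;; Every alternative is tried at every step, so the order of ALTS
;; cannot influence the result.
(define (longest-repeat alts lo hi str)
  (define (best a b)
    (cond ((not a) b)
          ((not b) a)
          (else (max a b))))
  (define (try i count)
    (let loop ((rest alts)
               (acc (if (>= count lo) i #f)))
      (cond ((or (= count hi) (null? rest)) acc)
            ((string-prefix-at? (car rest) str i)
             (loop (cdr rest)
                   (best acc
                         (try (+ i (string-length (car rest)))
                              (+ count 1)))))
            (else (loop (cdr rest) acc)))))
  (try 0 0))

;; The expected matched substring, or #f when nothing matches.
(define (expected-match alts lo hi str)
  (let ((end (longest-repeat alts lo hi str)))
    (and end (substring str 0 end))))
```

Under these assumptions (expected-match '("a" "ab") 0 3 "abab") and (expected-match '("ab" "a") 0 3 "abab") both give "abab", which is what irregex-match returns and what irregex-search should return regardless of the order of the alternatives.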
I think this might be worth telling the shinnoid. I cannot see any obvious contact info on
http://synthcode.com/scheme/irregex/
but his git commits on
https://github.com/ashinn/irregex
are by "Alex Shinn <alexshinn@gmail.com>". Well, at least he's not one of those protonmail people. If someone could drop him a line, that would be great.
That's what I tried
Can't remember what the problem was
I could have miswritten it
Clearly, it was a typo of some sort. No worries.
Here is something peculiar. Irregex-match uses irregex-match/chunked which has two branches, one with dfa-match/longest and another with irregex-nfa. But the irregex-search/matches used by irregex-search has three branches: dfa-match/longest, dfa-match/shortest and irregex-search/backtrack. However, it is not a difference between these branches that breaks irregex-search. With some bronze age debug prints we have:
$ guile -l irregex.scm
[...]
scheme@(guile-user)> (define (imsis re str) (irregex-match-substring (irregex-search re str)))
scheme@(guile-user)> (imsis "(a|ab){0,3}" "abab")
is/m->backtrack $1 = "a"
scheme@(guile-user)> (imsis "(ab|a){0,3}" "abab")
is/m->backtrack $2 = "abab"
scheme@(guile-user)> (imsis "(a|ab)*" "abab")
is/m->dfa is/m->shortest $3 = "abab"
scheme@(guile-user)>
So both range iteration versions take the backtrack branch, but only the prefix-first version breaks irregex-search. It would seem that the problem might be somewhere within irregex-search/backtrack.
Irregex PCRE x{m,n} ranges are parsed into SRE (** m n x) ranges in string->sre -> named let lp -> main parsing -> case c -> branch (#\{). The crucial bit is:
(m
(lp (+ j 1) (+ j 1) flags `((** ,n ,m ,x) ,@tail) st))
which is at lines {962,963} in the "0.9.6: 2016/12/05" version of deps/irregex.scm from SchemeBBS on "20 Feb, 2020 4 commits". The irregex-search breakage does not depend on PCRE parsing and can be demonstrated directly on SRE. The same conditions I explained in >>91 hold for SRE:
$ guile -l irregex.scm
scheme@(guile-user)> (define (imsis re str) (irregex-match-substring (irregex-search re str)))
scheme@(guile-user)> (imsis '(** 0 3 (or "a" "ab")) "abab")
is/m->backtrack $1 = "a"
scheme@(guile-user)> (imsis '(** 0 3 (or "ab" "a")) "abab")
is/m->backtrack $2 = "abab"
scheme@(guile-user)> (imsis '(* (or "a" "ab")) "abab")
is/m->dfa is/m->shortest $3 = "abab"
scheme@(guile-user)>
The bug is now quite likely to be in irregex-search/backtrack with (** m n (or x y)) constructs.
SRE irregex reference:
http://synthcode.com/scheme/irregex/#SECTION_3.2
Irregex-search/backtrack uses irregex-nfa, which is just an accessor into the irregex representation, which is a vector. The nfa is index 3 in make-irregex, and is computed by the final branch of sre->irregex via sre->procedure. Sre->procedure composes a tree of lambdas over the tree of a SRE like '(** 0 3 (or "a" "ab")). The "case (car sre)" has (or) and (**) branches, the bug should be here.
With the benefit of hindsight, this can be gleaned from the REPL:
$ guile -l irregex.scm
scheme@(guile-user)> (irregex '(** 0 3 (or "a" "ab")))
$2 = #(*irregex-tag* #f #f #<procedure 55af37871090 at irregex.scm:3161:21 (cnk init src str i end matches fail)> 0 0 #((0 . 6)) ())
scheme@(guile-user)>
Irregex.scm:3161:21 is where the lambda of (**) lives.
Here are two more pieces of information. First, the (or) branch of sre->procedure doesn't follow "leftmost, longest":
scheme@(guile-user)> (define (imsis re str) (irregex-match-substring (irregex-search re str)))
scheme@(guile-user)> (imsis '(** 1 1 (or "free" "freedom")) "freedom")
$11 = "free"
The fixed range of 1 is only there to cause irregex to use sre->procedure. A plain 'or' uses a vanilla NFA, the implementation of which seems to work. Second, the 'or' can be forced to use later branches to reach a non-zero lower range limit, but no further, again in violation of "leftmost, longest":
scheme@(guile-user)> (imsis '(** 2 4 (or "a" "ab")) "abababab")
$12 = "aba"
scheme@(guile-user)> (imsis '(** 3 4 (or "a" "ab")) "abababab")
$13 = "ababa"
The way in which the (or) branch is broken seems fairly clear, but for the (**) branch I do not have all the details yet.
Beware of the bug chaser.
As an aside, before I explain why (or) and (**) are broken in sre->procedure, there is a "Fix exponential explosion in backtrack compilation" commit to irregex by Peter Bex on "Dec 5, 2016".
https://github.com/ashinn/irregex/commit/a16ffc86eca15fca9e40607d41de3cea9cf868f1
It only came to my attention because it contains the current implementation of the (+) branch of sre->procedure. While "define * in terms of +, instead of vice versa" is a fine idea, you still need a working (+). This (+), however, also takes a light-hearted comedy approach to the "POSIX leftmost, longest semantics" guaranteed by the documentation. As explained in >>95 the fixed range of 1 is only there to cause irregex to use sre->procedure.
$ guile -l irregex.scm
[...]
scheme@(guile-user)> (define (imsis re str) (irregex-match-substring (irregex-search re str)))
scheme@(guile-user)> (define (inout re n)
(let* ((sin (string-join (make-list n "a") ""))
(sout (imsis re sin)))
(simple-format #t " in ~A ~A\nout ~A ~A\n" (string-length sin) sin (string-length sout) sout)))
scheme@(guile-user)> (inout '(** 1 1 (+ (or "aaa" "aaaaa"))) 8)
in 8 aaaaaaaa
out 6 aaaaaa
scheme@(guile-user)> (inout '(** 1 1 (+ (or "aaa" "aaaaa"))) 9)
in 9 aaaaaaaaa
out 9 aaaaaaaaa
scheme@(guile-user)> (inout '(** 1 1 (+ (or "aaa" "aaaaa"))) 10)
in 10 aaaaaaaaaa
out 9 aaaaaaaaa
This class of tests can also be made to fail if the prefix is the second alternative:
scheme@(guile-user)> (inout '(** 1 1 (+ (or "aaaaa" "aaa"))) 9)
in 9 aaaaaaaaa
out 8 aaaaaaaa
scheme@(guile-user)> (inout '(** 1 1 (+ (or "aaaaa" "aaa"))) 10)
in 10 aaaaaaaaaa
out 10 aaaaaaaaaa
scheme@(guile-user)> (inout '(** 1 1 (+ (or "aaaaa" "aaa"))) 11)
in 11 aaaaaaaaaaa
out 10 aaaaaaaaaa
Contrast this with grep, whose authors appear to actually know what they're doing. The {1,1} is only there for equivalence.
$ g () {
> local sin sout len
> len () { echo "$1" | gawk '{ print length ($0) }'; }
> sin=$(printf "%$2s" "" | tr ' ' 'a')
> echo " in $(len "$sin") $sin"
> sout=$(echo "$sin" | grep -E -oe "$1")
> echo "out $(len "$sout") $sout"
> }
$ g '(aaa|aaaaa)+{1,1}' 8
in 8 aaaaaaaa
out 8 aaaaaaaa
$ g '(aaa|aaaaa)+{1,1}' 9
in 9 aaaaaaaaa
out 9 aaaaaaaaa
$ g '(aaa|aaaaa)+{1,1}' 10
in 10 aaaaaaaaaa
out 10 aaaaaaaaaa
And with reversed alternatives:
$ g '(aaaaa|aaa)+{1,1}' 9
in 9 aaaaaaaaa
out 9 aaaaaaaaa
$ g '(aaaaa|aaa)+{1,1}' 10
in 10 aaaaaaaaaa
out 10 aaaaaaaaaa
$ g '(aaaaa|aaa)+{1,1}' 11
in 11 aaaaaaaaaaa
out 11 aaaaaaaaaaa
Great commit you have there.
>>96
Indeed, I hear entomologists can be quite peculiar people.
I like taking notes during debugging, it helps a lot. I use an org-mode file, per project, for it. It can contain links to the code, so I can record where I have been while tracking down a fault. Very handy when working on foreign codebases, it's slowly growing into something like a travel guide. It can also have snippets, checklists, etc., anything one could want. When I fix an issue, I write a summary of what was wrong and why I think my fix is the correct one. After that, I change the TODO state into DONE and feel good about myself.
Especially useful for projects done in spare time, where it helps tremendously in getting back in context, even after longer intermissions.
May you show us a bit of an org file to demonstrate?
[1/5]
OK, here is the cold, hard truth. The (or) branch of sre->procedure looks like this:
((or)
(case (length (cdr sre))
((0) (lambda (cnk init src str i end matches fail) (fail)))
((1) (rec (cadr sre)))
(else
(let* ((first (rec (cadr sre)))
(rest (lp (sre-alternate (cddr sre))
(+ n (sre-count-submatches (cadr sre)))
flags
next)))
(lambda (cnk init src str i end matches fail)
(first cnk init src str i end matches
(lambda ()
(rest cnk init src str i end matches fail))))))))
It tries the first alternative, and if this fails it tries the (or) of the other alternatives. There is not even an attempt at the "POSIX leftmost, longest semantics". Since the (or) might be the entire regex, this is enough to make the sre->procedure call break the same semantics. Furthermore, in the general case where we do not restrict ourselves to alternatives of simple structure to construct a minimal >>91 failure case, the trick of reordering the alternatives cannot save the current (or). Consider "(aaa)*(ba)*" and "(aaaaa)*(ab)*". If we put the 3-group first, we can test on "aaaaaabab":
$ guile -l irregex.scm
[...]
scheme@(guile-user)> (define (imsis re str) (irregex-match-substring (irregex-search re str)))
scheme@(guile-user)> (imsis "((aaa)*(ba)*|(aaaaa)*(ab)*){1,1}" "aaaaaabab")
$1 = "aaaaaaba"
which leaves off the final "b". The second branch would have taken one more character:
scheme@(guile-user)> (imsis "((aaaaa)*(ab)*){1,1}" "aaaaaabab")
$2 = "aaaaaabab"
If we put the 5-group first, we can test on "aaaaaababa":
scheme@(guile-user)> (imsis "((aaaaa)*(ab)*|(aaa)*(ba)*){1,1}" "aaaaaababa")
$3 = "aaaaaabab"
which leaves off the final "a". The second branch would again have taken one more character:
scheme@(guile-user)> (imsis "((aaa)*(ba)*){1,1}" "aaaaaababa")
$4 = "aaaaaababa"
The test string can be grown to arbitrary size by prefixing groups of 15 'a's. There is no trick that can substitute in the general case for the (or) properly searching its match space and returning the match candidates in non-increasing order of length.
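A rough sketch of what "returning the match candidates in non-increasing order of length" could look like. This is a toy model in plain Scheme, not irregex code: a matcher here is simply a procedure from a string and a start index to the list of end indices it can reach, and the names lit, alt-first and alt-longest are made up for the illustration.

```scheme
;; Toy model, not irregex code: a matcher takes a string and a start
;; index and returns the list of ALL end indices it can reach.

;; #t when `lit` occurs in `str` starting at index `i`.
(define (literal-at? lit str i)
  (let ((n (string-length lit)))
    (and (<= (+ i n) (string-length str))
         (string=? lit (substring str i (+ i n))))))

;; Matcher for a literal string: zero or one candidate end.
(define (lit s)
  (lambda (str i)
    (if (literal-at? s str i)
        (list (+ i (string-length s)))
        '())))

;; Insert `x` into a descending, duplicate-free list of numbers.
(define (insert-desc x lst)
  (cond ((null? lst) (list x))
        ((= x (car lst)) lst)
        ((> x (car lst)) (cons x lst))
        (else (cons (car lst) (insert-desc x (cdr lst))))))

;; First-success alternation, the behaviour described above:
;; later alternatives are consulted only if the earlier ones fail.
(define (alt-first . ms)
  (lambda (str i)
    (let loop ((ms ms))
      (cond ((null? ms) '())
            ((null? ((car ms) str i)) (loop (cdr ms)))
            (else ((car ms) str i))))))

;; Exhaustive alternation: gather the candidates of every
;; alternative and hand them back longest first.
(define (alt-longest . ms)
  (lambda (str i)
    (let collect ((ms ms) (ends '()))
      (if (null? ms)
          ends
          (collect (cdr ms)
                   (let add ((new ((car ms) str i)) (acc ends))
                     (if (null? new)
                         acc
                         (add (cdr new) (insert-desc (car new) acc)))))))))

((alt-first   (lit "free") (lit "freedom")) "freedom" 0)  ; => (4)
((alt-longest (lit "free") (lit "freedom")) "freedom" 0)  ; => (7 4)
```

With the longest candidate first, a caller that takes the head of the list gets "freedom", and one that needs to back off can still fall through to "free".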
[2/5]
As for (**), the reason it breaks is only marginally more complicated. To illustrate the explanation with an example, we begin by replacing the string atom matcher with a thin proxy that reports the progress of the computation:
$ TZ=GMT diff -u schemebbs/deps/irregex.scm edit/deps/irregex.scm
--- schemebbs/deps/irregex.scm 2020-02-17 21:53:31.563445679 +0000
+++ edit/deps/irregex.scm 2020-02-29 02:01:57.844878000 +0000
@@ -3508,7 +3508,10 @@
(fail))))
))
((string? sre)
- (rec (sre-sequence (string->list sre)))
+ (let ((sub (rec (sre-sequence (string->list sre)))))
+ (lambda (cnk init src str i end matches fail)
+ (simple-format #t "trying ~A at ~A\n" sre i)
+ (sub cnk init src str i end matches fail)))
;; XXXX reintroduce faster string matching on chunks
;; (if (flag-set? flags ~case-insensitive?)
;; (rec (sre-sequence (string->list sre)))
The (**) branch is:
((**)
(cond
((or (and (number? (cadr sre))
(number? (caddr sre))
(> (cadr sre) (caddr sre)))
(and (not (cadr sre)) (caddr sre)))
(lambda (cnk init src str i end matches fail) (fail)))
(else
(letrec
((from (cadr sre))
(to (caddr sre))
(body-contents (sre-sequence (cdddr sre)))
(body
(lambda (count)
(lp body-contents
n
flags
(lambda (cnk init src str i end matches fail)
(if (and to (= count to))
(next cnk init src str i end matches fail)
((body (+ 1 count))
cnk init src str i end matches
(lambda ()
(if (>= count from)
(next cnk init src str i end matches fail)
(fail))))))))))
(if (and (zero? from) to (zero? to))
next
(lambda (cnk init src str i end matches fail)
((body 1) cnk init src str i end matches
(lambda ()
(if (zero? from)
(next cnk init src str i end matches fail)
(fail))))))))))
Here is a talkative run on the last example of >>95:
scheme@(guile-user)> (imsis '(** 3 4 (or "a" "ab")) "abababab")
trying a at 0
trying a at 1
trying ab at 1
trying ab at 0
trying a at 2
trying a at 3
trying ab at 3
trying ab at 2
trying a at 4
trying a at 5
trying ab at 5
$1 = "ababa"
scheme@(guile-user)>
The "a at 0" and "a at 2" are both retried with "ab" because the overall repeat counts they yielded were 1 and 2, under the lower limit of 3. These retries happen on the alternate of the innermost conditional of 'body'. But after "a at 4", which makes both attempts at 5 fail, there is no retry with "ab at 4". This is because the produced repeat count is now 3, and the same conditional declares it a success because extending to 4 repeats failed. There is no attempt to extend with further retries, even though it would obviously work since the target string is composed of 4 "ab"s.
The () branch is not interested in "leftmost, longest", it stops at the first repeat count extension failure within the allowed range. The only way it can produce "leftmost, longest" on its own is to only have repeat count extension failures under the lower limit, but smooth sailing from the lower limit to the upper limit. Since the () might be the entire regex, this is also enough to make the sre->procedure call break the same semantics.
[3/5]
However, I promised the cold, hard truth and this was only the cold part, so here is the rest. The disregard for "leftmost, longest" in sre->procedure is not limited to (or) and (**), it is present in all branches that can take multiple paths. This is meant to be salvaged by the external user of sre->procedure via the 'fail' lambda that is returned as part of the 'matches' object in the named let lp. On a successful match, 'fail' is actually the retry continuation. This is how irregex-match works. Its driver for sre->procedure is the 'else' branch of irregex-match/chunked:
(define (irregex-match/chunked irx cnk src)
(let* ((irx (irregex irx))
(matches (irregex-new-matches irx)))
(irregex-match-chunker-set! matches cnk)
(cond
((irregex-dfa irx)
[...]
(else
(let* ((matcher (irregex-nfa irx))
(str ((chunker-get-str cnk) src))
(i ((chunker-get-start cnk) src))
(end ((chunker-get-end cnk) src))
(init (cons src i)))
(let lp ((m (matcher cnk init src str i end matches (lambda () #f))))
(and m
(cond
((and (not ((chunker-get-next cnk)
(%irregex-match-end-chunk m 0)))
(= ((chunker-get-end cnk)
(%irregex-match-end-chunk m 0))
(%irregex-match-end-index m 0)))
(%irregex-match-fail-set! m #f)
m)
((%irregex-match-fail m)
(lp ((%irregex-match-fail m))))
(else
#f)))))))))
Whenever there is a match that does not exhaust the input, and a retry continuation exists, the retry is called by the "(lp ((%irregex-match-fail m)))" branch. This means that if a full match is possible, it will be found. Here is the above (**) example with irregex-match and one more debug print:
scheme@(guile-user)> (define (imsim re str) (irregex-match-substring (irregex-match re str)))
scheme@(guile-user)> (imsim '(** 3 4 (or "a" "ab")) "abababab")
trying a at 0
trying a at 1
trying ab at 1
trying ab at 0
trying a at 2
trying a at 3
trying ab at 3
trying ab at 2
trying a at 4
trying a at 5
trying ab at 5
retry by irregex-match/chunked
trying ab at 4
trying a at 6
retry by irregex-match/chunked
trying ab at 6
$1 = "abababab"
scheme@(guile-user)>
Irregex-match/chunked has to override sre->procedure's result twice to get the full match.
[4/5]
By contrast, irregex-search's driver for sre->procedure is irregex-search/backtrack:
(define (irregex-search/backtrack irx cnk init src i matches)
(let ((matcher (irregex-nfa irx))
(str ((chunker-get-str cnk) src))
(end ((chunker-get-end cnk) src))
(get-next (chunker-get-next cnk)))
(if (flag-set? (irregex-flags irx) ~searcher?)
(matcher cnk init src str i end matches (lambda () #f))
(let lp ((src2 src)
(str str)
(i i)
(end end))
(cond
((matcher cnk init src2 str i end matches (lambda () #f))
(irregex-match-start-chunk-set! matches 0 src2)
(irregex-match-start-index-set! matches 0 i)
matches)
((< i end)
(lp src2 str (+ i 1) end))
(else
(let ((src2 (get-next src2)))
(if src2
(lp src2
((chunker-get-str cnk) src2)
((chunker-get-start cnk) src2)
((chunker-get-end cnk) src2))
#f))))))))
If there was no match, it advances one character or acquires more input. But if there was a match, it is immediately returned. What is completely missing is any backtracking to attempt to find a longer match at the current position, to achieve "leftmost, longest". This procedure performs backtracking in the same way that the DPRK is democratic, all of the backtracking is in the procedure name, none in the code. According to git blame, this way of doing backtracking is from the "http-url fix to support multiple directories in the path" commit by ashinn on "Aug 15, 2012".
https://github.com/ashinn/irregex/commit/3de590ee0b517ab786a42a6a75922890e7972acf
Here is an example to illustrate the underlying issue. Consider 5-groups and 3-groups on an input of 9:
scheme@(guile-user)> (imsim '(** 1 1 (+ (or "aaaaa" "aaa"))) "aaaaaaaaa")
trying aaaaa at 0
trying aaaaa at 5
trying aaa at 5
trying aaaaa at 8
trying aaa at 8
retry by irregex-match/chunked
retry by irregex-match/chunked
trying aaa at 0
trying aaaaa at 3
trying aaaaa at 8
trying aaa at 8
retry by irregex-match/chunked
trying aaa at 3
trying aaaaa at 6
trying aaa at 6
trying aaaaa at 9
trying aaa at 9
$1 = "aaaaaaaaa"
scheme@(guile-user)>
In order to get the longest overall match, the 5-group needs to back off twice, at 0 and at 3. The (or) itself did nothing wrong here, it provided the longest match it could at each step, due to the order of alternatives. It just so happens that in this case those are not the paths that lead to overall success. In the general case, proper backtracking is not a gentle suggestion but a hard requirement. As this example shows, and as one might recall from paying attention in college, the crux of the matter is this: regex matching can possess global maxima that are assembled entirely from local minima.
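To make the last point concrete without irregex in the way, here is a minimal sketch that models (+ (or "aaaaa" "aaa")) over literal alternatives only. The names end-of, plus-greedy and plus-longest are invented for this illustration, and the exhaustive version is deliberately naive (no memoization), so treat it as a model of the semantics rather than a proposed implementation.

```scheme
;; Toy model, not irregex code. An alternative is a literal string;
;; positions are plain indices into `str`.

;; End index reached by `lit` at index `i`, or #f if it does not match
;; there. Empty literals are rejected so the loops below always advance.
(define (end-of lit str i)
  (let ((e (+ i (string-length lit))))
    (and (> (string-length lit) 0)
         (<= e (string-length str))
         (string=? lit (substring str i e))
         e)))

;; Committed (+ (or alts ...)): at each step take the first
;; alternative that matches and never reconsider that choice.
(define (plus-greedy alts str i)
  (let step ((pos i) (matched? #f))
    (let try ((as alts))
      (cond ((null? as) (and matched? pos))
            ((end-of (car as) str pos)
             => (lambda (e) (step e #t)))
            (else (try (cdr as)))))))

;; Exhaustive (+ (or alts ...)): explore every alternative at every
;; step and keep the largest end index reachable.
(define (plus-longest alts str i)
  (let search ((pos i) (matched? #f))
    (let try ((as alts) (best (and matched? pos)))
      (if (null? as)
          best
          (let* ((e (end-of (car as) str pos))
                 (sub (and e (search e #t))))
            (try (cdr as)
                 (if (and sub (or (not best) (> sub best))) sub best)))))))

(plus-greedy  '("aaaaa" "aaa") (make-string 9 #\a) 0)  ; => 8
(plus-longest '("aaaaa" "aaa") (make-string 9 #\a) 0)  ; => 9
```

The committed version ends at 8 (5 + 3), the same length irregex-search reported above for this pattern on an input of 9, while the exhaustive one finds 9 by taking the shorter alternative three times.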
[5/5]
And as a bonus, here is a DOS attack on irregex. The (+) branch of sre->procedure is:
((+)
(cond
((sre-empty? (sre-sequence (cdr sre)))
(error "invalid sre: empty +" sre))
(else
(letrec
((body
(lp (sre-sequence (cdr sre))
n
flags
(lambda (cnk init src str i end matches fail)
(body cnk init src str i end matches
(lambda ()
(next cnk init src str i end matches fail)
))))))
body))))
It unconditionally loops its subexpression and only concedes when that fails. It does not even give one neutrino that the input might be exhausted. Protection from being passed a null matcher is provided by the sre-empty? test. Sre-empty? looks like this:
;; returns #t if the sre can ever be empty
(define (sre-empty? sre)
(if (pair? sre)
(case (car sre)
((* ? look-ahead look-behind neg-look-ahead neg-look-behind) #t)
((**) (or (not (number? (cadr sre))) (zero? (cadr sre))))
((or) (any sre-empty? (cdr sre)))
((: seq $ submatch => submatch-named + atomic)
(every sre-empty? (cdr sre)))
(else #f))
(memq sre '(epsilon bos eos bol eol bow eow commit))))
It handles pairs and symbols. Irregex represents string atoms with raw strings, so that we can write '(or "a" "b") instead of having to write something like '(or (s "a") (s "b")). This means that strings can be SREs and they have their own branch in sre->procedure, it is the (string? sre) branch that we patched above for debug prints. But sre-empty? returns #f for everything it doesn't care about, and it doesn't care about strings. Naturally, this is an invitation to pass the empty string to (+) via SREs, e.g.:
scheme@(guile-user)> (sre-empty? '(or "aa" ""))
$1 = #f
scheme@(guile-user)> (imsis '(** 1 1 (+ (or "aa" ""))) "aaaa")
This will keep one of your CPU cores on 100% for a while. According to git blame, this sre-empty? is from the same commit by ashinn as the backtracking. In light of the comment above the procedure, "can ever be empty", here is how complicated the fix is: you have to consider the possibility that the empty string might sometimes be empty. Until you do, this is an easy DOS where SREs are directly accepted. The PCRE equivalent is:
scheme@(guile-user)> (imsis "(aa|){1,}" "aaaa")
which will also spin up your fans, and uses the fact that the author thought it was perfectly reasonable to add null matcher protection to * and + but not to upper unbounded ranges, neither in string->sre nor in sre->procedure. As I have already asked in >>91 perhaps somebody could bring the shinnoid up to speed.
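For the record, a sketch of the direction such a fix could take. It mirrors the shape of the sre-empty? quoted above and only adds a clause for string atoms. The name sre-can-be-empty? and the helpers some? and all? are made up here so that the snippet runs without irregex loaded. It is not upstream code, and it does nothing about the unbounded-range case.

```scheme
;; Sketch only: same structure as the sre-empty? quoted above, plus a
;; clause for string atoms. All names in this block are hypothetical.

;; #t if `pred` holds for at least one element of `lst`.
(define (some? pred lst)
  (and (pair? lst)
       (or (pred (car lst)) (some? pred (cdr lst)))))

;; #t if `pred` holds for every element of `lst`.
(define (all? pred lst)
  (or (null? lst)
      (and (pred (car lst)) (all? pred (cdr lst)))))

(define (sre-can-be-empty? sre)
  (cond
   ;; The missing case: a string atom is empty exactly when it is "".
   ((string? sre) (string=? sre ""))
   ((pair? sre)
    (case (car sre)
      ((* ? look-ahead look-behind neg-look-ahead neg-look-behind) #t)
      ((**) (or (not (number? (cadr sre))) (zero? (cadr sre))))
      ((or) (some? sre-can-be-empty? (cdr sre)))
      ((: seq $ submatch => submatch-named + atomic)
       (all? sre-can-be-empty? (cdr sre)))
      (else #f)))
   (else
    ;; Normalize memq's list result to a boolean.
    (and (memq sre '(epsilon bos eos bol eol bow eow commit)) #t))))

(sre-can-be-empty? '(or "aa" ""))      ; => #t
(sre-can-be-empty? '(+ (or "aa" "")))  ; => #t
(sre-can-be-empty? '(or "aa" "b"))     ; => #f
```

With a check like this in front of the (+) branch, '(+ (or "aa" "")) would hit the "invalid sre: empty +" error instead of looping.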
And now you can have your thread back. I'm done with this irregex thing.
Where you have the bold nonsense in [2/5] just imagine (**) for ().
>>91-95,97,100-105
Well, that's an impressive debugging job. Just like Bruce Lee, you don't want to do anything halfway, it has to be perfect. And you're not to be messed with. (^.~)☆
I think this might be worth telling the shinnoid. I cannot see any obvious contact info on http://synthcode.com/scheme/irregex/ but his git commits on https://github.com/ashinn/irregex are by "Alex Shinn <alexshinn@gmail.com>". Well, at least he's not one of those protonmail people. If someone could drop him a line, that would be great.
Let's file a PR then, maybe it's worth linking to this thread?
Anyway, using Alex Shinn's irregex (or Dorai Sitaram's pregexp) wasn't absolutely necessary since MIT Scheme provides its own star-parser.
And now you can have your thread back.
Yep. And we'll be back soon with very good news.
>>106
Anons can send him whatever they wish, I have no control over that. In my opinion he might be sent at least the minimal failure case and the DOS. From there he can figure things out himself faster than I did, since he knows his own library. As for a PR however, I do not care for sites that require any of: accounts, verification and enabling remote code execution, so I will not be interacting with his github myself.
And we'll be back soon with very good news.
Let me guess. You've seen the light and decided to ditch your slowpoke posts-range/filter-func for the superior version.
Let me guess. You've seen the light and decided to ditch your slowpoke posts-range/filter-func for the superior version.
That also. I haven't forgotten about >>64-66.
>>109
>>110
The performance improvement is insignificant. The problem is clearly not the speed of calculating post ranges or iterating through a list whose length is less than 300.
``optimized'' quoted links are a legacy from 2channel and Shiichan, that's why they were implemented. They're the only dynamic content served, everything else is cached as HTML files. The bottleneck is the HTML generation from S-expressions. (btw SchemeBBS doesn't need Nginx, it works standalone, except for the CSS hack)
Futaba and other imageboards dropped that feature: they only have single ``unoptimized'' quotes, linking to an anchor in the thread. Someone even wrote a userscript to mimic this behaviour.
I have to rethink this. Of course, Nginx does some caching, so if you repeat the same query over and over that won't be a problem. Randomized requests on post ranges would be a disaster though. Flat file storage... Trade-offs between storing whole threads as HTML files or storing single posts in directories and having multiple disk accesses to display a single thread...
[1/2]
The performance improvement is insignificant. The problem is clearly not the speed of calculating post ranges
insignificant
clearly
$ cat test.scm
(define *max-posts* 300)
; Splits the input string 'str into a list of strings
; based on the delimiter character 'ch
; © (Doug Hoyte, hcsw.org)
(define (string-split str ch)
(let ((len (string-length str)))
(letrec
((split
(lambda (a b)
(cond
((>= b len) (if (= a b) '() (cons (substring str a b) '())))
((char=? ch (string-ref str b))
(if (= a b)
(split (+ 1 a) (+ 1 b))
(cons (substring str a b) (split b b))))
(else (split a (+ 1 b)))))))
(split 0 0))))
(define (flatten l)
(letrec
((flat
(lambda (l acc rest)
(cond ((null? l)
(if (null? rest)
(reverse acc)
(flat (car rest) acc (cdr rest))))
((pair? (car l))
(flat (car l) acc (if (null? (cdr l))
rest
(cons (cdr l) rest))))
(else
(flat (cdr l) (cons (car l) acc) rest))))))
(flat l '() '())))
(define (posts-range range)
(define (expand-range x)
(cond ((> (length x) 1)
(let* ((a (string->number (car x)))
(b (string->number (cadr x)))
(low (if (> a *max-posts*) *max-posts* a))
(high (if (> b *max-posts*) *max-posts* b))
(count (+ (- high low) 1)))
(if (> high low) (iota count low) low)))
(else (string->number (car x)))))
(let* ((r1 (string-split range #\,))
(r2 (map (lambda (x) (string-split x #\-)) r1))
(r3 (flatten (map expand-range r2))))
(sort (delete-duplicates r3) <)))
(define (posts-range64 range)
(define (expand-range x)
(cond ((> (length x) 1)
(let* ((a (string->number (car x)))
(b (string->number (cadr x)))
(low (if (> a *max-posts*) *max-posts* a))
(high (if (> b *max-posts*) *max-posts* b))
(count (+ (- high low) 1)))
(if (> high low)
(lambda () (iota count low))
(lambda () (list low)))))
(else (let* ((a (string->number (car x)))
(low (if (> a *max-posts*) *max-posts* a)))
(lambda () (list low))))))
(define (invoke-loop-set vector lamb)
(for-each (lambda (e) (vector-set! vector e #t))
(lamb)))
(let* ((r1 (string-split range #\,))
(r2 (map (lambda (x) (string-split x #\-)) r1))
(r3 (map expand-range r2))
(vec (make-vector (+ *max-posts* 1) #f)))
(for-each (lambda (e) (invoke-loop-set vec e))
r3)
vec))
(define (timeit proc)
(with-timings proc
(lambda (run-time gc-time real-time)
(write (internal-time/ticks->seconds run-time))
(write-char #\space)
(write (internal-time/ticks->seconds gc-time))
(write-char #\space)
(write (internal-time/ticks->seconds real-time))
(newline))))
(define (stress n) (apply string-append (cons "1-300" (make-list n ",1-300"))))
[2/2]
$ mit-scheme --load test.scm
MIT/GNU Scheme running under GNU/Linux
Type `^C' (control-C) followed by `H' to obtain information about interrupts.
Copyright (C) 2011 Massachusetts Institute of Technology
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Image saved on Tuesday February 6, 2018 at 6:31:25 PM
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41
LIAR/x86-64 4.118 || Edwin 3.116
;Loading "test.scm"... done
[...]
1 ]=> (timeit (lambda () (posts-range (stress 20))))
.02 0. .025
;Value 16: (1 2 3 [...] 300)
1 ]=> (timeit (lambda () (posts-range64 (stress 20))))
0. 0. 0.
;Value 17: #(#f #t #t [...] #t)
[...]
1 ]=> (timeit (lambda () (posts-range (stress 182))))
.15 .18 .328
;Value 23: (1 2 3 [...] 300)
1 ]=> (timeit (lambda () (posts-range64 (stress 182))))
.01 .01 .018
;Value 24: #(#f #t #t [...] #t)
1 ]=> (timeit (lambda () (posts-range (stress 183))))
;Aborting!: out of memory
;GC #45: took: 0.20 (100%) CPU time, 0.10 (100%) real time; free: 16769613
;GC #46: took: 0.10 (100%) CPU time, 0.10 (93%) real time; free: 16769646
1 ]=> (timeit (lambda () (posts-range64 (stress 183))))
.01 0. .017
;Value 25: #(#f #t #t [...] #t)
I guess speeding up your code by a factor of 10+ and no longer breaking the memory limit just doesn't appeal to some people.
I guess speeding up your code by a factor of 10+ and no longer breaking the memory limit just doesn't appeal to some people.
Don't get me wrong, I've applied the patch, even pushed it to the repo. Better code is always welcome.
But your stress test is on range calculation, mine is on the requests/second. Generating HTML is what takes long and eats memory, not generating numerical posts range. And by a huge factor.
On a DoS test, the code in the master branch is slowed down to an average of 1.51 req/s while the code in the optimized filter branch manages a whopping 1.7 req/s. It's good, not negligible. But it's not enough, there's much more optimization to be done.
If you'd like to test it yourself, the only thing missing to boot SchemeBBS was data directories and index files for a dummy board. I've added a board named "foo" in the repo and a shell script to create as many boards as you want. SchemeBBS works standalone and out of the box (well, provided you've patched the files mit-scheme-9.2/src/lib/runtime/http-syntax.scm and mit-scheme-9.2/src/runtime/httpio.scm which use outdated RFCs)
Did anon not read about premature optimization?
Disk size is definitely not a problem for textboards. But data replication always feels ugly.
Here's how flat file storage is organized. Let's say we have board1 with 3 posts and board2 with 2 posts.
data/
├── html
│ ├── board1
│ │ ├── 1
│ │ ├── 2
│ │ ├── 3
│ │ ├── index
│ │ └── list
│ └── board2
│ ├── 1
│ ├── 2
│ ├── index
│ └── list
└── sexp
├── board1
│ ├── 1
│ ├── 2
│ ├── 3
│ ├── index
│ └── list
└── board2
├── 1
├── 2
├── index
└── list
SchemeBBS will serve static HTML files if it can find them (but this is actually done directly and much more efficiently by Nginx). If there's no HTML file, then it will generate HTML from S-expressions. Threads in sexp are lists of posts. It made sense for complete threads but vectors would have been more suitable for random access with posts range. The choice wasn't entirely rational, I wanted lists.
Ok. Here's the plan
[x] Make some pancakes - done
[ ] Release SchemeBBS 1.0 (with Anon's optimized code)
[ ] Start working on a version with a small database (IIRC the only options are gdbm and SLIB's relational database which wasn't usable.)
There was an attempt to write bindings for sqlite3 but I don't believe it's in a usable state: https://lists.gnu.org/archive/html/mit-scheme-devel/2013-05/msg00011.html
But your stress test is on range calculation, mine is on the requests/second. Generating HTML is what takes long and eats memory, not generating numerical posts range. And by a huge factor.
All of this is true, just as it is also true that if delete-duplicates is allowed to abort on memory limit the code won't even get to the sxml phase, let alone sxml->html conversion.
Since you have a deployed instance, do you have a profiled run of a request so we can see which parts of the scheme code take the most time in the full request serving pipeline?
And if you are considering 1.0, please replace lib/markup.scm:line-scanner because that amount of copypasting hurts my eyes. Before:
(define (line-scanner l)
(let ((b (partial lines->sxml bold))
(i (partial lines->sxml italic))
(tt (partial lines->sxml code))
(ql (partial lines->sxml quotelink))
(a (partial lines->sxml link))
(spoiler (partial lines->sxml del)))
((compose spoiler tt a ql b i) l)))
After:
(define line-scanner-order (list
del code link quotelink bold italic))
(define (line-scanner l)
((apply compose (map (lambda (tr) (partial lines->sxml tr)) line-scanner-order)) l))
>>114
The posts-range function wasn't randomly selected for grilling, but because the previous version could be and was used to bring down the site. Additionally, the admin's filter-func recomputed the exact same (posts-range range) for every single invocation.
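To spell out the "compute once, reuse" point: a hedged sketch of how a membership vector of the kind posts-range64 returns could be turned into a predicate that is built a single time per request. The names make-range-filter and keep-if, and the representation of a post as a (number . body) pair, are assumptions for the example and not how SchemeBBS actually stores or filters posts.

```scheme
;; Hypothetical sketch: make-range-filter, keep-if and the
;; (number . body) post representation are illustrative only.

;; Build the predicate once from a membership vector
;; (index = post number, #t = selected).
(define (make-range-filter selected)
  (lambda (post)
    (let ((n (car post)))
      (and (< 0 n (vector-length selected))
           (vector-ref selected n)))))

;; Keep the elements of `lst` that satisfy `keep?`, preserving order.
(define (keep-if keep? lst)
  (cond ((null? lst) '())
        ((keep? (car lst)) (cons (car lst) (keep-if keep? (cdr lst))))
        (else (keep-if keep? (cdr lst)))))

;; Example: a 5-slot vector selecting posts 2 and 3.
(define selected (vector #f #f #t #t #f))
(define posts '((1 . "first") (2 . "second") (3 . "third") (4 . "fourth")))

(keep-if (make-range-filter selected) posts)
;; => ((2 . "second") (3 . "third"))
```

Each membership test is then a single vector-ref, and the range string is parsed once no matter how many posts the thread holds.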
>>119-120
Branches have been merged.
And if you are considering 1.0, please replace lib/markup.scm:line-scanner because that amount of copypasting hurts my eyes.
Commit f6105666 authored 2 minutes ago: remove an inelegant amount of copypasting hurting Anon's eyes
I had a crazy idea when I wrote SchemeBBS, something quite amusing and never seen before in any textboards, but of course, back then, I ran short of time. I have the motivation to try and implement it now. I'm starting a new branch, stay tuned
Branches have been merged.
remove an inelegant amount of copypasting hurting Anon's eyes
Thanks, adminoid, you might be a good guy after all.
I wasn't going to comment on this since it's minor, but I see you have a commit named "typo". In README.md you have:
SchemeBBS should not directly serves client
You probably meant "serve clients".
>>123
There's no way I would have left a typo like that in the README for over a year! Yeah, I just checked, it says "SchemeBSS should not directly serve clients" right there in the README. You know how sometimes bytes come all garbled from the internet tubes. It happens to me all the time when I use unreliable networks such as my neighbor's WiFi. You should have a word with your ISP about this issue and I'm glad we sorted the problem out. I'd have died of shame and embarrassment otherwise.
>>124
Right. Please take it easy on the drugs.
This one was even worse. Not because of the typo but because of the absurd statement:
"this stub is here to stay"
"can't user(sic) blowfish or mcrypt, so..."
1 ]=> (md5-sum->hexadecimal (md5-string "fakenews"))
;Value 14: "3aefb76f8d53d8d1f7b160df9d2ac56d"
Unfortunately, I don't do drugs. I can't use that as an excuse, what happened then?
On the other hand, a lot of functions are undocumented in MIT Scheme, and you have to read the source code if you want to use them. And there isn't exactly a huge community of users outside of the academic world. I wouldn't know where to ask for help. Anyway, the cryptography does seem usable. I could try to implement tripcodes for fun.
1 ]=> (mcrypt-algorithm-names)
;Unassigned variable: mcrypt-algorithm-names-vector
;To continue, call RESTART with an option number:
; (RESTART 3) => Specify a value to use instead of mcrypt-algorithm-names-vector.
; (RESTART 2) => Set mcrypt-algorithm-names-vector to a given value.
; (RESTART 1) => Return to read-eval-print level 1.
2 error> (restart 1)
;Abort!
1 ]=> (mcrypt-available?)
;Loading "/usr/local/lib/mit-scheme-x86-64/lib/prmcrypt.so"... done
;Value: #t
Whatever, let's go on.
1 ]=> (mcrypt-algorithm-names)
;Value 15: ("tripledes" "rc2" "enigma" "blowfish" "xtea" "serpent" "rijndael-256" "des" "blowfish-compat" "wake" "saferplus" "rijndael-192" "loki97" "cast-256" "arcfour" "twofish" "rijndael-128" "gost" "cast-128")
1 ]=> (mcrypt-mode-names)
;Value 16: ("stream" "ofb" "nofb" "ncfb" "ecb" "ctr" "cfb" "cbc")
So far, so good. In src/runtime/crypto.scm there's a function mcrypt-encrypt with the following signature:
(define (mcrypt-encrypt context input input-start input-end output output-start encrypt?) "...")
A context can be defined like that: (define c (mcrypt-open-module "des" "ecb")), encrypt? is a boolean, #t for encryption, #f for decryption.
... and I'm stuck.
I could try to implement tripcodes for fun.
I hope you realize that a script would be posted implementing the usual functionality of replacing the body of any post with a trip with "I am a child seeking attention" or similar.
>>127
I don't intend to use them on this site. You can't even have a name here, you're forced to be a number.
But why not provide a full (almost) textboard implementation and let the potential user choose names and trips if they feel so inclined? I do hope that, in those circumstances, someone will make a script replacing the body of any post with a trip with "I am a child seeking attention". That's the right thing to do.
In all fairness, tripcodes weren't the reason I needed cryptographic functions. I envisioned a bot trap with tokens. They'd double as a permanent API key for Emacs/Scheme clients users.
>>45
I thought the Unbound variable: nmv-header? bug might be fixed in the git version but it does not build.
(echo '(with-working-directory-pathname "cref"' && \
echo ' (lambda () (load "cref.cbf")))') \
| 'mit-scheme-x86-64' --batch-mode --no-init-file --load runtime/host-adapter.scm --eval '(begin )'
;Loading "cref.cbf"... done
(echo '(with-working-directory-pathname "runtime"' && \
echo ' (lambda () (load "runtime.cbf")))') \
| 'mit-scheme-x86-64' --batch-mode --no-init-file --load runtime/host-adapter.scm --eval '(begin )'
;Loading "runtime.cbf"... done
(. etc/functions.sh && get_fasl_file && cd runtime \
&& (echo '(disk-save "../lib/runtime.com")' \
| ../run-build --batch-mode --fasl "${FASL}"))
Bad compiled-code version in FASL File: make.com
File has: compiled-code interface 3; architecture 14.
Expected: compiled-code interface 4; architecture 14.
Microcode Error: No error handlers.
Error code 0x3a (fasload-compiled-mismatch).
**** Stack Trace ****
{0x80ff88}
Return code: [return-code pop-return-error]
Expression: #F
{0x80ff98}
Return code: [return-code internal-apply]
Expression: #F
{0x80ffa8} ...: [false 0x2]
{0x80ffb0} ...: [primitive BINARY-FASLOAD]
{0x80ffb8} ...: "make.com"
{0x80ffc0}
Return code: [return-code combination-save-value]
Expression: [combination [primitive SCODE-EVAL] ... (2 args) 0x857b20]
{0x80ffd0} ...: #F
{0x80ffd8} ...: [manifest-nm-vector 0x1] (skipping)
{0x80ffe8} ...: #F
{0x80fff0}
Return code: [return-code non-existent-continuation]
Expression: #F
No error handler.
make[1]: *** [Makefile:821: lib/runtime.com] Error 1
make[1]: Leaving directory '/home/ben/mit-scheme/mit-scheme-10.1-git/src'
make: *** [Makefile:699: all] Error 2
Anon should consider starting a Scheme imageboard software consulting firm.
So... are you guys going to use SQLite?
>>131
Someone tried to write bindings for sqlite3 once: https://lists.gnu.org/archive/html/mit-scheme-devel/2013-05/msg00005.html
I don't think it's usable.
gdbm is the only database choice at the moment. Berkeley DB might be another option but I haven't tried it yet.
"Cleaning" is not a useful commit message, Bitdiddle. I'm certain you have the ability to write a few words on the substance of the change.
>>133
I know you're watching and that's a lot of stress. The "cleaning" messages are not the problem! I f*ing added 78 MB of recompiled MIT Scheme binaries in the commit log to provide the (x86-64) user with a straightforward way to install and use SchemeBBS with my modified distribution of MIT Scheme, so that she doesn't have to install MIT Scheme before recompiling MIT Scheme and installing MIT Scheme again*. Then I realized that I should end myself. Cloning the repo is going to take forever now. Talk about small codebase!
There will be cleaning. Real cleaning. The only commit you're going to see for SchemeBBS 1.0 is gonna be "Initial Commit"
https://gitlab.com/naughtybits/mit-scheme-9.2
____
* It was automated with a build script but still...
>>134
commit message*
Since this has become a general bug-reporting thread, I'd like to point out that for some reason in http://textboard.org/sexp/prog/39, >>113 contains a symbol (mit-scheme-9.2/src/runtime/httpio.scm):
(p "If you'd like to test it yourself, the only thing missing thing to boot SchemeBBS was data directories and index files for a dummy board. I've added a board named \"foo\" in the repo and a shell script to create as many boards as you want. SchemeBBS works standalone and out of the box (well, provided you've patched the files " (code "mit-scheme-9.2/src/lib/runtime/http-syntax.scm") " and (code " mit-scheme-9.2/src/runtime/httpio.scm ") which use outdated RFCs)")
>>136
Man edited file. Never do that.
From "wget http://textboard.org/sexp/prog/39":
(p "If you'd like to test it yourself, the only thing missing thing to boot SchemeBBS was data directories and index files for a dummy board. I've added a board named \"foo\" in the repo and a shell script to create as many boards as you want. SchemeBBS works standalone and out of the box (well, provided you've patched the files " (code "mit-scheme-9.2/src/lib/runtime/http-syntax.scm") " and " (code "mit-scheme-9.2/src/runtime/httpio.scm") " which use outdated RFCs)")
From your post:
(p "If you'd like to test it yourself, the only thing missing thing to boot SchemeBBS was data directories and index files for a dummy board. I've added a board named \"foo\" in the repo and a shell script to create as many boards as you want. SchemeBBS works standalone and out of the box (well, provided you've patched the files " (code "mit-scheme-9.2/src/lib/runtime/http-syntax.scm") " and (code " mit-scheme-9.2/src/runtime/httpio.scm ") which use outdated RFCs)")
Differences start at: ^^^
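The spot the ^^^ marker points at can be found mechanically. Here is a minimal Python sketch, not the tool used above; the function name and the shortened sample strings are made up for illustration:

```python
def first_difference(a: str, b: str) -> int:
    """Return the index of the first differing character, or -1 if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No mismatch in the common part: either equal, or one is a prefix.
    return -1 if len(a) == len(b) else min(len(a), len(b))


# Shortened excerpts of the two versions quoted above.
served = '" and " (code "mit-scheme-9.2/src/runtime/httpio.scm") " which'
posted = '" and (code " mit-scheme-9.2/src/runtime/httpio.scm ") which'

i = first_difference(served, posted)
print(i, repr(served[i:i + 10]), repr(posted[i:i + 10]))
```

Printing a few characters from each side at that index is usually enough to see which quote went missing.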
>>134
I think I've found a way out of this mess.
>>26
I've added those lines to the top of the server section of an nginx.conf file, and it does seem to work (after installing the nginx module posted by >>29).
My only question is about the first line: What is the value of the bypass variable? I assume that it would be some sort of match for pages that do not make use of the shared CSS files, but it would be nice to know if there's more to it.
>>140
Also, sorry to be somewhat offtopic, but as another anon said this has become something of a bug report thread.
Right now all of the threads on https://textboard.org/prog/list/ are being linked to incorrectly.
This seems to be caused by the last "/" in the URL.
>>140
There are two modules used:
https://github.com/yaoweibin/ngx_http_substitutions_filter_module
https://github.com/aperezdc/ngx-fancyindex
The first one is used for the ``css toggle without cookies'' hack and the second is just to make nice directory listings.
I think some variables were missing in the post where I explained the css switching, so here's the part that takes care of adding the query strings to all links:
server {
if ( $query_string = "" )
{
set $bypass "1";
}
#replace css and add query string to all internal links
subs_filter_bypass $bypass;
subs_filter '<LINK href="/static/styles/(.*?).css"' '<LINK href="/static/styles/$arg_css.css"' or;
subs_filter '<A href="((?!http).*?)(#.*?)?"' '<A href="$1$is_args$args$2"' gr;
subs_filter '<FORM action="(.*?post)"' '<FORM action="$1$is_args$args"' gr;
}
I call it a hack because in a perfect world, elements of your app should be loosely coupled and SchemeBBS would handle this, not Nginx. But sometimes you have to choose the way of the ninja, especially for nice but non-vital features.
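To see what the link-rewriting rule does to a page, here is a rough Python sketch of the second subs_filter line only. It is an approximation under assumptions: Python's re module stands in for the regex engine nginx uses, the function name is invented, and "css=dark" is a hypothetical query string, not a style name taken from the site.

```python
import re

# Same pattern as the subs_filter rule: internal links only (no "http" prefix),
# with an optional #fragment captured separately.
LINK_RE = re.compile(r'<A href="((?!http).*?)(#.*?)?"')


def add_query_to_links(html: str, args: str) -> str:
    """Append ?args to every internal <A href>, before any #fragment."""
    if not args:
        # Mirrors the $bypass case: empty query string, nothing is rewritten.
        return html

    def repl(m):
        fragment = m.group(2) or ""
        return f'<A href="{m.group(1)}?{args}{fragment}"'

    return LINK_RE.sub(repl, html)


page = '<A href="/prog/39#t39p1">x</A> <A href="http://example.org/">y</A>'
print(add_query_to_links(page, "css=dark"))
```

The internal link gets the query string inserted before its fragment, while the external link is left alone because of the negative lookahead.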
cont. (I hit the post length limit. I got a new server today with decent specs so the site will be migrated and the post limit raised)
>>140,142
You'll also probably want to serve static files directly with nginx, even if SchemeBBS can serve them too. And you probably want to cache requests. So, here's a fairly complete nginx.conf:
http {
include mime.types;
default_type application/octet-stream;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
proxy_cache_path /var/cache/nginx levels=1:2
keys_zone=cache:10m inactive=600s max_size=100m;
#limit post requests
map $request_method $limit {
default "";
POST $binary_remote_addr;
}
# Creates 10mb zone in memory for storing binary ips
limit_req_zone $limit zone=post_limit:10m rate=11r/m;
upstream http_backend {
keepalive 20;
server 127.0.0.1:8080;
}
server {
listen 80;
server_name tld.org www.tld.org;
set $prefix "/path/to/schemebbs";
# site root, a static page
location = / {
rewrite ^ /static/index.html;
}
location = /favicon.ico {
rewrite ^ /static/favicon.ico;
}
# static files
location /static/ {
alias $prefix/static/;
}
# serve s-expressions as static files
location /sexp {
alias $prefix/data/sexp/;
autoindex on;
default_type text/x-scheme;
fancyindex on;
fancyindex_time_format "%F %R";
fancyindex_footer "/static/lisp.html";
}
location / {
root $prefix/data/html;
default_type text/html;
index index;
try_files $uri $uri/index @schemebbs;
}
#replace css and add query string to all internal links
#snippet above goes here
location @schemebbs {
proxy_intercept_errors on;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header Accept-Encoding "";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_cache cache;
proxy_cache_key $scheme$host$request_method$request_uri;
proxy_cache_valid 200 30s;
proxy_pass http://http_backend;
}
error_page 400 /400.html;
error_page 403 /403.html;
error_page 404 /404.html;
error_page 405 /405.html;
error_page 500 /500.html;
error_page 502 /502.html;
error_page 503 /503.html;
error_page 504 /504.html;
error_page 429 /429.html;
location ~ ^/(400|403|404|405|429|500|502|503|504)\.html {
root $prefix/static/errors;
}
}#end server
}#end http
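For anyone wondering what the map and limit_req_zone lines amount to, here is a simplified Python sketch. It assumes the zone is actually applied with a limit_req directive and no burst, which the conf as posted does not show, and it uses plain strings where nginx uses $binary_remote_addr. The class and function names are invented for illustration.

```python
def limit_key(method: str, remote_addr: str) -> str:
    """Mirror the map block: only POST requests get a non-empty key."""
    return remote_addr if method == "POST" else ""


class PostLimiter:
    """Very rough model of a rate of 11 requests per minute without burst."""

    def __init__(self, rate_per_minute: float = 11):
        self.min_interval = 60.0 / rate_per_minute  # about 5.45 seconds
        self.last_accepted = {}  # key -> time of the last accepted request

    def allow(self, key: str, now: float) -> bool:
        if key == "":
            # Requests with an empty key are not accounted at all.
            return True
        last = self.last_accepted.get(key)
        if last is not None and now - last < self.min_interval:
            return False
        self.last_accepted[key] = now
        return True


limiter = PostLimiter()
key = limit_key("POST", "203.0.113.7")
print(limiter.allow(key, 0.0), limiter.allow(key, 3.0), limiter.allow(key, 6.0))
```

The point of the map is that readers are never throttled: GET requests map to an empty key, so only posting is counted per address.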
>>140
As you can see, with the try_files directive, Nginx will first search for the file and serve it without calling Scheme, and if it can't find the requested resource, the request is passed to SchemeBBS. Threads, index and thread list are generated on demand and not after posting, once again because of performance considerations. The modest webapp runs on the good old academic MIT Scheme, not Scala or node.js. I didn't want to let the poster wait with a "That was VIP quality/here's a candle flag on the Moon" message trick. Users don't like to wait.
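The lookup order of "try_files $uri $uri/index @schemebbs" can be sketched in Python as follows. This is only an illustration of the order of checks, not how nginx is implemented; the function name and the temporary directory standing in for $prefix/data/html are assumptions.

```python
import tempfile
from pathlib import Path


def resolve(root: Path, uri: str):
    """Return ("static", path) if a cached file exists, else ("backend", uri)."""
    rel = uri.lstrip("/")
    # First $uri itself, then $uri/index, exactly in that order.
    for candidate in (root / rel, root / rel / "index"):
        if candidate.is_file():
            return ("static", candidate)
    # Nothing on disk: hand the request to the named location @schemebbs.
    return ("backend", uri)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "prog").mkdir()
    (root / "prog" / "index").write_text("<html>cached index</html>")
    print(resolve(root, "/prog/")[0])
    print(resolve(root, "/prog/178-180")[0])
```

A board index that has already been generated is served from disk, and anything without a file behind it falls through to Scheme.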
Right now all of the threads on https://textboard.org/prog/list/ are being linked to incorrectly. This seems to be caused by the last "/" in the URL
Indeed, it shouldn't be hard to fix. It's funny nobody noticed earlier, thanks.
>>140,140-144
Finally, another cheap optimization to make up for the fact that SchemeBBS isn't a ``highly scalable concurrent webapp'' and that it is running on a $3.99/month VPS: posting and processing the markup shouldn't block readers, so why not launch two images of SchemeBBS? Some more Nginx ninjutsu to the rescue:
http {
upstream http_backend_GET {
keepalive 20;
server 127.0.0.1:8080;
}
upstream http_backend_POST {
keepalive 20;
server 127.0.0.1:8081;
}
server {
location @schemebbs {
[...]
#backend for POST and GET
proxy_pass http://http_backend_$request_method;
}
}
}
Then you simply run two instances of SchemeBBS, one listening on port 8080 and the other one on port 8081. As an added benefit you get a basic admin interface with a DEFCON kill switch enabling the sysop to turn the board into read-only mode! You just have to kill the POST backend and quickly implement BOPM blacklist, bayesian spam filtering and ban management. I've heard Lisp is good for this kind of thing!
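The routing trick boils down to building the upstream name from the request method. A small Python sketch of the idea, with an invented function name and a flag standing in for "the POST instance has been killed":

```python
# Upstreams from the snippet above: http_backend_GET and http_backend_POST.
BACKENDS = {
    "GET": ("127.0.0.1", 8080),
    "POST": ("127.0.0.1", 8081),
}


def pick_backend(method: str, post_backend_up: bool = True):
    """Return (host, port) for the request, or None if nothing can serve it."""
    if method == "POST" and not post_backend_up:
        # Read-only mode: readers are unaffected, posting has no backend.
        return None
    # Methods other than GET and POST have no upstream defined in the snippet.
    return BACKENDS.get(method)


print(pick_backend("GET"))
print(pick_backend("POST"))
print(pick_backend("POST", post_backend_up=False))
```

Note that the snippet only defines upstreams for GET and POST; what nginx answers for other methods is not covered here.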
>>142
Different Anon. Thanks for the new nginx.conf bits.
I think some variables were missing in the post where I explained the css switching, so here's the part that [...]
This was pointed out to you in >>85 on 2020-02-24, 47 days ago.
$ date "+%Y-%m-%d"
2020-04-11
$ date --date="47 days ago" "+%Y-%m-%d"
2020-02-24
>>142-143
Thanks for the conf. It's pretty different to the very basic one that I was using before, so I'll see if I can get it operational tomorrow.
Are there any running SchemeBBS instances, besides this one?
>>148
I've just been messing around with the codebase and seeing what I can do. Maybe someday I'll make a proper instance though. I haven't seen any links posted to other instances anywhere, so I'm guessing that it's the same for others in this thread.
>>146
I'm truly sorry for slacking like that. Please do insist if something is missing.
Did you manage to patch MIT Scheme?
There's a pre-patched binary compiled for x86_64 at https://textboard.org/static/mit-scheme-9.2 but you should patch it yourself against the official distribution of MIT Scheme. You need to have a working installation of MIT Scheme 9.2 to compile the patched source of MIT Scheme 9.2. It's a bit tedious but without those modifications SchemeBBS won't run. The http libraries in MIT Scheme 9.2 have bugs that I had to fix.
Except that part, the installation of SchemeBBS should be straightforward. I've added a small shell script to create new boards easily. You can have as many as you want. The plan was to let users create an infinite number of boards but I removed the functionality. It's easy to put it back.
Anyway the released custom distribution of MIT Scheme is signed and here's my public key:
-----BEGIN PGP PUBLIC KEY BLOCK-----
xsBNBF6Ca3UBCADdkiFnrbTVxXrpJjk15TsQalzRrWx5t+dBq05Ri43KqKbHZWtq
a93GpWulnuG7fqU/pQaWtGW0dUua4EPuFW5NqfcvFtdaOZtFPm5Vkn6go+rpeJCi
+0ny9WeQxJai+5berNYKcvjVP5FCiiMbfbnNCUBMAQTnpdnJRDRsvHtn+9daVAzA
fLObQM+L0XVzpFI9xnVwZCCZKosRmuWfNRQ1QxFe3fGgD/07BtcsTlqK9YOyhND7
zigSgknfyddlq84FwStmnBeEuVqQmZi4p3NvH9rH3TKf7JtWpGHITJXvfP3LaTW1
C42oQ4nfhUIeNUyCm/sqkgWM2rLnoO3IQh7lABEBAAHNLEJlbiBCaXRkaWRkbGUg
PGJlbi5iaXRkaWRkbGVAcHJvdG9ubWFpbC5jb20+wsCUBBMBCAA+FiEEfsZQ21i8
jErBlcPxSU4XzT6Y9gkFAl6Ca3UCGwMFCQHhM4AFCwkIBwIGFQoJCAsCBBYCAwEC
HgECF4AACgkQSU4XzT6Y9gkzYAf6Ausl8leNwpwkPLUlaxq2pHzjzp0OR9TJUS1e
xsAo/w+yTiGGlPywI40bRn5iGV7gTOzsAbtLp/7sw2la9Emj31h6TeVkFNUvBulI
GppSA3KOHYgbzO//2cvUr3TIBAamdTpuc9LLKa/hqFPRpQBvo7lUlsuSCUaUYPXt
qQL+QvsbRFObm/K/York1TyDVtR/eoOthmIFwXjh52SnX7yuwJSYBH60wqD6A5oF
hyaQqQKamOvqyl4dgSSgLiilZ8nzmY+A9Ybky1jE3ILBnbU6jd7EBko3hm7SC/7T
6YdUjblu97I7ZXlQJYzzYdY5OmDej4zdEwdHsHHsyf4DSnUif87ATQRegmt1AQgA
srkpPnUOG7mYYWG99mEWk3VSWf0dKiVjMNMDS4M+ZsTXN6qZ2WLjIUYPN0yDFt1F
XNBukgVLLKj0kbDrHn6/9CiOUz0n8fYshOoV5E1i+uHDLL0OcxpJ9lLyldG3Isqw
IklCrg3XcZEbPFx3+OkjHtho+rug4w2Uqu2keCw4bmcvjEq0qrMeHSYJEdTbdUsJ
GuQ7hkpdYvCAYbojjHbdpLHX1IDajZmyUocfZnMQtdOm7IouLmaHgGRmgei1e5M/
J0HMD8ch7QmD6v+l2CoThMTrJn5e+egpP62Ia5k9XdXiZArbdNCN/zHWfHOjz+fi
stIQk4+XZiqnQ3b0Brik4QARAQABwsB8BBgBCAAmFiEEfsZQ21i8jErBlcPxSU4X
zT6Y9gkFAl6Ca3UCGwwFCQHhM4AACgkQSU4XzT6Y9gmZLQf/Zbd1UbJiOqWG5DU4
tswP3NZ+jYZhvZD9WHnja82TV27FFmXC8w1US1R4XzDsbz1MvQfgTQkGooYw0VL+
wkJqj5J+HuZFG0Keg51fyNIkvxwtBwACCWrNBEwrING246tJSJUOjQH4puRmNifC
SlYR5ofxId33uEb+Q+7+abj5NjSIayldiqgc2uTyMbJH/DpfyeAfeM+zTmTbjzl2
2GY7K8bRF+/aZtGkIxm3Dgc/MpQWuaFUhbHV16lxANprT60+1bSvDxu/je9w7bNC
2wwR1ygE50fWkv7IG0Ag/uCtwr3UhpjTDQUb1A4uzaetNDaxW1T3wq9Au/q+hnE6
+CVc4A==
=NIoA
-----END PGP PUBLIC KEY BLOCK-----
https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x7ec650db58bc8c4ac195c3f1494e17cd3e98f609
Maybe the best way to release it properly would be with a Docker container? It's too modern and fancy for my taste but if it makes life easier for someone who just wants to try the software, why not?
The server we're running on will be upgraded soon. I have worked on related side projects that you may or may not enjoy, but I want to deploy them.
When I see people trying to host textboards on free hosts like heliohost it boggles my mind. But then I remember that when I was a kid I too didn't have a Visa Card to pay for a dedicated server. I'm considering setting up Unix shells for people who need them in our small community. To host a textboard or a personal web site. Without PHP, but with Common Lisp and Scheme!
Also i2p has to be fixed, it's a shame it's been down for more than a year.
Did you manage to patch MIT Scheme?
The patches already applied cleanly in their previous incarnation, back when you had the asymmetry between http-syntax using /src/lib/runtime but httpio using /src/runtime. This was back when your gitlab repo was of a size that was still cloneable, and these are thankfully still on your github mirror, including the ht{3}p-syntax name. The one thing I found curious was the inclusion of changed lines where the only change seems to be a switch from tabs to spaces. I didn't proceed to compilation because the double work >>58 felt silly.
I'm considering setting up Unix shells for people who need them in our small community. To host a textboard or a personal web site. Without PHP, but with Common Lisp and Scheme!
Are you going to apply CPU quotas or are you going to give away free computing resources?
This was back when your gitlab repo was of a size that was still cloneable,
Don't remind me of that. Or that time I had to relaunch the site and edit files from a smartphone because my computer broke. The gitlab repo has been cleaned a few days after the apocalypse, it's clonable again. The size displayed on gitlab wasn't updated but it's really 234.34 KiB. The commit history is only available on the github mirror though.
$ git clone https://gitlab.com/naughtybits/schemebbs.git
Cloning into 'schemebbs'...
remote: Enumerating objects: 27, done.
remote: Counting objects: 100% (27/27), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 60 (delta 7), reused 16 (delta 7), pack-reused 33
Receiving objects: 100% (60/60), 234.34 KiB | 908.00 KiB/s, done.
Resolving deltas: 100% (7/7), done.
the double work >>58 felt silly.
But self-hosting compilers are not silly!
Are you going to apply CPU quotas or are you going to give away free computing resources?
I have already done that in the past with a much less powerful server at a time when dedicated servers weren't so cheap. As strange as it seems, you can generally trust people. I have hosted a dozen projects, including a webradio and an ircd(*) and CPU usage was never a concern. There were about 50 accounts and I had to terminate only one of them: a moron who still used the server for UDP flooding after I had asked him to stop. Basically yeah, if anyone is interested or needs it, I can afford to share unused computing resources that are already paid for. I'm not going to put an ad on Google but I'd be very happy to provide hosting for Lisp Internet services.
The new server is not a beast but still ludicrously oversized for the 3 posts per day that SchemeBBS gets at peak activity:
AMD Opteron 4122 4c/4t 2,2GHz
16Gb DDR3 1333 MHz
unmetered bandwidth
_______
(*) Peaked at 1000+ users, mostly from /b/, during Operation Payback, 2010, on a Celeron. Those poor fellows were kicked from Rizon and everywhere else and had nowhere else to go. Maybe someone here remembers the place.
The size displayed on gitlab wasn't updated but it's really 234.34 KiB.
Thanks. Cloned successfully. I'll point out that the patch instructions from README.md are outdated. The first cd is now incompatible with -p0, and the redirect into patch has moved on to greener pastures.
and CPU usage was never a concern
accounts
hosting for Lisp Internet services
I'm not into accounts or services, I asked about CPU quotas because I was more interested in whether you would give out some CPU time for performing computations at lowest priority / highest niceness, with the results posted somewhere around here once they're available.
Those poor fellows were kicked from Rizon and everywhere else
What does it take for people to actually manage to get themselves banned from a place like rizon?
What does it take for people to actually manage to get themselves banned from a place like rizon?
Being Anonymous. https://en.wikipedia.org/wiki/Operation_Payback
Members of Operation Payback reportedly used an IRC channel to communicate
They were mainly instructing kids from /b/ how to use LOIC (Apache was still a thing and had no protection against slow requests by default). Of course, I have not indulged myself in their activities, I only monitored the server and kept it running and haven't slept much for a few days.
I'm not into accounts or services, I asked about CPU quotas because I was more interested in whether you would give out some CPU time for performing computations at lowest priority / highest niceness, with the results posted somewhere around here once they're available.
It's a good idea. We'll see, I'm not sure there will be an interest at all for a community Scheme web host even if I like the idea.
I'm not sure there will be an interest at all
at all
You just had someone expressing interest.
Scheme web host
I like the idea
It doesn't have to be an online REPL, merely a way to submit computation requests to a list, after which only those requests that you vet and approve every now and then would get executed.
1. Nowhere does this mention that this is sharing, and "piracy" is a marketing term of the copyright industry, so the entire article reads like Pravda.
2. Even places like the various Discovery channels and Viasat channels have started telling the truth about how all the Guy Fawkes nonsense was simply Robert Cecil false flag operation.
one misdemeanor charge of conspiring to intentionally cause damage to a protected computer
Such grand results.
https://www.reuters.com/article/us-anonymous-cybercrime-plea/anonymous-hackers-plead-guilty-to-minor-charge-in-u-s-for-cyberattacks-idUSKBN0GJ25720140819
Even places like the various Discovery channels and Viasat channels have started telling the truth about how all the Guy Fawkes nonsense was simply Robert Cecil false flag operation.
That became obvious when ``Anonymous'' started supporting Arab Springs and other regime changes. Something was off.
Thanks for the conf. It's pretty different to the very basic one that I was using before, so I'll see if I can get it operational tomorrow.
The basic optimisations shown in >>142-143 are not absolutely necessary for the system to work but they really make the experience smoother.
Right now
https://textboard.org/prog/101
COVID Espionage
has 5 posts inside, but 4 on the frontpage and the thread list. The last two form a double post. If #5 is replaced with the next post, it means you manually removed it from storage without going through SchemeBBS. This kind of editing is a very bad habit to get into and is only for extreme circumstances, for which a mere double post does not qualify. The 'lol' posts that you removed arguably did qualify.
The new post replaced #5 as expected. This means you are removing posts behind the system's back. Down this road lies censorship. While this is your site so you do what you like, I see this as a very bad sign.
>>163
What's #5? What thread and what post are you referring to? Is there a thread missing? I tried to salvage everything.
There are two threads without responses that I posted as tests and that I didn't bother reposting: one about John Conway's death and the other one something silly just to test posting (words from De La Soul's Ring Ring Ring). If someone else posted something in the meantime it might have been devoured.
I had backups of the data, but a discrepancy between the version of the site running and the one in dev corrupted /sexp/prog/list. The version running right now doesn't have httpio patched, it uses a dependency instead.
Sorry didn't read >>162 before >>163
Yes, https://textboard.org/prog/101/4 was a double post. And mine also, posted while I was fixing the ``db''
Do you happen to have a regular backup of the sexps? I thought at one point, hey maybe I could contact this guy with the friendly tone who counts days and hours between the time I'm asked for my own nginx.conf and the moment I say ok fuck potential DoS, I'll post it anyway if someone really needs it. I've read man pages for him after all.
He's a cool guy, he doesn't mind being remembered he's a clumsy and incompetent jobless assclown whose only achievment in life is having read his SICP ten years ago. He'll always bow down and say sorry sorry competent people with jobs and lives, I'm a piece of shit who totally have no anger issues. Furthermore the reasons he comes up with are always fucking hilarious. What is it this time? Oh, he found a stable roof, a working computer and a mobile internet connexion? He's beaten a black dealer who didn't give a shit about confinement in order to turn his building in a more calm environment even though he doesn't really need silence for reading.
This place was perfectly fine before I remembered its existence. I don't know what I'm doing here. Do something, anything, share it and disappear. That's how things should be.
I'm Anon. I've always been Anon. Who gives a shit about anything about me? Everybody has his own problems.
And here's another failure. I've tried so hard to never ever post a mean rebuff, no matter what. You won again.
All of my what.
He mad.
By definition autists are devoid of empathy or simply unable to aknowledge that something other than their selves can exist. But they'll never miss an opportunity to show off how they're able to count how many toothpicks a waitress drops.
https://gitlab.com/naughtybits/schemebbs/-/commit/4ab906f2bf1da88a7244887cb9d77255521cfecf
MIT license, as it should be
Not quite as free now.
I didn't realise how much SchemeBBS relies on nginx for. I wrote a Dockerfile for it and put it behind a Traefik container as reverse proxy and was preparing to dive into its guts to see what's off, then I came across this thread.
>>170
A Dockerfile would be the easiest way to let people try the software without hassle and I'm sorry that you gave up but let's summarize.
The http libraries in MIT Scheme are broken. SchemeBBS provides patches to fix them but there's no way the patches will ever be accepted upstream. MIT Scheme 9.2 is stable (read "unmaintained") and MIT Scheme 10.1 still crashes if you merely type (vector 1 2 3), impeding further development. Meme programming comes at a cost...
SchemeBBS doesn't need Nginx at all. It embeds a minimal web server and will happily run standalone. It'll be kind of slow but still perfectly fine for the traffic a typical textboard gets. You can use Apache, lighttpd or any other web server as a reverse proxy. (do not expose webapp directly, be it Django or SchemeBBS)
(Actually there's one little thing that relies on Nginx: changing the CSS without cookies. It's an unnecessary and unused ``amusing neat feature'' that seems to be totally confusing people and should therefore be removed ASAP from the distribution and forgotten about. That was a huge, horrible mistake. Not only was the world not ready for that, but the burden of customizing the CSS should always be on the user)
A lot of tricks to speed up *any* webapp with Nginx can be used, like caching requests or serving static files without calling the backend. Those are absolute common practices but if a user is not familiar with them for any reason, a sample conf that's in no way a part of the software has been courteously provided.
Some really important lines are never read, even when they're made as concise as possible.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
still tl;dr: You get the code for free, you can do whatever you want with it except changing the copyright. Basically "take it if you like it, you owe me nothing and I owe you nothing".
SchemeBBS doesn't rely on Nginx, it needs to have the CSS switcher nuked, all references to Nginx deleted and a discouraging notice about not trying to use it if you can't be bothered to patch a file, configure a web server or if a language whose compiler is written in the language itself is a "LOL DUDE IT'S SO TOTALLY WEIRD" concept. Maybe a link to Kareha which is mature, feature complete and beginner friendly would be useful too.
Thanks a lot for the explanation. I'll work on it a bit more. I'm well-versed in Scheme but MIT/GNU Scheme is alien to me.
Have you reached out to devs to get your patches through? Riastradh on #scheme is very responsive to suggestions.
Sure enough, the problem wasn't caused by the lack of nginx but a misconfiguration on my part. I'll share the Dockerfile and docker-compose.yaml once I confirm it's working correctly.
>>172
I'm a bit shy and have never told the MIT Scheme community about this software. Maybe they'd be happy to see a real world usage for their language, as modest as it is. I was waiting for 10.1 to be usable, there are many improvements that I need to implement the original mock-up. It was released only a few weeks after SchemeBBS. I announced it in a single post on tinychan and lainchan and never reached out to real Scheme communities. I will try to hang out on IRC. My understanding is that Chris Hanson is doing all the work and I don't want to harass him.
Without the need for the http-syntax patch, the installation would be something really simple:
git clone https://gitlab.com/naughtybits/schemebbs.git
cd schemebbs
./createboard.sh "prog" "lounge" "jp" "woodcrafting" # (1)
./init.sh 8080
(1) It's trivial to turn SchemeBBS into an infinite boards bbs where users can create new boards themselves on a dynamically generated web page. It was designed with that in mind but it had to be removed because the boards were deployed on an unattended $4/mo VPS.
I've also removed the hard dependency on Nginx. Tight coupling is a very bad idea, even for little hacks. It brought a lot of confusion and sadly prevented interested users from just trying to run it. If someone really wants to use the CSS switcher, it's in the commit history and they can use the nginx.conf provided in this thread (and there's nothing in there that Apache cannot do)
I'm not fond of installing software from the Net with "curl install.sh | sudo bash" but I could provide such a script.
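For reference, the board-creation step mostly comes down to making the two per-board directories described at the start of the thread. A hedged Python sketch of that part only: it is not the actual createboard.sh, the function name is invented, and the index files the real script is said to create are left out because their format isn't shown here.

```python
import tempfile
from pathlib import Path


def create_boards(prefix: Path, names):
    """Create data/sexp/<board> and data/html/<board> for each board name."""
    created = []
    for name in names:
        for kind in ("sexp", "html"):
            directory = prefix / "data" / kind / name
            directory.mkdir(parents=True, exist_ok=True)
            created.append(directory)
    return created


# A temporary directory stands in for the SchemeBBS checkout.
with tempfile.TemporaryDirectory() as tmp:
    for path in create_boards(Path(tmp), ["prog", "lounge"]):
        print(path.relative_to(tmp))
```

Running it twice is harmless since existing directories are left alone.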
>>173
That'll be truly great. That's how things seem to be done now.
I got everything down, except since the server only serves `localhost` instead of `0.0.0.0`, I can't get it to talk outside the container. Alternatively, I could embed the reverse proxy inside the container but I'd rather keep them separate in Docker Compose.
By the way, I reported the vector printer bug.
Riastradh | OK, this was a bad backport from master which somehow made it into the release; it's been fixed in the release-10 branch already.
If anyone wants to try the Dockerfile so far:
% docker run -p 80:80 --name sbbs -v /opt/bbs:/opt/schemebbs/data -d erkin/schemebbs
% docker exec -ti sbbs /bin/sh
$ ./create-boards.sh fnord
$ apk add curl
$ curl localhost/fnord
Changing (host-address-loopback) to (host-address-any) in deps/server.scm does the trick. I can now connect it to Traefik within Docker Compose and serve it with TLS and load balancing.
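The difference between the two bind addresses is easy to see outside Scheme too. A small Python illustration, assuming (as the posts above suggest) that the loopback address corresponds to 127.0.0.1 and the "any" address to 0.0.0.0; the function name is made up:

```python
import socket


def bound_address(host: str) -> str:
    """Bind a TCP socket to host on an ephemeral port and report the address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 lets the OS pick a free port
        return s.getsockname()[0]


# Bound to loopback: only reachable from inside the same host or container.
print(bound_address("127.0.0.1"))
# Bound to the wildcard address: reachable on every IPv4 interface.
print(bound_address("0.0.0.0"))
```

Inside a container, a server bound to loopback can't be reached through a published port, which matches the symptom described above.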
>>178
Great!
fnord
I thought for a moment I've leaked my usual nickname. Why fnord? :)
>>179
Good point. The server should listen to 0.0.0.0. Even if it should never face clients directly, there are firewalls.
On FreeBSD patching is painless if you install GNU/MIT Scheme from ports.
Save this file as /usr/ports/lang/mit-scheme/files/patch-runtime_http-syntax.scm
--- runtime/http-syntax.scm.orig 2018-10-21 22:30:12 UTC
+++ runtime/http-syntax.scm
@@ -1310,8 +1310,13 @@ USA.
write-entity-tag)
(define-header "Location"
- (direct-parser parse-absolute-uri)
- absolute-uri?
+ (direct-parser
+ (*parser
+ (alt parse-absolute-uri
+ parse-relative-uri)))
+ (lambda (value)
+ (and (uri? value)
+ (not (uri-fragment value))))
write-uri)
#;
(define-header "Proxy-Authenticate"
Then compile and install as usual:
cd /usr/ports/lang/mit-scheme
make
make install
Here is an example docker-compose.yaml that forces TLS (touch acme.json and then docker-compose up -d):
version: "3.3"
services:
  bbs:
    image: erkin/schemebbs
    container_name: sbbs
    labels:
      - "traefik.enable=true"
      - "traefik.http.services.bbs.loadbalancer.server.port=80"
      - "traefik.http.routers.bbs.rule=Host(`example-bbs.org`)"
      - "traefik.http.routers.bbs.entrypoints=websecure"
      - "traefik.http.routers.bbs.tls=true"
      - "traefik.http.routers.bbs.tls.certresolver=leresolver"
      - "traefik.http.routers.redirs.rule=hostregexp(`{host:.+}`)"
      - "traefik.http.routers.redirs.entrypoints=web"
      - "traefik.http.routers.redirs.middlewares=redirect-to-https"
      - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
    volumes:
      - /opt/bbs:/opt/schemebbs/data
  proxy:
    image: traefik:2.2
    container_name: traefik
    command:
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.leresolver.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory"
      - "--certificatesresolvers.leresolver.acme.email=webmaster@example-bbs.org"
      - "--certificatesresolvers.leresolver.acme.storage=/acme.json"
      - "--certificatesresolvers.leresolver.acme.tlschallenge=true"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./acme.json:/acme.json
I'm going to brush up on my nginx knowledge to replace Traefik with nginx. I picked the former purely because it's trivial to use with Docker and it makes it really easy to automatically generate TLS certificates. But it doesn't yet support caching, which seems rather crucial here.
>>180
Discordian tradition. :-)
>>182
The caching is mostly html pages being pre-generated in data/html/boardname. It's done for entire threads, index and thread list and those pages are better served directly from your web server. Post ranges are unpredictable and a page like http://textboard.org/prog/178-180 is always generated dynamically. That's where Nginx caching is handy.
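The split between pre-generated and dynamic pages can be written down as a small classifier. This Python sketch only encodes the rule as stated in this post; the URL shapes (board index, /board/list, /board/<thread number>) are assumptions taken from links in this thread, and the function name is made up.

```python
def page_kind(path: str) -> str:
    """Classify a request path as "pregenerated" or "dynamic"."""
    parts = [p for p in path.split("/") if p]
    if len(parts) == 1:
        return "pregenerated"  # board index, e.g. /prog/
    if len(parts) == 2 and (parts[1] == "list" or parts[1].isdigit()):
        return "pregenerated"  # thread list or an entire thread
    return "dynamic"           # post ranges and everything else


for url in ("/prog/", "/prog/list", "/prog/39", "/prog/178-180"):
    print(url, page_kind(url))
```

Only the "dynamic" kind ever needs to reach SchemeBBS repeatedly, which is why a short-lived proxy cache in front of it helps.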
Hail Eris! All hail Discordia!
Eris is just a pretentious dumb chao. Inanna is superior.
There were two more posts in this thread that didn't make it. 187 was on 2020-05-22 at 14:52 and said
Since this is the boards's meta-thread: Should filterlists be implemented?
and 188 was on 2020-05-22 at 15:10, VIP, and said
Long live Emperor Norton I!
Has anyone tried the Docker image?
>>187
They don't want people to know about Norton I! Shameful censorship.
>>189
I'm sure it was just the restoration after the raid. Some other threads lost a final post as well.
>>190
I see. I thought SchemeBBS was down but it seems that Tor was entirely blocked.
>>190
Sorry about that, the backups weren't frequent enough. I made something better with inotifywait and version control.
>>191
Hopefully it will be temporary.
Sorry about that, the backups weren't frequent enough.
On the contrary, I thought it was a very good restoration since no posts were lost in the LISP Puzzles thread.
>>192
お疲れさん.
Now, I've been meaning to ask: Which bugs in particular are preventing you from switching to v10?
>>194
The Vector bug was breaking everything and I thought MIT Scheme 10 was not in a usable state at all. It was kind of discouraging. I have tried the latest version in their repo at the time and the bug was still there. I'll try the release branch now.
mit-scheme-release-10.1.10 and "release-10" branch from the Savannah repo:
1 ]=> (vector 1 2)
;Value:
;Unbound variable: nmv-header?
;To continue, call RESTART with an option number:
; (RESTART 2) => Define nmv-header? to a given value.
; (RESTART 1) => Return to read-eval-print level 1.
"master" branch doesn't build, neither does mit-scheme-release-10.1.9.
>>196
Can you file bug reports for these issues?
>>197
Sure. But I'm amazed nobody else noticed that.
Actually there's a PR in Ubuntu's package: https://bugs.launchpad.net/ubuntu/+source/mit-scheme/+bug/1851950
>>196-198
I tried in a VM. Guix to the rescue!
anon@guix ~$ guix package -i mit-scheme
The following package will be installed:
mit-scheme 10.1.3 /gnu/store/gmfb0l8zxxqsrb52vl7vj3ml6gdl36w7-mit-scheme-10.1.3
anon@guix ~$ scheme
MIT/GNU Scheme running under GNU/Linux
Type `^C' (control-C) followed by `H' to obtain information about interrupts.
Copyright (C) 2018 Massachusetts Institute of Technology
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Image saved on Thursday November 22, 2018 at 5:13:45 PM
Release 10.1.3 || Microcode 15.3 || Runtime 15.7 || SF 4.41 || LIAR/x86-64 4.118
1 ]=> (vector 1 2 3)
;Value: #(1 2 3)
1 ]=> (display "hooray")
hooray
;Unspecified return value
I can confirm that mit-scheme-release-10.1.3 isn't affected. That means the bug appeared somewhere between 10.1.3 and 10.1.9. Lesson learned: do not use distros that package the latest versions of software.
Here are my findings.
Quite logically you need a non-buggy mit-scheme to build mit-scheme 10 from the git repo. The latest binary release (10.1.10) downloadable from the main site https://www.gnu.org/software/mit-scheme/ is affected by the bug. 10.1.9 isn't. It's available here: https://ftp.gnu.org/gnu/mit-scheme/stable.pkg/10.1.9/mit-scheme-10.1.9-x86-64.tar.gz
Once you have a clean binary, the branch `release-10' does build and is free from the vector bug.
git clone git://git.savannah.gnu.org/mit-scheme.git
cd mit-scheme
git checkout release-10
cd src
./Setup.sh && ./configure && make
make install
The master branch doesn't build, but it's being actively worked on (the last commit was 43 hours ago), so nothing unusual here.
gcc -DHAVE_CONFIG_H -DMIT_SCHEME -DDEFAULT_LIBRARY_PATH=\"/usr/local/lib/mit-scheme-x86-64-10.90\" -I. -I. -O3 -frounding-math -fno-builtin-floor -Wall -Wclobbered -Wempty-body -Wignored-qualifiers -Wimplicit-fallthrough -Wmissing-field-initializers -Wmissing-parameter-type -Wnested-externs -Wold-style-declaration -Woverride-init -Wpointer-arith -Wredundant-decls -Wshift-negative-value -Wtype-limits -Wundef -Wuninitialized -Wwrite-strings -Wno-error=stringop-truncation -Werror -o syntax.o -c syntax.c
syntax.c: In function ‘Prim_scan_sexps_forward’:
syntax.c:1074:15: error: ‘level_start.last’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
1074 | (((level -> last) == NULL)
| ^~
syntax.c:1036:33: error: ‘level_start.last’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
1036 | (level -> previous) = (level -> last);
| ~~~~~~~^~~~~~~~
cc1: all warnings being treated as errors
make[2]: *** [Makefile:182: syntax.o] Error 1
make[2]: Leaving directory '/home/anon/mit-scheme/src/microcode'
make[1]: *** [Makefile:796: microcode/scheme] Error 2
make[1]: Leaving directory '/home/anon/mit-scheme/src'
make: *** [Makefile:699: all] Error 2
The fact that the release available from the MIT/GNU Scheme site comes with such a bug and has stayed like that for 10 months says something about the number of users (plus all the Linux distros that package this version). That's what should really be fixed. Maybe with a new binary build from the `release-10' branch?
>>200
Unfortunately, 90% of MIT/GNU Scheme's users are MIT professors, CS students and MIT/GNU Scheme developers. Most bug reports are filed by the former group.
Also, I'm having a bit of trouble getting SchemeBBS to work. HTML files don't seem to get successfully generated from sexp files.
schemebbs % ls data
ls: cannot access 'data': No such file or directory
schemebbs % ./create-boards.sh foo
schemebbs % tree data
data
├── html/
│   └── foo/
│       └── index
└── sexp/
    └── foo/
        └── index
schemebbs % curl localhost:8080/foo
<!DOCTYPE HTML PUBLIC "ISO/IEC 15445:2000//DTD HyperText Markup Language//EN">
[truncated]
<P class="footer">bbs.scm + <A href="https://www.gnu.org/software/mit-scheme/">MIT Scheme</A> + <A href="https://mitpress.mit.edu/sites/default/files/sicp/index.html">SICP</A> + Satori Mode</P></BODY></HTML>
schemebbs % curl -X POST localhost:8080/foo/post -d 'titulus=test&epistula=test'
curl: (52) Empty reply from server
schemebbs % tree data
data
├── html/
│   └── foo/
└── sexp/
    └── foo/
        ├── 1
        ├── index
        └── list
schemebbs % curl localhost:8080/foo
curl: (52) Empty reply from server
The last curl call causes SchemeBBS to log the following a bunch of times:
-> evaluating handler: #[compound-procedure 17]
Error code 0x4 (system-call).
Procedure was: [PRIMITIVE NEW-FILE-OPEN-INPUT-CHANNEL]
# of arguments: 2
Return code: internal-apply
>>201
You need to patch the file http-syntax.scm (and recompile mit-scheme). To quote the repo:
The file runtime/http-syntax.scm follows the RFC 2616 which requires that the value of the Location header be an absolute URI. The standard has been replaced (see RFC 7231 section 7.1.2.) and a relative URI is now allowed.
Recoding the redirection after posting would mean SchemeBBS not being domain agnostic. I know patching makes the install cumbersome, I'll try to get both patches incorporated in MIT Scheme 10, http-syntax and httpio (now http-io) because those two libraries have bugs and haven't changed (I just checked).
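For anyone wondering why a relative Location keeps things domain agnostic: the client resolves it against the URL it just requested, so the server never needs to know its own hostname. A small Python sketch of that resolution follows. The /foo/post and /foo/1 paths are only assumed for illustration; I haven't checked what SchemeBBS actually sends.

```python
from urllib.parse import urljoin


def follow_location(request_url: str, location: str) -> str:
    """Resolve a Location header the way a client does under RFC 7231 7.1.2."""
    # A relative reference is resolved against the request URL;
    # an absolute URI is returned unchanged.
    return urljoin(request_url, location)


# The same relative redirect works behind any host name or port.
print(follow_location("http://example-bbs.org/foo/post", "/foo/1"))
print(follow_location("http://localhost:8080/foo/post", "/foo/1"))
```

Under RFC 2616 the server would have had to build the absolute URI itself, which is where a hardcoded domain would creep in.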
If you don't want to recompile MIT Scheme, there's a pre-compiled binary for x86_64 here: https://textboard.org/static/mit-scheme-9.2/
>>201-202
Assuming you have already installed MIT Scheme 9.2
curl -O http://ftp.gnu.org/gnu/mit-scheme/stable.pkg/9.2/mit-scheme-9.2.tar.gz # <- you need to fetch the MIT Scheme source, not the Unix binaries
curl -O https://gitlab.com/naughtybits/schemebbs/-/raw/master/mit-scheme-9.2_patches/patch-runtime_http-syntax.scm
tar xzvf mit-scheme-9.2.tar.gz
patch -p0 < patch-runtime_http-syntax.scm
cd mit-scheme-9.2/src
./configure
make
sudo make install
>>202-203
I see now! My mistake was fetching the binary tarball instead of the sources. Thanks for the heads up.
I'm afraid the problem persists even after properly rebuilding from patched source. Now instead of an empty response, a successful POST returns "That was SICP quality!", but index and 1 are still empty (list works). SchemeBBS no longer logs errors, however.
>>205
HTML is not generated after a post. It's generated when a first client requests it.
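In other words the HTML cache is lazy. The tree output in >>201 suggests a post removes the cached page and the next GET is supposed to rebuild it from the sexp file. Here is a toy Python model of that behaviour. It is a sketch under that assumption, not the actual bbs.scm logic, and render is a hypothetical stand-in for the real Scheme renderer.

```python
from pathlib import Path


def render(sexp_text: str) -> str:
    # Hypothetical stand-in for the real Scheme renderer.
    return "<html><body><pre>" + sexp_text + "</pre></body></html>"


def post(data: Path, board: str, thread: str, message: str) -> None:
    """Append a message to the thread source and drop its cached page."""
    source = data / "sexp" / board / thread
    source.parent.mkdir(parents=True, exist_ok=True)
    with source.open("a", encoding="utf-8") as f:
        f.write(message + "\n")
    cached = data / "html" / board / thread
    if cached.exists():
        cached.unlink()  # invalidate, do not regenerate yet


def get(data: Path, board: str, thread: str) -> str:
    """Serve the cached page, generating it on the first request."""
    cached = data / "html" / board / thread
    if cached.is_file():
        return cached.read_text(encoding="utf-8")
    source = data / "sexp" / board / thread
    html = render(source.read_text(encoding="utf-8"))
    cached.parent.mkdir(parents=True, exist_ok=True)
    cached.write_text(html, encoding="utf-8")
    return html
```

So an empty data/html/foo/ right after a post is expected. It only becomes a bug if it stays empty after the first GET.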
>>206
Yes but there's the problem: requesting it does nothing.
>>207
Please checkout the latest commit from https://gitlab.com/naughtybits/schemebbs
I've just tested it and everything is fine. It was strange because I couldn't get it to work either and I installed SchemeBBS without a problem a few days ago.
There were two issues (depending on which version you were using).
At one point httpio.scm was intended to be patched just like http-syntax.scm, but this change was reversed. You need the file deps/httpio.scm and the line that loads the correct file. It fixes a bug in the dist httpio.scm where GET requests can't have an empty body (?). Be careful to load it before deps/server.scm:
(load "deps/httpio")
(load "deps/server")
(or you can use the patched binaries provided in >>202, which don't need httpio.scm as a dependency)
The second error baffled me. There's a stub for an antispam system that was never implemented because I had problems with mit-scheme cryptographic functions. It's just a dummy function that loads the file hash, but if hash isn't there, SchemeBBS refuses to serve requests. This one was hard to guess and I'm sorry for making you lose time.
I've run this:
git clone https://gitlab.com/naughtybits/schemebbs
cd schemebbs
./create-boards.sh foo
./init.sh 8090
You can check that it's running at http://textboard.org:8090/foo
incidentally the CSS switcher is back but just forget about it if you don't want to mess with nginx conf, I'll remove it later gtg
About logging, the switches are buried in deps/server.scm, line 206:
(define INTERNAL-DEBUG-ERRORS #t)
(define ENABLE-LOGGING #t)
Did you delete the patch file from the repo?
I'm happy to announce that it works now. I updated the Docker image and tested it. It works flawlessly.
>>209
Do you mind if I use your example nginx.conf (which doesn't seem to be licensed) in a new Dockerfile that extends the previous one (a common practice)?
>>212
Not on purpose. I synced with the running version of textboard.org just to be sure that everything works well for others, the same it does for me.
>>213
Excellent!
>>214
Do you mind if I use your example nginx.conf (which doesn't seem to be licensed) in a new Dockerfile that extends the previous one (a common practice)?
No, not at all. I've shared a big chunk here and you can use it. It doesn't have to be licensed, it's just a sample conf. If you add ngx_http_substitutions_filter_module to Nginx then anyone can easily run the full featured SchemeBBS. In the future, the CSS switcher should be rewritten in Scheme code anyway. And thank you for this Docker image, it will make everything smoother for anyone who just wants to try it.
I have some good news too. I got MIT Scheme 10.1 running and SchemeBBS compiles and runs with very minimal changes. The embedded web server is broken though. I have the feeling it has something to do with full Unicode support, a feature I was longing for. I made new patches and I'll try to have them integrated in future releases of MIT/GNU Scheme (those libs are really faulty). I still have empty responses from the server at the moment.
>>214
The patch is back. There's a recent improvement that you could add too. The data dir is now under version control with two little scripts in ~/bin
cat ~/bin/onchange
#!/bin/sh
#
# Watch current directory (recursively) for file changes, and execute
# a command when a file or directory is created, modified or deleted.
#
# Written by: Senko Rasic <senko.rasic@dobarkod.hr>
#
# Requires Linux, bash and inotifywait (from inotify-tools package).
#
# To avoid executing the command multiple times when a sequence of
# events happen, the script waits one second after the change - if
# more changes happen, the timeout is extended by a second again.
#
# Installation:
# chmod a+rx onchange.sh
# sudo cp onchange.sh /usr/local/bin
#
# Example use - rsync local changes to the remote server:
#
# onchange.sh rsync -avt . host:/remote/dir
#
# Released to Public Domain. Use it as you like.
#
EVENTS="CREATE,CLOSE_WRITE,DELETE,MODIFY,MOVED_FROM,MOVED_TO"
if [ -z "$1" ]; then
    echo "Usage: $0 cmd ..."
    exit -1;
fi
inotifywait -e "$EVENTS" -m --exclude '/\.' -r --format '%:e %f' . | (
    WAITING="";
    while true; do
        LINE="";
        read -t 1 LINE;
        if test -z "$LINE"; then
            if test ! -z "$WAITING"; then
                echo "CHANGE";
                WAITING="";
            fi;
        else
            WAITING=1;
        fi;
    done) | (
    while true; do
        read TMP;
        echo $@
        $@
    done
)
This one is obviously not mine, I have just added --exclude '/\.' to avoid getting the .git dir itself tracked.
cat ~/bin/gitcommit
#!/bin/sh
git add --all
git commit -a -m "new post"
Then if you run onchange gitcommit in the data dir (or only in the sexp dir) you can reverse spamming attacks. Do not believe the comments in onchange, the script runs perfectly fine on FreeBSD and tcsh. You just need to install inotify-tools and add ~/bin to the PATH.
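The interesting bit of onchange is the debounce: a burst of inotify events becomes a single run of the command, about one second after the burst goes quiet. Here is a small Python model of just that timing rule. The function name and the list-of-timestamps interface are invented for the example, and the real script's timing is only approximate.

```python
def debounce(event_times, quiet=1.0):
    """Return the times at which the command would run.

    Events closer together than `quiet` seconds belong to one burst;
    the command runs once, `quiet` seconds after the last event of a burst.
    """
    fires = []
    last = None
    for t in sorted(event_times):
        if last is not None and t - last >= quiet:
            # The previous burst went quiet before this event arrived.
            fires.append(last + quiet)
        last = t
    if last is not None:
        fires.append(last + quiet)
    return fires


# Three writes for one post, then two more a few seconds later:
print(debounce([0, 0.25, 0.5, 3.0, 3.25]))  # [1.5, 4.25]
```

That is why one post (which touches several files) ends up as one "new post" commit rather than three.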
The onchange script can be used for many other things. In the first months after I launched textboard.org I was totally offline and it was sending me an SMS on new messages. It was quickly shut down because I had no opportunity to sit at a computer at the time and it was just depressing.
Also, I've installed Fossil SCM so we can have a repo for all the contributed code like sbbs.el, userscripts, the killer Hofstadter sequence generator or your Docker file.
I haven't given up on the idea to provide web hosting for Scheme web apps. The server, as cheap as it is, is still totally oversized for the traffic of a textboard. I eventually gave up on Proxmox after fiddling a couple of weeks with it and went back to FreeBSD (I really wanted Guix shells). I'm having trouble running Guix as a guest on Bhyve. It might be a simple jail with a couple of good Scheme implementations (Guile, Gauche, MIT/GNU Scheme, scsh, Chez, Gerbil, ...) There's already an ircd running (in a jail) but I am really not sure it's a good idea so I'm keeping that for later. I like anonymous posting better than pseudonymous circlejerking.
I'd like to keep the old VPS because it had multiple IP addresses, unlike the new discount dedicated server. I still haven't figured out how to route the traffic from one of the VPS IP to a jail in the dedicated server through OpenVPN.
I2P is back and the domain still resolves but it has been offline for too long to appear in the lists of Eepsites and I can't guarantee it'll stay up if it is abused
This is all great news!
A small complaint: <a href="http://textboard.org">?</a> in the navbar would probably be better off being something like <a href="/">return</a>. It might be nice to include a default barebones /static/index.html too.
Besides that, I might go down the hole of building nginx manually in Docker instead of using the stock image so that I can build it with ngx_fancyindex_module and ngx_http_substitutions_filter_module. (Are there other modules you're using?) Maybe also shove certbot in there for TLS while at it.
<a href="http://textboard.org">?</a> in the navbar would probably be better off being something like <a href="/">return</a>
Indeed I don't remember why the domain was hardcoded. It's awful bloat. I prefer the discreet `?' It's like an ``about'' page. The index has to be rewritten anyway, but it's only intended to give some info about the software. I have to add the board navigation at the top of the pages, something that will list all directories in /data/sexp/, it's nothing too complicated.
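For what it's worth, the board list really is small. Here is a sketch in Python rather than Scheme, just to show the idea of treating every directory under data/sexp/ as a board and turning the names into links. The function names and the link markup are my own guesses, not what SchemeBBS does.

```python
from html import escape
from pathlib import Path


def list_boards(sexp_root: Path):
    """Every directory directly under the sexp root is treated as a board."""
    return sorted(p.name for p in sexp_root.iterdir() if p.is_dir())


def nav_html(boards) -> str:
    """Render the board names as a row of links for the top of a page."""
    return " ".join(
        '<a href="/{0}/">{0}</a>'.format(escape(name)) for name in boards
    )
```

Calling nav_html(list_boards(Path("data/sexp"))) once per page generation would be enough, since boards only appear when create-boards.sh is run.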
Besides that, I might go down the hole of building nginx manually in Docker instead of using the stock image so that I can build it with ngx_fancyindex_module and ngx_http_substitutions_filter_module. (Are there other modules you're using?)
They're the only two modules used and they are mirrored in my repo: https://gitlab.com/naughtybits/ngx-fancyindex
There's a slight change in ngx-fancyindex, I've redesigned the icons with cute lambdas. You can use them if you like them. ngx_substitutions_filter is vanilla.
Maybe also shove certbot in there for TLS while at it.
Nowadays it's a must have.
Fossil is up and running (if DNS records have propagated) http://fossil.textboard.org There's only zge's sbbs.el at the moment (and no TLS) but I'll add your Docker image there if you allow me to.
>>217
Where's your Fossil, Bitdiddle?
>>220
>>219
(DNS records might not have fully propagated yet)
I see, thanks.
I'll give you individual accesses if you provide me with a public PGP key.
Last Modified
52.3 minutes
Decimal point and tenths on the minutes. That's not something I see often.
>>219
Sure thing, the repository is here: https://github.com/TeamWau/docker-schemebbs and the Docker Hub page is here: https://hub.docker.com/r/erkin/schemebbs (the description is scrambled because it automatically syncs with the README file, which is in org-mode syntax). Don't directly use the Dockerfile itself because it takes a long time to build. Just do docker pull erkin/schemebbs. I'm going to start tagging it with versions once SchemeBBS itself is versioned.
To use it:
% export SBBS_DATADIR=/opt/bbs
% docker run -p 80:8080 --name sbbs -v "${SBBS_DATADIR}":/opt/schemebbs/data -d erkin/schemebbs
% ./create-boards.sh prog art knitting
I slightly modified ./create-boards.sh to take an external environment variable. You can find this version in the repo.
I'm currently working on the nginx version that extends this one (so that you don't need to bootstrap MIT/GNU Scheme each time you want to build the image). In addition to the data volume, you need to mount the nginx.conf file as well. (I'm bundling the example one you provided here with slight alterations.)
>>225
A knitting board would be awesome!
After spending a couple of hours, I couldn't figure out a clean and elegant way to incorporate an auto-renewing certbot script into the Dockerfile without dedicating 90% of the code to it, so I'm just going to be lazy and let the user deal with it themself. It's only a matter of editing nginx.conf, mounting the certificates as a volume and exposing :443 anyway.
Anyone want to give it a try? Here's the repo: https://github.com/TeamWau/docker-schemebbs-nginx and here's the Hub page: https://hub.docker.com/repository/docker/erkin/schemebbs-nginx
% export SBBS_DATADIR=/opt/bbs
% docker run -p 80:80 --name sbbs -d \
-v "${SBBS_DATADIR}":/opt/schemebbs/data \
-v "$(pwd)"/nginx.conf:/opt/nginx/conf/nginx.conf \
erkin/schemebbs-nginx
% ./create-boards.sh cats travel food
>>227
Glad to see other knitting enthusiasts around.
>>228
The users should do that themselves because you don't know what domain they will use. Although installing and auto renewing all certificates is easy.
certbot --nginx -d fossil.textboard.org
crontab -e
0 0,12 * * * /usr/local/bin/certbot renew
https://fossil.textboard.org/docker-schemebbs/home (not auto-synced yet)
(CSS is blocked by Content-Security-Policy here, but that doesn't really matter)
I'll try the Docker file after the break!
>>229
Well, the idea is letting the user pass an environment variable into the container at startup that determines the user's domain, the Acme domain to use etc. But eh.
>>229
Oh, I forgot that certbot comes with an nginx option.
CSS is blocked by Content-Security-Policy here, but that doesn't really matter
It worked in the brief http window before you put the 301 to https in place. The link rel="stylesheet" is fine, the problem is the hardcoded http in base href. Changing it to https or removing base allows the css to pass the sameorigin check.
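To spell out the base href problem: a relative stylesheet link is resolved against the base element when there is one, not against the page URL, so an https page with an http base ends up asking for an http stylesheet from a different origin. A Python sketch of the resolution follows. The base path and the style.css name are hypothetical, since I don't have Fossil's actual markup at hand.

```python
from urllib.parse import urljoin, urlsplit


def stylesheet_url(page_url: str, base_href, link_href: str) -> str:
    """Resolve a <link href> the way a browser does, honouring <base href>."""
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, link_href)


def same_origin(a: str, b: str) -> bool:
    """Same scheme, host and port (no default-port normalisation here)."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.hostname, pa.port) == (pb.scheme, pb.hostname, pb.port)


page = "https://fossil.textboard.org/docker-schemebbs/home"
bad = stylesheet_url(page, "http://fossil.textboard.org/docker-schemebbs/", "style.css")
good = stylesheet_url(page, None, "style.css")
print(bad, same_origin(page, bad))
print(good, same_origin(page, good))
```

The first URL comes out as http and fails the check, the second stays on https and passes, which matches both fixes suggested above (switch the base to https or drop it).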
>>217
Is there a reason the fossil repo has no CSS, or is that just temporary?
>>233
CSS is for hipsters. The fossil repository has been tested without problems in the following approved web clients:
- lynx
- w3m
- eww
- links
- edbrowse
If you're not using sbbs.el for some strange reason, you should access /prog/ with this url only https://textboard.org/prog?css=no
SchemeBBS intends to be compatible with HTML 2.0 specifications (while still validating HTML 5). Nobody thinks of Mosaic users anymore and that's not being inclusive. Try for yourself: http://www.dejavu.org/1993win.htm You'll get the same site as in your Chrome for Android.
>>232
Thanks a lot. The problem was indeed in the base url. That part is hardcoded and not skinnable but thankfully there's a baseurl option to launch the fossil web server. It should be fixed now.
Random 502s on Fossil from nginx, mixed with 404s from bbs.scm. And http/https switches.
It should be fixed now.
>>235
Not my bugs. Fossil web interface doesn't work through reverse proxy with forced https (I'm not letting people log over insecure http). You can check that for yourself if you're interested. Thanks for watching so carefully and please do check that it's really fixed now with CSS, logins and https redirection.
Do you know how hard it is to try and configure something when a female creature wants to cuddle? (I hate that) The hardest part wasn't about the fossil web server, it was about throwing her out of my house. That is also fixed now.
I'm not sharing the fix, hey.
>>236
When it chooses to stay on https the css does work, with https on the base href. But just to make sure I get this, your position is that as the web admin you choose a particular version control system whose "web interface doesn't work through reverse proxy with forced https", then you also choose to run it "with forced https", but these are not your bugs? OK.
a female creature wants to cuddle
Just tell any female you added a weeb board, problem solved.
blabla
Yeah right. I should also have written SchemeBBS in PHP. Within a framework! Python! Ruby! Node.js! React! All of them together! Where's Maven? And Travis? Where's the Discord button? Github webhooks for Discord and Cloud services! That's the spirit.
Just tell any female you added a weeb board, problem solved.
It's exactly what I told her. I also said I had to do things fast or someone will notice and whine. I'm getting used to it. So before sending the sbbs guy a mail, I wanted to check that he could log in and edit the wiki. Surprise: the baseurl option won't accept a URL without a trailing slash and this slash is added everywhere and you can't stay logged in. In the end I was pissed and told her to fuck off.
Now the real question is: what can I do to look more nerdy? I don't look at all like that guy who just likes to sit alone all day at the computer. People don't believe me. I already thought it would be a good idea to wear fake glasses. Maybe I'd look smarter? Nerdier. The problem is my sight is perfect and a nearsighted Polish friend once told me that real nearsighted people would notice I'm wearing flat lenses. I'd be outed as a poser. What can I do?
On a side note, I have no idea of what will happen if a thread reach its post limit. I never expected it to happen and there's a real possibility that the ``thread has peacefully ended'' isn't even implemented.
And now, if you'll excuse me, I want to check that wonderful Docker file.
And here's the fix anyway: forget about reverse proxying, use SCGI, and tell her to fuck all the way off.
In nginx.conf
server {
    server_name fossil.textboard.org;
    access_log /var/log/nginx/fossil.textboard.org.access.log main;
    error_log /var/log/nginx/fossil.textboard.error.log warn;
    location / {
        include scgi_params;
        scgi_pass 127.0.0.1:8080;
        scgi_param HTTPS "on";
        scgi_param SCRIPT_NAME "";
    }
    listen 443 ssl; # managed by Certbot
    ssl_certificate /usr/local/etc/letsencrypt/live/fossil.textboard.org/fullchain.pem; # managed by Certbot
    ssl_certificate_key /usr/local/etc/letsencrypt/live/fossil.textboard.org/privkey.pem; # managed by Certbot
    include /usr/local/etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /usr/local/etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}
server {
    if ($host = fossil.textboard.org) {
        return 301 https://$host$request_uri;
    } # managed by Certbot
    listen 80;
    server_name fossil.textboard.org;
    return 404; # managed by Certbot
}
Then launch the fossil web server like this:
fossil server --https --scgi --repolist --port 8080 /path/to/your/directory/of/fossil/repositories &
Are you cloning a git repo?
git clone wholesomegitproject
cd wholesomegitproject
git fast-export --all | fossil import --git ../dull-and-boring.fossil
Yeah right. I should also have [...] That's the spirit.
I expect more composure from you, Bitdiddle, than resorting to a textbook strawman when you have no answer.
I have no idea of what will happen if a thread reach its post limit.
In bbs.scm:post-message you reply with a 200 and "max posts".
(cond ((> post-number *max-posts*)
`(200 () "max posts")) ;; TODO
>>239
Great! Let's hope for no more errors.
>>240
I had a feeling there would be a TODO there.
I also said I had to do things fast or someone will notice and whine. I'm getting used to it.
I appreciate you admin! you might want to try setting up some boundaries for your sanity though. something like only responding to email one day a week, or establishing an expectation that severe bugs will be resolved in two or three days at most, etc. could be really helpful in reducing obligations. sincerely, anon.
>>236,238
Calm your horses, Benny boy. You're better than that. I hope this doesn't come across as patronising but I recommend reducing your time spent dealing with this stuff. None of this is urgent, nor is it more important than your mental health. Sysadmins notoriously have untreated anxiety problems.
>>244
Don't worry.
I admit, I was pissed off that one time, a few weeks ago, but not now. There were other problems around me back then...
I'm just overplaying my own (presumed) character for laughs and giggles. I may have untreatable anxiety problems and a weird deadpan sense of humour but that's irrelevant. I actually feel much better when I'm somewhat productive.
I wrote a lengthy blog post in reply to >>242 that I found hilarious, but I eventually refrained from posting it. Let's leave it at that, blogging is lame.
>>225
The Docker file works flawlessly, I've tested it on a VM and accessed it from the host OS with a dummy domain. Thank you for your invaluable contribution.
So OP and everybody else who wanted to run SchemeBBS without hassle: You've been spoon-fed by Anon.
To sum it up:
On Ubuntu 20.04 LTS
#### install docker
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
#### run the SchemeBBS container
git clone https://github.com/TeamWau/docker-schemebbs.git
cd docker-schemebbs/
cat README.org
sudo su
export SBBS_DATADIR=/opt/bbs
docker run -p 80:8080 --name sbbs -v "${SBBS_DATADIR}":/opt/schemebbs/data -d erkin/schemebbs
./create-boards.sh gardening brazilianfartporn shitSBBSsays
firefox http://localhost/gardening
If it's still too long to copy and paste, I can put those 10 lines in a single file and you'll only need to curl schemebbs-ezmode.sh | sudo sh just like when you're installing something serious and professional like Rust¹. That's the modern and inclusive approach to get software now.²
__________________________________
[1] see: https://doc.rust-lang.org/1.0.0/book/installing-rust.html
[2] I'll even try to resist including a rm -rf --no-preserve-root / somewhere in the middle of the file
>>245
I'm glad you like it. You don't even need to clone the repo, by the way (unless you want to modify the Dockerfile and rebuild the image). An image automatically gets built on Docker Hub with every commit. You only need to do docker pull erkin/schemebbs to obtain it.
You can use Docker Compose to tie it up with some server software. There's a Traefik example above. I can post a Varnish example I tried a while ago as well. In addition, there's a Docker image that embeds nginx (built with the two modules) and the example nginx.conf you provided, held together with supervisord: https://github.com/TeamWau/docker-schemebbs-nginx (docker pull erkin/schemebbs-nginx)
Oh silly me, you need to clone the repo to get the modified create-boards.sh. Similarly, you need to clone the latter repo for the nginx.conf.
I'm just overplaying my own (presumed) character for laughs and giggles.
wew, okay, well that's great to hear.
This is just perfect.
>>219
The frontpage still points to gitlab, is that intended?
>>143 thanks for posting this!
>>244 it fucking works!!! domo arigato anon!
>>251
You should not use the Docker file in >>244 though, except for testing purposes. Use erkin/schemebbs-nginx for deployment, it bundles Nginx. See >>245.
fossil.textboard.org -> sbbs.fossil -> Ticket List -> Code_Defect Fix encoding issues ee2e075a98 -> fossil.textboard.org/sbbs/honeypot -> Please enable javascript or log in to see this content
Is this some kind of joke? It's a bad one.
>>253
The ticket system should require javascript, according to https://fossil-scm.org/home/doc/trunk/www/javascript.md
Perhaps you need to login anonymously?
>>252 thanks i got it working but i get 502 error every time i post, i'll have to check that out
>>253
Read this: https://fossil-scm.org/home/doc/trunk/www/javascript.md
If it's convincing, whitelist fossil.textboard.org in IceCat, it works perfectly with LibreJS.
If you really cannot use Javascript, then login anonymously, it's merely an ascii captcha to type.
Finally, non anonymously-logged users belong to the group `nobody', they can read tickets only if they are provided with the URL. There's a switch that the repo's admin can toggle to allow them to get access to the URLs by themselves:
h Hyperlinks Show hyperlinks to detailed repository history
h permission has been granted to nobody, so you don't need to log in or enable JS now, but I'll let the sbbs.el repo's admin configure which rights are given to whom.
>>254 >>256
Thanks, but I'm not into javascript, logins, accounts or verification. It is ridiculous on the Fossil devs' part to require js or login for any read-only access to non-sensitive information.
The ticket system should require javascript, according to [...]
The page does not include the word 'ticket'.
If it's convincing
Since the ticketing system and its lack of functional fallback do not appear to be mentioned in "Places Where Fossil’s Web UI Uses JavaScript", and the Philosophy section boils down to "we justify not having functional fallback for all cases with herd mentality", it is quite far from convincing.
Thanks, but I'm not into javascript, logins, accounts or verification.
Maybe the part about... Ok let's not feed.
I've just tried Github and Gitlab without Javascript for a quick laugh. You're probably used to having the whole web broken anyway, I wonder why you click on http links.
Can you see the tickets now?
https://www.fossil-scm.org/fossil/tktview?name=d13e296bd5
I've just tried Github and Gitlab without Javascript for a quick laugh. [...] Can you see the tickets now?
Github is actually one of the only ones of these larger “vcs platforms” that renders decently for me. The ticket system now works as described in eww. Of note is that the tickets are also viewable by cloning the repo and using the `fossil ticket' command: https://fossil-scm.org/home/help?cmd=ticket if I understand correctly (I haven't yet installed fossil).
To be clear >>258 is not the same anon as >>260 so they might still have issues, but it would be on their end if they exist.
Can you see the tickets now?
https://www.fossil-scm.org/fossil/tktview?name=d13e296bd5
You mean after access has been enabled? Sure. Also:
because the internet is a hostile place and spiders would overrun our bandwidth quota if we did.
Thanks for the laughs.
Now that tickets are accessible and I can actually read the issue, here's what happens. In https://bbs.jp.net/sexp/prog/39 the text of >>194 starts with "お疲れさん.", whatever that is, sent as the bytes:
0002ca50 6f 6e 74 65 6e 74 20 28 70 20 28 61 20 28 40 20 |ontent (p (a (@ |
0002ca60 28 68 72 65 66 20 22 2f 70 72 6f 67 2f 33 39 2f |(href "/prog/39/|
0002ca70 31 39 32 22 29 29 20 22 3e 3e 31 39 32 22 29 20 |192")) ">>192") |
0002ca80 28 62 72 29 20 22 e3 5c 32 30 31 5c 32 31 32 e7 |(br) ".\201\212.|
0002ca90 5c 32 32 36 b2 e3 5c 32 30 32 5c 32 31 34 e3 5c |\226..\202\214.\|
0002caa0 32 30 31 5c 32 32 35 e3 5c 32 30 32 5c 32 32 33 |201\225.\202\223|
0002cab0 2e 22 20 28 62 72 29 20 22 4e 6f 77 2c 20 49 27 |." (br) "Now, I'|
0002cac0 76 65 20 62 65 65 6e 20 6d 65 61 6e 69 6e 67 20 |ve been meaning |
The relevant bytes are:
>>> s = "e3 5c 32 30 31 5c 32 31 32 e7 5c 32 32 36 b2 e3 5c 32 30 32 5c 32 31 34 e3 5c 32 30 31 5c 32 32 35 e3 5c 32 30 32 5c 32 32 33 2e"
>>> b = bytes (int (t, base = 16) for t in s.split ())
>>> b
b'\xe3\\201\\212\xe7\\226\xb2\xe3\\202\\214\xe3\\201\\225\xe3\\202\\223.'
The original string in utf8 is:
>>> "お疲れさん.".encode ("utf8")
b'\xe3\x81\x8a\xe7\x96\xb2\xe3\x82\x8c\xe3\x81\x95\xe3\x82\x93.'
so it is obvious that we have high bytes followed by backslashed octal escapes. In the bytes of >>64 a textual backslash can be seen to be doubled.
0000deb0 6e 20 20 28 6c 65 74 2a 20 28 28 72 31 20 28 73 |n (let* ((r1 (s|
0000dec0 74 72 69 6e 67 2d 73 70 6c 69 74 20 72 61 6e 67 |tring-split rang|
0000ded0 65 20 23 5c 5c 2c 29 29 5c 6e 20 20 20 20 20 20 |e #\\,))\n |
So we just need to process the octals before the utf8 decoding:
>>> f = lambda b: bytes (int (b [4*k+1 : 4*k+4].decode ("ascii"), base=8) for k in range (len (b) // 4))
>>> g = lambda b: re.sub (rb"([\x80-\xff])((\\[0-7]{3})+)", lambda mo: mo.group (1) + f (mo.group (2)), b).decode ("utf-8")
>>> g (b)
'お疲れさん.'
Just do the equivalent of this in elisp and you can have your weeb characters. Someone might send this to the sbbs.el person.
Imagine there is a line with
>>> import re
anywhere before the g(b) call >>263, for the re.sub in g. It didn't make it through the copypasting but it was obviously there in the original because the g(b) call returned a result rather than raising a NameError.
To convert all the honeypot links on a page like
https://www.fossil-scm.org/fossil/rptview?rn=1
to ticket links:
Array.from (document.getElementsByTagName ("a")).filter (e => e.hasAttribute ("data-href") && /\/honeypot$/.test (e.getAttribute ("href"))).forEach (e => { e.setAttribute ("href", e.getAttribute ("data-href")); })
Obviously the hostiles >>262 they are so afraid of will be nice enough to refrain from reading the data-href attribute.
>>263
sbbs.el person here, your code is incomprehensible for non-pythonistas. Can anyone explain what's going on or at least write it out normally? "process the octals before" is a bit vague.
Thanks to whoever linked >>263 in the ticket.
https://fossil.textboard.org/sbbs/tktview?name=ee2e075a98
non-pythonistas
What is a pythonista?
your code is incomprehensible
Input: raw byte array
Output: unicode characters
1. ([\x80-\xff])((\\[0-7]{3})+)
Scan the input and identify locations where a byte of 0x80 or above is followed by one or more groups of "\DDD" where the Ds are octal digits.
2. Pass everything else through.
3. For each location, emit that first byte over 0x80, then loop over the "\DDD" groups.
4. For each group dump the backslash, take DDD to be an ascii string of three characters, parse that string as an integer in base 8, emit that integer as a byte.
5. After each location has been processed decode the resulting byte array as utf-8.
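For anyone who wants the steps above written out longhand rather than as lambdas, here is a minimal sketch in Python. The function names are made up for illustration; the regex and the test bytes are the ones from >>263. It assumes the input only contains escapes of the "\DDD" form seen in the hex dump.

```python
import re

# Step 1: a byte in 0x80-0xff followed by one or more "\DDD" octal groups.
ESCAPED_RUN = re.compile(rb"([\x80-\xff])((?:\\[0-7]{3})+)")


def unescape_octal_groups(groups: bytes) -> bytes:
    # Step 4: each group is 4 bytes long, a backslash plus three octal digits.
    out = bytearray()
    for k in range(0, len(groups), 4):
        digits = groups[k + 1 : k + 4].decode("ascii")  # drop the backslash
        out.append(int(digits, 8))                      # parse as base 8
    return bytes(out)


def decode_sexp_bytes(raw: bytes) -> str:
    def fix(match):
        # Step 3: keep the leading high byte, convert the groups after it.
        return match.group(1) + unescape_octal_groups(match.group(2))

    # Step 2 is implicit: re.sub passes everything outside a match through.
    fixed = ESCAPED_RUN.sub(fix, raw)
    # Step 5: only now decode as utf-8.
    return fixed.decode("utf-8")


raw = bytes.fromhex(
    "e3 5c 32 30 31 5c 32 31 32 e7 5c 32 32 36 b2 e3 5c 32 30 32 5c 32 31 34 "
    "e3 5c 32 30 31 5c 32 32 35 e3 5c 32 30 32 5c 32 32 33 2e"
)
print(decode_sexp_bytes(raw))
```

The elisp translation would follow the same shape: search for the pattern in a unibyte buffer, replace each match with the converted bytes, then decode the region as utf-8.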
What is a pythonista?
A python programmer?
And thanks for the explanation, I get the original code now too, but it's still super cryptic. Shouldn't take long to translate into working elisp.
I didn't realize the complete Monapo font was so huge. It's been replaced with a lighter version that should suffice for SJIS-art.
>>263
>>269
sbbs can now render SJIS-art, though it looks weird without the right font: https://fossil.textboard.org/sbbs/info/17bd3b26618a4f16
sbbs can now render SJIS-art
UTF-8 too, Nice!
>>271
Excellent!
If you want to test weird things there's a sandbox board: http://textboard.org/sandbox/ (the content is deleted everyday by a cron job)
Also, for anyone else interested in tracking sbbs.el, fossil is nothing to be afraid of:
fossil clone https://fossil.textboard.org/sbbs/ssbs.fossil sbbs.fossil
fossil open sbbs.fossil
fossil timeline
https://www.fossil-scm.org/home/doc/trunk/www/quickstart.wiki
Here you go, sbbs.el person, since you opened access to the tickets.
https://fossil.textboard.org/sbbs/tktview?name=ed8a04be6b
http://textboard.org/mona/12
Args out of range: "https://www.youtube.com/watch?v=ZLr8ntnL__A", 2215, 2217
The error was there before the UTF-8 fix, it just didn't crash because previous match bounds were small. The error is that in sbbs--insert-link -> let* -> other you are asking for match-string unconditionally, even when there was no match. You guard against this above the let* and in 'func', but you need to do so in 'other' as well. Evidently, you should never try to extract match data from inside a non-match.
https://www.gnu.org/software/emacs/manual/html_node/elisp/Simple-Match-Data.html
Someone might be nice enough to notify the sbbs.el person.
>>274
Here's a patched version of sbbs.el with your fix for ‘sbbs--insert-link’ and the fix for ‘sbbs-compose-format’ mentioned in >>/81/49 http://0x0.st/iVs1.el until sbbs.el anon can commit to the main repository.
>>274
Hmm, that's weird, I wonder why that wasn't noticed before then. Either way the issue is fixed now, thanks!
I wonder why that wasn't noticed before then.
"Because previous match bounds were small" >>274. A non-match left the previous bounds intact, those bounds were used to construct 'other' as nonsense but without overflow, and 'other' was not used for anything else.
https://www.gnu.org/software/emacs/manual/html_node/elisp/Simple-Match-Data.html
A search which fails may or may not alter the match data. In the current implementation, it does not, but we may change it in the future. Don't try to rely on the value of the match data after a failing search.
https://fossil.textboard.org/sbbs/tktview?name=812dd05990
When a HTTP request fails, the buffers seem to not be cleaned up.
string-match-p " \\*http textboard\\.org"
In sbbs--*-loader you have a (kill-buffer) right after (read (current-buffer)), but this is missing from the :error branch before the error call. You might also want to clean up even if (buffer-live-p buf) is false. Also note that in sbbs--thread-loader both of your search-forwards are using "#f", even the one with (replace-match "t").
Someone might send this to the sbbs.el person.
>>278
I think it's better to keep these kind of posts in the sbbs thread or ideally add them to the fossil repo.
or ideally add them to the fossil repo
Take a few minutes to read the thread from >>253.
keep these kind of posts in the sbbs thread
Bitdiddle was curious about what happens when the thread fills.
The author of Paster read the issue on Github and released it under an MIT license, I can legally hack the code and redistribute it now!
https://fossil.textboard.org/paster/
The number of projects without Free or Open Source license in the wild is mind-blowing
The index page is starting to look a bit messy, should it be redesigned?
>>282
Hasn't it always been ugly and uninviting? Who's reading that?
(define *sexp* "data/sexp")
[...]
(define *max-posts* 300)
(define *board-list* (map pathname-name (cddr (directory-read "data/sexp/*"))))
(define *range-regex* "[1-9][0-9]{0,2}(-[1-9][0-9]{0,2})?(,[1-9][0-9]{0,2}(-[1-9][0-9]{0,2})?)")
[...]
(define (range? posts)
(irregex-match (string-append *range-regex* "{0," (number->string *max-posts*) "}") posts))
That string-append is a constant computation. Is there some reason for it not to happen in the global define block, instead of being repeated on every range? Similarly, *board-list* and *sexp* seem to duplicate the sexp path instead of using a string-append, while the commit label is
Allow to quote every single posts as comma separated values. Deduplicate regex code
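To illustrate the point about the constant computation, here is a rough Python analogue, not the SchemeBBS code. The names MAX_POSTS, RANGE_ITEM, RANGE_RE and is_range are invented for the sketch, and it assumes Python's re syntax is close enough to the irregex string above. The full pattern is assembled and compiled once at load time, so the per-request predicate does no string building.

```python
import re

MAX_POSTS = 300

# One range item: "N" or "N-M", with N and M between 1 and 999.
RANGE_ITEM = r"[1-9][0-9]{0,2}(-[1-9][0-9]{0,2})?"

# Built once, in the global block, instead of on every call.
RANGE_RE = re.compile(
    RANGE_ITEM + "(," + RANGE_ITEM + "){0," + str(MAX_POSTS) + "}"
)


def is_range(posts: str) -> bool:
    # fullmatch plays the role of irregex-match: the whole string must match.
    return RANGE_RE.fullmatch(posts) is not None


print(is_range("1-5,7,10-12"))
print(is_range("1,,2"))
```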
>>283
It's never been pretty, but at least it was clean. Ideally it would contain an overview of all posts on all boards, so that one could quickly see if a thread one was participating in is active.
>>284
Yes the same regex was used twice, in bbs.scm and in templates.scm, but you're right, the repeat limit should be defined globally. There are still magic numbers here and there that I need to get rid of.
>>285
A list of the last active threads is a very good idea. That's what they do on imageboards, I think. I never use index pages, I usually go directly to the boards I visit. I'm also not very good at web design, that's why I never really cared about the index, but I admit it's "uninviting". SchemeBBS itself just prints "site root" if you query the index.
>>263,267
Why doesn't the admin produce proper UTF-8 files? That seems much better than processing them after the fact. What even is this encoding? Seems like a bug.
>>288
That bug is called MIT Scheme.
http://web.mit.edu/scheme_v9.2/doc/mit-scheme-ref/Unicode.html
>>287
The sexp files are written in bbs.scm:post-message:
(call-with-output-file path (lambda (port) (write t port)))
This 'write' is a built-in of MIT/GNU Scheme and therefore the bug is not the admin's.
http://web.mit.edu/scheme_v9.2/doc/mit-scheme-ref/Output-Procedures.html#index-write-2117
Rest assured that if the bug had been Bitdiddle's, this would have been stated explicitly.
Why doesn't the admin produce proper UTF-8 files?
To answer this question narrowly, the reason is that there is no built-in pair that reads/writes general scheme objects with proper utf-8 support. If you wish to submit such a pair of functions yourself, the admin will probably accept them if they pass correctness stress tests and the efficiency loss is not too great. But that is by no means a small undertaking. Patching the decoding was far easier.
The HTML files are written in actual utf-8, as far as we've seen.
[1/2]
Incremental HTML Generation
For threads with triple-digit post counts, like this one, the first page load after a new post starts to lag, because the page is rebuilt from scratch. There was a request for some timings in >>119 and a timing method in >>64, but in the intervening months no timings were forthcoming. That's OK, I hacked together my own. Here is the timing method:
$ cat test.scm
(define (timeit proc)
(with-timings proc
(lambda (run-time gc-time real-time)
(write (internal-time/ticks->seconds run-time))
(write-char #\space)
(write (internal-time/ticks->seconds gc-time))
(write-char #\space)
(write (internal-time/ticks->seconds real-time))
(newline))))
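For readers who don't have MIT/GNU Scheme at hand, a loosely comparable harness in Python might look like the sketch below. It is only an analogue: Python does not report GC time separately the way with-timings does, so it prints CPU time and wall-clock time only. The name timeit here is just a local helper, not the standard library module.

```python
import time


def timeit(proc):
    # CPU time of this process and wall-clock time, before and after the call.
    cpu0 = time.process_time()
    wall0 = time.perf_counter()
    result = proc()
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    print(f"{cpu:.3f} {wall:.3f}")
    return result, cpu, wall


timeit(lambda: sum(range(100000)))
```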
This diff comments out the server stuff and allows bbs.scm to be partially tested in the REPL of MIT/GNU Scheme 9.1.1. Serve-file and read-file are from deps/server.scm.
$ cat base.diff
--- bbs.scm 2020-06-14 19:41:25.881472281 +0000
+++ bbs-edit.scm 2020-06-14 23:20:48.760044390 +0000
@@ -11,14 +11,24 @@
(load "lib/utils")
(load "deps/irregex")
(load "deps/srfi-26")
-(load "deps/httpio")
-(load "deps/server")
+;(load "deps/httpio")
+;(load "deps/server")
(load "lib/html")
(load "lib/parameters")
(load "lib/markup")
(load "templates")
+(define (serve-file path #!optional headers)
+ (if (default-object? headers) (set! headers '()))
+ (let ((content (read-file path)))
+ `(200 ,headers ,content)))
+
+(define (read-file filename)
+ (call-with-input-file filename
+ (lambda (port)
+ (read-string (char-set) port))))
+
(define (get-form-hash)
"TODO"
(call-with-input-file "hash" read))
@@ -31,7 +41,7 @@
(define (make-abs-path . args)
(string-join (cons "" args) "/"))
-(define server (create-server))
+;(define server (create-server))
(define (make-response template)
`(200 ,(list (make-http-header 'content-type "text/html; charset=utf-8"))
@@ -43,11 +53,11 @@
(make-http-header 'cache-control "Private"))))
;;; static files
-(get server (serve-static "static") '("static"))
+;(get server (serve-static "static") '("static"))
-(get server (lambda (req params) (serve-file "static/favicon.ico")) '("favicon.ico"))
+;(get server (lambda (req params) (serve-file "static/favicon.ico")) '("favicon.ico"))
-(add-handler server (lambda (req params) (route req)))
+;(add-handler server (lambda (req params) (route req)))
(define (ignore-qstring fullpath)
(let ((l (string-split fullpath #\?)))
@@ -414,4 +424,4 @@
(decode-formdata message)
(cdr validation)))))
-(listen server (string->number (car (command-line))))
+;(listen server (string->number (car (command-line))))
Here is the generation from scratch for https://textboard.org/sexp/prog/39 at 286, followed by serving from cache:
$ mit-scheme --load bbs-edit.scm --load test.scm
MIT/GNU Scheme running under GNU/Linux
Type `^C' (control-C) followed by `H' to obtain information about interrupts.
Copyright (C) 2011 Massachusetts Institute of Technology
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Image saved on Tuesday February 6, 2018 at 6:31:25 PM
Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41
LIAR/x86-64 4.118 || Edwin 3.116
;Loading "bbs-edit.scm"...
; Loading "format.com"... done
; Loading "lib/utils.scm"... done
; Loading "deps/irregex.scm"... done
; Loading "deps/srfi-26.scm"... done
; Loading "lib/html.scm"... done
; Loading "lib/parameters.scm"... done
; Loading "lib/markup.scm"... done
; Loading "templates.scm"... done
;... done
;Loading "test.scm"... done
1 ]=> (timeit (lambda () (view-thread "prog" "39") 'ok))
1.02 .03 1.055
;Value: ok
1 ]=> (timeit (lambda () (view-thread "prog" "39") 'ok))
.1 0. .092
;Value: ok
The first run, generating from scratch, takes about one second; the second, served from cache, about a tenth. Here is a diff for incremental generation. The old cache is renamed on post rather than deleted. Only the new posts go through templating and sxml conversion by controlling the filter, then plain string processing without regex is used to merge the old cache with the new content. The result is identical to the full uncached generation.
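The merge idea can be sketched in a few lines of Python. This is an illustration under assumptions, not a port of the diff: render_post, render_thread and merge_incremental are hypothetical names, and the HTML layout is a simplified stand-in in which the post number is the link text right before "</A></DT>". It also assumes the old cache holds at least one post.

```python
def render_post(number: int, body: str) -> str:
    # Simplified stand-in for the real template output.
    return f'<DT><A href="#p{number}">{number}</A></DT><DD>{body}</DD>\n'


def render_thread(posts, keep=lambda number: True) -> str:
    # Full page, restricted to the posts accepted by the filter.
    items = "".join(render_post(n, b) for n, b in posts if keep(n))
    return "<HTML><BODY>\n<DL>\n" + items + "</DL>\n</BODY></HTML>\n"


def merge_incremental(old_html: str, posts) -> str:
    marker = "</A></DT>"
    adt = old_html.rfind(marker)          # header of the last cached post
    gt = old_html.rfind(">", 0, adt) + 1  # start of its post number
    last_old = int(old_html[gt:adt])
    dt = old_html.rfind("<DT>", 0, gt)    # where that last cached post begins
    # Template only the last cached post and everything newer.
    new_html = render_thread(posts, keep=lambda n: n >= last_old)
    dl = new_html.find("<DL>\n") + len("<DL>\n")
    # Old page up to the last cached post, then the freshly rendered tail.
    return old_html[:dt] + new_html[dl:]


posts = [(1, "first"), (2, "second"), (3, "third"), (4, "fourth")]
old_cache = render_thread(posts[:3])
merged = merge_incremental(old_cache, posts)
print(merged == render_thread(posts))
```

Re-rendering the last cached post as well as the new ones keeps the splice point at a clean "<DT>" boundary, so the footer of the old page never needs to be parsed.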
[2/2]
$ cat full.diff
--- bbs.scm 2020-06-15 23:08:54.965513796 +0000
+++ bbs-edit.scm 2020-06-15 22:57:40.125009000 +0000
@@ -11,14 +11,24 @@
(load "lib/utils")
(load "deps/irregex")
(load "deps/srfi-26")
-(load "deps/httpio")
-(load "deps/server")
+;(load "deps/httpio")
+;(load "deps/server")
(load "lib/html")
(load "lib/parameters")
(load "lib/markup")
(load "templates")
+(define (serve-file path #!optional headers)
+ (if (default-object? headers) (set! headers '()))
+ (let ((content (read-file path)))
+ `(200 ,headers ,content)))
+
+(define (read-file filename)
+ (call-with-input-file filename
+ (lambda (port)
+ (read-string (char-set) port))))
+
(define (get-form-hash)
"TODO"
(call-with-input-file "hash" read))
@@ -31,7 +41,7 @@
(define (make-abs-path . args)
(string-join (cons "" args) "/"))
-(define server (create-server))
+;(define server (create-server))
(define (make-response template)
`(200 ,(list (make-http-header 'content-type "text/html; charset=utf-8"))
@@ -42,12 +52,19 @@
(serve-file path (list (make-http-header 'content-type "text/html; charset=utf-8")
(make-http-header 'cache-control "Private"))))
+(define (write-and-serve-text path text)
+ (with-output-to-file path (lambda () (write-string text)))
+ (list 200
+ (list (make-http-header 'content-type "text/html; charset=utf-8")
+ (make-http-header 'cache-control "Private"))
+ text))
+
;;; static files
-(get server (serve-static "static") '("static"))
+;(get server (serve-static "static") '("static"))
-(get server (lambda (req params) (serve-file "static/favicon.ico")) '("favicon.ico"))
+;(get server (lambda (req params) (serve-file "static/favicon.ico")) '("favicon.ico"))
-(add-handler server (lambda (req params) (route req)))
+;(add-handler server (lambda (req params) (route req)))
(define (ignore-qstring fullpath)
(let ((l (string-split fullpath #\?)))
@@ -149,7 +166,10 @@
(lambda (e) (vector-ref rangeonce (car e))))))
(cond (norange
(if (not (file-exists? cache))
- (write-and-serve cache (thread-template board thread posts headline filter-func))
+ (let ((old (name-cache-old cache)))
+ (if (file-exists? old)
+ (view-thread-ihg cache old board thread posts headline)
+ (write-and-serve cache (thread-template board thread posts headline filter-func))))
(serve-file cache)))
((and (string->number range)
(> (string->number range) (length posts)))
@@ -185,6 +205,28 @@
r3)
vec))
+(define (name-cache-old cache)
+ (string-append cache "-old"))
+
+; incremental html generation
+(define (view-thread-ihg cachepath cacheoldpath board thread posts headline)
+ (let* ((oldtext (read-file cacheoldpath))
+ (lastadt (string-search-backward "</A></DT>" oldtext))
+ (adtpos (- lastadt 9))
+ (prevgt (substring-search-backward ">" oldtext 0 adtpos))
+ (postlimit (string->number (substring oldtext prevgt adtpos)))
+ (newfilter (lambda (e) (>= (car e) postlimit)))
+ (prevdt (substring-search-backward "<DT>" oldtext 0 (- prevgt 1)))
+ (newsxml (thread-template board thread posts headline newfilter))
+ (newtext (with-output-to-string (lambda () (sxml->html newsxml))))
+ (firstdl (string-search-forward "<DL>" newtext))
+ (merged (string-append
+ (string-head oldtext (- prevdt 4))
+ (string-tail newtext (+ firstdl 5))))
+ (result (write-and-serve-text cachepath merged)))
+ (delete-file cacheoldpath)
+ result))
+
(define (view-list board)
(let* ((path (make-path *sexp* board "list"))
(cache (make-path *html* board "list"))
@@ -232,7 +274,7 @@
(vip . ,vip)
(content . ,sxml)))))
(call-with-output-file path (lambda (port) (write t port)))
- (if (file-exists? cache) (delete-file cache))
+ (if (file-exists? cache) (rename-file cache (name-cache-old cache)))
(if vip
(update-post-count board thread date post-number)
(update-thread-list board (string->number thread) date post-number))
@@ -414,4 +456,4 @@
(decode-formdata message)
(cdr validation)))))
-(listen server (string->number (car (command-line))))
+;(listen server (string->number (car (command-line))))
Timing for adding one post, with a 39-old at 285:
$ rm data/html/prog/39
$ mit-scheme --load bbs-edit.scm --load test.scm
[...]
1 ]=> (timeit (lambda () (view-thread "prog" "39") 'ok))
.12 .01 .12
;Value: ok
It's about 9 times faster and practically as fast as serving from cache. When this thread fills, we have a few other triple-digit post counts to test on. When the templates are updated, like when adding the board list, a drop-caches.sh of some sort will have to be run. If templates.scm:format-thread is drastically changed, bbs.scm:view-thread-ihg will need the equivalent update.
>>288-290
Ah, that's unfortunate. I don't know Scheme so nope, no chance of me submitting some kind of patch.
>>291-292
That's a huge performance improvement. Isn't adding html strings what Kareha does too?
Isn't adding html strings what Kareha does too?
Kareha
#!/usr/bin/perl
captcha.pl
I have no idea what Kareha does, other than it has captcha.
>>263,267
I'm a bit late to the party and I don't know elisp very well but if I take the string of >>194 in the sexp file, I can get back the utf-8 representation like this:
ELISP> (string-as-multibyte (apply #'unibyte-string (mapcar 'multibyte-char-to-unibyte "ã\201\212ç\226²ã\202\214ã\201\225ã\202\223.")))
"お疲れさん."
>>296
If it works, great! Ticket link is in >>267 if you want to post it to the sbbs.el person.
While you're at it, you might as well post https://textboard.org/prog/39#t39p278 in https://fossil.textboard.org/sbbs/tktview?name=812dd05990
>>297
It's marked as fixed (it's in commit https://fossil.textboard.org/sbbs/info/17bd3b26618a4f16 ) but >>296's code is more concise.
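For comparison, the same trick as >>296 can be sketched in Python. It assumes, as in that post, that the sexp reader has already turned each byte into one character with a code point below 256, so the string is utf-8 read as Latin-1. The helper name is made up.

```python
def mojibake_to_utf8(text: str) -> str:
    # Map each character back to the byte it came from, then decode properly.
    return text.encode("latin-1").decode("utf-8")


# The string from >>194 as a reader would see it; \201 etc. are octal escapes.
garbled = "\xe3\201\212\xe7\226\xb2\xe3\202\214\xe3\201\225\xe3\202\223."
print(mojibake_to_utf8(garbled))
```

This works on a string that has already been read; the regex approach from >>263 works on the raw bytes before any reading.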
Here is a small diff on top of >>292 to time the sexp read/write in bbs.scm:post-message in the REPL of MIT/GNU Scheme 9.1.1. It comments out the thread-list and frontpage updates and makes the redirect absolute.
--- bbs-edit.scm 2020-06-15 22:57:40.125009000 +0000
+++ bbs-edit2.scm 2020-06-17 02:41:55.660698507 +0000
@@ -275,10 +275,10 @@
(content . ,sxml)))))
(call-with-output-file path (lambda (port) (write t port)))
(if (file-exists? cache) (rename-file cache (name-cache-old cache)))
- (if vip
- (update-post-count board thread date post-number)
- (update-thread-list board (string->number thread) date post-number))
- (update-frontpage board)
+ ;(if vip
+ ; (update-post-count board thread date post-number)
+ ; (update-thread-list board (string->number thread) date post-number))
+ ;(update-frontpage board)
(if (equal? frontpage "true")
(redirection board thread (number->string post-number) query-string #t #f)
(redirection board thread (number->string post-number) query-string #f #f))))))
@@ -297,7 +297,7 @@
"That was SICP quality!")
`(303 ,(list (make-http-header
'location
- (string-append (add-query-string (string-append "/" board "/" thread) query-string) "#t" thread "p" post)))
+ (string-append "http://x.y" (add-query-string (string-append "/" board "/" thread) query-string) "#t" thread "p" post)))
"That was SICP quality")))
(define (update-post-count board thread date post-count)
The test is on https://textboard.org/sexp/prog/39 at 286. To fake a simple request manually for post-message the mandatory post fields are those that the let* retrieves using a defaultless lookup-def rather than assq.
$ mit-scheme --load bbs-edit2.scm --load test.scm
[...]
;Loading "test.scm"... done
1 ]=> (timeit (lambda () (post-message "prog" "39" (make-http-request "POST" (string->uri "/uri") '(1 . 0) '() "frontpage=false&epistula=hello") "")))
.08 .02 .093
;Value 13: (303 (#[http-header 14 location]) "That was SICP quality")
Reading and writing the sexp together take up less than a tenth of a second at 286, so the sexp file does not need the incremental generation treatment.
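The request body used in the test is ordinary form encoding. A small sketch of building it in Python, with build_post_body as a made-up helper; the field names frontpage and epistula are the ones from the REPL call above, and whether other fields are mandatory is not shown here.

```python
from urllib.parse import parse_qs, urlencode


def build_post_body(message: str, frontpage: bool = False) -> str:
    # Field order follows the dict order, matching the string in the test.
    fields = {"frontpage": "true" if frontpage else "false", "epistula": message}
    return urlencode(fields)


body = build_post_body("hello")
print(body)
print(parse_qs(body))
```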
>>296
sbbs person here. My knowledge of encoding in Emacs is quite limited, since just like most people I stick to multibyte buffers all the time. As far as I can see, this approach would also work. The only thing that annoys me is that I don't see a direct way to translate your expression into procedural code that works on buffers. This would be necessary to avoid converting the response into a string and back again, which just strains the garbage collector and slows everything down in larger threads (such as this one). If you find anything, post a note here or in the ticket linked above.