
prog


How can I run my own instance of this

263 2020-06-01 01:20

Now that tickets are accessible and I can actually read the issue, here's what happens. In https://bbs.jp.net/sexp/prog/39 the text of >>194 starts with "お疲れさん.", whatever that is, sent as the bytes:

0002ca50  6f 6e 74 65 6e 74 20 28  70 20 28 61 20 28 40 20  |ontent (p (a (@ |
0002ca60  28 68 72 65 66 20 22 2f  70 72 6f 67 2f 33 39 2f  |(href "/prog/39/|
0002ca70  31 39 32 22 29 29 20 22  3e 3e 31 39 32 22 29 20  |192")) ">>192") |
0002ca80  28 62 72 29 20 22 e3 5c  32 30 31 5c 32 31 32 e7  |(br) ".\201\212.|
0002ca90  5c 32 32 36 b2 e3 5c 32  30 32 5c 32 31 34 e3 5c  |\226..\202\214.\|
0002caa0  32 30 31 5c 32 32 35 e3  5c 32 30 32 5c 32 32 33  |201\225.\202\223|
0002cab0  2e 22 20 28 62 72 29 20  22 4e 6f 77 2c 20 49 27  |." (br) "Now, I'|
0002cac0  76 65 20 62 65 65 6e 20  6d 65 61 6e 69 6e 67 20  |ve been meaning |

The relevant bytes are:

>>> s = "e3 5c  32 30 31 5c 32 31 32 e7 5c 32 32 36 b2 e3 5c 32  30 32 5c 32 31 34 e3 5c 32 30 31 5c 32 32 35 e3  5c 32 30 32 5c 32 32 33 2e"
>>> b = bytes (int (t, base = 16) for t in s.split ())
>>> b
b'\xe3\\201\\212\xe7\\226\xb2\xe3\\202\\214\xe3\\201\\225\xe3\\202\\223.'

The original string in utf8 is:

>>> "お疲れさん.".encode ("utf8")
b'\xe3\x81\x8a\xe7\x96\xb2\xe3\x82\x8c\xe3\x81\x95\xe3\x82\x93.'

so it is obvious that we have high bytes followed by backslashed octal escapes. In the bytes of >>64 a textual backslash can be seen to be doubled, so a literal backslash in a post should not be mistaken for one of these escapes.

0000deb0  6e 20 20 28 6c 65 74 2a  20 28 28 72 31 20 28 73  |n  (let* ((r1 (s|
0000dec0  74 72 69 6e 67 2d 73 70  6c 69 74 20 72 61 6e 67  |tring-split rang|
0000ded0  65 20 23 5c 5c 2c 29 29  5c 6e 20 20 20 20 20 20  |e #\\,))\n      |
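To check this reading of the dump, here is a minimal Python sketch that re-creates the escaping from UTF-8 text. The rule it uses is an assumption inferred only from this one sample, not taken from the MIT Scheme documentation: bytes 0x80 to 0x9F become three-digit octal escapes, a literal backslash is doubled, and every other byte is written unchanged (which fits the raw b2 in the middle of 疲). Other escapes, such as the \n visible in the >>64 dump, are not modelled. The name escape_like_sexp_file is made up for illustration.

```python
# ASSUMPTION inferred from the single sample in >>263, not from MIT
# Scheme's documentation: bytes 0x80-0x9F are written as "\ooo" octal
# escapes, a literal backslash is written as "\\", and every other byte
# is copied unchanged. Other escapes (e.g. "\n" for newline) are ignored.

def escape_like_sexp_file(text: str) -> bytes:
    """Hypothetical re-creation of how post text appears in the sexp file."""
    out = bytearray()
    for byte in text.encode("utf-8"):
        if byte == 0x5C:                  # literal backslash -> doubled
            out += b"\\\\"
        elif 0x80 <= byte <= 0x9F:        # assumed escaped range
            out += b"\\%03o" % byte       # e.g. 0x81 -> b"\\201"
        else:
            out.append(byte)              # ASCII and 0xA0-0xFF pass through raw
    return bytes(out)


if __name__ == "__main__":
    print(escape_like_sexp_file("お疲れさん."))
```

If the assumption holds, feeding it "お疲れさん." reproduces the bytes b shown above, which gives a round-trip test for any decoder.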

So we just need to process the octals before the utf8 decoding:

>>> f = lambda b: bytes (int (b [4*k+1 : 4*k+4].decode ("ascii"), base=8) for k in range (len (b) // 4))
>>> g = lambda b: re.sub (rb"([\x80-\xff])((\\[0-7]{3})+)", lambda mo: mo.group (1) + f (mo.group (2)), b).decode ("utf-8")
>>> g (b)
'お疲れさん.'

Just do the equivalent of this in elisp and you can have your weeb characters. Someone might send this to the sbbs.el person.

264 2020-06-01 02:47

Imagine there is a line with

>>> import re

anywhere before the g(b) call in >>263, since g needs it for re.sub. It didn't make it through the copypasting, but it was obviously there in the original, because the g(b) call returned a result rather than raising a NameError.

265 2020-06-01 03:28

To convert all the honeypot links on a page like
https://www.fossil-scm.org/fossil/rptview?rn=1
to ticket links:

Array.from (document.getElementsByTagName ("a")).filter (e => e.hasAttribute ("data-href") && /\/honeypot$/.test (e.getAttribute ("href"))).forEach (e => { e.setAttribute ("href", e.getAttribute ("data-href")); })

Obviously the hostiles they are so afraid of (>>262) will be nice enough to refrain from reading the data-href attribute.

266 2020-06-01 11:05

>>263
sbbs.el person here. Your code is incomprehensible for non-pythonistas. Can anyone explain what's going on, or at least write it out normally? "process the octals before" is a bit vague.

267 2020-06-01 11:35

Thanks to whoever linked >>263 in the ticket.
https://fossil.textboard.org/sbbs/tktview?name=ee2e075a98

>>266

non-pythonistas

What is a pythonista?

your code is incomprehensible

Input: raw byte array
Output: unicode characters
1. ([\x80-\xff])((\\[0-7]{3})+)
Scan the input and identify locations where a byte of 0x80 or above is followed by one or more groups of "\DDD", where the Ds are octal digits.
2. Pass everything else through.
3. For each location, emit that first byte (0x80 or above), then loop over the "\DDD" groups.
4. For each group dump the backslash, take DDD to be an ascii string of three characters, parse that string as an integer in base 8, emit that integer as a byte.
5. After each location has been procesed decode the resulting byte array as utf-8.
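For anyone who would rather not parse lambdas, here is the same procedure written as a plain Python loop that follows the numbered steps one by one. It is a sketch assuming the input is the raw bytes of one string body from the sexp file; decode_sexp_string and is_octal_escape are hypothetical names, not anything from sbbs.el.

```python
# Plain-loop version of the >>263 decoder, following steps 1-5 above.
# Hypothetical helper names; input is assumed to be the raw bytes of a
# single string body from the sexp file.

def is_octal_escape(data: bytes, i: int) -> bool:
    """True if data[i:i+4] is a backslash followed by three octal digits."""
    return (i + 4 <= len(data)
            and data[i] == 0x5C                                  # backslash
            and all(0x30 <= d <= 0x37 for d in data[i + 1:i + 4]))  # '0'..'7'


def decode_sexp_string(data: bytes) -> str:
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        # Step 1: a byte of 0x80 or above followed by at least one \DDD group.
        if byte >= 0x80 and is_octal_escape(data, i + 1):
            out.append(byte)                     # Step 3: keep the high byte
            i += 1
            while is_octal_escape(data, i):
                # Step 4: drop the backslash, read DDD in base 8, emit one byte.
                out.append(int(data[i + 1:i + 4].decode("ascii"), 8))
                i += 4
        else:
            out.append(byte)                     # Step 2: pass through
            i += 1
    return out.decode("utf-8")                   # Step 5: decode as UTF-8


if __name__ == "__main__":
    raw = b"\xe3\\201\\212\xe7\\226\xb2\xe3\\202\\214\xe3\\201\\225\xe3\\202\\223."
    print(decode_sexp_string(raw))
```

The only state is an index into the input, so this should carry over to a character loop in most languages without needing a regex engine.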

268 2020-06-01 11:38

*processed
sorry

269 2020-06-01 13:33

>>267

What is a pythonista?

A python programmer?

And thanks for the explanation, I get the original code now too, but it's still super cryptic. Shouldn't take long to translate into working elisp.

270 2020-06-04 08:55

I didn't realize the complete Monapo font was so huge. It's been replaced with a lighter version that should suffice for SJIS-art.

271 2020-06-05 20:49

>>263
>>269
sbbs can now render SJIS-art, though it looks weird without the right font: https://fossil.textboard.org/sbbs/info/17bd3b26618a4f16

272 2020-06-05 21:31

>>271

sbbs can now render SJIS-art

UTF-8 too, nice!

287 2020-06-15 21:57

>>263,267
Why doesn't the admin produce proper UTF-8 files? That seems much better than processing them after the fact. What even is this encoding? Seems like a bug.

288 2020-06-15 22:10 *

>>287
That bug is called MIT Scheme.
http://web.mit.edu/scheme_v9.2/doc/mit-scheme-ref/Unicode.html

289 2020-06-15 23:56

>>287
The sexp files are written in bbs.scm:post-message:

(call-with-output-file path (lambda (port) (write t port)))

This 'write' is a built-in of MIT/GNU Scheme and therefore the bug is not the admin's.
http://web.mit.edu/scheme_v9.2/doc/mit-scheme-ref/Output-Procedures.html#index-write-2117
Rest assured that if the bug had been Bitdiddle's, this would have been stated explicitly.

290 2020-06-16 00:15

>>287

Why doesn't the admin produce proper UTF-8 files?

To answer this question narrowly, the reason is that there is no built-in pair that reads/writes general scheme objects with proper utf-8 support. If you wish to submit such a pair of functions yourself, the admin will probably accept them if they pass correctness stress tests and the efficiency loss is not too great. But that is by no means a small undertaking. Patching the decoding was far easier.

The HTML files are written in actual utf-8, as far as we've seen.

293 2020-06-16 02:50

>>288-290
Ah, that's unfortunate. I don't know Scheme so nope, no chance of me submitting some kind of patch.

296 2020-06-16 17:13

>>263,267
I'm a bit late to the party and I don't know elisp very well but if I take the string of >>194 in the sexp file, I can get back the utf-8 representation like this:

ELISP> (string-as-multibyte (apply #'unibyte-string (mapcar 'multibyte-char-to-unibyte "ã\201\212ç\226²ã\202\214ã\201\225ã\202\223.")))
"お疲れさん."

300 2020-06-17 18:06

>>296
sbbs person here. My knowledge of encoding in Emacs is quite limited, since like most people I stick to multibyte buffers all the time. As far as I can see, this approach would also work; the only thing that bothers me is that I don't see a direct way to translate your expression into procedural code that works on buffers. That would be necessary to avoid converting the response into a string and back again, which just strains the garbage collector and slows everything down in larger threads (such as this one). If you find anything, post a note here or in the ticket linked above.
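On the buffer question: since every four-byte \DDD escape collapses to a single byte, the decoding can be done in place with a read index and a write index, and the text only has to be decoded once at the end. Below is a minimal Python sketch of that idea on a bytearray standing in for a unibyte buffer; unescape_in_place is a hypothetical name, the escape format is the one assumed from the sample in >>263, and how well it maps onto Emacs buffer primitives is untested.

```python
# In-place variant: rewrites a mutable buffer without building an
# intermediate string. Each 4-byte "\DDD" escape shrinks to one byte,
# so the write index never overtakes the read index.
# ASSUMPTION: same escape format as the sample in >>263.

def unescape_in_place(buf: bytearray) -> None:
    """Decode \\DDD escapes that follow a byte >= 0x80, shrinking buf in place."""

    def escape_at(i: int) -> bool:
        # Backslash followed by three ASCII octal digits '0'..'7'.
        return (i + 4 <= len(buf) and buf[i] == 0x5C
                and all(0x30 <= b <= 0x37 for b in buf[i + 1:i + 4]))

    read = write = 0
    in_run = False  # True after a raw byte >= 0x80 or a decoded escape
    while read < len(buf):
        if in_run and escape_at(read):
            value = int(buf[read + 1:read + 4].decode("ascii"), 8)
            read += 4                    # consumed "\DDD"
        else:
            value = buf[read]
            read += 1
            in_run = value >= 0x80       # a high byte may start a new run
        buf[write] = value               # write <= read, so nothing unread is overwritten
        write += 1
    del buf[write:]                      # drop the leftover tail


if __name__ == "__main__":
    buf = bytearray(b"\xe3\\201\\212\xe7\\226\xb2\xe3\\202\\214\xe3\\201\\225\xe3\\202\\223.")
    unescape_in_place(buf)
    print(buf.decode("utf-8"))
```

The in_run flag mirrors the regex in >>263: an escape is only decoded when it directly follows a high byte or another decoded escape, so a "\201" after plain ASCII is left alone.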
