prog

frontpage - thread list - new thread - preferences - ?

How can I run my own instance of this

263 2020-06-01 01:20

Now that tickets are accessible and I can actually read the issue, here's what happens. In https://bbs.jp.net/sexp/prog/39 the text of >>194 starts with "お疲れさん.", whatever that is, sent as the bytes:

0002ca50  6f 6e 74 65 6e 74 20 28  70 20 28 61 20 28 40 20  |ontent (p (a (@ |
0002ca60  28 68 72 65 66 20 22 2f  70 72 6f 67 2f 33 39 2f  |(href "/prog/39/|
0002ca70  31 39 32 22 29 29 20 22  3e 3e 31 39 32 22 29 20  |192")) ">>192") |
0002ca80  28 62 72 29 20 22 e3 5c  32 30 31 5c 32 31 32 e7  |(br) ".\201\212.|
0002ca90  5c 32 32 36 b2 e3 5c 32  30 32 5c 32 31 34 e3 5c  |\226..\202\214.\|
0002caa0  32 30 31 5c 32 32 35 e3  5c 32 30 32 5c 32 32 33  |201\225.\202\223|
0002cab0  2e 22 20 28 62 72 29 20  22 4e 6f 77 2c 20 49 27  |." (br) "Now, I'|
0002cac0  76 65 20 62 65 65 6e 20  6d 65 61 6e 69 6e 67 20  |ve been meaning |

The relevant bytes are:

>>> s = "e3 5c  32 30 31 5c 32 31 32 e7 5c 32 32 36 b2 e3 5c 32  30 32 5c 32 31 34 e3 5c 32 30 31 5c 32 32 35 e3  5c 32 30 32 5c 32 32 33 2e"
>>> b = bytes (int (t, base = 16) for t in s.split ())
>>> b
b'\xe3\\201\\212\xe7\\226\xb2\xe3\\202\\214\xe3\\201\\225\xe3\\202\\223.'

The original string in utf8 is:

>>> "お疲れさん.".encode ("utf8")
b'\xe3\x81\x8a\xe7\x96\xb2\xe3\x82\x8c\xe3\x81\x95\xe3\x82\x93.'

so it is obvious that we have high bytes followed by backslashed octal escapes. In the bytes of >>64 a textual backslash can be seen to be doubled.

0000deb0  6e 20 20 28 6c 65 74 2a  20 28 28 72 31 20 28 73  |n  (let* ((r1 (s|
0000dec0  74 72 69 6e 67 2d 73 70  6c 69 74 20 72 61 6e 67  |tring-split rang|
0000ded0  65 20 23 5c 5c 2c 29 29  5c 6e 20 20 20 20 20 20  |e #\\,))\n      |

So we just need to process the octals before the utf8 decoding:

>>> f = lambda b: bytes (int (b [4*k+1 : 4*k+4].decode ("ascii"), base=8) for k in range (len (b) // 4))
>>> g = lambda b: re.sub (rb"([\x80-\xff])((\\[0-7]{3})+)", lambda mo: mo.group (1) + f (mo.group (2)), b).decode ("utf-8")
>>> g (b)
'お疲れさん.'

Just do the equivalent of this in elisp and you can have your weeb characters. Someone might send this to the sbbs.el person.

301