
What are you working on?

1 2018-12-22 22:05

This is the type of thread that you can write in every time you visit the site! You can do anything from using it as a personal log for your project, to sharing what you're hacking on at the moment, to critiquing something someone else did in the thread. I'm looking forward to hearing what you guys are up to!

Personally I've been working on a radix tree implementation in R7RS. It seems moderately efficient and not too bad to use, although I haven't finished many of the operations, and I'm not sure I will, since I know there's a better way to do it if I only had the primitives (or even just a standardised FFI). Anyway, I thought I'd share it just because it's something to talk about. Due to the file limit I'm only posting the introduction, imports, exports, and type here, but a link to the full source can be found here: http://ix.io/1wB1.

; This is an implementation of a radix tree in the style of Clojure or
; Scala, meaning a not particularly good implementation. Ideally this
; would be an adaptive radix tree as described in the following paper:
; http://www-db.in.tum.de/~leis/papers/ART.pdf
; Additionally it'd be nice to have a variant for using SIMD
; functions on unboxed values if possible. Regardless, Scheme does not
; have the necessary primitives for implementing an adaptive radix tree
; to its full potential, namely access to or abstractions of SIMD
; functions for the search function.

; Much of this library may seem unidiomatic. This is simply because I'm
; trying quite hard to A) avoid allocations and B) factor out functions
; when they can be used in multiple situations, even if this means adding
; a bit of complexity.

; One of the goals of this library was to be exceedingly standards-compliant:
; it's written entirely in R7RS small with SRFI 1 & 60. That being said,
; this doesn't actually make it a particularly portable library, due to
; the general lack of up-to-date compliance with the standard and of
; implementations of SRFI 60. To my knowledge the following
; implementations would likely be able to run this library: Foment,
; Gauche, Larceny, & Sagittarius.

(define-library (data radix)
  (export radix-ref radix-set radix->list radix->vector
          vector->radix list->radix radix radix-append
          radix-data radix-length radix-depth)
  (import (scheme base)
          (srfi 1)    ; append! in radix->list
          (srfi 60))  ; bitwise operations
  (begin
    (define-record-type radix
      (make-radix data length depth)
      radix?
      (data radix-data)
      (length radix-length)
      (depth radix-depth))
    ;; remaining definitions elided; full source at http://ix.io/1wB1
    ))

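To give a flavour of how the indexing works before you dig into the full source: an index is treated as a base-32 number, 5 bits per level. Something like this (an illustrative helper, not the library's actual API):

```scheme
(define (index->path index height)
  ;; split an index into 5-bit groups, one per tree level, root first
  (let loop ((level 0) (path '()))
    (if (= level height)
        path
        (loop (+ level 1)
              (cons (bitwise-and (arithmetic-shift index (* -5 level))
                                 #b11111)
                    path)))))

;; (index->path 1000 2) => (31 8), since 1000 = 31*32 + 8
```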
2 2018-12-23 11:34

What you should really do is get rid of this dl/dd nonsense that is cut off by ellipses on narrow displays without being scrollable, and put the code blocks into plain pre > code.

3 2018-12-23 16:01

>>2
While I appreciate the feedback, I'm afraid this isn't my site. The anon who wrote it did visit an IRC channel I hang out in a while back, and he said he wanted to stick as close to the earliest revision of the HTML standard as possible, so maybe it has something to do with that? I'll look around at the original standard and at what anon is doing to see if there is a better way to do this: https://gitlab.com/naughtybits/schemebbs
Anyway, what are you working on anon?

4 2018-12-23 21:38

>>2

wanted to stick as close to the earliest revision of the HTML standard as possible, so maybe it has something to do with that?

The code is already inside a dl > dd > pre > code chain.

Anyway, what are you working on anon?

Some graphics stuff with spirals from text stamps using cairo. I'll post a simplified version on a board with images, probably lainchan.

5 2018-12-23 21:39

meant to reply to >>3, obviously

6 2018-12-24 03:49

>>4,5

The code is already inside a dl > dd > pre > code chain.

The original standard didn't have DIV or TABLE so I was thinking that maybe he was using DD/DL for CSS styles. I can't actually see the CSS because I use W3M, but when I opened it in netsurf I didn't see any reason why they would be necessary. I'll try to make a pull request tomorrow.

Some graphics stuff with spirals from text stamps using cairo. I'll post a simplified version on a board with images, probably lainchan.

Neat! Would you happen to be the lain that frequently posts this type of thing such as in the challenge thread? I honestly don't know anything about graphics programming, but I lurk /lambda/ so I'll take a look if I see it in a thread.

I ended up working on my radix library some more; I figured I'm so close to done that I might as well finish it. Some bug fixes and code cleanup, but I also got slice, delete, and insert working: http://ix.io/1wGr

7 2018-12-24 13:04

>>6

The original standard didn't have DIV or TABLE

>>3

wanted to stick as close to the earliest revision of the HTML standard as possible

The "thread list" uses a table element, as does the SchemeBBS main page. Additionally, I find your statement quite puzzling for the following reason. The DOCTYPE on this page contains "ISO/IEC 15445:2000//DTD HyperText Markup Language//EN". The first edition of 15445:2000 from 2000-05-15 can be found at
https://www.scss.tcd.ie/misc/15445/15445.html
In "Annex B Entities, element types and attributes" the block elements are listed as
<!ENTITY % block "BLOCKQUOTE | DIV | DL | FIELDSET | FORM | HR | OL | P | PRE | TABLE | UL" >

Would you happen to be the lain that frequently posts this type of thing such as in the challenge thread?

The "frequently" bit makes me unsure about what you're asking but I did post the trig and zigzag fizzbuzzes there. Maybe the spiral should be a fizzbuzz as well.

8 2018-12-24 13:46

The "thread list" uses a table element, as does the SchemeBBS main page. Additionally, I find your statement quite puzzling for the following reason. The DOCTYPE on this page contains "ISO/IEC 15445:2000//DTD HyperText Markup Language//EN".

How odd, I assumed he was using HTML 2.0 (https://tools.ietf.org/html/rfc1866); my apologies. I'll go work on that pull request now.

The "frequently" bit makes me unsure about what you're asking but I did post the trig and zigzag fizzbuzzes there. Maybe the spiral should be a fizzbuzz as well.

Ah, well honestly I just remember those two and maybe one or two more from a while back with the same image as the zigzag. Anyway, good luck on your project anon, I look forward to seeing it.

9 2018-12-24 16:51

Apparently DD, DL, and DT have some nice properties for formatting text on devices without CSS support; I suspect that's why they're being used (something I should've known). There turned out to be a much simpler fix anyway. I emailed it to the BO because I didn't really want to make a bitbucket account.

diff --git a/static/styles/2ch.css b/static/styles/2ch.css
index 01e5cda..6e408e4 100644
--- a/static/styles/2ch.css
+++ b/static/styles/2ch.css
@@ -181,8 +181,7 @@ dd pre {
   margin: 1.2em 0;
   padding: 2px;
   line-height: 1.45em;
-  overflow: hidden;
-  text-overflow: ellipsis;
+  overflow: scroll;
 }
 samp {
   font-size: .83em;
diff --git a/static/styles/default.css b/static/styles/default.css
index ef558a8..f5774fe 100644
--- a/static/styles/default.css
+++ b/static/styles/default.css
@@ -118,8 +118,7 @@ dd pre {
   padding: 2px;
   background-color: #f7f7f7;
   line-height: 1.4em;
-  overflow: hidden;
-  text-overflow: ellipsis;
+  overflow: scroll;
 }
 blockquote {
   border-left: solid 2px #cccccc;

10 2018-12-24 21:14

>>9
Thanks for looking into this. I really should have noticed the text-overflow: ellipsis; line. Might I suggest a small change from overflow: scroll; to overflow: auto;? The difference in my Firefox is that auto makes the up/down arrows scroll the page when the keyboard focus is in the pre > code, while with scroll they do nothing in the same situation, so auto is more usable for me.

11 2018-12-24 21:48

>>10

scroll: This value indicates that the content is clipped to the padding box, but can be scrolled into view (and therefore the box is a scroll container). Furthermore, if the user agent uses a scrolling mechanism that is visible on the screen (such as a scroll bar or a panner), that mechanism should be displayed whether or not any of its content is clipped. This avoids any problem with scrollbars appearing and disappearing in a dynamic environment. When the target medium is print, overflowing content may be printed; it is not defined where it may be printed.
auto: Like scroll when the box has scrollable overflow; like hidden otherwise. Thus, if the user agent uses a scrolling mechanism that is visible on the screen (such as a scroll bar or a panner), that mechanism will only be displayed if there is overflow.

In addition to what you've mentioned, auto is, according to the standard, apparently better in this case because it hides the scrollbars when there is no overflow, while scroll leaves them always present. In netsurf and presumably other browsers they seem to have the same effect because browsers aren't following the standard. Sorry about that; overflow: auto is the better choice.

12 2018-12-25 12:41

http://ix.io/1wGr

Some preliminary observations:
- It is unclear to me why 5, 32 and #b11111 are hardcoded throughout.
- There is a mixed use of the term 'depth' both with and without multiplication by 5.
- The abstraction of extract-5 is violated in the operation call in radix-copy-path.
- Using the number of the bit group instead of the shift as parameter to extract-5 would have made your bit packing scheme quicker to grok for me, along with a comment stating that groups are numbered from 1 rather than 0 and that the number is the distance from a leaf rather than the root, effectively making it a reverse depth.
- In radix-copy-path the same new-radix is returned on both branches, so it can go after the 'if'. Then the branches can lose the 'begin's. Since the let body is a lambda and has an implicit 'begin', they can be removed entirely.
- copy-path-iter! is linear recursive rather than linear iterative, so the name is misleading. The same goes for some of the other helpers named *-iter.
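For a generic illustration of the distinction (not from your library):

```scheme
(define (length-recursive lst)
  ;; linear recursive process: the (+ 1 ...) is deferred until the
  ;; base case is reached, so pending work grows with the input
  (if (null? lst)
      0
      (+ 1 (length-recursive (cdr lst)))))

(define (length-iterative lst)
  ;; linear iterative process: all state lives in the accumulator,
  ;; so a tail-calling implementation runs in constant space
  (let iter ((rest lst) (acc 0))
    (if (null? rest)
        acc
        (iter (cdr rest) (+ acc 1)))))
```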

13 2018-12-26 20:04

>>12
Thanks so much for taking the time to give it a look over! Your comments made me realise some ways I can improve my programming in general (I'm apparently god awful at naming things especially), and I've started to implement your suggestions, but I haven't gotten to the two most major ones yet: undoing the hard-coding and rewriting my extract-5 into layer-position. I'll try to do that over the next few days. Other than starting this and adding some testing I've been mostly confined to reading and hanging out with my family over the holiday, so not much change on the programming front. I'll post my source again once I finish optimising the delete/insert functions and implement the two improvements.
Are you a different anon? Been up to anything neat this holiday season?

>>10
I saw your latest spiral on /lambda/. I really like the look of this one; I think it's my favourite aesthetically.

I've been reading The Unix-Haters Handbook for fun (I've enjoyed it thus far), and a couple of papers on parsers in preparation for my next project (a fast parser combinator library).

14 2018-12-27 01:07

>>13

Are you a different anon?

It's only been the two of us in this thread so far.

I saw your latest spiral on /lambda/. I really like the look of this one; I think it's my favourite aesthetically.

Thanks. Feel free to critique the code if you have cairo experience or see something that could be improved.

Some further observations:
- (extract-5 position end) is computed twice on the alternate branch of copy-path-iter!.
- The slice-length computation of copy-slice-append-iter contains an inlined version of (min 32 (- read-end read-position)). Making this explicit seems cleaner to me.
- Your code uses both (modulo read-position 32) and (logand position #b11111), which is a bit curious.
- A named let near the top level, like the *-iters, is perfectly fine. But named lets that are buried deep inside other computations, like largest-divisible-exponent-32 or the two versions of make-path, seem a bit excessive. They might be good candidates for splitting out into helper functions, like pow32? is split out.

These are still only superficial observations, mainly because your use of the continuation parameter of naive-append-vector is a nice trick that I still need to have a think on.
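For the record, the two agree for every non-negative position, since 32 is a power of two:

```scheme
;; #b11111 is 31, i.e. 32 - 1, so masking with it is taking the
;; remainder modulo 32 for non-negative positions
(define (group-mod position) (modulo position 32))
(define (group-mask position) (logand position #b11111))

;; (group-mod 33) => 1
;; (group-mask 33) => 1
```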

15 2018-12-27 08:58

I still have to finish my ncurses chan browser written in Scheme.
I got a new project idea though.
I recently bought a TV, and there's this app on it that lets me view images/videos like they're an album.
But I have to host a local server to provide the JSON files it requires for the images, titles, layout, etc.
I'll be using Guile, so I'll need to learn the web server module, and of course the app's JSON API itself.
As for the content to view though, idk exactly what yet.
Obviously I can just provide a way to see a folder, or even my filesystem entirely. But I'm also thinking of providing an interface to certain sites, like maybe 4chan's webm boards.

16 2018-12-27 15:24

>>13
I just noticed something strange about copy-slice-append-iter. Correct me if I'm wrong but it seems to me that, for every leaf vector except the first, each leaf vector is looked up twice, starting all the way from the root, once by the second vector-copy! in the current round, and again by the first vector-copy! in the next round. These two calls:
(radix-ref-depth read-radix (+ read-position 32) 5)
(radix-ref-depth read-radix read-position 5)
perform the exact same work across a pair of consecutive rounds, where read-position has been incremented by either 32 or the distance to read-end. I would think that avoiding such duplicate work at the algorithm level would rate higher than whether SIMD was available.
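Stripped to the bookkeeping, the cost difference looks like this (counting root-to-leaf descents only):

```scheme
(define (descents-with-double-lookup read-end)
  ;; each round fetches the current leaf and, when one exists, the next
  ;; leaf as well, so every leaf except the first is fetched twice
  (let loop ((position 0) (fetches 0))
    (if (>= position read-end)
        fetches
        (loop (+ position 32)
              (+ fetches (if (< (+ position 32) read-end) 2 1))))))

;; (descents-with-double-lookup 128) => 7 descents for a 4-leaf source,
;; versus the 4 needed when each leaf is fetched exactly once
```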

17 2018-12-27 17:25

>>14,16
I've now corrected everything you've mentioned except splitting out the functions from naive-append-vector!, removing the hard-coding, and the inefficiencies in copy-append! (you are completely correct there, and of course it matters more than SIMD being available). I'm not a particularly experienced programmer, I'm afraid to say, as you've no doubt uncovered from looking at my library here. I tried to look over your spiral code to help you a little, since you've helped me so much, but I've never done anything with graphics, Python, or objects, and I'm afraid I couldn't come up with anything which might be of use. I do really appreciate everything you've done for me, and maybe one day, when I have a broader set of knowledge or when you're working in my wheelhouse, I can give some sort of valuable suggestion. Anyway, don't feel obligated to continue helping me; you've done so much already and I completely understand if there is some better use of your time. If you are going to continue, or if you just want to see the impact of your changes thus far, here's an updated version: http://ix.io/1wVh
>>15
Those sound like pretty neat projects anon. I actually started on an image archiver for imageboards in Guile a while back, but it turns out that Guile (& GnuTLS) don't play well with OpenBSD, so I had to abandon it before I could even give it a test compile. It's not much (pretty rough if anything) but it might be of some use to you: http://ix.io/1wVg If you have any questions or you need help, I or someone else here might be able to assist you.

I finished The Unix-Haters Handbook last night. I used to be really hooked on the New Jersey stuff, but these days I'm really fond of "the right thing". It's nice to be able to laugh at how far you've come. Also, I think I will go ahead and try to optimise my radix tree a bit more. Now that, with the help of anon, I've abstracted out my extract-5 function in favour of a function that simply returns the correct position in the current vector, I can change the underlying function more easily, which makes it seem more reasonable to implement the relaxed variant of the radix tree (a sort of halfway house between the adaptive and the general): https://infoscience.epfl.ch/record/169879/files/RMTrees.pdf

18 2018-12-28 11:35

http://ix.io/1wVh

Some minor points:
- Please align the return of new-radix in copy-path-recur! with the 'if' block. Right now it's aligned as if it were a fourth argument to vector-set!. My python sense is tingling.
- There is an extract-5 in radix-delete for some reason.
- Reusing the name list->radix for the helper in (radix . elements) seems a bit inelegant to me.

And a note about radix->list. You are first testing whether the radix will be exhausted on this round and only after that whether the radix has already been exhausted. This is backward and your second case will never trigger. It is impossible that the radix will keep going after the current round, yet be exhausted already. Since you are already relying on the first case gracefully handling exhaustion by copying, listifying and appending an empty piece, you can simply dump the second case entirely rather than switching it to first.

However, my suggestion is to construct the modulo piece first, then write a simple loop from below the modulo downward to zero and append the partial result to the new piece. Since the radix is accessed with radix-ref-height the same amount of accessing work is done by iterating backward. However, this way you would be appending to a piece of constant length at each step, which is linear behavior overall, whereas in your current version each step appends to a piece of linearly increasing length, which is quadratic behavior overall.

Additionally, there is nothing forcing you to stick with vector->list and append!. You can have a helper deal with each new piece by consing a series of vector-refs backwards onto the result list, eliminating the need for vector->list and ditching append! and its import entirely.

I'm not a particularly experienced programmer

From your ease of use of named lets and the continuation trick of naive-append-vector!, I doubt that.

19 2018-12-28 20:08

>>18

There is an extract-5 in radix-delete for some reason.

I corrected this shortly after I posted that version; I guess I replaced those manually? Very odd.

Reusing the name list->radix for the helper in (radix . elements) seems a bit inelegant to me.

I actually should've moved that into list->radix a while ago; apply also has a bit of overhead that this avoids.

However, my suggestion is to construct the modulo piece first, then write a simple loop from below the modulo downward to zero and append the partial result to the new piece. Since the radix is accessed with radix-ref-height the same amount of accessing work is done by iterating backward. However, this way you would be appending to a piece of constant length at each step, which is linear behavior overall, whereas in your current version each step appends to a piece of linearly increasing length, which is quadratic behavior overall. Additionally, there is nothing forcing you to stick with vector->list and append!. You can have a helper deal with each new piece by consing a series of vector-refs backwards onto the result list, eliminating the need for vector->list and ditching append! and its import entirely.

This is an awesome suggestion!

(define (radix->list radix-struct)
  (define length (radix-length radix-struct))

  (define (vector->partial-list vector list)
    (let vector->partial-list-iter ((list list) (index 31))
      (if (= index 0)
        (cons (vector-ref vector index) list)
        (vector->partial-list-iter
          (cons (vector-ref vector index) list)
          (- index 1)))))

  (let radix->list-iter
    ((list
      (vector->list
        (vector-copy
          (radix-ref-height radix-struct length 1)
          0
          (modulo length 32))))
     (position
       (- length 32 (modulo length 32))))
    (if (< position 0)
      list
      (radix->list-iter
        (vector->partial-list
          (radix-ref-height radix-struct position 1)
          list)
        (- position 32)))))

From your ease of use of named lets and the continuation trick of naive-append-vector!, I doubt that.

Thanks anon.

20 2018-12-30 00:52

http://ix.io/1wVh

I just noticed that naive-append-vector! may only be called with a 32-aligned write-position, because it never increases the height of the write-radix on the else branch of the cond. This is fine since it's an internal function and its callers follow this restriction, but this should be stated explicitly. It should also be stated for copy-append! which inherits this restriction from naive-append-vector!.

In the same way, radix-copy-path may only be called with a position that is not 32-aligned at the end of the radix, and radix-set respects this. This is also fine because radix-copy-path is an internal function, but this restriction should also be stated explicitly. If the position did not respect this, radix-copy-path would malfunction. For example, when called with position 32 on a radix of size 32 and height 1, radix-copy-path will completely ignore the high bit and act exactly as for position 0, including the operation call. When called with position 64 on a radix of size 64 and height 2, radix-copy-path will access the uninitialized third element of the root vector, and pass this to vector-copy.

The problems start in radix-append which does not follow the radix-copy-path condition. It calls radix-copy-path with (radix-length radix-struct-write) even when it is 32-aligned, which can already trigger the problems pointed out above. To make matters worse, the final-vector-remainder computation yields 32 rather than 0 for a 32-aligned length, so the lambda will attempt to use its potentially bogus vector and position arguments for an actual copy. Therefore radix-append looks quite broken from here and needs both these issues fixed.

Can you confirm this chain of reasoning from your end?

21 2018-12-30 15:49

I just noticed that naive-append-vector! may only be called with a 32-aligned write-position, because it never increases the height of the write-radix on the else branch of the cond. This is fine since it's an internal function and its callers follow this restriction, but this should be stated explicitly. It should also be stated for copy-append! which inherits this restriction from naive-append-vector!.

Okay will do!

In the same way, radix-copy-path may only be called with a position that is not 32-aligned at the end of the radix, and radix-set respects this. This is also fine because radix-copy-path is an internal function, but this restriction should also be stated explicitly. If the position did not respect this, radix-copy-path would malfunction. For example, when called with position 32 on a radix of size 32 and height 1, radix-copy-path will completely ignore the high bit and act exactly as for position 0, including the operation call. When called with position 64 on a radix of size 64 and height 2, radix-copy-path will access the uninitialized third element of the root vector, and pass this to vector-copy.

I can't confirm this; I've tried calling radix-copy-path separately and it seems to function as expected. I think I tracked down the actual error to the way I'm calculating the remaining space in the current vector in final-vector-remainder. What was happening was that when (radix-length radix-struct-write) equaled 32, the function would take the modulo 32 of this, which is 0, and subtract it from 32, resulting in it saying there were 32 spaces left rather than zero. The following change to radix-append seems to have resolved things:

(define final-vector-remainder
    (- 31 (modulo (- (radix-length radix-struct-write) 1) 32)))
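A quick side-by-side check of the two formulas:

```scheme
(define (final-vector-remainder-old length)
  (- 32 (modulo length 32)))         ; yields 32 for 32-aligned lengths

(define (final-vector-remainder-new length)
  (- 31 (modulo (- length 1) 32)))   ; yields 0 for 32-aligned lengths

;; (final-vector-remainder-old 32) => 32, despite 0 spaces being left
;; (final-vector-remainder-new 32) => 0
;; (final-vector-remainder-new 33) => 31
```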

Thanks as always for the tips anon. I think I'll start the relaxed rewrite this afternoon, hopefully it goes well.

22 2018-12-30 22:06

>>21

I think I tracked down the actual error to the way I'm calculating the remaining space in the current vector in final-vector-remainder. What was happening was that when (radix-length radix-struct-write) equaled 32, the function would take the modulo 32 of this, which is 0, and subtract it from 32, resulting in it saying there were 32 spaces left rather than zero.

From >>20

To make matters worse, the final-vector-remainder computation yields 32 rather than 0 for a 32-aligned length

I can't confirm this; I've tried calling radix-copy-path separately and it seems to function as expected.

We can very easily resolve which one of us is right. Make a new radix of 64 elements. Append any other radix to it with the radix-append you posted. According to >>20 this will call vector-copy on the uninitialized third element of the root vector. In Guile this is immediately an error:

guile> (vector-copy (vector-ref (make-vector 32) 2))

Backtrace:
In standard input:
   3: 0* [vector-copy {#<unspecified>}]

standard input:3:1: In procedure vector-copy in expression (vector-copy (vector-ref # 2)):
standard input:3:1: Wrong type (expecting array): #<unspecified>
ABORT: (wrong-type-arg)
guile>

It would be very surprising if your Scheme variant was able to successfully call vector-copy on the default value that (make-vector 32) uses for the elements.

As stated in >>20 final-vector-remainder was merely one issue in radix-append. The other is the unconditional call to radix-copy-path.

23 2018-12-30 23:36

>>22
You're completely correct; I should really learn to read properly, sorry.

24 2019-01-02 03:54

I've made a little progress on the relaxed rewrite of the radix tree, and I started a new small project: allocating parliament seats proportionally based on a party-list Borda count. The election system is nearly done; I've got a paper I'm trying to decipher so that I can handle partial ballots, after which all I've got to do is clean up and document. I'll probably wait until it's complete before I post anything.
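The core tally itself is simple enough; stripped right down it's something like this (complete ballots only, no seat allocation, invented names):

```scheme
(define (borda-tally ballots party-count)
  ;; ballots are vectors of party indices ordered from most to least
  ;; preferred; a party receives (n - 1) points for a first place,
  ;; (n - 2) for a second, and so on down to 0 for a last place
  (define scores (make-vector party-count 0))
  (for-each
    (lambda (ballot)
      (let loop ((rank 0))
        (when (< rank party-count)
          (let ((party (vector-ref ballot rank)))
            (vector-set! scores party
                         (+ (vector-ref scores party)
                            (- party-count 1 rank))))
          (loop (+ rank 1)))))
    ballots)
  scores)

;; (borda-tally (list #(0 1 2) #(1 0 2)) 3) => #(3 3 0)
```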
I think I'm winding down a bit, or at least refocusing. I've been thinking about computational things for a while now, and honestly, as much as I find it interesting, like any great puzzle, I also find it a bit pointless. I was always a bit of a humanities guy at heart, so I'm going to try to find some ways to explore humanities topics (maybe something a bit deeper than an election system) with programming. I'm considering giving Mercury a look, or even just the logic programming section of SICP, as a humanities DSL. On the other hand maybe I'll just write a bit more and that'll lift my mood a bit, idk. That's about it for me for now.

Do you have any New Year's resolutions/plans?

25 2019-01-02 12:29

Happy New Year!

http://ix.io/1wVh

I just noticed that radix-slice is also broken, in the following ways. The 'end' parameter is used as an exclusive index, so the range test of (> end (radix-length radix-struct)) is correct, as is the copy-append! call. But this also means that radix-slice may be called with the length as 'end'. Since naive-append-vector! only increases the height of the radix on the (pow32? write-position) branch, a new radix of 32^k elements will be fully packed and have height k, where k>=1. The (= start 0) branch of radix-slice uses (= (vector-position end height) 0) to peel away levels from the radix and decrease the height. Therefore, a radix-slice call on a new radix of 1024 elements with start 0 and end 1024 will see that the peeling test passes and go down one level, because (vector-position 1024 2) is 0. On the next round it will make and return a new radix consisting solely of the first leaf vector of the source radix, with a height of 1 but claiming a length of 1024 elements. Radix-slice is similarly broken for all sources of size 32^k with k>=2, start=0 and end=length.

A second issue is that a radix-slice call on a new radix of 1024 elements with start 0 and end 32 will not pass the peeling test and will return a radix with 32 elements but height 2. This will break a subsequent naive-append-vector! because the (pow32? write-position) branch assumes the radix is fully packed and needs its height raised.
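One way to surface this class of bug early would be to assert the height invariant after every operation. A fully packed radix of 32^k elements has height k, so something like (hypothetical helper):

```scheme
(define (expected-height length)
  ;; smallest height whose capacity 32^height holds `length` elements
  (let loop ((height 1) (capacity 32))
    (if (>= capacity length)
        height
        (loop (+ height 1) (* capacity 32)))))

;; (expected-height 1024) => 2, so a slice claiming length 1024 with
;; height 1 would fail the check immediately; (expected-height 33) => 2
```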

26 2019-01-03 10:35

http://ix.io/1wVh

Radix-delete has a leftover extract-5 as pointed out previously. It also uses 'end' in the level peeling test inside find-head when it obviously meant to use 'start'. Find-head is a near verbatim copy of the one inside radix-slice, so it should obviously be factored out -- the global comments claim:

I'm trying quite hard to [...] B) factor out functions when they can be used in multiple situations

More to the point, the radix-delete fixed with 'start' and 'vector-position' is still broken in ways similar to radix-slice. For example, when called on a new radix with length in [33...1024] and height 2, with start=32 and end=length, it will not pass the level peeling test and it will return a radix with 32 elements but height 2. At that point >>25 applies:

This will break a subsequent naive-append-vector! because the (pow32? write-position) branch assumes the radix is fully packed and needs its height raised.

The other class of errors can be seen when called on a new radix of 1024 elements with both start and end set to length. The first level peeling test will pass and the next round will return a new radix consisting solely of the first leaf vector of the source radix, with a height of 1 but claiming a length of 1024 elements.

And as a matter of elegance, on the (= end (radix-length radix-struct)) branch the (- (radix-length radix-struct) (- end start)) computation in the make-radix call can be entirely replaced with start.

27 2019-01-04 10:29

http://ix.io/1wVh

Radix-insert is implemented in terms of radix-append and radix-slice, both of which have the problems pointed out previously. In addition it uses 'radix-struct' on the second branch which is an unbound name at that point, and was meant to be 'radix-struct-write'. I think this about covers the public methods.

28 2019-01-04 16:02

>>25,26,27
Happy New Year to you too anon. Thanks so much as always! Sorry for falling a bit behind; I was working on some other things for a moment and I forgot to respond to your posts. It seems the last several functions are in a very sorry state, but I'm unsure if I should focus on resolving the last few issues with this implementation or work more on the rewrite. I'm curious what you think of this, anon: the advantage of the rewrite is that the library gets O(log n) rather than O(n) append, delete, & insert, but this comes at the cost of consistent read performance, with radix-ref becoming 2x slower if the radix tree becomes unbalanced from a number of append, delete, & insert calls on it. That seems like it would make it both more difficult to reason about the performance of the radix tree and, in certain situations, even reduce performance, so I don't know if it's worth it or not.
Anyway, I ended up finishing my electoral system for now. The implementation seems solid, but I need to rework some of my ideas, I think. When I first started writing it I used only abstractions in the functions: for example, instead of dictating that ballots were vectors I just had wrapper functions around the vector functions to use on ballots, and instead of dictating that candidates were numbers I had variables like last-candidate and first-candidate, and functions like next-candidate or candidate-less-than etc. I decided this was a bit verbose, despite having the nice quality of being able to change out the representation, and scrapped it. Was this a bad choice? http://ix.io/1xv1
The only other plans I've got are to finish reading SICP (I got 500 pages in before I decided to do other things), even if I'm not going to do the remaining exercises yet, and some things not /prog/ related.

I hope you're doing well anon.

29 2019-01-04 21:54

>>27

I think this about covers the public methods.

I still can't believe you took this much time to correct my work; you're truly amazing. I think I'll reread your suggestions here to hopefully absorb them a bit more fully.

30 2019-01-05 03:53

>>28

I'm curious what you think of this

For a project that is not an externally imposed assignment you should work on whichever aspect holds your interest, provided that you do not advertise a project with outstanding issues as bug free to others.

it would make it both more difficult to reason about the performance of the radix tree, and in certain situations even reduce performance, so I don't know if it's worth it or not.

If one class of operations loses a small constant factor while another class of operations gains an order of growth improvement, the change is probably worth it. Having said that, you have an outstanding double lookup issue in the append operation since >>16 so performance issues seem to receive a rather mercurial weight.

I decided this was a bit verbose despite having the nice quality of being able to change out the representation and scrapped it. Was this a bad choice?

As a matter of principle, replacing an abstraction layer and inlining a particular implementation is indeed a bad choice. You might do that in a C accelerator module only after profiling shows that your abstraction layer specifically constitutes a performance bottleneck.

>>29

I still can't believe you took this much time to correct my work

I could only allocate a portion of my leisure time to this, which is why it took 13 days. It was still fast enough that there are several explained but still unresolved organization, efficiency and correctness issues.

31 2019-01-05 11:49

http://ix.io/1xv1

While I won't be touching the political stuff on a programming board, here are some programming observations:
- Both the 'down' and 'incomp' functions contain lambdas that repeat the same (vector-ref ballot party) lookup on each round. For the purposes of the lambdas this is a constant computation and should be done once, before the ballot-filter-sum invocations.
- Your *-sum function increments a 0-based accumulator by either 0 or 1 based on the suitability of each item. This is called a *-count.
- Both 'ballot-max' and 'ballot-filter-sum' are vector folds. Writing them in terms of a vector-fold would be cleaner.
- The 'tally' function contains an inlined vector-inc!, which could be made explicit.
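To make the first two points concrete, here is a toy sketch (my names, not the ones in the paste): the party's rank is constant across the whole traversal, so it is read once before the loop, and since the accumulator goes up by 0 or 1 per item, the function is a count.

```scheme
;; Toy version of the idea: count how many slots of 'ballot' rank
;; strictly below 'party'. The (vector-ref ballot party) lookup is
;; hoisted out of the loop because it is constant for the traversal.
(define (ranked-below-count ballot party)
  (let ((party-rank (vector-ref ballot party)))   ; computed once
    (let loop ((i 0) (count 0))
      (cond ((= i (vector-length ballot)) count)
            ((< (vector-ref ballot i) party-rank)
             (loop (+ i 1) (+ count 1)))
            (else (loop (+ i 1) count))))))

;; (ranked-below-count #(3 1 2) 0)  ;=> 2
```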

32 2019-01-05 16:53

>>30

For a project that is not an externally imposed assignment you should work on whichever aspect holds your interest, provided that you do not advertise a project with outstanding issues as bug free to others.

I'm very interested in doing the right thing if I can, and I'm certainly not interested in wasting such a valuable chance to improve myself as the one you've presented with your criticism. I think I'll throw on some music, grab some caffeine, and make a bit of a programming day out of starting through the backlog today.

If one class of operations loses a small constant factor while another class of operations gains an order of growth improvement, the change is probably worth it.
As a matter of principle, replacing an abstraction layer and inlining a particular implementation is indeed a bad choice. You might do that in a C accelerator module only after profiling shows that your abstraction layer specifically constitutes a performance bottleneck.

I think I knew both of these things, but I needed someone else to tell me so that I'd stop being lazy (that weird kind of lazy that's more work in the end) and just do them.

Having said that, you have an outstanding double lookup issue in the append operation since >>16 so performance issues seem to receive a rather mercurial weight.

I naively thought this issue, as well as a number of issues with the final few functions, would simply go away with the rewrite. When the tree is insufficiently balanced, I'm of course going to be doing all the same operations though.

I could only allocate a portion of my leisure time to this, which is why it took 13 days. It was still fast enough that there are several explained but still unresolved organization, efficiency and correctness issues.

I certainly could've done better at keeping up with your suggestions; I'll try to resolve that a bit today. Even spread over 13 days, your time is so valuable, and you're beautiful anon.
>>31

While I won't be touching the political stuff on a programming board, here are some programming observations

I probably should've simply removed the original blurb for the sake of posting here, sorry about that.

Both the 'down' and 'incomp' functions contain lambdas that repeat the same (vector-ref ballot party) lookup on each round. For the purposes of the lambdas this is a constant computation and should be done once, before the ballot-filter-sum invocations.
Your *-sum function increments a 0-based accumulator by either 0 or 1 based on the suitability of each item. This is called a *-count.
Both 'ballot-max' and 'ballot-filter-sum' are vector folds. Writing them in terms of a vector-fold would be cleaner.
The 'tally' function contains an inlined vector-inc!, which could be made explicit.

I finished all but correcting ballot-max and tally this morning; these I still need to consider. I can't find documentation on vector-inc!, and I'm trying to figure out how to get an index out of vector-fold nicely, or otherwise a different procedure to do it. I've also started adding back the abstractions and thought of a new weight function that I think might give more desirable results; I haven't added it yet however. Anyway, on to the radix library for the rest of the day. http://ix.io/1xzN Thanks as always anon.

33 2019-01-05 23:54

>>32

I can't find documentation on vector-inc!

There is no standard vector-inc! function that I know of. I simply meant your own function that would do as the name suggests. You are already using your Scheme variant's vector-for-each which does not take an index argument, in contrast with the SRFI-43 version: https://srfi.schemers.org/srfi-43/srfi-43.html#vector-for-each
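For concreteness, the whole function would be a one-liner along these lines:

```scheme
;; Increment slot i of vec in place. vector-set! returns an
;; unspecified value, so this is called purely for effect.
(define (vector-inc! vec i)
  (vector-set! vec i (+ 1 (vector-ref vec i))))

;; (define counts (make-vector 3 0))
;; (vector-inc! counts 1)
;; counts is now #(0 1 0)
```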

I'm trying to figure out how to get a index out of vector-fold nicely, or otherwise a different procedure to do it.

The idea is to use sufficient information as the fold accumulator to comfortably perform the computation, then wrap the fold in an extractor that gets that part of the accumulator that you wish to return. When the information to be returned is insufficient on its own to perform the fold, the standard technique is to make the accumulator a superset of the final return.
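A toy example of the technique (my names, not your vector-max): to return the index of a vector's maximum, fold with a (value . index) pair as the accumulator, a superset of the final answer, then extract the cdr.

```scheme
;; The accumulator carries both the best value and its index; only
;; the index is wanted, so the extractor is a plain cdr at the end.
(define (vector-max-index vec)
  (cdr
    (let loop ((i 1) (best (cons (vector-ref vec 0) 0)))
      (if (= i (vector-length vec))
          best
          (loop (+ i 1)
                (if (> (vector-ref vec i) (car best))
                    (cons (vector-ref vec i) i)
                    best))))))

;; (vector-max-index #(3 9 4))  ;=> 1
```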

http://ix.io/1xzN

It is unclear to me why the new 'ranked-parties-below' and 'incomparable-parties' take both a 'party' and a 'rank' argument. If the rank is computed externally, there does not seem to be a need for 'party'. If 'party' is passed in, the rank can be extracted locally, prior to the vector-count.

34 2019-01-06 04:08

>>33

There is no standard vector-inc! function, that I know of. I simply meant your own function that would do as the name suggests.

Oh that makes sense, sorry. I'll work on this in the morning.

You are already using your Scheme variant's vector-for-each which does not take an index argument, in contrast with the SRFI-43 version: https://srfi.schemers.org/srfi-43/srfi-43.html#vector-for-each

vector-for-each is an R7RS standard function, and SRFI-43 was replaced with SRFI-133 due to the standard breaking compliance with SRFI-43: https://srfi.schemers.org/srfi-133/srfi-133.html#vector-for-each That being said, it's still basically my Scheme variant's version because there aren't all that many implementations of R7RS :(

The idea is to use sufficient information as the fold accumulator to comfortably perform the computation, then wrap the fold in an extractor that gets that part of the accumulator that you wish to return. When the information to be returned is insufficient on its own to perform the fold, the standard technique is to make the accumulator a superset of the final return.

Ah, I did try this, it just felt a bit unsatisfying, like there was something really good lurking just below the surface. Oh well, I'll do this in the morning as well then.

It is unclear to me why the new 'ranked-parties-below' and 'incomparable-parties' take both a 'party' and a 'rank' argument. If the rank is computed externally, there does not seem to be a need for 'party'. If 'party' is passed in, the rank can be extracted locally, prior to the vector-count.

This is now fixed, it was just silliness as you might imagine.

I didn't make quite as much progress as I would've liked today, I know for a fact I introduced at least one new bug while resolving only 3 of your older suggestions and 2 I'd been meaning to do. I'll have to give it another go tomorrow: http://ix.io/1xBR

35 2019-01-07 04:04

>>34

vector-for-each is an R7RS standard function, and SRFI-43 was replaced with SRFI-133 due to the standard breaking compliance with SRFI-43

Right, you're using R7RS. I'm still living in the R5RS world. Thanks for the correction.

http://ix.io/1xBR

Some preliminary observations:
- The focus system seems designed to accelerate near-sequential lookups or iteration. Is this your trick? How was it chosen over, say, a doubly linked list running through the leaf vectors?
- The rationale behind the new-start computation of the let* of focus-set! is not immediately obvious to me. Could you explain a bit? Why is the multiplication factor 32 rather than node-size?
- I think a note would be appropriate somewhere around focus-set! that you are choosing to compute focus-start and focus-end as the maximum possible range of positions that the focus could cover, but those positions are not all guaranteed to actually exist in the current radix.
- Naive-append-vector! looks much more manageable now. I do have to wonder though about the make-radix call in the proc call of the else branch, when the let* has new-radix.
- As explained in >>20 about radix-copy-path, radix-ref-height must also not be called with a position that is not 32-aligned at the end of the radix. Radix->list-iter does not respect this, and in >>19 this was almost harmless because the radix was stateless and the vector-copy handled the empty slice. But now the radix is stateful so the modulo piece should become the empty list immediately when the modulo is 0, without a call to (radix-ref-height radix-struct length 1).
- The final-vector-remainder of radix-append regressed from >>21.

36 2019-01-07 04:19

Correction:

must also not be called with a position that is not 32-aligned at the end of the radix.

the second 'not' is to be elided:

must also not be called with a position that is 32-aligned at the end of the radix.

37 2019-01-07 16:35

>>35

The focus system seems designed to accelerate near-sequential lookups or iteration. Is this your trick?

Yep, the idea is to optimise for locality of reference by reducing indirection to recently accessed sub-radixes. I think this is what Scala does for its vectors as well.

How was it chosen over, say, a doubly linked list running through the leaf vectors?

It seems like a doubly linked list on the leaves would be better. It's fewer indirections to neighbours when the tree is tall, and you don't have to store height. Perhaps when doing concurrent unordered operations the focus system is fewer indirections, and that's why Scala did it this way? But why would you care about locality then? I'll probably redo it as a doubly linked list as you suggest.

The rationale behind the new-start computation of the let* of focus-set! is not immediately obvious to me. Could you explain a bit? Why is the multiplication factor 32 rather than node-size?

This was the mistake that was causing radix->list to vomit random data all over the place, nice!

I think a note would be appropriate somewhere around focus-set! that you are choosing to compute focus-start and focus-end as the maximum possible range of positions that the focus could cover, but those positions are not all guaranteed to actually exist in the current radix.

Okay done.

Naive-append-vector! looks much more manageable now. I do have to wonder though about the make-radix call in the proc call of the else branch, when the let* has new-radix.

Thanks, I think there is still a good bit of room for improvement there, the double make-radix is silliness as usual though, it's now fixed.

As explained in >>20 about radix-copy-path, radix-ref-height must also not be called with a position that is 32-aligned at the end of the radix. Radix->list-iter does not respect this, and in >>19 this was almost harmless because the radix was stateless and the vector-copy handled the empty slice. But now the radix is stateful so the modulo piece should become the empty list immediately when the modulo is 0, without a call to (radix-ref-height radix-struct length 1).

I haven't dealt with this yet but I will soon.

The final-vector-remainder of radix-append regressed from >>21.

Oops, I moved to an old save because I was working on the rewrite on the current branch. I really need some sort of version control.

Yesterday I just ended up working on the election system. I came up with a different weighting function than the one I thought would work two days ago, and I ended up resolving many of my problems by using fancy vector functions; the main thing left is to find an efficient means of resolving ballot-level ties: http://ix.io/1xJM I'll try to focus on the radix library today.

38 2019-01-07 19:01

>>37

Yep, the idea is to optimise for locality of reference by reducing indirection to recently accessed sub-radixes. I think this is what Scala does for its vectors as well.

I forgot to mention that the idea was to find a general optimisation for >>16; that's why I did this now.

39 2019-01-08 00:18

>>37

I'll probably redo it as a doubly linked list as you suggest.

This wasn't a design suggestion. I was merely curious about how this focus trick came to be. It's a nice trick.

http://ix.io/1xJM

- Ballot-max exists solely as a comment.
- I'm not sure about the direction of comparison in ranked-parties-below.
- On the consequent of (vector-every zero? results) you are dynamically computing the sum of 1 and 0, and doing it for each 0.

40 2019-01-08 04:47

>>39

This wasn't a design suggestion. I was merely curious about how this focus trick came to be. It's a nice trick.

Oh neat, I guess this is because of the cost of creating and maintaining the doubly linked list? That makes sense.

Ballot-max exists solely as a comment.
On the consequent of (vector-every zero? results) you are dynamically computing the sum of 1 and 0, and doing it for each 0.

Fixed and fixed.

I'm not sure about the direction of comparison in ranked-parties-below.

It's functioning correctly; I'm just really awful at naming things, which is probably why you said that.

I believe I've now corrected everything you've suggested but the hard coding, which I will likely put off until I want to implement either a set or a map using the same underlying library. http://ix.io/1xMb I've got a few more things I know of to do on this part before I consider either moving on to adding enough functionality to make it usable or continuing on the relaxed rewrite.

41 2019-01-08 12:50

http://ix.io/1xMb

- This is a nice upgrade, but there's really no need to mention me in the comments.
- Find-head now uses 'position' as an exclusive end index. This is fine for its purpose, but it also means it shouldn't be called with position=0.
- Radix-slice disallows find-head calls with end=0, although the second error names radix-delete.
- Radix-delete allows find-head calls with start=0, which is a problem both for find-head and because deleting everything would produce an empty radix. Also, the second error should be for a strict test, since equality means deleting nothing and should short-circuit to an identity operation.
- You have both (make-radix #() 0 0) and (make-radix #() 0 1).
- Radix-append's final-vector-remainder is 32 for a 32-aligned length. For a new radix of 64 elements radix-copy-path will be called with 63. This will copy the path to the second full leaf vector, then call the lambda with 31. This will then call vector-copy! at 32 on a 32-vector with copy count 32.

42 2019-01-09 05:47

>>41

This is a nice upgrade, but there's really no need to mention me in the comments.

If you'd actually like to be removed I can just change it to "anonymous contributor" and remove the context. I certainly don't want to remove the testament to the work you've done, even if only you and I know who it's referring to. It's just a means to give credit where it's due and express my appreciation for all you've done for me. The name spiral-anon reminds me to ask if you've had the time to work on anything for yourself lately?

Find-head now uses 'position' as an exclusive end index. This is fine for its purpose, but it also means it shouldn't be called with position=0.
Radix-slice disallows find-head calls with end=0, although the second error names radix-delete.
Radix-delete allows find-head calls with start=0, which is a problem both for find-head and because deleting everything would produce an empty radix. Also, the second error should be for a strict test, since equality means deleting nothing and should short-circuit to an identity operation.
You have both (make-radix #() 0 0) and (make-radix #() 0 1).
Radix-append's final-vector-remainder is 32 for a 32-aligned length. For a new radix of 64 elements radix-copy-path will be called with 63. This will copy the path to the second full leaf vector, then call the lambda with 31. This will then call vector-copy! at 32 on a 32-vector with copy count 32.

I think I've fixed or documented all of these now, along with a whole host of other changes. I'm kind of exhausted so I might have a slower day tomorrow: http://ix.io/1xRN
Other than this I watched a video lecture on Backus's "function level programming" which was pretty interesting.

43 2019-01-09 12:55

>>42

If you'd actually like to be removed I can just change it to "anonymous contributor" and remove the context.

It's your source code so you can mention whomever you want. I simply meant that it's not necessary to give credit for a few observations produced in my leisure time.

The name spiral-anon reminds me to ask if you've had the time to work on anything for yourself lately?

I've been trying to make a flower crown out of composing sine waves in cairo. It's not going well.

http://ix.io/1xRN

- I see you changed your mind about the 'proc' of naive-append-vector!. As one of the benefits, the copy-append-recur! helper is now iterative, as is vector->radix-recur.
- The final-vector-remainder of radix-append is 32 for a 32-aligned length. While the lambda no longer attempts to overflow its vector, the make-radix call receives a length that is incremented regardless of the lambda's decision, so the length does not match the no-copy case. The copy-append! also receives a read-start that does not match the no-copy case, so the first 32 radix-2 elements are skipped when (radix-length radix-1) is 32-aligned.
- In radix-append, the second argument of the make-radix call contains an inlined version of the fifth argument of the vector-copy! call, which should thus be computed once and reused.
- The second branch of radix-slice retains "or equal to", and the third branch test has an extra 0. The full-slice case should also be handled faster than via find-head.
- With the proliferation of (make-radix #() 0 0) calls, they should be factored out.
- The second branch of radix-delete retains "or equal to". A find-head call can still be triggered with start=0. The full-delete case should also be handled.
- With the advent of find-head it seems that a global decision should be made on whether an empty radix is a first-class radix or just an internal convenience targeting the first branch of naive-append-vector!. If it is decided that it is first-class, then all radix consumers should be prepared to receive it, among them radix-append.
- A radix-insert call with position=0 will cause problems for the current radix-append.

44 2019-01-09 23:45

>>42

http://ix.io/1xRN
fold, map, filter

- There seems to be a lot of (radix-length radix) here.
- It seems that the consequent of the quick exit test of radix-fold would also work for length 32.
- If you switch radix-fold-iter's leaf-length to (min 32 (- length-of-radix position)) you can avoid the modulo call.
- When you have an 'if' block with 'begin's on both branches, it's better to use a 'cond' with a test and an 'else', to avoid the 'begin's.
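A generic illustration of the last point, with a made-up branch body:

```scheme
(define x -5)  ; arbitrary example value

;; 'if' takes exactly one expression per branch, forcing 'begin':
(if (negative? x)
    (begin (display "negative") (newline))
    (begin (display "non-negative") (newline)))

;; 'cond' clauses accept a body of any length, so the 'begin's vanish:
(cond ((negative? x) (display "negative") (newline))
      (else (display "non-negative") (newline)))
```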

45 2019-01-10 16:15

>>41,42

I've been trying to make a flower crown out of composing sine waves in cairo. It's not going well.

That sounds like it will be really neat, I'm sorry it's not going well. I could do research-assistant-like jobs if you need anything looked into for this, maybe.

I see you changed your mind about the 'proc' of naive-append-vector!.

Yes, now that I'm just mutating the struct and was able to remove the storage variable, it seems silly to pass a function just to call it on one argument; honestly, the reason I had it that way in the first place is that I have a bit of an aversion to multiple return values when they're not wrapped up in a struct. This was one of the reasons I felt unsatisfied by the fold in 'vector-max' as well.

With the advent of find-head it seems that a global decision should be made on whether an empty radix is a first-class radix or just an internal convenience targeting the first branch of naive-append-vector!. If it is decided that it is first-class, then all radix consumers should be prepared to receive it, among them radix-append.

The plan is for it to be first-class, like with vectors and lists in scheme; I've been slowly adding the feature to functions over the past couple of days. radix-append has now been given this functionality as well.

I believe I've resolved all these issues (many this morning), and other than that it was a slow day as promised: http://ix.io/1xY0

46 2019-01-10 20:43

>>45
I've missed a few things actually.

47 2019-01-10 23:56

>>45

I could do research-assistant-like jobs if you need anything looked into for this, maybe.

Thanks, but this is a bit of an office contest type thing over here, so that wouldn't be fair.

I have a bit of an aversion to multiple return values when they're not wrapped up in a struct

Since the struct and an ad hoc representation like a list would be isomorphic as far as their information content is concerned, I take it that this is an aesthetic preference type of aversion.

I've been slowly adding the feature to functions over the past couple days
(many this morning)
I've missed a few things actually

Obviously there is no obligation to fix any particular issue in any particular version, such as the next one to be posted. Observations may even be overruled and entirely discarded since, again obviously, the code writer is the decision maker.

48 2019-01-11 11:13

http://ix.io/1xY0

- Why call internal-radix-constructor with hardcoded -1 for null-radix when make-radix handles this via undef? If you want null-radix to be a singleton rather than a constructor you can move it after make-radix.
- (radix->vector-iter result length length) in radix->vector-iter is simply result.
- There's a lot of (vector-length vector) recomputation in vector->radix.
- The leaf-length computation of vector->radix-iter duplicates work already done in vector-copy-32, which could return the vector and length as a cons.
- radix-copy-path calls 'proc' with the output of vector-position, which has a range of [0...31], so the 'unless' test of radix-append needs another look.

49 2019-01-11 17:51

>>47

Thanks, but this is a bit of an office contest type thing over here, so that wouldn't be fair.

Oh man, that sounds like fun, good luck then anon.

Since the struct and an ad hoc representation like a list would be isomorphic as far as their information content is concerned, I take it that this is an aesthetic preference type of aversion.

Yep, I think record-types tend to be implemented as vectors, but it's really just some scars from learning C early on. (you've seen many of them)

Obviously there is no obligation to fix any particular issue in any particular version, such as the next one to be posted. Observations may even be overruled and entirely discarded since, again obviously, the code writer is the decision maker.

Of course, it does seem like good practice to prioritise bug fixes when they're first discovered, so they don't end up confusing you at a higher level of abstraction. This actually reminded me that at some point I need to get a bit better about writing tests. On overruling your observations: they've all just been completely correct. Of the two or three times I considered it, one or two were due to my misunderstanding, and the other was me focusing on micro-optimisations instead of building up an abstraction that would let me get a big-O gain later down the line.
>>48

There's a lot of (vector-length vector) recomputation in vector->radix.

I'm really bad about this; it's a pretty low cost because scheme has enforced vector bounds, but that doesn't mean it's free. It's also just not very aesthetic.

The leaf-length computation of vector->radix-iter duplicates work already done in vector-copy-32, which could return the vector and length as a cons.

ugh... okay done. (it's a good catch honestly)

radix-copy-path calls 'proc' with the output of vector-position, which has a range of [0...31], so the 'unless' test of radix-append needs another look.
Why call internal-radix-constructor with hardcoded -1 for null-radix when make-radix handles this via undef? If you want null-radix to be a singleton rather than a constructor you can move it after make-radix.
(radix->vector-iter result length length) in radix->vector-iter is simply result.

fixed, fixed, & fixed. Along with the remainder from your last post. http://ix.io/1y3G

50 2019-01-12 05:29

>>49
I didn't work much on the library today other than the bug fixes this morning, and I think I'm going to take a bona fide break tomorrow from programming generally. I did finally get around to responding to the Cartesian Product thread though, and I think my solution turned out pretty nice.

51 2019-01-12 11:37

http://ix.io/1y3G

- Radix-append looks good now. One thing I think has become unnecessary is the (> length-1 0) part of the final-remainder test, since empty radices are handled entirely separately.
- If you write a vector-fold that uses a vector and a usable length, the leaf-* helpers from "Iterators" become vector folds and they only need the accumulating lambda, without repeating the iteration code.
- Similarly, the *-iter helpers from "Iterators" are all instances of a level-1-fold that provides the level-1 vectors and their usable length to the accumulating function.
- Are you sure you want to use 'value' as the next accumulator for leaf-cumulate-iter rather than the return of 'proc'?
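A sketch of the second point, with names of my own choosing: a fold that takes the vector and a usable length, so each leaf-* helper shrinks to just its accumulating lambda.

```scheme
;; Fold 'proc' over the first 'len' slots of 'vec'; the leaf vectors
;; are always 32 wide, but only 'len' slots may be in use.
(define (vector-fold-n proc acc vec len)
  (let loop ((i 0) (acc acc))
    (if (= i len)
        acc
        (loop (+ i 1) (proc acc (vector-ref vec i))))))

;; e.g. summing a partially used leaf:
;; (vector-fold-n + 0 #(5 6 7 0 0) 3)  ;=> 18
```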

52 2019-01-12 12:35

>>50

I did finally get around to responding to the Cartesian Product thread though, and I think my solution turned out pretty nice.

While I completely agree with your assessment, you are returning the tree, not the list:

guile> (cartesian-product '(1 2 3) '(4 5 6) '(7 8 9))
((((1 4 7) (1 4 8) (1 4 9)) ((1 5 7) (1 5 8) (1 5 9)) ((1 6 7) (1 6 8) (1 6 9))) (((2 4 7) (2 4 8) (2 4 9)) ((2 5 7) (2 5 8) (2 5 9)) ((2 6 7) (2 6 8) (2 6 9))) (((3 4 7) (3 4 8) (3 4 9)) ((3 5 7) (3 5 8) (3 5 9)) ((3 6 7) (3 6 8) (3 6 9))))

and this is your call pattern:

(define (cartesian-product . An)
  (let cartesian-product-iter
    ((values (lambda (x) x)) (An An))
    (simple-format #t "c-p-i ~A" An)
    (newline)
    (if (null? An)
      (values '())
      (map
        (lambda (x)
          (cartesian-product-iter
            (lambda (y) (values (cons x y)))
            (cdr An)))
        (car An)))))

guile> (cartesian-product '(1 2 3) '(4 5 6) '(7 8 9))
c-p-i ((1 2 3) (4 5 6) (7 8 9))
c-p-i ((4 5 6) (7 8 9))
c-p-i ((7 8 9))
c-p-i ()
c-p-i ()
c-p-i ()
c-p-i ((7 8 9))
c-p-i ()
c-p-i ()
c-p-i ()
c-p-i ((7 8 9))
c-p-i ()
c-p-i ()
c-p-i ()
c-p-i ((4 5 6) (7 8 9))
c-p-i ((7 8 9))
c-p-i ()
c-p-i ()
c-p-i ()
c-p-i ((7 8 9))
c-p-i ()
c-p-i ()
c-p-i ()
c-p-i ((7 8 9))
c-p-i ()
c-p-i ()
c-p-i ()
c-p-i ((4 5 6) (7 8 9))
c-p-i ((7 8 9))
c-p-i ()
c-p-i ()
c-p-i ()
c-p-i ((7 8 9))
c-p-i ()
c-p-i ()
c-p-i ()
c-p-i ((7 8 9))
c-p-i ()
c-p-i ()
c-p-i ()
53 2019-01-12 14:51

>>50
Shouldn't it return a tree so that you can keep track of where each product belongs? Or should that just be done by the relative position of the products in a list?
That call pattern just confuses me even more. There were cases where, when the data was small to medium sized, the memoized function was even slower than the non-memoized version, and when the input was large they were equivalent; I can confirm that the memoize function is functioning as expected. (although due to the curried function I'm not actually memoizing the correct values)
>>51

If you write a vector-fold that uses a vector and a usable length, the leaf-* helpers from "Iterators" become vector folds and they only need the accumulating lambda, without repeating the iteration code.
Similarly, the *-iter helpers from "Iterators" are all instances of a level-1-fold that provides the level-1 vectors and their usable length to the accumulating function.

Oh man, that's super sweet, I was thinking I was going to have to use macros for this for some reason.

Radix-append looks good now. One thing I think has become unnecessary is the (> length-1 0) part of the final-remainder test, since empty radices are handled entirely separately.
Are you sure you want to use 'value' as the next accumulator for leaf-cumulate-iter rather than the return of 'proc'?

Anyway, I'll make good on my promise to myself and work on these tomorrow. Thanks for the help as always though, anon.

54 2019-01-13 00:22

>>53

Shouldn't it return a tree so that you can keep track of where each product belongs?

You can return whatever you like, but this way you are shifting the burden of tree iteration onto the consumer of your output. It also seems curious to state how you "avoid any sort of reverse or append of the lists" when you produce a tree.
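For comparison, a sketch of one flat-list version; it does pay one reverse per tuple, which the curried lambdas avoid, but the consumer receives a plain list.

```scheme
;; Depth-first walk threading a single accumulator, so the result is
;; one flat list of tuples in lexicographic order, with no append.
(define (flat-cartesian-product . lists)
  (let recur ((lists lists) (tuple '()) (acc '()))
    (if (null? lists)
        (cons (reverse tuple) acc)      ; one finished tuple
        (let loop ((items (car lists)) (acc acc))
          (if (null? items)
              acc
              (recur (cdr lists)
                     (cons (car items) tuple)
                     (loop (cdr items) acc)))))))

;; (flat-cartesian-product '(1 2) '(3 4))
;; ;=> ((1 3) (1 4) (2 3) (2 4))
```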

That call pattern just confuses me even more

It shows that you are making 27 leaf calls to produce a collection of 27 items, so there's nothing to memoize since the lambdas contain information essential to the return value.

55 2019-01-13 04:16

You can return whatever you like, but this way you are shifting the burden of tree iteration onto the consumer of your output. It also seems curious to state how you "avoid any sort of reverse or append of the lists" when you produce a tree.

Ah, well I guess I was trying to return a value for a human rather than for another function. My naive implementation used a reverse rather than a curried function, and append was mentioned in the thread by a reply which I misinterpreted as attempting to return the same thing I was.

It shows that you are making 27 leaf calls to produce a collection of 27 items, so there's nothing to memoize since the lambdas contain information essential to the return value.

Yes, I understand that, but I'm even more confused than before because the (timed) performance is equivalent to the memoized version despite this.

56 2019-01-13 11:02

>>55

but I'm even more confused than before because the (timed) performance is equivalent to the memoized version despite this.

Given your claim of producing a collection of 2000^5000 items, which dwarfs the number of elementary particles in our universe, your methodology is more than a little suspect.

57 2019-01-14 02:24

>>56

Given your claim of producing a collection of 2000^5000 items, which dwarfs the number of elementary particles in our universe, your methodology is more than a little suspect.

I think one of the primary things holding me back at the moment is not adequately testing things; it seems both my test itself and one of my functions went completely unchecked. I'll make this a priority. Also, it seems my day off turned into two; I'll focus up tomorrow.

58 2019-01-15 03:07

>>57

I'll focus up tomorrow.

I lied, perhaps three's the lucky number.

59 2019-01-16 01:18

It wasn't a very productive day but at least I did something: http://ix.io/1yqP
Other than this, the only thing I did today was start to lay out a plan for migrating everything into emacs. I think I'll spend the remaining hours of the day reading, though.

60 2019-01-17 04:01

>>59
Another slow day, but the small changes made today should mean that the library can run on every standard-compliant R7RS implementation, which is nice. http://ix.io/1yvE
I also finished skimming my implementation's manual for implementation-specific behaviour in standard-compliant functions, mostly reading about supported SRFIs today. I also got about halfway through the revised report, which I'll hopefully be able to finish tomorrow morning before I go out of town for a few days.

61 2019-01-18 12:33

http://ix.io/1yvE

- If null-radix? exists it might as well be used in radix-append.
- Gen-leaf-fold is not a leaf fold, it doesn't use 'leaf' at all. It folds an inlined iota. Since 'proc' will inevitably use 'pos' to extract a leaf item, and since the purpose of the fold is to factor out common work, this extraction should be in the fold and the item passed to 'proc'.
- Similarly, the level-1 fold also folds an inlined iota. It should extract the level-1 vector from the radix and pass it to 'proc-2'. It should also remove its current return asymmetry by ditching 'proc-1' and the first branch entirely.
- The lambda of leaf-for-each should return 'acc' to suppress the return of 'proc'.
- The 'begin' in the lambda of leaf-map is pointless.
- Once the level-1 fold stabilizes it should be used to drastically simplify radix->vector.
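
The fold shape suggested in the second point can be sketched like this (Python used for illustration only; the names mirror the Scheme ones but are hypothetical):

```python
# Hypothetical sketch: the fold owns both the position iteration and the
# leaf-item extraction, and hands the already-extracted item to `proc`,
# so callers no longer repeat the extraction themselves.
def gen_leaf_fold(proc, acc, leaf):
    for pos in range(len(leaf)):
        item = leaf[pos]          # extraction factored into the fold
        acc = proc(acc, pos, item)
    return acc

# e.g. summing the items of a leaf:
total = gen_leaf_fold(lambda acc, pos, item: acc + item, 0, [1, 2, 3])
```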

62 2019-01-22 18:19

>>61
Sorry for not replying, I was quite busy these last few days. I ended up finishing the standard the other day but I haven't done any programming other than writing some emacs configuration which is what I'll be continuing to do today.

63 2019-02-12 01:57

>>62
I saw your flower crown; it's quite beautiful and certainly a step up in complexity. It alone should've been reason to post here, and I'm sorry about that. I've only been reading non-programming books since my last post, but I'll take a break to do some programming after I finish the Iliad later this week. I hope you're doing well, spiral-anon.

64 2019-02-12 10:54

>>63

I saw your flower crown, it's quite beautiful

Thanks, Anon.

I hope you're doing well spiral-anon.

I am, thanks. I hope you're doing well, too.

65 2019-02-13 02:13

>>64

I am, thanks. I hope you're doing well, too.

That's good; I'm also doing quite well.

66 2019-02-15 22:13

I finished the Iliad and I'm going to move on to my next book, but I'm also going to try to be a bit more balanced about things. I wrote a different weighting procedure that solved the compromising and burying problems I was having, although I discovered that it really encourages party splitting, so I need to look into that. I tried to wrap my head around how to memoize the cartesian product function I made, to no avail. Most importantly, I implemented the suggestions you mentioned in >>61, except for the radix->vector one, because I didn't get around to it today, and the radix-fold one, because I'd like it to return #<undef> rather than the accumulator and I don't know how to do that yet. I broke radix-map in the process, though, and I still haven't used level-1 and gen-leaf-fold to their full potential. Not a massively productive day, but it's a day where I had lots of free time to work on my other studies and I got something done in CS, which I'm pretty happy about. http://ix.io/1B95 (oh, also some formatting changes due to emacs wizardry)

67 2019-02-16 21:21

>>66
oops, I see what broke radix-map now, carelessness.

68 2019-02-17 00:48

http://ix.io/1B95

- You can't return the vector-set! from leaf-map's lambda. That would lose the acc, which is the new (make-node).
- A function having an undefined return value means that the return is not specified, so the caller may not assume anything about it. It does not mean that the function guarantees to return anything in particular, like #<unspecified> or #<undef> -- that would actually make it have a defined return value.

69 2019-02-17 15:17

>>68

You can't return the vector-set! from leaf-map's lambda. That would lose the acc, which is the new (make-node).

I found this one, I was just being careless.

A function having an undefined return value means that the return is not specified, so the caller may not assume anything about it. It does not mean that the function guarantees to return anything in particular, like #<unspecified> or #<undef> -- that would actually make it have a defined return value.

Yes, I know; I'd just rather be consistent with the surrounding system if at all possible (it might not be possible without using implementation-specific extensions).

Mostly just aesthetic changes (I think I'm done with these) and fixing radix-map yesterday: http://ix.io/1Bix

70 2019-02-18 00:58

http://ix.io/1Bix

- Leaf-cumulate-iter continues to suffer from the last point of >>51.
- Leaf-cumulate is an instance of gen-leaf-fold with a pair holding the 'proc' output and the new-leaf as accumulator, as seen on the alternate branch. There's no need to repeat the iteration code.
- Radix-cumulate-iter and its containing 'if' are ripe for replacement by level-1-fold.

71 2019-02-18 01:00

http://ix.io/1Bix

Radix-fold-right can be unified with the normal fold by writing a function that folds an extracting map over a range iteration. A range would have start, step and end values, and would support negative steps. The extractor would receive the source object and the current index. The folding procedure would receive the accumulator, the current index and the extractor output. These steps would be combined to avoid the need to store an intermediate map, since map is eager in Scheme. This would make the normal and reversed folds differ only in the upward and downward ranges. It will also allow the unification of gen-leaf-fold and level-1-fold, since both are instances of a range-extract-fold with different ranges and extractors. You would also get some new things for free, like a radix-reversed-for-each.
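
The suggested range-extract-fold might look something like this sketch (Python used purely for illustration; every name here is hypothetical, not from the library):

```python
# Hypothetical sketch of a range-extract-fold: fold `proc` over items
# extracted from `source` at each index of a range that supports
# negative steps, combining range iteration, extraction and folding
# without storing an intermediate map.
def range_extract_fold(proc, acc, extract, source, start, stop, step):
    i = start
    while (step > 0 and i < stop) or (step < 0 and i > stop):
        acc = proc(acc, i, extract(source, i))
        i += step
    return acc

# The normal and reversed folds then differ only in their ranges:
data = ['a', 'b', 'c']
fwd = range_extract_fold(lambda acc, i, x: acc + [x], [], lambda s, i: s[i],
                         data, 0, len(data), 1)
rev = range_extract_fold(lambda acc, i, x: acc + [x], [], lambda s, i: s[i],
                         data, len(data) - 1, -1, -1)
```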

72 2019-02-18 01:03

http://ix.io/1Bix

Radix-filter has two interleaved responsibilities. It produces a stream of items by filtering, and it consumes the stream with a radix builder. If you separate production and consumption, and interleave them in a function that links up a producer and a consumer, the radix building consumer becomes a fold. This can be written once and used to unify radix-filter, vector->radix and list->radix by using different producers. If you choose this path, copy-append! will also belong here by using an appending radix builder, which you already have. Then >>16 can be addressed by a generic buffering producer that wraps the real producer. The producer+consumer design can then be unified with range-extract-fold by making the fold a consumer and range-extract a producer.
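
The producer/consumer split could be sketched like so (again in Python with hypothetical names; a plain list builder stands in for the radix builder):

```python
# Hypothetical sketch: separate the stream production (filtering) from
# the stream consumption (building), and link them in a fold.
def filtering_producer(pred, items):
    # Produce only the items satisfying the predicate.
    for x in items:
        if pred(x):
            yield x

def fold_consumer(proc, acc, producer):
    # Consume any producer by folding its items into an accumulator.
    for x in producer:
        acc = proc(acc, x)
    return acc

# "radix-filter" becomes filtering producer + building consumer
# (a list builder stands in for the radix builder here):
evens = fold_consumer(lambda acc, x: acc + [x], [],
                      filtering_producer(lambda x: x % 2 == 0, range(10)))
```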

73 2019-02-18 23:36

>>70-72
Oh man! Looks like I've got a programming day in my future! The past two days I started to clean up my init.el, read a bit about literate programming, and implemented a little program to convert between HSL and RGB and to measure contrast between two RGB colors: http://ix.io/1Brq I might look into generating little color swatches at some point. The goal with these random programs is to help me make a pseudo-scientific (knowledge of rods/cones & some studies) color palette to avoid eye strain.

74 2019-02-21 13:09

>>73
The past few days I've only been doing 'dev ops' (I really hate this kind of thing), doing some research on type design, and researching color processing further:

;; Primarily based on doi:10.1167/11.5.8 and the meta-analysis done there
;; on the state of the art. They come to the conclusion that .2 degrees is
;; the critical point for fonts, that is to say, the smallest size at which
;; the majority of readers can read at maximum speed. In addition to this
;; they reviewed the fonts of print publications, discovering the majority
;; to lie within the .22 to .25 degree range.

;; Programmers have several unique considerations when it comes to typography.
;; Due to the nature of programming, having more context on screen at a time
;; reduces the amount of scrolling and searching necessary to understand any
;; particular function. For this reason many programmers prefer smaller
;; typefaces than they use in their other applications. Smaller typefaces are
;; more legible when they have larger x-height ratios, something programmers
;; also tend to like even though they don't necessarily recognise it.
;; Programmers also tend to use monospace fonts, which make the structure of
;; programs clearer.

;; The last thing of note is the history of digital type design, and an
;; associated complaint. Digital type design was historically done on a
;; raster grid, unlike its free-flowing print cousin, due to the resolution
;; of the screens and the physical capabilities of computers. Even vector
;; fonts designed for digital usage, the so-called 'web safe fonts' such as
;; Verdana, Arial, & Georgia, were done in this way. Additionally, digital
;; typefaces, even those used for body text, tend to be sans-serif, likely
;; for the same reason: the resolution restrictions of older displays. That
;; being said, many programmers today still prefer fonts designed in this
;; style, and even bitmap fonts, due to the increased clarity they offer and
;; the reduction in anti-aliasing, which some individuals get eye strain
;; from.

(define (ideal-font-size x-height-ratio inches-from-screen
			 software-dpi physical-dpi)
  (let* ((ideal-x-height-radians (/ 3.141592 900)) ; .2 degrees
	 (ppi-scaling (/ software-dpi physical-dpi))
	 (points-from-screen (* inches-from-screen 72)))
    (/ (* ideal-x-height-radians points-from-screen)
       ppi-scaling x-height-ratio)))
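
As a sanity check, the same computation in Python; the sample parameter values below are my own assumptions, not from the post:

```python
import math

# Port of ideal-font-size above, for checking the arithmetic only.
def ideal_font_size(x_height_ratio, inches_from_screen,
                    software_dpi, physical_dpi):
    ideal_x_height_radians = math.pi / 900        # .2 degrees
    ppi_scaling = software_dpi / physical_dpi
    points_from_screen = inches_from_screen * 72  # 72 points per inch
    return (ideal_x_height_radians * points_from_screen
            / (ppi_scaling * x_height_ratio))

# Assumed example: a 0.5 x-height ratio viewed from 24 inches on a display
# whose software and physical DPI agree lands near a familiar 12pt size.
size = ideal_font_size(0.5, 24, 96, 96)
```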
75 2019-02-23 21:48

>>74
I finished up my typographic research hoping to be able to make more small functions like the one in my previous post; unfortunately, the studies I found were flaky and imprecise. I'll either conduct a little bit of my own research or I'll start trying to implement the functions necessary to generate the color scheme, as I mentioned was my original intention: http://ix.io/1BS2

76 2019-03-02 03:28

>>75
Most of my computer time has unfortunately been spent getting ready to migrate operating systems and reconfiguring things, but I have managed to get a few things done. I made a corrected contrast function: http://ix.io/1zKq and decided on abstractly the hues and contrast levels I want, but haven't picked specifics. The theory of color I had before was a bit absurd so that went out the window. I've also been reading 'Computers & Typesetting, Volume C' (The Metafont Book) with hopes of perhaps making a typeface but haven't made a dramatic amount of progress.

77 2019-03-19 16:15

>>76
I've completed The Metafont Book, and made an exceedingly poor implementation of my Emacs theme, but other than this I don't believe I've done anything even computationally related.

78 2019-04-26 18:32

>>70-72
I decided to return to the radix tree. I've finally gotten around to correcting >>70; I should be able to get 71 and 72 done within a few days.

79 2019-04-26 18:33

>>71,72
These two seem especially brilliant to me, I can't believe I've put them off this long.

80 2019-06-22 21:01

>>79
Just to give a final update I've been forced to abandon my project due to a combination of university dominating my time, and a realization that this data-structure isn't well suited to the task I had in mind. I might revive it one day, but that would only happen if I found myself in a situation where the data-structure appeared particularly useful.

81 2019-07-07 14:28

I've been working on my own fork of this site on and off for a bit. I haven't made much progress besides adding quotelinks to posts in external threads.

82 2019-07-16 18:25

>>81

I've been working on my own fork of this site on and off for a bit. I haven't made much progress besides adding quotelinks to posts in external threads.

That's cool. I was thinking it would be neat to make a more standard compliant version using SRFI 106 and whatnot, but I'm low on time currently.

83 2019-07-22 13:06

hello, test.

84 2019-07-26 08:14

I am working on cooking perfect rice that ain't sticking to the pan, and it doesn't work out. HELP ME PLEASE

85 2019-07-27 05:49

If you can't afford a rice cooker then the microwave is the way to go.

86 2019-07-29 08:43

>>84
Make sure water:rice ratio is correct for the type of rice you're cooking, long grain is 2:1 and jasmine is 1.5:1. Add salt, bring the rice to a boil without the lid, set to low heat, place lid on the pot, cook for 18 minutes, turn off heat and set aside without removing the lid for 10 minutes, fluff rice with a fork and enjoy.

87 2019-08-15 04:52

>>82
That would be neat to see. I'm not very familiar with the standards myself, but I've been working on and off on my fork, and maybe I'll post it here eventually once I've made enough progress that it no longer feels like a straight up clone of this site.

88 2019-08-21 00:15

Despite knowing nothing more than a few basic bits of python, I’m going to try to make some sort of script to make configuring the wifi on my slackware laptop easier. Just find and replace essids and passwords, since conky breaks and I don’t like network manager. How hard could it be?

89 2019-08-31 14:21

I've been working on a website, https://tiblar.com. It's sort of just a fun project to work on but I'm pleased with how it's turning out. I didn't know much about UI/web design before I made it.

90 2019-10-06 11:11

3d Game engine using gerbil scheme

91 2019-10-06 15:07

>>89 Looks nice except :

You should create an account Click here to join

bruh

92 2019-10-06 15:16

String template interpolator in Clojure.

93 2019-10-06 17:12

I tried programming soft body simulation from scratch. A few days later, I regret it.

94 2019-10-06 18:20

Sore all over?

95 2019-10-16 03:10

dump

96 2020-02-17 00:15

To view the thread list in update order regardless of VIPs:

(() => {
  const tbody = document.getElementsByTagName ("tbody") [0]
  const rows  = Array.from (tbody.children)
  const upd   = e => e.children [3].children [0].textContent
  rows.sort ((a, b) => { const u = upd (a), v = upd (b); return u < v ? 1 : u > v ? -1 : 0; })
  tbody.innerHTML = ""
  rows.forEach (e => tbody.appendChild (e))
}) ()

To return to the VIP order reload the page.

97 2020-02-19 12:56 *

>>96
This should be added to the "collection" of userscripts.

98 2020-02-20 00:28

To change single post quote links inside full thread views from separate single post views to local scroll jumps within the thread, while leaving other links untouched:

Array.from (document.getElementsByTagName ("a")).filter (e => e.hasAttribute ("href")).forEach (e => {
  const h = e.getAttribute ("href")
  const s = h.replace (/^\/([^\/]+)\/(\d+)\/(\d+)$/, "/$1/$2#t$2p$3")
  if (h != s) { e.setAttribute ("href", s); }
})

The operation is idempotent. To get the previous behavior back reload the thread.

99 2020-02-20 06:44

>>96
The sexp API also gives you the thread list in update order: http://textboard.org/sexp/prog/
But it's not really human readable, you need a scheme client first.

100 2020-02-27 02:09 *

>>98
Thanks for saving me from having to do this myself.

101 2020-02-27 15:08

>>97

This should be added to the "collection" of userscripts.

Done.
>>96 and >>98 were added to the list of userscripts:
http://textboard.org/prog/preferences
http://textboard.org/static/userscripts/

102 2020-03-03 09:02 *

My suicide note.

103 2020-03-03 11:33

>>102
Read
https://textboard.org/prog/69
and
https://textboard.org/prog/28

104 2020-03-03 17:47

>>102
Share the note here and do not commit suicide.

105 2020-06-13 23:05

>>96 >>98

https://textboard.org/prog/39#t39p128

I do hope that, in those circumstances, someone will make a script replacing the body of any post with a trip with "I am a child seeking attention". That's the right thing to do.

https://textboard.org/prog/39#t39p217

There's already an ircd running (in a jail) but I am really not sure it's a good idea so I'm keeping that for later. I like anonymous posting better than pseudonymous circlejerking.

https://textboard.org/prog/81#t81p62

there are 3 new optional dotted pairs in the threads alists. They're tagged 'name, 'tripcode, and 'e-mail. For posting the new fields are agnomen (name) and inscriptio (e-mail). Tripcodes work as everywhere else with a # character in the name field between name and capcode.

Array.from (document.querySelectorAll ("dl > dt")).filter (e => />\u25C6[0-9a-zA-Z\/+]{10}</.test (e.innerHTML)).forEach (e => { e.nextElementSibling.innerHTML = "<p>I am a child seeking attention.</p>"; })

(preliminary version)

106 2020-06-16 17:54

Some anon in the MIT pattern matching thread was mulling over a pattern matching that preserves the abstraction layers. It reminded me of when I was trying to do pattern matching in Java using the visitor pattern. Maybe in Scheme it would be possible to do it nicely?

I've put together a prototype in Guile using GOOPS. It lacks everything that would make it practical, but it preserves the abstraction layer. When matching, the object receives a `visitor' in the form of a curried function and it can decide what kind of information it wants to feed it.

;;; 20% solution to encapsulated pattern matching using Guile's GOOPS.
;;;
;;; Released to the public domain.
;;;

(use-modules (oop goops)
             (oop goops describe))

;; For this to work, every class you want to match on will have to
;; implement its own `accept-visitor' method. In this it can feed
;; anything it wants to a curried function. See the example below.

;; Build a visitor to match on objects.
(define (build-matcher pattern exit)
  (cond
   ((not (or (pair? pattern) (null? pattern)))
    (error "Malformed pattern!" pattern))
   ;; If we have ran out of patterns, we either succeeded or can't
   ;; match anymore.
   ((null? pattern)
    (lambda args (or (null? args) (exit #f))))
   ;; A symbol always matches.
   ((symbol? (car pattern))
    (lambda v
      (if (and (pair? v) (null? (cdr v)))
          (build-matcher (cdr pattern) exit)
          (exit #f))))
   ;; Special case: quoting.
   ((and (pair? (car pattern)) (eq? (caar pattern) 'quote)
         (pair? (cdar pattern)) (null? (cddar pattern)))
    (lambda v
      (if (and (pair? v) (null? (cdr v)))
          (if (eq? (car v) (cadar pattern))
              (build-matcher (cdr pattern) exit)
              (exit #f))
          (exit #f))))
   ;; Atoms other than symbols match if they are eq?.
   ((not (or (pair? (car pattern)) (null? (car pattern))))
    (lambda v
      (if (and (pair? v) (null? (cdr v)))
          (if (eq? (car v) (car pattern))
              (build-matcher (cdr pattern) exit)
              (exit #f))
          (exit #f))))
   ;; Node patterns.
   ((pair? (car pattern))
    (lambda v
      (if (and (pair? v) (null? (cdr v)))
          (let ((result ((accept-visitor (build-matcher (car pattern) exit) (car v)))))
            (when result (build-matcher (cdr pattern) exit)))
          (exit #f))))
   (else (error "Malformed pattern!" pattern))))

;; Does the value (object) match the pattern?
(define (goops-matches? pattern value)
  (call-with-current-continuation
   (lambda (k)
     ((accept-visitor (build-matcher pattern k) value)))))

;; Construct a visitor to extract values from an object hierarchy.
(define-syntax construct-extractor
  (lambda (x)
    (syntax-case x (quote)
      ((_ () body ...)
       #'(begin body ...))
      ((_ ((quote _ ...) p ...) body ...)
       #'(lambda (skip)
           (construct-extractor (p ...) body ...)))
      ((_ ((p' ...) p ...) body ...)
       #'(lambda (node)
           (accept-visitor
            (construct-extractor (p' ...)
              (construct-extractor (p ...)
                body ...))
            node)))
      ((_ (v p ...) body ...)
       (if (symbol? (syntax->datum #'v))
           #'(lambda (v)
               (construct-extractor (p ...) body ...))
           #'(lambda (skip)
               (construct-extractor (p ...) body ...)))))))

;; Do the matching and extracting in one convenient(?) macro.
(define-syntax goops-match
  (syntax-rules (_)
    ((goops-match v
       ((p ...) b ...)
       rest ...)
     (if (goops-matches? '(p ...) v)
         (accept-visitor (construct-extractor (p ...) b ...) v)
         (goops-match v rest ...)))
    ((goops-match v
       (_ b ...))
     (begin b ...))
    ((goops-match v)
     (error "failed to match" (describe v)))))

;; Example: object-oriented lists.

(define-class <g-list> ())
(define-class <g-nil>  (<g-list>))
(define-class <g-cons> (<g-list>)
  (car #:init-keyword #:car)
  (cdr #:init-keyword #:cdr))

(define (apply+ f . args)
  (let repeat ((f f) (args args))
    (cond
     ((null? args) f)
     (else (repeat (f (car args)) (cdr args))))))

(define-method (accept-visitor v (n <g-nil>))
  (apply+ v '<g-nil>))
(define-method (accept-visitor v (c <g-cons>))
  (apply+ v '<g-cons> (slot-ref c 'car) (slot-ref c 'cdr)))

(define example
  (make <g-cons> #:car 1 #:cdr (make <g-cons> #:car 2 #:cdr (make <g-nil>))))

(define (list->g-list l)
  (cond
   ((null? l) (make <g-nil>))
   (else (make <g-cons> #:car (car l) #:cdr (list->g-list (cdr l))))))

(define (g-even-length? l)
  (goops-match l
    (('<g-nil>) #t)
    (('<g-cons> h0 ('<g-cons> h1 t)) (g-even-length? t))
    (_ #f)))

(define (g-zip l1 l2)
  (goops-match (make <g-cons> #:car l1 #:cdr l2)
    (('<g-cons> ('<g-nil>) ('<g-nil>)) (make <g-nil>))
    (('<g-cons> ('<g-cons> h0 t0) ('<g-cons> h1 t1))
     (make <g-cons>
       #:car (make <g-cons> #:car h0 #:cdr h1)
       #:cdr (g-zip t0 t1)))))

(define example2 (g-zip example example))

(define (display-zipped-g-list l)
  (goops-match l
    (('<g-nil>) (display "())") (newline))
    (('<g-cons> ('<g-cons> h0 h1) t)
     (display "((")
     (display h0)
     (display " . ")
     (display h1)
     (display ") . ")
     (display-zipped-g-list t))))

(display-zipped-g-list example2)

(display-zipped-g-list
 (g-zip (list->g-list (iota 10))
        (list->g-list (map (lambda (x) (+ 1 x)) (iota 10)))))

;; (put 'goops-match 'scheme-indent-function 1)
;; (put 'construct-extractor 'scheme-indent-function 1)
107 2020-06-17 17:35

>>106
I made that suggestion in the pattern matching thread from a CS 101 perspective. I don't know anything about objects or patterns etc. I did look and it seems that the view pattern is the typical way to do this, although it looks like some academics defined a DSL to make this more ergonomic: www.microsoft.com/en-us/research/wp-content/uploads/2016/08/pattern-synonyms-Haskell16.pdf

108 2020-06-17 19:46
(cond ((> post-number *max-posts*)
       `(200 () "max posts")) ;; TODO
$ wget --user-agent="Mozilla/5.0 Firefox/66.0" --server-response -O response.txt --post-data 'epistula=test&frontpage=false&ornamentum=3b3738ae1c9295d272cec84ecf7af5c8' 'https://textboard.org/prog/39/post'
--2020-06-17 19:38:23--  https://textboard.org/prog/39/post
Resolving textboard.org (textboard.org)... 91.121.89.209, 2001:41d0:1:8ed1::1
Connecting to textboard.org (textboard.org)|91.121.89.209|:443... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Server: nginx/1.18.0
  Date: Wed, 17 Jun 2020 19:38:24 GMT
  Content-Length: 9
  Connection: keep-alive
Length: 9
Saving to: ‘response.txt’

response.txt        100%[===================>]       9  --.-KB/s    in 0s      

2020-06-17 19:38:24 (397 KB/s) - ‘response.txt’ saved [9/9]

$ cat response.txt 
max posts$ hd response.txt 
00000000  6d 61 78 20 70 6f 73 74  73                       |max posts|
00000009
$ 
109 2020-06-18 01:33

>>3
what's wrong with the XHTML standard in this case? It's an old standard and very strict.

110 2020-06-18 03:08 *

>>109
The ellipsis thing mentioned in >>3 was a CSS mistake and the very first user-contributed patch.
Keep your XHTML abomination for 2006 PHP web portals. The only standard for HTML is ISO/IEC 15445:2000. SchemeBBS should be usable and have the same layout with the same indentations in any web browser. CSS is totally optional.

Tim Berners-Lee's Line Mode (1991) - http://dejavu.org/cgi-bin/get.cgi?ver=linemode&url=http://textboard.org/prog
NCSA Mosaic (1993) - http://www.dejavu.org/cgi-bin/get.cgi?ver=93&url=http://textboard.org/prog/
Internet Explorer 2 (1995) - http://dejavu.org/cgi-bin/get.cgi?url=http%3A%2F%2Ftextboard.org%2Fprog&ver=96&storlek=0
HotJava (1997) - http://dejavu.org/cgi-bin/get.cgi?url=http%3A%2F%2Ftextboard.org%2Fprog&ver=hotjava&storlek=0

Only line-mode has a problem with <pre> and newlines but text posts with quotes are still readable and the design is the same. Icing on the cake: you only need to change the doctype declaration to make it valid html5: https://validator.w3.org/nu/?doc=http%3A%2F%2Ftextboard.org%2Fprog%2F

111 2020-06-18 03:24

<DL><DT>key</DT><DD>value</DD></DL> is just a mapping of Lisp's alist data structure (linked list of dotted pairs). It's not going anywhere.
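
For illustration, a minimal sketch of that mapping in Python (the function name is mine, not from SchemeBBS):

```python
# Illustrative only: render an alist (a list of key/value pairs) as the
# HTML definition-list markup described above.
def alist_to_dl(alist):
    items = "".join(f"<DT>{k}</DT><DD>{v}</DD>" for k, v in alist)
    return f"<DL>{items}</DL>"

markup = alist_to_dl([("key", "value")])
```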

112 2020-06-20 10:37
$ mit-scheme
[...]
  Release 9.1.1     || Microcode 15.3 || Runtime 15.7 || SF 4.41
  LIAR/x86-64 4.118 || Edwin 3.116

1 ]=> (file-length "bbs.scm")
;Value: 16583

1 ]=> (file-length "bbs2.scm")
;The object "bbs2.scm", passed as an argument to file-length, is not in the correct range.
;To continue, call RESTART with an option number:
; (RESTART 1) => Return to read-eval-print level 1.

Someone, somewhere decided that "is not in the correct range" is a perfectly reasonable description for a file-does-not-exist error. The MIT/GNU Scheme devs are special people.

113 2020-06-20 12:03

https://www.gnu.org/software/mit-scheme/release.html

Stable release 9.0

In the past my (Chris Hanson's) policy for a stable release was that the documentation had to be updated for the release before it went out. In practice, this has meant that there have been no stable releases in recent years. As of this release, we will no longer consider updated documentation a prerequisite for a stable release.

No chance of competing with python like this.

114 2020-06-20 16:45

>>113
I'm inclined to believe that the problem is rooted in the fact that the development team is made up of two people (cph and Riastradh), both of whom work on it as a side project.

115 2020-06-23 20:09

https://fossil.textboard.org/paster/home -> paster.tar.gz -> https://fossil.textboard.org/paster/paster.tar.gz -> Not Found
Page not found: paster.tar.gz

116 2020-06-23 21:59 *

>>115
Why are posting this here?

117 2020-06-23 23:02

>>116
I didn't think the paster needed its own thread. Since it was announced in "Small programming projects ideas" https://textboard.org/prog/118#t118p46 the admin didn't either.

118 2020-06-23 23:50

You butchered cooked-message in patch-acceptor.lisp, Bitdiddle.
https://fossil.textboard.org/paster/dir?ci=tip
https://fossil.textboard.org/paster/info/53613363dd7c0889
If you go to a paste like
https://paste.textboard.org/981275a8/raw
then delete 'raw' but forget the /, it doesn't match any hunchentoot:create-regex-dispatcher in controller.lisp and you get
https://paste.textboard.org/981275a8/

The requested URL ~A was not found on this server./raw

instead of the intent of
https://github.com/edicl/hunchentoot/blob/1fc7d770557affd0de269f451aaf73b3e4aa9ab4/acceptor.lisp#L721
The reason is that you removed the ~? recursive processing but kept its arguments.
http://www.lispworks.com/documentation/HyperSpec/Body/22_cgf.htm
For the same reason, if you get an (#.+http-internal-server-error+) and go to 'if *show-lisp-errors-p*', you will serve the format string itself: "~A~@[~%~%Backtrace:~%~%~A~]". To restore sanity add ~? back, move "/raw" to the :location line, and dump the now useless escape-for-html and mapcar.

(format nil "~?" format arguments) )))
[...]
(cooked-message "~A/raw" (header-out :location)))

You can also use the format directly if you dislike ~?.

119 2020-07-02 18:01 *

>>118
Thanks, I will look into that. I got that pastebot up and working in a matter of minutes and haven't paid much attention since. Ad hoc modification of the Hunchentoot web server for the sake of spending the least amount of time possible on adding a link to raw pastes wasn't exactly something you'll find in the book of best practices anyway.

120 2020-07-02 19:54

>>119

I will look into that.

OK. The fix is exactly those two lines. There is also >>115.

121 2020-07-06 14:16

An Anon posted a PNG image with a curious chunk.
https://dis.tinychan.net/read/prog/1593862693#reply_20
https://i.imgur.com/iQSyNAQ.png
The chunk structure is:

 0 0x00000008     13 b'IHDR' 0x8BC30833
 1 0x00000021      1 b'sRGB' 0xAECE1CE9
 2 0x0000002E     28 b'iDOT' 0x4DCC06DA
 3 0x00000056  16384 b'IDAT' 0x9078EF33
 4 0x00004062  16384 b'IDAT' 0xC442CD36
 5 0x0000806E  16384 b'IDAT' 0x3BD3D232
 6 0x0000C07A  10912 b'IDAT' 0x8CE51B02
 7 0x0000EB26  16384 b'IDAT' 0xA963DB86
 8 0x00012B32  16384 b'IDAT' 0xF1406B57
 9 0x00016B3E   6782 b'IDAT' 0x623FA54E
10 0x000185C8      0 b'IEND' 0xAE426082

Search engines can't seem to provide a spec for the iDOT chunk, even though it's marked ancillary, public, unsafe-to-copy by "3.3. Chunk naming conventions" of the PNG spec RFC2083.
https://tools.ietf.org/html/rfc2083
I could only find questions about "ImageIO: PNG invalid PNG file: iDOT doesn't point to valid IDAT chunk" and an old DoS attack with a crafted iDOT, but other Anons could easily have better search skills. So I took a look at the chunk myself.

00000000  89 50 4e 47 0d 0a 1a 0a  00 00 00 0d 49 48 44 52  |.PNG........IHDR|
00000010  00 00 06 4c 00 00 04 de  08 06 00 00 00 8b c3 08  |...L............|
00000020  33 00 00 00 01 73 52 47  42 00 ae ce 1c e9 00 00  |3....sRGB.......|
00000030  00 1c 69 44 4f 54 00 00  00 02 00 00 00 00 00 00  |..iDOT..........|
00000040  02 6f 00 00 00 28 00 00  02 6f 00 00 02 6f 00 00  |.o...(...o...o..|
00000050  ea f8 4d cc 06 da 00 00  40 00 49 44 41 54 78 01  |..M.....@.IDATx.|
00000060  ec dd 07 9c 14 e5 f9 c0  f1 e7 1a e5 8e de 7b 53  |..............{S|

The 0x0000026f that keeps appearing is 623 and the image dimensions are 1612x1246, so that's half the height. The IDAT size stream can be seen to have a discontinuity at 10912 before returning to this encoder's preferred chunk size of 16384. The number of IDAT size runs is 2. The starting offsets of the run leaders are 0x00000056 and 0x0000EB26. The iDOT offset is 0x0000002E, and the remaining values from iDOT are 0x00000028 and 0x00eaf8. So the iDOT structure is clear:

# 4 + 12*k bytes
u32 number of IDAT runs (k)
# k run descriptors of 12 bytes each
k * (
  u32 starting row index of run
  u32 row count in run
  u32 run leader IDAT offset relative to iDOT offset
)

One caveat is that the run count can be obtained from the iDOT chunk size as (size - 4) div 12 after checking that size mod 12 = 4, so that initial 0x00000002 could possibly encode something else, but I have yet to see an iDOT with a size of 28 that didn't start with 2. With that said, I don't see how this structure could have been a mystery for several years, so I assume my search skills are just that poor. Also, if someone has access to a Mac and is willing to experiment, we could see what other failures we can induce in the iDOT parser with crafted chunks, and whether they are simple error messages or crashes. The error messages alone would be useful in labeling the various fields.
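
The inferred layout can be exercised with a small parser; this sketch follows only the structure deduced above, not any official spec, and the field names are guesses:

```python
import struct

# Parse an iDOT chunk payload (the bytes after the 'iDOT' type field,
# before the CRC), per the 4 + 12*k layout inferred above.
def parse_idot(payload):
    if len(payload) % 12 != 4:
        raise ValueError("iDOT size must be 4 + 12*k bytes")
    (k,) = struct.unpack(">I", payload[:4])          # u32 number of runs
    runs = []
    for i in range(k):
        start_row, row_count, offset = struct.unpack(
            ">III", payload[4 + 12 * i : 16 + 12 * i])
        runs.append({"start_row": start_row,
                     "row_count": row_count,
                     "idat_offset": offset})  # relative to the iDOT offset
    return runs

# The chunk from the first hex dump above (iDOT offset 0x2E, run leader
# IDAT offsets 0x56 and 0xEB26):
payload = bytes.fromhex(
    "00000002" "00000000" "0000026f" "00000028"
    "0000026f" "0000026f" "0000eaf8")
runs = parse_idot(payload)
```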

122 2020-07-06 20:21

Three other PNGs in the same thread >>121 have the iDOT chunk. They all follow the same structure rules explained above.

AJlNWhC.png

 0 0x00000008     13 b'IHDR' 0x7447CAAA
 1 0x00000021      1 b'sRGB' 0xAECE1CE9
 2 0x0000002E     28 b'iDOT' 0x7EA8BF4F
 3 0x00000056  16384 b'IDAT' 0x67E3CD1F
 4 0x00004062  16384 b'IDAT' 0x7C47A125
 5 0x0000806E  16384 b'IDAT' 0x5B75E165
 6 0x0000C07A   7512 b'IDAT' 0xCD0D8D01
 7 0x0000DDDE  16384 b'IDAT' 0x632D36D0
 8 0x00011DEA  16384 b'IDAT' 0x0D987BA8
 9 0x00015DF6   3045 b'IDAT' 0x7E530BF5
10 0x000169E7      0 b'IEND' 0xAE426082

00000000  89 50 4e 47 0d 0a 1a 0a  00 00 00 0d 49 48 44 52  |.PNG........IHDR|
00000010  00 00 06 44 00 00 04 d6  08 06 00 00 00 74 47 ca  |...D.........tG.|
00000020  aa 00 00 00 01 73 52 47  42 00 ae ce 1c e9 00 00  |.....sRGB.......|
00000030  00 1c 69 44 4f 54 00 00  00 02 00 00 00 00 00 00  |..iDOT..........|
00000040  02 6b 00 00 00 28 00 00  02 6b 00 00 02 6b 00 00  |.k...(...k...k..|
00000050  dd b0 7e a8 bf 4f 00 00  40 00 49 44 41 54 78 01  |..~..O..@.IDATx.|
00000060  ec dd 07 78 1c c5 f9 c7  f1 57 b6 dc 7b 53 71 b7  |...x.....W..{Sq.|

hLzyLXE.png

 0 0x00000008     13 b'IHDR' 0x952CDFFB
 1 0x00000021      1 b'sRGB' 0xAECE1CE9
 2 0x0000002E     28 b'iDOT' 0x06937509
 3 0x00000056  16384 b'IDAT' 0xF8A0C4FE
 4 0x00004062  16384 b'IDAT' 0xA6CBEFA6
 5 0x0000806E  16384 b'IDAT' 0x8F56E791
 6 0x0000C07A  14545 b'IDAT' 0x1AFB96B3
 7 0x0000F957  16384 b'IDAT' 0x171F7142
 8 0x00013963  16384 b'IDAT' 0xDDA443B1
 9 0x0001796F  15708 b'IDAT' 0x5C976470
10 0x0001B6D7      0 b'IEND' 0xAE426082

00000000  89 50 4e 47 0d 0a 1a 0a  00 00 00 0d 49 48 44 52  |.PNG........IHDR|
00000010  00 00 06 3e 00 00 04 d6  08 06 00 00 00 95 2c df  |...>..........,.|
00000020  fb 00 00 00 01 73 52 47  42 00 ae ce 1c e9 00 00  |.....sRGB.......|
00000030  00 1c 69 44 4f 54 00 00  00 02 00 00 00 00 00 00  |..iDOT..........|
00000040  02 6b 00 00 00 28 00 00  02 6b 00 00 02 6b 00 00  |.k...(...k...k..|
00000050  f9 29 06 93 75 09 00 00  40 00 49 44 41 54 78 01  |.)..u...@.IDATx.|
00000060  ec dd 07 9c 5c 55 dd ff  f1 df 26 9b de 2b 84 74  |....\U....&..+.t|

Vc2eT5I.png

0 0x00000008     13 b'IHDR' 0xBDCA725C
1 0x00000021      1 b'sRGB' 0xAECE1CE9
2 0x0000002E     28 b'iDOT' 0x81ED3D30
3 0x00000056  16384 b'IDAT' 0xE92A8FA4
4 0x00004062   1762 b'IDAT' 0xBCDA142D
5 0x00004750  16245 b'IDAT' 0xE624764B
6 0x000086D1      0 b'IEND' 0xAE426082

00000000  89 50 4e 47 0d 0a 1a 0a  00 00 00 0d 49 48 44 52  |.PNG........IHDR|
00000010  00 00 02 00 00 00 02 32  08 06 00 00 00 bd ca 72  |.......2.......r|
00000020  5c 00 00 00 01 73 52 47  42 00 ae ce 1c e9 00 00  |\....sRGB.......|
00000030  00 1c 69 44 4f 54 00 00  00 02 00 00 00 00 00 00  |..iDOT..........|
00000040  01 19 00 00 00 28 00 00  01 19 00 00 01 19 00 00  |.....(..........|
00000050  47 22 81 ed 3d 30 00 00  40 00 49 44 41 54 78 01  |G"..=0..@.IDATx.|
00000060  ec 9d 07 9c 1c 65 f9 c7  9f bd 9a 2b b9 e4 72 b9  |.....e.....+..r.|

123 2020-07-07 10:35

The smallest image of the four >>121 >>122 is the last one, at 512x562 8-bit RGBA(6). Its iDOT chunk is (2, (0, 281, 0x28), (281, 281, 0x4722)). The 281 is half the height and the offsets are to IDAT run leaders. The two IDAT runs are [16384, 1762] and [16245].

PNG IDAT chunks contain the stream of deflated filtered rows.
https://tools.ietf.org/html/rfc2083
For this RGBA(6) image a filtered row is 2049 bytes:

>>> 512 * 4 + 1
2049

A run of 281 filtered rows is 575769 bytes:

>>> _ * 281
575769

The full filtered row stream is 1151538 bytes:

>>> _ * 2
1151538

Here is a decompression of the IDAT stream with a bit of verbosity showing the decompressor input, the decompressor output and a running total of the output:

Vc2eT5I.png -> temp/p.pixels
   512x562 8-bit RGBA(6)
deco in 16384
deco out 65536 -> 65536
deco out 65536 -> 131072
deco out 65536 -> 196608
deco out 65536 -> 262144
deco out 65536 -> 327680
deco out 65536 -> 393216
deco out 65536 -> 458752
deco out 35596 -> 494348
deco in 1762
deco out 65536 -> 559884
deco out 15885 -> 575769
deco in 16245
deco out 65536 -> 641305
deco out 65536 -> 706841
deco out 65536 -> 772377
deco out 65536 -> 837913
deco out 65536 -> 903449
deco out 65536 -> 968985
deco out 65536 -> 1034521
deco out 65536 -> 1100057
deco out 51481 -> 1151538
   filters: [0, 562, 0, 0, 0]

After the first IDAT run is consumed, exactly the 575769 bytes of the first run of 281 filtered rows have been generated. The second IDAT run generates the bytes for the next run of 281 filtered rows. This seems to be the link between the IDAT run division and the iDOT chunk.

124 2020-07-07 13:44

"DEFLATE Compressed Data Format Specification version 1.3"
https://tools.ietf.org/html/rfc1951
Here is the IDAT run switchover point from Vc2eT5I.png >>123:

00004730  00 58 b1 0c d9 46 08 40  b0 87 16 02 10 0c 4f 08  |.X...F.@......O.|
00004740  80 8c 63 b6 0b c0 ff 03  00 00 ff ff bc da 14 2d  |..c............-|
00004750  00 00 3f 75 49 44 41 54  ed 9d 77 60 1c d5 d5 f6  |..?uIDAT..w`....|
00004760  67 25 d9 92 9b 24 6c 70  01 8c 83 b1 43 35 60 03  |g%...$lp....C5`.|

The deflate bit packing rules are in "3.1.1. Packing into bytes". The deflate block format is in "3.2.3. Details of block format". The first IDAT run ends on hex:[03 00 00 ff ff] before the u32 PNG chunk checksum. The second IDAT run starts with hex:[ed]. As stated in the spec, deflate block boundaries do not naturally line up with byte boundaries, but there is one place that generates alignment:

      3.2.4. Non-compressed blocks (BTYPE=00)

         Any bits of input up to the next byte boundary are ignored.
         The rest of the block consists of the following information:

              0   1   2   3   4...
            +---+---+---+---+================================+
            |  LEN  | NLEN  |... LEN bytes of literal data...|
            +---+---+---+---+================================+

         LEN is the number of data bytes in the block.  NLEN is the
         one's complement of LEN.

The first IDAT run ends on an empty non-compressed block with LEN=0. The second IDAT run starts with aligned deflate block and byte boundaries, with 0xed=0b11101101 signaling a fixed Huffman code block (BTYPE=01) that is also the last deflate block (BFINAL=1). Therefore, the purpose of the IDAT run divisions and of the iDOT chunk is to provide fixed seek points into the deflate stream, as a limited means of random access. The seek points are on image row boundaries specified in the iDOT chunk >>121. This can be used for things like partial decoding and parallel decoding. Of course, this needs the cooperation of the encoder, both in byte-aligning run leader deflate blocks, such as with an empty non-compressed block, and in not using back references that straddle the seek points. If these conditions are not checked in the decoder then this becomes an attack vector.
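
This byte-aligning empty stored block is exactly what zlib's Z_SYNC_FLUSH emits, so the encoder side of such a seek point can be sketched in a few lines (assuming default zlib settings; the "A"/"B" halves stand in for the two row runs):

```python
import zlib

co = zlib.compressobj()
# Z_SYNC_FLUSH ends the output so far with an empty stored block
# (LEN=0, NLEN=0xFFFF), padding the stream to a byte boundary.
first = co.compress(b"A" * 1000) + co.flush(zlib.Z_SYNC_FLUSH)
rest = co.compress(b"B" * 1000) + co.flush()
assert first.endswith(b"\x00\x00\xff\xff")

# The whole stream still decompresses normally:
assert zlib.decompress(first + rest) == b"A" * 1000 + b"B" * 1000

# And since "rest" starts byte-aligned and (here) contains no back
# references into the "A" region, it decodes on its own as raw deflate:
d = zlib.decompressobj(-15)
assert d.decompress(rest) == b"B" * 1000
```

The last assertion is the seek-point property: the second piece decodes without the first, which is what partial and parallel decoding rely on.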

125 2020-07-07 13:52 *

A Minecraft server module for Nix.

126 2020-07-07 20:02

A JIT runtime for a virtual register-register machine.

127 2020-07-07 20:12

Here are the IDAT run switchover points for the first three images of >>121 >>122. They all exhibit the same pattern >>124 of ending the first IDAT run with an empty non-compressed block so that the starting deflate block of the second IDAT run is aligned on a byte boundary.

iQSyNAQ.png

0000eb10  fd 4a ae e1 45 af f4 37  7e 3f b7 80 ff 07 00 00  |.J..E..7~?......|
0000eb20  ff ff 8c e5 1b 02 00 00  40 00 49 44 41 54 ec dd  |........@.IDAT..|
0000eb30  0d b4 ae db 5c 30 fc 7b  9f 73 90 c7 d7 b0 37 51  |....\0.{.s....7Q|

AJlNWhC.png

0000ddd0  2f 81 11 88 bc 00 00 00  ff ff cd 0d 8d 01 00 00  |/...............|
0000dde0  40 00 49 44 41 54 ec dd  7f d4 f7 d9 5c 2f fe 7b  |@.IDAT......\/.{|

hLzyLXE.png

0000f940  87 d6 b5 de de 96 00 01  02 04 0e 43 e0 ff 00 00  |...........C....|
0000f950  00 ff ff 1a fb 96 b3 00  00 40 00 49 44 41 54 ec  |.........@.IDAT.|
0000f960  dd 7f d0 b5 5f 59 10 7a  70 42 72 40 3d 88 26 86  |...._Y.zpBr@=.&.|

128 2020-07-07 20:29

>>125

Oh, I am excited for that! I almost got around to doing it a few months ago, but just setting some systemd.services in configuration.nix and keeping all data and jars in /home works fine for me.

Also I am using fabric and carpet mod and I fear it'd be kinda painful writing derivations for jars that patch jars using other jars (although it would be very cool).

129 2020-07-07 23:59

Beyond parallel decoding >>124 of the deflate stream using the seek points, which produces runs of filtered image rows, parallel unfiltering can also be used. The condition is that the first row after a seek point may not use the previous row, so it can only use None or Sub filtering, not Up, Average or Paeth.

6. Filter Algorithms
https://tools.ietf.org/html/rfc2083
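
Reversing the Sub filter needs nothing but the row itself, which is what makes per-run unfiltering possible. A minimal sketch (unfilter_sub is a hypothetical helper; bpp is the bytes per pixel, 4 for these RGBA images):

```python
def unfilter_sub(row, bpp):
    # Sub stores each byte minus the byte one pixel (bpp bytes) to the
    # left, so reconstruction is a running sum within the row alone.
    out = bytearray(row)
    for i in range(bpp, len(out)):
        out[i] = (out[i] + out[i - bpp]) & 0xFF
    return bytes(out)

unfilter_sub(bytes([10, 5, 5, 5]), 1)   # -> bytes([10, 15, 20, 25])
```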

The encoder that produced the four images of >>121 >>122 seems to take a much simpler approach and uses Sub filtering for every row.

AJlNWhC.png
   1604x1238 8-bit RGBA(6)
   filters: [0, 1238, 0, 0, 0]
hLzyLXE.png
   1598x1238 8-bit RGBA(6)
   filters: [0, 1238, 0, 0, 0]
iQSyNAQ.png
   1612x1246 8-bit RGBA(6)
   filters: [0, 1246, 0, 0, 0]
Vc2eT5I.png
   512x562 8-bit RGBA(6)
   filters: [0, 562, 0, 0, 0]

130 2020-07-08 13:31

The https://i.imgur.com/Vc2eT5I.png image >>124 is small and simple enough to manipulate in the REPL. Here is a demonstration of independent decoding of the second IDAT run, similar to what would happen in partial or parallel decoding. Loading the bytes:

$ python3
>>> with open ("Vc2eT5I.png", "rb") as f:
...    b = f.read ()
... 
>>> len (b)
34525

The chunk layout is in >>122:

>>> idats = [(0x00000056, 16384), (0x00004062, 1762), (0x00004750, 16245)]

Merging IDAT payloads:

>>> ba = bytearray ()
>>> for pos, len in idats:
...    ba.extend (b [pos + 8 : pos + 8 + len])
... 

Decompression yields the filtered rows. The size matches >>123:

>>> import zlib
>>> dec = zlib.decompress (ba)
>>> del len
>>> len (dec)
1151538

PNG doesn't use deflate directly but goes through zlib wrapping. The trailer of the wrapping is the adler32 of the uncompressed data. This is just a sanity check.

>>> hex (zlib.adler32 (dec))
'0x24074d18'

This matches the MSB u32 in the final four bytes of the final IDAT payload:

>>> adlpos = idats [-1][0] + 8 + idats [-1][1] - 4
>>> b [adlpos : adlpos + 4]
b'$\x07M\x18'
>>> import struct
>>> hex (struct.unpack (">L", _) [0])
'0x24074d18'

The PNG IHDR is fixed in size and position. We can peek at the height:

>>> hpos = 8 + 8 + 4
>>> struct.unpack (">L", b [hpos : hpos + 4]) [0]
562

The height is edited to half:

>>> ba.clear ()
>>> ba.extend (b [:33])
>>> ba [hpos : hpos + 4] = struct.pack (">L", 562 // 2)

Since the IHDR changed it needs a new chunk checksum:

>>> crc = zlib.crc32 (b'IHDR')
>>> crc = zlib.crc32 (ba [8 + 8 : 8 + 8 + 13], crc)
>>> ba [29 : 33] = struct.pack (">L", crc)

The second IDAT run has a single chunk. Since it will now be the first and only chunk it needs the zlib header, which is two extra bytes:

>>> ba.extend (struct.pack (">L", idats [-1][1] + 2))
>>> ba.extend (b'IDAT')

The zlib header is taken from the first 2 bytes of the first IDAT payload:

>>> ba.extend (b [idats [0][0] + 8 : idats [0][0] + 10])

The raw deflate stream of the second IDAT run is then copied exactly, omitting the old adler32 trailer:

>>> ba.extend (b [idats [-1][0] + 8 : idats [-1][0] + 8 + idats [-1][1] - 4])

The new adler32 trailer is over the second half of the uncompressed data:

>>> adl = zlib.adler32 (dec [len (dec) // 2:])
>>> ba.extend (struct.pack (">L", adl))

The new IDAT needs its chunk checksum:

>>> crc = zlib.crc32 (b'IDAT')
>>> crc = zlib.crc32 (ba [33 + 8:], crc)
>>> ba.extend (struct.pack (">L", crc))

The IEND chunk is a constant PNG trailer:

>>> ba.extend (b [-12:])

The result is a valid PNG image:

>>> with open ("temp/half.png", "wb") as f:
...    f.write (ba)
... 
16304

The image http://0x0.st/iyn7.png is the lower half of the original. Its chunk layout is:

0 0x00000008     13 b'IHDR' 0xE1499D34
1 0x00000021  16247 b'IDAT' 0xBE6A01A6
2 0x00003FA4      0 b'IEND' 0xAE426082

It is not much to look at, but its raw deflate stream is bit-for-bit identical to the one from the second IDAT run of the original. The checksums are only updated to make a valid PNG file. This decodes independently, without using the first IDAT run, as would happen in partial or parallel decoding. This is possible because the encoder split the image on row boundaries, and the filter condition >>129 and deflate block byte alignment and back reference conditions >>124 are all met.

I think this about covers the function of the iDOT chunk.

131 2020-07-09 07:37 *

>>130
Stop leaving a space between function names and parentheses.

132 2020-07-12 10:36

413 Request Entity Too Large
nginx/1.18.0

A private nginx hack that doesn't propagate to others running SchemeBBS instances, instead of a public SchemeBBS fix.

133 2020-07-12 10:40

[1/2]

https://github.com/ashinn/irregex/issues/21
Replacements of positive lookbehinds only replace first match #21

A simplified test case:

$ guile --no-auto-compile -l irregex.scm 
GNU Guile 2.2.3
[...]
scheme@(guile-user)> (irregex-replace/all "(?<=x)a" "xa xa xa" "-")
$1 = "x- xa xa"
scheme@(guile-user)> (irregex-replace/all "x(?=a)" "xa xa xa" "-")
$2 = "-a -a -a"
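
As a point of reference, the same two replacements in Python's re produce the expected results for both the look-behind and the look-ahead:

```python
import re

re.sub("(?<=x)a", "-", "xa xa xa")   # -> 'x- x- x-'
re.sub("x(?=a)", "-", "xa xa xa")    # -> '-a -a -a'
```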

The look-behind processing is at:
https://github.com/ashinn/irregex/blob/353b8db8472f9b36a7b08ca21a4227d827750d93/irregex.scm#L3240

((look-behind neg-look-behind)
 (let ((check
        (lp (sre-sequence
             (cons '(* any) (append (cdr sre) '(eos))))
            n
            flags
            (lambda (cnk init src str i end matches fail) i))))
   (lambda (cnk init src str i end matches fail)
     (let* ((cnk* (wrap-end-chunker cnk src i))
            (str* ((chunker-get-str cnk*) (car init)))
            (i* (cdr init))
            (end* ((chunker-get-end cnk*) (car init))))
       (if ((if (eq? (car sre) 'look-behind) (lambda (x) x) not)
            (check cnk* init (car init) str* i* end* matches
                   (lambda () #f)))
           (next cnk init src str i end matches fail)
           (fail))))))

The irregex strategy for matching a look-behind is to rescan the entire input stream from the start to the current position with the concatenation of an iterated any-matcher, the look-behind and an end-of-stream anchor. This works, but whether it's a good idea on the "text-buffer opened onto a 500MB file" from the documentation is another matter, especially with repeated matching like that of irregex-replace/all. For reasonably sized inputs, though, it's not a real problem, and it is also not the source of the bug. We'll add some verbosity to the end of the let* to see the locals:

            (end* ((chunker-get-end cnk*) (car init)))
            (unused (simple-format #t "look-behind ~A ~A ~A ~A ~A | ~A ~A ~A\n" init src str i end str* i* end*)))

The failing test:

scheme@(guile-user)> (irregex-replace/all "(?<=x)a" "xa xa xa" "-")
look-behind ((xa xa xa 0 8) . 0) (xa xa xa 0 8) xa xa xa 0 8 | xa xa xa 0 0
look-behind ((xa xa xa 0 8) . 0) (xa xa xa 0 8) xa xa xa 1 8 | xa xa xa 0 1
look-behind ((xa xa xa 0 8) . 0) (xa xa xa 2 8) xa xa xa 2 8 | xa xa xa 0 8
look-behind ((xa xa xa 0 8) . 0) (xa xa xa 2 8) xa xa xa 3 8 | xa xa xa 0 8
look-behind ((xa xa xa 0 8) . 0) (xa xa xa 2 8) xa xa xa 4 8 | xa xa xa 0 8
look-behind ((xa xa xa 0 8) . 0) (xa xa xa 2 8) xa xa xa 5 8 | xa xa xa 0 8
look-behind ((xa xa xa 0 8) . 0) (xa xa xa 2 8) xa xa xa 6 8 | xa xa xa 0 8
look-behind ((xa xa xa 0 8) . 0) (xa xa xa 2 8) xa xa xa 7 8 | xa xa xa 0 8
look-behind ((xa xa xa 0 8) . 0) (xa xa xa 2 8) xa xa xa 8 8 | xa xa xa 0 8
$1 = "x- xa xa"

As soon as the first "a" was matched and 'src' switched from (xa xa xa 0 8) to (xa xa xa 2 8), 'end*' stopped following the current index 'i' and jumped straight to 8. The problem is this jump in 'end*'. This also means that we can make a much stranger failing case:

scheme@(guile-user)> (irregex-replace/all "(?<=x)a" "xa aaa x" "-")
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 0 8) xa aaa x 0 8 | xa aaa x 0 0
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 0 8) xa aaa x 1 8 | xa aaa x 0 1
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 2 8) xa aaa x 2 8 | xa aaa x 0 8
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 2 8) xa aaa x 3 8 | xa aaa x 0 8
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 4 8) xa aaa x 4 8 | xa aaa x 0 8
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 5 8) xa aaa x 5 8 | xa aaa x 0 8
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 6 8) xa aaa x 6 8 | xa aaa x 0 8
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 6 8) xa aaa x 7 8 | xa aaa x 0 8
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 6 8) xa aaa x 8 8 | xa aaa x 0 8
$3 = "x- --- x"

This happens because after the first replacement the end-of-stream anchor at 'end*' moves the look-behind check to the end of the string. This 'end*' is computed by chunker-get-end which is a vector-ref into a vector of lambdas computed by wrap-end-chunker:
https://github.com/ashinn/irregex/blob/353b8db8472f9b36a7b08ca21a4227d827750d93/irregex.scm#L330

(define (wrap-end-chunker cnk src i)
  (make-irregex-chunker
   (lambda (x) (and (not (eq? x src)) ((chunker-get-next cnk) x)))
   (chunker-get-str cnk)
   (chunker-get-start cnk)
   (lambda (x) (if (eq? x src) i ((chunker-get-end cnk) x)))
   (chunker-get-substring cnk)
   (chunker-get-subchunk cnk)))

This is a wrapper around the input stream that aims to stop at the current position 'i' of the current chunk 'src'. The chunk documentation is at:
http://synthcode.com/scheme/irregex/#SECTION_3.4
This is to be matched by the regex with the iterated any-matcher, the look-behind and an end-of-stream anchor. The wrapper returns #f for a next chunk request after the current one, and overrides the current chunk's end position to 'i'. The current chunk is tested for by eq?, as specified in the documentation.

There are two important constraints on the <get-next> procedure. It must return an eq? identical object when called multiple times on the same chunk, and it must not return a chunk with an empty string (start == end).

The lambda that returns the wrong 'end*' is the fourth item. It would need to steadily return 'i' for the current chunk, but the eq? test fails when it shouldn't. Plain string inputs only ever have one chunk, constructed in irregex-basic-string-chunker, so this eq? should not fail.
https://github.com/ashinn/irregex/blob/353b8db8472f9b36a7b08ca21a4227d827750d93/irregex.scm#L1861

(define irregex-basic-string-chunker
  (make-irregex-chunker (lambda (x) #f)
                        car
                        cadr
                        caddr
                        (lambda (src1 i src2 j)
                          (substring (car src1) i j))))

The 'x' and 'src' in the eq? of wrap-end-chunker are the (car init) and 'src' from the look-behind section. The debug print shows them diverging when 'end*' stops behaving. The 'init' and 'src' values are passed by an external driver to the lambda produced by the look-behind section. In the case of irregex-replace/all this driver is irregex-fold/fast.
https://github.com/ashinn/irregex/blob/353b8db8472f9b36a7b08ca21a4227d827750d93/irregex.scm#L3780

134 2020-07-12 10:44

And the post size limit dropped. 7.902 bytes got the 413.

135 2020-07-12 10:47

[2/2]

(define (irregex-fold/fast irx kons knil str . o)
  (if (not (string? str)) (error "irregex-fold: not a string" str))
  (let* ((irx (irregex irx))
         (matches (irregex-new-matches irx))
         (finish (or (and (pair? o) (car o)) (lambda (i acc) acc)))
         (start (if (and (pair? o) (pair? (cdr o))) (cadr o) 0))
         (end (if (and (pair? o) (pair? (cdr o)) (pair? (cddr o)))
                  (caddr o)
                  (string-length str)))
         (init-src (list str start end))
         (init (cons init-src start)))
    (if (not (and (integer? start) (exact? start)))
        (error "irregex-fold: not an exact integer" start))
    (if (not (and (integer? end) (exact? end)))
        (error "irregex-fold: not an exact integer" end))
    (irregex-match-chunker-set! matches irregex-basic-string-chunker)
    (let lp ((src init-src) (i start) (acc knil))
      (if (>= i end)
          (finish i acc)
          (let ((m (irregex-search/matches
                    irx
                    irregex-basic-string-chunker
                    init
                    src
                    i
                    matches)))
            (if (not m)
                (finish i acc)
                (let ((j (%irregex-match-end-index m 0))
                      (acc (kons i m acc)))
                  (irregex-reset-matches! matches)
                  (cond
                   ((flag-set? (irregex-flags irx) ~consumer?)
                    (finish j acc))
                   ((= j i)
                    ;; skip one char forward if we match the empty string
                    (lp (list str (+ j 1) end) (+ j 1) acc))
                   (else
                    (lp (list str j end) j acc))))))))))

The 'init' value is constructed once and never modified, but 'src' is modified by both loop calls to 'lp' by advancing the start position. The current position is already advanced as 'i', so advancing the current chunk's start position in sync looks suspicious. The modified chunk will no longer count as the initial chunk because they are compared with eq?, and plain strings have a single chunk. This tampering with the current chunk's start position is from:

Fixing folds on conditional begin patterns which aren't treated as searchers
ashinn committed Nov 28, 2012
https://github.com/ashinn/irregex/commit/2949a461474e0ac30d8c72f0dc81127b19c04d0d

By contrast irregex-fold/chunked/fast doesn't tamper with the current chunk and communicates the advancing of the current position within a chunk solely through the index 'i'.
https://github.com/ashinn/irregex/blob/353b8db8472f9b36a7b08ca21a4227d827750d93/irregex.scm#L3825

(define (irregex-fold/chunked/fast irx kons knil cnk start . o)
  (let* ((irx (irregex irx))
         (matches (irregex-new-matches irx))
         (finish (or (and (pair? o) (car o)) (lambda (src i acc) acc)))
         (i (if (and (pair? o) (pair? (cdr o)))
                (cadr o)
                ((chunker-get-start cnk) start)))
         (init (cons start i)))
    (if (not (integer? i)) (error "irregex-fold/chunked: not an integer" i))
    (irregex-match-chunker-set! matches cnk)
    (let lp ((start start) (i i) (acc knil))
      (if (not start)
          (finish start i acc)
          (let ((m (irregex-search/matches irx cnk init start i matches)))
            (if (not m)
                (finish start i acc)
                (let ((end-src (%irregex-match-end-chunk m 0))
                      (end-index (%irregex-match-end-index m 0)))
                  (if (and (eq? end-src start) (= end-index i))
                      (if (>= end-index ((chunker-get-end cnk) end-src ))
                          (let ((next ((chunker-get-next cnk) end-src)))
                            (lp next ((chunker-get-start cnk) next) acc))
                          (lp end-src (+ end-index 1) acc))
                      (let ((acc (kons start i m acc)))
                        (irregex-reset-matches! matches)
                        (if (flag-set? (irregex-flags irx) ~consumer?)
                            (finish end-src end-index acc)
                            (lp end-src end-index acc)))))))))))

Applying this non-tampering approach to irregex-fold/fast yields the diff:

136 2020-07-12 10:51

[post size limit dropped/2]

$ TZ=GMT diff -u irregex.scm irregex2.scm 
--- irregex.scm	2020-07-12 09:41:49.754022271 +0000
+++ irregex2.scm	2020-07-12 09:45:05.067970996 +0000
@@ -3813,9 +3813,11 @@
                     (finish j acc))
                    ((= j i)
                     ;; skip one char forward if we match the empty string
-                    (lp (list str (+ j 1) end) (+ j 1) acc))
+                    ; (lp (list str (+ j 1) end) (+ j 1) acc))
+                    (lp src (+ j 1) acc))
                    (else
-                    (lp (list str j end) j acc))))))))))
+                    ; (lp (list str j end) j acc))))))))))
+                    (lp src j acc))))))))))
 
 (define (irregex-fold irx kons . args)
   (if (not (procedure? kons)) (error "irregex-fold: not a procedure" kons))

This fixes the failing tests because the eq? of wrap-end-chunker now works properly which makes 'end*' in look-behind follow along with 'i', moving the look-behind check to the right place.

scheme@(guile-user)> (irregex-replace/all "(?<=x)a" "xa xa xa" "-")
look-behind ((xa xa xa 0 8) . 0) (xa xa xa 0 8) xa xa xa 0 8 | xa xa xa 0 0
look-behind ((xa xa xa 0 8) . 0) (xa xa xa 0 8) xa xa xa 1 8 | xa xa xa 0 1
look-behind ((xa xa xa 0 8) . 0) (xa xa xa 0 8) xa xa xa 2 8 | xa xa xa 0 2
look-behind ((xa xa xa 0 8) . 0) (xa xa xa 0 8) xa xa xa 3 8 | xa xa xa 0 3
look-behind ((xa xa xa 0 8) . 0) (xa xa xa 0 8) xa xa xa 4 8 | xa xa xa 0 4
look-behind ((xa xa xa 0 8) . 0) (xa xa xa 0 8) xa xa xa 5 8 | xa xa xa 0 5
look-behind ((xa xa xa 0 8) . 0) (xa xa xa 0 8) xa xa xa 6 8 | xa xa xa 0 6
look-behind ((xa xa xa 0 8) . 0) (xa xa xa 0 8) xa xa xa 7 8 | xa xa xa 0 7
$1 = "x- x- x-"
scheme@(guile-user)> (irregex-replace/all "(?<=x)a" "xa aaa x" "-")
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 0 8) xa aaa x 0 8 | xa aaa x 0 0
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 0 8) xa aaa x 1 8 | xa aaa x 0 1
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 0 8) xa aaa x 2 8 | xa aaa x 0 2
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 0 8) xa aaa x 3 8 | xa aaa x 0 3
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 0 8) xa aaa x 4 8 | xa aaa x 0 4
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 0 8) xa aaa x 5 8 | xa aaa x 0 5
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 0 8) xa aaa x 6 8 | xa aaa x 0 6
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 0 8) xa aaa x 7 8 | xa aaa x 0 7
look-behind ((xa aaa x 0 8) . 0) (xa aaa x 0 8) xa aaa x 8 8 | xa aaa x 0 8
$2 = "x- aaa x"
scheme@(guile-user)> (irregex-replace/all "(?<=[xyz])a" "xa ya za" "-")
look-behind ((xa ya za 0 8) . 0) (xa ya za 0 8) xa ya za 0 8 | xa ya za 0 0
look-behind ((xa ya za 0 8) . 0) (xa ya za 0 8) xa ya za 1 8 | xa ya za 0 1
look-behind ((xa ya za 0 8) . 0) (xa ya za 0 8) xa ya za 2 8 | xa ya za 0 2
look-behind ((xa ya za 0 8) . 0) (xa ya za 0 8) xa ya za 3 8 | xa ya za 0 3
look-behind ((xa ya za 0 8) . 0) (xa ya za 0 8) xa ya za 4 8 | xa ya za 0 4
look-behind ((xa ya za 0 8) . 0) (xa ya za 0 8) xa ya za 5 8 | xa ya za 0 5
look-behind ((xa ya za 0 8) . 0) (xa ya za 0 8) xa ya za 6 8 | xa ya za 0 6
look-behind ((xa ya za 0 8) . 0) (xa ya za 0 8) xa ya za 7 8 | xa ya za 0 7
$3 = "x- y- z-"
scheme@(guile-user)> 

However, I am not familiar with all of irregex, only the small bits and pieces I've looked at, so it's entirely possible that this breaks something else. I do not know whether or not the chunk tampering behavior by irregex-fold/fast is relied on in some other part of the code. The shinnoid would have to chime in on that. Someone familiar with chicken scheme might also rerun the test suite with this change.
https://lists.nongnu.org/archive/html/chicken-users/2020-07/msg00005.html

137 2020-07-13 10:11

https://github.com/ashinn/irregex/commit/ac27338c5b490d19624c30d787c78bbfa45e1f11
fix irregex-replace/all with look-behind patterns
ashinn committed Jul 13, 2020

@@ -332,7 +333,16 @@
    (lambda (x) (and (not (eq? x src)) ((chunker-get-next cnk) x)))
    (chunker-get-str cnk)
    (chunker-get-start cnk)
-   (lambda (x) (if (eq? x src) i ((chunker-get-end cnk) x)))
+   (lambda (x)
+     ;; TODO: this is a hack workaround for the fact that we don't
+     ;; have either a notion of chunk equivalence or chunk truncation,
+     ;; until which time (neg-)look-behind in a fold won't work on
+     ;; non-basic chunks.
+     (if (or (eq? x src)
+             (and (not ((chunker-get-next cnk) x))
+                  (not ((chunker-get-next cnk) src))))
+         i
+         ((chunker-get-end cnk) x)))
    (chunker-get-substring cnk)
    (chunker-get-subchunk cnk)))
 

Chunks are now equated in the fourth item of wrap-end-chunker if they are eq? or they are both lacking a next chunk. Plain strings have a single chunk via irregex-basic-string-chunker so the second part of the 'or' will always be true for plain strings. Irregex-fold/fast was not changed.

138 2020-07-13 22:49

Here is a related bug. We might want to add 'start' to the start of every line, and we might have a file with some number of empty lines.

$ for k in $(seq 1 6); do echo ""; done > emptylines.txt
$ hd emptylines.txt 
00000000  0a 0a 0a 0a 0a 0a                                 |......|
00000006

We might use sed:

$ sed -e 's/^/start/' emptylines.txt 
start
start
start
start
start
start
$ 

We might use regular expressions in python:

$ python3
[...]
>>> s = "\n" * 6
>>> s
'\n\n\n\n\n\n'
>>> import re
>>> re.sub ("(?m)^", "start", s)
'start\nstart\nstart\nstart\nstart\nstart\nstart'
>>> print (_)
start
start
start
start
start
start
start
>>> 

Sed doesn't have a match at the very end of the input because it uses the text file convention for a final newline. We might also use irregex:

$ guile --no-auto-compile -l irregex.scm 
GNU Guile 2.2.3
[...]
scheme@(guile-user)> (irregex-replace/all 'bol "\n\n\n\n\n\n" "start")
$1 = "start\n\nstart\nstart\nstart\nstart\nstart"
scheme@(guile-user)> (display $1)
start

start
start
start
start
startscheme@(guile-user)> 

There is no match at the start of the second line, when there clearly should be. We'll add the usual verbosity to see what happens:

$ TZ=GMT diff -u irregex.scm irregex2.scm
--- irregex.scm	2020-07-13 20:23:49.195645124 +0000
+++ irregex2.scm	2020-07-13 20:15:27.347747147 +0000
@@ -3419,11 +3419,14 @@
                (fail))))
         ((bol)
          (lambda (cnk init src str i end matches fail)
+           (simple-format #t "bol ~S ~A" src i)
            (if (or (and (eq? src (car init)) (eqv? i (cdr init)))
                    (and (> i ((chunker-get-start cnk) src))
                         (eqv? #\newline (string-ref str (- i 1)))))
-               (next cnk init src str i end matches fail)
-               (fail))))
+               (begin (display " -> yes\n")
+                      (next cnk init src str i end matches fail))
+               (begin (display " -> no\n")
+                      (fail)))))
         ((bow)
          (lambda (cnk init src str i end matches fail)
            (if (and (if (> i ((chunker-get-start cnk) src))

Rerunning the test:

$ guile --no-auto-compile -l irregex2.scm 
GNU Guile 2.2.3
[...]
scheme@(guile-user)> (irregex-replace/all 'bol "\n\n\n\n\n\n" "start")
bol ("\n\n\n\n\n\n" 0 6) 0 -> yes
bol ("\n\n\n\n\n\n" 1 6) 1 -> no
bol ("\n\n\n\n\n\n" 1 6) 2 -> yes
bol ("\n\n\n\n\n\n" 2 6) 2 -> no
bol ("\n\n\n\n\n\n" 2 6) 3 -> yes
bol ("\n\n\n\n\n\n" 3 6) 3 -> no
bol ("\n\n\n\n\n\n" 3 6) 4 -> yes
bol ("\n\n\n\n\n\n" 4 6) 4 -> no
bol ("\n\n\n\n\n\n" 4 6) 5 -> yes
bol ("\n\n\n\n\n\n" 5 6) 5 -> no
bol ("\n\n\n\n\n\n" 5 6) 6 -> yes
$1 = "start\n\nstart\nstart\nstart\nstart\nstart"
scheme@(guile-user)> 

There is no match at index 1, and every location from 2 to 5 is retested after success. Both of these bugs are in irregex-fold/fast, which completely mishandles empty matches.

139 2020-07-14 03:07

Here is the current irregex 0.9.8 behavior on searching for an empty regex, as a building block for later.

$ guile --no-auto-compile -l irregex.scm 
GNU Guile 2.2.3
[...]
scheme@(guile-user)> (irregex-replace/all "" "aaa" "*")
$1 = "*a*a*a"
scheme@(guile-user)> (irregex-replace/all "" "\n\n\n" "*")
$2 = "*\n*\n*\n"
scheme@(guile-user)> 

There is no match at the very end of the input. This may simply be the behavior desired by the author. If it is not desired, the loop end condition of irregex-fold/fast can be made strict.
https://github.com/ashinn/irregex/blob/ac27338c5b490d19624c30d787c78bbfa45e1f11/irregex.scm#L3807
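
For comparison, the usual way a driver loop handles an empty match is to copy one character and skip a single position forward, leaving the underlying input untouched. A Python sketch of such a replace-all loop (replace_all is a hypothetical helper mirroring re.sub semantics, not irregex code):

```python
import re

def replace_all(rx, s, rep):
    # Scan left to right; on an empty match, emit the skipped character
    # and advance one position instead of modifying the input view.
    out, i = [], 0
    while i <= len(s):
        m = rx.search(s, i)
        if not m:
            break
        out.append(s[i:m.start()])
        out.append(rep)
        if m.end() == m.start():            # empty match
            if m.start() < len(s):
                out.append(s[m.start()])
            i = m.start() + 1
        else:
            i = m.end()
    out.append(s[i:])
    return "".join(out)

replace_all(re.compile("(?m)^"), "\n" * 6, "start")
# -> 'start\nstart\nstart\nstart\nstart\nstart\nstart'
```

This handles the bol case, the look-behind case and the empty regex case without either skipping positions or retesting them.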

140 2020-07-14 13:36

[1/2]
The second line skip bug >>138 is caused by the chunk tampering behavior of irregex-fold/fast >>135. This seems to have been introduced to compensate for the erroneous empty match detection, so a test suite rerun on >>136 would fail, as these have to be fixed simultaneously. The (bol) branch >>138 can only succeed after the start of the input if it can look at the previous character and check that it is a #\newline. But after

bol ("\n\n\n\n\n\n" 0 6) 0 -> yes

the chunk start position is advanced to 1, and the (bol) attempt at 1 doesn't have the first character available for #\newline checking.

bol ("\n\n\n\n\n\n" 1 6) 1 -> no

The only way it can succeed at 1 is if the chunk start position stays at 0, so this bug is directly caused by the chunk tampering behavior. Even if other parts of the code rely on that behavior, it still needs fixing, since it is now the source of a bug.

The retest after success behavior is caused by the erroneous empty match detection. This can be demonstrated as a separate bug:

scheme@(guile-user)> (irregex-replace/all "(?=a)" "---a---" "*")
$1 = "---**a---"

The same empty match is replaced twice. We add the usual verbosity:

$ TZ=GMT diff -u irregex.scm irregex2.scm
--- irregex.scm	2020-07-13 20:23:49.195645124 +0000
+++ irregex2.scm	2020-07-14 02:49:04.508860562 +0000
@@ -3234,9 +3225,12 @@
                         flags
                         (lambda (cnk init src str i end matches fail) i))))
                (lambda (cnk init src str i end matches fail)
+                 (simple-format #t "look-ahead ~S ~A" src i)
                  (if (check cnk init src str i end matches (lambda () #f))
-                     (next cnk init src str i end matches fail)
-                     (fail)))))
+                     (begin (display " -> yes\n")
+                            (next cnk init src str i end matches fail))
+                     (begin (display " -> no\n")
+                            (fail))))))
             ((neg-look-ahead)
              (let ((check
                     (lp (sre-sequence (cdr sre))

and we get:

scheme@(guile-user)> (irregex-replace/all "(?=a)" "---a---" "*")
look-ahead ("---a---" 0 7) 0 -> no
look-ahead ("---a---" 0 7) 1 -> no
look-ahead ("---a---" 0 7) 2 -> no
look-ahead ("---a---" 0 7) 3 -> yes
look-ahead ("---a---" 3 7) 3 -> yes
look-ahead ("---a---" 4 7) 4 -> no
look-ahead ("---a---" 4 7) 5 -> no
look-ahead ("---a---" 4 7) 6 -> no
look-ahead ("---a---" 4 7) 7 -> no
$1 = "---**a---"

After the first empty match at 3, the current position is only advanced to 3, not to 4 as would happen on the empty match branch, so it is matched again. The empty match test of irregex-fold/fast >>135 is bogus.
https://github.com/ashinn/irregex/blob/ac27338c5b490d19624c30d787c78bbfa45e1f11/irregex.scm#L3824
It checks that the match end position 'j' equals the start of the search interval 'i'. It can therefore only catch empty matches at the start of the search interval. But empty matches can occur at any later position. To detect them the match end position 'j' must be tested against the match start position. To fix both empty matches and chunk tampering:

$ TZ=GMT diff -u irregex.scm irregex2.scm
--- irregex.scm	2020-07-13 20:23:49.195645124 +0000
+++ irregex2.scm	2020-07-14 10:32:08.239441307 +0000
@@ -3816,16 +3813,20 @@
             (if (not m)
                 (finish i acc)
                 (let ((j (%irregex-match-end-index m 0))
+                      (jstart (%irregex-match-start-index m 0))
                       (acc (kons i m acc)))
                   (irregex-reset-matches! matches)
                   (cond
                    ((flag-set? (irregex-flags irx) ~consumer?)
                     (finish j acc))
-                   ((= j i)
+                   ; ((= j i)
+                   ((= j jstart)
                     ;; skip one char forward if we match the empty string
-                    (lp (list str (+ j 1) end) (+ j 1) acc))
+                    ; (lp (list str (+ j 1) end) (+ j 1) acc))
+                    (lp src (+ j 1) acc))
                    (else
-                    (lp (list str j end) j acc))))))))))
+                    ; (lp (list str j end) j acc))))))))))
+                    (lp src j acc))))))))))
 
 (define (irregex-fold irx kons . args)
   (if (not (procedure? kons)) (error "irregex-fold: not a procedure" kons))
141 2020-07-14 13:39

[2/2]
A rerun of the second line skip test:

scheme@(guile-user)> (irregex-replace/all 'bol "\n\n\n\n\n\n" "start")
bol ("\n\n\n\n\n\n" 0 6) 0 -> yes
bol ("\n\n\n\n\n\n" 0 6) 1 -> yes
bol ("\n\n\n\n\n\n" 0 6) 2 -> yes
bol ("\n\n\n\n\n\n" 0 6) 3 -> yes
bol ("\n\n\n\n\n\n" 0 6) 4 -> yes
bol ("\n\n\n\n\n\n" 0 6) 5 -> yes
$2 = "start\nstart\nstart\nstart\nstart\nstart\n"

There is no match at the very end of the input, for the same reason as >>139. Previously there was one, because irregex-search/backtrack allows looking for matches past the last character, which some branches of sre->procedure use as a signal to try to acquire more input.
https://github.com/ashinn/irregex/blob/ac27338c5b490d19624c30d787c78bbfa45e1f11/irregex.scm#L1955
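
For comparison (a sketch using Python's re, a different engine, so this only illustrates the design question, not what irregex should do), multiline ^ behaves like this 'bol and does match at the very end when that position follows a newline:

```python
import re

# With MULTILINE, ^ matches at the start of the string and immediately
# after every newline, including the position after the final newline.
result = re.sub("(?m)^", "start", "\n\n")
print(result)  # the string "start\nstart\nstart"
```
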
A rerun of the double match test:

scheme@(guile-user)> (irregex-replace/all "(?=a)" "---a---" "*")
look-ahead ("---a---" 0 7) 0 -> no
look-ahead ("---a---" 0 7) 1 -> no
look-ahead ("---a---" 0 7) 2 -> no
look-ahead ("---a---" 0 7) 3 -> yes
look-ahead ("---a---" 0 7) 4 -> no
look-ahead ("---a---" 0 7) 5 -> no
look-ahead ("---a---" 0 7) 6 -> no
look-ahead ("---a---" 0 7) 7 -> no
$1 = "---*---"

The match positions are correct but the output lost the 'a'. This happens because the kons lambda passed by irregex-replace/all to irregex-fold/fast is also incompetent at detecting empty matches.
https://github.com/ashinn/irregex/blob/ac27338c5b490d19624c30d787c78bbfa45e1f11/irregex.scm#L3880

(define (irregex-replace/all irx str . o)
  (if (not (string? str)) (error "irregex-replace/all: not a string" str))
  (irregex-fold/fast
   irx
   (lambda (i m acc)
     (let* ((m-start (%irregex-match-start-index m 0))
            (res (if (>= i m-start)
                     (append (irregex-apply-match m o) acc)
                     (append (irregex-apply-match m o)
                             (cons (substring str i m-start) acc)))))
       ;; include the skipped char on empty matches
       (if (= i (%irregex-match-end-index m 0))
           (cons (substring str i (+ i 1)) res)
           res)))
   '()
   str
   (lambda (i acc)
     (let ((end (string-length str)))
       (string-cat-reverse (if (>= i end)
                               acc
                               (cons (substring str i end) acc)))))))

Once again the match end is compared to the search start instead of the match start. The fix:

$ TZ=GMT diff -u irregex.scm irregex2.scm
--- irregex.scm	2020-07-13 20:23:49.195645124 +0000
+++ irregex2.scm	2020-07-14 11:52:27.010199230 +0000
@@ -3888,8 +3889,9 @@
                      (append (irregex-apply-match m o)
                              (cons (substring str i m-start) acc)))))
        ;; include the skipped char on empty matches
-       (if (= i (%irregex-match-end-index m 0))
-           (cons (substring str i (+ i 1)) res)
+       (if (and (= m-start (%irregex-match-end-index m 0))
+                (< m-start (string-length str)))
+           (cons (substring str m-start (+ m-start 1)) res)
            res)))
    '()
    str

A rerun of the double match test:

scheme@(guile-user)> (irregex-replace/all "(?=a)" "---a---" "*")
look-ahead ("---a---" 0 7) 0 -> no
look-ahead ("---a---" 0 7) 1 -> no
look-ahead ("---a---" 0 7) 2 -> no
look-ahead ("---a---" 0 7) 3 -> yes
look-ahead ("---a---" 0 7) 4 -> no
look-ahead ("---a---" 0 7) 5 -> no
look-ahead ("---a---" 0 7) 6 -> no
look-ahead ("---a---" 0 7) 7 -> no
$1 = "---*a---"
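
Python's re (3.7+) agrees with this corrected result; a quick cross-check sketch:

```python
import re

# The zero-width lookahead matches once, before the 'a', and the 'a'
# itself is preserved in the output.
result = re.sub(r"(?=a)", "*", "---a---")
print(result)  # → ---*a---
```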

I expect that the other kons lambdas will have the same bug. These patches allow wrap-end-chunker to work without the hack from >>137. Also, irregex-fold/chunked/fast >>135 has the bugs of irregex-fold/fast and a few more. This entire irregex thing seems to have a worrying amount of undiscovered bugs.

142 2020-07-14 16:15

>>141

This entire irregex thing seems to have a worrying amount of undiscovered bugs.

Is this some important library that many projects depend on?

143 2020-07-14 21:13

There are three users of irregex-fold/fast in irregex.scm: irregex-replace/all, irregex-extract and irregex-split. There is also a public wrapper: irregex-fold. Irregex-replace/all has the double match bug >>140 which has been dealt with. Irregex-extract also has a double match bug:
https://github.com/ashinn/irregex/blob/ac27338c5b490d19624c30d787c78bbfa45e1f11/irregex.scm#L3936

$ guile --no-auto-compile -l irregex.scm 
GNU Guile 2.2.3
[...]
scheme@(guile-user)> (irregex-extract "(?=a)" "---a---")
$1 = ("" "")

but its lambdas are so simple that after the irregex-fold/fast fix >>140 it starts working:

$ guile --no-auto-compile -l irregex2.scm 
GNU Guile 2.2.3
[...]
scheme@(guile-user)> (irregex-extract "(?=a)" "---a---")
$1 = ("")
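
The corrected extraction also matches what Python's re.findall returns for the same zero-width pattern, for whatever that outside cross-check is worth:

```python
import re

# Exactly one empty match, at the position of the lookahead before 'a'.
matches = re.findall(r"(?=a)", "---a---")
print(matches)  # → ['']
```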

The truly strange one is irregex-split.
https://github.com/ashinn/irregex/blob/ac27338c5b490d19624c30d787c78bbfa45e1f11/irregex.scm#L3946

(define (irregex-split irx str . o)
  (if (not (string? str)) (error "irregex-split: not a string" str))
  (let ((start (if (pair? o) (car o) 0))
        (end (if (and (pair? o) (pair? (cdr o))) (cadr o) (string-length str))))
    (irregex-fold/fast
     irx
     (lambda (i m a)
       (cond
        ((= i (%irregex-match-end-index m 0))
         ;; empty match, include the skipped char to rejoin in finish
         (cons (string-ref str i) a))
        ((= i (%irregex-match-start-index m 0))
         a)
        (else
         (cons (substring str i (%irregex-match-start-index m 0)) a))))
     '()
     str
     (lambda (i a)
       (let lp ((ls (if (= i end) a (cons (substring str i end) a)))
                (res '())
                (was-char? #f))
         (cond
          ((null? ls) res)
          ((char? (car ls))
           (lp (cdr ls)
               (if (or was-char? (null? res))
                   (cons (string (car ls)) res)
                   (cons (string-append (string (car ls)) (car res))
                         (cdr res)))
               #t))
          (else (lp (cdr ls) (cons (car ls) res) #f)))))
     start
     end)))

It relies on the bug that it will see offset empty matches twice. On the first sighting it adds the prefix, and on the second one it adds the skipped character. The point of the chunk tampering behavior of irregex-fold/fast seems to be to use the chunk starting position as an undeclared part of the kons accumulator for irregex-split. However, with all of its cases and intricate postprocessing, irregex-split is still wrong:

$ guile --no-auto-compile -l irregex.scm 
GNU Guile 2.2.3
[...]
scheme@(guile-user)> (irregex-split "-" "-a-")
$8 = ("a")
scheme@(guile-user)> (irregex-split "a" "---a")
$9 = ("---")
scheme@(guile-user)> (irregex-split "a" "---aa")
$10 = ("---")
scheme@(guile-user)> (irregex-split "a" "aaa")
$11 = ()
scheme@(guile-user)> (irregex-split 'any "aaa")
$13 = ()

Irregex-split needs a careful fix, but one that relies on the corrected irregex-fold/fast.

>>142

Is this some important library that many projects depend on?

It is the default regex library in some scheme implementations. I cannot speak to its importance and number of dependents. I only started looking at it because it is used in schemebbs and we hit upon a bug in the quotelink regex in the previous thread.

144 2020-07-15 06:38

>>133,135-141,143
This is not your personal journal. Could you please stop spamming or summarize your point? The thread is on what you're doing, not an invitation to write down your every thought.

145 2020-07-15 11:38

>>144
I am already following your advice. This is indeed the summary, far from containing every thought. It has the failing test case, the explanation and the fix. "Your every thought" would evidently also contain failed investigation paths that did not lead to the fix, but those have been omitted here. If you have some specific examples of "your every thought" that could be reasonably removed without detracting from the failing test case, the explanation or the fix, do not hesitate to point them out so that I can improve my future posts. I am fully aware that it is unusual to see programming posted on a programming board, but if you wish to avoid such content the majority of the threads on this board are available to you, since they lack it. You may also find technology boards rather than programming boards to be more to your taste, should you be inclined to give them a try.

146 2020-07-15 15:29 *

>>145
Well I don't know who you're intending to bore here, but every thread you participate in dies, rightfully. Nobody is interested, nobody is interacting with you, nobody cares about your pedantic, pompous, pretentious essays. If you care so much about irregex, send a message to the maintainer, and if you absolutely need an audience use github or whatever. Just stop for one moment and reconsider if a half-dead textboard really needs a fedora-tipping nerd annoying everyone with a pointless monologue.

And of course other programming boards don't have nonsense like this, because as the good old saying goes: This is not your personal blog!

147 2020-07-15 16:53 *

>>146

nobody cares about your pedantic, pompous, pretentious essays.

You seem to care at least enough to make some fairly rude remarks. Why not leave the good man to his business instead of freaking out because someone is actually using this half-dead textboard?

148 2020-07-15 19:14 *

>>147
I care that he posts, not what he posts. And I have been ignoring it; every day I visit this site hoping to meet new scheme/lisp enthusiasts, ready to talk about and discuss ideas (you know... like a dialogue), but instead this guy is all I see. So if his mindless ramblings are tolerated, I just ask for my rant to be given a pass.

149 2020-07-15 19:20 *

>>146
But this is the personal blog thread.

150 2020-07-15 21:10 *

>>149
But is it his personal blog?

151 2020-07-16 07:03 *

>>149
Even if it was, he should 1. post VIP posts 2. not do exactly the same thing in over half the threads on the frontpage.

152 2020-07-16 13:54

Irregex-split has a strange result on the 'bol test >>140 as well, due to the fact that irregex-fold/fast cannot match a 'bol at position 1.

$ guile --no-auto-compile -l irregex.scm 
GNU Guile 2.2.3
[...]
scheme@(guile-user)> (irregex-split 'bol "\n\n\n\n\n\n")
$1 = ("\n\n" "\n" "\n" "\n" "\n")

So the chunk tampering behavior makes irregex-split >>143 misbehave as well.

>>146

every thread you participate in dies, rightfully. Nobody is interested, nobody is interacting with you

You would have to know which posts are mine to make this assertion. Either you do not possess this information, in which case you are not in a position to make the assertion and it is therefore vacuous, or you do possess this information, in which case you already know the assertion to be false while you are making it. A subset of my posts can be found in the "How can I run my own instance of this" thread, the only thread that maxed out so far, the current thread prior to the irregex session, and the "LISP Puzzles" thread. They received the average amount of interaction for a programming board, except in "LISP Puzzles" where it was a bit higher. Those three threads happen to be the top three threads by post count on this board, so your statement is contradicted by verifiable reality. Your opinion is also not shared by https://textboard.org/prog/39#t39p106 who referred to the first irregex bughunt, which was written in a style that was a lot less terse than the one used for the current bughunt. It also seems a bit strange that you are offering the interaction you claim is missing, even if you are doing it in an off-topic way unrelated to programming. I find it hard to believe that you really cannot perceive the irony of going out of your way to repeatedly raise the post count of a thread I post in while claiming it to be dead.

send a message to the maintainer
use github

Since you claim to know which posts are mine, you already know my stance on accounts, verification and enabling remote code execution, so these suggestions make no sense. See also: https://textboard.org/prog/140#t140p7

your pedantic, pompous, pretentious essays
[...]
Just stop for one moment and reconsider if a half-dead textboard really needs a fedora-tipping nerd annoying everyone with a pointless monologue.

From https://textboard.org/prog/130#t130p29
"""I do not know how long it will take you, but at some point in the future you will realize that insults like these do not communicate any information about the person being insulted, they only communicate a different kind of information about the insult employer's state of mind."""

And of course other programming boards don't have nonsense like this

You can help out by linking what you consider to be the best programming content posted reasonably recently on an anonymous programming board, so I can use it to improve my future posts. You can also easily avoid seeing my posts during this period by simply not opening "SchemeBBS [part 2]", "LISP Puzzles" and the current thread. If the irregex posts specifically elicit this reaction from you, I can help you with that as well. Here is a script that replaces the body of any post containing "irregex" with a warning.

Array.from (document.querySelectorAll ("dl > dd")).filter (e => /irregex/i.test (e.innerHTML)).forEach (e => { e.innerHTML = "<p>Warning: this post may contain trace amounts of irregex.</p>"; })

I hope this helps you have a more pleasant experience during your stay on textboard.org. The script is idempotent so feel free to run it repeatedly for additional peace of mind. Alternatively it may simply be the case that you are looking for a technology board rather than a programming board.

>>151

exactly the same thing

A moment of reflection will allow you to easily perceive the difference between a bug in the schemebbs source code that can abort the running instance, and the fix for that bug, which belong in "SchemeBBS [part 2]", and a class of bugs in one of the libraries schemebbs depends on but which do not hit schemebbs directly, and therefore belong in a second thread, such as "What are you working on?".

over half the threads on the frontpage

Exactly 2/10, soon to be 1/10. I'm sure you understand that there is a point past which a hyperbole will have the exact opposite of its intended effect.

I'm not sure where this need comes from, to post a stream of trivially refutable falsehoods. Perhaps it is the accepted standard of interaction on some other boards. Here, however, the main currency is on-topic programming content. Those who have outlandish reactions to the irregex posts can instead show up the irregex poster simply by posting better written and more focused on-topic programming content. They are more than welcome to do so, and I will enjoy reading it.

153 2020-07-17 06:56 *

>>152
All I'm saying is fix your problem, submit the patch. That's the normal way of doing it. I don't care if you're afraid of github or whatever. Nobody needs your journals, just get to the point.

154 2020-07-17 20:09

Here is the irregex-split fix consistent with the irregex-fold/fast fix >>140 and the irregex-replace/all fix >>141. The accumulator is a vector because that is ashinn's preferred structure for records throughout irregex.scm. Where an edge case could be reasonably decided either way, the existing 0.9.8 behavior on searching for an empty regex >>139 was used as guidance.

(define (irregex-split irx str . o)
  (if (not (string? str)) (error "irregex-split: not a string" str))
  (let ((start (if (pair? o) (car o) 0))
        (end (if (and (pair? o) (pair? (cdr o))) (cadr o) (string-length str))))
    (irregex-fold/fast
     irx
     (lambda (i m a)
       (let* ((msi   (%irregex-match-start-index m 0))
              (mei   (%irregex-match-end-index m 0))
              (empty (= msi mei))
              (lst   (vector-ref a 0))
              (pos   (vector-ref a 1)))
         (cond ((not empty)
                  (vector-set! a 0 (cons (substring str pos msi) lst))
                  (vector-set! a 1 mei)
                  (vector-set! a 2 #f))
               ((< pos msi)
                  (vector-set! a 0 (cons (substring str pos msi) lst))
                  (vector-set! a 1 msi)
                  (vector-set! a 2 #t))
               ((> pos start)
                  (vector-set! a 0 (cons "" lst))
                  (vector-set! a 2 #t))
               (else
                  (vector-set! a 2 #t)))
         a))
     (let ((acc (make-vector 3)))
       (vector-set! acc 0 '())
       (vector-set! acc 1 start)
       (vector-set! acc 2 #t)
       acc)
     str
     (lambda (i a)
       (let* ((lst   (vector-ref a 0))
              (pos   (vector-ref a 1))
              (empty (vector-ref a 2)))
         (reverse (cond
           ((< pos end)
              (cons (substring str pos end) lst))
           (empty lst)
           (else  (cons "" lst))))))
     start
     end)))

The key is to decouple the match search position used internally by irregex-fold/fast from the split position used internally by irregex-split. The failing tests from >>143 are corrected:

$ guile --no-auto-compile -l irregex2.scm
GNU Guile 2.2.3
[...]
scheme@(guile-user)> (irregex-split "-" "-a-")
$1 = ("" "a" "")
scheme@(guile-user)> (irregex-split "a" "---a")
$2 = ("---" "")
scheme@(guile-user)> (irregex-split "a" "---aa")
$3 = ("---" "" "")
scheme@(guile-user)> (irregex-split "a" "aaa")
$4 = ("" "" "" "")
scheme@(guile-user)> (irregex-split 'any "aaa")
$5 = ("" "" "" "")
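
As an outside reference point (a sketch, and only for these specific inputs), Python's re.split agrees with the fixed results:

```python
import re

# Separators at the edges and adjacent separators produce empty fields,
# matching the corrected irregex-split output.
print(re.split("-", "-a-"))    # → ['', 'a', '']
print(re.split("a", "---a"))   # → ['---', '']
print(re.split("a", "---aa"))  # → ['---', '', '']
print(re.split("a", "aaa"))    # → ['', '', '', '']
```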

The 'bol test >>152 is also corrected:

scheme@(guile-user)> (irregex-split 'bol "\n\n\n\n\n\n")
$6 = ("\n" "\n" "\n" "\n" "\n" "\n")

And some more edge cases:

scheme@(guile-user)> (irregex-split "" "")
$7 = ()
scheme@(guile-user)> (irregex-split "" "aaa")
$8 = ("a" "a" "a")
scheme@(guile-user)> (irregex-split 'bos "aaa")
$9 = ("aaa")
scheme@(guile-user)> (irregex-split 'eos "aaa")
$10 = ("aaa")
scheme@(guile-user)> (irregex-split "(?<=a)" "aaa")
$11 = ("a" "a" "a")
scheme@(guile-user)> (irregex-split "(?=a)" "aaa")
$12 = ("a" "a" "a")

The three fixes can be applied as a group, and the 0.9.8 hack to wrap-end-chunker >>137 can be reverted.

>>153

just get to the point

Already answered in >>145.

journals

1. Already answered in >>145.
2. You may not be familiar with the opening post of the thread you choose to visit of your own free will:

This is the type of thread that you can write in every time you visit the site! You can do anything from use this as a personal log for your project, to sharing what you're hacking on at the moment, or to critique something some else did in the thread. I'm looking forward to hearing what you guys are up to!

155 2020-07-19 08:11 *

>>154
You're hopeless, and appear not to even realize it. But who cares anymore? Why even bother to make you reconsider your apparent autism? Hope at least you have fun with your thing, because if you then nobody else will.

156 2020-07-19 09:57

>>156

You're hopeless, and appear not to even realize it. But who cares anymore? Why even bother to make you reconsider your apparent autism?

Answered in >>152 and https://textboard.org/prog/130#t130p29
"""I do not know how long it will take you, but at some point in the future you will realize that insults like these do not communicate any information about the person being insulted, they only communicate a different kind of information about the insult employer's state of mind."""

Hope at least you have fun with your thing, because if you then nobody else will.

Answered in >>152:
1. "the top three threads by post count"
2. "Your opinion is also not shared by https://textboard.org/prog/39#t39p106 who [...]"
3. """I find it hard to believe that you really cannot perceive the irony of going out of your way to repeatedly raise the post count of a thread I post in while claiming it to be dead."""

Another irregex aborting bug will follow shortly, I just need to reduce the size of the test case.

157 2020-07-19 09:59 *

Meant for >>155, of course.

158 2020-07-20 02:50

As pointed out at the end of >>141 irregex-fold/chunked/fast >>135 has the bugs of irregex-fold/fast and a few more.
- Because of its placement kons doesn't see every match.
- Because of its placement irregex-reset-matches! is not called on every round.
- Because of its placement the ~consumer? test can miss rounds.
- The proper order on the match branch is: kons -> reset-matches! -> ~consumer? -> empty -> else, as used in irregex-fold/fast >>135.
- The empty match test is bogus. It compares the match end index with the search start index instead of the match start index.
- The 'next' chunk is not checked for existence, so the start index of a non-existent chunk may be requested from the chunker. Depending on the chunker implementation this may abort.
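
The guard from the last point can be sketched in Python terms (a hypothetical chunk representation for illustration only, not irregex's actual chunker API):

```python
# Hypothetical singly linked chunks standing in for an irregex chunker.
class Chunk:
    def __init__(self, text, next=None):
        self.text = text
        self.next = next

def advance(chunk, i):
    """Step one position forward across chunk boundaries.

    Returns (chunk, index), or None at end of input. The next chunk is
    checked for existence before any of its fields are requested."""
    if i + 1 < len(chunk.text):
        return chunk, i + 1
    if chunk.next is not None:   # the missing existence check
        return chunk.next, 0
    return None
```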

The last point can be demonstrated on the rope chunker from the documentation, which is not part of the library proper but is present in the test suite.
http://synthcode.com/scheme/irregex/#SECTION_3.4
https://github.com/ashinn/irregex/blob/ac27338c5b490d19624c30d787c78bbfa45e1f11/test-irregex.scm#L122

$ cat attack.scm 
(define (rope . args)
  (map (lambda (x) (if (pair? x) x (list x 0 (string-length x)))) args))

(define rope-chunker
  (make-irregex-chunker
   (lambda (x) (and (pair? (cdr x)) (cdr x)))
   caar
   cadar
   caddar
   (lambda (src1 i src2 j)
     (if (eq? src1 src2)
         (substring (caar src1) i j)
         (let lp ((src (cdr src1))
                  (res (list (substring (caar src1) i (caddar src1)))))
           (if (eq? src src2)
               (string-join
                (reverse (cons (substring (caar src2) (cadar src2) j) res))
                "")
               (lp (cdr src)
                   (cons (substring (caar src) (cadar src) (caddar src))
                         res))))))))

$ guile --no-auto-compile -l irregex.scm -l attack.scm 
GNU Guile 2.2.3
[...]
scheme@(guile-user)> (irregex-fold/chunked '(or "a\n" bol) (lambda (start i m acc) acc) '() rope-chunker (rope "a\n"))
ERROR: In procedure cadar:
In procedure cadar: Wrong type (expecting pair): #f

Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.
scheme@(guile-user) [1]> 

So this is another attack vector for irregex. Evidently the 'next' chunk must be checked for existence.

159 2020-07-20 15:30 *

>>152
You don't have a job do you? I can't imagine how you'd interact with other people.

160 2020-07-21 13:29

Section 3.1 of the irregex documentation specifies the meaning of the index parameter of the kons lambda taken by irregex-fold:
http://synthcode.com/scheme/irregex/#SECTION_3.1

(irregex-fold <irx> <kons> <knil> <str> [<finish> <start> <end>])
This performs a fold operation over every non-overlapping place <irx> occurs in the string str.
The <kons> procedure takes the following signature:
(<kons> <from-index> <match> <seed>)
where <from-index> is the index from where we started searching (initially <start> and thereafter the end index of the last match), <match> is the resulting match-data object, and <seed> is the accumulated fold result starting with <knil>.

Equating the search start index with the last match end index is false, as the documentation notes a bit further on:

Note if an empty match is found <kons> will be called on that empty string, and to avoid an infinite loop matching will resume at the next char. It is up to the programmer to do something sensible with the skipped char in this case.

This is a poor design choice, because it effectively mandates that all nontrivial kons lambdas keep track of their own last match end index, as the fixed irregex-split >>154 does. With author powers to change the public contract the proper design choice would be to switch the meaning of the index parameter of the kons to the last match end index. Irregex-fold/fast would keep track internally of both the search start index and the last match end index, and would pass the former to searches and the latter to 'kons' and 'finish'. This would simplify empty match handling, as well as irregex-replace/all and irregex-split. But this is outside the scope of a mere bugfix as it requires a change to the public contract, and is therefore something only ashinn can do.
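
The decoupled design described above can be sketched generically in Python, using re as the engine; this illustrates the idea of tracking the search position and the last match end separately, and is not irregex code:

```python
import re

def regex_fold(pattern, kons, knil, s):
    """Fold over non-overlapping matches. The search position and the
    last match end are tracked separately; kons receives the latter."""
    rx = re.compile(pattern)
    acc = knil
    pos = last_end = 0
    while pos <= len(s):
        m = rx.search(s, pos)
        if m is None:
            break
        acc = kons(last_end, m, acc)
        last_end = m.end()
        # The empty-match test compares the match end to the match
        # *start*, never to the search start.
        pos = m.end() + 1 if m.end() == m.start() else m.end()
    return acc

positions = regex_fold(r"(?=a)", lambda i, m, a: a + [(m.start(), m.end())],
                       [], "---a---")
print(positions)  # → [(3, 3)]
```

Note that rx.search(s, pos) keeps the whole string visible while starting the search at pos, which is the same idea as passing the untampered chunk src in the >>140 fix.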

With the group of fixes for the fold family >>140 >>141 >>154 and the bonus aborting attack >>158, I think this should be enough material for the second irregex bughunt. The first irregex bughunt, with the leftmost longest violations from no backtracking and the bonus DoS attacks, can be found in the group of posts starting at https://textboard.org/prog/39#t39p100 as well as a few preceding posts in that thread.

>>159

You don't have a job do you?

I work in academia and have for several decades. In all those decades, however, I've never had such a devoted follower. What's your day job, Anon?

I can't imagine how you'd interact with other people.

You can easily glean the answer by comparing the two sides of the exchange: without any need to resort to insults, and using elements of reality instead of "a stream of trivially refutable falsehoods". >>152

161 2020-07-21 17:53

>>160
>>159 wasn't me (though he is right). I gave up after >>155.

162 2020-07-23 18:37 *

What happened to our resident boomer?

163 2020-07-25 14:46 *

I don't understand what problems people have with the write-ups from >>160

I personally enjoy reading them, and his writing style.

164 2020-07-26 10:05 *

>>163
Thanks. If you have some constructive critique or spot some programming error feel free to suggest improvements.

165 2020-12-31 15:10

When trying to perform integer division of some number n by some divisor d, we may instead wish to approximate this with multiplication by some magic factor followed by integer division with a power of the base of the number system of our hardware. The last step can be performed using a right shift operation, and the multiply-shift combination is likely to be faster than performing integer division. For example our base might be 2 and our divisor 10, as when trying to produce the decimal string representing our number n. The correctness condition we impose is that we must obtain the same result as if we performed the integer division. Since the right shift operation rounds downward on positive integers, our approximation must be no less than the result of the integer division, and may only be greater by an amount that does not overflow into the result for the successor integer. As an example, we cannot allow the result of the integer division 20//10 to be approximated by anything less since downward rounding will follow, and we cannot allow the approximation for the integer division 29//10 to reach as high as 30//10. The correctness condition can be expressed as:

n    n * magic   n + 1
- <= --------- < -----
d    basepower     d  

Cross-multiplying the first inequality yields

basepower <= d * magic

The n has simplified away so this inequality can be made to hold regardless of n. We are left with a d-multiple that is no less than basepower. As an example we might take basepower to be 2^16=65536 and magic to be 6554 which gives

n // 10 <= n * 6554 // 65536

The second inequality yields

n * magic * d < basepower * (n + 1)

Since magic * d is a d-multiple that is no less than basepower, we make this explicit by writing

magic * d = basepower + over

Substitution and simplification yield

n * over < basepower

As an example we might have

6554 * 10 = 65536 + 4

which yields

n * 4 < 65536
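
As a quick numeric sanity check of this example (a sketch in Python), the bound n * 4 < 65536 guarantees the approximation up to n = 16383, and an exhaustive search shows where it first fails:

```python
# The bound holds for all n <= 16383, so the approximation is guaranteed
# there; searching past the bound finds the first disagreement.
assert all(n // 10 == (n * 6554) >> 16 for n in range(16384))

first_bad = next(n for n in range(65536) if n // 10 != (n * 6554) >> 16)
print(first_bad)  # → 16389
```

The condition is sufficient rather than necessary, which is why the first failure lands a few values past 16383 rather than exactly at 16384.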

Several conclusions can be drawn. One is that minimizing 'over' helps satisfy this inequality. This is equivalent to choosing magic * d to be the smallest d-multiple no less than basepower. Another conclusion is that if 'over' is not 0, it will be at least 1, and thus basepower will be greater than n. The main conclusion, however, is that n has not simplified away, and therefore in the general case the inequality will not hold for arbitrary n. For any choice of magic factor and basepower where 'over' is not 0, we can always find a large enough n to break this inequality. It is for this reason that the process of picking an appropriate magic factor and basepower does not depend merely on the base and the divisor; it also depends on the maximum value of n up to which we expect our method to work. To go past this limit we need to find a higher basepower and magic factor that satisfy both inequalities of the correctness condition. This strategy of iterative search can be implemented as follows:

# Find the smallest power of 'base' that reaches 'target'.  Returns
# (exponent, power); with 'strict' the power must strictly exceed
# the target.
def diim_basebits (base, target, strict):
   shift = 1
   exp   = base

   while exp < target:
      exp   *= base
      shift += 1

   if strict and (exp == target):
      exp   *= base
      shift += 1

   return (shift, exp)

# Search upward for the smallest (factor, shift) pair such that
# x // div == (x * factor) shifted right by 'shift' positions in
# base 'base', for every x up to 'limit'.
def work_diim (base, div, limit, *, debug = False):
   msg = print

   if debug:
      msg ("base {:,d} div {:,d} limit {:,d}".format (
         base, div, limit))

   shift, exp = diim_basebits (base, div, False)
   more       = True
   tries      = 0

   while more:
      tries   += 1
      exptodiv = divmod (exp, div)

      if exptodiv [1]:
         factor = exptodiv [0] + 1
         over   = div - exptodiv [1]
      else:
         factor = exptodiv [0]
         over   = 0

      product = over * limit

      if debug:
         msg ("try {:,d} shift {:,d} exp {:,d}".format (
            tries, shift, exp))
         msg ("  factor {:,d} over {:,d} product {:,d}".format (
            factor, over, product))

      if product < exp:
         more = False
      else:
         exp   *= base
         shift += 1

   product     = limit * factor
   pbits, pexp = diim_basebits (base, product, True)

   if debug:
      msg ("x div {:,d} -> (x * {:,d}) >>({:,d}) {:,d}".format (
         div, factor, base, shift))
      msg ("  basebits {:,d} max {:,d}".format (
         pbits, product))

   return {
      "factor":   factor,
      "shift":    shift,
      "basebits": pbits,
   }
166 2020-12-31 15:12

This is the inner working function, called by a wrapper that takes care of invalid arguments. An example that divides a u16 by 10:

base 2 div 10 limit 65,535
try 1 shift 4 exp 16
  factor 2 over 4 product 262,140
try 2 shift 5 exp 32
  factor 4 over 8 product 524,280
try 3 shift 6 exp 64
  factor 7 over 6 product 393,210
try 4 shift 7 exp 128
  factor 13 over 2 product 131,070
try 5 shift 8 exp 256
  factor 26 over 4 product 262,140
try 6 shift 9 exp 512
  factor 52 over 8 product 524,280
try 7 shift 10 exp 1,024
  factor 103 over 6 product 393,210
try 8 shift 11 exp 2,048
  factor 205 over 2 product 131,070
try 9 shift 12 exp 4,096
  factor 410 over 4 product 262,140
try 10 shift 13 exp 8,192
  factor 820 over 8 product 524,280
try 11 shift 14 exp 16,384
  factor 1,639 over 6 product 393,210
try 12 shift 15 exp 32,768
  factor 3,277 over 2 product 131,070
try 13 shift 16 exp 65,536
  factor 6,554 over 4 product 262,140
try 14 shift 17 exp 131,072
  factor 13,108 over 8 product 524,280
try 15 shift 18 exp 262,144
  factor 26,215 over 6 product 393,210
try 16 shift 19 exp 524,288
  factor 52,429 over 2 product 131,070
x div 10 -> (x * 52,429) >>(2) 19
  basebits 32 max 3,435,934,515

To divide a u16 by 10 we can multiply by 52,429 then right shift by 19, and we need 32 bits for the multiplication. One can occasionally find these constants in library code that generates decimal representations. Examples:

>>> (12345 * 52429) >> 19
1234
>>> (2020 * 52429) >> 19
202
>>> (1337 * 52429) >> 19
133
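The pair (52,429, 19) can also be verified exhaustively over the whole u16 range. A quick sanity check, not part of the trace above:

```python
# Exhaustive check of the u16 constants.  Here over = 2 and
# 2 * 65535 < 2**19 = 524288, so the correctness condition holds
# for every 16-bit input.
for n in range(65536):
    assert (n * 52429) >> 19 == n // 10
```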
167 2020-12-31 15:14

For a u32:

base 2 div 10 limit 4,294,967,295
try 1 shift 4 exp 16
  factor 2 over 4 product 17,179,869,180
try 2 shift 5 exp 32
  factor 4 over 8 product 34,359,738,360
try 3 shift 6 exp 64
  factor 7 over 6 product 25,769,803,770
try 4 shift 7 exp 128
  factor 13 over 2 product 8,589,934,590
try 5 shift 8 exp 256
  factor 26 over 4 product 17,179,869,180
try 6 shift 9 exp 512
  factor 52 over 8 product 34,359,738,360
try 7 shift 10 exp 1,024
  factor 103 over 6 product 25,769,803,770
try 8 shift 11 exp 2,048
  factor 205 over 2 product 8,589,934,590
try 9 shift 12 exp 4,096
  factor 410 over 4 product 17,179,869,180
try 10 shift 13 exp 8,192
  factor 820 over 8 product 34,359,738,360
try 11 shift 14 exp 16,384
  factor 1,639 over 6 product 25,769,803,770
try 12 shift 15 exp 32,768
  factor 3,277 over 2 product 8,589,934,590
try 13 shift 16 exp 65,536
  factor 6,554 over 4 product 17,179,869,180
try 14 shift 17 exp 131,072
  factor 13,108 over 8 product 34,359,738,360
try 15 shift 18 exp 262,144
  factor 26,215 over 6 product 25,769,803,770
try 16 shift 19 exp 524,288
  factor 52,429 over 2 product 8,589,934,590
try 17 shift 20 exp 1,048,576
  factor 104,858 over 4 product 17,179,869,180
try 18 shift 21 exp 2,097,152
  factor 209,716 over 8 product 34,359,738,360
try 19 shift 22 exp 4,194,304
  factor 419,431 over 6 product 25,769,803,770
try 20 shift 23 exp 8,388,608
  factor 838,861 over 2 product 8,589,934,590
try 21 shift 24 exp 16,777,216
  factor 1,677,722 over 4 product 17,179,869,180
try 22 shift 25 exp 33,554,432
  factor 3,355,444 over 8 product 34,359,738,360
try 23 shift 26 exp 67,108,864
  factor 6,710,887 over 6 product 25,769,803,770
try 24 shift 27 exp 134,217,728
  factor 13,421,773 over 2 product 8,589,934,590
try 25 shift 28 exp 268,435,456
  factor 26,843,546 over 4 product 17,179,869,180
try 26 shift 29 exp 536,870,912
  factor 53,687,092 over 8 product 34,359,738,360
try 27 shift 30 exp 1,073,741,824
  factor 107,374,183 over 6 product 25,769,803,770
try 28 shift 31 exp 2,147,483,648
  factor 214,748,365 over 2 product 8,589,934,590
try 29 shift 32 exp 4,294,967,296
  factor 429,496,730 over 4 product 17,179,869,180
try 30 shift 33 exp 8,589,934,592
  factor 858,993,460 over 8 product 34,359,738,360
try 31 shift 34 exp 17,179,869,184
  factor 1,717,986,919 over 6 product 25,769,803,770
try 32 shift 35 exp 34,359,738,368
  factor 3,435,973,837 over 2 product 8,589,934,590
x div 10 -> (x * 3,435,973,837) >>(2) 35
  basebits 64 max 14,757,395,256,390,660,915

We can multiply by 3,435,973,837 then right shift by 35, and we need 64 bits for the multiplication. Examples:

>>> (123456789 * 3435973837) >> 35
12345678
>>> (20202020 * 3435973837) >> 35
2020202
>>> (13371337 * 3435973837) >> 35
1337133
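The full u32 range is too large to check exhaustively in a moment, but the boundaries can be spot-checked. A quick sketch, not from the post above:

```python
# Spot-check of the u32 constants.  Here over = 2 and
# 2 * (2**32 - 1) < 2**35, so the correctness condition covers
# every 32-bit input; we sample the edges and a few values.
for n in (0, 1, 9, 10, 99, 12345, 2**31, 2**32 - 2, 2**32 - 1):
    assert (n * 3435973837) >> 35 == n // 10
```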
168 2020-12-31 15:16

To divide a u32 by 3:

base 2 div 3 limit 4,294,967,295
try 1 shift 2 exp 4
  factor 2 over 2 product 8,589,934,590
try 2 shift 3 exp 8
  factor 3 over 1 product 4,294,967,295
try 3 shift 4 exp 16
  factor 6 over 2 product 8,589,934,590
try 4 shift 5 exp 32
  factor 11 over 1 product 4,294,967,295
try 5 shift 6 exp 64
  factor 22 over 2 product 8,589,934,590
try 6 shift 7 exp 128
  factor 43 over 1 product 4,294,967,295
try 7 shift 8 exp 256
  factor 86 over 2 product 8,589,934,590
try 8 shift 9 exp 512
  factor 171 over 1 product 4,294,967,295
try 9 shift 10 exp 1,024
  factor 342 over 2 product 8,589,934,590
try 10 shift 11 exp 2,048
  factor 683 over 1 product 4,294,967,295
try 11 shift 12 exp 4,096
  factor 1,366 over 2 product 8,589,934,590
try 12 shift 13 exp 8,192
  factor 2,731 over 1 product 4,294,967,295
try 13 shift 14 exp 16,384
  factor 5,462 over 2 product 8,589,934,590
try 14 shift 15 exp 32,768
  factor 10,923 over 1 product 4,294,967,295
try 15 shift 16 exp 65,536
  factor 21,846 over 2 product 8,589,934,590
try 16 shift 17 exp 131,072
  factor 43,691 over 1 product 4,294,967,295
try 17 shift 18 exp 262,144
  factor 87,382 over 2 product 8,589,934,590
try 18 shift 19 exp 524,288
  factor 174,763 over 1 product 4,294,967,295
try 19 shift 20 exp 1,048,576
  factor 349,526 over 2 product 8,589,934,590
try 20 shift 21 exp 2,097,152
  factor 699,051 over 1 product 4,294,967,295
try 21 shift 22 exp 4,194,304
  factor 1,398,102 over 2 product 8,589,934,590
try 22 shift 23 exp 8,388,608
  factor 2,796,203 over 1 product 4,294,967,295
try 23 shift 24 exp 16,777,216
  factor 5,592,406 over 2 product 8,589,934,590
try 24 shift 25 exp 33,554,432
  factor 11,184,811 over 1 product 4,294,967,295
try 25 shift 26 exp 67,108,864
  factor 22,369,622 over 2 product 8,589,934,590
try 26 shift 27 exp 134,217,728
  factor 44,739,243 over 1 product 4,294,967,295
try 27 shift 28 exp 268,435,456
  factor 89,478,486 over 2 product 8,589,934,590
try 28 shift 29 exp 536,870,912
  factor 178,956,971 over 1 product 4,294,967,295
try 29 shift 30 exp 1,073,741,824
  factor 357,913,942 over 2 product 8,589,934,590
try 30 shift 31 exp 2,147,483,648
  factor 715,827,883 over 1 product 4,294,967,295
try 31 shift 32 exp 4,294,967,296
  factor 1,431,655,766 over 2 product 8,589,934,590
try 32 shift 33 exp 8,589,934,592
  factor 2,863,311,531 over 1 product 4,294,967,295
x div 3 -> (x * 2,863,311,531) >>(2) 33
  basebits 64 max 12,297,829,381,041,378,645

The magic factor is:

>>> hex (2863311531)
'0xaaaaaaab'

Examples:

>>> (123456789 * 2863311531) >> 33
41152263
>>> 123456789 // 3
41152263
>>> (20202020 * 2863311531) >> 33
6734006
>>> 20202020 // 3
6734006
>>> (13371337 * 2863311531) >> 33
4457112
>>> 13371337 // 3
4457112
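The 0xaaaaaaab constant can likewise be spot-checked at the edges of the u32 range. A quick sketch, not from the post above:

```python
# Division by 3 via the 0xaaaaaaab constant.  Here over = 1 and
# 2**32 - 1 < 2**33, so the correctness condition covers all of u32.
assert 2863311531 == 0xaaaaaaab

for n in (0, 1, 2, 3, 4, 1337, 2**31, 2**32 - 1):
    assert (n * 2863311531) >> 33 == n // 3
```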
169 2020-12-31 15:18

If we have a ternary machine and we want to produce hex strings of numbers up to a million:

base 3 div 16 limit 1,000,000
try 1 shift 3 exp 27
  factor 2 over 5 product 5,000,000
try 2 shift 4 exp 81
  factor 6 over 15 product 15,000,000
try 3 shift 5 exp 243
  factor 16 over 13 product 13,000,000
try 4 shift 6 exp 729
  factor 46 over 7 product 7,000,000
try 5 shift 7 exp 2,187
  factor 137 over 5 product 5,000,000
try 6 shift 8 exp 6,561
  factor 411 over 15 product 15,000,000
try 7 shift 9 exp 19,683
  factor 1,231 over 13 product 13,000,000
try 8 shift 10 exp 59,049
  factor 3,691 over 7 product 7,000,000
try 9 shift 11 exp 177,147
  factor 11,072 over 5 product 5,000,000
try 10 shift 12 exp 531,441
  factor 33,216 over 15 product 15,000,000
try 11 shift 13 exp 1,594,323
  factor 99,646 over 13 product 13,000,000
try 12 shift 14 exp 4,782,969
  factor 298,936 over 7 product 7,000,000
try 13 shift 15 exp 14,348,907
  factor 896,807 over 5 product 5,000,000
x div 16 -> (x * 896,807) >>(3) 15
  basebits 26 max 896,807,000,000

We can multiply by 896,807 then right shift by 15 ternary positions. The multiplication requires 26 ternary positions. Examples:

>>> (123456 * 896807) // (3 ** 15)
7716
>>> 123456 // 16
7716
>>> (202020 * 896807) // (3 ** 15)
12626
>>> 202020 // 16
12626
>>> (133713 * 896807) // (3 ** 15)
8357
>>> 133713 // 16
8357
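The ternary constants can be verified over the whole stated range. A quick exhaustive check, not part of the trace above:

```python
# Ternary-machine constants: here over = 5 and
# 5 * 1_000_000 < 3**15 = 14_348_907, so the correctness condition
# holds for every input up to the stated limit of a million.
for n in range(1_000_001):
    assert (n * 896807) // 3**15 == n // 16
```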

If we meet a time traveler from ancient Sumer who has a base-60 machine, and we want to produce hex strings of numbers up to a million:

base 60 div 16 limit 1,000,000
try 1 shift 1 exp 60
  factor 4 over 4 product 4,000,000
try 2 shift 2 exp 3,600
  factor 225 over 0 product 0
x div 16 -> (x * 225) >>(60) 2
  basebits 5 max 225,000,000

The final 'over' being 0 shows that this will work for arbitrarily large numbers, provided that we have the space for the multiplication. Examples:

>>> (123456 * 225) // (60 ** 2)
7716
>>> 123456 // 16
7716
>>> (202020 * 225) // (60 ** 2)
12626
>>> 202020 // 16
12626
>>> (133713 * 225) // (60 ** 2)
8357
>>> 133713 // 16
8357
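In the base-60 case the claim of exactness is easy to see: 225 * 16 equals 60^2, so the magic multiply-shift is just the fraction 1/16 in disguise. A quick sketch, not part of the trace above:

```python
# Base-60 case: over = 0, i.e. 225 * 16 == 60**2 exactly, so the
# shortcut is exact division for every n, however large.
assert 225 * 16 == 60 ** 2

for n in (0, 123456, 10**12, 10**100):
    assert (n * 225) // 60**2 == n // 16
```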

The advantage of this iterative method is that it allows the problem to be tackled using only basic knowledge of fractions, without having to resort to rings and modular inverses. Here is a small challenge. Try to find a necessary and sufficient condition on the base and the divisor for the final 'over' value to be 0. Once you have the condition, try to write some code that implements it with reasonable efficiency.

170 2020-12-31 18:09 *

tldr

171 2021-01-05 12:23

Wondering what user space should look like in an OS that does not force every user space program to be its own mini-kernel the way that Linux does.

172 2021-01-05 22:58

>>171
wait it does that?

173 2021-01-06 10:26

>>172
It's an oversimplification, but the operating system does basically abstract itself away for each process and lets it operate as though it were the only program running on the system. "Mini-kernel" is a bit too much, because I/O and other system operations are still handled by the kernel.

174 2021-01-06 11:02

>>172
It is kind of hyperbole, but also kind of true. What is a kernel? It is software that switches between tasks and handles interrupts, manages memory, manages access to devices and manages multiple-processor systems. A simple program is one that just works with one file descriptor at a time and performs simple syscalls. A program that is not simple will need to solve one or more of the aforementioned problems itself.

A bare Linux process is more bare than a bare machine, but it has analogues for everything that exists on a bare machine, and all that Linux gives you is a common language for different hardware. It does not solve the actual problems for you.

An analogograph: interrupt → signal; cooperative scheduler → event loop; hardware interval timer → timer_create; preemptive scheduler → stack-switching using a timer signal; multi-processor system → kernel threads; device access → file descriptors & mmap()'d files; DMA-based networking/disk → io_uring; page table → precisely what mmap() manipulates; device control & interprocessor interrupts → syscalls.

But at least Linux gets you user isolation, a shared file system and TCP/IP. And Docker, to undo much of that...

175 2021-01-06 12:21

>>174
tbf i thought that's how all kernels did it

176 2021-01-06 16:26

>>175
I think he refers to user-mode drivers/filesystems/daemons/etc., which are small OS layers. Normal programs don't utilize this.
