
prog


Everything is Unicode, until the exploits started rolling in

50 2021-01-29 15:33

I don't know if an encoding would be a good way to go about this. It feels like confusing the map with the territory.
A formal or semi-formal grammar enforced on top of a sane character encoding could do that job just as well, by parsing (and maybe tokenizing!) the text.
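Roughly what I mean, as a quick Python sketch; the token pattern here is just something I made up to show structure living in a parsing layer rather than in the encoding itself:

    # Toy illustration: structure is enforced by a tokenizer/grammar layered
    # on top of ordinary Unicode text, not baked into the encoding.
    # The token categories (words, punctuation) are an arbitrary choice for the example.
    import re
    import unicodedata

    TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

    def tokenize(text):
        """Normalize the raw code points, then split into grammar-level tokens."""
        return TOKEN_RE.findall(unicodedata.normalize("NFC", text))

    print(tokenize("Everything is Unicode, until the exploits started rolling in"))
    # ['Everything', 'is', 'Unicode', ',', 'until', 'the', ...]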
We have always separated jobs through layers of abstraction; in this case written text sits at the lowest level, and encoding single characters is the simplest solution, even if better schemes could be devised. I say characters are the simplest (and least error-prone) solution because of morphological and grammatical rules regarding case, conjugation, word composition, etc.
What we really transmit through text is ideas; the content of our text is semantic. But semantics are elusive and always changing. Some people don't seem to understand that a word's meaning is not entirely the same across time. At any rate, the lexical shape of text is of minor importance. Enforcing a spelling, or limiting wordplay such as puns, intentional misspellings, agglutination, morphological shifts, the coinage of words such as 'smog' (smoke + fog), etc., would choke natural language. It would work marvels for formal writing, such as documents issued by an authority, legal contracts and whatnot. But for most writing, especially informal writing on the internet, it would be too constraining.
On the other hand, I would really love to see a sort of database of words implemented as a network of links, both etymological and semantic. Thinking about it, the entries would be abstract concepts and words would be the output: the specific spelling would be the end product of combining all the semantic components with the grammatical/morphological rules, where available (a character set being nothing but a set of symbols used to render that output, roughly what pixels are to the plot of a mathematical function). For instance, you could have "walk, walked" as one set of outputs (an infinitive and a perfective), and "tread, treaded" as another with slightly different semantics.
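Roughly what I have in mind, as a toy Python sketch; the concept ids, the link labels and the single past-tense rule are made up for illustration, a real lexicon would need far richer morphology:

    # Entries are abstract concepts; spellings are derived outputs.
    # Concept ids, links and the regular past-tense rule are invented for this example.
    LEXICON = {
        "MOVE_ON_FOOT": {"stem": "walk",  "links": {"see_also": ["STEP_ON"]}},
        "STEP_ON":      {"stem": "tread", "links": {"see_also": ["MOVE_ON_FOOT"]}},
    }

    RULES = {
        "infinitive": lambda stem: stem,
        "perfective": lambda stem: stem + "ed",  # regular past; irregulars would override this
    }

    def realize(concept, form):
        """Combine a concept's stem with a morphological rule to get a spelling."""
        return RULES[form](LEXICON[concept]["stem"])

    print(realize("MOVE_ON_FOOT", "perfective"))  # walked
    print(realize("STEP_ON", "perfective"))       # treaded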
Another thought just occurred to me. In order to reflect more accurately how language actually works, the database wouldn't be in the hands of an authority; it would be distributed, much in the spirit that has spawned decentralized systems such as ipfs. Nodes would be able to coin new words, expressions and grammar rules (like gender-neutral pronouns), and these would be adopted on demand by those who agree, not imposed by an authority.
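Something like this, as another rough Python sketch; the Proposal fields and the acceptance predicate are stand-ins for whatever a real protocol (gossip over ipfs, a CRDT, ...) would actually use:

    # Toy model of adoption-on-demand: every node keeps its own lexicon and
    # merges only the peer proposals it agrees with. No central authority.
    from dataclasses import dataclass, field

    @dataclass
    class Proposal:
        kind: str     # "word", "expression", "grammar_rule", ...
        payload: str  # e.g. a new pronoun or spelling
        author: str   # node id of whoever coined it

    @dataclass
    class Node:
        node_id: str
        lexicon: set = field(default_factory=set)

        def coin(self, kind, payload):
            return Proposal(kind, payload, self.node_id)

        def consider(self, proposal, agrees):
            # 'agrees' is this node's own policy, not a network-wide rule.
            if agrees(proposal):
                self.lexicon.add((proposal.kind, proposal.payload))

    a, b = Node("a"), Node("b")
    p = a.coin("grammar_rule", "singular they")
    b.consider(p, agrees=lambda prop: prop.kind == "grammar_rule")  # b opts in; others simply ignore it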
Consider gender-neutral pronouns. If the authority enforces them, half the populace will not be happy; if the authority decides not to, the other half will be unhappy (granted, perhaps 60% of the population doesn't actually care, so it's really 20% vs 20%).
Eventually languages will diverge, as they have since the dawn of time, into dialects and subfamilies, with the "encoding" reflecting not only that but also their evolution, and thus proving resilient to time.
It is better to adopt a solution that has a better chance to persist over the long term than one that is bound to a specific decade or century.
