This article is the third section of the essay series “The hard argument against LLMs being AGI”.
In the previous section we used the idea of an algorithmic fingerprint to demonstrate and explain how Few-Shot Learning in LLMs actually works. We then argued that the apparent “cognition” of LLMs is a delusion that arises from misunderstanding how Few-Shot Learning works, reinforced by the Observer Effect, a problem already familiar to the philosophy of science from quantum mechanics and elsewhere. Finally, we established that LLMs are dissimilar to humans, AGI, and human culture, because an LLM is not adaptive: after pretraining it is just a static model.
In this section I will introduce three aspects of semantic coherence: a computational grammar capable of sentence-level semantic coherence in natural language (tractable, I will argue, only on quantum computers); how narrative understanding and narrative memory work according to narratology; and how changes in the fractal dimensionality of text segments (relative to a general Zipf’s Law-like distribution) can detect story elements in narratives.
With traditional computers we have two options today: we can build fast CPUs, which are good at depth-first algorithms, or we can build arrays of GPUs, which are good at breadth-first algorithms. Quantum computers operate in both domains simultaneously. Another distinction is that bits take discrete values while qubits are continuously valued. With traditional computers we can emulate logics more complex than the Turing Machine’s native model, but that is slow or unreliable (often both).
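To make the CPU/GPU distinction concrete, here is a minimal, illustrative sketch of the two traversal styles: depth-first search follows one sequential branch at a time and backtracks, which suits a fast CPU, while breadth-first search expands a whole frontier at once, which can be processed in parallel and so suits a GPU array. The tiny example graph is my own, for illustration only.

```python
from collections import deque

def dfs_order(graph, start):
    """Depth-first traversal: follow one branch to its end before
    backtracking. Sequential and branch-heavy, the pattern CPUs excel at."""
    seen, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            order.append(node)
            stack.extend(reversed(graph.get(node, [])))
    return order

def bfs_order(graph, start):
    """Breadth-first traversal: expand a whole frontier at once.
    Each frontier can be processed in parallel, the pattern GPUs excel at."""
    seen, frontier, order = {start}, deque([start]), []
    while frontier:
        node = frontier.popleft()
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return order

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(dfs_order(graph, "a"))  # ['a', 'b', 'd', 'c']
print(bfs_order(graph, "a"))  # ['a', 'b', 'c', 'd']
```

The same graph yields different visit orders because the two algorithms schedule work in fundamentally different shapes, which is exactly why different hardware favors each.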
Algorithms must be expressed in a specific formalism known as a computational grammar, which is essentially always a form of pattern matching. Computers today often do things we would not consider pattern matching, but that is only because of layers upon layers of hidden complexity. Semantic composition of natural language sentences is known to be possible with Categorical Grammar, but such grammars are only tractable on quantum computers. For the claim that LLMs are AGI this does not look promising, if we accept that semantic coherence is a required feature (I will say more about this justification in the part on existential semiotics). I was first introduced to Categorical Grammar at the Quantum Natural Language Processing (QNLP) conference of 2022, but it was already developed in the 1930s, before the first transistor computers. There is a curious connection between Categorical Grammar and functional programming (both are based on the Curry-Howard-Lambek correspondence), especially Haskell, but I will not go there here.
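To make “computational grammar as pattern matching” concrete, here is a toy sketch of a categorial (pregroup-style) grammaticality check, simulated classically and inefficiently: each word gets a type built from atoms and their adjoints, and a string of words is a sentence when adjacent adjoint pairs cancel down to the sentence type s. The word types below are my own illustrative choices, and the stack-based cancellation only handles simple adjacent cancellations, not every pregroup reduction.

```python
def parses_as_sentence(word_types):
    """Pregroup-style reduction (a sketch): concatenate word types and
    cancel adjacent adjoint pairs (a, z)(a, z + 1) -> 1. The string is a
    sentence if only the sentence type s remains."""
    atoms = [atom for t in word_types for atom in t]
    stack = []
    for atom in atoms:
        if stack and stack[-1][0] == atom[0] and stack[-1][1] + 1 == atom[1]:
            stack.pop()  # cancel e.g. n . n^r  or  n^l . n
        else:
            stack.append(atom)
    return stack == [("s", 0)]

# Atoms as (name, adjoint order): x^l = (x, -1), x = (x, 0), x^r = (x, 1)
n, s = ("n", 0), ("s", 0)
nl, nr = ("n", -1), ("n", 1)

# "Alice loves Bob": n . (n^r s n^l) . n reduces to s
alice, loves, bob = [n], [nr, s, nl], [n]
print(parses_as_sentence([alice, loves, bob]))  # True
print(parses_as_sentence([alice, bob]))         # False
```

Note how the verb’s type encodes its expectations (a noun on the left, a noun on the right): the type system itself is the pattern being matched, which is the door through which the Curry-Howard-Lambek correspondence enters.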
QNLP researcher Bob Coecke and his team have developed the DisCoCat model of language, which exploits the fact that language is distributional, compositional, and categorical. Transformers can only model the distributional hypothesis, with (very complex) n-grams; they have no native access to the compositional or categorical aspects (those features have to be emulated, which makes them impractical to scale).
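A minimal sketch of the compositional half of DisCoCat, simulated classically with made-up toy vectors: a transitive verb lives in the tensor space N ⊗ S ⊗ N, and the sentence meaning is obtained by contracting the subject and object noun vectors against the verb tensor. All the dimensions and numbers below are hypothetical, chosen only to show the contraction.

```python
def compose_sentence(subj, verb, obj):
    """DisCoCat-style composition (toy sketch): a transitive verb is a
    tensor verb[i][k][j] in N (x) S (x) N; the sentence vector is the
    contraction sum_ij subj[i] * verb[i][k][j] * obj[j]."""
    dim_s = len(verb[0])
    return [
        sum(subj[i] * verb[i][k][j] * obj[j]
            for i in range(len(subj))
            for j in range(len(obj)))
        for k in range(dim_s)
    ]

# Hypothetical 2-d noun vectors and a 2x2x2 verb tensor, for illustration.
alice = [1.0, 0.0]
bob = [0.0, 1.0]
loves = [  # loves[i][k][j]
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.3, 0.7], [0.6, 0.4]],
]
print(compose_sentence(alice, loves, bob))
```

The distributional part supplies the noun vectors; the categorical part dictates the wiring of the contraction. On a quantum computer the contraction becomes a circuit, which is where the claimed tractability advantage lies.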
I have been investigating fictional narrative generation models since 2016 and building some very small prototypes of my own. That hobby has had me reviewing literature, reading papers, and playing with prototypes for about 10 hours a week ever since (on top of working 40 hours a week in data science and engineering). I did not focus on the problem of generating syntactically coherent language, because in 2017 I predicted that Google and other players would solve it within five years. Instead I focused on what I would need after that to produce fictional narratives. Goal orientation is a really useful feature for narratives, but goal-oriented problem solvers tend to be combinatorially explosive, rendering them useless on classic computers, unless I want to hard-code all the rules.
The mathematical properties of Zipf’s Law were interesting (see also this paper and this paper; the latter in a sense combines Zipf’s Law with epistemic memory, about which I will say more soon). Another key idea I discovered from the Fabula.net research team was that changes in fractal dimensionality can predict plot changes in fiction. This paper, together with Carlos Leon’s paper on narrative memory, connected these two ideas. When Bob Coecke introduced his paper on DisCoCirc (DisCoCat is for sentences, DisCoCirc is for paragraphs and larger texts), I noticed that it is compatible with Leon’s model. So what is going on here?
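For readers who have not played with Zipf’s Law: a rank-frequency table is all the machinery needed to observe it. Over a real corpus the frequency of the word at rank r falls off roughly as 1/r; the toy text below is far too small to exhibit the law and only demonstrates the computation.

```python
from collections import Counter

def rank_frequencies(text):
    """Rank words by descending frequency. Over a real corpus, Zipf's Law
    says the frequency at rank r is roughly proportional to 1/r."""
    return Counter(text.lower().split()).most_common()

# A toy text (far too small to actually exhibit Zipf's Law).
text = "the cat sat on the mat and the dog saw the cat run"
for rank, (word, freq) in enumerate(rank_frequencies(text), start=1):
    print(rank, word, freq)
```

Even in this toy text the top rank goes to a grammatical function word (“the”), which foreshadows the point made below about the most frequent words being grammatical.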
When we look at the grammatical categories of the most frequent words according to Zipf’s Law, they tend to be grammatical (function) words. Nouns rank lower in fractal dimension, and they tend to move around the scale depending on context. Often, when we talk about the meaning of words, we actually mean the associativity of nouns (and sometimes verbs). When we listen to a narrative, the grammar provides a low-attention frame from which we have to pick up the associative meaning of the nouns used by the agent uttering them to us. As we move along a narrative sequence, a change in the fractal dimensionality of words is a good indicator of a change in topicality.
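Estimating fractal dimensionality properly is beyond a short snippet, but a crude proxy can illustrate the idea of detecting topicality change along a sequence: compare the vocabulary of adjacent windows and treat low overlap as a candidate topic shift. This is my own simplification for intuition, not the fractal method itself.

```python
def topic_shift_scores(tokens, window=5):
    """Crude topic-shift detector (a proxy, not a fractal-dimension
    estimate): score each position by 1 - Jaccard overlap between the
    window before it and the window after it. High scores suggest a
    change in topicality, the kind of signal fractal methods formalize."""
    scores = []
    for i in range(window, len(tokens) - window + 1):
        left = set(tokens[i - window:i])
        right = set(tokens[i:i + window])
        jaccard = len(left & right) / len(left | right)
        scores.append((i, 1.0 - jaccard))  # 1.0 = complete vocabulary shift
    return scores

tokens = ("the knight rode to the castle gate "
          "the dragon burned the village fields").split()
for pos, score in topic_shift_scores(tokens, window=4):
    print(pos, round(score, 2))
```

Function words like “the” keep the overlap from ever reaching zero, which mirrors the point above: the grammatical frame stays stable while the content nouns move around.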
The next two paragraphs are a bit dense; don’t worry if you don’t fully grasp them now (we will return to the details in the next section). The image and the caption text above may give you some support.
The Fabula.net research team showed that fractal analysis can be used to analyse the narrative structure of fiction, but only if we know the theme of the book and follow the fractal changes of a related bag of words. In their experiments they followed the sentiment changes of a novel whose core theme was sentimentality. Leon’s model of narrative memory, in turn, explicitly explains how the cognitive parts work: we get a kind of “scaffolding for representing how narratives are produced, from the implicit procedural memory to the explicit procedural–semantic memory”.
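A much simplified sketch of the Fabula.net setup (the words and numbers below are hypothetical): pick a bag of theme-related words, then follow their density window by window along the text. The fractal analysis would then be applied to the resulting series; only the series extraction is sketched here.

```python
def theme_signal(tokens, theme_words, window=4, step=2):
    """Follow the density of a theme-related bag of words along a text,
    window by window (a simplified sketch of the Fabula.net setup)."""
    theme = set(theme_words)
    signal = []
    for start in range(0, max(1, len(tokens) - window + 1), step):
        chunk = tokens[start:start + window]
        signal.append(sum(1 for t in chunk if t in theme) / len(chunk))
    return signal

# Hypothetical mini-text whose theme is grief, for illustration only.
tokens = "sad grief joy happy sad tears laugh grief".split()
print(theme_signal(tokens, {"sad", "grief", "tears"}))  # [0.5, 0.5, 0.75]
```

The choice of the theme bag is the crucial (and manual) step: as the paragraph above notes, the method only works when the theme is already known.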
In other words, the implicit procedural memory can roughly be understood as the noumenological, subconscious aspect of narrative understanding (semantic composition of the narrative), and the explicit procedural–semantic memory as the phenomenological, conscious aspect of narratives (distributional semantics). Phenomenological consciousness works with artifacts that can be narratively articulated, such as words. The procedural–semantic memory performs semantic composition from episodic memories (joining the dots as we listen to or read a sequence of events).
The key takeaway is that there are two different processes, semantic composition and distributional semantics, of which the former is considered linguistically noumenological and the latter linguistically phenomenological. However, neuroscientists believe that human beings do the phenomenological and the noumenological processing simultaneously: we cannot separate the associativity of words from our subconsciousness. In the last section I will show how this relates back to AGI.