This article is the fourth and final section of the essay series “The hard argument against LLMs being AGI”.
In the previous section we established that semantic coherence is a harder computational problem than the syntactic coherence of language. While low-level semantics can be discovered by LLMs through the distributional hypothesis (or distributional semantics), the more complex cognition of human narratives tends to need more complex computational logic to uncover the meaning of words. In linguistics, words themselves and their distributions are considered phenomenological, while their meanings within specific contexts are noumenological. In neuroscience it is believed that while LLMs are limited to the phenomenological world alone, human beings travel both worlds simultaneously. In this section we tie this idea back to the Connectionist doctrine.
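To make the distributional hypothesis concrete, here is a minimal sketch in Python (the toy corpus, window size and similarity measure are my own illustrative assumptions, not anything from the research discussed here): a word is represented purely by the counts of its neighbouring words, and “meaning” is reduced to similarity between those count vectors. This is roughly the phenomenological level of language an LLM has direct access to, just at a vastly larger scale.

```python
# Minimal sketch of distributional semantics: words that occur in similar
# contexts end up with similar co-occurrence vectors. Toy corpus and window
# size are illustrative assumptions.
from collections import Counter, defaultdict
from math import sqrt

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

window = 2  # symmetric context window
cooc = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                cooc[word][tokens[j]] += 1

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    shared = set(a) & set(b)
    num = sum(a[w] * b[w] for w in shared)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

print(cosine(cooc["cat"], cooc["dog"]))  # relatively high: similar contexts
print(cosine(cooc["cat"], cooc["on"]))   # lower: different contexts
```

In the toy output, “cat” and “dog” come out as similar because they occur in similar contexts, even though nothing in the counts says what a cat or a dog is, which is exactly the phenomenological limitation in question.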
In the 1970s Hubert Dreyfus launched his famous attack on symbolic AI, and he later became a champion of Connectionism, the idea that complex intelligence can emerge from sufficiently large networks of simple units rather than from explicit symbol manipulation. He proposed neural networks as an alternative to the symbolic AI paradigm and was ridiculed by Cognitivists such as Chomsky and Fodor (the computational side of Cognitivism is known as Computationalism). In the 1980s Connectionism started to gather empirical support and soon began to surpass the successes of the Cognitivist AI paradigm. Then came Geoffrey Hinton and several others, and once sufficient computational power emerged, natural language processing and neural networks were suddenly connected.
Connectionism was the foundational base of early post-Cognitivism. Modern post-Cognitivism considers the debate between Connectionism and Cognitivism to be largely about whether, in some situations, we should use dynamical systems thinking instead of symbolic representations. However, Connectionism also includes the idea that if we just scale neural networks, full sentience will emerge. Modern post-Cognitivism tends to think that Connectionism works only at the lowest level of cognition, algebraic logical problem solving (where we simulate the solution through articulation), while the higher levels are qualitatively different and existential.
Picking up from the end of the last section, there also exist some theories of how this phenomenological and noumenological simultaneity happens. Humans need to get immersed in a narrative in order to connect the dots of meaning. In game design this happens through dopamine loops. In text we have sentences, paragraphs and chapters, which ought to follow similar loops. When the dopamine rhythm goes off, we often get distracted and have to reread something, because we didn’t connect the dots. Movies follow a cinematographic grammar that replicates sentences (shots), paragraphs (scenes) and chapters (milieu; often a “long cut” like a fade to black), and the same goes for music (this is what I believe Eero Tarasti has researched, based on conversations with one of his mentees; I am not sure whether he has published on it, and I haven’t had time to read it).
When humans encode solution models, we first need the dopamine reward to trigger the event, which then enables something to be stored in long-term memory. This happens through something implicit in our subconsciousness, like “this is what this kind of problem feels like”, while our consciousness ties the explicit, relevant phenomenological pattern to our sensory experiences. This intuition-providing feeling gives humans the ability to generalize knowledge from one problem domain to another. This human problem-solving method is not causally functional because it follows from the rationalism of distributional semantics; it is causally functional because it is existential, something that differs from distributional semantics in the noumenological sense.
What that means is that human beings and other animals try to adapt to their environment and either move to a better-feeling environment or reshape the environment to feel better. There is a close connection between human culture and the kind of history we have built. For example, non-mammal animals might never develop a cultural religion based on an “all-knowing Father or Mother, god of all gods”, because that comes from mammals looking after their offspring: when we grow up, we still want to believe that such an authority and guardian exists, because a world without one is a scary place for a mammal.
When we decode solution models, the sensory phenomenology evokes emotion-like patterns, which create the a posteriori by which we choose one way of articulating the problem. We first simulate the steps in our consciousness, then we enact them, and if we fail, the process starts over. This is how Eero Tarasti treats signs in his existential semiotics. The noumenological meaning comes from the a posteriori; the a priori phenomenological word alone does not carry it. LLMs do nothing like this. The a posteriori of some complex AI system cannot have explicit access to the human existential a posteriori through text corpora used as a dataset. Some of it might be learnable implicitly, but in that case we might only gain epistemic access to the macro level of meaning (in general), without being able to distill the micro level of meaning (in context). For this reason AGI might be impossible even with Categorical Grammars and similar quantum-computing approaches. For this reason LLMs most definitely are not AGI.
LLMs are tools for human beings, which post-Cognitivists might refer to as “psychotechnologies” (tools that fit the human mind, as opposed to physical tools fitting the human hand). In post-Cognitivism (that of John Vervaeke, episodes 27–34 of Awakening from the Meaning Crisis) we have a divide of three modes of thinking: intelligence, reason and wisdom. Intelligence is the algebraic logic of algorithms, which we can run on classic computers. Their key feature is that there exists an algorithmic fingerprint and only one deterministic answer (which can be a probabilistic model, but it is always a singular artifact). Wisdom, on the other hand, includes the experiences we have as humans, which are collectively correlated in “mysterious ways”.
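A trivial sketch of that “singular artifact” point (the unigram word model below is a hypothetical stand-in of my own, not anything from Vervaeke): the algorithm maps the same data to exactly one probability distribution every time; only sampling from that distribution is stochastic, the artifact itself is deterministic.

```python
# A probabilistic model is still a singular, deterministic artifact:
# the same data always yields the same distribution, even though
# sampling from that distribution varies.
import random
from collections import Counter

def fit_unigram_model(tokens):
    """Deterministically map data to a probability distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

data = "a b a c a b".split()
model_1 = fit_unigram_model(data)
model_2 = fit_unigram_model(data)
print(model_1 == model_2)  # True: one deterministic answer, one artifact

sampler = random.Random(42)
words, probs = zip(*sorted(model_1.items()))
print(sampler.choices(words, weights=probs, k=5))  # samples vary, the model does not
```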
The following paragraph might sound a bit dense, as it is more philosophically metaphorical and treats the world a bit “too computationally”, but in a sense enacting is cheap, like encryption, while embodiment of knowledge is far harder, like decryption, unless you have the correct extensions / psychotechnologies / decryption key available to you. Encryption and compression are similar in the sense that both encode and decode, but in encryption the decoding / decrypting is not computationally cheap without the key. It is a philosophical way of saying that reality does not offer its truths for free, but once we discover a good model, it gives plentifully.
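To make the encryption analogy slightly more concrete, here is a toy sketch (the XOR cipher and the two-byte key are my own illustrative assumptions, not a real cryptosystem): encoding, and decoding with the key, cost one cheap pass over the data, while decoding without the key means searching a key space that grows exponentially with the key length.

```python
# Toy illustration of the asymmetry: cheap with the key, expensive without it.
from itertools import product

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR the data against a repeating key (toy cipher, not real crypto)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plaintext = b"meaning"
key = b"\x17\x2a"                      # 2-byte key: 256**2 = 65,536 possibilities
ciphertext = xor_bytes(plaintext, key)

# With the key: one cheap pass.
assert xor_bytes(ciphertext, key) == plaintext

# Without the key: brute force over the whole key space, which multiplies by
# 256 for every extra key byte (256**n candidates for an n-byte key).
hits = sum(
    1 for guess in product(range(256), repeat=len(key))
    if xor_bytes(ciphertext, bytes(guess)) == plaintext
)
print(hits)  # 1: only the right key works, found after scanning all 65,536 guesses
```

Add one more byte to the key and the brute-force search multiplies by 256, while the pass with the key stays just as cheap; that is the asymmetry the paragraph above is gesturing at.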
Wisdom allows us human shortcuts for understanding and guesswork that is far more efficient than random search, and may even be more optimal than mathematical optimization. This is because the existential knowing of humans tends to be the key for decrypting the correct solution model within the world we live in, and if we ignore that a posteriori, the problem space is far more complex (consider the curse of dimensionality from the first section). Reason is the domain of AGI, and it comes from the ability to use that a posteriori. In a sense, post-Cognitivism would claim that ASI (Artificial Superintelligence) would be in the domain of wisdom and have the ability to decrypt and discover keys, the experiences humans would need for understanding. AGI, in other words, would only understand what is a key and what is not.
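A back-of-the-envelope sketch of that curse-of-dimensionality point (the grid resolution and the idea of a prior pinning down most dimensions are my own illustrative framing of the argument): exhaustive search over a discretized problem space grows exponentially with the number of dimensions, whereas a prior that fixes most of them, the stand-in here for existential knowing, keeps the search tractable.

```python
# Curse of dimensionality: exhaustive search explodes exponentially with the
# number of dimensions; a prior that pins most dimensions down does not.
GRID_POINTS_PER_DIM = 10  # illustrative resolution of the search grid

def exhaustive_search_size(dimensions: int) -> int:
    """Number of candidates if every dimension must be searched."""
    return GRID_POINTS_PER_DIM ** dimensions

def guided_search_size(dimensions: int, free_dimensions: int) -> int:
    """Number of candidates if a prior fixes all but a few dimensions."""
    return GRID_POINTS_PER_DIM ** min(free_dimensions, dimensions)

for d in (2, 10, 50):
    # exhaustive size is 10**d, guided size stays at 10**2 = 100
    print(d, exhaustive_search_size(d), guided_search_size(d, free_dimensions=2))
```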
In philosophy of mind this model is known as the 4E model. In post-Cognitivism the corresponding model is known as the 4P’s, at least as put forward by John Vervaeke (the same episodes 27–34 of Awakening from the Meaning Crisis). The idea is that intelligence sits between the two lowest parts (propositional / enacting and procedural / extended), reason between the middle parts (procedural / extended and perspectival / embedded), and wisdom between the highest parts (perspectival / embedded and participatory / embodied).
As a conclusion for this entire set of articles, I would say the following. I think the combination of the ideas in this section (and the empirical research they are based on) is a good frame for evaluating ANI, AGI and ASI. LLMs, in my opinion, are thus not capable of AGI, and if they were, we could not run them on classic computers, as argued in the first section. Quantum computers might be able to do at least some AGI-level work, but the scope might be limited. In the associativity of language there are complex things that emerge only above sentence-level analysis (which is the probable limit of Categorical Grammar).
I also have a kind of pointer forward, though I suspect 99% of readers will struggle to see how these things connect (you have to take the math a bit more seriously). The problem of language, and many other subjects studied with (fractal) dimension-reduction systems, is related to division algebras (also check this): classical physics can be explained with real number systems (classic computers; total ordering), dynamic wave systems with complex number systems (quantum computers; partial ordering; qubits are in a sense complex numbers), quantum mechanics found a use for quaternions (Hamilton’s invention) in describing spin precisely because their multiplication is non-commutative, and octonions (discovered by Graves and Cayley) have been explored in relativistic and particle physics because their multiplication is not even associative. If all of this sounds like gibberish, I could try to simplify it by saying that Euclidean space is an illusion, which Einstein dismantled, and that there is something intelligible “between the integer dimensions” (fractal dimensionality). The Cayley–Dickson construction behind these algebras keeps going past the octonions (sedenions and beyond), shedding more algebraic structure at every step, which means we might not yet have formalized all the associative (and even less well-behaved) ways of using language. We already know of problems that are very probably impossible to crack even with quantum computers.
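To make the algebraic chain concrete, here is a small sketch of the Cayley–Dickson doubling construction (my own illustration, using one standard sign convention): each doubling step produces the next algebra in the sequence and loses a property on the way, commutativity at the quaternions and associativity at the octonions.

```python
# Cayley-Dickson doubling: reals -> complex -> quaternions -> octonions.
# Elements are flat lists of 2**n real coefficients.

def conj(x):
    """Cayley-Dickson conjugate: negate everything except the real part."""
    return [x[0]] + [-v for v in x[1:]]

def mul(x, y):
    """Recursive Cayley-Dickson product: (a,b)(c,d) = (ac - d~b, da + bc~)."""
    if len(x) == 1:
        return [x[0] * y[0]]
    h = len(x) // 2
    a, b = x[:h], x[h:]
    c, d = y[:h], y[h:]
    add = lambda u, v: [ui + vi for ui, vi in zip(u, v)]
    sub = lambda u, v: [ui - vi for ui, vi in zip(u, v)]
    return sub(mul(a, c), mul(conj(d), b)) + add(mul(d, a), mul(b, conj(c)))

def basis(dim, k):
    """k-th basis element of the dim-dimensional algebra."""
    e = [0.0] * dim
    e[k] = 1.0
    return e

# Complex numbers (dim 2): i * i = -1.
i = basis(2, 1)
print(mul(i, i))

# Quaternions (dim 4): non-commutative, i*j = k but j*i = -k.
qi, qj = basis(4, 1), basis(4, 2)
print(mul(qi, qj), mul(qj, qi))

# Octonions (dim 8): non-associative, (e1*e2)*e4 != e1*(e2*e4).
e1, e2, e4 = basis(8, 1), basis(8, 2), basis(8, 4)
print(mul(mul(e1, e2), e4), mul(e1, mul(e2, e4)))
```

Doubling once more gives the sedenions, which even contain zero divisors, which is the sense in which the ladder keeps shedding structure as it goes up.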