Knowledge as an emergent effect of the complexity of large language models

Rafal Maciag

Abstract:

This abstract proposes an interpretation of the phenomenon of knowledge that appears when large language models (LLMs) are used. Language, which is also a social phenomenon, is proposed as the ontological context of this knowledge. Knowledge is interpreted here as an emergent effect of a complex system, namely the working LLM. To explain this effect, we propose to build, for each input token, a trajectory based on the theory of discursive space. These trajectories traverse a manifold that contains all the computational spaces through which each token passes. This reasoning is justified by the nonlinear and nondeterministic nature of LLMs. The task requires at least a preliminary understanding of what knowledge is as a descriptive category. The importance of knowledge in the context of LLMs is also evidenced by research, e.g., Fierro et al., Heersmink et al., Peterson, Kim, and Thorne.

The approach to knowledge here is pragmatic: it is assumed that knowledge is retained and articulated in linguistic utterances, broadly understood. Language is a product of social circumstances and therefore remains permanently determined by history and territory. It also produces deep, abstract phenomena capable of transferring knowledge, chief among them discourse. This approach was proposed by Michel Foucault.

The result of LLM training is a stable computational structure of great complexity, which is illustrated by the large number of parameters. The resulting structure, which is a composition in time of many multidimensional computational spaces, is not strictly deterministic.

When a model processes semantic inferences on the basis of a stable, trained set of parameters, a specific "phase transition" occurs from the numerical level to the epistemic level (that of knowledge as a phenomenon). The concept of emergence can be used to describe this "transition", and the hypothesis can be formulated that knowledge is an emergent effect of a complex LLM system, in the sense given to emergence by Philip Clayton, who develops the definition by el-Hani and Pereira.

The emergence problem can, in theory, be addressed by a model of the trajectory of semantic signatures, i.e., vectors in the different spaces that make up the LLM, taken in the order of computation.

A formal way of analyzing knowledge in LLMs is to reconstruct the trajectories of semantic signatures in a manifold consisting of all the computational spaces of the model. These trajectories would represent the knowledge of the particular model and would be observable from the outside as an emergent phenomenon of knowledge. This approach would constitute an extended application of the definition of knowledge proposed in the theory of discursive knowledge by Rafal Maciag: “knowledge is a set of states of gnosemes in an n-dimensional manifold that can be interpreted locally as a knowledge space”.
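
As an illustration only, the notion of a trajectory through successive computational spaces can be sketched in a toy setting. The sketch below is not an LLM and does not reproduce the method proposed here: each "space" is assumed to be a fixed random linear map followed by a tanh nonlinearity, and sampling and other sources of nondeterminism are left out. The names make_layer, apply_layer, and trajectory, as well as the dimensions, are hypothetical and chosen for the example.

```python
import math
import random


def make_layer(dim_in, dim_out, rng):
    """Build one hypothetical computational space as a fixed random linear map."""
    scale = 1.0 / math.sqrt(dim_in)
    return [[rng.gauss(0.0, scale) for _ in range(dim_in)] for _ in range(dim_out)]


def apply_layer(weights, vector):
    """Carry a vector into the next space: linear map, then tanh (nonlinear)."""
    return [math.tanh(sum(w * x for w, x in zip(row, vector))) for row in weights]


def trajectory(signature, layers):
    """Return the ordered states of one signature, one state per space."""
    states = [list(signature)]
    for weights in layers:
        states.append(apply_layer(weights, states[-1]))
    return states


rng = random.Random(0)  # fixed seed so the toy parameters stay "trained and stable"
dims = [4, 6, 6, 3]     # assumed toy dimensions of the successive spaces
layers = [make_layer(dims[i], dims[i + 1], rng) for i in range(len(dims) - 1)]

# A stand-in for the semantic signature of a single input token.
token_signature = [rng.uniform(-1.0, 1.0) for _ in range(dims[0])]

path = trajectory(token_signature, layers)
for step, state in enumerate(path):
    print(step, len(state), [round(x, 3) for x in state])
```

In this sketch the list path plays the role of one trajectory: a sequence of states, each living in a space of possibly different dimension, ordered by the sequence of computations. Collecting such paths for many tokens would give a set of states of the kind the definition above refers to, under the stated simplifying assumptions.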

Philosophical issues:

Epistemological issues—the problem of knowledge as a social phenomenon;
Ontological issues—the problem of complex systems and accompanying phenomena, e.g. emergence;
Hermeneutical issues—since the subject of the research is an artificial text, the hermeneutical context becomes important.