Entries by Jörg Lehmann

Openness, Efficiency and Closed Infrastructures

The concept of data spaces that the European Commission is pursuing is not only a technical one; it also implies a political constitution. Data spaces such as GAIA-X do not require centralised management. The operation of such a data space can take place within a federation that establishes the means to control data integrity and […]

Openness, and Some of its Shades

Openness, that lighthouse of the 20th century, came along with open interfaces (APIs). In the case of galleries, libraries, archives, and museums (GLAMs), it was the Open Archives Initiative Protocol for Metadata Harvesting, or OAI-PMH for short. At the time, the idea was to provide an interface that makes metadata available in interoperable formats and […]

Orientation in Turbulent Times

Cultural heritage institutions such as galleries, libraries, archives and museums (GLAMs) currently find themselves in a difficult situation: Generative AI models have fundamentally changed the meaning of the term “openness”. Until recently, the open provision of digital cultural heritage was an absolute ideal, as was the protection of intellectual property rights (IPR). There is a […]

Large Language Models and their WEIRD Consequences

In his book “The Weirdest People in the World”, evolutionary psychologist Joseph Henrich focuses on a particular species that he calls “WEIRD people”. This play on words can be resolved because WEIRD stands for “Western, educated, industrialised, rich, democratic”. Henrich wonders how it was possible for a small section of the population, most of whom […]

Power Hungry Magic

“Any sufficiently advanced technology is indistinguishable from magic”, Arthur C. Clarke already knew, and it is part of the magic of new technologies that their downsides are systematically concealed. This is also the case with the energy consumption of large language models (LLMs): As with the schnitzel that ends up on consumers’ plates and makes […]

Feeding the Cuckoo

Large Language Models (LLMs) combine words that frequently appear in similar contexts in the training dataset; on this basis, they predict the most probable word or sentence. The larger the training dataset, the more possible combinations there are, and the more ‘creative’ the model appears. The sheer size of models such as GPT-4 already provides […]


Humans search for themselves in non-human creatures and inanimate artefacts. Apes, the “next of kin”, or dogs, the “most faithful companions”, are good examples of the former; robots are good examples of the latter: A human-like design of the robots’ bodies and a humanising linguistic framing of their capabilities support, according to a common hypothesis, […]

On the Tyranny of the Majority

Large Language Models (LLMs) predict the statistically most probable word when they generate texts. On the one hand, the fact that the predicted word or sentence is the most probable does not mean that it is true or false. On the other hand, the prediction of probabilities leads to a favouring of the majority opinion. […]


Language models that generate texts on the basis of probabilities are best approached with solid scepticism with regard to factual accuracy, and with a little humour. Jack Krawczyk, who is responsible for the development of the chatbot “Bard” at Google, openly admitted in March 2023: “Bard and ChatGPT are large language models, not knowledge models. […]

It’s the statistics, stupid

“It’s the statistics, stupid”, one could say when it comes to dealing with generative pretrained transformers (GPTs). Yet, we all still have to learn this, only one year after the presentation of ChatGPT. Statistical correlations are key to understanding how stochastic prediction models work and what they are capable of. Put in simple terms, machine […]