Openness, and Some of its Shades
Openness, that lighthouse of the 20th century, came along with open interfaces (APIs). In the case of galleries, libraries, archives, and museums (GLAMs), this was the Open Archives Initiative Protocol for Metadata Harvesting, or OAI-PMH for short. At the time, the idea was to provide an interface that makes metadata available in interoperable formats and thus enables exchange between different institutions. In addition, the protocol makes it possible to harvest distributed resources described in XML, and the harvest may be restricted to named sets defined by the provider. The objects are referenced via URLs in the metadata, which also facilitates access to the objects themselves. The protocol is not designed to differentiate between users: licences and rights statements can be included, but there is no provision for masking specific material from access. The decision whether, and how, to use material protected by intellectual property rights ultimately lies with the users.
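To make the harvesting model concrete, the following is a minimal sketch in Python, not a full harvester. It builds a ListRecords request restricted to a named set and reads the object URLs and rights statements out of a Dublin Core response. The base URL, the set name "maps" and the sample record are invented for illustration; only the protocol arguments (verb, metadataPrefix, set, resumptionToken) and the XML namespaces come from the OAI-PMH specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           set_spec=None, resumption_token=None):
    """Build a ListRecords request URL, optionally limited to one set."""
    if resumption_token:
        # A resumption token is an exclusive argument: it replaces
        # metadataPrefix and set when fetching the next page.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        oai_id = rec.findtext("oai:header/oai:identifier",
                              default="", namespaces=NS)
        # Objects are referenced via URLs in the metadata.
        urls = [e.text for e in rec.iterfind(".//dc:identifier", NS)
                if e.text and e.text.startswith("http")]
        # Rights statements travel with the record but are not enforced.
        rights = [e.text for e in rec.iterfind(".//dc:rights", NS) if e.text]
        records.append({"id": oai_id, "object_urls": urls, "rights": rights})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token.strip() or None)


# Invented sample response, standing in for what a provider might return.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample map</dc:title>
          <dc:identifier>https://example.org/objects/1</dc:identifier>
          <dc:rights>http://rightsstatements.org/vocab/InC/1.0/</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_list_records_url("https://example.org/oai", set_spec="maps"))
    print(parse_list_records(SAMPLE))
```

Note that the request carries no information about who is asking, and the rights statement is returned as plain metadata: nothing in the exchange prevents a harvester from following the object URL, which is the point made above.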
The 21st century brought a new concept: data sovereignty. This implies, on the one hand, that data are subject to the laws and governance structures that apply in the jurisdiction where the data are hosted; for the hosts, on the other hand, the concept stands for the notion that rights holders can determine for themselves what third parties may and can do with the data. Now that a second lighthouse provides orientation in troubled times, namely the provision of cultural heritage data sets for innovation and research, the role of cultural heritage institutions as access brokers becomes tangible: if rights holders do not wish to provide their (IPR-protected) data openly to commercial AI companies, GLAM institutions as data providers are in a position to negotiate differentiated terms for the use of these data. For example, the data may be used freely by startups, small and medium-sized enterprises (SMEs) and companies active in the cultural sector, while big tech could be charged fees. Interestingly, the European Data Governance Act foresees such a case and includes a relevant set of instruments. Its chapter on the use of data provided by public sector bodies (Chapter II, Article 6) regulates the provision of data in exchange for fees and allows the fees to be differentiated between private users, SMEs and startups on the one hand, and larger corporations that do not fall into these categories on the other. This makes it possible to differentiate among commercial users, although the fees have to be based on the costs of the infrastructure needed to provide the data.
For these cases, cultural heritage institutions need new licences (or rights statements) that clarify whether or not commercial enterprises are excluded from access to the data on the basis of the rights holders' opt-out, and whether or not big tech corporations get access by paying fees while the data are provided free of charge to startups and SMEs.
While this describes the legal side of the role of GLAM institutions as access brokers, there is also a technical side to data sovereignty, addressed by “data spaces”. APIs like OAI-PMH will continue to ensure exchange between institutions, but will lose importance for data provision to third parties (apart from the provision of material which is in the public domain). By contrast, the concept of data spaces, which is central to the European Commission’s policy for the upcoming years, will gain in importance. One planned data space is the European Data Space for Cultural Heritage, which is to be created in collaboration with Europeana; existing similar initiatives include the European Open Science Cloud (EOSC) and the European Collaborative Cloud for Cultural Heritage (ECCCH). A technical implementation of such a data space is GAIA-X, a European initiative for an independent cloud infrastructure. Amongst other functionalities, it enables GLAM institutions to keep their data on premises while delivering processed data to users of the infrastructure, after an algorithm of the users' choice has been applied to the data held by the cultural heritage institution: instead of downloading terabytes of data and processing them themselves, users select an algorithm (or machine learning model) and send it to the data. An example providing such functionalities has been developed by Berlin State Library with the CrossAsia Demonstrator. Such an infrastructure not only enables the handling of data with various rights of use, but also allows differentiation between users as well as payment services. In other words, it grants full sovereignty over the data. As with all technical solutions, there is a downside: such data spaces are usually complex and difficult to manage, which poses an obstacle for cultural heritage institutions and often results in the need for additional staff.
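The principle of sending the algorithm to the data can be illustrated with a toy sketch in Python. It is not the GAIA-X or CrossAsia interface, neither of which is described here in enough detail to reproduce; all names, user categories and fee values below are hypothetical placeholders. The sketch only shows the pattern: the holdings never leave the institution, users pick an algorithm from a list the institution offers, and the terms depend on the category of user.

```python
from collections import Counter

# Hypothetical holdings that stay on the institution's premises.
_HOLDINGS = ["open data space", "open commons", "data sovereignty"]


def word_frequencies(texts):
    """Count how often each word occurs across all texts."""
    return dict(Counter(word for text in texts for word in text.split()))


def total_length(texts):
    """Return the total number of characters across all texts."""
    return sum(len(text) for text in texts)


# Algorithms the institution agrees to run on its own infrastructure.
ALGORITHMS = {
    "word_frequencies": word_frequencies,
    "total_length": total_length,
}

# Hypothetical access terms per user category (fee values are placeholders).
FEES_EUR = {
    "research": 0,
    "sme": 0,
    "large_corporation": 100,
}


def run_on_premises(user_category, algorithm_name):
    """Apply a selected algorithm to the holdings and return only the result."""
    if user_category not in FEES_EUR:
        raise PermissionError(f"no access terms for {user_category!r}")
    if algorithm_name not in ALGORITHMS:
        raise ValueError(f"algorithm {algorithm_name!r} is not offered")
    result = ALGORITHMS[algorithm_name](_HOLDINGS)
    # The raw holdings are never part of the response.
    return {"result": result, "fee_eur": FEES_EUR[user_category]}


if __name__ == "__main__":
    print(run_on_premises("sme", "word_frequencies"))
    print(run_on_premises("large_corporation", "total_length"))
```

Even in this reduced form, the operational burden mentioned above is visible: the institution has to maintain the list of algorithms, the access terms and the execution environment itself.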
Linked (but not bound) to the concepts of data spaces and data sovereignty is the idea of a commons. “Commons” designates a shared resource that is managed by a community for the benefit of its members. Europeana, the meta-aggregator and web portal for the digital collection of European cultural heritage, explicitly conceptualises the planned European Data Space for Cultural Heritage as “an open and resilient commons for the users of European cultural data, where data owners – as opposed to platforms – have control of their data and of how, when and with whom it is shared”. The formulation chosen here is indicative of a learning process with regard to openness: defining an open commons “as opposed to platforms” addresses an issue which is characteristic of open commons, namely the over-use of the available resources, which may lead to their depletion. In the classical examples of commons, like fishing grounds or pasture, the resource is endangered if users try to profit from it without contributing at the same time to its preservation. However, this is not the case with digital resources. Rather, the issue lies with the potential loss of communal benefits due to actions motivated by self-interest. In the 21st century, the rise of the big platforms has revealed what has been termed “the paradox of open”: “open resources are most likely to contribute to the power of those with the best means to make use of them”. The need for data spaces managed by a community for the benefit of its members not only adds another shade to openness; it also opens up another front: the turn against platformization implies a rejection of the dominance of non-European big tech companies.