FDM/OS talk: Code, models, data — licensing questions in research software (DHinfra)

On 19 March 2026 I gave a talk (in German) at the FDM/OS series at the University of Graz. The audience was allegedly three times the usual size for that slot. The topic obviously resonates well beyond a single institution.

After consulting our internal procedures and a year’s worth of team decisions, I kept the scope deliberately modest: University of Graz, DHinfra, what we have learned in practice, and definitely NOT a widely applicable legal primer.

Slides (Zenodo, CC BY 4.0): doi:10.5281/zenodo.19162768

Accordingly, this is not a generic compliance talk. In DHinfra, software is partly budgeted as infrastructure (alongside hardware), with a dedicated FOSS investment envelope for three software packages. We use that envelope to evaluate candidates, negotiate commitments, and decide what the consortium formally supports. The three tools that received that investment are QLever (high-performance SPARQL search, Apache 2.0), Zellij (ontology-oriented data management, Apache 2.0), and liiive.now (collaborative IIIF image annotation, MIT).

Huge thanks to my colleagues who have carried a good part of the load in our processes—especially Lukas Waldhofer, Florian Wachter, Elisabeth Steiner, and Walter Scholger, as well as the FOSS working group at dhinfra.at.

Two points I tried to land

First, choosing “a fitting” open licence for “research software” is rarely enough. A PI (or whoever owns the research project) needs to think about the whole value chain: dependencies, how a component sits in a larger stack, reuse by others, and how funding and institutional policies constrain choices. Licensing sits in the many middles of that flow, not only at the beginning or end. It is definitely not a simple checkbox you can tick off using a form wizard.

Second, this is too heavy to expect every researcher to internalise alone, especially when so much work runs on third-party funding with reporting and exploitation clauses that interact with software, models, and data. What helps is advance planning and access to consolidated expertise (legal + research IT / RDM), ideally with coordination beyond a single chair where that is realistic. In my opinion, we should not pretend that one short talk will fix flaws in organisational design.

Why the “rabbit hole” metaphor sticks

Licensing for code, models, training data, and documentation stacks layers of obligations and compatibility questions. Recent news and public discussions, for example about attribution and model licensing in commercial AI products (including cases that have drawn scrutiny in the press), are a very good reminder that naive reuse of weights or code can cause serious problems. For a recent take (not legal advice), see for example the March 2026 overview “AI Model Licensing: Legal Rules for Open-Source Attribution” on Recording Law.

This growing complexity is why I am glad we dug into it in DHinfra. I am also somewhat relieved to step back from the detail: I understand much better what can go wrong with careless licensing, and I am convinced that funding bodies and institutions should treat licensing expertise as infrastructure, not as an optional extra left, again, to researchers.

If this note and/or talk helps nudge a single PI toward an earlier conversation with RDM or legal, it was worth the nerves.

PS: Recordings will be available soon.