Collaboration with the German Literature Archive
Structuring Rilke
22 August 2025
Creative chaos reigns in the diaries of Rainer Maria Rilke. For researchers, this presents a challenge that they aim to address with digital tools. These are being developed by Hussein Mohammed and Quang-Vinh Dang at the Visual Manuscript Analysis Lab of the CSMC.
On 1 December 2022, the German Literature Archive Marbach (DLA) made an announcement that reverberated throughout Germany’s cultural sphere. The Süddeutsche Zeitung hailed it as ‘possibly the most significant acquisition in its history’. Claudia Roth, then Federal Government Commissioner for Culture and the Media, struck a similar note when she described it as ‘perhaps the most important acquisition of an estate in post-war history’. For Sandra Richter, Director of the DLA, it was a ‘once-in-a-century acquisition’ for which the archive had worked for decades.
On that day, the DLA acquired the literary estate of Rainer Maria Rilke, which had remained in private hands for almost a hundred years. One of the greatest German poets of modernism, Rilke led a rather unsettled life – geographically as well – and often moved across borders. Nevertheless, his estate is astonishingly complete: its approximately 10,000 pages of documents include numerous letters, mementos, photographs, personal belongings, and drawings, some from his early childhood. In view of this abundance of previously inaccessible material, ‘exciting prospects open up for research and cultural history’, rejoiced the literary critic Gustav Seibt, but he also pointed out: ‘This requires proper recording, indexing, and cataloguing of the new holdings. This should now be done quickly and enthusiastically, ideally digitised in such a way that as many Rilke readers as possible can access it.’

The problem: images without insight
This wish, however, is by no means easy to fulfil. In fact, it touches on a challenge that extends far beyond Marbach and the Rilke estate. In recent years, archives around the world have digitised significant collections, rapidly increasing the volume of available material. However, the research value of digitisation depends on the material’s accessibility and structure, not just its digital existence. Vast sets of manuscript scans or other data are often minimally catalogued, described only at a general metadata level rather than with detailed, content-level information. Creating this structured access is a major undertaking and typically requires far more work than digitisation itself.
Attempts to automate transcription using technologies like Handwritten Text Recognition (HTR) often falter in practice: notebooks and manuscripts do not conform to neat, modern standards. They feature unknown scripts, variable languages, complex layouts, faded or overlapping text, changes of writing medium (from ink to pencil, for instance), and annotations added in different directions, not to mention doodles or pasted inserts. Even if perfect transcription were possible, it would still neglect the visual context that is crucial for understanding both the artefact and its history, an ambition at the very heart of research at the CSMC.

Computational Visual Catalogues
It was this fundamental issue that Hussein Mohammed had in mind when the DLA approached him at the end of 2024. Richter and her colleagues were seeking a partner to advance the digital exploration of an important part of the Rilke estate: the author’s notebooks. Quite apart from their contents, their visual design is a spectacle in itself. Instead of writing neatly from the top left to the bottom right, Rilke filled the pages in all directions, underlined, overwrote, switched languages, pens, and colours, added little drawings and occasionally – presumably to mark particularly important passages – placed a plant between the pages. With digital versions of 56 such notebooks, Richter turned to the Visual Manuscript Analysis Lab at the CSMC to explore how this material could be made useful for Rilke researchers like herself; earlier this year, she had published a much-acclaimed new biography of the poet.
‘We wanted to develop a solution that would help the scholars in Marbach address their specific questions regarding the Rilke notebooks, but at the same time be general enough to be useful for working with other digital archives as well’, Hussein explains. ‘Developing the right parameters for this was the greatest intellectual challenge.’ The approach he and his colleague Quang-Vinh Dang have developed over recent months is a system in which several AI models work together, each fulfilling a specific role.
Through a carefully constructed pipeline, their system can automatically extract, structure, and catalogue key visual attributes from large sets of manuscript images. The system first locates the main manuscript pages within each image and tells the next model where to search for individual words. A further model then classifies visual features for every word or area: colour, orientation, writing implement, and more. The result of these interconnected processes is stored as a full ‘Computational Visual Catalogue’ (CVC) for each of the notebooks: structured digital files that link every visual detail to its location in every image.
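To make the idea of chained models more concrete, here is a minimal sketch of how such a pipeline could assemble a catalogue. It is not the CSMC implementation: the three stages are plain Python callables standing in for trained models, and the record layout (`image_id`, `page_bbox`, `regions`, `kind`, `bbox`, `features`) as well as the stub functions and file name are hypothetical choices for illustration only.

```python
import json
from typing import Callable

# Hypothetical stage interfaces; in a real system each would wrap a trained model.
PageDetector = Callable[[str], list[int]]                       # image_id -> page bbox [x, y, w, h]
WordDetector = Callable[[str, list[int]], list[list[int]]]      # image_id, page bbox -> word bboxes
FeatureClassifier = Callable[[str, list[int]], dict[str, str]]  # image_id, bbox -> feature labels


def build_cvc(image_ids: list[str], detect_page: PageDetector,
              detect_words: WordDetector, classify: FeatureClassifier) -> list[dict]:
    """Chain the three stages and collect one catalogue record per image."""
    cvc = []
    for image_id in image_ids:
        page_bbox = detect_page(image_id)               # stage 1: locate the page
        regions = []
        for bbox in detect_words(image_id, page_bbox):  # stage 2: search only inside it
            regions.append({
                "kind": "word",
                "bbox": bbox,
                "features": classify(image_id, bbox),   # stage 3: colour, orientation, ...
            })
        cvc.append({"image_id": image_id, "page_bbox": page_bbox, "regions": regions})
    return cvc


# Stub "models" (hypothetical) so the sketch runs without any trained weights.
def fake_page(image_id: str) -> list[int]:
    return [10, 10, 500, 700]


def fake_words(image_id: str, page_bbox: list[int]) -> list[list[int]]:
    x, y, _, _ = page_bbox
    return [[x + 5, y + 5, 80, 20], [x + 5, y + 40, 20, 90]]


def fake_classify(image_id: str, bbox: list[int]) -> dict[str, str]:
    _, _, w, h = bbox
    return {"orientation": "vertical" if h > w else "horizontal", "colour": "black"}


cvc = build_cvc(["nb01_p001.jpg"], fake_page, fake_words, fake_classify)
print(json.dumps(cvc, indent=2))
```

Keeping each stage behind a simple callable interface is one way to let individual models be retrained or swapped without touching the rest of the pipeline, and serialising the result as JSON yields a structured file that ties every feature label to a bounding box in a specific image.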

An interface for interactive visual manuscript research
To facilitate access to the CVCs, Hussein developed ‘ScriptSight’, an application for visual and interactive exploration of the data. With ScriptSight, users can select combinations of features they are interested in, for example ‘all vertically-oriented words in red ink’, and instantly retrieve the matching images and locations from across thousands of pages. Crucially, ScriptSight overlays the AI models’ predictions onto the images themselves, making the results fully transparent: researchers can see exactly what the models detected, evaluate their accuracy, and engage critically with the results.
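A feature-combination query of the kind described above can be sketched as a simple filter over catalogue records. This is an illustrative assumption, not ScriptSight’s actual code: the record layout, the feature names `orientation` and `colour`, the function name `find_regions`, and the sample data are all hypothetical.

```python
def find_regions(cvc: list[dict], **criteria: str) -> list[tuple[str, list[int]]]:
    """Return (image_id, bbox) for every region whose features match all criteria."""
    hits = []
    for page in cvc:
        for region in page["regions"]:
            features = region["features"]
            # A region matches only if every requested feature has the requested value.
            if all(features.get(key) == value for key, value in criteria.items()):
                hits.append((page["image_id"], region["bbox"]))
    return hits


# Hypothetical sample catalogue with two notebook pages.
sample_cvc = [
    {"image_id": "nb01_p001.jpg", "regions": [
        {"kind": "word", "bbox": [15, 15, 80, 20],
         "features": {"orientation": "horizontal", "colour": "red"}},
        {"kind": "word", "bbox": [15, 50, 20, 90],
         "features": {"orientation": "vertical", "colour": "red"}},
    ]},
    {"image_id": "nb01_p002.jpg", "regions": [
        {"kind": "word", "bbox": [30, 40, 18, 75],
         "features": {"orientation": "vertical", "colour": "black"}},
    ]},
]

# 'All vertically oriented words in red ink'
print(find_regions(sample_cvc, orientation="vertical", colour="red"))
# -> [('nb01_p001.jpg', [15, 50, 20, 90])]
```

Because each hit carries its image identifier and bounding box, an interface can draw the corresponding boxes over the scans, which is what allows researchers to check each prediction visually rather than trusting a bare list of results.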
Although designed for Rilke’s notebooks, the approach is deliberately generalisable. Given a corpus with sufficiently similar visual features, the models generating the CVCs can be adapted with modest effort: a handful of annotated images will ‘fine-tune’ them for new visual characteristics or scholarly interests. The extensibility of the CVC structure means that any relevant visual feature, not just text-based ones, can become part of the searchable, structured record: from handwriting styles to material textures or inserted objects.
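The extensibility described here can be illustrated with an open-ended `features` dictionary per region: a new attribute is simply another key, and non-text regions such as a pressed plant can be recorded alongside words. The sketch below assumes a hypothetical record layout and a stand-in classifier `guess_implement`; it does not show fine-tuning itself, only how the output of a newly adapted model could be merged into an existing catalogue.

```python
def add_feature(cvc: list[dict], name: str, classifier, kinds=None) -> list[dict]:
    """Attach a new feature to existing regions in place, optionally only for certain kinds."""
    for page in cvc:
        for region in page["regions"]:
            if kinds is None or region["kind"] in kinds:
                region["features"][name] = classifier(page["image_id"], region["bbox"])
    return cvc


# Hypothetical catalogue page containing a word and a non-text insert.
cvc = [
    {"image_id": "nb01_p003.jpg", "regions": [
        {"kind": "word", "bbox": [12, 30, 70, 18], "features": {"colour": "black"}},
        # A non-text region, e.g. a plant placed between the pages.
        {"kind": "insert", "bbox": [200, 300, 120, 160], "features": {"material": "plant"}},
    ]},
]


def guess_implement(image_id: str, bbox: list[int]) -> str:
    # Stand-in for a newly fine-tuned classifier; it always answers "pencil" here.
    return "pencil"


add_feature(cvc, "implement", guess_implement, kinds={"word"})
print(cvc[0]["regions"])
```

Since nothing in the record layout is tied to a fixed list of attributes, adding a feature or a new kind of region does not require changing existing entries or the code that queries them.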

ScriptSight allows users to engage with the digital material.
Future prospects
After months of developing the models and the software tool, Hussein and Quang-Vinh have now completed the first phase of this project and published ScriptSight on the CSMC website. The CVCs from Rilke’s notebooks are available in the University of Hamburg’s research data repository. Following the signing of a Memorandum of Understanding with the DLA, the door is open for further collaboration in the future.
Given the materials now available in Marbach, it is to be expected that a new phase of intensive scholarly and public engagement with Rilke has only just begun. On 4 December, the author’s 150th birthday, an exhibition entitled ‘Rilke’s Worlds’ will open there, for the first time incorporating parts of the estate acquired in 2022.