Similarity Measurement of Visual Patterns in Written Artefacts
2023–2025
RFA05

This project investigated how computer vision and machine learning could support research in manuscript studies by analysing complex visual and material patterns on manuscript pages. It developed methods that detected, recognised, compared, and grouped patterns across diverse data types: conventional images of pages; X-ray fluorescence point measurements (XRF, used to infer elemental composition); and multispectral imaging (MSI, images captured beyond visible light). The overall goal was to provide reliable, quantitative evidence to help answer questions about script, decoration, materials, production, and use.
Because annotated examples were often scarce or biased, the project prioritised learning-free statistical techniques, that is, methods that did not require labelled training data. These approaches measured similarity and structure directly from the data and thus avoided dependence on prior annotations. Where annotations were available, the project designed novel training-based systems that coped with limited labels through data-efficient strategies and regularisation, reducing the risk of overfitting and bias.
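As a minimal illustration of what a learning-free similarity measure can look like, the Python sketch below compares two image patches through their grey-level histograms and a chi-square distance. It is not the project's own method: the function names, the bin count, and the synthetic patches are assumptions chosen only for illustration.

```python
import numpy as np

def grey_histogram(patch, bins=32):
    """Normalised grey-level histogram of a 2D array with values in [0, 255]."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
    hist = hist.astype(float)
    return hist / hist.sum()

def chi_square_distance(p, q, eps=1e-12):
    """Symmetric chi-square distance between two normalised histograms.

    Returns 0 for identical histograms and approaches 1 when they do not overlap.
    """
    return 0.5 * float(np.sum((p - q) ** 2 / (p + q + eps)))

# Synthetic stand-ins for image patches (hypothetical data, not manuscript images).
rng = np.random.default_rng(0)
dark = rng.integers(0, 100, size=(64, 64))        # dark patch
dark_again = rng.integers(0, 100, size=(64, 64))  # second patch of the same kind
bright = rng.integers(150, 256, size=(64, 64))    # clearly different patch

d_same = chi_square_distance(grey_histogram(dark), grey_histogram(dark_again))
d_diff = chi_square_distance(grey_histogram(dark), grey_histogram(bright))
print(f"same kind: {d_same:.4f}, different kind: {d_diff:.4f}")
```

No labelled examples are involved: the distance is computed directly from the pixel statistics of the two patches, which is the property the paragraph above describes.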
The project produced robust similarity metrics and end-to-end analysis pipelines that enabled quantitative assessment of fine details in visual patterns and discovery of statistical regularities not evident to the naked eye. These pipelines supported classification (assigning items to known categories), detection (locating features of interest), recognition (identifying recurring motifs or hands), retrieval (finding visually similar items), and clustering (grouping items without prior labels), always tailored to the specific research question.
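Retrieval, one of the tasks listed above, can be sketched as ranking a collection by similarity to a query. The Python example below assumes each item has already been reduced to a numeric feature vector and uses cosine similarity; the function name, the feature vectors, and the choice of cosine similarity are illustrative assumptions rather than the metrics developed in the project.

```python
import numpy as np

def rank_by_cosine_similarity(query, collection):
    """Return collection row indices sorted from most to least similar to the query,
    together with the corresponding cosine similarity scores."""
    q = query / np.linalg.norm(query)
    c = collection / np.linalg.norm(collection, axis=1, keepdims=True)
    scores = c @ q                 # cosine similarity of each row with the query
    order = np.argsort(-scores)    # descending similarity
    return order, scores[order]

# Hypothetical feature vectors, one row per item in a small collection.
collection = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.05, 0.0])

order, scores = rank_by_cosine_similarity(query, collection)
print(order, np.round(scores, 3))
```

Any other similarity measure could be substituted for the cosine score without changing the ranking logic.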
The developed methods and approaches were applied to several use cases, including the following:
- Analysed Tamil palm-leaf manuscripts to detect words written in particular styles and colophon-related text, identify handmade binding holes, and estimate the distances between them, in collaboration with the PLMPI initiative.
- Analysed Rilke’s notebooks to interpret complex page layouts automatically and generate cataloguing information (such as text orientations, writing implements, and scripts), using data from the Deutsches Literaturarchiv Marbach.
- Restored undertext in Georgian palimpsests (manuscripts in which earlier writing had been overwritten) by applying generative image inpainting to remove overtext and enhance readability, using optical and MSI data from the ERC DeLiCaTe project.
- Examined handwriting styles in Homer’s Iliad on papyri: the system detected Greek letters, clustered them hierarchically by handwriting style, and generated representative models for each cluster, using images and annotations from EGRAPSA at the University of Basel.
- Detected and recognised seals on Arabic manuscripts from the Staatsbibliothek zu Berlin, using very few training examples.
- Analysed the Zhangzhung Nyengyu tsakali collection to detect specific symbols, measure the density of sieve prints from the paper-making process, and examine handwriting styles of potentially different scribes.
People
Project lead: Hussein Adnan Mohammed
Research associates: Mahdi Jampour (Jun 2023–Jan 2024) and Quang-Vinh Dang (Jan 2025–Dec 2025)
Preceding projects
Pattern Recognition in 2D Data from Digitised Images and Advanced Acquisition Techniques (2019–2022)
Project lead: Hussein Adnan Mohammed