Recovery of Writing in Large Collections

2023–2025

RFA21

Imaging a Manichaean papyrus in Dublin (left) and imaging Imaging a Manichean papyrus in Warsaw (right).

This project addresses the challenges related to the recovery of text from the same or related manuscripts which are currently dispersed across one or more institutions in an efficient and reproducible manner. It is fairly common that the contents of large manuscript collections have been scattered across libraries, museums, and private collections in different countries. Multispectral imaging (MSI) has traditionally been used to recover text from individual codices, usually palimpsests; it has also been used for a large number of diverse manuscripts from a single collection, notably the library of Saint Catherine’s Monastery, Sinai, and the Dead Sea Scrolls at the Israel Antiquities Authority. However, MSI technology has not yet been deployed to image a collection of related manuscripts across multiple institutions, which poses a number of technological, logistical, and methodological challenges. The Medinet Madi Library of Manichaean codices, of which over 1.110 plates are dispersed across at least four institutions, gives us an opportunity to develop the necessary tools and best practices.

The Medinet Madi corpus of Coptic Manichean texts consists of seven large papyrus codices copied in Egypt around 400 CE that appeared on the Cairo antiquities market in 1929. After being purchased by Chester Beatty and Carl Schmidt in Berlin in the early 1930s, the codices were separated page by page by the accomplished conservator Hugo Ibscher, and later by his son Rolf, and were placed between glass or Perspex® plates. These papyrus leaves, often very fragmentary, are now housed at several European institutions: 690 plates from four codices are stored in the Chester Beatty Library (CBL) in Dublin, 420 plates are at the Ägyptisches Museum und Papyrussammlung (AMP) in Berlin, and 7 pages are split between the University of Warsaw and the Muzeum Narodowe w Warszawie; in addition, a number of plates were looted during WWII and may reside in Russia. All these manuscript pages are of major historical interest because they contain the remains of otherwise lost early texts of Manichaeism, a religion at the intersection of Christian, Zoroastrian, and South Asian traditions, which originated in 3rd century Mesopotamia, heartland of the Sasanian (Iranian) Empire, and spread westwards to the Atlantic, and eastwards across the Silk routes, eventually reaching southern China. The Manichaeans were a persecuted minority almost everywhere they established communities, and their books were sometimes burned; MSI recovery of the Medinet Madi library makes possible the recovery of a significant amount of their otherwise lost and destroyed scriptures.

For the scientific community, the Medinet Madi corpus also poses many logistical and technological problems due to the exceptionally large quantity of data: each of the 1.110 plates must be imaged on both sides, necessitating multiple imaging campaigns. Our solutions to these challenges are applicable to other MSI projects at the CSMC and elsewhere, benefiting the field of cultural heritage imaging more broadly. For example, we have transitioned from exclusively manual image processing, which takes between 4 and 12 hours per dataset, to primarily deterministic “batch” processing, which allows us to recover around 95% of text on all fragments in under 3 hours. Our workflows which we are developing during this project have already contributed to further refining of Hoku and are actively used by the members of the DeLiCaTe project to recover the Armenian and Georgian palimpsests. Manual processing is now used when necessary for finetuning, and data capture has, for the first time, become the most time-intensive part of the MSI pipeline. To tackle this problem, we have reduced our capture sequence time from approximately 7 to 5 minutes by removing the 365nm and 385nm bands, which cannot penetrate glass/Perspex. This optimization allows us to image up to 240 pages in a two-week campaign. Finally, thanks to the versatility, efficiency and adaptability of the MegaVision MSI system we have adapted our image capture setup (e.g., light positions and angles, distance from object to camera sensor) to ensure that subsequent imaging campaigns in Dublin and Berlin are practically identical to previous ones, thereby guaranteeing consistency and reproducibility of results.

The Medinet Madi corpus of Manichaean papyri offers an ideal opportunity to develop best practices for MSI across collections. The recovered texts are of fundamental importance for the history of religions, and there is a similar dispersion of fragmentary texts from related corpora, such as the Turfan and Dunhuang finds. The cultural heritage imaging community also benefits from the procedures and protocols developed, which can be applied to similar collections of disjecta membra.

The exceptionally large quantity of data of the Medinet Madi corpus also makes it a perfect candidate for application of machine learning. New data from the upcoming Dublin and Berlin imaging campaigns will further enhance the already unique and important training data set consisting of spectral data and batch processed images. Current interdisciplinary efforts planned with Hussein Mohammed, who leads the Visual Manuscript Analysis Lab, include the creation of realistic images which aid the transcription effort, as well as digital research on the manuscripts’ paleography.

The imaging and technical part such as data collection, processing and refining of best practices is overseen by Kyle Ann Huskin and Ivan Shevchuk. The philological and cultural/historical aspects of the project is under the direction of Dr. Paul Dilley, the Erling B. “Jack” Holtsmark Associate Professor of Classics at the University of Iowa, and affiliated researcher at the CSMC. Keith Knox, the scientific advisor to the Early Manuscripts Electronic Library (EMEL) and affiliated researcher at the CSMC, is implementing new routines into the image processing software Hoku, which makes batch processing of the images possible.

The work on Medinet Madi library began as a side project within the RFA10 project. A new edition based on the MSI results as well as part of the data were published recently. Progress and developments have been presented at the Deutsche Orientalistentag in Berlin in September 2022 and ultimately led to the creation of project RFA21 shifting the focus towards recovery of large volumes of writing.

People

Project lead: Kyle Ann Huskin

Research Associates: Ivan Shevchuk, Paul Dilley, Keith Knox

Preceding project

Recovery of Damaged Writing (2019–2023)

Principal Investigators: Oliver Hahn, Ira Rabin

Technical staff: Kyle Ann Huskin, Ivan Shevchuk