Recovery of Writing in Large Collections
2023–2025
RFA21
This project addresses the problem of recovering text from large, geographically dispersed collections in an efficient and reproducible manner. The contents of large manuscript collections have frequently been scattered as disjecta membra. Multispectral imaging (MSI) has traditionally been used to recover text from isolated fragments and palimpsested codices. It has also been used successfully to image large quantities of diverse materials in single, stationary collections, most notably at Saint Catherine’s Monastery in Sinai and at the British Library in London. However, the technology has not yet been used to image large numbers of fragments from the same codex that are now divided among many institutions, a task that poses a number of technological, logistical, and methodological challenges. The Manichaean papyri from the Medinet Madi library, whose more than 1,110 plates are dispersed across at least four institutions, give us an opportunity to develop the necessary tools and best practices.
The Medinet Madi corpus of Coptic Manichaean texts consists of seven large papyrus codices copied in Egypt around 400 CE that were uncovered during an archaeological excavation in the late 1920s. After being purchased in 1930–31 by Chester Beatty and by Carl Schmidt of Berlin, the codices were dismantled page by page, and each fragment is now mounted between glass or Perspex plates. The fragments are housed at several European institutions: 690 fragments from four codices are stored in the Chester Beatty Library (CBL) in Dublin, 420 fragments are at the Ägyptisches Museum und Papyrussammlung (AMP) in Berlin, and 7 are split between the University of Warsaw and the National Museum in Warsaw; an unknown number of plates were looted during World War II and may now be in Russia. The fragments are of great historical interest because they comprise the remnants of what appears to have been a canonical manuscript of an early translation of the letters of Mani, a 3rd-century CE prophet, healer, and founder of Manichaeism, a religion that flourished for over a millennium across the Mediterranean and as far east as southern China. MSI recovery of these fragments is essential because primary source material on the religion’s teachings and foundation is relatively sparse.
From a scientific standpoint, the Medinet Madi corpus also poses logistical and technological problems: the quantity of data is exceptionally large (each of the 1,110 plates must be imaged on both sides), which necessitates multiple imaging campaigns. Our solutions to these challenges are applicable to other MSI projects at the Centre for the Study of Manuscript Cultures (CSMC) and elsewhere, benefiting the field of cultural heritage imaging more broadly. For example, we have moved from exclusively manual image processing, which takes between 4 and 12 hours per dataset, to primarily deterministic “batch” processing, which allows us to recover around 95% of the text on all fragments in under 3 hours. Manual processing is now used only as a last resort, and data capture has, for the first time, become the most time-intensive part of the project. To address this, we have reduced our capture sequence time from approximately 7 to 5 minutes by omitting the 365 nm and 385 nm bands, which cannot penetrate the glass and Perspex plates. This optimization allows us to image 240 pages in a two-week campaign. Finally, we have standardized our image capture setup (e.g., light positions and angles, and the distance from object to camera sensor) so that subsequent imaging campaigns in Dublin and Berlin are practically identical to all others, ensuring consistent and reproducible results.
The Medinet Madi corpus of Manichaean papyri thus offers an ideal opportunity to develop best practices for large collections. The recovered texts are of particular importance to scholars of the history of religion, philology, and literature, among others. The cultural heritage imaging community also benefits from the procedures and protocols developed here, since many other large collections, especially of disjecta membra, exist.
The humanities side of this project, including the painstaking editing of the Coptic texts and their placement in their sociohistorical context, is carried out entirely by Dr Paul Dilley of the University of Iowa. The project is also deeply indebted to Dr Keith Knox, scientific advisor to the Early Manuscripts Electronic Library (EMEL) and the software engineer behind Hoku, the software that makes batch processing possible.
People
Principal Investigator: Kyle Ann Huskin
Research Associates: Ivan Shevchuk, Paul Dilley
Preceding project
Recovery of Damaged Writing (2019–2023)
Principal Investigators: Oliver Hahn, Ira Rabin
Technical staff: Kyle Ann Huskin, Ivan Shevchuk