Can computers be taught to decipher cuneiform writing?
28 September 2019
Historians and archaeologists of antiquity are increasingly using digital humanities to archive, organise and interrogate their data. What if computers could do the work of Assyriologists and decipher cuneiform scripts? Several projects in Europe and America are working towards this goal.
The challenge is huge. Cuneiform writings consist of three different systems: logographic (one sign for one word), syllabic (one sign for one syllable) and alphabetic, with, for the vast majority of texts, a mixture of syllabic and logographic signs. Moreover, they have been used for more than three thousand years by a dozen languages belonging to different linguistic families.
Assyriologists are far from having deciphered all the cuneiform tablets unearthed to date – over one million – and new texts appear regularly, either during archaeological excavations or through the illicit antiquities market.
Depending on the period, the region and the context, cuneiform tablets present a wide variety of scripts, depending on the number of signs used, their standardised or non-standardised form, the language transcribed, the type of text, etc. Palaeographic studies have proliferated, with Assyriologists trying to recognise the different hands of scribes.
Researchers are forging new technologies to help them in their work. Digital photography has gradually replaced hand copies of the tablets, and in recent years 3D scanners and other imaging techniques have made it possible to better render this "intaglio" writing and the clay tablet which is inscribed on all sides.
Several projects have been launched to teach computers to recognise the cuneiform characters on the tablets and to automatically translate a text from the transliteration of the cuneiform characters. These include the German project led by G. Müller, and the transatlantic project coordinated from Toronto by H. Baker.
The former, the "Computer-unterstützte Keilschriftanalyse", is based on 3D modelling of the tablets and cuneiform signs, and on data processing programmes to perform two- and three-dimensional analysis for automatic sign and word recognition. The aim is to draw up as complete a list as possible of the various signs and their variants, to identify the authors of the texts, and to join fragments of tablets presenting the same writing. The experiment is based on Hittite texts (Turkey, 2nd half of the 2nd millennium BCE). These are large tablets, often fragmentary, with fairly flat surfaces and regular signs. Such a technique would be much more difficult to carry out on the tablets of other periods, which are more rounded and have writing on all the edges, with very irregular signs.
The second project, "Machine Translation and Automated Analysis of Cuneiform Languages", focuses on machine learning of transliteration sign sequences using machine translation technologies. The experiment involves 67,000 short, highly standardised texts produced by the administration of the Third Dynasty of Ur (21st century BCE). The Sumerian vocabulary used in these texts, already transliterated on the basis of the Cuneiform Digital Library Initiative (CDLI), is very limited and repetitive, and the researchers who study them do not usually publish a translation. The aim of the project is therefore to make a computer-generated translation of these texts freely available and to provide tools (Linked Open Data) to be able to exploit them more systematically.
These two projects, as well as many others, have targeted corpora of cuneiform texts that allow the use of such technologies, either by their physical aspect or by their very regular and repetitive character. But this represents only a small part of the known documentation and it will be many years before artificial intelligence can replace Assyriologists in deciphering cuneiform texts.