Tamilex
Establishment of an electronic corpus of Classical Tamil literature and the corresponding historical lexicon informed by emic exegetical and lexicographical sources
2023 – 2046
Southeast India is home to one of the world’s great literary traditions: with a history of over 2,000 years, Tamil is, after Sanskrit, the second-oldest of India’s six classical languages. Nevertheless, this literary tradition has been insufficiently explored, and only fragments of it have been translated into other languages so far. The corpus comprises poetry, epics, and a scientific tradition, most of which is based on the language-oriented disciplines, including lexicography. Due to diglossia, which is concomitant with strong dialectal variation, the early literature had to be accompanied by commentaries.
This project aims at establishing an electronic corpus of the most important texts of roughly the first millennium, based on extensive preliminary work in the form of critical editions, translations, and digitisations. It will be endowed with corpus dictionaries (concordances of every occurrence and derivational form), which will then form the basis of a bilingual historical lexicon (Tamil glosses from the commentarial and lexicographic traditions plus an English rendering). Each entry will illustrate both the semantic development and the poetic polysemy/homophony. The quotations will be linked back into the electronic corpus both for the texts and the commentaries, and cross-references will allow access to earlier printed dictionaries. This will yield an interactive online tool that allows direct engagement with the source texts, the exegetical materials, and the lexicographic work(s).
The data will be stored in a database with a dual web interface in English and Tamil: one for collaborators to enter updated versions and add remarks and improvements, and one for the scholarly public. The public interface will allow full use of the material for the purpose of philological, linguistic, and cultural-historical research, that is, text editions, translations, corpus dictionaries, the interlinked main dictionary, and secondary literature (prospectively also including the full manuscript material). There will also be a didactic dimension, keeping in mind especially the rapidly growing Tamil community and Tamil diaspora engaging with their heritage, with an added layer of audio recordings of pandit recitation and teaching (partly in Tamil, partly in English). All software developed in Tamilex is fully open source and has been refined in consultation with scholars around the world to be extensible and reusable for future projects. The data format has also been carefully calibrated, with input from similar projects in Paris and Hamburg, so that data can be easily aggregated across projects in order to perform complex meta-analyses.
Contact at CSMC
Professor Dr Eva Wilden
Warburgstraße 26
20354 Hamburg
Tel: +49 40 42838-9417
Email: eva.wilden@uni-hamburg.de