The Quran is a significant religious text, followed by the 1.5 billion believers of the Islamic faith worldwide. The text dates to 610-632 CE and is written in Quranic Arabic, the direct ancestor of the Modern Standard Arabic in use today. This paper presents the Quranic Arabic Dependency Treebank (QADT) and reports on the approaches and solutions used to apply Natural Language Processing to the unique and challenging language of the Quran. The project differs from other Arabic treebanks in that it provides a deep computational linguistic model based on traditional Arabic grammar. The treebank is part of the Quranic Arabic Corpus (http://corpus.quran.com), a popular free Arabic resource developed at the University of Leeds. Motivated by the importance of the Quran as a central religious text, we also report on how online collaborative annotation was used to bring together Quranic scholars and Arabic language experts to ensure a high level of accuracy in the grammatical analysis of the entire Quran.