Skip to Main Content
We describe a method implemented for the recognition of Syriac handwriting from historical manuscripts. The Syriac language has been a neglected area for handwriting recognition research, yet is interesting because the preponderance of scribe-written manuscripts offers a challenging yet tractable medium for OCR research between the extremes of typewritten text and free handwriting. Like Arabic, Syriac is written in a cursive form from right-to-left, and letter shape depends on the position within the word. The method described does not need to find character strokes or contours. Both whole words and character shapes were used in recognition experiments. After segmentation using a novel probabilistic method, features of these shapes are found that tolerate variation in formation and image quality. Each shape is recognised individually using a discriminative support vector machine with 10-fold cross-validation. We describe experiments using a variety of segmentation methods and combinations of features on characters and words. Images from scribe-written historical manuscripts are used, and the recognition results are compared with those for images taken from clearer 19th century typeset documents. Recognition rates vary from 61-100%, depending on the algorithms used and the size and source of the data set.