Design and Development of Computational Tools for Analyzing Elements of Hindi Poetry

Poetry writing is a qualitative pursuit, and so is its analysis. A mapping of poetic elements onto a scale of real numbers is needed but lacking. Although the Hindi literary heritage is vast and celebrated, remarkably few computational works explore its underlying structures, and most of them detect a particular metre rather than take a generalized approach. The state-of-the-art metadata generator fails to provide any measure of the underlying structural elements of poetry. There is no automated system that generates the rhyming pattern hidden in a Hindi poem, nor a system that detects and estimates the extent of figures of speech in a given text of any language. In this article, three efficient tools, namely Text2Mātrā, RPaGen and FoSCal, have been designed and developed to extract and evaluate elements of poetry. The Text2Mātrā tool provides the numeral scansion for any Hindi input text, which can serve as a basis for a wide range of analytical and detection work. RPaGen detects the poem type of any input poem and outputs its rhyming pattern. FoSCal gives a quantitative representation of the figures of speech detected in any input text, using a scoring scheme formulated with a fuzzy approach and weighted analysis. These tools may find utility in fields such as education, literary criticism, philology and authorship attribution. Various computational studies of poetry have been carried out across the languages of the world. However, quantifying the extent of figures of speech in poetic compositions, in any language, is an entirely novel approach. Mapping the aesthetic properties of a subjective idea (like poetry) onto a numeral scale is, to the best of our knowledge, the first of its kind for the Hindi language.

the chanda rules in Hindi poetic composition. The aspects of poetry stated above are not exhaustive. They vary greatly from language to language, which implies that algorithms designed for poetry analysis in one language often may not be equally applicable to another. In contrast to Hindi, a significant amount of work has been done in computing poetic elements of other languages, such as metre detection and classification of Arabic [1], [4], [18] and Persian poetry [28], a study of metre as a stylistic feature in Latin poetry [6], an expert system for harmony test of Arabic poetry [3], a statistical evaluation of Chinese Tang [2], [32] and English [10] poetry, an emotion-based classification for Marathi [26], Punjabi [8], and Arabic [20] poetry, and a study of the rhythm of Tibetan poetry [14]. One of the surprises of this study is that none of the above languages has a computational system or tool that recognises and quantifies figures of speech, a very important component of poetry; providing one is among the novelties of this article.

From the literature reviewed, it is to be noted that we do not yet have any automated tool to detect the nature of rhymes contained in Hindi poetry. It is clear that a numerical transformation of text is desirable when analysis comes into the picture. One such text-to-numerical converter is part of the metadata generator given by Audichya and Saini [15], but little attention was paid to automating it, and the work provides neither an algorithm nor a substantial result set. Acting upon the identified gap, we created a tool that can generate the rhyme pattern(s) hidden in an input poem and a tool that outputs a sequence of 1s and 2s equivalent to the input text.

The use of rhetorical components enhances aesthetic beauty in poetry. Ornate compositions especially impress listeners, and the correct use of ornamentation is also considered a sign of intelligence. Comparing the aesthetic beauty generated by ornamentation in two or more compositions is entirely to be expected. Such comparisons do happen in literary analysis, but they are subjective, so there is room for ambiguity in these analyses. This ambiguity can be avoided if the method of analysis is made objective rather than subjective. To the best of the authors' knowledge, there is no tool or automated system that estimates the measure of aesthetic components of an input Hindi text and can thereby provide a solution to the problem stated above.

In this article we propose a toolset that is probably the most rigorous computational research done to date in the area of interest. Three efficient tools are offered, namely Text2Mātrā, RPaGen and FoSCal. The Text2Mātrā tool provides the numeral scansion for any input text, which can serve as the basis for a wide range of analytical and detection work (for example, detecting chanda type, detecting rhythm pattern, or verifying the metrical correctness of a given verse). In the current context, RPaGen is the first of its kind; it detects the poem type of any input poem and outputs its rhyming pattern. FoSCal is a tool in a class by itself that estimates the extent of aesthetic components (alliteration) of any input text. The proposed toolset (Text2Mātrā, RPaGen and FoSCal) covers the gaps identified.

anyone affected. Learning poems helps intellect and creativity to grow. Poetry is a healthy way for a growing teen to let out surging emotions. It helps any learning child to understand the impact of words. For this reason, this work may find utility in the field of education, and aspiring poets or lyricists may find these tools equally helpful. Apart from the educational aspect, these tools are highly serviceable to the community of literary critics. They help critics not only to strengthen their observations but also to form new ones.

This article has been organized in the following order.

Section 2 provides an introduction to Hindi-alphabet and ele-

Hindi-Alphabet is defined as a well-organized set of akṣara (letters). An akṣara is a root sound that cannot be broken down any further and can be pronounced with one effort of the voice.

Akṣara can be of two types, svara (vowels) and vyañjana (consonants). There are a total of eleven svara and thirty-three basic vyañjana in the Hindi-Alphabet (see Table 1).

There are some basic terminologies that one must know before diving into chandas.

A consonant along with a vowel, or a vowel alone, is considered one syllable. Depending on the type of vowel (long or short), a syllable can be long (guru) or short (laghu).
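
The laghu/guru distinction can be illustrated with a minimal Python sketch that assigns 1 to a short syllable and 2 to a long one. It is not the article's Algorithm 1: the function names `scan_word` and `scan` are hypothetical, and the rules are simplifying assumptions (a long vowel, an anusvāra or a visarga makes a syllable guru, and a syllable followed by a conjunct inside the same word is treated as guru).

```python
# Simplified laghu (1) / guru (2) scansion for Devanagari text.
# Assumed rules for illustration only; not the article's Algorithm 1.
SHORT_VOWELS = set("अइउऋ")
LONG_VOWELS = set("आईऊएऐओऔ")
# Dependent long-vowel signs: aa, ii, uu, e, ai, o, au.
LONG_SIGNS = set("\u093e\u0940\u0942\u0947\u0948\u094b\u094c")
HEAVY_MARKS = set("\u0902\u0903")  # anusvara and visarga
VIRAMA = "\u094d"
NUKTA = "\u093c"


def is_consonant(ch):
    """True for Devanagari consonants (U+0915 to U+0939)."""
    return "\u0915" <= ch <= "\u0939"


def scan_word(word):
    """Return the list of syllable weights (1 or 2) for one word."""
    word = word.replace(NUKTA, "")  # the nukta does not change the weight
    weights = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in SHORT_VOWELS:
            weights.append(1)
        elif ch in LONG_VOWELS:
            weights.append(2)
        elif is_consonant(ch):
            if nxt == VIRAMA:
                # Half consonant: no syllable of its own; assume it
                # lengthens the preceding syllable of the same word.
                if weights:
                    weights[-1] = 2
                i += 1  # skip the virama
            else:
                weights.append(1)  # consonant with inherent short 'a'
        elif ch in LONG_SIGNS or ch in HEAVY_MARKS:
            if weights:
                weights[-1] = 2
        # Short vowel signs and other marks leave the weight unchanged.
        i += 1
    return weights


def scan(text):
    """Scan a line word by word and return one flat list of weights."""
    return [w for word in text.split() for w in scan_word(word)]


print(scan("राम"))    # [2, 1]
print(scan("सत्य"))   # [2, 1]
```

Summing the returned list gives a mātrā count for the line under these assumed rules; a full implementation would need to handle the exceptions that the article's rule set covers.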

The terms discussed above are required for understanding the set of rules discussed in section 3(A), which is used for the creation of Algorithm 1.

The present article is an attempt towards the design and development of automation tools for analyzing three elements of Hindi poetry. These tools provide a way to identify and extract features (apparent or non-apparent) from among elements that, in some form, are crucial to approximating a subjective idea (such as poetry) in an objective form (such as number(s)). The basics needed to understand the current context are discussed below:

In Sanskrit and languages derived from Sanskrit (such as Hindi), chanda refers to poetic compositions with a well-ordered and predefined arrangement of morae or syllables. Metres are chiefly of two types, namely, mātrika and varṅika.

It may be surprising that at the beginning of such a long history of chanda-based poetry there was no accepted practice of rhyme, but over time rhyme became inseparable from it. Lack of rhyme in singable compositions implies inelegance and is often perceived as a hindrance to their production.

The presence of rhyme is clearly visible in the compositions of old or ancient Hindi, from where it naturally came into the compositions of modern Hindi. At the same time, we accept that in modern times many free verse compositions are free from rhyme. But such compositions are certainly not lyrical.

Details concerning the design of the algorithm (Algorithm 2) that generates rhyme pattern(s) for any input poem are discussed in section 3(B).

In Hindi, the adornment that is used to enhance the elegance of a poetic composition and make it influential is alaṅkāra, or figure of speech (FoS). It makes any expression abstruse and brings ingenuity to it.

The concepts needed to understand the scoring scheme, which is used for designing Algorithm 3 (section 3(C)), are stated below.

Alaṅkāra in Hindi can be broadly classified as arthālaṅkāra, śabdālaṅkāra and ubhayālaṅkāra. Śabdālaṅkāra can be further categorized as character-oriented (varṅa mūlaka) and word-oriented (śabda mūlaka). The first FoS tool for the Hindi language, proposed in this article, covers character-oriented śabdālaṅkāra, or alliteration, which is further categorized into four types: chekānuprāsa, vṛttyanuprāsa, śrutyanuprāsa and antyānuprāsa (see Fig. 2).

Here, [...] and [...] form a rhyme between the two lines, and so we can say antyānuprāsa is present.

This section contains the various algorithms designed by referring to the rules specified across a number of sources, along with their methodology. This discussion is followed by the utility of the proposed tools.

The asymptotic computational complexities of the tools, in terms of space and time, are discussed just below the algorithm specifications of the respective tools.

In Table 3, rhyme-making words (or phrases) are underlined for clarity.

The procedure for extracting the rhyme pattern of a given input poem is depicted in Fig. 4.

The proposed tool provides a quantitative measure of the presence and extent of rhetorical elements in any text, giving a score ranging from zero to one, where zero and one correspond to the lowest and highest score respectively.

The tool focuses on alliteration, a type of alaṅkāra that is based on the repetition of one or more akṣara. Alliteration can be of four types (discussed in section 2); the score is calculated for each type independently and then fused to get the final score for the entire poem. The scoring methodology proposed by the authors of this article for each type is discussed below.

Antyānuprāsa is present in the text if rhyming is present. The algorithm for generating a rhyme pattern for a given poem is discussed in section 3(B). Rhyming can be best, medium, worst or not present. A poem with the best rhyme will be more captivating than one with inferior rhyme (medium, worst or not present). So, the best rhyme is scored 1, which is the highest scoring value, and not present is scored 0, which is the lowest scoring value. Medium and worst types are scored 2/3 and 1/3 respectively. This assignment is justified by the fact that these weights are equidistant on the number scale with extreme values zero and one (see Fig. 5).

Sounds in Hindi are classified into six classes based on their place of articulation (see Table 4). Śrutyanuprāsa is realized when sounds belonging to specific articulation points dominate. A majority implies dominance, but in some cases even a class without a majority may dominate. Based on context, the function checkShrutya(text,d) is supplied with a suitable value for dominance (i.e., d). The recommended dominance value is forty percent or above.
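
The two ideas above, equidistant rhyme-class scores and a dominance threshold d, can be sketched in Python as follows. This is an illustration under stated assumptions, not the article's checkShrutya: the names `antya_score` and `dominant_class` are hypothetical, and because Table 4 is not reproduced here, the sketch groups only the five standard consonant vargas by place of articulation rather than the article's six classes. It reports the dominant share only and does not compute the fuzzy śScore.

```python
from fractions import Fraction

# Rhyme-class scores as stated in the text: equidistant on [0, 1].
RHYME_SCORES = {
    "best": Fraction(1),
    "medium": Fraction(2, 3),
    "worst": Fraction(1, 3),
    "not present": Fraction(0),
}

# ASSUMPTION: five consonant vargas stand in for the article's Table 4.
ARTICULATION = {
    "velar": "कखगघङ",
    "palatal": "चछजझञ",
    "retroflex": "टठडढण",
    "dental": "तथदधन",
    "labial": "पफबभम",
}


def antya_score(rhyme_class):
    """Score of a rhyme class ('best', 'medium', 'worst', 'not present')."""
    return RHYME_SCORES[rhyme_class]


def dominant_class(text, d=Fraction(2, 5)):
    """Return (class name, share, share >= d) for the most frequent class.

    The share is taken over the consonants covered by ARTICULATION only;
    d defaults to the recommended forty percent.
    """
    counts = {name: 0 for name in ARTICULATION}
    for ch in text:
        for name, letters in ARTICULATION.items():
            if ch in letters:
                counts[name] += 1
    total = sum(counts.values())
    if total == 0:
        return None, Fraction(0), False
    name = max(counts, key=counts.get)
    share = Fraction(counts[name], total)
    return name, share, share >= d


print(antya_score("medium"))           # 2/3
print(dominant_class("तन मन धन")[0])   # dental
```

Using `Fraction` keeps the scores 1/3 and 2/3 exact, which makes the equal spacing of the four rhyme classes easy to verify.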

The µ(x) value calculated using (1) yields the desired ŚrutyanuprāsaScore (śScore).

Here, d is the point of dominance set by the user and a is the

The final alaṅkāra score for the input poem is given using (2).

The process of alaṅkāra score calculation for a given input poem is depicted in Fig. 7.

Text_to_mātrā_converter generates numeral scansion for a given text in Hindi. As an illustration, the input-output for four representative chanda types are given in

Rhyme_pattern_generator extracts the rhyme pattern from a given Hindi poem. As an illustration, the input-output for a few representative poems is given in Table 6. A sufficiently large number of valid and invalid inputs were provided to check the robustness of the tool, and some of them are listed in Table 6 [36].

The poem in example 1 (Table 6) belongs to class PT3. Therefore, lines 1 and 2 should rhyme. Further, there are two ways to get the rhyme pattern of the input poem: the first is when line 1 rhymes with lines 4, 6 and 8, and the second is when line 2 rhymes with lines 4, 6 and 8. The tool provides both possible rhyming patterns. So the output for example input 1 (Table 6) is a list of three elements: the first is the rhyming pattern and rhyme class between lines 1 and 2, the second is the rhyming pattern in the poem corresponding to line 1, and the third is the rhyming pattern in the poem corresponding to line 2.

The poem in example 4 (Table 6) is a combination of PT2 (example 2, Table 6) and PT1 (example 3, Table 6), for which the output given by RPaGen is unknown type; this is correct, as no such poem type is defined. Similarly, for the poem in example 5 (Table 6), the output given by RPaGen is unknown type, which is valid as there is no rhyme in the poem.

If the length of the poem supplied as input is not valid, RPaGen outputs an invalid input message (example input 8, Table 6).

Table 7. Apart from stories, it is tested on 50 newspaper articles, three of which are shown in Table 7. The tool is then used to generate scores for 551 dohā by Tulsidas (score shown in Table 7

There are tools developed for other languages, such as the Sanskrit metre identifier [37]. This tool outputs the probable metre for any input verse in Sanskrit. The tool is raw and primitive, but the work is in progress. Geet Gatiroop provides a tool that calculates the instant counts for lines of any input verse in the Hindi language [35]. An automated rhyme-determining tool represents the state of the art for Russian poetry [30]. RhymeDesign is an open-source tool for detecting sonic devices in English-language poetry [22].

The mathematical and statistical study of the aesthetic elements of poetic compositions is important for many reasons. In this research, we have proposed some tools for the study and analysis of the elements of Hindi poetry. The Text2Mātrā tool provides a scansion, which can be the foundation for various observations related to chanda, such as chanda type detection and classification, rhythm determination, and verification of the correctness of chanda. The RPaGen tool provides a multifunctional output of rhyme pattern(s) for an input Hindi poem. The usefulness of the tool can be seen in this very article: the pattern produced by RPaGen is used by FoSCal for alaṅkāra scoring. FoSCal, the maiden tool for alaṅkāra score generation, is built around one of the many types of alaṅkāra and creates space and motivation for automating related work that is still unexplored.