M. K. B. Shuhan, Rupasree Dey, Sourav Saha, Md Shafa Ul Anjum, T. S. Zaman
A Stylometric Dataset for Bengali Poems
Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval, published 2022-12-16. DOI: 10.1145/3582768.3582788
Abstract
Poetry is a form of literature that conveys feelings through different styles, aesthetics, and rhythms. The Bengali language has a rich collection of poems, and every poet has an individual style of expressing their thoughts and emotions. However, stylometric research on Bengali poetry is still at an early stage of development. In this paper, we present a stylometric dataset of 6,070 poems by 137 poets, stored in text format. To the best of our knowledge, this is the first stylometric dataset for Bengali poems, and it will add a new dimension to the expanding body of research on the Bengali language. To explore the usability of this dataset, we developed deep learning classifiers for poem genre classification and present a performance analysis of two of them, GRU and CNN. Of the two, GRU performed better, achieving an F1-score of 91.48. The dataset will be publicly available at https://github.com/shuhanmirza/Bengali-Poem-Dataset after this article is published.
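The abstract reports the GRU result as a single F1-score but does not say how per-genre scores were combined. The snippet below is a minimal sketch that assumes macro averaging: F1 is computed for each genre label, then the unweighted mean is taken. The averaging scheme, the function name `macro_f1`, and the placeholder labels `genre_A`/`genre_B`/`genre_C` are illustrative assumptions, not the authors' evaluation code. For comparison, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 (assumed averaging scheme): per-class F1, then unweighted mean.

    Returns a value in [0, 1]; multiply by 100 to express it as a percentage.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    # Every label seen in either the gold or the predicted genres is scored.
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1          # correct prediction for this genre
        else:
            fp[pred] += 1          # predicted genre was wrong
            fn[gold] += 1          # true genre was missed

    f1_scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)

    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Hypothetical gold and predicted genre labels for six poems.
    gold = ["genre_A", "genre_A", "genre_B", "genre_B", "genre_C", "genre_C"]
    pred = ["genre_A", "genre_B", "genre_B", "genre_B", "genre_C", "genre_A"]
    print(round(macro_f1(gold, pred), 4))  # prints 0.6556
```

In this toy example the per-genre F1-scores are 0.5, 0.8, and 2/3, so the macro average is 59/90. A weighted average (by class frequency) would give a different number whenever genres are imbalanced, which is why the averaging scheme matters when interpreting a single reported F1-score.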