A Stylometric Dataset for Bengali Poems

Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval. Pub Date: 2022-12-16. DOI: 10.1145/3582768.3582788

M. K. B. Shuhan, Rupasree Dey, Sourav Saha, Md Shafa Ul Anjum, T. S. Zaman

Abstract

Poetry is a form of literature that conveys feelings through different styles, aesthetics, and rhythms. The Bengali language has a rich collection of poems, and every poet has an individual style of expressing their thoughts and emotions. However, stylometric research on Bengali poetry is still at an early stage of development. In this paper, we present a stylometric dataset containing 6,070 poems by 137 poets, stored in plain-text format. To the best of our knowledge, this is the first stylometric dataset for Bengali poems, and it adds a new dimension to the growing body of research on the Bengali language. To explore the usability of this dataset, we developed deep-learning classifiers that assign these poems to genres. In addition to the classification results, we present a performance analysis of the classifiers, which include GRU and CNN models. Of the two, GRU performed better, with an F1-score of 91.48. The dataset will be publicly available at https://github.com/shuhanmirza/Bengali-Poem-Dataset after this article is published.
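
The abstract reports the GRU result as an F1-score but does not say how the score is averaged across genres. As a minimal sketch, assuming macro averaging (an assumption, not stated in the abstract), per-class F1 is the harmonic mean of precision and recall, and macro-F1 is the unweighted mean over classes. The genre labels below ("love", "nature", "patriotic") are hypothetical toy data, not taken from the dataset.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return per-class F1 scores and their unweighted (macro) mean.

    Classes are the union of labels seen in y_true and y_pred; a class
    with an undefined precision or recall gets 0.0 for that quantity.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # predicted p, but the poem was not p
            fn[t] += 1   # missed a poem whose true class is t
    per_class = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        per_class[c] = (2 * precision * recall / (precision + recall)
                        if precision + recall else 0.0)
    macro = sum(per_class.values()) / len(per_class)
    return per_class, macro


# Hypothetical genre labels for four poems (toy example only).
y_true = ["love", "nature", "love", "patriotic"]
y_pred = ["love", "love", "love", "patriotic"]
per_class, macro = f1_scores(y_true, y_pred)
print(per_class)        # love ≈ 0.8, nature = 0.0, patriotic = 1.0
print(round(macro, 2))  # 0.6
```

In practice, scikit-learn's `f1_score` with `average="macro"` computes the same quantity; a weighted average would instead scale each class by its number of poems, which can differ noticeably when genres are imbalanced.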
