Poet Attribution of Urdu Ghazals using Deep Learning

Iqra Siddiqui, Fizza Rubab, Haania Siddiqui, Abdul Samad
Published in: 2023 3rd International Conference on Artificial Intelligence (ICAI)
Publication date: 2023-02-22
DOI: 10.1109/ICAI58407.2023.10136675 (https://doi.org/10.1109/ICAI58407.2023.10136675)
Citations: 0

Abstract

Poet attribution is the task of determining the author of a piece of poetry using insights obtained from analyzing the poet's existing work. Its applications include plagiarism detection and the characterization of a poet's body of work. Urdu, Pakistan's lingua franca and the language of its richest poetic tradition, has long been subject to misinformation and misattribution. This paper presents a novel approach to poet attribution in Urdu ghazals through the application of machine learning and deep learning models. Our aim is to establish an accurate and comprehensive characterization of ghazals that captures the unique writing style of each poet. To this end, we trained and tested a range of machine learning, deep learning, and transformer-based classification models on a dataset of 17,609 couplets by 15 notable ghazal poets. Classifiers such as SVM and logistic regression provided preliminary results, with SVM achieving an accuracy of 64%. We then evaluated deep learning models including MLPs, CNNs, GRUs, and LSTMs, among which LSTMs achieved the highest accuracy, 59.96%. Finally, transformer-based models, including RoBERTa and BERT, achieved an accuracy of approximately 80% in classifying the 15 poets. This work contributes to the field of computational poetry analysis as the first to explore poet attribution in Urdu ghazals using deep learning and transformer-based models. Our analysis examines how well each model captures the writing style of Urdu ghazal poets, leading to a more comprehensive and accurate characterization of these works.
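
The attribution task described in the abstract is a multi-class classification over couplets, with one class per poet. The abstract does not state which features or hyperparameters were used, so the following is only a minimal sketch of the general setup and not the authors' method. It swaps in a character n-gram nearest-profile classifier as a simple stand-in for the classical baselines (SVM, logistic regression). The function names, the toy strings, and the poet labels are invented placeholders, not data from the paper.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in one couplet (padded with spaces)."""
    padded = f" {text.strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(couplets, poets):
    """Build one summed n-gram profile per poet from labelled couplets."""
    profiles = {}
    for text, poet in zip(couplets, poets):
        profiles.setdefault(poet, Counter()).update(char_ngrams(text))
    return profiles


def predict(profiles, text):
    """Return the poet whose profile is most similar to the given couplet."""
    grams = char_ngrams(text)
    return max(profiles, key=lambda poet: cosine(grams, profiles[poet]))


def accuracy(profiles, couplets, poets):
    """Fraction of couplets attributed to the correct poet."""
    correct = sum(predict(profiles, t) == p for t, p in zip(couplets, poets))
    return correct / len(poets)


# Invented placeholder data (NOT real couplets or poets from the paper).
train_texts = ["moon river night", "night moon silver",
               "desert sun stone", "stone road sun"]
train_poets = ["poet_a", "poet_a", "poet_b", "poet_b"]

profiles = train(train_texts, train_poets)
print(predict(profiles, "silver moon"))
print(predict(profiles, "sun stone"))

# Reference point for reading the reported accuracies: uniform random
# guessing over 15 poets, assuming (hypothetically) balanced classes.
NUM_POETS = 15
chance_accuracy = 1 / NUM_POETS
print(f"chance level: {chance_accuracy:.3f}")
```

Under the balanced-class assumption, uniform guessing over 15 poets would score about 6.7%, which gives a rough reference point for the 59.96%, 64%, and approximately 80% accuracies reported in the abstract. The real class distribution is not given there, so this figure is indicative only.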