BI-SENT: bilingual aspect-based sentiment analysis of COVID-19 Tweets in Urdu language.

IF 2.6 | Zone 3, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES

PLoS ONE | Pub Date: 2025-06-13 | eCollection Date: 2025-01-01 | DOI: 10.1371/journal.pone.0317562

Ehtesham Hashmi, Amna Altaf, Muhammad Waqas Anwar, Muhammad Hasan Jamal, Usama Ijaz Bajwa

PLoS ONE, volume 20, issue 6, e0317562. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12165425/pdf/

Citations: 0

Abstract


The COVID-19 pandemic resulted in over 600 million cases worldwide, and significantly impacted both physical and mental health, fostering widespread anxiety and fear. Consequently, the extensive use of online social networks to express emotions made sentiment analysis a crucial tool for understanding public sentiment. Traditionally, sentiment analysis in the Urdu language has focused on sentence-level analysis. However, aspect-level sentiment analysis is increasingly important and remains underexplored due to the challenges of the costly and time-consuming manual dataset annotation process. This study presents an innovative bilingual aspect-based sentiment analysis for Urdu and Roman Urdu using unsupervised methods. For Urdu, a syntactic rule-based approach achieves an accuracy of 83% in extracting aspect terms, marking a 5% improvement in F1-score over existing methods. For Roman Urdu, the study employs collocation patterns and topic modeling to identify and categorize key aspects, resulting in a perplexity score of -7 and a coherence score of 41. The results not only demonstrate the semantic coherence of the identified categories but also represent a significant advancement in aspect-level sentiment analysis by eliminating the need for manual annotation. This study offers new insights into the sentiments expressed during the pandemic, providing valuable feedback for policymakers and health organizations.
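The abstract mentions collocation patterns for Roman Urdu but does not say how they are scored. As a rough illustration only, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI), one common way to surface collocations. The function name `bigram_pmi`, the `min_count` threshold, and the sample tweets are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def bigram_pmi(tokenized_docs, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Assumption: PMI stands in for whatever collocation measure the
    study actually used; the abstract does not specify one.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in tokenized_docs:
        unigrams.update(tokens)
        # Pair each token with its right-hand neighbour.
        bigrams.update(zip(tokens, tokens[1:]))

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / total_bigrams
        p_w1 = unigrams[w1] / total_unigrams
        p_w2 = unigrams[w2] / total_unigrams
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical Roman Urdu tweets, invented purely for illustration.
tweets = [
    "corona virus se bachao".split(),
    "corona virus ka ilaj".split(),
    "lockdown se log pareshan".split(),
    "corona virus aur lockdown".split(),
]

ranked = bigram_pmi(tweets, min_count=2)
for pair, score in ranked:
    print(pair, round(score, 2))
```

In a pipeline like the one the abstract outlines, high-scoring pairs could serve as candidate aspect terms before topic modeling groups them into categories. That ordering is an assumption here, not a detail confirmed by the abstract.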


Source journal

PLoS ONE (Biology)

CiteScore: 6.20
Self-citation rate: 5.40%
Articles published: 14242
Review time: 3.7 months

About the journal: PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:

* Open access: freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage