An exploratory and automated study of sarcasm detection and classification in app stores using fine-tuned deep learning classifiers

IF 2 · Zone 2, Computer Science · JCR Q3, Computer Science, Software Engineering

Automated Software Engineering, vol. 31, no. 2 · Published 2024-08-27 · DOI: 10.1007/s10515-024-00468-3

Eman Fatima, Hira Kanwal, Javed Ali Khan, Nek Dil Khan

Abstract

App stores enable users to provide insightful feedback on apps, which developers can use to enhance and evolve their software applications. However, the growing volume of end-user feedback makes it challenging to find reviews that are valuable and relevant for quality improvement and app enhancement. Moreover, to the best of our knowledge, existing sentiment analysis approaches do not consider sarcasm and its types when identifying the sentiments of end-user reviews for requirements decision-making, and no work has been reported on detecting sarcasm in app reviews. This paper proposes an automated approach that detects sarcasm and its types in end-user reviews and identifies valuable requirements-related information using natural language processing (NLP) and deep learning (DL) algorithms, helping software engineers better understand end-user sentiments. For this purpose, we crawled 55,000 end-user comments on seven software apps in the Play Store. We then developed a novel sarcasm coding guideline by critically analyzing end-user reviews and identifying frequently used sarcasm types such as Irony, Humor, Flattery, Self-Deprecation, and Passive Aggression. Next, using the coding guideline and a content analysis approach, we annotated 10,000 user comments and prepared them for processing by state-of-the-art DL algorithms. We conducted a survey at two universities in Pakistan to measure how accurately participants could manually identify sarcasm in end-user reviews, and we developed a ground truth against which to compare the results of the DL algorithms. We then applied various fine-tuned DL classifiers, first to detect sarcasm in the end-user feedback and then to classify the sarcastic reviews into fine-grained sarcasm types. To do so, the end-user comments are first pre-processed and the instances in the dataset are balanced. Then, feature engineering is applied to fine-tune the DL classifiers. 
With the CNN, LSTM, BiLSTM, GRU, BiGRU, RNN, and BiRNN classifiers, respectively, we obtain average accuracies of 97%, 96%, 96%, 96%, 96%, 86%, and 90% for binary sarcasm detection and 90%, 91%, 92%, 91%, 91%, 75%, and 89% for fine-grained sarcasm-type classification. Such information would help improve the performance of sentiment analysis approaches and better capture the sentiments associated with newly identified features or issues.
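The abstract describes a two-stage pipeline: reviews are pre-processed, the dataset is balanced, a binary classifier flags sarcastic reviews, and a second classifier assigns a sarcasm type, with accuracy reported for each stage. The plain-Python sketch below illustrates only that control flow and the accuracy metric, under stated assumptions; it is not the authors' implementation. The names `preprocess`, `oversample`, `two_stage_predict` and `accuracy` are hypothetical, the cleaning rules are illustrative, random oversampling is an assumed balancing strategy (the abstract does not name one), and the keyword-based `toy_detector`/`toy_typer` functions merely stand in for the fine-tuned CNN/LSTM/GRU-family models.

```python
import random
import re
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Sarcasm types listed in the abstract ("such as", so the list may be partial).
SARCASM_TYPES = ["Irony", "Humor", "Flattery", "Self-Deprecation", "Passive Aggression"]

Sample = Tuple[str, str]  # (review text, label)


def preprocess(text: str) -> str:
    """Hypothetical cleaning: lowercase, drop URLs and non-letter characters, collapse spaces."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-z' ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def oversample(samples: Sequence[Sample], seed: int = 0) -> List[Sample]:
    """Assumed balancing strategy (random oversampling): duplicate randomly chosen
    minority-class samples until every label matches the largest class.
    Expects at least one sample."""
    rng = random.Random(seed)
    by_label: Dict[str, List[Sample]] = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(group) for group in by_label.values())
    balanced: List[Sample] = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def two_stage_predict(
    text: str,
    is_sarcastic: Callable[[str], bool],
    sarcasm_type: Callable[[str], str],
) -> Optional[str]:
    """Stage 1 decides sarcastic vs. not; only sarcastic reviews reach stage 2,
    which assigns a sarcasm type. Returns None for non-sarcastic reviews."""
    clean = preprocess(text)
    if not is_sarcastic(clean):
        return None
    return sarcasm_type(clean)


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that exactly match the ground-truth labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy stand-ins for trained stage-1 and stage-2 models (hypothetical keyword rules).
    def toy_detector(clean: str) -> bool:
        return "great" in clean or "love" in clean

    def toy_typer(clean: str) -> str:
        return SARCASM_TYPES[0] if "crash" in clean else "Flattery"

    reviews = ["Great, it crashes every time I open it!", "Works fine on my phone."]
    print([two_stage_predict(r, toy_detector, toy_typer) for r in reviews])  # ['Irony', None]

    data = [("r1", "sarcastic"), ("r2", "plain"), ("r3", "plain"), ("r4", "plain")]
    print(Counter(label for _, label in oversample(data)))  # 3 of each label

    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Replacing the toy functions with trained models keeps the same control flow. In this sketch, balancing would normally be applied to the training split only, so that duplicated reviews do not leak into the data used to measure accuracy.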
