A Fine-grained Sentiment Analysis of App Reviews using Large Language Models: An Evaluation Study

Faiz Ali Shah, Ahmed Sabir, Rajesh Sharma
{"title":"A Fine-grained Sentiment Analysis of App Reviews using Large Language Models: An Evaluation Study","authors":"Faiz Ali Shah, Ahmed Sabir, Rajesh Sharma","doi":"arxiv-2409.07162","DOIUrl":null,"url":null,"abstract":"Analyzing user reviews for sentiment towards app features can provide\nvaluable insights into users' perceptions of app functionality and their\nevolving needs. Given the volume of user reviews received daily, an automated\nmechanism to generate feature-level sentiment summaries of user reviews is\nneeded. Recent advances in Large Language Models (LLMs) such as ChatGPT have\nshown impressive performance on several new tasks without updating the model's\nparameters i.e. using zero or a few labeled examples. Despite these\nadvancements, LLMs' capabilities to perform feature-specific sentiment analysis\nof user reviews remain unexplored. This study compares the performance of\nstate-of-the-art LLMs, including GPT-4, ChatGPT, and LLama-2-chat variants, for\nextracting app features and associated sentiments under 0-shot, 1-shot, and\n5-shot scenarios. Results indicate the best-performing GPT-4 model outperforms\nrule-based approaches by 23.6% in f1-score with zero-shot feature extraction;\n5-shot further improving it by 6%. GPT-4 achieves a 74% f1-score for predicting\npositive sentiment towards correctly predicted app features, with 5-shot\nenhancing it by 7%. Our study suggests that LLM models are promising for\ngenerating feature-specific sentiment summaries of user reviews.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"19 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07162","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Analyzing user reviews for sentiment towards app features can provide valuable insights into users' perceptions of app functionality and their evolving needs. Given the volume of user reviews received daily, an automated mechanism to generate feature-level sentiment summaries of user reviews is needed. Recent advances in Large Language Models (LLMs) such as ChatGPT have shown impressive performance on several new tasks without updating the model's parameters i.e. using zero or a few labeled examples. Despite these advancements, LLMs' capabilities to perform feature-specific sentiment analysis of user reviews remain unexplored. This study compares the performance of state-of-the-art LLMs, including GPT-4, ChatGPT, and LLama-2-chat variants, for extracting app features and associated sentiments under 0-shot, 1-shot, and 5-shot scenarios. Results indicate the best-performing GPT-4 model outperforms rule-based approaches by 23.6% in f1-score with zero-shot feature extraction; 5-shot further improving it by 6%. GPT-4 achieves a 74% f1-score for predicting positive sentiment towards correctly predicted app features, with 5-shot enhancing it by 7%. Our study suggests that LLM models are promising for generating feature-specific sentiment summaries of user reviews.
使用大型语言模型对应用程序评论进行细粒度情感分析:评估研究
分析用户评论中对应用程序功能的情感可以为了解用户对应用程序功能的看法和不断变化的需求提供宝贵的信息。鉴于每天都会收到大量的用户评论,因此需要一种自动机制来生成用户评论的特征级情感摘要。大型语言模型(LLMs)(如 ChatGPT)的最新进展表明,在不更新模型参数的情况下,即使用零个或少量标记示例的情况下,它在一些新任务上的表现令人印象深刻。尽管取得了这些进步,但 LLMs 对用户评论进行特定特征情感分析的能力仍有待开发。本研究比较了最先进的 LLM(包括 GPT-4、ChatGPT 和 LLama-2-chat 变体)在 0 次拍摄、1 次拍摄和 5 次拍摄场景下提取应用特征和相关情感的性能。结果表明,表现最好的 GPT-4 模型在零次特征提取的 f1 分数上比基于规则的方法高出 23.6%;在 5 次特征提取的 f1 分数上进一步提高了 6%。在对正确预测的应用特征进行正面情感预测方面,GPT-4 的 f1 分数达到 74%,而 5-shot 则提高了 7%。我们的研究表明,LLM 模型有望生成针对用户评论特征的情感总结。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信