A Fine-grained Sentiment Analysis of App Reviews using Large Language Models: An Evaluation Study

Faiz Ali Shah, Ahmed Sabir, Rajesh Sharma

arXiv - CS - Software Engineering · Published 2024-09-11 · DOI: arxiv-2409.07162 (https://doi.org/arxiv-2409.07162)
Citations: 0
Abstract
Analyzing user reviews for sentiment towards app features can provide valuable insights into users' perceptions of app functionality and their evolving needs. Given the volume of user reviews received daily, an automated mechanism to generate feature-level sentiment summaries of user reviews is needed. Recent advances in Large Language Models (LLMs) such as ChatGPT have shown impressive performance on several new tasks without updating the model's parameters, i.e., using zero or only a few labeled examples. Despite these advancements, LLMs' capability to perform feature-specific sentiment analysis of user reviews remains unexplored. This study compares the performance of state-of-the-art LLMs, including GPT-4, ChatGPT, and LLama-2-chat variants, for extracting app features and their associated sentiments under 0-shot, 1-shot, and 5-shot scenarios. Results indicate that the best-performing GPT-4 model outperforms rule-based approaches by 23.6% in F1-score for zero-shot feature extraction, with 5-shot prompting improving it by a further 6%. GPT-4 achieves a 74% F1-score for predicting positive sentiment towards correctly predicted app features, with 5-shot prompting improving it by 7%. Our study suggests that LLMs are promising for generating feature-specific sentiment summaries of user reviews.
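The zero-shot setup described in the abstract can be pictured as a prompt-and-parse pipeline: a template asks the model for "feature: sentiment" pairs, and a small helper parses the response. This is a minimal sketch under assumed conventions; the prompt wording, output format, and `parse_pairs` helper are illustrative, not the authors' exact prompts, and the model call itself is stubbed with a hypothetical response.

```python
# Sketch of zero-shot feature-specific sentiment analysis of an app review.
# The prompt wording and line-based output format are assumptions; the
# paper's actual prompts may differ.

PROMPT_TEMPLATE = (
    "Extract the app features mentioned in the review below and the "
    "sentiment (positive, negative, or neutral) expressed towards each.\n"
    "Answer with one 'feature: sentiment' pair per line.\n\n"
    "Review: {review}"
)

def build_prompt(review: str) -> str:
    """Fill the zero-shot template with a single user review."""
    return PROMPT_TEMPLATE.format(review=review)

def parse_pairs(llm_output: str) -> dict:
    """Parse 'feature: sentiment' lines from a model response."""
    pairs = {}
    for line in llm_output.splitlines():
        if ":" in line:
            feature, _, sentiment = line.partition(":")
            pairs[feature.strip().lower()] = sentiment.strip().lower()
    return pairs

# Hypothetical model response for illustration (no API call is made here):
response = "Dark mode: positive\nBattery usage: negative"
print(parse_pairs(response))
# {'dark mode': 'positive', 'battery usage': 'negative'}
```

A 1-shot or 5-shot variant would simply prepend one or five labeled review/answer pairs to the same template before the target review.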