Topic Modeling for Mining Opinion Aspects from a Customer Feedback Corpus

Impact factor: 0.5; JCR quartile: Q4 (Computer Science, Information Systems)
O. I. Babina
Journal: Automatic Documentation and Mathematical Linguistics
Publication type: Journal Article
Published: 2024-04-02
DOI: 10.3103/S0005105524010060
URL: https://link.springer.com/article/10.3103/S0005105524010060

Abstract

The paper introduces a methodology for extracting opinion aspects from textual content by identifying the customer-evaluated parameters regarding a given object. These parameters form the foundation for shaping the customer’s attitudes toward the product or service. The proposed approach leverages topic modeling tools to delineate classes of vocabulary exhibiting semantics aligned with the parameters influencing the customer’s opinion about the object. Our study specifically explores the application of the BERTopic model as a topic modeling tool to address this challenge. The outlined methodology encompasses several sequential steps, including the preprocessing of textual data involving the removal of stopwords, conversion to lowercase characters, and lemmatization. Additionally, special consideration is given to the distinct lexical manifestations of opinion aspects, obtained as a result of the extraction of nominal, verbal, and adjectival single- and multicomponent phrases from the corpus. Subsequently, the corpus sentences are represented as vectors in a feature space expressed by the extracted words and phrases. The final step involves the application of topic modeling using the BERTopic model on the customer review corpus, utilizing the vector representations of corpus sentences. The experimental inquiry is conducted on a domain-specific Russian-language corpus comprising customer feedback on airline services gathered from customer review websites. The resultant topic distribution is then juxtaposed against a manually constructed conceptual model of the domain. The comparative analysis reveals that the automatic topic distribution aligns with the conceptual structure of the domain, demonstrating a precision of 0.955 and a recall of 0.875. These findings affirm the efficacy of employing the BERTopic model to address the problem of the corpus-based mining of opinion aspects.
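
The preprocessing, vectorization, and evaluation steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the author's implementation: the stopword list, the lemma lookup table, the feature vocabulary, and the English toy sentences are all hypothetical stand-ins (the paper works on a Russian-language corpus), and the BERTopic clustering step itself is not reproduced here. The set-overlap definition of precision and recall is likewise an assumption, since the abstract does not state how the two figures were computed.

```python
import re
from collections import Counter

# Hypothetical toy resources: the abstract does not give the actual stopword
# list, lemmatizer, or inventory of extracted words and phrases.
STOPWORDS = {"the", "was", "were", "and", "a", "but"}
LEMMAS = {"seats": "seat", "delayed": "delay", "flights": "flight", "meals": "meal"}
# Feature space: single words plus multicomponent phrases.
VOCABULARY = ["seat", "flight", "delay", "meal", "cabin crew", "leg room"]


def preprocess(sentence):
    """Lowercase, tokenize, drop stopwords, and lemmatize via a lookup table."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOPWORDS]


def vectorize(tokens):
    """Represent a preprocessed sentence as counts over VOCABULARY."""
    text = " ".join(tokens)
    counts = Counter(tokens)
    vector = []
    for feature in VOCABULARY:
        if " " in feature:
            # Multicomponent phrase: count whole-phrase matches in the text.
            pattern = r"\b" + re.escape(feature) + r"\b"
            vector.append(len(re.findall(pattern, text)))
        else:
            vector.append(counts[feature])
    return vector


def precision_recall(predicted, reference):
    """Set-overlap precision and recall (assumed definition, see lead-in)."""
    matched = len(predicted & reference)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    tokens = preprocess("The flights were delayed and the seats had no leg room")
    print(tokens)
    print(vectorize(tokens))
    # Hypothetical aspect labels: automatic topics vs. a manual domain model.
    print(precision_recall({"seat", "meal", "price"},
                           {"seat", "meal", "crew", "delay"}))
```

In a fuller pipeline, sentence vectors of this kind (or a vectorizer restricted to the extracted words and phrases) would be handed to the topic model, and the resulting topics compared against the manually built domain model.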


Source journal: Automatic Documentation and Mathematical Linguistics (Computer Science, Information Systems)
Self-citation rate: 40.00%
Articles published: 18
About the journal: Automatic Documentation and Mathematical Linguistics is an international peer-reviewed journal that covers all aspects of the automation of information processes and systems, as well as algorithms and methods for automatic language analysis. Emphasis is on the practical applications of new technologies and techniques for information analysis and processing.