Unsupervised Topic Discovery in User Comments

Christoph Stanik, Tim Pietz, W. Maalej
Published in: 2021 IEEE 29th International Requirements Engineering Conference (RE), 2021-08-19
DOI: 10.1109/RE51729.2021.00021 (https://doi.org/10.1109/RE51729.2021.00021)
Citations: 9

Abstract

On social media platforms like Twitter, users regularly share their opinions and comments with software vendors and service providers. Popular software products might get thousands of user comments per day. Research has shown that such comments contain valuable information for stakeholders, such as feature ideas, problem reports, or support inquiries. However, it is hard to manually manage and grasp a large number of user comments, which can be redundant and of varying quality. Consequently, researchers have suggested automated approaches to extract valuable comments, e.g., through problem report classifiers. However, these approaches do not aggregate semantically similar comments into specific aspects, so they cannot provide insights such as how often users reported a certain problem.

We introduce an approach for automatically discovering topics composed of semantically similar user comments, based on deep bidirectional natural language processing algorithms. Stakeholders can use our approach without the need to configure critical parameters like the number of clusters. We present our approach and report on a rigorous multiple-step empirical evaluation to assess how cohesive and meaningful the resulting clusters are. Each evaluation step was peer-coded and resulted in inter-coder agreements of up to 98%, giving us high confidence in the approach. We also report a thematic analysis of the topics discovered from tweets in the telecommunication domain.
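To make the idea of grouping comments without fixing the number of clusters concrete, here is a minimal sketch. It is not the authors' method: the abstract does not describe the algorithm, so this example makes two labeled assumptions. It swaps the deep bidirectional language model for a toy bag-of-words vector, and it groups comments by linking any pair whose cosine similarity reaches a threshold. The names `embed`, `cosine`, `discover_topics`, and the `threshold` parameter are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector.

    Assumption: a real pipeline would use a pretrained language model here.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def discover_topics(comments, threshold=0.5):
    """Group comments linked by similarity >= threshold (single linkage).

    The number of groups is not an input; it follows from the data.
    """
    vectors = [embed(c) for c in comments]
    parent = list(range(len(comments)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(comments)):
        for j in range(i + 1, len(comments)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    topics = {}
    for i, comment in enumerate(comments):
        topics.setdefault(find(i), []).append(comment)
    # Largest topics first, so frequently reported issues surface at the top.
    return sorted(topics.values(), key=len, reverse=True)
```

Calling `discover_topics` on a list of comment strings returns a list of groups, and the size of each group gives a rough count of how often users raised that issue. The threshold still has to be chosen, so this sketch trades one parameter (the number of clusters) for another; it only illustrates that the cluster count can emerge from the data.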