Mining Reddit as a New Source for Software Requirements

Tahira Iqbal, Moniba Khan, K. Taveter, N. Seyff
Published in: 2021 IEEE 29th International Requirements Engineering Conference (RE), 2021-09-01
DOI: https://doi.org/10.1109/RE51729.2021.00019
Citations: 12

Abstract

Mining app stores and social media has proven to be a good way to collect user feedback that fosters requirements engineering and software evolution. Recent literature on mining software-related data from social platforms, such as Twitter and Facebook, shows that it complements app store mining. However, users also discuss and give feedback on software applications on many other platforms that have not been thoroughly researched and analysed. One such platform is Reddit. In this paper, we introduce Reddit as a new potential data source and explore whether and how requirements engineering and software evolution can benefit from user feedback obtained from Reddit. We also present an exploratory study in which we analysed the usage characteristics (i.e., frequency of posts, number of comments, and number of users for each subreddit) of Reddit posts about software applications. Furthermore, we examined the content of the posts, and the results reveal that almost 54% of posts contain useful information. Finally, we investigated the potential of automatic classification and applied machine learning algorithms to unstructured and noisy Reddit data to classify posts automatically into the categories bug report, feature related, and irrelevant. We found that the Support Vector Machine algorithm, with an F1-score of 84%, can be effective in categorizing Reddit posts. Our results show that Reddit posts provide useful feedback on software applications that can foster requirements engineering and software evolution.
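The abstract reports classifier quality as an F1-score over three categories. As a minimal sketch of how such a score can be computed, the Python below derives per-class F1 from precision and recall and averages it across the classes. The macro averaging, the function names, and the toy labels are assumptions for illustration only; the abstract does not state which averaging the authors used, and none of the data here comes from the paper.

```python
# Illustrative only: labels follow the three categories named in the abstract,
# but the example posts' labels below are invented, not taken from the study.
LABELS = ("bug report", "feature related", "irrelevant")


def per_class_f1(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    return sum(per_class_f1(y_true, y_pred, l) for l in labels) / len(labels)


# Hypothetical gold labels and classifier predictions for six posts.
y_true = ["bug report", "bug report", "feature related",
          "feature related", "irrelevant", "irrelevant"]
y_pred = ["bug report", "feature related", "feature related",
          "feature related", "irrelevant", "bug report"]

if __name__ == "__main__":
    for label in LABELS:
        print(f"{label}: {per_class_f1(y_true, y_pred, label):.3f}")
    print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")
```

Macro averaging weights each category equally, which matters when one category (for example, irrelevant posts) dominates the data; a frequency-weighted average would give a different number for the same predictions.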