SOCluster - Towards Answering Unanswered Questions on Stack Overflow via Answered Questions

Abhishek Kumar, Deep Ghadiyali, S. Chimalakonda, Akhila Sri Manasa Venigalla
Proceedings of the 16th Innovations in Software Engineering Conference, published 2023-02-23
DOI: 10.1145/3578527.3578544 (https://doi.org/10.1145/3578527.3578544)
Citations: 1

Abstract

The Stack Overflow (SO) platform hosts a huge dataset of questions and answers driven by interactions between users. However, the number of unanswered questions is continuously rising, a trend also observed on other community Question & Answering (Q&A) platforms such as Yahoo and Quora. To address this issue, these communities have explored clustering mechanisms that answer unanswered questions using answered questions in the same cluster, which could also improve the response time for new questions. To this end, we propose SOCluster, an approach and a tool that groups SO questions using graph-based clustering. We selected four datasets of 10k, 20k, 30k, and 40k SO questions that contain no code snippets or images, and performed clustering on each of them. As a preliminary evaluation of our tool, we analyzed the resulting clusters with three commonly used metrics: the Silhouette coefficient, the Calinski-Harabasz Index, and the Davies-Bouldin Index. We performed clustering at 8 different similarity threshold values and analyzed the trends that the three metrics reveal in the output clusters. A 90% similarity threshold yields the best improvement in all three metrics on all four datasets. We further assessed the cluster contents manually to confirm that the elements within each cluster are similar. This revealed clusters corresponding to topics such as mouse-over effects, speed optimisation, and how to perform a particular action in JavaScript. The source code and tool are available on GitHub at https://github.com/rishalab/SOCluster, and a demo is available at https://youtu.be/Ewm-M_rg_x8.
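The abstract does not specify SOCluster's text representation, similarity measure, or graph-clustering algorithm. The Python sketch below assumes one plausible reading of "graph-based clustering at a threshold similarity": each question is a node, two nodes are linked when the cosine similarity of their bag-of-words vectors is at or above the threshold, and clusters are the connected components of that graph. The mean Silhouette coefficient is then computed with cosine distance. The function names `threshold_clusters` and `silhouette`, the tokenizer, and the sample questions are hypothetical illustrations, not the tool's actual code.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> Counter:
    """Lower-cased bag-of-words term counts (assumed text representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def threshold_clusters(questions: list[str], threshold: float) -> list[int]:
    """Hypothetical graph clustering: link two questions when their similarity
    is >= threshold, then return one cluster id per question, where clusters
    are the connected components of the resulting graph."""
    vectors = [tokenize(q) for q in questions]
    parent = list(range(len(questions)))  # union-find forest over question nodes

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(questions)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)  # add edge i-j by merging components

    # Relabel component roots as consecutive ids 0, 1, 2, ... in input order.
    roots: dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(questions))]


def silhouette(questions: list[str], labels: list[int]) -> float:
    """Mean silhouette coefficient using cosine distance (1 - similarity)."""
    if len(set(labels)) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")
    vectors = [tokenize(q) for q in questions]
    n = len(vectors)
    dist = [[1.0 - cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    scores = []
    for i in range(n):
        same = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == other) / labels.count(other)
            for other in set(labels)
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


if __name__ == "__main__":
    questions = [
        "How to add a mouse over effect in JavaScript",
        "JavaScript mouse over effect on image",
        "How to speed up a slow Python loop",
        "Speed up slow Python loop",
    ]
    labels = threshold_clusters(questions, 0.5)
    print(labels)  # → [0, 0, 1, 1]
    print(round(silhouette(questions, labels), 3))  # → 0.632
```

In this sketch, raising the threshold only removes edges, so clusters split into smaller, tighter groups: at 0.3 the four sample questions merge into one cluster, while at 0.9 every question becomes a singleton. This is one reason a threshold sweep, like the 8 values evaluated in the paper, is informative. For the other two metrics, scikit-learn provides `sklearn.metrics.calinski_harabasz_score` and `sklearn.metrics.davies_bouldin_score`, which expect a dense feature matrix and the cluster labels.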