Abhishek Kumar, Deep Ghadiyali, S. Chimalakonda, Akhila Sri Manasa Venigalla
Proceedings of the 16th Innovations in Software Engineering Conference, published 2023-02-23. DOI: 10.1145/3578527.3578544 (https://doi.org/10.1145/3578527.3578544)
SOCluster - Towards Answering Unanswered Questions on Stack Overflow via Answered Questions
Stack Overflow (SO) hosts a huge dataset of questions and answers driven by interactions between users. However, the number of unanswered questions is continuously rising, a trend also observed on similar community Question & Answering (Q&A) platforms such as Yahoo and Quora. To address this issue, these communities have explored clustering mechanisms that answer unanswered questions using answered questions in the same cluster, which could also improve the response time for new questions. In this context, we propose SOCluster, an approach and a tool that clusters SO questions using a graph-based clustering approach. We selected four datasets of 10k, 20k, 30k and 40k SO questions that contain no code snippets or images, and performed clustering on them. We carried out a preliminary evaluation of the tool by analyzing the resulting clusters with three commonly used metrics: the Silhouette coefficient, the Calinski-Harabasz Index and the Davies-Bouldin Index. We performed clustering for 8 different threshold similarity values and analyzed the trends reflected by the output clusters through the three evaluation metrics. At a 90% similarity threshold, SOCluster shows the best improvement in the three evaluation metrics on all four datasets. We further manually assessed the content of the clusters to confirm the similarity of their elements. This revealed clusters corresponding to topics such as mouse-over effects, speed optimisation, and how to perform ‘some’ action in JavaScript. The source code and tool are available for download on GitHub at https://github.com/rishalab/SOCluster, and a demo can be found at https://youtu.be/Ewm-M_rg_x8.
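To make the idea of threshold-based graph clustering concrete, the following is a minimal sketch and not the SOCluster implementation. The abstract does not state which similarity measure or graph algorithm the tool uses, so the TF-IDF cosine similarity, the connected-components grouping, the function names and the toy questions below are all assumptions chosen for illustration. Each question is a node, two nodes are linked when their similarity reaches the threshold, and each connected component is reported as a cluster.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a question and split it into alphanumeric tokens."""
    return "".join(c if c.isalnum() else " " for c in text.lower()).split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed inverse document frequency (an assumed weighting scheme).
        vectors.append({
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_by_threshold(docs, threshold):
    """Link documents whose similarity is >= threshold and return the
    connected components of that graph as sorted lists of indices."""
    vectors = tfidf_vectors(docs)
    parent = list(range(len(docs)))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i in range(len(docs)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())


# Hypothetical toy questions, not taken from the paper's datasets.
questions = [
    "How to add mouse over effect in JavaScript",
    "How to add mouse over effect in JavaScript button",
    "Speed optimisation for SQL query",
    "Speed optimisation for SQL query index",
]

print(cluster_by_threshold(questions, 0.8))  # [[0, 1], [2, 3]]
print(cluster_by_threshold(questions, 0.9))  # [[0, 1], [2], [3]]
```

On this toy input, raising the threshold from 0.8 to 0.9 splits the second pair into singletons, which illustrates why the choice of threshold shapes the resulting clusters. The pairwise loop is quadratic in the number of questions, so a dataset of tens of thousands of questions would likely need a sparse or approximate similarity search instead.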