Saikat Mondal, C. Saifullah, Avijit Bhattacharjee, M. M. Rahman, C. Roy
Title: Early Detection and Guidelines to Improve Unanswered Questions on Stack Overflow
DOI: 10.1145/3452383.3452392 (https://doi.org/10.1145/3452383.3452392)
Venue: 14th Innovations in Software Engineering Conference (formerly known as India Software Engineering Conference)
Published: 2021-02-25
Citations: 14
Abstract
Stack Overflow is one of the largest and most popular question-answering (Q&A) websites. It has accumulated millions of programming-related questions and answers that support developers in software development. Unfortunately, a large number of questions are never answered, which might hurt the quality or purpose of this community-oriented knowledge base: up to 29% of Stack Overflow questions have no answers. Existing attempts to detect unanswered questions rely primarily on question attributes (e.g., score, view count) that are not available when a question is submitted. Detecting potentially unanswered questions at submission time could help users improve their questions and thus receive answers in time. In this paper, we compare unanswered and answered questions quantitatively and qualitatively by analyzing a total of 4.8 million questions from Stack Overflow. We find that the topics discussed in a question, the experience of the question submitter, and the readability of the question text can often determine whether a question will be answered. Our qualitative study also reveals several other non-trivial factors that not only (partially) explain why questions remain unanswered but also guide novice users in improving their questions. We develop four machine learning models to predict unanswered questions at submission time. According to our experiments, the models predict unanswered questions with a maximum accuracy of about 79% and significantly outperform the state-of-the-art prediction models.
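To make the idea of submission-time prediction concrete, the following is a minimal sketch of feature extraction that uses only information available when a question is posted. It is not the authors' feature set or implementation: the `Submission` schema, the feature names, and the use of average sentence length as a crude readability proxy are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Submission:
    """A question as it exists at submission time (hypothetical schema)."""
    title: str
    body: str
    tags: list
    asker_reputation: int  # stand-in for the submitter's experience


def submission_time_features(q: Submission) -> dict:
    """Compute illustrative features that need no post-submission data
    such as score or view count."""
    words = re.findall(r"[A-Za-z']+", q.body)
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", q.body) if s.strip()]
    return {
        "title_words": len(q.title.split()),
        "body_words": len(words),
        # Crude readability proxy (an assumption): average words per sentence.
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "tag_count": len(q.tags),
        "asker_reputation": q.asker_reputation,
    }
```

A feature dictionary like this could be fed to any standard classifier. The point of the sketch is only that every input is known before the question receives votes, views, or answers.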