Authors: Liting Wang, Li Zhang, Jing Jiang
Venue: 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
Publication date: 2019-12-01
DOI: https://doi.org/10.1109/APSEC48747.2019.00074
Detecting Duplicate Questions in Stack Overflow via Deep Learning Approaches
Stack Overflow is a popular question-and-answer website for software programming. Different users often ask the same question in different ways, which results in a large number of duplicate questions on Stack Overflow. Typically, high-reputation users analyze and mark duplicate questions manually, which is time-consuming and inefficient. An automatic duplicate question detection approach is therefore needed. We first investigate the application of deep learning models to software engineering tasks. Then, three deep learning models (i.e., CNN, RNN and LSTM) are applied to determine whether they are effective for the duplicate question detection task in Stack Overflow. In this paper, we explore three deep learning approaches, DQ-CNN, DQ-RNN and DQ-LSTM, based on CNN, RNN and LSTM respectively, to detect duplicate questions. The effectiveness of DQ-CNN, DQ-RNN and DQ-LSTM is evaluated on six different question groups. The experimental results show that DQ-LSTM outperforms DupPredictor, Dupe, DupePredictorRep-T and DupeRep in terms of recall-rate@5, recall-rate@10 and recall-rate@20, except for the Ruby question group.
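The abstract reports results as recall-rate@k but does not define the metric. The sketch below assumes the definition commonly used in duplicate-detection work: the fraction of duplicate questions for which at least one true master question appears among the top k ranked candidates. The function name, argument names, and data layout are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def recall_rate_at_k(
    ranked: Dict[str, List[str]],
    masters: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of duplicate questions whose master is in the top-k candidates.

    ranked  maps a duplicate question id to its candidate ids, best first.
    masters maps a duplicate question id to the ids of its true master questions.
    Both layouts are assumptions made for illustration.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not masters:
        return 0.0

    hits = 0
    for question_id, true_masters in masters.items():
        # A question with no ranking counts as a miss.
        top_k = ranked.get(question_id, [])[:k]
        if true_masters.intersection(top_k):
            hits += 1
    return hits / len(masters)


# Toy data: two duplicate questions, each with three ranked candidates.
example_ranked = {"q1": ["a", "b", "c"], "q2": ["x", "y", "z"]}
example_masters = {"q1": {"b"}, "q2": {"z"}}

recall_at_1 = recall_rate_at_k(example_ranked, example_masters, 1)  # 0.0
recall_at_2 = recall_rate_at_k(example_ranked, example_masters, 2)  # 0.5
recall_at_3 = recall_rate_at_k(example_ranked, example_masters, 3)  # 1.0
```

Under this definition the score can only stay the same or rise as k grows, which is why results are usually reported at several cutoffs such as 5, 10 and 20.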