Detecting Duplicate Questions in Stack Overflow via Semantic and Relevance Approaches
Zhifang Liao, Wen-Xiong Li, Yan Zhang, Song Yu
2021 28th Asia-Pacific Software Engineering Conference (APSEC), December 2021
DOI: 10.1109/APSEC53868.2021.00019
Stack Overflow is a popular online Q&A website for programming. Although Stack Overflow provides detailed guidance on how to ask questions, duplicate questions still appear frequently, and their large number degrades the quality of the community. To address this problem, Stack Overflow allows users with high reputation to mark duplicate questions manually. However, this process is inefficient, and many duplicate questions remain undiscovered. This paper therefore proposes a duplicate question detection model based on semantics and relevance. The model uses a Siamese BiLSTM to encode question pairs and captures semantic interaction information from the titles and bodies through soft alignment attention and inference composition. A soft term matching component captures relevance information in the titles. We evaluate the model on six question groups from Stack Overflow. Compared with the latest deep learning model, our model improves F1-score by 9.401% and accuracy (ACC) by 8.901%. Experimental results show that our model outperforms the baselines and achieves competitive performance.
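The abstract names the matching components without giving their equations. As a hedged illustration only, the sketch below assumes that soft alignment attention and the enhancement step before inference composition follow the ESIM formulation (Chen et al., 2017), and that soft term matching resembles K-NRM-style RBF kernel pooling over cosine similarities; the paper's actual formulation may differ. Encoded token vectors are supplied directly in place of Siamese BiLSTM outputs. The helper names (`soft_align`, `enhance`, `soft_term_match`) and the kernel centres and width are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Vector, vectors: Matrix) -> Vector:
    # Attention-weighted combination of a sequence of vectors.
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]


def soft_align(a: Matrix, b: Matrix) -> Tuple[Matrix, Matrix]:
    """ESIM-style soft alignment between encoded sequences a and b (assumed formulation)."""
    # Alignment scores e[i][j] = a_i . b_j
    e = [[dot(ai, bj) for bj in b] for ai in a]
    # Each a_i attends over all of b (softmax across row i).
    a_tilde = [weighted_sum(softmax(row), b) for row in e]
    # Each b_j attends over all of a (softmax down column j).
    b_tilde = [
        weighted_sum(softmax([e[i][j] for i in range(len(a))]), a)
        for j in range(len(b))
    ]
    return a_tilde, b_tilde


def enhance(x: Matrix, x_tilde: Matrix) -> Matrix:
    """Per-position concatenation [x; x~; x - x~; x * x~] that precedes composition."""
    return [
        xi + ti + [p - q for p, q in zip(xi, ti)] + [p * q for p, q in zip(xi, ti)]
        for xi, ti in zip(x, x_tilde)
    ]


def cosine(u: Vector, v: Vector) -> float:
    nu = math.sqrt(dot(u, u))
    nv = math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def soft_term_match(
    title1: Matrix,
    title2: Matrix,
    mus: Sequence[float] = (1.0, 0.5, 0.0, -0.5),  # hypothetical kernel centres
    sigma: float = 0.1,  # hypothetical kernel width
) -> Vector:
    """K-NRM-style RBF kernel pooling over title-term cosine similarities (assumed)."""
    features = []
    for mu in mus:
        total = 0.0
        for u in title1:
            # Soft count of title2 terms whose similarity to u lies near mu.
            k = sum(math.exp(-((cosine(u, v) - mu) ** 2) / (2 * sigma**2)) for v in title2)
            total += math.log(max(k, 1e-10))
        features.append(total)
    return features


if __name__ == "__main__":
    a = [[1.0, 0.0]]              # one encoded token from question 1
    b = [[1.0, 0.0], [0.0, 1.0]]  # two encoded tokens from question 2
    a_tilde, b_tilde = soft_align(a, b)
    m_a = enhance(a, a_tilde)
    print(len(m_a[0]))            # 8, i.e. 4 x the hidden size
    print(soft_term_match(b, b))
```

Under these assumptions, the enhanced vectors from `enhance` would be passed to a second BiLSTM for inference composition and then pooled, and the `soft_term_match` features would presumably be combined with them before classification. Those trained components are omitted, so the sketch covers only the parameter-free matching steps.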