Suthee Chaidaroon, Xiao Zhang, Shruti Subramaniyam, Jeffrey Svajlenko, Tanya Shourya, I. Keivanloo, Ria Joy
{"title":"用神经生成增强改进编程问答","authors":"Suthee Chaidaroon, Xiao Zhang, Shruti Subramaniyam, Jeffrey Svajlenko, Tanya Shourya, I. Keivanloo, Ria Joy","doi":"10.1145/3539618.3591860","DOIUrl":null,"url":null,"abstract":"Knowledge-intensive programming Q&A is an active research area in industry. Its application boosts developer productivity by aiding developers in quickly finding programming answers from the vast amount of information on the Internet. In this study, we propose ProQANS and its variants ReProQANS and ReAugProQANS to tackle programming Q&A. ProQANS is a neural search approach that leverages unlabeled data on the Internet (such as StackOverflow) to mitigate the cold-start problem. ReProQANS extends ProQANS by utilizing reformulated queries with a novel triplet loss. We further use an auxiliary generative model to augment the training queries, and design a novel dual triplet loss function to adapt these generated queries, to build another variant of ReProQANS termed as ReAugProQANS. In our empirical experiments, we show ReProQANS has the best performance when evaluated on the in-domain test set, while ReAugProQANS has the superior performance on the out-of-domain real programming questions, by outperforming the state-of-the-art model by up to 477% lift on the MRR metric respectively. The results suggest their robustness to previously unseen questions and its wide application to real programming questions.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Improving Programming Q&A with Neural Generative Augmentation\",\"authors\":\"Suthee Chaidaroon, Xiao Zhang, Shruti Subramaniyam, Jeffrey Svajlenko, Tanya Shourya, I. Keivanloo, Ria Joy\",\"doi\":\"10.1145/3539618.3591860\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Knowledge-intensive programming Q&A is an active research area in industry. Its application boosts developer productivity by aiding developers in quickly finding programming answers from the vast amount of information on the Internet. In this study, we propose ProQANS and its variants ReProQANS and ReAugProQANS to tackle programming Q&A. ProQANS is a neural search approach that leverages unlabeled data on the Internet (such as StackOverflow) to mitigate the cold-start problem. ReProQANS extends ProQANS by utilizing reformulated queries with a novel triplet loss. We further use an auxiliary generative model to augment the training queries, and design a novel dual triplet loss function to adapt these generated queries, to build another variant of ReProQANS termed as ReAugProQANS. In our empirical experiments, we show ReProQANS has the best performance when evaluated on the in-domain test set, while ReAugProQANS has the superior performance on the out-of-domain real programming questions, by outperforming the state-of-the-art model by up to 477% lift on the MRR metric respectively. The results suggest their robustness to previously unseen questions and its wide application to real programming questions.\",\"PeriodicalId\":425056,\"journal\":{\"name\":\"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3539618.3591860\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3539618.3591860","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Improving Programming Q&A with Neural Generative Augmentation
Knowledge-intensive programming Q&A is an active research area in industry. Its application boosts developer productivity by aiding developers in quickly finding programming answers from the vast amount of information on the Internet. In this study, we propose ProQANS and its variants ReProQANS and ReAugProQANS to tackle programming Q&A. ProQANS is a neural search approach that leverages unlabeled data on the Internet (such as StackOverflow) to mitigate the cold-start problem. ReProQANS extends ProQANS by utilizing reformulated queries with a novel triplet loss. We further use an auxiliary generative model to augment the training queries, and design a novel dual triplet loss function to adapt these generated queries, to build another variant of ReProQANS termed as ReAugProQANS. In our empirical experiments, we show ReProQANS has the best performance when evaluated on the in-domain test set, while ReAugProQANS has the superior performance on the out-of-domain real programming questions, by outperforming the state-of-the-art model by up to 477% lift on the MRR metric respectively. The results suggest their robustness to previously unseen questions and its wide application to real programming questions.