Improving Programming Q&A with Neural Generative Augmentation

Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. Pub Date: 2023-07-18. DOI: 10.1145/3539618.3591860

Suthee Chaidaroon, Xiao Zhang, Shruti Subramaniyam, Jeffrey Svajlenko, Tanya Shourya, I. Keivanloo, Ria Joy


Citations: 0

Abstract


Improving Programming Q&A with Neural Generative Augmentation

Knowledge-intensive programming Q&A is an active research area in industry. It boosts developer productivity by helping developers quickly find programming answers among the vast amount of information on the Internet. In this study, we propose ProQANS and its variants ReProQANS and ReAugProQANS to tackle programming Q&A. ProQANS is a neural search approach that leverages unlabeled data on the Internet (such as StackOverflow) to mitigate the cold-start problem. ReProQANS extends ProQANS by utilizing reformulated queries with a novel triplet loss. We further use an auxiliary generative model to augment the training queries and design a novel dual triplet loss function to accommodate these generated queries, yielding another variant of ReProQANS termed ReAugProQANS. In our empirical experiments, we show that ReProQANS performs best on the in-domain test set, while ReAugProQANS performs best on out-of-domain real programming questions, outperforming the state-of-the-art model by up to a 477% lift on the MRR metric. The results suggest that the models are robust to previously unseen questions and widely applicable to real programming questions.
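The abstract names two ingredients without defining them: a triplet loss and the MRR (mean reciprocal rank) metric. The following is a minimal Python sketch of the standard textbook versions of both, plus the arithmetic behind a relative "lift". It is not the paper's novel triplet loss or dual triplet loss, whose exact form the abstract does not give. The function names, the cosine distance, the margin value, and all numbers are illustrative assumptions.

```python
import math
from typing import Optional, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two non-zero dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_margin_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.5,  # assumed value, not taken from the paper
) -> float:
    """Standard triplet loss: max(0, d(anchor, pos) - d(anchor, neg) + margin).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`.
    """
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(0.0, gap + margin)


def mean_reciprocal_rank(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """MRR over a set of queries.

    Each entry is the 1-based rank of the first relevant result for one
    query, or None when no relevant result was retrieved (contributes 0).
    """
    total = sum(1.0 / rank for rank in first_relevant_ranks if rank is not None)
    return total / len(first_relevant_ranks)


def relative_lift_percent(new_score: float, baseline_score: float) -> float:
    """Relative improvement of new_score over baseline_score, in percent."""
    return (new_score - baseline_score) / baseline_score * 100.0


if __name__ == "__main__":
    query = [1.0, 0.0]
    good_answer = [1.0, 0.0]
    bad_answer = [0.0, 1.0]
    # A well-separated triplet incurs no loss; a swapped one is penalized.
    print(triplet_margin_loss(query, good_answer, bad_answer))
    print(triplet_margin_loss(query, bad_answer, good_answer))
    # Four hypothetical queries; the third retrieved nothing relevant.
    print(mean_reciprocal_rank([1, 2, None, 4]))  # 0.4375
```

Read this way, a 477% lift means the new MRR is about 5.77 times the baseline MRR: a hypothetical baseline of 0.10 would correspond to roughly 0.577.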
