LinkSO: a dataset for learning to retrieve similar question answer pairs on software development forums

Xueqing Liu, Chi Wang, Yue Leng, ChengXiang Zhai
Proceedings of the 4th ACM SIGSOFT International Workshop on NLP for Software Engineering
Published: 2018-11-04
DOI: 10.1145/3283812.3283815 (https://doi.org/10.1145/3283812.3283815)
Citations: 11

Abstract

We present LinkSO, a dataset for learning to rank similar questions on Stack Overflow. Stack Overflow contains a large number of high-quality, crowd-sourced question links, which provide a great opportunity both for evaluating retrieval algorithms on community-based question answering (cQA) archives and for learning to rank such archives. However, because some links are missing, it is unclear whether question links can be used directly as relevance judgments for evaluation. We study this question by measuring how closely question links match relevance judgments, and we find that their agreement rates range from 80% to 88%. We also conduct an empirical study of how existing work performs on LinkSO. While existing work focuses on non-learning approaches, our results reveal that learning-based approaches have great potential to further improve retrieval performance.
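
The abstract does not say how the agreement rate is computed. One simple reading, offered here only as an illustrative sketch, is the fraction of question pairs on which the link-derived label and a separate relevance judgment coincide. The function name `agreement_rate`, the pair representation, and the toy data below are all assumptions and are not taken from LinkSO or the paper.

```python
from typing import Dict, Tuple

# (query question id, candidate question id); a hypothetical representation.
Pair = Tuple[int, int]


def agreement_rate(link_labels: Dict[Pair, bool],
                   judged_labels: Dict[Pair, bool]) -> float:
    """Fraction of pairs labeled by both sources on which the labels agree."""
    shared = link_labels.keys() & judged_labels.keys()
    if not shared:
        raise ValueError("no pairs are labeled by both sources")
    matches = sum(link_labels[p] == judged_labels[p] for p in shared)
    return matches / len(shared)


# Toy data for illustration only (not real LinkSO labels).
# True means "linked" in `links` and "judged relevant" in `judged`.
links = {(1, 2): True, (1, 3): False, (4, 5): True, (4, 6): False, (7, 8): False}
judged = {(1, 2): True, (1, 3): False, (4, 5): True, (4, 6): True, (7, 8): False}

rate = agreement_rate(links, judged)
print(f"agreement rate: {rate:.0%}")
```

In this toy example the pair (4, 6) is judged relevant but has no link, which is the kind of missing link the abstract is concerned with; the other four pairs agree.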