Reproducibility in NLP: What Have We Learned from the Checklist?

Ian H. Magnusson, Noah A. Smith, Jesse Dodge
{"title":"NLP中的再现性:我们从清单中学到了什么?","authors":"Ian H. Magnusson, Noah A. Smith, Jesse Dodge","doi":"10.48550/arXiv.2306.09562","DOIUrl":null,"url":null,"abstract":"Scientific progress in NLP rests on the reproducibility of researchers' claims. The *CL conferences created the NLP Reproducibility Checklist in 2020 to be completed by authors at submission to remind them of key information to include. We provide the first analysis of the Checklist by examining 10,405 anonymous responses to it. First, we find evidence of an increase in reporting of information on efficiency, validation performance, summary statistics, and hyperparameters after the Checklist's introduction. Further, we show acceptance rate grows for submissions with more Yes responses. We find that the 44% of submissions that gather new data are 5% less likely to be accepted than those that did not; the average reviewer-rated reproducibility of these submissions is also 2% lower relative to the rest. We find that only 46% of submissions claim to open-source their code, though submissions that do have 8% higher reproducibility score relative to those that do not, the most for any item. We discuss what can be inferred about the state of reproducibility in NLP, and provide a set of recommendations for future conferences, including: a) allowing submitting code and appendices one week after the deadline, and b) measuring dataset reproducibility by a checklist of data collection practices.","PeriodicalId":352845,"journal":{"name":"Annual Meeting of the Association for Computational Linguistics","volume":"41 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Reproducibility in NLP: What Have We Learned from the Checklist?\",\"authors\":\"Ian H. Magnusson, Noah A. Smith, Jesse Dodge\",\"doi\":\"10.48550/arXiv.2306.09562\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Scientific progress in NLP rests on the reproducibility of researchers' claims. The *CL conferences created the NLP Reproducibility Checklist in 2020 to be completed by authors at submission to remind them of key information to include. We provide the first analysis of the Checklist by examining 10,405 anonymous responses to it. First, we find evidence of an increase in reporting of information on efficiency, validation performance, summary statistics, and hyperparameters after the Checklist's introduction. Further, we show acceptance rate grows for submissions with more Yes responses. We find that the 44% of submissions that gather new data are 5% less likely to be accepted than those that did not; the average reviewer-rated reproducibility of these submissions is also 2% lower relative to the rest. We find that only 46% of submissions claim to open-source their code, though submissions that do have 8% higher reproducibility score relative to those that do not, the most for any item. 
We discuss what can be inferred about the state of reproducibility in NLP, and provide a set of recommendations for future conferences, including: a) allowing submitting code and appendices one week after the deadline, and b) measuring dataset reproducibility by a checklist of data collection practices.\",\"PeriodicalId\":352845,\"journal\":{\"name\":\"Annual Meeting of the Association for Computational Linguistics\",\"volume\":\"41 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Annual Meeting of the Association for Computational Linguistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2306.09562\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annual Meeting of the Association for Computational Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2306.09562","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

Scientific progress in NLP rests on the reproducibility of researchers' claims. The *CL conferences created the NLP Reproducibility Checklist in 2020, to be completed by authors at submission time to remind them of key information to include. We provide the first analysis of the Checklist by examining 10,405 anonymous responses to it. First, we find evidence of an increase in reporting of information on efficiency, validation performance, summary statistics, and hyperparameters after the Checklist's introduction. Further, we show that the acceptance rate grows for submissions with more Yes responses. We find that the 44% of submissions that gather new data are 5% less likely to be accepted than those that do not; the average reviewer-rated reproducibility of these submissions is also 2% lower relative to the rest. We find that only 46% of submissions claim to open-source their code, though submissions that do have an 8% higher reproducibility score relative to those that do not, the largest effect for any checklist item. We discuss what can be inferred about the state of reproducibility in NLP and provide a set of recommendations for future conferences, including: a) allowing code and appendices to be submitted one week after the deadline, and b) measuring dataset reproducibility with a checklist of data collection practices.
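To make the analysis the abstract describes more concrete, here is a minimal sketch in Python of the kind of tabulation involved: grouping submissions by their number of "Yes" checklist responses and computing the acceptance rate within each group. This is not the authors' code; the data is synthetic and the field names (checklist_yes_count, accepted) are assumptions for illustration only.

```python
from collections import defaultdict

# Synthetic stand-ins for anonymized checklist responses; the real study
# analyzed 10,405 such responses.
submissions = [
    {"checklist_yes_count": 12, "accepted": True},
    {"checklist_yes_count": 7,  "accepted": False},
    {"checklist_yes_count": 15, "accepted": True},
    {"checklist_yes_count": 9,  "accepted": False},
    {"checklist_yes_count": 14, "accepted": True},
]

# Group submissions by their number of Yes responses and tally
# accepted vs. total within each group.
totals = defaultdict(lambda: [0, 0])  # yes_count -> [accepted, total]
for s in submissions:
    bucket = totals[s["checklist_yes_count"]]
    bucket[0] += s["accepted"]  # True counts as 1, False as 0
    bucket[1] += 1

# Report the acceptance rate per Yes-count group.
for yes_count in sorted(totals):
    accepted, total = totals[yes_count]
    print(f"{yes_count} Yes responses: "
          f"{accepted / total:.0%} accepted ({total} submissions)")
```

On real data one would aggregate over many submissions per group (and likely bin the Yes counts), but the structure of the comparison, acceptance rate as a function of checklist completeness, is the same.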