K Bretonnel Cohen, Jingbo Xia, Christophe Roeder, Lawrence E Hunter
{"title":"自然语言处理的可重复性:用于挖掘 PubMed/MEDLINE 的两个 R 库的案例研究》。","authors":"K Bretonnel Cohen, Jingbo Xia, Christophe Roeder, Lawrence E Hunter","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>There is currently a crisis in science related to highly publicized failures to reproduce large numbers of published studies. The current work proposes, by way of case studies, a methodology for moving the study of reproducibility in computational work to a full stage beyond that of earlier work. Specifically, it presents a case study in attempting to reproduce the reports of two R libraries for doing text mining of the PubMed/MEDLINE repository of scientific publications. The main findings are that a rational paradigm for reproduction of natural language processing papers can be established; the advertised functionality was difficult, but not impossible, to reproduce; and reproducibility studies can produce additional insights into the functioning of the published system. Additionally, the work on reproducibility lead to the production of novel user-centered documentation that has been accessed 260 times since its publication-an average of once a day per library.</p>","PeriodicalId":91924,"journal":{"name":"LREC ... International Conference on Language Resources & Evaluation : [proceedings]. International Conference on Language Resources & Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860830/pdf/nihms925915.pdf","citationCount":"0","resultStr":"{\"title\":\"Reproducibility in Natural Language Processing: A Case Study of Two R Libraries for Mining PubMed/MEDLINE.\",\"authors\":\"K Bretonnel Cohen, Jingbo Xia, Christophe Roeder, Lawrence E Hunter\",\"doi\":\"\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>There is currently a crisis in science related to highly publicized failures to reproduce large numbers of published studies. The current work proposes, by way of case studies, a methodology for moving the study of reproducibility in computational work to a full stage beyond that of earlier work. Specifically, it presents a case study in attempting to reproduce the reports of two R libraries for doing text mining of the PubMed/MEDLINE repository of scientific publications. The main findings are that a rational paradigm for reproduction of natural language processing papers can be established; the advertised functionality was difficult, but not impossible, to reproduce; and reproducibility studies can produce additional insights into the functioning of the published system. Additionally, the work on reproducibility lead to the production of novel user-centered documentation that has been accessed 260 times since its publication-an average of once a day per library.</p>\",\"PeriodicalId\":91924,\"journal\":{\"name\":\"LREC ... International Conference on Language Resources & Evaluation : [proceedings]. International Conference on Language Resources & Evaluation\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860830/pdf/nihms925915.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"LREC ... International Conference on Language Resources & Evaluation : [proceedings]. International Conference on Language Resources & Evaluation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"LREC ... International Conference on Language Resources & Evaluation : [proceedings]. International Conference on Language Resources & Evaluation","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
摘要
目前,科学界正面临着一场危机,大量已发表的研究成果无法再现的情况广为人知。当前的研究通过案例研究,提出了一种方法论,将计算工作中的可重复性研究提升到超越早期研究的全面阶段。具体地说,它提出了一个案例研究,试图重现两个 R 库的报告,以对 PubMed/MEDLINE 科学出版物库进行文本挖掘。主要发现包括:可以建立复制自然语言处理论文的合理范式;广告宣传的功能难以复制,但并非不可能;可重复性研究可以对已发表系统的功能产生更多的了解。此外,关于可重复性的工作还促成了以用户为中心的新文档的制作,该文档自发布以来已被访问了 260 次--平均每个图书馆每天访问一次。
Reproducibility in Natural Language Processing: A Case Study of Two R Libraries for Mining PubMed/MEDLINE.
There is currently a crisis in science related to highly publicized failures to reproduce large numbers of published studies. The current work proposes, by way of case studies, a methodology for moving the study of reproducibility in computational work to a full stage beyond that of earlier work. Specifically, it presents a case study in attempting to reproduce the reports of two R libraries for doing text mining of the PubMed/MEDLINE repository of scientific publications. The main findings are that a rational paradigm for reproduction of natural language processing papers can be established; the advertised functionality was difficult, but not impossible, to reproduce; and reproducibility studies can produce additional insights into the functioning of the published system. Additionally, the work on reproducibility lead to the production of novel user-centered documentation that has been accessed 260 times since its publication-an average of once a day per library.