Reproducibility in Natural Language Processing: A Case Study of Two R Libraries for Mining PubMed/MEDLINE.

K Bretonnel Cohen, Jingbo Xia, Christophe Roeder, Lawrence E Hunter
{"title":"Reproducibility in Natural Language Processing: A Case Study of Two R Libraries for Mining PubMed/MEDLINE.","authors":"K Bretonnel Cohen, Jingbo Xia, Christophe Roeder, Lawrence E Hunter","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>There is currently a crisis in science related to highly publicized failures to reproduce large numbers of published studies. The current work proposes, by way of case studies, a methodology for moving the study of reproducibility in computational work to a full stage beyond that of earlier work. Specifically, it presents a case study in attempting to reproduce the reports of two R libraries for doing text mining of the PubMed/MEDLINE repository of scientific publications. The main findings are that a rational paradigm for reproduction of natural language processing papers can be established; the advertised functionality was difficult, but not impossible, to reproduce; and reproducibility studies can produce additional insights into the functioning of the published system. Additionally, the work on reproducibility lead to the production of novel user-centered documentation that has been accessed 260 times since its publication-an average of once a day per library.</p>","PeriodicalId":91924,"journal":{"name":"LREC ... International Conference on Language Resources & Evaluation : [proceedings]. International Conference on Language Resources & Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860830/pdf/nihms925915.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"LREC ... International Conference on Language Resources & Evaluation : [proceedings]. International Conference on Language Resources & Evaluation","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

There is currently a crisis in science related to highly publicized failures to reproduce large numbers of published studies. The current work proposes, by way of case studies, a methodology for moving the study of reproducibility in computational work to a full stage beyond that of earlier work. Specifically, it presents a case study in attempting to reproduce the reports of two R libraries for doing text mining of the PubMed/MEDLINE repository of scientific publications. The main findings are that a rational paradigm for reproduction of natural language processing papers can be established; the advertised functionality was difficult, but not impossible, to reproduce; and reproducibility studies can produce additional insights into the functioning of the published system. Additionally, the work on reproducibility lead to the production of novel user-centered documentation that has been accessed 260 times since its publication-an average of once a day per library.

Abstract Image

Abstract Image

自然语言处理的可重复性:用于挖掘 PubMed/MEDLINE 的两个 R 库的案例研究》。
目前,科学界正面临着一场危机,大量已发表的研究成果无法再现的情况广为人知。当前的研究通过案例研究,提出了一种方法论,将计算工作中的可重复性研究提升到超越早期研究的全面阶段。具体地说,它提出了一个案例研究,试图重现两个 R 库的报告,以对 PubMed/MEDLINE 科学出版物库进行文本挖掘。主要发现包括:可以建立复制自然语言处理论文的合理范式;广告宣传的功能难以复制,但并非不可能;可重复性研究可以对已发表系统的功能产生更多的了解。此外,关于可重复性的工作还促成了以用户为中心的新文档的制作,该文档自发布以来已被访问了 260 次--平均每个图书馆每天访问一次。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信