表结构识别方法的再现性和可复制性研究

Kehinde E. Ajayi, Muntabhir Hasan Choudhury, S. Rajtmajer, Jian Wu
{"title":"表结构识别方法的再现性和可复制性研究","authors":"Kehinde E. Ajayi, Muntabhir Hasan Choudhury, S. Rajtmajer, Jian Wu","doi":"10.48550/arXiv.2304.10439","DOIUrl":null,"url":null,"abstract":"Concerns about reproducibility in artificial intelligence (AI) have emerged, as researchers have reported unsuccessful attempts to directly reproduce published findings in the field. Replicability, the ability to affirm a finding using the same procedures on new data, has not been well studied. In this paper, we examine both reproducibility and replicability of a corpus of 16 papers on table structure recognition (TSR), an AI task aimed at identifying cell locations of tables in digital documents. We attempt to reproduce published results using codes and datasets provided by the original authors. We then examine replicability using a dataset similar to the original as well as a new dataset, GenTSR, consisting of 386 annotated tables extracted from scientific papers. Out of 16 papers studied, we reproduce results consistent with the original in only four. Two of the four papers are identified as replicable using the similar dataset under certain IoU values. No paper is identified as replicable using the new dataset. We offer observations on the causes of irreproducibility and irreplicability. All code and data are available on Codeocean at https://codeocean.com/capsule/6680116/tree.","PeriodicalId":294655,"journal":{"name":"IEEE International Conference on Document Analysis and Recognition","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Study on Reproducibility and Replicability of Table Structure Recognition Methods\",\"authors\":\"Kehinde E. Ajayi, Muntabhir Hasan Choudhury, S. Rajtmajer, Jian Wu\",\"doi\":\"10.48550/arXiv.2304.10439\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Concerns about reproducibility in artificial intelligence (AI) have emerged, as researchers have reported unsuccessful attempts to directly reproduce published findings in the field. Replicability, the ability to affirm a finding using the same procedures on new data, has not been well studied. In this paper, we examine both reproducibility and replicability of a corpus of 16 papers on table structure recognition (TSR), an AI task aimed at identifying cell locations of tables in digital documents. We attempt to reproduce published results using codes and datasets provided by the original authors. We then examine replicability using a dataset similar to the original as well as a new dataset, GenTSR, consisting of 386 annotated tables extracted from scientific papers. Out of 16 papers studied, we reproduce results consistent with the original in only four. Two of the four papers are identified as replicable using the similar dataset under certain IoU values. No paper is identified as replicable using the new dataset. We offer observations on the causes of irreproducibility and irreplicability. All code and data are available on Codeocean at https://codeocean.com/capsule/6680116/tree.\",\"PeriodicalId\":294655,\"journal\":{\"name\":\"IEEE International Conference on Document Analysis and Recognition\",\"volume\":\"7 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-04-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE International Conference on Document Analysis and Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2304.10439\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Conference on Document Analysis and Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2304.10439","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

随着研究人员报告称,直接复制该领域已发表的研究结果的尝试失败,人们开始担心人工智能(AI)的可重复性。可复制性,即在新数据上使用相同程序确认发现的能力,还没有得到很好的研究。在本文中,我们研究了关于表结构识别(TSR)的16篇论文的语料库的再现性和可复制性,TSR是一项旨在识别数字文档中表的单元位置的人工智能任务。我们尝试使用原始作者提供的代码和数据集重现已发表的结果。然后,我们使用与原始数据集类似的数据集以及一个新的数据集GenTSR来检查可复制性,该数据集由386个从科学论文中提取的带注释的表组成。在研究的16篇论文中,我们只在4篇中重现了与原文一致的结果。四篇论文中的两篇在一定的IoU值下使用类似的数据集被确定为可复制的。没有论文被确定为可复制使用新的数据集。我们提供对不可复制性和不可复制性的原因的观察。所有代码和数据都可以在Codeocean上获得,网址为https://codeocean.com/capsule/6680116/tree。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A Study on Reproducibility and Replicability of Table Structure Recognition Methods
Concerns about reproducibility in artificial intelligence (AI) have emerged, as researchers have reported unsuccessful attempts to directly reproduce published findings in the field. Replicability, the ability to affirm a finding using the same procedures on new data, has not been well studied. In this paper, we examine both reproducibility and replicability of a corpus of 16 papers on table structure recognition (TSR), an AI task aimed at identifying cell locations of tables in digital documents. We attempt to reproduce published results using codes and datasets provided by the original authors. We then examine replicability using a dataset similar to the original as well as a new dataset, GenTSR, consisting of 386 annotated tables extracted from scientific papers. Out of 16 papers studied, we reproduce results consistent with the original in only four. Two of the four papers are identified as replicable using the similar dataset under certain IoU values. No paper is identified as replicable using the new dataset. We offer observations on the causes of irreproducibility and irreplicability. All code and data are available on Codeocean at https://codeocean.com/capsule/6680116/tree.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信