Empirical Evaluation of CRF-Based Bibliography Extraction from Reference Strings

2014 11th IAPR International Workshop on Document Analysis Systems. Pub Date: 2014-04-07. DOI: 10.1109/DAS.2014.64

Manabu Ohta, Daiki Arauchi, A. Takasu, J. Adachi


Citations: 5

Abstract



This paper reports an empirical evaluation of a CRF-based bibliography parser that we developed for the reference strings of research papers. The parser uses a conditional random field (CRF) to estimate the correct bibliographic label, such as author name or title, for each token in a reference string. We applied the parser, which is designed specifically for reference strings, to three academic journals published in Japan: one in English and two in Japanese. Experiments showed that (i) the parser correctly parsed 90% to 94% of reference strings, depending on the journal, and (ii) segmentation errors induced by tokenization considerably degraded the final parsing accuracy. The paper also discusses future directions for bibliography extraction, based on a detailed analysis of the experiments.
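To make the token-labeling setup concrete, here is a minimal Python sketch of the two pieces a per-token CRF labeler depends on: tokenizing a reference string and describing each token with features. It also shows a strict per-reference accuracy, which is one plausible reading of "correctly parsed" in the abstract. The tokenization rule, the feature names, and the label names are illustrative assumptions, not the authors' actual design, and no CRF is trained here.

```python
import re


def tokenize(reference):
    """Split a reference string into word and punctuation tokens.

    Assumed scheme: runs of word characters, or single punctuation marks.
    """
    return re.findall(r"\w+|[^\w\s]", reference)


def token_features(tokens, i):
    """Build a feature dict for token i (hypothetical feature set)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_digit": tok.isdigit(),
        "is_capitalized": tok[:1].isupper(),
        "is_punct": re.fullmatch(r"[^\w\s]", tok) is not None,
        # Neighbouring tokens give the labeler local context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def reference_accuracy(gold, predicted):
    """Fraction of references whose label sequence matches gold exactly."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def token_accuracy(gold, predicted):
    """Fraction of individual tokens that received the gold label."""
    total = sum(len(g) for g in gold)
    correct = sum(
        gl == pl for g, p in zip(gold, predicted) for gl, pl in zip(g, p)
    )
    return correct / total
```

Under the strict per-reference measure, a single mislabeled or mis-segmented token makes the whole reference count as wrong, so per-reference accuracy can be much lower than per-token accuracy. This is consistent with the abstract's observation that tokenization-induced segmentation errors degrade the final parsing accuracy.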

