Daiki Namikoshi, Manabu Ohta, A. Takasu, J. Adachi
Title: CRF-based bibliography extraction from reference strings using a small amount of training data
Venue: 2017 Twelfth International Conference on Digital Information Management (ICDIM)
Published: 2017-09-01
DOI: 10.1109/ICDIM.2017.8244665
Citations: 2
Abstract
The effective use of digital libraries demands that bibliographic databases be maintained. Useful bibliographic information appears in the reference fields of academic papers, so we are developing a method that automatically extracts bibliographic information from reference strings using a conditional random field (CRF). However, at least a few hundred reference strings are necessary to train an accurate CRF. In this paper, we propose active learning and transfer learning techniques to reduce the amount of training data that CRFs require. We evaluate extraction accuracy and the associated training cost experimentally.
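
The abstract does not say how the active learning step chooses which reference strings to annotate. As a rough illustration only, the sketch below uses least-confidence sampling, a common generic strategy that is not necessarily the authors' criterion. It assumes a trained CRF can report per-token label marginals. The names `sequence_confidence` and `select_for_annotation`, the label set, and all numbers are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumption: for each token, a trained CRF yields a mapping from
# label to marginal probability. This type is hypothetical.
TokenMarginals = Sequence[Dict[str, float]]


def sequence_confidence(marginals: TokenMarginals) -> float:
    """Score a labeling by its weakest token: the lowest top-label probability."""
    if not marginals:
        return 0.0  # treat an empty sequence as maximally uncertain
    return min(max(dist.values()) for dist in marginals)


def select_for_annotation(pool: Dict[str, TokenMarginals], k: int) -> List[str]:
    """Return the ids of the k least-confident reference strings in the pool."""
    ranked = sorted(pool, key=lambda ref_id: sequence_confidence(pool[ref_id]))
    return ranked[:k]


# Toy marginals, made up for illustration (not results from the paper).
pool = {
    "ref-a": [{"AUTHOR": 0.98, "TITLE": 0.02}, {"AUTHOR": 0.05, "TITLE": 0.95}],
    "ref-b": [{"AUTHOR": 0.55, "TITLE": 0.45}, {"AUTHOR": 0.10, "TITLE": 0.90}],
    "ref-c": [{"AUTHOR": 0.80, "TITLE": 0.20}, {"AUTHOR": 0.30, "TITLE": 0.70}],
}
chosen = select_for_annotation(pool, k=2)
print(chosen)
```

Under this scheme, only the strings the model is least sure about are sent for manual labeling, which is one way the annotation cost mentioned in the abstract could be kept low.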