Reference metadata extraction from scientific papers
Authors: Zhixin Guo, Hai Jin
Venue: 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies
Publication date: 2011-10-20
DOI: 10.1109/PDCAT.2011.72 (https://doi.org/10.1109/PDCAT.2011.72)
Citation count: 14
Abstract
Bibliographic information about scientific papers has been of great value since the Science Citation Index was introduced to measure research impact. Most scientific documents available on the web are unstructured or semi-structured, and automatic reference metadata extraction has become an important task. This paper describes a framework for automatic reference metadata extraction from scientific papers. Our system can extract the title, author, journal, volume, year, and page fields from scientific papers in PDF format. We use a document metadata knowledge base to guide the reference metadata extraction process. Experimental results show that our system achieves high accuracy.
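
The abstract does not describe the framework's internals, so the following is only a minimal sketch of what knowledge-base-guided extraction of the six fields could look like, not the authors' method. It assumes the reference text has already been pulled out of the PDF, that references follow a simplified "Authors. Title. Journal, volume, year, pages." layout, and that the knowledge base is just a list of known journal names. The names `KNOWN_JOURNALS`, `TAIL_PATTERN`, and `extract_reference` are hypothetical.

```python
import re

# Hypothetical stand-in for a document metadata knowledge base:
# a plain list of journal names (made up for illustration).
KNOWN_JOURNALS = [
    "Journal of Example Studies",
    "Example Transactions on Computing",
]

# Assumed layout of the text after the journal name: "volume, year, pages".
TAIL_PATTERN = re.compile(
    r"(?P<volume>\d+)\s*,\s*"
    r"(?P<year>(?:19|20)\d{2})\s*,\s*"
    r"(?P<page>\d+\s*-\s*\d+)"
)


def extract_reference(ref):
    """Return a dict with the six fields, or None if the reference is not recognized."""
    for journal in KNOWN_JOURNALS:
        pos = ref.find(journal)
        if pos == -1:
            continue  # this journal name does not occur in the reference
        # Text before the journal name is assumed to be "Authors. Title."
        head = ref[:pos].strip()
        author, _, title = head.partition(". ")
        # Text after the journal name is assumed to hold volume, year, pages.
        match = TAIL_PATTERN.search(ref[pos + len(journal):])
        if match is None:
            return None
        return {
            "author": author.strip(),
            "title": title.strip().rstrip("."),
            "journal": journal,
            "volume": match.group("volume"),
            "year": match.group("year"),
            "page": match.group("page").replace(" ", ""),
        }
    return None


# Made-up reference string, used only to exercise the sketch.
example = (
    "Smith J, Doe A. A made-up title for illustration. "
    "Journal of Example Studies, 12, 2010, 34-56."
)
result = extract_reference(example)
```

In this sketch the journal name acts as an anchor: once a known name is located, the text before and after it can be split with much simpler rules than would be needed for an unanchored reference string. A reference that mentions no known journal returns None.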