Effects of Unpopular Citation Fields in Citation Matching Performance
HeeKwan Koo, Taehong Kim, H. Chun, Dongmin Seo, Hanmin Jung, Sungin Lee
2011 International Conference on Information Science and Applications, 2011-04-26
DOI: 10.1109/ICISA.2011.5772372 (https://doi.org/10.1109/ICISA.2011.5772372)
Citation count: 4
Abstract
Citation matching is the problem of identifying which citations correspond to the same publication. Previous studies on citation matching typically select an arbitrary set of citation record fields, such as author and title, from a corpus or database of citation records such as CORA, a practice informed by "common sense", in order to automatically group citations that refer to the same document. This study describes a systematic, computational approach to extracting the 'best candidate' citation record fields. It proposes that there is always a best combination of citation record fields that increases citation matching performance and that applies regardless of the research framework adopted, such as Machine Learning methods or Information Retrieval algorithms. Cross comparisons between previous studies and our approach, reported as pairwise F1 measures within our field-selection framework, are presented.
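To make the evaluation idea concrete, the following is a minimal sketch of how pairwise F1 can be computed for a grouping of citations, and of how candidate field combinations could be compared exhaustively. It is not the authors' implementation: the exact-match matcher `group_by_fields`, the helper `best_field_subset`, the field names, and the toy records are all assumptions made for illustration only.

```python
from itertools import combinations


def pairs(clusters):
    """Return the set of unordered record-id pairs that share a cluster."""
    return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}


def pairwise_f1(predicted, gold):
    """Pairwise F1: harmonic mean of precision and recall over co-clustered pairs."""
    pred_pairs, gold_pairs = pairs(predicted), pairs(gold)
    true_pos = len(pred_pairs & gold_pairs)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_pairs)
    recall = true_pos / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


def group_by_fields(records, fields):
    """Toy matcher (assumption): two records match when every selected
    field is identical after trimming whitespace and lowercasing."""
    groups = {}
    for rec_id, rec in records.items():
        key = tuple(rec.get(f, "").strip().lower() for f in fields)
        groups.setdefault(key, set()).add(rec_id)
    return list(groups.values())


def best_field_subset(records, gold, candidate_fields):
    """Score every non-empty field combination and keep the first best one."""
    best_score, best_fields = -1.0, ()
    for k in range(1, len(candidate_fields) + 1):
        for subset in combinations(candidate_fields, k):
            score = pairwise_f1(group_by_fields(records, subset), gold)
            if score > best_score:
                best_score, best_fields = score, subset
    return best_score, best_fields


# Hypothetical toy data: records 1-2 cite one paper, records 3-4 another.
records = {
    1: {"author": "koo", "title": "citation matching", "year": "2011", "pages": "1-5"},
    2: {"author": "Koo", "title": "Citation Matching", "year": "2011", "pages": "1--5"},
    3: {"author": "kim", "title": "citation matching", "year": "2010", "pages": "1-5"},
    4: {"author": "kim", "title": "Citation Matching ", "year": "2010", "pages": "7"},
}
gold = [{1, 2}, {3, 4}]

if __name__ == "__main__":
    print(best_field_subset(records, gold, ["author", "title", "year", "pages"]))
```

Exhaustive search is only practical for a handful of fields, since the number of subsets grows as 2^n. A real system would also replace the exact-match key with a similarity function, which this sketch deliberately leaves out.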