Effects of Unpopular Citation Fields in Citation Matching Performance

2011 International Conference on Information Science and Applications
Pub Date: 2011-04-26
DOI: 10.1109/ICISA.2011.5772372

HeeKwan Koo, Taehong Kim, H. Chun, Dongmin Seo, Hanmin Jung, Sungin Lee

Citations: 4

Abstract

Citation matching is the problem of identifying which citations refer to the same publication. Previous studies on citation matching typically select an arbitrary set of citation record fields, such as author and title, from a corpus or database of citation records such as CORA, a practice informed by "common sense", in order to automatically group citations that refer to the same document. This study describes a systematic, computational approach to extracting the "best candidate" citation record fields. It proposes that there is always a best combination of citation record fields that increases citation matching performance and that applies regardless of the research framework adopted, such as Machine Learning methods or Information Retrieval algorithms. Cross-comparisons between previous studies and our approach, reported as pairwise F1 measures within our field-selection framework, are presented.
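The abstract reports results as pairwise F1 but does not spell out the computation. The sketch below shows one common way to compute it, assuming each citation has a gold-standard publication label and a predicted cluster label. The function names `same_cluster_pairs` and `pairwise_f1` are illustrative and are not taken from the paper.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of index pairs (i, j), i < j, whose labels are equal."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_f1(true_labels, predicted_labels):
    """Pairwise F1 between a gold grouping and a predicted grouping.

    A pair of citations counts as a predicted positive when both fall in
    the same predicted cluster, and as a true positive when they also
    refer to the same publication in the gold standard.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")

    true_pairs = same_cluster_pairs(true_labels)
    predicted_pairs = same_cluster_pairs(predicted_labels)
    correct = len(true_pairs & predicted_pairs)

    # Assumption: precision or recall is taken as 0.0 when undefined.
    precision = correct / len(predicted_pairs) if predicted_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Five citations: three refer to publication "A" and two to publication "B".
gold = ["A", "A", "A", "B", "B"]
# A hypothetical matcher wrongly groups the third citation with "B".
predicted = [1, 1, 2, 2, 2]
score = pairwise_f1(gold, predicted)
```

Because the measure is defined over pairs of citations, it depends only on which citations are grouped together, not on the cluster identifiers themselves. This makes it possible to compare matchers built on different frameworks or different field combinations on the same footing.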
