Sean T. Yang, Po-Shen Lee, L. Kazakova, Abhishek Joshi, B. M. Oh, Jevin D. West, B. Howe
Title: Identifying the Central Figure of a Scientific Paper
DOI: 10.1109/ICDAR.2019.00173 (https://doi.org/10.1109/ICDAR.2019.00173)
Published in: 2019 International Conference on Document Analysis and Recognition (ICDAR)
Publication date: 2019-09-01
Citations: 4
Abstract
Publishers are increasingly using graphical abstracts to facilitate scientific search, especially across disciplinary boundaries. They are presented in various media, are easily shared, and are information rich. However, only a small fraction of scientific publications are equipped with graphical abstracts. What can we do with the vast majority of papers that have no designated graphical abstract? In this paper, we first hypothesize that scientific papers actually include a "central figure" that serves as a graphical abstract. These figures convey the key results and provide a visual identity for the paper. Using survey data collected from 6,263 authors regarding 8,353 papers over 15 years, we find that over 87% of papers are considered to contain a central figure, and that these central figures are primarily used to summarize important results, explain the key methods, or provide additional discussion. We then train a model to automatically recognize the central figure, achieving a top-3 accuracy of 78% and an exact-match accuracy of 34%. We find that the primary boost in accuracy comes from figure captions that resemble the abstract. We make all our data and results publicly available at https://github.com/viziometrics/centraul_figure. Our goal is to automate central figure identification to improve search engine performance and to help scientists connect ideas across the literature.