Erika Spiteri Bailey, Alexandra Bonnici, Stefania Cristina
{"title":"科学论文中页面对象检测的级联方法","authors":"Erika Spiteri Bailey, Alexandra Bonnici, Stefania Cristina","doi":"10.1145/3558100.3563851","DOIUrl":null,"url":null,"abstract":"In recent years, Page Object Detection (POD) has become a popular document understanding task, proving to be a non-trivial task given the potential complexity of documents. The rise of neural networks facilitated a more general learning approach to this task. However, in the literature, the different objects such as formulae, or figures among others, are generally considered individually. In this paper, we describe the joint localisation of six object classes relevant to scientific papers, namely isolated formulae, embedded formulae, figures, tables, variables and references. Through a qualitative analysis of these object classes, we note a hierarchy among the classes and propose a new localisation approach, using two, cascaded You Only Look Once (YOLO) networks. We also present a new data set consisting of labelled bounding boxes for all six object classes. This data set combines two commonly used data sets in the literature for formulae localisation, adding to the document images in these data sets the labels for figures, tables, variables and references. Using this data set, we achieve an average F1-score of 0.755 across all classes, which is comparable to the state-of-the-art for the object classes when considered individually for localisation.","PeriodicalId":146244,"journal":{"name":"Proceedings of the 22nd ACM Symposium on Document Engineering","volume":"79 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A cascaded approach for page-object detection in scientific papers\",\"authors\":\"Erika Spiteri Bailey, Alexandra Bonnici, Stefania Cristina\",\"doi\":\"10.1145/3558100.3563851\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, Page Object Detection (POD) has become a popular document understanding task, proving to be a non-trivial task given the potential complexity of documents. The rise of neural networks facilitated a more general learning approach to this task. However, in the literature, the different objects such as formulae, or figures among others, are generally considered individually. In this paper, we describe the joint localisation of six object classes relevant to scientific papers, namely isolated formulae, embedded formulae, figures, tables, variables and references. Through a qualitative analysis of these object classes, we note a hierarchy among the classes and propose a new localisation approach, using two, cascaded You Only Look Once (YOLO) networks. We also present a new data set consisting of labelled bounding boxes for all six object classes. This data set combines two commonly used data sets in the literature for formulae localisation, adding to the document images in these data sets the labels for figures, tables, variables and references. Using this data set, we achieve an average F1-score of 0.755 across all classes, which is comparable to the state-of-the-art for the object classes when considered individually for localisation.\",\"PeriodicalId\":146244,\"journal\":{\"name\":\"Proceedings of the 22nd ACM Symposium on Document Engineering\",\"volume\":\"79 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 22nd ACM Symposium on Document Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3558100.3563851\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 22nd ACM Symposium on Document Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3558100.3563851","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
摘要
近年来,页面对象检测(POD)已成为一种流行的文档理解任务,鉴于文档的潜在复杂性,它被证明是一项重要的任务。神经网络的兴起促进了一种更通用的学习方法来完成这项任务。然而,在文献中,不同的对象,如公式或数字等,通常是单独考虑的。在本文中,我们描述了与科学论文相关的六类对象的联合定位,即孤立公式、嵌入公式、图形、表格、变量和参考文献。通过对这些对象类的定性分析,我们注意到类之间的层次结构,并提出了一种新的定位方法,使用两个级联的You Only Look Once (YOLO)网络。我们还提出了一个由所有六个对象类的标记边界框组成的新数据集。该数据集结合了文献中常用的两个数据集进行公式定位,在这些数据集中的文档图像上添加了图形、表格、变量和参考文献的标签。使用这个数据集,我们在所有类中获得了0.755的平均f1分数,当单独考虑对象类进行本地化时,这与对象类的最新水平相当。
A cascaded approach for page-object detection in scientific papers
In recent years, Page Object Detection (POD) has become a popular document understanding task, proving to be a non-trivial task given the potential complexity of documents. The rise of neural networks facilitated a more general learning approach to this task. However, in the literature, the different objects such as formulae, or figures among others, are generally considered individually. In this paper, we describe the joint localisation of six object classes relevant to scientific papers, namely isolated formulae, embedded formulae, figures, tables, variables and references. Through a qualitative analysis of these object classes, we note a hierarchy among the classes and propose a new localisation approach, using two, cascaded You Only Look Once (YOLO) networks. We also present a new data set consisting of labelled bounding boxes for all six object classes. This data set combines two commonly used data sets in the literature for formulae localisation, adding to the document images in these data sets the labels for figures, tables, variables and references. Using this data set, we achieve an average F1-score of 0.755 across all classes, which is comparable to the state-of-the-art for the object classes when considered individually for localisation.