{"title":"From print to online newspapers on small displays: a layout generation approach aimed at preserving entry points","authors":"Sebastián Gallardo Díaz, Dorian Mazauric, Pierre Kornprobst","doi":"10.1145/3558100.3563847","DOIUrl":"https://doi.org/10.1145/3558100.3563847","url":null,"abstract":"Simply transposing the print newspapers into digital media can not be satisfactory because they were not designed for small displays. One key feature lost is the notion of entry points that are essential for navigation. By focusing on headlines as entry points, we show how to produce alternative layouts for small displays that preserve entry points quality (readability and usability) while optimizing aesthetics and style. Our approach consists in a relayouting approach implemented via a genetic-inspired approach. We tested it on realistic newspaper pages. For the case discussed here, we obtained more than 2000 different layouts where the font was increased by a factor of two. We show that the quality of headlines is globally much better with the new layouts than with the original layout. Future work will tend to generalize this promising approach, accounting for the complexity of real newspapers, with user experience quality as the primary goal.","PeriodicalId":146244,"journal":{"name":"Proceedings of the 22nd ACM Symposium on Document Engineering","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115659127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A cascaded approach for page-object detection in scientific papers","authors":"Erika Spiteri Bailey, Alexandra Bonnici, Stefania Cristina","doi":"10.1145/3558100.3563851","DOIUrl":"https://doi.org/10.1145/3558100.3563851","url":null,"abstract":"In recent years, Page Object Detection (POD) has become a popular document understanding task, proving to be a non-trivial task given the potential complexity of documents. The rise of neural networks facilitated a more general learning approach to this task. However, in the literature, the different objects such as formulae, or figures among others, are generally considered individually. In this paper, we describe the joint localisation of six object classes relevant to scientific papers, namely isolated formulae, embedded formulae, figures, tables, variables and references. Through a qualitative analysis of these object classes, we note a hierarchy among the classes and propose a new localisation approach, using two, cascaded You Only Look Once (YOLO) networks. We also present a new data set consisting of labelled bounding boxes for all six object classes. This data set combines two commonly used data sets in the literature for formulae localisation, adding to the document images in these data sets the labels for figures, tables, variables and references. Using this data set, we achieve an average F1-score of 0.755 across all classes, which is comparable to the state-of-the-art for the object classes when considered individually for localisation.","PeriodicalId":146244,"journal":{"name":"Proceedings of the 22nd ACM Symposium on Document Engineering","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122391431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Academic writing and publishing beyond documents","authors":"C. Mahlow, M. Piotrowski","doi":"10.1145/3558100.3563840","DOIUrl":"https://doi.org/10.1145/3558100.3563840","url":null,"abstract":"Research on writing tools stopped in the late 1980s when Microsoft Word had achieved monopoly status. However, the development of the Web and the advent of mobile devices are increasingly rendering static print-like documents obsolete. In this vision paper we reflect on the impact of this development on scholarly writing and publishing. Academic publications increasingly include dynamic elements, e.g., code, data plots, and other visualizations, which clearly requires other tools for document production than traditional word processors. When the printed page no longer is the desired final product, content and form can be addressed explicitly and separately, thus emphasizing the structure of texts rather than the structure of documents. The resulting challenges have not yet been fully addressed by document engineering.","PeriodicalId":146244,"journal":{"name":"Proceedings of the 22nd ACM Symposium on Document Engineering","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117283329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Binarization of photographed documents image quality, processing time and size assessment","authors":"R. Lins, R. Bernardino, Ricardo da Silva Barboza, S. Simske","doi":"10.1145/3558100.3564159","DOIUrl":"https://doi.org/10.1145/3558100.3564159","url":null,"abstract":"Today, over eighty percent of the world's population owns a smart-phone with an in-built camera, and they are very often used to photograph documents. Document binarization is a key process in many document processing platforms. This competition on binarizing photographed documents assessed the quality, time, space, and performance of five new algorithms and sixty-four \"classical\" and alternative algorithms. The evaluation dataset is composed of offset, laser, and deskjet printed documents, photographed using six widely-used mobile devices with the strobe flash on and off, under two different angles and places of capture.","PeriodicalId":146244,"journal":{"name":"Proceedings of the 22nd ACM Symposium on Document Engineering","volume":"767 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116137288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting malware using text documents extracted from spam email through machine learning","authors":"Luis Ángel Redondo-Gutierrez, Francisco Jáñez-Martino, Eduardo FIDALGO, Enrique Alegre, V. González-Castro, R. Alaíz-Rodríguez","doi":"10.1145/3558100.3563854","DOIUrl":"https://doi.org/10.1145/3558100.3563854","url":null,"abstract":"Spam has become an effective way for cybercriminals to spread malware. Although cybersecurity agencies and companies develop products and organise courses for people to detect malicious spam email patterns, spam attacks are not totally avoided yet. In this work, we present and make publicly available \"Spam Email Malware Detection - 600\" (SEMD-600), a new dataset, based on Bruce Guenter's, for malware detection in spam using only the text of the email. We also introduce a pipeline for malware detection based on traditional Natural Language Processing (NLP) techniques. Using SEMD-600, we compare the text representation techniques Bag of Words and Term Frequency-Inverse Document Frequency (TF-IDF), in combination with three different supervised classifiers: Support Vector Machine, Naive Bayes and Logistic Regression, to detect malware in plain text documents. We found that combining TF-IDF with Logistic Regression achieved the best performance, with a macro F1 score of 0.763.","PeriodicalId":146244,"journal":{"name":"Proceedings of the 22nd ACM Symposium on Document Engineering","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124827044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anonymizing and obfuscating PDF content while preserving document structure","authors":"Charlotte Curtis","doi":"10.1145/3558100.3563849","DOIUrl":"https://doi.org/10.1145/3558100.3563849","url":null,"abstract":"The portable document format (PDF) is both versatile and complex, with a specification exceeding well over a thousand pages. For independent developers writing software that reads, displays, or transforms PDFs, it is difficult to comprehensively account for all of the potential variations that might exist in the wild. Compounding this problem are the usage agreements that often accompany purchased and proprietary PDFs, preventing end users from uploading a troublesome document as part of a bug report and limiting the set of test cases that can be made public for open source development. In this paper, pdf-mangler is presented as a solution to this problem. The goal of pdf-mangler is to remove information in the form of text, images, and vector graphics while retaining as much of the document structure and general visual appearance as possible. The intention is for pdf-mangler to be deployed as part of an automated bug reporting tool for PDF software.","PeriodicalId":146244,"journal":{"name":"Proceedings of the 22nd ACM Symposium on Document Engineering","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130184409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Triplet transformer network for multi-label document classification","authors":"J. Melsbach, Sven Stahlmann, Stefan Hirschmeier, D. Schoder","doi":"10.1145/3558100.3563843","DOIUrl":"https://doi.org/10.1145/3558100.3563843","url":null,"abstract":"Multi-label document classification is the task of assigning one or more labels to a document, and has become a common task in various businesses. Typically, current state-of-the-art models based on pretrained language models tackle this task without taking the textual information of label names into account, therefore omitting possibly valuable information. We present an approach that leverages this information stored in label names by reformulating the problem of multi label classification into a document similarity problem. To achieve this, we use a triplet transformer network that learns to embed labels and documents into a joint vector space. Our approach is fast at inference, classifying documents by determining the closest and therefore most similar labels. We evaluate our approach on a challenging real-world dataset of a German radio-broadcaster and find that our model provides competitive results compared to other established approaches.","PeriodicalId":146244,"journal":{"name":"Proceedings of the 22nd ACM Symposium on Document Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126248655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tab this folder of documents: page stream segmentation of business documents","authors":"Thisanaporn Mungmeeprued, Yuxin Ma, Nisarg Mehta, Aldo Lipani","doi":"10.1145/3558100.3563852","DOIUrl":"https://doi.org/10.1145/3558100.3563852","url":null,"abstract":"In the midst of digital transformation, automatically understanding the structure and composition of scanned documents is important in order to allow correct indexing, archiving, and processing. In many organizations, different types of documents are usually scanned together in folders, so it is essential to automate the task of segmenting the folders into documents which then proceed to further analysis tailored to specific document types. This task is known as Page Stream Segmentation (PSS). In this paper, we propose a deep learning solution to solve the task of determining whether or not a page is a breaking-point given a sequence of scanned pages (a folder) as input. We also provide a dataset called TABME (TAB this folder of docuMEnts) generated specifically for this task. Our proposed architecture combines LayoutLM and ResNet to exploit both textual and visual features of the document pages and achieves an F1 score of 0.953. The dataset and code used to run the experiments in this paper are available at the following web link: https://github.com/aldolipani/TABME.","PeriodicalId":146244,"journal":{"name":"Proceedings of the 22nd ACM Symposium on Document Engineering","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116950370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optical character recognition with transformers and CTC","authors":"Israel Campiotti, R. Lotufo","doi":"10.1145/3558100.3563845","DOIUrl":"https://doi.org/10.1145/3558100.3563845","url":null,"abstract":"Text recognition tasks are commonly solved by using a deep learning pipeline called CRNN. The classical CRNN is a sequence of a convolutional network, followed by a bidirectional LSTM and a CTC layer. In this paper, we perform an extensive analysis of the components of a CRNN to find what is crucial to the entire pipeline and what characteristics can be exchanged for a more effective choice. Given the results of our experiments, we propose two different architectures for the task of text recognition. The first model, CNN + CTC, is a combination of a convolutional model followed by a CTC layer. The second model, CNN + Tr + CTC, adds an encoder-only Transformers between the convolutional network and the CTC layer. To the best of our knowledge, this is the first time that a Transformers have been successfully trained using just CTC loss. To assess the capabilities of our proposed architectures, we train and evaluate them on the SROIE 2019 data set. Our CNN + CTC achieves an F1 score of 89.66% possessing only 4.7 million parameters. CNN + Tr + CTC attained an F1 score of 93.76% with 11 million parameters, which is almost 97% of the performance achieved by the TrOCR using 334 million parameters and more than 600 million synthetic images for pretraining.","PeriodicalId":146244,"journal":{"name":"Proceedings of the 22nd ACM Symposium on Document Engineering","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127538270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chinese public procurement document harvesting pipeline","authors":"Danrun Cao, Oussama Ahmia, Nicolas Béchet, P. Marteau","doi":"10.1145/3558100.3563848","DOIUrl":"https://doi.org/10.1145/3558100.3563848","url":null,"abstract":"We present a processing pipeline for Chinese public procurement document harvesting, with the aim of producing strategic data with greater added value. It consists of three micro-modules: data collection, information extraction, database indexing. The information extraction part is implemented through a hybrid system which combines rule-based and machine learning approaches. Rule-based method is used for extracting information with presenting recurring morphological features, such as dates, amounts and contract awardee information. Machine learning method is used for trade detection in the title of procurement documents.","PeriodicalId":146244,"journal":{"name":"Proceedings of the 22nd ACM Symposium on Document Engineering","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125112305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}