Including Keyword Position in Image-based Models for Act Segmentation of Historical Registers

The 6th International Workshop on Historical Document Imaging and Processing | Pub Date: 2021-09-05 | DOI: 10.1145/3476887.3476905

Mélodie Boillet, Martin Maarand, T. Paquet, Christopher Kermorvant


Citations: 1

Abstract



The segmentation of complex images into semantic regions has attracted growing interest in recent years with the advent of deep learning. Until recently, most methods for historical document analysis focused on the visual appearance of documents and ignored the rich information that textual content can offer. However, segmenting complex documents into semantic regions is sometimes impossible using visual features alone, and recent models embed both visual and textual information. In this paper, we focus on using both visual and textual information to segment historical registers into structured and meaningful units such as acts. An act is a textual record containing valuable knowledge such as demographic information (baptism, marriage, or death) or royal decisions (donation or pardon). We propose a simple pipeline that enriches document images with the positions of text lines containing key-phrases, and we show that running a standard image-based layout analysis system on these images can lead to significant gains. Our experiments show that act detection improves from 38% to 74% mAP when textual information is added, under real use-case conditions where text line positions and content are extracted with an automatic recognition system.
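The enrichment step described in the abstract can be pictured with a minimal sketch. The abstract does not say how key-phrase positions are encoded in the image, so everything here is an assumption made for illustration: the `TextLine` structure, the `enrich_image` function, the choice to paint matching lines as solid colored rectangles, the sample key-phrase, and the use of case-insensitive substring matching (noisy recognizer output may well need approximate matching instead). The image is a plain list of rows of RGB tuples so that the example runs with the standard library only.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    """Hypothetical recognizer output: a pixel bounding box and its transcription."""
    x0: int
    y0: int
    x1: int
    y1: int
    text: str


def enrich_image(image, lines, key_phrases, color=(255, 0, 0)):
    """Return a copy of `image` where lines containing a key-phrase are painted.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    Painting a solid rectangle is an assumed encoding, not the paper's.
    """
    enriched = [row[:] for row in image]  # copy rows so the input stays untouched
    height = len(enriched)
    width = len(enriched[0]) if enriched else 0
    lowered = [phrase.lower() for phrase in key_phrases]

    for line in lines:
        if not any(phrase in line.text.lower() for phrase in lowered):
            continue
        # Clip the box to the image so out-of-range coordinates are ignored.
        x0, x1 = max(0, line.x0), min(width, line.x1)
        y0, y1 = max(0, line.y0), min(height, line.y1)
        for y in range(y0, y1):
            for x in range(x0, x1):
                enriched[y][x] = color
    return enriched


# Made-up example: a blank white page with two recognized lines.
white = (255, 255, 255)
page = [[white for _ in range(200)] for _ in range(100)]
lines = [
    TextLine(10, 10, 190, 20, "On the first of May was baptized John"),
    TextLine(10, 30, 190, 40, "son of Peter and Mary"),
]
enriched = enrich_image(page, lines, key_phrases=["was baptized"])
```

The enriched image would then be passed to an image-based layout analysis model in place of the original page. The appeal of this design is that the segmentation model itself does not need to change: the textual cue reaches it through the pixels.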
