Analysis of document snippets as a basis for reconstruction

Markus Diem, Florian Kleber, Robert Sablatnig
DOI: 10.2312/VAST/VAST09/101-108
URL: https://doi.org/10.2312/VAST/VAST09/101-108
Venue (as listed): IEEE Conference on Visual Analytics Science and Technology
Publication date: 2009-09-22
Citations: 4

Abstract

Fragments of documents are common in archaeography, philology, forensics, and related research areas. They are the basis for the subsequent reconstruction process, whose goal is to make the original information, spread over several fragments, visible again. Fragments can originate from paper shredders or hand-torn pages or, in the case of ancient manuscripts, from poor storage conditions and other destructive factors. We can therefore distinguish between "on-purpose" destruction, where the information on the pages is meant to become unreadable, and unintentional "time-induced" destruction of ancient documents. In both cases, the reconstruction of document fragments is an interesting research question. This paper presents a preliminary step for page reconstruction, namely the automatic orientation of snippets, which eliminates rotation as an unknown in the later reconstruction (puzzling) process. Furthermore, features such as paper color and the color of the inks used are analyzed as a pre-classification step to find matching snippets. In the case of "on-purpose" destruction there is no a priori information about which fragment belongs to which page, which makes reconstruction from thousands of fragments of unknown origin difficult because the combinatorial effort explodes (NP-hardness). Preliminary results on orientation and color segmentation are presented; they show that these pre-processing steps can be performed reliably and can be used for reconstruction and snippet classification.
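
The abstract does not say how the color features are computed or compared, so the following is only a minimal sketch of why a color pre-classification step can reduce the matching effort. It assumes, purely for illustration, that each snippet is summarized by the mean RGB value of its background pixels and that snippets are grouped by a coarse quantization of that mean. The function names, the `step` parameter, and the sample pixel values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import comb


def mean_color(pixels):
    """Average a list of (r, g, b) tuples into one (r, g, b) tuple."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def color_bucket(color, step=32):
    """Quantize a color to a coarse grid cell so similar papers share a key."""
    return tuple(int(v // step) for v in color)


def group_by_paper_color(snippets, step=32):
    """Return bucket -> list of snippet ids.

    `snippets` maps a snippet id to its background (paper) pixels.
    """
    groups = defaultdict(list)
    for sid, pixels in snippets.items():
        groups[color_bucket(mean_color(pixels), step)].append(sid)
    return dict(groups)


def candidate_pairs(groups):
    """Count snippet pairs that still need comparing within each group."""
    return sum(comb(len(ids), 2) for ids in groups.values())


# Hypothetical data: two whitish snippets and two yellowish ones.
snippets = {
    "a": [(250, 248, 240), (252, 250, 243)],
    "b": [(249, 247, 238), (251, 249, 241)],
    "c": [(200, 180, 140), (204, 184, 142)],
    "d": [(198, 178, 138), (202, 182, 141)],
}

groups = group_by_paper_color(snippets)
all_pairs = comb(len(snippets), 2)          # pairs without pre-classification
remaining_pairs = candidate_pairs(groups)   # pairs after grouping by color
print(all_pairs, remaining_pairs)
```

Fixed-grid quantization is the simplest possible grouping and can split two very similar colors that fall on opposite sides of a cell boundary; a clustering method in a perceptual color space would be more robust, but it is beyond what this sketch is meant to show.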