Interactive approach to the extraction of logical structures from unformatted document images using a sub-structure model

M. Yamaoka, O. Iwaki, N. Babaguchi, T. Kitahashi
Proceedings of the Fifth International Conference on Document Analysis and Recognition, ICDAR '99 (Cat. No.PR00318), published 1999-09-20. DOI: 10.1109/ICDAR.1999.791755 (https://doi.org/10.1109/ICDAR.1999.791755)
Citations: 1

Abstract

This paper describes a new document analysis method for unformatted documents such as advertisements or catalogs. Conventional model-based approaches to extracting logical structures are hard to apply to such documents because no model of the whole page can be defined. However, these documents lay out the regions that represent individual products in similar ways, so a local model of layout and logical structure can be defined for such a region. This model, which we call a sub-structure model, can be used as a template to extract the logical structures from other regions that represent the same kinds of products. In the proposed system, a sub-structure model is captured through an interactive process with a user. The system was tested on advertisements in Japanese computer magazines, and the experiments show promising results.
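
The template idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes that a sub-structure model is a set of labeled boxes stored in coordinates relative to one product region, that the user supplies those labels once, and that blocks in another region are matched by intersection-over-union. All names (Box, build_model, apply_model, the field labels, the 0.3 threshold) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Degenerate (inverted) boxes count as empty.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def normalize(box: Box, region: Box) -> Box:
    """Express box relative to region, so both axes run from 0 to 1."""
    w = region.x1 - region.x0
    h = region.y1 - region.y0
    return Box((box.x0 - region.x0) / w, (box.y0 - region.y0) / h,
               (box.x1 - region.x0) / w, (box.y1 - region.y0) / h)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (0.0 when they do not overlap)."""
    inter = Box(max(a.x0, b.x0), max(a.y0, b.y0),
                min(a.x1, b.x1), min(a.y1, b.y1)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def build_model(region: Box, labeled: Dict[str, Box]) -> Dict[str, Box]:
    """Assumed interactive step: the user labels blocks in one region."""
    return {label: normalize(box, region) for label, box in labeled.items()}


def apply_model(model: Dict[str, Box], region: Box, blocks: List[Box],
                min_iou: float = 0.3) -> Dict[str, Optional[int]]:
    """For each labeled field, pick the index of the best-overlapping block."""
    result: Dict[str, Optional[int]] = {}
    for label, expected in model.items():
        best_idx, best_score = None, min_iou
        for i, block in enumerate(blocks):
            score = iou(expected, normalize(block, region))
            if score >= best_score:
                best_idx, best_score = i, score
        result[label] = best_idx  # None if no block overlaps enough
    return result


# Region A: labeled once by the user (hypothetical coordinates in pixels).
region_a = Box(0, 0, 200, 100)
model = build_model(region_a, {
    "product_name": Box(10, 5, 190, 25),
    "description": Box(10, 30, 110, 95),
    "price": Box(120, 70, 190, 95),
})

# Region B: another product with a similar layout elsewhere on the page.
region_b = Box(300, 400, 500, 500)
blocks = [
    Box(312, 431, 408, 494),  # looks like a description block
    Box(421, 472, 489, 496),  # looks like a price block
    Box(311, 406, 488, 424),  # looks like a product-name block
]
assignment = apply_model(model, region_b, blocks)
partial = apply_model(model, region_b, blocks[:1])
```

Here assignment maps each label to the index of the matching block in region B, and partial shows that fields with no sufficiently overlapping block are reported as None rather than forced onto a poor match. Storing the model in region-relative coordinates is what lets one labeled region serve as a template for others of a different position or size.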