Document understanding using probabilistic relaxation: application on tables of contents of periodicals

Frank Lebourgeois, H. Emptoz, S. Souafi-Bensafi
Proceedings of Sixth International Conference on Document Analysis and Recognition, 2001-09-10
DOI: 10.1109/ICDAR.2001.953841 (https://doi.org/10.1109/ICDAR.2001.953841)
Citation count: 26

Abstract

This paper describes a statistical model for a document understanding system that uses both text attributes and document layout. Probabilistic relaxation serves as the recognition scheme for finding the hierarchical structure of the logical layout. The approach, commonly used for pixel classification in image analysis, can be applied to classify text blocks into logical classes according to their local compatibility with neighboring blocks at different hierarchical levels. It yields a logical layout that is globally compatible with the training model. We have tested the approach on reading tables of contents of periodicals for document indexing. Probabilistic relaxation has useful properties such as high-speed training and an 'a priori' recognition rate, which indicates the consistency of the model with respect to the features used and the samples chosen from the training set.
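The abstract does not give the update rule, so the following is only a minimal sketch of the classic relaxation-labeling iteration (in the style of Rosenfeld, Hummel and Zucker), not the authors' actual model. It illustrates how a text block's label probabilities can be revised from the labels of neighboring blocks. The function name `relax`, the labels `title` and `author`, the uniform neighbor weighting, and the hand-written compatibility values are all assumptions made for illustration; in the paper the compatibilities would come from training data.

```python
def relax(probs, neighbors, compat, iterations=10):
    """Toy relaxation labeling over text blocks (illustrative assumption).

    probs:     list of dicts, one per block, mapping label -> probability
    neighbors: list of lists; neighbors[i] holds the indices of blocks near i
    compat:    dict mapping (label_here, label_of_neighbor) -> value in [-1, 1]
    """
    labels = list(probs[0])
    for _ in range(iterations):
        updated = []
        for i, p_i in enumerate(probs):
            nbrs = neighbors[i]
            scores = {}
            for a in labels:
                # Average support for label `a` at block i from its neighbors.
                if nbrs:
                    q = sum(
                        compat.get((a, b), 0.0) * probs[j][b]
                        for j in nbrs
                        for b in labels
                    ) / len(nbrs)
                else:
                    q = 0.0
                # Support in [-1, 1] scales the current probability.
                scores[a] = p_i[a] * (1.0 + q)
            total = sum(scores.values())
            if total > 0:
                updated.append({a: s / total for a, s in scores.items()})
            else:
                updated.append(dict(p_i))  # no usable support: keep as is
        probs = updated  # synchronous update of all blocks
    return probs


# Hypothetical two-block example: block 0 looks like a title,
# block 1 is ambiguous. Assume adjacent blocks tend to alternate labels.
initial = [
    {"title": 0.9, "author": 0.1},
    {"title": 0.5, "author": 0.5},
]
neighbors = [[1], [0]]
compat = {
    ("title", "author"): 1.0,
    ("author", "title"): 1.0,
    ("title", "title"): -1.0,
    ("author", "author"): -1.0,
}
result = relax(initial, neighbors, compat, iterations=10)
best = [max(p, key=p.get) for p in result]
print(best)
```

In this toy setup the ambiguous block is pulled toward `author` because its confident neighbor is labeled `title`, which is the kind of local-compatibility reasoning the abstract describes. The hierarchical levels mentioned in the abstract are not modeled here.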