Column segmentation by white space pattern matching

M. Ozaki
Proceedings of 3rd International Conference on Document Analysis and Recognition, 1995-08-14
DOI: 10.1109/ICDAR.1995.598960 (https://doi.org/10.1109/ICDAR.1995.598960)
Citations: 4

Abstract

Model-based column segmentation is described. Sequences of horizontal white space across a column are used as the basic features. Structures of columns in a specific publication are described by two levels of regular expressions: column expressions (CE) and element expressions (EE). Additional spatial constraints for element attributes can be described. A CE represents patterns of element sequences. An EE represents patterns of white space sequences for each element type. Segmentation is performed in three steps: element candidate extraction using EEs, column structure verification using the CE and ranking by comparison with statistical data. Experiments were performed on columns in two different scientific journals. More than 70% of the columns were correctly segmented as the top choice and more than 87% were in the top three choices. When spatial constraints were applied to element attributes, the rate was more than 90%.
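
The three steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the concrete syntax of EEs or the CE, so everything below is an assumption made for illustration: the gap alphabet (`s`, `m`, `l`), the pixel thresholds, the two element types `T` (title) and `P` (paragraph), their patterns, and the expected lengths used for ranking are all hypothetical, not the paper's model.

```python
import re

# Hypothetical quantization of white-space heights (pixels) into symbols:
# 's' = small gap, 'm' = medium gap, 'l' = large gap. Thresholds are assumed.
def quantize(gaps, small=6, medium=14):
    return "".join("s" if g <= small else "m" if g <= medium else "l" for g in gaps)

# Hypothetical element expressions (EE): one regex over gap symbols per element type.
EES = {
    "T": re.compile(r"s{0,2}l"),  # title: at most two small gaps, then a large gap
    "P": re.compile(r"s*[ml]"),   # paragraph: small gaps, then a medium or large gap
}
# Hypothetical column expression (CE): a regex over element-type letters.
CE = re.compile(r"T?P+")
# Hypothetical statistics: expected number of gaps per element type.
EXPECTED = {"T": 2, "P": 4}

def candidates(symbols):
    """Step 1: map each start index to the (type, end) spans that match an EE."""
    out = {}
    for i in range(len(symbols)):
        for name, ee in EES.items():
            for j in range(i + 1, len(symbols) + 1):
                if ee.fullmatch(symbols, i, j):
                    out.setdefault(i, []).append((name, j))
    return out

def segmentations(symbols):
    """Step 2: keep the tilings of the column whose type sequence matches the CE."""
    cands = candidates(symbols)
    results = []

    def walk(pos, path):
        if pos == len(symbols):
            if CE.fullmatch("".join(t for t, _, _ in path)):
                results.append(path)
            return
        for name, end in cands.get(pos, []):
            walk(end, path + [(name, pos, end)])

    walk(0, [])
    return results

def score(seg):
    """Total deviation of element lengths from the expected lengths (lower is better)."""
    return sum(abs((end - start) - EXPECTED[t]) for t, start, end in seg)

def segment(gaps):
    """Step 3: rank the verified segmentations by their deviation score."""
    return sorted(segmentations(quantize(gaps)), key=score)
```

With the assumed settings, the gap heights `[5, 20, 5, 5, 12, 5, 5, 5, 18]` quantize to `slssmsssl`. Two segmentations satisfy the CE, and the one that reads the first element as a title ranks first because its lengths sit closer to the assumed statistics. The ranking function here is a plain absolute-deviation sum, chosen for brevity; the paper's comparison with statistical data may work differently.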