A simple system for table extraction irrespective of boundary thickness and removal of detected spurious lines

S. Deivalakshmi
Venue: 2017 International Conference on Inventive Computing and Informatics (ICICI)
Published: 2017-11-01
DOI: 10.1109/ICICI.2017.8365236 (https://doi.org/10.1109/ICICI.2017.8365236)
Citations: 1

Abstract

Table layout structures of several types are ubiquitous in digitized document images and are characterized by their row and column separators. Besides the desired table lines, a document image may contain undesirable lines introduced by improper scanning, crease formation, accidental remarks, and similar causes. Because tables are an effective way to represent information in document images, they need to be extracted from those images. The proposed method removes unwanted straight lines from binary document images in two steps, without affecting the essential details of the table. In the first step, Mask Processing extracts the necessary details of tables whose rows and columns are separated by lines, along with their respective frames. The second step detects and removes all straight lines using a Pseudo Diagonal Image (PDI) and its rotation. The novelty of the method lies in using a single mask, rather than multiple masks, to detect tables, which lowers the computational complexity. Extraction is also independent of the thickness of the table boundary. The reported results are 93.35% precision and 92.33% recall.
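The abstract does not specify the mask or how the Pseudo Diagonal Image is built, so the following is only a rough sketch of the general idea of finding and erasing straight lines in a binary image regardless of their thickness. It swaps in a plain run-length test (long unbroken runs of foreground pixels along rows and columns) rather than the paper's Mask Processing or PDI. The function names `long_runs`, `line_mask`, `remove_lines` and the `min_len` threshold are assumptions made for illustration.

```python
import numpy as np


def long_runs(binary, min_len):
    """Mark pixels that lie in a horizontal foreground run of >= min_len pixels.

    `binary` is a 2-D array of 0/1 (or bool) values. This is an illustrative
    stand-in, not the mask or PDI procedure described in the abstract.
    """
    out = np.zeros(binary.shape, dtype=bool)
    for r in range(binary.shape[0]):
        # Pad with background so every run has a detectable start and end.
        padded = np.concatenate(([0], binary[r].astype(np.int8), [0]))
        diff = np.diff(padded)
        starts = np.flatnonzero(diff == 1)   # index of first pixel of each run
        ends = np.flatnonzero(diff == -1)    # index one past the last pixel
        for s, e in zip(starts, ends):
            if e - s >= min_len:
                out[r, s:e] = True
    return out


def line_mask(binary, min_len):
    """Union of long horizontal runs and long vertical runs.

    A thick line is just several adjacent long runs, so every one of its
    rows (or columns) is marked and no thickness parameter is needed.
    """
    horizontal = long_runs(binary, min_len)
    vertical = long_runs(binary.T, min_len).T
    return horizontal | vertical


def remove_lines(binary, min_len):
    """Return a copy of `binary` with all detected line pixels set to 0."""
    cleaned = binary.copy()
    cleaned[line_mask(binary, min_len)] = 0
    return cleaned
```

Calling `remove_lines(img, min_len)` on a 0/1 array erases every row or column run of at least `min_len` pixels and leaves shorter strokes, such as text, in place. Unlike the method in the abstract, this sketch does not distinguish table lines from spurious ones; that distinction would come from the table-extraction step, which is not reproduced here.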