A simple system for table extraction irrespective of boundary thickness and removal of detected spurious lines
Author: S. Deivalakshmi
Published in: 2017 International Conference on Inventive Computing and Informatics (ICICI)
Publication date: 2017-11-01
DOI: 10.1109/ICICI.2017.8365236 (https://doi.org/10.1109/ICICI.2017.8365236)
Citations: 1
Abstract
Table layouts of many kinds are ubiquitous in digitized document images and are characterized by their row and column separators. In addition to the desired table lines, a document image may contain undesirable lines introduced by improper scanning, crease formation, accidental remarks, and similar defects. Because tables are an effective way of representing information in document images, they need to be extracted from those images. The proposed method removes unwanted straight lines from binary document images in a two-step process, without affecting the essential details of the table. In the first step, Mask Processing extracts the necessary details of the tables: the lines that act as row and column separators, along with their respective frames. In the second step, all straight lines are detected and removed using a Pseudo Diagonal Image (PDI) and its rotation. The novelty of the method lies in using a single mask to detect tables instead of multiple masks, which lowers the computational complexity of processing. The extraction is also independent of the thickness of the table boundary. The method achieves 93.35% precision and 92.33% recall.
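The abstract does not specify the mask or how the PDI is constructed, so neither can be reproduced here. As a rough illustration of the general idea only (picking out long straight lines in a binary image regardless of their thickness), the sketch below uses plain run-length filtering as a stand-in technique. The function names `keep_long_runs` and `straight_line_mask` and the `min_len` threshold are hypothetical and do not come from the paper.

```python
import numpy as np


def keep_long_runs(binary: np.ndarray, min_len: int) -> np.ndarray:
    """Keep foreground pixels that lie in a horizontal run of >= min_len pixels.

    Stand-in technique (run-length filtering), NOT the paper's mask or PDI.
    `binary` is a 2-D array of 0s and 1s, with 1 as foreground.
    """
    out = np.zeros_like(binary)
    for r, row in enumerate(binary):
        # Pad with zeros so every run has an explicit start and end.
        padded = np.concatenate(([0], row.astype(np.int8), [0]))
        changes = np.diff(padded)
        starts = np.flatnonzero(changes == 1)   # index of first pixel in a run
        ends = np.flatnonzero(changes == -1)    # index one past the last pixel
        for s, e in zip(starts, ends):
            if e - s >= min_len:
                out[r, s:e] = 1
    return out


def straight_line_mask(binary: np.ndarray, min_len: int) -> np.ndarray:
    """Mark pixels on long horizontal or long vertical runs.

    A thick line is a stack of adjacent long runs, so every one of its
    rows (or columns) passes the length test and thickness does not matter.
    """
    horizontal = keep_long_runs(binary, min_len)
    vertical = keep_long_runs(binary.T, min_len).T
    return horizontal | vertical


if __name__ == "__main__":
    img = np.zeros((10, 10), dtype=np.uint8)
    img[2:4, :] = 1    # a horizontal line, two pixels thick
    img[6, 4:6] = 1    # a short stroke standing in for text
    lines = straight_line_mask(img, min_len=5)
    print(lines)
```

In this toy image the two-pixel-thick line is kept whole and the short stroke is dropped. Telling table separators apart from spurious lines, which is the actual contribution described above, would need further logic that this sketch does not attempt.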