Parameter-Free Table Detection Method
Laiphangbam Melinda, C. Bhagvati
2019 International Conference on Document Analysis and Recognition (ICDAR), 2019-09-01
DOI: 10.1109/ICDAR.2019.00079 (https://doi.org/10.1109/ICDAR.2019.00079)

In this paper, we propose two parameter-free table detection methods: one for closed tables and the other for open tables. The unifying idea is multigaussian analysis. Multigaussian analysis of text height histograms classifies the document content into text and non-text blocks. Closed tables are classified as non-text, and identifying them among the non-text blocks is similar to many earlier methods that remove the separators. Because of multigaussian analysis, we do not need any parameters to identify rows and columns and to discriminate them from text blocks. Open tables are initially classified as text blocks and are detected by extending the multigaussian analysis to the heights and widths of text blocks. Multigaussian analysis groups the text blocks into three categories, which are used to classify table cells and distinguish them from ordinary text blocks. Table blocks are then merged to obtain the table region. Evaluation on newspapers in various Indic scripts and on the ICDAR 2013 table competition dataset shows that our methods achieve more than 90% in table recognition. The strength of our algorithm is that it is parameter-free and requires no training dataset.
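
The idea of separating text from non-text blocks by the modes of a height distribution can be illustrated with a small sketch. The abstract does not say how the Gaussians are fitted, so the code below is an assumption rather than the authors' procedure: it fits a one-dimensional Gaussian mixture with plain expectation-maximization and assigns each height to its most likely component. The function names, the component count `k`, the iteration count, and the sample heights are all hypothetical and chosen only for illustration; in particular, passing `k` by hand does not reproduce the parameter-free behaviour described above.

```python
import math


def fit_gaussian_mixture_1d(values, k=2, iterations=100):
    """Fit k one-dimensional Gaussians to `values` with plain EM.

    Illustrative stand-in for "multigaussian analysis"; not the paper's code.
    Returns a list of (weight, mean, std) tuples sorted by mean.
    """
    data = sorted(values)
    n = len(data)
    # Deterministic initialisation: place the means at evenly spaced quantiles.
    means = [data[int((i + 0.5) * n / k)] for i in range(k)]
    spread = (data[-1] - data[0]) / (2 * k) or 1.0
    stds = [spread] * k
    weights = [1.0 / k] * k
    floor = 1e-3  # lower bound on std so a component cannot collapse to a point

    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                for w, m, s in zip(weights, means, stds)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and std of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component received no mass; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), floor)
    return sorted(zip(weights, means, stds), key=lambda c: c[1])


def assign_component(x, components):
    """Return the index of the component with the highest weighted density at x."""
    def density(c):
        w, m, s = c
        return w * math.exp(-0.5 * ((x - m) / s) ** 2) / s
    return max(range(len(components)), key=lambda j: density(components[j]))


# Hypothetical block heights in pixels: short text lines and a few tall blocks.
heights = [10, 11, 12, 12, 13, 11, 12, 10, 13, 12, 90, 100, 110, 95, 105]
components = fit_gaussian_mixture_1d(heights, k=2)
labels = [assign_component(h, components) for h in heights]
```

In this toy input the component with the smaller mean collects the text-line heights and the other collects the tall blocks. Grouping text blocks into three categories by height and width, as described for open tables, could be sketched the same way by calling the function with `k=3` on each dimension, again only as an approximation of the described method.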