Xinyi Shen, Lingjun Kong, Yunchao Bao, Yaowei Zhou, Weiguang Liu
Published in: 2022 3rd Information Communication Technologies Conference (ICTC)
Publication date: 2022-05-06
DOI: 10.1109/ictc55111.2022.9778621
Citations: 2
RCANet: A Rows and Columns Aggregated Network for Table Structure Recognition
Most existing table structure recognition methods fall into two major categories: those that detect table borders and those that detect table rows and columns. Detecting table borders produces an imbalance between positive and negative samples, because table borders contain very few pixels. Detecting rows and columns avoids this imbalance, but some studies simplify the task into column-by-column and row-by-row prediction, which introduces the problem of a large error tolerance. To solve this problem, we propose two modules, the Rows Aggregated (RA) module and the Columns Aggregated (CA) module. First, feature slicing and tiling is used to make an approximate prediction of the rows and columns, which addresses the large error tolerance. Second, the row and column information is further extracted by computing channel attention maps. Finally, we use RA and CA to build a semantic segmentation network, called the Rows and Columns Aggregated Network (RCANet), which performs row segmentation and column segmentation. We generate row and column masks for the ICDAR2013 dataset and evaluate the model on them. Experiments show that the proposed model outperforms the segmentation model based on detecting table rows and columns, with average precision, recall, and F1 score higher by 2.08%, 3.21%, and 2.45%, respectively.
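The abstract reports precision, recall, and F1 but does not state how predictions are matched against the ground truth. As a reminder of how the three numbers relate, here is a minimal sketch that assumes a pixel-level comparison of two binary masks. The function name `precision_recall_f1` and the toy 4x4 masks are hypothetical illustrations, not the authors' evaluation protocol or data.

```python
def precision_recall_f1(predicted, target):
    """Pixel-level precision, recall and F1 for two binary masks of equal shape.

    Assumption: masks are lists of rows holding 0/1 values. This is an
    illustrative protocol, not the one used in the paper.
    """
    tp = fp = fn = 0
    for pred_row, true_row in zip(predicted, target):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1          # predicted foreground, truly foreground
            elif p and not t:
                fp += 1          # predicted foreground, truly background
            elif t and not p:
                fn += 1          # missed foreground pixel
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical column mask: the left two columns of a 4x4 image are foreground.
target = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
# Hypothetical prediction with one extra pixel and one missed pixel.
predicted = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]

precision, recall, f1 = precision_recall_f1(predicted, target)
print(precision, recall, f1)
```

In this toy case there are 7 true positives, 1 false positive, and 1 false negative, so all three scores equal 7/8. F1 is the harmonic mean of precision and recall, which is why a model has to improve both to raise it.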