A Structure-Focused Deep Learning Approach for Table Recognition from Document Images
Mengxi Zhou, R. Ramnath
2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC), 2022-06-01
DOI: 10.1109/COMPSAC54236.2022.00105
In this paper, we present a nuanced exploration of deep learning (DL) techniques for extracting structural information from document images produced by the digitization of business processes. The driving example is the extraction of table columns and rows using a simple stacked CNN architecture combined with several ensemble techniques. In addition, the component models of the ensemble are diversified by training them on datasets created by applying a “semantics-preserving” transformation to the base dataset. This transformation also aims to make recognition easier for certain noisy images that are commonly encountered in practice and are otherwise hard to recognize. Our experiments demonstrate how DL techniques can be applied and combined to measurably improve the accuracy of structure extraction.
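To make the ensemble idea concrete, the following is a minimal sketch of one way the outputs of several component models could be combined. It is not the authors' method: the abstract does not specify the output format or the combination rule. The sketch assumes each component model emits, for every pixel column of the table image, a probability that the column is a gap between table columns, and it assumes simple averaging followed by thresholding. The function name `ensemble_column_separators` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def ensemble_column_separators(profiles, threshold=0.5):
    """Average per-model separator probabilities and return separator ranges.

    profiles: list of equal-length 1-D arrays, one per (hypothetical)
    component model. Each value is the assumed probability that the
    corresponding pixel column is a gap between table columns.
    Returns a list of (start, end) index pairs, with end exclusive.
    """
    # Assumed combination rule: unweighted mean over the component models.
    mean_profile = np.mean(np.stack(profiles), axis=0)
    mask = mean_profile >= threshold

    # Pad with False so runs touching either border are still detected.
    padded = np.concatenate(([False], mask, [False])).astype(int)
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)   # first index of each True run
    ends = np.flatnonzero(diff == -1)    # one past the last index of each run
    return [(int(s), int(e)) for s, e in zip(starts, ends)]


# Three made-up probability profiles standing in for three component models.
profiles = [
    np.array([0.1, 0.9, 0.8, 0.2, 0.1, 0.7]),
    np.array([0.2, 0.7, 0.9, 0.1, 0.3, 0.9]),
    np.array([0.0, 0.8, 0.4, 0.6, 0.1, 0.2]),
]
print(ensemble_column_separators(profiles))  # [(1, 3), (5, 6)]
```

In this toy input, the third model disagrees with the others at indices 2 and 3, but averaging smooths out the disagreement. This is the intuition behind diversifying the component models, although the paper's actual ensemble techniques may differ.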