Authors: Anand Gupta, Devendra Tiwari, Priyanshi Gupta, Ankit Kulshreshtha
Published in: 2016 Ninth International Conference on Contemporary Computing (IC3), 2016-08-01
DOI: 10.1109/IC3.2016.7880258 (https://doi.org/10.1109/IC3.2016.7880258)
CDIA-DS: A framework for efficient reconstruction of compound document image using data structure
With advances in image acquisition technology, extensive research is being conducted on converting images of paper documents into an editable electronic format. Various techniques have been developed to extract Text, Table, or Figure regions from a document image. However, our review of past research suggests that these techniques do not handle documents containing a combination of two or more such regions. Moreover, we believe that, to facilitate document recreation, the extracted information must be organized according to its semantic layout and formatting. We therefore advocate a combined technique that extracts each of these regions and structures the extracted information efficiently. In this paper, we propose an efficient two-stage framework, CDIA-DS (Compound Document Image Analysis-Data Structure), to meet these needs. In the first stage, the regions in the document image are identified and classified as Views (Text/Table/Figure). In the second stage, the Views are organized in the proposed tree-based structure, whose leaf nodes are Views and whose parent nodes are Layouts (arrangements of one or more Views). Finally, experiments are conducted to examine the efficiency of CDIA-DS using the proposed data structure.
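
The tree of Views and Layouts described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `View` and `Layout` follow the abstract's terminology, but the fields (`kind`, `content`, `arrangement`), the nesting of one Layout inside another, and the helper `collect_views` are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class View:
    """Leaf node: one classified region of the document image."""
    kind: str      # "Text", "Table" or "Figure", as named in the abstract
    content: str   # hypothetical placeholder for the extracted content


@dataclass
class Layout:
    """Parent node: an arrangement of one or more Views.

    Allowing a Layout to contain other Layouts is an assumption of this sketch.
    """
    arrangement: str  # hypothetical label, e.g. "vertical" or "horizontal"
    children: List[Union["Layout", View]] = field(default_factory=list)


def collect_views(node):
    """Return the leaf Views under a node in depth-first, left-to-right order."""
    if isinstance(node, View):
        return [node]
    views = []
    for child in node.children:
        views.extend(collect_views(child))
    return views


# A page with a paragraph above a table and a figure placed side by side.
page = Layout("vertical", [
    View("Text", "Introductory paragraph"),
    Layout("horizontal", [
        View("Table", "Results table"),
        View("Figure", "Block diagram"),
    ]),
])

kinds = [v.kind for v in collect_views(page)]
```

Walking the tree depth-first yields the Views in reading order, which is one plausible way such a structure could drive document recreation; the paper itself may organize or traverse the tree differently.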