Using OCR to automate document conversion to LaTeX
Shashwat Pandey, Aditya Rohatgi
2021 International Conference on Computational Intelligence and Computing Applications (ICCICA), published 2021-11-26
DOI: 10.1109/iccica52458.2021.9697266 (https://doi.org/10.1109/iccica52458.2021.9697266)
Converting a physical document into a digital version still leaves several gaps. Few solutions offer end-to-end conversion of hard copies containing images, graphs, tables, and other details into soft copies. To this end, we attempt to develop a computationally efficient algorithm that converts a document into its digital version through a LaTeX representation of the hard copy. Our research addresses the problem of using OCR techniques to convert an image of a typeset document into LaTeX. This work serves as a proof of concept that equation layouts can be learned and that individual character recognition is possible with relatively unsophisticated OCR techniques. Breaking the problem down step by step helped modularize and compartmentalize the tasks, so that each module can focus on the types of issues that occur at its own level of granularity.