Rustem Damirovich Saitgareev, B. R. Giniatullin, Vladislav Yurievich Toporov, Artur Aleksandrovich Atnagulov, Farid Radikovich Aglyamov
{"title":"Data Extraction from Similarly Structured Scanned Documents","authors":"Rustem Damirovich Saitgareev, B. R. Giniatullin, Vladislav Yurievich Toporov, Artur Aleksandrovich Atnagulov, Farid Radikovich Aglyamov","doi":"10.26907/1562-5419-2021-24-4-667-688","DOIUrl":null,"url":null,"abstract":"Currently, the major part of transmitted and stored data is unstructured, and the amount of unstructured data is growing rapidly each year, although it is hardly searchable, unqueryable, and its processing is not automated. At the same time, there is a growth of electronic document management systems. This paper proposes a solution for extracting data from paper documents considering their structure and layout based on document photos. By examining different approaches, including neural networks and plain algorithmic methods, we present their results and discuss them.","PeriodicalId":262909,"journal":{"name":"Russian Digital Libraries Journal","volume":"181 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Russian Digital Libraries Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.26907/1562-5419-2021-24-4-667-688","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Currently, the major part of transmitted and stored data is unstructured, and the amount of unstructured data is growing rapidly each year, although it is hardly searchable, unqueryable, and its processing is not automated. At the same time, there is a growth of electronic document management systems. This paper proposes a solution for extracting data from paper documents considering their structure and layout based on document photos. By examining different approaches, including neural networks and plain algorithmic methods, we present their results and discuss them.