M. Gradinaru, Andrei Negru, C. Boiangiu, N. Tarbă, M. Voncila, Răzvan Adrian Deaconescu
Published in: 2022 21st RoEduNet Conference: Networking in Education and Research (RoEduNet)
Publication date: 2022-09-15
DOI: https://doi.org/10.1109/RoEduNet57163.2022.9921092
Complete OCR Solution for Image Analysis of World War 2 Documents
Optical Character Recognition (OCR) comprises techniques focused mainly on document image analysis. Besides speeding up everyday procedures considerably, OCR plays an important role in preserving historical sources of information. Most World War 2 (WW2) documents are of great importance, with applications in virtual archives, museums, and research. This calls for an efficient, yet not aggressive, transcription method based on OCR tools. This paper describes one such approach. It focuses on extracting information from documents that are degraded by age but have simple structures, mainly split into paragraphs, such as letters and military reports. The approach combines the results of multiple OCR engines, with the goal of outperforming each individual engine.
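The abstract does not say how the engine outputs are combined, so the following is only an illustrative sketch of one common option: a per-position majority vote over whitespace-separated tokens. The function name `vote_tokens` and the sample strings are hypothetical, and the sketch assumes the engine outputs are already roughly aligned token by token, which a real pipeline would have to ensure with a proper alignment step.

```python
from collections import Counter
from itertools import zip_longest


def vote_tokens(engine_outputs):
    """Combine several OCR transcriptions of the same text by majority vote.

    Assumption (not from the paper): outputs are aligned token by token.
    Ties are resolved in favour of the engine listed first.
    """
    token_lists = [text.split() for text in engine_outputs]
    combined = []
    # Walk the outputs position by position; shorter outputs are padded with None.
    for candidates in zip_longest(*token_lists, fillvalue=None):
        present = [tok for tok in candidates if tok is not None]
        counts = Counter(present)
        # max() returns the first maximal key, and Counter keeps insertion
        # order, so the earliest engine wins when counts are equal.
        winner = max(counts, key=counts.get)
        combined.append(winner)
    return " ".join(combined)


if __name__ == "__main__":
    # Hypothetical outputs from three engines for the same line of text.
    outputs = [
        "The enerny advanced at dawn",
        "The enemy advanced at dawn",
        "The enemy advanccd at dawn",
    ]
    print(vote_tokens(outputs))
```

In this toy case each engine makes a different single-word error, so the vote recovers a line that no single engine with an error produced on its own. That is the intuition behind expecting a combination to outperform individual engines, although it only holds when the engines' errors are not strongly correlated.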