Optical character recognition guided image super resolution

Proceedings of the 22nd ACM Symposium on Document Engineering. Pub Date: 2022-09-20. DOI: 10.1145/3558100.3563841

Philipp Hildebrandt, Maximilian Schulze, S. Cohen, Vanja Doskoc, Raid Saabni, Tobias Friedrich


Citations: 0

Abstract

Recognizing disturbed text in real-life images is a difficult problem, as information that is missing due to low resolution or out-of-focus text has to be recreated. Combining text super-resolution and optical character recognition deep learning models can be a valuable tool to enlarge and enhance text images for better readability, as well as recognize text automatically afterwards. We achieve improved peak signal-to-noise ratio and text recognition accuracy scores over a state-of-the-art text super-resolution model TBSRN on the real-world low-resolution dataset TextZoom while having a smaller theoretical model size due to the usage of quantization techniques. In addition, we show how different training strategies influence the performance of the resulting model.
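The abstract reports two evaluation metrics, peak signal-to-noise ratio and text recognition accuracy. The sketch below illustrates how such metrics are commonly computed; it is not the authors' evaluation code. The function names `psnr` and `recognition_accuracy` are hypothetical, the 8-bit peak value of 255 is an assumption, and treating accuracy as case-sensitive exact string match is likewise an assumption rather than something the abstract specifies.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images.

    max_value is the largest possible pixel value (255 assumes 8-bit images).
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


def recognition_accuracy(predictions: Sequence[str],
                         labels: Sequence[str]) -> float:
    """Fraction of predicted strings that exactly match their labels.

    Exact, case-sensitive matching is an assumption made for this sketch.
    """
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for p, t in zip(predictions, labels) if p == t)
    return correct / len(labels)
```

In this formulation a higher PSNR means the super-resolved image is closer, pixel by pixel, to the high-resolution reference, while recognition accuracy measures whether an OCR model reads the enhanced image correctly. The two can diverge, which is one reason to report both.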

