Authors: Romain Karpinski, A. Belaïd
Venue: 2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)
Publication date: 2018-03-01
DOI: 10.1109/ASAR.2018.8480217 (https://doi.org/10.1109/ASAR.2018.8480217)
Generic method for grid line detection and removal in scanned documents
The detection and extraction of writing grid lines (WGL) in document images is an important task for a wide variety of systems. It is a pre-processing step that cleans up the document image to make recognition easier. Much work has addressed staff line extraction in the context of Optical Music Recognition, and two competitions were recently organized at the 2011 and 2013 ICDAR/GREC conferences. The method proposed in this paper aims to remove WGL without degrading the content. It is based on estimating line_space (the inter-line spacing) and line_height, and on using run-length segments to locate WGL points. These points are then grouped to form longer lines. Missing points are estimated using a linear model and the context of adjacent lines. We show that our method depends neither on the nature of the writing (printed or handwritten) nor on the notation or script (musical symbols, Latin or Arabic writing). The results obtained are close to the state of the art on undeformed documents. Furthermore, on our grid image datasets, our method performs better than the other methods available to us that we tested.
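The abstract does not say how line_space and line_height are estimated or how the linear model is fitted, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a binarized image (1 = ink, 0 = background) that contains both ink and background, takes the most frequent vertical black and white run lengths as line_height and line_space (a common heuristic in staff line removal), and fits a straight line by least squares to fill gaps. The function names `vertical_runs`, `estimate_line_metrics` and `fill_missing_points` are hypothetical, and the sketch ignores the context of adjacent lines that the paper mentions.

```python
import numpy as np


def vertical_runs(column):
    """Return (values, lengths) of the consecutive runs in a 1-D 0/1 array."""
    column = np.asarray(column)
    # Indices where the pixel value changes mark the start of a new run.
    change = np.flatnonzero(np.diff(column)) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [column.size]))
    return column[starts], ends - starts


def estimate_line_metrics(binary):
    """Estimate (line_height, line_space) from vertical run lengths.

    Assumption (not from the paper): the most frequent black run length
    approximates line_height and the most frequent white run length
    approximates line_space.
    """
    black, white = [], []
    for x in range(binary.shape[1]):
        values, lengths = vertical_runs(binary[:, x])
        black.extend(lengths[values == 1])
        white.extend(lengths[values == 0])
    line_height = int(np.bincount(black).argmax())
    line_space = int(np.bincount(white).argmax())
    return line_height, line_space


def fill_missing_points(xs, ys, missing_xs):
    """Estimate y at missing x positions with a least-squares line y = a*x + b."""
    slope, intercept = np.polyfit(xs, ys, deg=1)
    return slope * np.asarray(missing_xs, dtype=float) + intercept
```

On a synthetic page with 2-pixel-thick lines separated by 8 background pixels, `estimate_line_metrics` returns `(2, 8)`. On real scans the histogram mode can be skewed by dense text, so the estimates would need checking against the data.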