D. Pant, Dibyendu Talukder, Aaditeshwar Seth, Dinesh Pant, Rohit Singh, Brejesh Dua, Rachit Pandey, Srirama Maruthi, M. Johri, Chetan Arora
Title: Robust OCR Pipeline for Automated Digitization of Mother and Child Protection Cards in India
Journal: ACM Journal on Computing and Sustainable Societies
Publication date: 2023-08-05
Publication type: Journal Article
DOI: 10.1145/3608114 (https://doi.org/10.1145/3608114)
Citations: 0
Abstract
The Universal Immunization Programme in India has a mandate to fully vaccinate all of the 27 million children born in India each year. Frontline health workers record vaccination doses on standardized paper-based Mother and Child Protection (MCP) cards, which data entry operators then digitize manually, resulting in poor data quality, delays, and a significant expenditure of time and resources. In this article, we focus on Optical Character Recognition (OCR)-based automated digitization of MCP card images captured through a smartphone application that we developed. Using the standardized MCP card template, which is available a priori, we register the card images and perform OCR on the extracted regions of interest (ROIs). Because cards with curvature or torn edges yielded poor ROIs, we built a global–local alignment technique that first approximates each ROI using a global homography and then refines it using a local homography, which improves accuracy. Our pipeline achieves a character-level accuracy of 98.73% on our dataset, compared with 75.02% for Google Cloud Vision and 79.26% for Azure OCR. We also describe our field-testing experience, in which the digitized MCP card images were used to provide useful features in the smartphone application for health workers conducting vaccination sessions.
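The global-then-local alignment idea can be illustrated with a minimal sketch. It is not the article's implementation, and it rests on several assumptions: each homography is computed exactly from four point correspondences, the local step uses landmarks near the ROI (for example, printed cell corners) that have already been detected in the photo, and all coordinates and function names (`homography_from_4`, `warp`, `refine_roi`) are hypothetical. A practical pipeline would more likely estimate homographies robustly from many feature matches, for instance with OpenCV's `cv2.findHomography`.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_4(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """3x3 homography (with h33 fixed to 1) mapping four src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp(h: Matrix, p: Point) -> Point:
    """Apply homography h to point p, including the perspective division."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def refine_roi(template_roi: Sequence[Point], h_global: Matrix,
               template_landmarks: Sequence[Point],
               detected_landmarks: Sequence[Point]) -> List[Point]:
    """Global step: project the template ROI into the photo with h_global.
    Local step: fit a second homography from where the nearby landmarks were
    predicted to land to where they were actually detected, then apply it."""
    predicted = [warp(h_global, p) for p in template_landmarks]
    h_local = homography_from_4(predicted, detected_landmarks)
    return [warp(h_local, warp(h_global, p)) for p in template_roi]


# Hypothetical coordinates, for illustration only.
template_corners = [(0, 0), (100, 0), (100, 100), (0, 100)]
photo_corners = [(10, 20), (210, 20), (210, 220), (10, 220)]
h_global = homography_from_4(template_corners, photo_corners)

roi = [(20, 20), (40, 20), (40, 30), (20, 30)]         # ROI in the template
landmarks = [(15, 15), (45, 15), (45, 35), (15, 35)]   # landmarks in the template
detected = [(43, 48), (103, 48), (103, 88), (43, 88)]  # same landmarks in the photo

refined = refine_roi(roi, h_global, landmarks, detected)
print([(round(x, 3), round(y, 3)) for x, y in refined])
```

In this toy setup the global homography alone places the landmarks a few pixels from where they were detected, which stands in for the residual error that curvature or torn edges would cause. The local homography absorbs that residual for the ROI in question without disturbing the alignment of other regions of the card.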