Shreddr: pipelined paper digitization for low-resource organizations

ACM DEV '12 Pub Date : 2012-03-11 DOI:10.1145/2160601.2160605

Kuang Chen, Akshay Kannan, Yoriyasu Yano, J. Hellerstein, Tapan S. Parikh


Citations: 53

Abstract

For low-resource organizations working in developing regions, infrastructure and capacity for data collection have not kept pace with the increasing demand for accurate and timely data. Despite continued emphasis and investment, many data collection efforts still suffer from delays, inefficiency and difficulty maintaining quality. Data is often still "stuck" on paper forms, making it unavailable to decision-makers and operational staff. We apply techniques from computer vision, database systems and machine learning, and leverage new infrastructure -- online workers and mobile connectivity -- to redesign data entry with high data quality. Shreddr delivers a self-serve, low-cost, on-demand data entry service that allows low-resource organizations to quickly transform stacks of paper into structured electronic records through a novel combination of optimizations: batch processing and compression techniques from database systems, automatic document processing using computer vision, and value verification through crowd-sourcing. In this paper, we describe Shreddr's design and implementation, and measure system performance with a large-scale evaluation in Mali, where Shreddr was used to enter over a million values from 36,819 pages. Within this case study, we found that Shreddr can significantly decrease the effort and cost of data entry, while maintaining a high level of quality.
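The abstract names two of the pipeline's ideas, batch processing and crowd-sourced value verification, without giving implementation details. The following is only a minimal illustrative sketch of how those two ideas could fit together, not the authors' implementation. The `Snippet` type, the `batch_by_field` and `verify` functions, and the agreement threshold `min_agree` are all hypothetical names and assumptions introduced here.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    """One cropped form field (hypothetical structure, not from the paper)."""
    page_id: str
    field: str


def batch_by_field(snippets):
    """Group snippets by field name so one batch holds many instances of the same field."""
    batches = defaultdict(list)
    for snippet in snippets:
        batches[snippet.field].append(snippet)
    return dict(batches)


def verify(entries, min_agree=2):
    """Return the most common normalized entry if at least `min_agree`
    independent entries match it; otherwise return None (assumed rule)."""
    if not entries:
        return None
    normalized = (entry.strip().lower() for entry in entries)
    value, count = Counter(normalized).most_common(1)[0]
    return value if count >= min_agree else None


if __name__ == "__main__":
    snippets = [Snippet("p1", "age"), Snippet("p1", "village"), Snippet("p2", "age")]
    batches = batch_by_field(snippets)
    print(sorted(batches))               # ['age', 'village']
    print(verify(["34", "34 ", "84"]))   # 34
    print(verify(["34", "84"]))          # None
```

In this sketch a value with no agreeing pair of entries comes back as `None`, which a caller could treat as a signal to request another independent entry. That escalation policy is likewise an assumption, not something the abstract states.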

