1990 US Census Form Recognition Using CTC Network, WFST Language Model, and Surname Correction
Huaigu Cao, Stephen Rawls, P. Natarajan
2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), published 2017-11-01. DOI: 10.1109/ICDAR.2017.163
This paper presents a system for transcribing 1990 US census forms. Extracting information from census forms is useful for building genealogy databases and for better archiving of the forms. We trained LSTM recurrent neural networks with connectionist temporal classification (CTC) as our OCR engine, and we addressed the main language-modeling challenge by expressing syntactic constraints as weighted finite-state transducer (WFST) language models. The paper makes two main technical contributions. First, our system is the first to automatically transcribe 1990 US census forms with compelling accuracy, and the information it extracts can support downstream studies. Second, we designed a novel post-processing algorithm that significantly improves the recognition accuracy of surnames.
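The abstract names CTC-trained LSTM networks as the OCR engine but does not describe how their output becomes text. Purely as an illustration, the sketch below shows CTC best-path (greedy) decoding: take the most probable label at each frame, merge consecutive repeats, then delete blanks. It assumes label 0 is the CTC blank and that label k > 0 maps to `alphabet[k - 1]`; the function name `ctc_best_path` is hypothetical. It is not the authors' decoder, since their system constrains recognition with WFST language models, which this sketch does not model.

```python
from typing import List, Optional, Sequence


def ctc_best_path(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Greedy (best-path) CTC decoding, a minimal sketch.

    Assumption: frame_probs[t][k] is the posterior of label k at frame t,
    label 0 is the CTC blank, and label k > 0 stands for alphabet[k - 1].
    """
    # Step 1: pick the arg-max label at every frame.
    best = [max(range(len(p)), key=lambda k: p[k]) for p in frame_probs]

    # Steps 2 and 3: collapse consecutive repeats, then drop blanks.
    out: List[str] = []
    prev: Optional[int] = None
    for k in best:
        if k != prev and k != 0:
            out.append(alphabet[k - 1])
        prev = k
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical frame-level path: blanks (0) separate the two N's,
    # so they survive the repeat-collapsing step.
    path = [1, 1, 0, 2, 2, 0, 2, 1, 0]
    probs = [[0.7 if k == i else 0.1 for k in range(4)] for i in path]
    print(ctc_best_path(probs, "ANB"))  # prints ANNA
```

Best-path decoding is the simplest CTC decoder. A search constrained by a language model, for example one that composes the network output with a WFST, can select a label sequence different from the per-frame arg-max, which is what makes syntactic constraints useful for structured form fields.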