{"title":"面向改进表单处理的错误源分析","authors":"U. Bhattacharya, Bikash Shaw, S. K. Parui","doi":"10.1109/ICIT.2006.30","DOIUrl":null,"url":null,"abstract":"Automatic form processing is an important application of document analysis subject. Such a system requires to be trained and tested on a standard database of forms collected from real-life. However, to the best of our knowledge, the only such available databases are NIST Special Databases. These databases consist of images of synthesized form documents. On the other hand, recently we developed a form database, samples of which had been taken from the real-life. ISIFormReader, a form processing system, also developed recently, has been tested using these real-life samples. An intensive study of the processing errors showed that writers' idiosyncracies are one of the major reasons of such errors as analyzed in U. Bhattacharya, et al., (2006). In the present paper, we investigated various other sources of errors which together cause a major concern. These include sample forms which are low in contrast, noisy, smudgy, skewed, scaled disturbing its aspect ratio and so on. An analysis of errors due to similar such sources is important towards development of an improved form processing system.","PeriodicalId":161120,"journal":{"name":"9th International Conference on Information Technology (ICIT'06)","volume":"101 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Analysis of Error Sources Towards Improved Form Processing\",\"authors\":\"U. Bhattacharya, Bikash Shaw, S. K. Parui\",\"doi\":\"10.1109/ICIT.2006.30\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatic form processing is an important application of document analysis subject. Such a system requires to be trained and tested on a standard database of forms collected from real-life. However, to the best of our knowledge, the only such available databases are NIST Special Databases. These databases consist of images of synthesized form documents. On the other hand, recently we developed a form database, samples of which had been taken from the real-life. ISIFormReader, a form processing system, also developed recently, has been tested using these real-life samples. An intensive study of the processing errors showed that writers' idiosyncracies are one of the major reasons of such errors as analyzed in U. Bhattacharya, et al., (2006). In the present paper, we investigated various other sources of errors which together cause a major concern. These include sample forms which are low in contrast, noisy, smudgy, skewed, scaled disturbing its aspect ratio and so on. An analysis of errors due to similar such sources is important towards development of an improved form processing system.\",\"PeriodicalId\":161120,\"journal\":{\"name\":\"9th International Conference on Information Technology (ICIT'06)\",\"volume\":\"101 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2006-12-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"9th International Conference on Information Technology (ICIT'06)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICIT.2006.30\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"9th International Conference on Information Technology (ICIT'06)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIT.2006.30","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Analysis of Error Sources Towards Improved Form Processing
Automatic form processing is an important application of document analysis subject. Such a system requires to be trained and tested on a standard database of forms collected from real-life. However, to the best of our knowledge, the only such available databases are NIST Special Databases. These databases consist of images of synthesized form documents. On the other hand, recently we developed a form database, samples of which had been taken from the real-life. ISIFormReader, a form processing system, also developed recently, has been tested using these real-life samples. An intensive study of the processing errors showed that writers' idiosyncracies are one of the major reasons of such errors as analyzed in U. Bhattacharya, et al., (2006). In the present paper, we investigated various other sources of errors which together cause a major concern. These include sample forms which are low in contrast, noisy, smudgy, skewed, scaled disturbing its aspect ratio and so on. An analysis of errors due to similar such sources is important towards development of an improved form processing system.