A. Downton, A. Tams, G. Wells, A. C. Holmes, S. Lucas, G. Beccaloni, M. Scoble, G. S. Robinson
Constructing Web-based legacy index card archives: architectural design issues and initial data acquisition
Proceedings of Sixth International Conference on Document Analysis and Recognition, 2001-09-10
DOI: 10.1109/ICDAR.2001.953908
Citation count: 9
Abstract
Presents a progress report (after 1 year of a 3-year project) on the overall design of a flexible archive conversion system, intended eventually for widespread use as a tool for converting legacy typescript and handwritten archive card indexes into Internet-accessible, searchable databases. The VIADOCS system is being developed and evaluated on a demonstrator archive of 30,000 pyraloid moth cards at the UK Natural History Museum, and has already demonstrated a successful and efficient mechanism for image acquisition using a modified bank cheque scanner. Document image processing and analysis techniques, defined by an XML validating document type definition (DTD), are being used to correct defects in the acquired images and to parse card sequences so that they match the hierarchical taxonomy of pyraloid moth species. Parsed data is processed by offline OCR engines, augmented by field-specific subject dictionaries, to produce a 'draft' online archive. This archive will then be validated interactively via a Web browser as it is used. The eventual aim is to provide an efficient and configurable legacy archive document conversion system not only for the Natural History Museum, but also for all museums, libraries and archives where there is a need to interrogate legacy documents via computer.
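
The idea of augmenting OCR output with field-specific subject dictionaries can be illustrated with a minimal Python sketch. This is not the VIADOCS implementation, which the abstract does not describe. The field names, the dictionary entries, the `correct_field` helper and the similarity cutoff are all assumptions made for illustration. The sketch uses the standard library's `difflib.get_close_matches` to snap a noisy OCR string to the nearest entry in the dictionary for that field.

```python
import difflib

# Hypothetical field-specific dictionaries (illustrative entries only;
# a real archive would load these from curated taxonomic sources).
FIELD_DICTIONARIES = {
    "genus": ["Pyralis", "Crambus", "Galleria"],
    "author": ["Linnaeus", "Fabricius", "Walker"],
}


def correct_field(field, ocr_text, cutoff=0.6):
    """Return the closest dictionary entry for an OCR'd field value.

    Falls back to the raw OCR text when the field has no dictionary
    or when no entry is similar enough (similarity below `cutoff`).
    """
    candidates = FIELD_DICTIONARIES.get(field, [])
    matches = difflib.get_close_matches(ocr_text, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_text


if __name__ == "__main__":
    # A digit '1' misread in place of the letter 'l' is a typical OCR error.
    print(correct_field("genus", "Pyra1is"))
    print(correct_field("author", "Linnacus"))
```

Restricting the candidate list to one field at a time keeps a genus from being "corrected" into an author name. Values that match nothing are passed through unchanged, so they remain available for the interactive validation stage.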