K. Maly, S. Zeil, M. Zubair, A. Amrou, A. Aazhar, N. Ratkal
{"title":"用于元数据提取系统的可脚本化统计Oracle","authors":"K. Maly, S. Zeil, M. Zubair, A. Amrou, A. Aazhar, N. Ratkal","doi":"10.1109/QSIC.2007.9","DOIUrl":null,"url":null,"abstract":"An oracle is described for dynamic validation of an application (metadata extraction from scanned documents) where a moderate failure rate is acceptable provided that instances of failures during operation can be identified. The oracle combines a variety of deterministic tests and statistical tests based upon characteristics of the document collection on which the system operates. Because this system must adapt to a variety of document collections with different characteristics, a scripting language is developed that binds combinations of tests to the metadata fields expected in a given document collection. The suitability of the oracle is demonstrated by an experiment measuring its ability to mimic human judgments as to which of several alternate outputs for the same document would be preferred.","PeriodicalId":136227,"journal":{"name":"Seventh International Conference on Quality Software (QSIC 2007)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Scriptable, Statistical Oracle for a Metadata Extraction System\",\"authors\":\"K. Maly, S. Zeil, M. Zubair, A. Amrou, A. Aazhar, N. Ratkal\",\"doi\":\"10.1109/QSIC.2007.9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"An oracle is described for dynamic validation of an application (metadata extraction from scanned documents) where a moderate failure rate is acceptable provided that instances of failures during operation can be identified. The oracle combines a variety of deterministic tests and statistical tests based upon characteristics of the document collection on which the system operates. Because this system must adapt to a variety of document collections with different characteristics, a scripting language is developed that binds combinations of tests to the metadata fields expected in a given document collection. The suitability of the oracle is demonstrated by an experiment measuring its ability to mimic human judgments as to which of several alternate outputs for the same document would be preferred.\",\"PeriodicalId\":136227,\"journal\":{\"name\":\"Seventh International Conference on Quality Software (QSIC 2007)\",\"volume\":\"92 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-10-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Seventh International Conference on Quality Software (QSIC 2007)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/QSIC.2007.9\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Seventh International Conference on Quality Software (QSIC 2007)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/QSIC.2007.9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Scriptable, Statistical Oracle for a Metadata Extraction System
An oracle is described for dynamic validation of an application (metadata extraction from scanned documents) where a moderate failure rate is acceptable provided that instances of failures during operation can be identified. The oracle combines a variety of deterministic tests and statistical tests based upon characteristics of the document collection on which the system operates. Because this system must adapt to a variety of document collections with different characteristics, a scripting language is developed that binds combinations of tests to the metadata fields expected in a given document collection. The suitability of the oracle is demonstrated by an experiment measuring its ability to mimic human judgments as to which of several alternate outputs for the same document would be preferred.