Authors: Torsten Schlett, Christian Rathgeb, Juan Tapia, Christoph Busch
Journal: IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 6, no. 1, pp. 54-67
Published: 2023-11-28
DOI: 10.1109/TBIOM.2023.3336513
URL: https://ieeexplore.ieee.org/document/10330743/
Citations: 0
Considerations on the Evaluation of Biometric Quality Assessment Algorithms
Quality assessment algorithms can be used to estimate the utility of a biometric sample for the purpose of biometric recognition. “Error versus Discard Characteristic” (EDC) plots, and “partial Area Under Curve” (pAUC) values of curves therein, are generally used by researchers to evaluate the predictive performance of such quality assessment algorithms. An EDC curve depends on an error type such as the “False Non Match Rate” (FNMR), a quality assessment algorithm, a biometric recognition system, a set of comparisons each corresponding to a biometric sample pair, and a comparison score threshold corresponding to a starting error. To compute an EDC curve, comparisons are progressively discarded based on the associated samples’ lowest quality scores, and the error is computed for the remaining comparisons. Additionally, a discard fraction limit or range must be selected to compute pAUC values, which can then be used to quantitatively rank quality assessment algorithms. This paper discusses and analyses various details for this kind of quality assessment algorithm evaluation, including general EDC properties, interpretability improvements for pAUC values based on a hard lower error limit and a soft upper error limit, the use of relative instead of discrete rankings, stepwise vs. linear curve interpolation, and normalisation of quality scores to a [0, 100] integer range. We also analyse the stability of quantitative quality assessment algorithm rankings based on pAUC values across varying pAUC discard fraction limits and starting errors, concluding that higher pAUC discard fraction limits should be preferred. The analyses are conducted both with synthetic data and with real face image and fingerprint quality assessment data, with a focus on general modality-independent conclusions for EDC evaluations. Various EDC alternatives are discussed as well. Open source evaluation software is provided at
https://github.com/dasec/quality-assessment-evaluation (to be made available upon acceptance).
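
The EDC and pAUC computation described in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation software. It assumes mated comparisons only (so the error is the FNMR), similarity scores where higher means more similar, a pairwise quality equal to the lower of the two samples' quality scores, and stepwise interpolation between discard points. The function names `edc_curve` and `pauc_stepwise` and the toy data are hypothetical.

```python
import numpy as np

def edc_curve(scores, quality_a, quality_b, threshold):
    """Stepwise FNMR-EDC points for a set of mated comparisons (sketch)."""
    scores = np.asarray(scores, dtype=float)
    # Pairwise quality: the lower quality score of the two compared samples.
    pair_quality = np.minimum(quality_a, quality_b)
    order = np.argsort(pair_quality, kind="stable")  # lowest quality first
    sorted_quality = pair_quality[order]
    # Assumption: a mated comparison below the threshold is a false non-match.
    errors = (scores[order] < threshold).astype(float)
    n = len(scores)
    # For each distinct quality value, discard every comparison strictly below it.
    # On sorted input, the first index of a value equals that discard count.
    _, discarded = np.unique(sorted_quality, return_index=True)
    cumulative_errors = np.concatenate(([0.0], np.cumsum(errors)))
    remaining_errors = cumulative_errors[-1] - cumulative_errors[discarded]
    discard_fraction = discarded / n
    fnmr = remaining_errors / (n - discarded)
    return discard_fraction, fnmr

def pauc_stepwise(discard_fraction, error, limit):
    """Area under the stepwise EDC curve over the range [0, limit]."""
    # Each step holds its error value until the next discard point.
    edges = np.clip(np.append(discard_fraction, 1.0), 0.0, limit)
    return float(np.sum(np.diff(edges) * error))

# Toy data (hypothetical): five mated comparisons, quality scores in [0, 100].
scores = [0.9, 0.2, 0.8, 0.3, 0.7]
quality_a = [80, 10, 90, 40, 60]
quality_b = [70, 20, 50, 30, 90]
fractions, fnmr = edc_curve(scores, quality_a, quality_b, threshold=0.5)
pauc = pauc_stepwise(fractions, fnmr, limit=0.3)
```

In this sketch the threshold is passed in directly. In an evaluation as described above, it would instead be chosen so that the error at a discard fraction of zero equals the desired starting error. A lower pAUC indicates that discarding low-quality samples reduces the error faster.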