Evaluating Behavioral Biometrics for Continuous Authentication: Challenges and Metrics

Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. Pub Date: 2017-04-02. DOI: 10.1145/3052973.3053032

Simon Eberz, Kasper Bonne Rasmussen, Vincent Lenders, I. Martinovic


Citations: 74

Abstract

In recent years, behavioral biometrics have become a popular approach to support continuous authentication systems. Most generally, a continuous authentication system can make two types of errors: false rejects and false accepts. Based on this, the most commonly reported metrics to evaluate systems are the False Reject Rate (FRR) and False Accept Rate (FAR). However, most papers only report the mean of these measures with little attention paid to their distribution. This is problematic as systematic errors allow attackers to perpetually escape detection while random errors are less severe. Using 16 biometric datasets we show that these systematic errors are very common in the wild. We show that some biometrics (such as eye movements) are particularly prone to systematic errors, while others (such as touchscreen inputs) show more even error distributions. Our results also show that the inclusion of some distinctive features lowers average error rates but significantly increases the prevalence of systematic errors. As such, blind optimization of the mean EER (through feature engineering or selection) can sometimes lead to lower security. Following this result we propose the Gini Coefficient (GC) as an additional metric to accurately capture different error distributions. We demonstrate the usefulness of this measure both to compare different systems and to guide researchers during feature selection. In addition to the selection of features and classifiers, some non-functional machine learning methodologies also affect error rates. The most notable examples of this are the selection of training data and the attacker model used to develop the negative class. 13 out of the 25 papers we analyzed either include imposter data in the negative class or randomly sample training data from the entire dataset, with a further 6 not giving any information on the methodology used. Using real-world data we show that both of these decisions lead to significant underestimation of error rates by 63% and 81%, respectively. This is an alarming result, as it suggests that researchers are either unaware of the magnitude of these effects or might even be purposefully attempting to over-optimize their EER without actually improving the system.
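To illustrate why the distribution of errors matters in addition to their mean, the following is a minimal sketch of a Gini coefficient computed over per-user error rates. The abstract does not state the exact computation the authors use, so this applies the standard textbook formula as an assumption; the function name `gini_coefficient` and the two example populations are hypothetical and not taken from the paper.

```python
from typing import Sequence


def gini_coefficient(values: Sequence[float]) -> float:
    """Standard Gini coefficient of non-negative values.

    0 means the values are spread evenly; values close to 1 mean
    the total is concentrated in a few entries.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("at least one value is required")
    if xs[0] < 0:
        raise ValueError("values must be non-negative")
    total = sum(xs)
    if total == 0:
        # No errors at all: treat the distribution as perfectly even.
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # with x sorted in ascending order and i running from 1 to n.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical per-user false accept rates (illustrative numbers only).
# Both populations have the same mean error rate of 0.1.
random_errors = [0.1, 0.1, 0.1, 0.1]      # errors spread evenly
systematic_errors = [0.0, 0.0, 0.0, 0.4]  # one user is always the weak spot

print(gini_coefficient(random_errors))
print(gini_coefficient(systematic_errors))
```

Both example populations share the same mean, yet the first yields a coefficient of about 0 and the second about 0.75, which is the maximum (n - 1) / n for four values. A mean-only report would not distinguish the two cases.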
