Evaluating Behavioral Biometrics for Continuous Authentication: Challenges and Metrics

Simon Eberz, Kasper Bonne Rasmussen, Vincent Lenders, Ivan Martinovic
{"title":"Evaluating Behavioral Biometrics for Continuous Authentication: Challenges and Metrics","authors":"Simon Eberz, Kasper Bonne Rasmussen, Vincent Lenders, I. Martinovic","doi":"10.1145/3052973.3053032","DOIUrl":null,"url":null,"abstract":"In recent years, behavioral biometrics have become a popular approach to support continuous authentication systems. Most generally, a continuous authentication system can make two types of errors: false rejects and false accepts. Based on this, the most commonly reported metrics to evaluate systems are the False Reject Rate (FRR) and False Accept Rate (FAR). However, most papers only report the mean of these measures with little attention paid to their distribution. This is problematic as systematic errors allow attackers to perpetually escape detection while random errors are less severe. Using 16 biometric datasets we show that these systematic errors are very common in the wild. We show that some biometrics (such as eye movements) are particularly prone to systematic errors, while others (such as touchscreen inputs) show more even error distributions. Our results also show that the inclusion of some distinctive features lowers average error rates but significantly increases the prevalence of systematic errors. As such, blind optimization of the mean EER (through feature engineering or selection) can sometimes lead to lower security. Following this result we propose the Gini Coefficient (GC) as an additional metric to accurately capture different error distributions. We demonstrate the usefulness of this measure both to compare different systems and to guide researchers during feature selection. In addition to the selection of features and classifiers, some non- functional machine learning methodologies also affect error rates. The most notable examples of this are the selection of training data and the attacker model used to develop the negative class. 13 out of the 25 papers we analyzed either include imposter data in the negative class or randomly sample training data from the entire dataset, with a further 6 not giving any information on the methodology used. Using real-world data we show that both of these decisions lead to significant underestimation of error rates by 63% and 81%, respectively. This is an alarming result, as it suggests that researchers are either unaware of the magnitude of these effects or might even be purposefully attempting to over-optimize their EER without actually improving the system.","PeriodicalId":20540,"journal":{"name":"Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security","volume":"23 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2017-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"74","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3052973.3053032","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 74

Abstract

In recent years, behavioral biometrics have become a popular approach to support continuous authentication systems. Most generally, a continuous authentication system can make two types of errors: false rejects and false accepts. Accordingly, the most commonly reported metrics for evaluating such systems are the False Reject Rate (FRR) and False Accept Rate (FAR). However, most papers report only the mean of these measures, with little attention paid to their distribution. This is problematic, as systematic errors allow attackers to escape detection indefinitely, while random errors are less severe. Using 16 biometric datasets, we show that these systematic errors are very common in the wild. Some biometrics (such as eye movements) are particularly prone to systematic errors, while others (such as touchscreen input) show more even error distributions. Our results also show that including certain distinctive features lowers average error rates but significantly increases the prevalence of systematic errors. As such, blind optimization of the mean EER (through feature engineering or selection) can sometimes lower security. Following this result, we propose the Gini Coefficient (GC) as an additional metric that accurately captures different error distributions. We demonstrate the usefulness of this measure both for comparing different systems and for guiding researchers during feature selection. Beyond the choice of features and classifiers, some non-functional machine learning methodology decisions also affect error rates. The most notable examples are the selection of training data and the attacker model used to build the negative class. Of the 25 papers we analyzed, 13 either include imposter data in the negative class or randomly sample training data from the entire dataset, and a further 6 give no information on the methodology used. Using real-world data, we show that these two decisions lead to significant underestimation of error rates, by 63% and 81%, respectively. This is an alarming result, as it suggests that researchers are either unaware of the magnitude of these effects or may even be purposefully over-optimizing their EER without actually improving the system.
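The abstract's central measurement point is that two systems with the same mean FAR can differ sharply in how errors are distributed across users, and the proposed Gini Coefficient (GC) is meant to capture exactly that. Below is a minimal Python sketch of the idea; the per-user FAR values and the choice to compute GC over per-user rates are illustrative assumptions for this example, not the paper's datasets or its exact aggregation.

```python
import numpy as np

def gini_coefficient(errors):
    """Gini coefficient of a non-negative error distribution.

    0 means errors are spread evenly across users (random errors);
    values near 1 mean a few users account for almost all errors
    (systematic errors that a targeted attacker can exploit).
    """
    x = np.sort(np.asarray(errors, dtype=float))
    n = len(x)
    if n == 0 or x.sum() == 0.0:
        return 0.0
    # Standard closed form for sorted data:
    # G = sum_i (2i - n - 1) * x_i / (n * sum(x)),  i = 1..n
    i = np.arange(1, n + 1)
    return float(np.sum((2 * i - n - 1) * x) / (n * x.sum()))

# Two hypothetical systems with the same mean FAR (5%) but very
# different per-user error distributions.
even_far = [0.05, 0.04, 0.06, 0.05, 0.05]    # errors spread evenly
skewed_far = [0.24, 0.00, 0.01, 0.00, 0.00]  # one user absorbs almost all

print(np.mean(even_far), gini_coefficient(even_far))      # 0.05, ~0.06
print(np.mean(skewed_far), gini_coefficient(skewed_far))  # 0.05, ~0.78
```

With identical mean FAR, the GC separates the system whose false accepts fall almost entirely on one user (which a targeted attacker can exploit indefinitely) from the system whose errors are spread evenly, which is invisible if only the mean is reported.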