Robust Performance Metrics for Authentication Systems

Proceedings 2019 Network and Distributed System Security Symposium. Pub Date: 2019-01-01. DOI: 10.14722/ndss.2019.23351

Shridatt Sugrim, Can Liu, Meghan McLean, J. Lindqvist


Citations: 34

Abstract

Research has produced many types of authentication systems that use machine learning. However, there is no consistent approach for reporting performance metrics, and the metrics that are reported are inadequate. In this work, we show that several of the common metrics used for reporting performance, such as maximum accuracy (ACC), equal error rate (EER), and area under the ROC curve (AUROC), are inherently flawed. These common metrics hide the details of the inherent tradeoffs a system must make when implemented. Our findings show that current metrics give no insight into how system performance degrades outside the ideal conditions in which they were designed. We argue that adequate performance reporting must be provided to enable meaningful evaluation and that current, commonly used approaches fail in this regard. We present the unnormalized frequency count of scores (FCS) to demonstrate the mathematical underpinnings that lead to these failures and show how they can be avoided. The FCS can be used to augment performance reporting to enable visual comparison across systems. When reported with the Receiver Operating Characteristic (ROC) curve, these two metrics provide a solution to the limitations of currently reported metrics. Finally, we show how to use the FCS and ROC metrics to evaluate and compare different authentication systems.
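To make the quantities named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a classifier that outputs a score in [0, 1] and accepts a sample when its score is at or above a threshold. The function names, the bin count, and the synthetic scores are hypothetical and chosen only for illustration. It computes the unnormalized per-class score counts (the FCS), the ROC points, and the three summary metrics (ACC, EER, AUROC) that the abstract criticizes.

```python
import numpy as np


def frequency_count_of_scores(genuine, impostor, bins=5, score_range=(0.0, 1.0)):
    """Unnormalized histogram counts of scores per class, on shared bin edges."""
    edges = np.linspace(score_range[0], score_range[1], bins + 1)
    genuine_counts, _ = np.histogram(genuine, bins=edges)
    impostor_counts, _ = np.histogram(impostor, bins=edges)
    return edges, genuine_counts, impostor_counts


def roc_points(genuine, impostor):
    """Return (thresholds, FPR, TPR); a sample is accepted if score >= threshold."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    # Descending thresholds, starting at +inf so the curve begins at (0, 0).
    thresholds = np.unique(np.concatenate([genuine, impostor, [np.inf]]))[::-1]
    tpr = np.array([(genuine >= t).mean() for t in thresholds])
    fpr = np.array([(impostor >= t).mean() for t in thresholds])
    return thresholds, fpr, tpr


def summary_metrics(genuine, impostor):
    """Single-number summaries (max ACC, EER, AUROC) derived from the ROC points."""
    _, fpr, tpr = roc_points(genuine, impostor)
    n_gen, n_imp = len(genuine), len(impostor)
    # Accuracy at each threshold: (true accepts + true rejects) / all samples.
    acc = (tpr * n_gen + (1.0 - fpr) * n_imp) / (n_gen + n_imp)
    # EER: operating point where false positive and false negative rates are closest.
    fnr = 1.0 - tpr
    idx = int(np.argmin(np.abs(fpr - fnr)))
    eer = (fpr[idx] + fnr[idx]) / 2.0
    # AUROC by the trapezoidal rule over the ROC points.
    auroc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return float(acc.max()), float(eer), auroc


# Synthetic scores (illustrative only, not data from the paper).
genuine_scores = [0.95, 0.85, 0.75, 0.55]
impostor_scores = [0.05, 0.15, 0.25, 0.65]

edges, gen_counts, imp_counts = frequency_count_of_scores(genuine_scores, impostor_scores)
max_acc, eer, auroc = summary_metrics(genuine_scores, impostor_scores)
print("FCS genuine counts: ", gen_counts)
print("FCS impostor counts:", imp_counts)
print("max ACC, EER, AUROC:", max_acc, eer, auroc)
```

The summary metrics collapse the score distributions into single numbers, whereas the per-bin counts keep the class sizes and the overlap between genuine and impostor scores visible. That is the kind of detail the abstract argues should be reported alongside the ROC curve.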

