Data forensic techniques using Benford's law and Zipf's law for keystroke dynamics

A. Iorliam, A. Ho, N. Poh, Santosh Tirunagari, Patrick A. H. Bours
3rd International Workshop on Biometrics and Forensics (IWBF 2015), published 2015-03-03
DOI: https://doi.org/10.1109/IWBF.2015.7110238
Citations: 11

Abstract

The selection and application of biometric traits for authentication and identification have recently attracted a significant amount of research interest. In this paper we investigate the use of keystroke data to distinguish between humans using keystroke biometric systems and non-humans, for auditing applications. Benford's law and Zipf's law, which are both discrete power-law probability distributions, have recently been used effectively to detect fraud and to discriminate between genuine data and fake or tampered data. Our motivation is therefore to apply Benford's law and Zipf's law to keystroke data, to determine whether the data follow these laws, and to use them to discriminate between humans using keystroke biometric systems and non-humans. From the results, we observe that the latency values of keystroke data from humans follow Benford's law and Zipf's law, but the duration values do not. This implies that latency values from humans would follow the two laws, whereas latency values from non-humans would deviate from them. Even though the duration values from humans deviate from Benford's law, they follow a pattern for which we can develop an accurate model. We perform experiments using the benchmark data set developed by Killourhy and Maxion at CMU [1] and obtain divergences of 0.0008, 0.029 and 0.05 for the keyup-keydown (latency), keydown-keydown, and duration values of the keystroke data, respectively. Moreover, p-values of 0.7770, 0.6230 and 0.0160 are obtained for the keyup-keydown (latency), keydown-keydown, and duration values, respectively. We observe that the latency (the time elapsed between the release of one key and the pressing of the next key) is one of the most important features that administrators can use for auditing purposes, to detect anomalies while employees log in to their company system.
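As a rough illustration of the kind of check the abstract describes, the Python sketch below derives the three timing features from key press and release timestamps and compares the first-digit distribution of a feature against Benford's law, P(d) = log10(1 + 1/d). The function names, the input layout (one `(key_down_time, key_up_time)` pair per keystroke) and the choice of a chi-square divergence are assumptions made for this example; the abstract does not state which divergence measure or test the authors used.

```python
import math
from collections import Counter

# Benford's law: probability that the leading digit is d, for d in 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def timing_features(events):
    """Derive timing features from ordered (key_down_time, key_up_time) pairs.

    Returns (durations, keydown_keydown, keyup_keydown). The input layout is
    an assumption for this sketch, not the format of any particular data set.
    """
    durations = [up - down for down, up in events]
    pairs = list(zip(events, events[1:]))
    keydown_keydown = [nxt[0] - cur[0] for cur, nxt in pairs]
    # Latency: release of one key to press of the next (negative on overlap).
    keyup_keydown = [nxt[0] - cur[1] for cur, nxt in pairs]
    return durations, keydown_keydown, keyup_keydown


def first_digit(x):
    """Leading nonzero decimal digit of |x|, or None when x is zero."""
    x = abs(x)
    if x == 0:
        return None
    # Scientific notation puts the leading digit first, e.g. '3.2...e-02'.
    return int(f"{x:.15e}"[0])


def first_digit_distribution(values):
    """Relative frequency of each leading digit 1..9 among nonzero values."""
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def chi_square_divergence(observed):
    """Chi-square divergence of an observed digit distribution from Benford.

    Assumed measure for illustration; smaller means closer to Benford's law.
    """
    return sum(
        (observed[d] - BENFORD[d]) ** 2 / BENFORD[d] for d in range(1, 10)
    )
```

In use, one would pass a list of latency values to `first_digit_distribution` and then to `chi_square_divergence`; under the abstract's findings, human latency values would be expected to give a small divergence, while values from a non-human source would be expected to give a larger one.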