Unsupervised Anomaly Detection in Sequential Process Data

Okan Bulut, Guher Gorgun, Surina He
Zeitschrift für Psychologie. Published 2024-04-01. DOI: 10.1027/2151-2604/a000558 (https://doi.org/10.1027/2151-2604/a000558)
Citations: 1

Abstract

In this study, we present three unsupervised anomaly detection methods for identifying anomalous test-takers based on their action sequences in problem-solving tasks. The first method applies the Isolation Forest algorithm to raw action sequences extracted from process data. The second method transforms raw action sequences into contextual embeddings using the Bidirectional Encoder Representations from Transformers (BERT) model and then applies the Isolation Forest algorithm to those embeddings. The third method follows the same procedure as the second but adds an intermediate dimensionality reduction step for the contextual embeddings before applying the Isolation Forest algorithm. To compare the three methods, we analyze log files from test-takers in the US sample (n = 2,021) who completed the problem-solving in technology-rich environments (PSTRE) section of the Programme for the International Assessment of Adult Competencies (PIAAC) 2012 assessment. The results indicated that different groups of test-takers were flagged as anomalous depending on the representation (raw action sequences vs. contextual embeddings) and dimensionality of the action sequences. In addition, when the contextual embeddings were used, the Isolation Forest algorithm flagged a larger number of test-takers, indicating that the algorithm is sensitive to the dimensionality of the input data.
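
As a rough illustration of the first method, the sketch below scores toy action sequences with a minimal, standard-library Isolation Forest. Several things here are assumptions and not taken from the study: the abstract does not say how raw action sequences are encoded, so the count-vector encoding in `encode_actions` is only one plausible choice; the action names and toy sequences are hypothetical; and in practice one would more likely use an established implementation such as scikit-learn's `IsolationForest`. For the second and third methods, the count vectors would be replaced by BERT embeddings, optionally reduced in dimensionality, which is not shown here.

```python
import math
import random
from collections import Counter


def encode_actions(sequences, vocabulary):
    """Map each action sequence to a fixed-length vector of action counts (assumed encoding)."""
    vectors = []
    for seq in sequences:
        counts = Counter(seq)
        vectors.append([float(counts[action]) for action in vocabulary])
    return vectors


def average_path_length(n):
    """Expected depth of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Isolate points recursively with random axis-aligned splits."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    # Only features whose values differ within this node can be split.
    candidates = [j for j in range(len(rows[0]))
                  if min(r[j] for r in rows) < max(r[j] for r in rows)]
    if not candidates:
        return {"size": len(rows)}
    feature = rng.choice(candidates)
    low = min(r[feature] for r in rows)
    high = max(r[feature] for r in rows)
    threshold = rng.uniform(low, high)
    left = [r for r in rows if r[feature] < threshold]
    right = [r for r in rows if r[feature] >= threshold]
    return {"feature": feature, "threshold": threshold,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(row, node, depth=0):
    """Depth at which a row lands, adjusted for points left unseparated in the leaf."""
    if "feature" not in node:
        return depth + average_path_length(node["size"])
    side = "left" if row[node["feature"]] < node["threshold"] else "right"
    return path_length(row, node[side], depth + 1)


def anomaly_scores(vectors, n_trees=100, sample_size=256, seed=0):
    """Return one score per row in (0, 1]; higher means easier to isolate."""
    if len(vectors) < 2:
        raise ValueError("need at least two rows to build isolation trees")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(vectors))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(vectors, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = average_path_length(sample_size)
    return [2.0 ** (-sum(path_length(row, t) for t in trees) / n_trees / norm)
            for row in vectors]


# Hypothetical action vocabulary and toy log data (not from PIAAC).
vocabulary = ["START", "CLICK", "TYPE", "NEXT", "SUBMIT"]
sequences = [
    ["START"] + ["CLICK"] * (2 + i % 2) + ["TYPE"] * (1 + (i // 2) % 2) + ["NEXT", "SUBMIT"]
    for i in range(30)
]
sequences.append(["START"] + ["CLICK"] * 40)  # an unusually click-heavy, unfinished attempt

scores = anomaly_scores(encode_actions(sequences, vocabulary))
flagged = max(range(len(scores)), key=scores.__getitem__)
print("most anomalous sequence index:", flagged)
```

The score follows the usual Isolation Forest idea that points isolated after fewer random splits are more anomalous. Because every split picks a random feature, the number and informativeness of input dimensions directly affect how quickly a point is isolated, which is consistent with the sensitivity to dimensionality reported in the abstract.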