新指标和LSH算法的无监督,实时异常检测在多方面的数据流

IF 5.4 2区 工程技术 Q1 ENGINEERING, MULTIDISCIPLINARY
Samira Khodabandehlou , Alireza Hashemi Golpayegani
{"title":"新指标和LSH算法的无监督,实时异常检测在多方面的数据流","authors":"Samira Khodabandehlou ,&nbsp;Alireza Hashemi Golpayegani","doi":"10.1016/j.jestch.2025.102119","DOIUrl":null,"url":null,"abstract":"<div><div>Given a vast online stream of transactions in e-markets, how can we detect fraudulent traders and suspicious behaviors in an unsupervised manner? Can we detect them in constant time and memory? Fraud detection in e-markets is increasingly challenging due to the scale and complexity of multi-aspect data streams. This study introduces SATrade, an unsupervised and scalable approach for real-time anomaly detection in big multi-aspect data streams. This approach proposes two novel Locality-Sensitive Hashing (LSH) functions: Gaussian projections to preserve numerical distances and collision-resistant linear hashing to prevent the increase in dimensionality of the categorical data. The main contributions include the Collusiveness metric, which detects group anomalies through statistical divergence analysis, and the RR-ISF, which prioritizes rare burst patterns. An exponential decay mechanism (λ) ensures adaptability to evolving fraud tactics without retraining, while PCA handles feature correlation. In extensive experiments on five real datasets, using both synthetic and real labels, SATrade achieved 99 % AUC, 93 % F-measure, and 0.2 ms/record latency, which is a significant improvement over the six baseline methods. The framework’s interpretability allows tracing anomalies to fraudulent behaviors like sudden order spikes. The constant memory consumption of 0.25 MB per record and linear scalability make SATrade suitable for high-frequency environments and online platforms.</div></div>","PeriodicalId":48609,"journal":{"name":"Engineering Science and Technology-An International Journal-Jestech","volume":"69 ","pages":"Article 102119"},"PeriodicalIF":5.4000,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Novel metrics and LSH algorithms for unsupervised, real-time anomaly detection in multi-aspect data streams\",\"authors\":\"Samira Khodabandehlou ,&nbsp;Alireza Hashemi Golpayegani\",\"doi\":\"10.1016/j.jestch.2025.102119\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Given a vast online stream of transactions in e-markets, how can we detect fraudulent traders and suspicious behaviors in an unsupervised manner? Can we detect them in constant time and memory? Fraud detection in e-markets is increasingly challenging due to the scale and complexity of multi-aspect data streams. This study introduces SATrade, an unsupervised and scalable approach for real-time anomaly detection in big multi-aspect data streams. This approach proposes two novel Locality-Sensitive Hashing (LSH) functions: Gaussian projections to preserve numerical distances and collision-resistant linear hashing to prevent the increase in dimensionality of the categorical data. The main contributions include the Collusiveness metric, which detects group anomalies through statistical divergence analysis, and the RR-ISF, which prioritizes rare burst patterns. An exponential decay mechanism (λ) ensures adaptability to evolving fraud tactics without retraining, while PCA handles feature correlation. In extensive experiments on five real datasets, using both synthetic and real labels, SATrade achieved 99 % AUC, 93 % F-measure, and 0.2 ms/record latency, which is a significant improvement over the six baseline methods. The framework’s interpretability allows tracing anomalies to fraudulent behaviors like sudden order spikes. The constant memory consumption of 0.25 MB per record and linear scalability make SATrade suitable for high-frequency environments and online platforms.</div></div>\",\"PeriodicalId\":48609,\"journal\":{\"name\":\"Engineering Science and Technology-An International Journal-Jestech\",\"volume\":\"69 \",\"pages\":\"Article 102119\"},\"PeriodicalIF\":5.4000,\"publicationDate\":\"2025-06-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Engineering Science and Technology-An International Journal-Jestech\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2215098625001740\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering Science and Technology-An International Journal-Jestech","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2215098625001740","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

摘要

鉴于电子市场上的大量在线交易,我们如何在无人监督的情况下发现欺诈交易者和可疑行为?我们能在恒定的时间和记忆中发现它们吗?由于多方面数据流的规模和复杂性,电子市场中的欺诈检测越来越具有挑战性。本研究介绍了SATrade,这是一种无监督和可扩展的方法,用于在大型多方面数据流中进行实时异常检测。该方法提出了两种新的位置敏感哈希(LSH)函数:高斯投影以保持数值距离和抗碰撞线性哈希以防止分类数据的维数增加。主要贡献包括Collusiveness度量,它通过统计发散分析来检测群体异常,以及RR-ISF,它优先考虑罕见的突发模式。指数衰减机制(λ)确保无需再训练即可适应不断发展的欺诈策略,而PCA处理特征相关性。在5个真实数据集的广泛实验中,使用合成和真实标签,SATrade实现了99%的AUC, 93%的F-measure和0.2 ms/record延迟,这是6种基线方法的显着改进。该框架的可解释性允许追踪异常的欺诈行为,如突然的订单峰值。每条记录的恒定内存消耗为0.25 MB,并且具有线性可扩展性,因此SATrade适用于高频环境和在线平台。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Novel metrics and LSH algorithms for unsupervised, real-time anomaly detection in multi-aspect data streams
Given a vast online stream of transactions in e-markets, how can we detect fraudulent traders and suspicious behaviors in an unsupervised manner? Can we detect them in constant time and memory? Fraud detection in e-markets is increasingly challenging due to the scale and complexity of multi-aspect data streams. This study introduces SATrade, an unsupervised and scalable approach for real-time anomaly detection in big multi-aspect data streams. This approach proposes two novel Locality-Sensitive Hashing (LSH) functions: Gaussian projections to preserve numerical distances and collision-resistant linear hashing to prevent the increase in dimensionality of the categorical data. The main contributions include the Collusiveness metric, which detects group anomalies through statistical divergence analysis, and the RR-ISF, which prioritizes rare burst patterns. An exponential decay mechanism (λ) ensures adaptability to evolving fraud tactics without retraining, while PCA handles feature correlation. In extensive experiments on five real datasets, using both synthetic and real labels, SATrade achieved 99 % AUC, 93 % F-measure, and 0.2 ms/record latency, which is a significant improvement over the six baseline methods. The framework’s interpretability allows tracing anomalies to fraudulent behaviors like sudden order spikes. The constant memory consumption of 0.25 MB per record and linear scalability make SATrade suitable for high-frequency environments and online platforms.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Engineering Science and Technology-An International Journal-Jestech
Engineering Science and Technology-An International Journal-Jestech Materials Science-Electronic, Optical and Magnetic Materials
CiteScore
11.20
自引率
3.50%
发文量
153
审稿时长
22 days
期刊介绍: Engineering Science and Technology, an International Journal (JESTECH) (formerly Technology), a peer-reviewed quarterly engineering journal, publishes both theoretical and experimental high quality papers of permanent interest, not previously published in journals, in the field of engineering and applied science which aims to promote the theory and practice of technology and engineering. In addition to peer-reviewed original research papers, the Editorial Board welcomes original research reports, state-of-the-art reviews and communications in the broadly defined field of engineering science and technology. The scope of JESTECH includes a wide spectrum of subjects including: -Electrical/Electronics and Computer Engineering (Biomedical Engineering and Instrumentation; Coding, Cryptography, and Information Protection; Communications, Networks, Mobile Computing and Distributed Systems; Compilers and Operating Systems; Computer Architecture, Parallel Processing, and Dependability; Computer Vision and Robotics; Control Theory; Electromagnetic Waves, Microwave Techniques and Antennas; Embedded Systems; Integrated Circuits, VLSI Design, Testing, and CAD; Microelectromechanical Systems; Microelectronics, and Electronic Devices and Circuits; Power, Energy and Energy Conversion Systems; Signal, Image, and Speech Processing) -Mechanical and Civil Engineering (Automotive Technologies; Biomechanics; Construction Materials; Design and Manufacturing; Dynamics and Control; Energy Generation, Utilization, Conversion, and Storage; Fluid Mechanics and Hydraulics; Heat and Mass Transfer; Micro-Nano Sciences; Renewable and Sustainable Energy Technologies; Robotics and Mechatronics; Solid Mechanics and Structure; Thermal Sciences) -Metallurgical and Materials Engineering (Advanced Materials Science; Biomaterials; Ceramic and Inorgnanic Materials; Electronic-Magnetic Materials; Energy and Environment; Materials Characterizastion; Metallurgy; Polymers and Nanocomposites)
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信