{"title":"新指标和LSH算法的无监督,实时异常检测在多方面的数据流","authors":"Samira Khodabandehlou , Alireza Hashemi Golpayegani","doi":"10.1016/j.jestch.2025.102119","DOIUrl":null,"url":null,"abstract":"<div><div>Given a vast online stream of transactions in e-markets, how can we detect fraudulent traders and suspicious behaviors in an unsupervised manner? Can we detect them in constant time and memory? Fraud detection in e-markets is increasingly challenging due to the scale and complexity of multi-aspect data streams. This study introduces SATrade, an unsupervised and scalable approach for real-time anomaly detection in big multi-aspect data streams. This approach proposes two novel Locality-Sensitive Hashing (LSH) functions: Gaussian projections to preserve numerical distances and collision-resistant linear hashing to prevent the increase in dimensionality of the categorical data. The main contributions include the Collusiveness metric, which detects group anomalies through statistical divergence analysis, and the RR-ISF, which prioritizes rare burst patterns. An exponential decay mechanism (λ) ensures adaptability to evolving fraud tactics without retraining, while PCA handles feature correlation. In extensive experiments on five real datasets, using both synthetic and real labels, SATrade achieved 99 % AUC, 93 % F-measure, and 0.2 ms/record latency, which is a significant improvement over the six baseline methods. The framework’s interpretability allows tracing anomalies to fraudulent behaviors like sudden order spikes. The constant memory consumption of 0.25 MB per record and linear scalability make SATrade suitable for high-frequency environments and online platforms.</div></div>","PeriodicalId":48609,"journal":{"name":"Engineering Science and Technology-An International Journal-Jestech","volume":"69 ","pages":"Article 102119"},"PeriodicalIF":5.4000,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Novel metrics and LSH algorithms for unsupervised, real-time anomaly detection in multi-aspect data streams\",\"authors\":\"Samira Khodabandehlou , Alireza Hashemi Golpayegani\",\"doi\":\"10.1016/j.jestch.2025.102119\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Given a vast online stream of transactions in e-markets, how can we detect fraudulent traders and suspicious behaviors in an unsupervised manner? Can we detect them in constant time and memory? Fraud detection in e-markets is increasingly challenging due to the scale and complexity of multi-aspect data streams. This study introduces SATrade, an unsupervised and scalable approach for real-time anomaly detection in big multi-aspect data streams. This approach proposes two novel Locality-Sensitive Hashing (LSH) functions: Gaussian projections to preserve numerical distances and collision-resistant linear hashing to prevent the increase in dimensionality of the categorical data. The main contributions include the Collusiveness metric, which detects group anomalies through statistical divergence analysis, and the RR-ISF, which prioritizes rare burst patterns. An exponential decay mechanism (λ) ensures adaptability to evolving fraud tactics without retraining, while PCA handles feature correlation. In extensive experiments on five real datasets, using both synthetic and real labels, SATrade achieved 99 % AUC, 93 % F-measure, and 0.2 ms/record latency, which is a significant improvement over the six baseline methods. The framework’s interpretability allows tracing anomalies to fraudulent behaviors like sudden order spikes. The constant memory consumption of 0.25 MB per record and linear scalability make SATrade suitable for high-frequency environments and online platforms.</div></div>\",\"PeriodicalId\":48609,\"journal\":{\"name\":\"Engineering Science and Technology-An International Journal-Jestech\",\"volume\":\"69 \",\"pages\":\"Article 102119\"},\"PeriodicalIF\":5.4000,\"publicationDate\":\"2025-06-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Engineering Science and Technology-An International Journal-Jestech\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2215098625001740\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering Science and Technology-An International Journal-Jestech","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2215098625001740","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
Novel metrics and LSH algorithms for unsupervised, real-time anomaly detection in multi-aspect data streams
Given a vast online stream of transactions in e-markets, how can we detect fraudulent traders and suspicious behaviors in an unsupervised manner? Can we detect them in constant time and memory? Fraud detection in e-markets is increasingly challenging due to the scale and complexity of multi-aspect data streams. This study introduces SATrade, an unsupervised and scalable approach for real-time anomaly detection in big multi-aspect data streams. This approach proposes two novel Locality-Sensitive Hashing (LSH) functions: Gaussian projections to preserve numerical distances and collision-resistant linear hashing to prevent the increase in dimensionality of the categorical data. The main contributions include the Collusiveness metric, which detects group anomalies through statistical divergence analysis, and the RR-ISF, which prioritizes rare burst patterns. An exponential decay mechanism (λ) ensures adaptability to evolving fraud tactics without retraining, while PCA handles feature correlation. In extensive experiments on five real datasets, using both synthetic and real labels, SATrade achieved 99 % AUC, 93 % F-measure, and 0.2 ms/record latency, which is a significant improvement over the six baseline methods. The framework’s interpretability allows tracing anomalies to fraudulent behaviors like sudden order spikes. The constant memory consumption of 0.25 MB per record and linear scalability make SATrade suitable for high-frequency environments and online platforms.
期刊介绍:
Engineering Science and Technology, an International Journal (JESTECH) (formerly Technology), a peer-reviewed quarterly engineering journal, publishes both theoretical and experimental high quality papers of permanent interest, not previously published in journals, in the field of engineering and applied science which aims to promote the theory and practice of technology and engineering. In addition to peer-reviewed original research papers, the Editorial Board welcomes original research reports, state-of-the-art reviews and communications in the broadly defined field of engineering science and technology.
The scope of JESTECH includes a wide spectrum of subjects including:
-Electrical/Electronics and Computer Engineering (Biomedical Engineering and Instrumentation; Coding, Cryptography, and Information Protection; Communications, Networks, Mobile Computing and Distributed Systems; Compilers and Operating Systems; Computer Architecture, Parallel Processing, and Dependability; Computer Vision and Robotics; Control Theory; Electromagnetic Waves, Microwave Techniques and Antennas; Embedded Systems; Integrated Circuits, VLSI Design, Testing, and CAD; Microelectromechanical Systems; Microelectronics, and Electronic Devices and Circuits; Power, Energy and Energy Conversion Systems; Signal, Image, and Speech Processing)
-Mechanical and Civil Engineering (Automotive Technologies; Biomechanics; Construction Materials; Design and Manufacturing; Dynamics and Control; Energy Generation, Utilization, Conversion, and Storage; Fluid Mechanics and Hydraulics; Heat and Mass Transfer; Micro-Nano Sciences; Renewable and Sustainable Energy Technologies; Robotics and Mechatronics; Solid Mechanics and Structure; Thermal Sciences)
-Metallurgical and Materials Engineering (Advanced Materials Science; Biomaterials; Ceramic and Inorgnanic Materials; Electronic-Magnetic Materials; Energy and Environment; Materials Characterizastion; Metallurgy; Polymers and Nanocomposites)