Integrating granular computing with density estimation for anomaly detection in high-dimensional heterogeneous data

IF 8.1 · Zone 1 (Computer Science) · JCR category: COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Sciences Pub Date : 2024-10-18 DOI:10.1016/j.ins.2024.121566

Baiyang Chen , Zhong Yuan , Dezhong Peng , Xiaoliang Chen , Hongmei Chen , Yingke Chen

Information Sciences, volume 690, Article 121566. Full text: https://www.sciencedirect.com/science/article/pii/S0020025524014804

Citations: 0

Abstract

Detecting anomalies in complex data is crucial for knowledge discovery and data mining across a wide range of applications. While density-based methods are effective for handling varying data densities and diverse distributions, they often struggle with accurately estimating densities in heterogeneous, uncertain data and capturing interdependencies among features in high-dimensional spaces. This paper proposes a fuzzy granule density-based anomaly detection algorithm (GDAD) for heterogeneous data. Specifically, GDAD first partitions high-dimensional attributes into subspaces based on their interdependencies and employs fuzzy information granules to represent data. The core of the method is the definition of fuzzy granule density, which leverages local neighborhood information alongside global density patterns and effectively characterizes anomalies in data. Each object is then assigned a fuzzy granule density-based anomaly factor, reflecting its likelihood of being anomalous. Through extensive experimentation on various real-world datasets, GDAD has demonstrated superior performance, matching or surpassing existing state-of-the-art methods. GDAD's integration of granular computing with density estimation provides a practical framework for anomaly detection in high-dimensional heterogeneous data.
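The abstract does not give GDAD's formulas, so the following is only a minimal sketch of the general idea (fuzzy granules per attribute subspace, a density derived from granule cardinality, and a per-object anomaly score). The similarity function, the min-based combination, the averaging over subspaces, the hand-chosen subspaces, and all function names are assumptions made for illustration; they are not the paper's definitions, and the sketch omits the local-neighborhood component the abstract mentions.

```python
from statistics import mean

def attribute_similarity(a, b, value_range=None):
    """Fuzzy similarity in [0, 1] between two attribute values (assumed form)."""
    if value_range is None:      # categorical attribute: crisp match
        return 1.0 if a == b else 0.0
    if value_range == 0:         # constant numeric attribute
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / value_range)

def granule_density(data, i, subspace, ranges):
    """Relative fuzzy cardinality of object i's granule in one subspace."""
    memberships = [
        # min acts as the t-norm that combines per-attribute similarities
        min(attribute_similarity(data[i][k], row[k], ranges[k]) for k in subspace)
        for row in data
    ]
    return sum(memberships) / len(data)

def anomaly_scores(data, subspaces, numeric):
    """Score = 1 - mean granule density over subspaces (illustrative only)."""
    ranges = {}
    for k in range(len(data[0])):
        if k in numeric:
            column = [row[k] for row in data]
            ranges[k] = max(column) - min(column)
        else:
            ranges[k] = None     # marks the attribute as categorical
    return [
        1.0 - mean(granule_density(data, i, s, ranges) for s in subspaces)
        for i in range(len(data))
    ]

# Toy heterogeneous data: two numeric attributes and one categorical attribute.
data = [
    (1.0, 1.1, "a"),
    (1.2, 0.9, "a"),
    (0.9, 1.0, "a"),
    (1.1, 1.2, "a"),
    (5.0, 4.8, "b"),   # far from the rest in every attribute
]
# Hypothetical subspace partition, fixed by hand rather than learned.
scores = anomaly_scores(data, subspaces=[(0, 1), (2,)], numeric={0, 1})
most_anomalous = max(range(len(data)), key=scores.__getitem__)
```

In this toy setup the last object has a sparse granule in both subspaces, so it receives the highest score. A faithful implementation would replace the fixed subspaces and the plain average with the interdependency-based partition and the fuzzy granule density defined in the paper.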


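Source journal

Information Sciences (Engineering and Technology; Computer Science: Information Systems)

CiteScore: 14.00
Self-citation rate: 17.30%
Articles published: 1322
Review time: 10.4 months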

About the journal: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.