Integrating granular computing with density estimation for anomaly detection in high-dimensional heterogeneous data

Impact factor 8.1; Zone 1, Computer Science; category: COMPUTER SCIENCE, INFORMATION SYSTEMS
Baiyang Chen, Zhong Yuan, Dezhong Peng, Xiaoliang Chen, Hongmei Chen, Yingke Chen
Information Sciences, vol. 690, Article 121566. Journal article, published 2024-10-18.
DOI: 10.1016/j.ins.2024.121566
https://www.sciencedirect.com/science/article/pii/S0020025524014804
Citations: 0

Abstract

Detecting anomalies in complex data is crucial for knowledge discovery and data mining across a wide range of applications. While density-based methods are effective for handling varying data densities and diverse distributions, they often struggle with accurately estimating densities in heterogeneous, uncertain data and capturing interdependencies among features in high-dimensional spaces. This paper proposes a fuzzy granule density-based anomaly detection algorithm (GDAD) for heterogeneous data. Specifically, GDAD first partitions high-dimensional attributes into subspaces based on their interdependencies and employs fuzzy information granules to represent data. The core of the method is the definition of fuzzy granule density, which leverages local neighborhood information alongside global density patterns and effectively characterizes anomalies in data. Each object is then assigned a fuzzy granule density-based anomaly factor, reflecting its likelihood of being anomalous. Through extensive experimentation on various real-world datasets, GDAD has demonstrated superior performance, matching or surpassing existing state-of-the-art methods. GDAD's integration of granular computing with density estimation provides a practical framework for anomaly detection in high-dimensional heterogeneous data.
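
The abstract does not give the formal definitions, so the following is only a minimal sketch of the general idea of scoring objects by the density of their fuzzy granules over mixed numeric and categorical attributes. Everything in it is an assumption made for illustration and is not the GDAD algorithm itself: the function names (`fuzzy_relation`, `anomaly_scores`), the per-attribute similarity measures, the use of the minimum to combine attributes, and the score `1 - density`. It also treats all attributes as a single subspace and uses only a global density, whereas the paper partitions attributes into subspaces and combines local neighborhood information with global density patterns.

```python
import numpy as np

def fuzzy_relation(numeric, categorical):
    """Pairwise fuzzy similarity in [0, 1] over mixed attributes (illustrative only)."""
    # Min-max normalize numeric columns; constant columns keep a span of 1.
    lo = numeric.min(axis=0)
    span = numeric.max(axis=0) - lo
    span[span == 0] = 1.0
    z = (numeric - lo) / span
    # Numeric attributes: similarity is 1 minus the normalized absolute difference.
    num_sim = 1.0 - np.abs(z[:, None, :] - z[None, :, :])
    # Categorical attributes: similarity is 1 for equal values, 0 otherwise.
    cat_sim = (categorical[:, None, :] == categorical[None, :, :]).astype(float)
    # Combine attributes with the minimum, one common choice of fuzzy intersection.
    return np.concatenate([num_sim, cat_sim], axis=2).min(axis=2)

def anomaly_scores(numeric, categorical):
    """Score each object by how sparse its fuzzy granule is (assumed definition)."""
    rel = fuzzy_relation(numeric, categorical)
    # Row i of `rel` is the fuzzy granule of object i; its fuzzy cardinality
    # divided by the number of objects serves as a simple density estimate.
    density = rel.sum(axis=1) / rel.shape[0]
    return 1.0 - density

# Hypothetical toy data: five similar objects and one that differs in both attributes.
numeric = np.array([[0.00], [0.10], [0.05], [0.02], [0.08], [1.00]])
categorical = np.array([["a"], ["a"], ["a"], ["a"], ["a"], ["b"]])
scores = anomaly_scores(numeric, categorical)
print(scores.round(3))
```

In this toy data the last object shares no categorical value with the others, so its granule contains only itself and it receives the highest score. A faithful implementation would follow the definitions of fuzzy granule density and the anomaly factor given in the paper.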
Source journal
Information Sciences (Engineering and Technology; Computer Science: Information Systems)
CiteScore: 14.00
Self-citation rate: 17.30%
Articles published: 1322
Review time: 10.4 months
Journal description: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.