数据挖掘中复杂依赖结构的激励:以气候异常检测为例

S. Kao, A. Ganguly, K. Steinhaeuser
{"title":"数据挖掘中复杂依赖结构的激励:以气候异常检测为例","authors":"S. Kao, A. Ganguly, K. Steinhaeuser","doi":"10.1109/ICDMW.2009.37","DOIUrl":null,"url":null,"abstract":"While data mining aims to identify hidden knowledge from massive and high dimensional datasets, the importance of dependence structure among time, space, and between different variables is less emphasized. Analogous to the use of probability density functions in modeling individual variables, it is now possible to characterize the complete dependence space mathematically through the application of copulas. By adopting copulas, the multivariate joint probability distribution can be constructed without constraint to specific types of marginal distributions. Some common assumptions, like normality and independence between variables, can also be relieved. This study provides fundamental introduction and illustration of dependence structure, aimed at the potential applicability of copulas in general data mining. The case study in hydro-climatic anomaly detection shows that the frequency of multivariate anomalies is affected by the dependence level between variables. The appropriate multivariate thresholds can be determined through a copula-based approach.","PeriodicalId":351078,"journal":{"name":"2009 IEEE International Conference on Data Mining Workshops","volume":"140 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"Motivating Complex Dependence Structures in Data Mining: A Case Study with Anomaly Detection in Climate\",\"authors\":\"S. Kao, A. Ganguly, K. Steinhaeuser\",\"doi\":\"10.1109/ICDMW.2009.37\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"While data mining aims to identify hidden knowledge from massive and high dimensional datasets, the importance of dependence structure among time, space, and between different variables is less emphasized. Analogous to the use of probability density functions in modeling individual variables, it is now possible to characterize the complete dependence space mathematically through the application of copulas. By adopting copulas, the multivariate joint probability distribution can be constructed without constraint to specific types of marginal distributions. Some common assumptions, like normality and independence between variables, can also be relieved. This study provides fundamental introduction and illustration of dependence structure, aimed at the potential applicability of copulas in general data mining. The case study in hydro-climatic anomaly detection shows that the frequency of multivariate anomalies is affected by the dependence level between variables. The appropriate multivariate thresholds can be determined through a copula-based approach.\",\"PeriodicalId\":351078,\"journal\":{\"name\":\"2009 IEEE International Conference on Data Mining Workshops\",\"volume\":\"140 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-12-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 IEEE International Conference on Data Mining Workshops\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDMW.2009.37\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE International Conference on Data Mining Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDMW.2009.37","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 22

摘要

数据挖掘旨在从海量、高维的数据集中发现隐藏的知识,而对时间、空间、变量之间的依赖结构的重视程度较低。与使用概率密度函数对单个变量建模类似,现在可以通过应用copula在数学上描述完整的依赖空间。采用copula可以构造多元联合概率分布,而不受特定类型边际分布的约束。一些常见的假设,如变量之间的正态性和独立性,也可以被解除。本研究提供了相关结构的基本介绍和说明,旨在探讨copula在一般数据挖掘中的潜在适用性。水文气候异常检测的实例研究表明,变量间的依赖程度会影响多变量异常的频率。适当的多变量阈值可以通过基于公式的方法确定。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Motivating Complex Dependence Structures in Data Mining: A Case Study with Anomaly Detection in Climate
While data mining aims to identify hidden knowledge from massive and high dimensional datasets, the importance of dependence structure among time, space, and between different variables is less emphasized. Analogous to the use of probability density functions in modeling individual variables, it is now possible to characterize the complete dependence space mathematically through the application of copulas. By adopting copulas, the multivariate joint probability distribution can be constructed without constraint to specific types of marginal distributions. Some common assumptions, like normality and independence between variables, can also be relieved. This study provides fundamental introduction and illustration of dependence structure, aimed at the potential applicability of copulas in general data mining. The case study in hydro-climatic anomaly detection shows that the frequency of multivariate anomalies is affected by the dependence level between variables. The appropriate multivariate thresholds can be determined through a copula-based approach.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信