Motivating Complex Dependence Structures in Data Mining: A Case Study with Anomaly Detection in Climate

2009 IEEE International Conference on Data Mining Workshops. Pub Date: 2009-12-06. DOI: 10.1109/ICDMW.2009.37

S. Kao, A. Ganguly, K. Steinhaeuser


Citations: 22

Abstract


Motivating Complex Dependence Structures in Data Mining: A Case Study with Anomaly Detection in Climate

While data mining aims to identify hidden knowledge in massive, high-dimensional datasets, the dependence structure across time, across space, and between different variables receives less emphasis. Just as probability density functions are used to model individual variables, copulas now make it possible to characterize the complete dependence space mathematically. With copulas, the multivariate joint probability distribution can be constructed without being restricted to specific types of marginal distributions. Some common assumptions, such as normality and independence between variables, can also be relaxed. This study provides a fundamental introduction to and illustration of dependence structure, with a view to the potential applicability of copulas in general data mining. A case study in hydro-climatic anomaly detection shows that the frequency of multivariate anomalies is affected by the level of dependence between variables, and that appropriate multivariate thresholds can be determined through a copula-based approach.
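The two closing claims of the abstract can be illustrated with a minimal sketch. The abstract does not say which copula family the authors used, so the Clayton copula below is an assumption chosen only because its CDF has a closed form. The function names, the dependence parameter `theta`, and the quantile level `q` are likewise hypothetical. The sketch compares the probability that two variables both fall below their marginal q-quantile under dependence against the independent case (q squared), then solves for the marginal level that yields a chosen joint anomaly frequency.

```python
def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v), assumed here for illustration (theta > 0).

    Larger theta means stronger lower-tail dependence; theta -> 0
    approaches the independence copula u * v.
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def joint_anomaly_probability(q, theta):
    """P(U <= q, V <= q): both variables are below their marginal q-quantile."""
    return clayton_cdf(q, q, theta)


def marginal_level_for_joint_target(target, theta, tol=1e-12):
    """Find q such that C(q, q) = target, using bisection.

    C(q, q) increases with q and never exceeds q, so the solution
    lies between `target` and 1.
    """
    lo, hi = target, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if joint_anomaly_probability(mid, theta) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    q = 0.05
    print("independent joint frequency:", q * q)
    for theta in (0.5, 2.0, 5.0):
        print("theta =", theta,
              "joint frequency:", joint_anomaly_probability(q, theta))
    # Marginal quantile level needed so that joint anomalies occur 1% of the time.
    print("marginal level for 1% joint target (theta = 2):",
          marginal_level_for_joint_target(0.01, 2.0))
```

Because the copula works on uniform ranks, the marginal distributions never enter the calculation. In practice each variable would first be mapped to (0, 1) through its own fitted or empirical CDF, which is the separation of marginals from dependence that the abstract describes.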

