{"title":"Depth-Based Novelty Detection and Its Application to Taxonomic Research","authors":"Yixin Chen, H. Bart, Xin Dang, Hanxiang Peng","doi":"10.1109/ICDM.2007.10","DOIUrl":"https://doi.org/10.1109/ICDM.2007.10","url":null,"abstract":"It is estimated that less than 10 percent of the world's species have been described, yet species are being lost daily due to human destruction of natural habitats. The job of describing the earth's remaining species is exacerbated by the shrinking number of practicing taxonomists and the very slow pace of traditional taxonomic research. In this article, we tackle, from a novelty detection perspective, one of the most important and challenging research objectives in taxonomy - new species identification. We propose a unique and efficient novelty detection framework based on statistical depth functions. Statistical depth functions provide from the \"deepest\" point a \"center-outward ordering\" of multidimensional data. In this sense, they can detect observations that appear extreme relative to the rest of the observations, i.e., novelty. Of the various statistical depths, the spatial depth is especially appealing because of its computational efficiency and mathematical tractability. We propose a novel statistical depth, the kernelized spatial depth (KSD) that generalizes the spatial depth via positive definite kernels. By choosing a proper kernel, the KSD can capture the local structure of a data set while the spatial depth fails. Observations with depth values less than a threshold are declared as novel. The proposed algorithm is simple in structure: the threshold is the only one parameter for a given kernel. We give an upper bound on the false alarm probability of a depth-based detector, which can be used to determine the threshold. Experimental study demonstrates its excellent potential in new species discovery.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134360682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Co-ranking Authors and Documents in a Heterogeneous Network","authors":"Ding Zhou, Sergey A. Orshanskiy, H. Zha, C. Lee Giles","doi":"10.1109/ICDM.2007.57","DOIUrl":"https://doi.org/10.1109/ICDM.2007.57","url":null,"abstract":"Recent graph-theoretic approaches have demonstrated remarkable successes for ranking networked entities, but most of their applications are limited to homogeneous networks such as the network of citations between publications. This paper proposes a novel method for co-ranking authors and their publications using several networks: the social network connecting the authors, the citation network connecting the publications, as well as the authorship network that ties the previous two together. The new co-ranking framework is based on coupling two random walks, that separately rank authors and documents following the PageRankparadigm. As a result, improved rankings of documents and their authors depend on each other in a mutually reinforcing way, thus taking advantage of the additional information implicit in the heterogeneous network of authors and documents.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133148616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Learning Algorithm for Tree Similarity","authors":"A. Takasu, Daiji Fukagawa, T. Akutsu","doi":"10.1109/ICDM.2007.38","DOIUrl":"https://doi.org/10.1109/ICDM.2007.38","url":null,"abstract":"Tree edit distance is one of the most frequently used distance measures for comparing trees. When using the tree edit distance, we need to determine the cost of each operation, but this is a labor-intensive and highly skilled task. This paper proposes an algorithm for learning the costs of tree edit operations from training data consisting of pairs of similar trees. To formalize the cost learning problem, we define a probabilistic model for tree alignment that is a variant of tree edit distance. Then, the parameters of the model are estimated using the expectation maximization (EM) technique. In this paper, we develop an algorithm for parameter learning that is polynomial in time (O{mn2d6)) and space (O{n2d4)) where n, d, and m represent the size of the trees, the maximum degree of trees, and the number of training pairs of trees, respectively.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"255 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133270547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-supervised Document Clustering via Active Learning with Pairwise Constraints","authors":"Ruizhang Huang, Wai Lam","doi":"10.1109/ICDM.2007.79","DOIUrl":"https://doi.org/10.1109/ICDM.2007.79","url":null,"abstract":"This paper investigates a framework that discovers pair-wise constraints for semi-supervised text document clustering. An active learning approach is proposed to select informative document pairs for obtaining user feedbacks. A gain directed document pair selection method that measures how much we can learn by revealing the relationships between pairs of documents is designed. Three different models, namely, uncertainty model, generation error model, and objective function model are proposed. Language modeling is investigated for representing clusters in the semi-supervised document clustering approach.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133349862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Discovery of Frequent Approximate Sequential Patterns","authors":"Feida Zhu, Xifeng Yan, Jiawei Han, Philip S. Yu","doi":"10.1109/ICDM.2007.75","DOIUrl":"https://doi.org/10.1109/ICDM.2007.75","url":null,"abstract":"We propose an efficient algorithm for mining frequent approximate sequential patterns under the Hamming distance model. Our algorithm gains its efficiency by adopting a \"break-down-and-build-up\" methodology. The \"breakdown\" is based on the observation that all occurrences of a frequent pattern can be classified into groups, which we call strands. We developed efficient algorithms to quickly mine out all strands by iterative growth. In the \"build-up\" stage, these strands are grouped up to form the support sets from which all approximate patterns would be identified. A salient feature of our algorithm is its ability to grow the frequent patterns by iteratively assembling building blocks of significant sizes in a local search fashion. By avoiding incremental growth and global search, we achieve greater efficiency without losing the completeness of the mining result. Our experimental studies demonstrate that our algorithm is efficient in mining globally repeating approximate sequential patterns that would have been missed by existing methods.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114069186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Statistical Information of Frequent Fault-Tolerant Patterns in Transactional Databases","authors":"Ardian Kristanto Poernomo, V. Gopalkrishnan","doi":"10.1109/ICDM.2007.48","DOIUrl":"https://doi.org/10.1109/ICDM.2007.48","url":null,"abstract":"Constraints applied on classic frequent patterns are too strict and may cause interesting patterns to be missed. Hence, researchers have proposed to mine a more relaxed version of frequent patterns, where transactions are allowed to miss some items in the itemset they support. Patterns exhibiting such \"faults\" are called frequent fault-tolerant patterns (FFT-patterns) if they are significant in number. In this paper, the term \"pattern\" is distinguished from \"item- set\" as referring to a pair (tidset times itemset). Unlike classical frequent patterns, the number of FFT- patterns grows exponentially not only with the number of items, but also with the number of transactions. Since the latter may reach millions, mining FFT-patterns by enumerating them becomes infeasible. Hence, the challenge is to represent FFT-patterns concisely without losing any useful information. To address this, we draw on the observation that, in transactional databases, the transactions themselves are not important from the data mining point-of- view; i.e. researchers are interested in finding itemsets contained in lots of transactions, rather than in the transactions per se. Therefore, we propose to mine only the frequent itemsets along with the statistical information of the supporting transaction sets, rather than enumerate entire FFT- patterns. Then we present our approach - the BIAS framework, consisting of Backtracking algorithm, Integer Linear Programming (ILP) constraints, and aggregation statistics to solve this problem. Algorithms under this framework not only increase the efficiency of the FFT-patterns mining process by more than an order of magnitude, but also provide a more comprehensive analysis of FFT-Patterns.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128659311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Meta-Learning Rule Learning Heuristics","authors":"Frederik Janssen, Johannes Fürnkranz","doi":"10.1109/ICDM.2007.51","DOIUrl":"https://doi.org/10.1109/ICDM.2007.51","url":null,"abstract":"The goal of this paper is to investigate to what extent a rule learning heuristic can be learned from experience. To that end, we let a rule learner learn a large number of rules and record their performance on the test set. Subsequently, we train regression algorithms on predicting the test set performance of a rule from its training set characteristics. We investigate several variations of this basic scenario, including the question whether it is better to predict the performance of the candidate rule itself or of the resulting final rule. Our experiments on a number of independent evaluation sets show that the learned heuristics outperform standard rule learning heuristics. We also analyze their behavior in coverage space.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128698542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Folding-In with Dirichlet Kernels for PLSI","authors":"Alexander Hinneburg, H. Gabriel, André Gohr","doi":"10.1109/ICDM.2007.15","DOIUrl":"https://doi.org/10.1109/ICDM.2007.15","url":null,"abstract":"Probabilistic latent semantic indexing (PLSI) represents documents of a collection as mixture proportions of latent topics, which are learned from the collection by an expectation maximization (EM) algorithm. New documents or queries need to be folded into the latent topic space by a simplified version of the EM-algorithm. During PLSI- Folding-in of a new document, the topic mixtures of the known documents are ignored. This may lead to a suboptimal model of the extended collection. Our new approach incorporates the topic mixtures of the known documents in a Bayesian way during folding- in. That knowledge is modeled as prior distribution over the topic simplex using a kernel density estimate of Dirichlet kernels. We demonstrate the advantages of the new Bayesian folding-in using real text data.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116645524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Language-Independent Set Expansion of Named Entities Using the Web","authors":"Richard C. Wang, William W. Cohen","doi":"10.1109/icdm.2007.104","DOIUrl":"https://doi.org/10.1109/icdm.2007.104","url":null,"abstract":"Set expansion refers to expanding a given partial set of objects into a more complete set. A well-known example system that does set expansion using the web is Google Sets. In this paper, we propose a novel method for expanding sets of named entities. The approach can be applied to semi-structured documents written in any markup language and in any human language. We present experimental results on 36 benchmark sets in three languages, showing that our system is superior to Google Sets in terms of mean average precision.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117044986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Subdimensional Motifs: An Efficient Algorithm for Generalized Multivariate Pattern Discovery","authors":"David C. Minnen, C. Isbell, Irfan Essa, Thad Starner","doi":"10.1109/ICDM.2007.52","DOIUrl":"https://doi.org/10.1109/ICDM.2007.52","url":null,"abstract":"Discovering recurring patterns in time series data is a fundamental problem for temporal data mining. This paper addresses the problem of locating subdimensional motifs in real-valued, multivariate time series, which requires the simultaneous discovery of sets of recurring patterns along with the corresponding relevant dimensions. While many approaches to motif discovery have been developed, most are restricted to categorical data, univariate time series, or multivariate data in which the temporal patterns span all of the dimensions. In this paper, we present an expected linear-time algorithm that addresses a generalization of multivariate pattern discovery in which each motif may span only a subset of the dimensions. To validate our algorithm, we discuss its theoretical properties and empirically evaluate it using several data sets including synthetic data and motion capture data collected by an on-body iner- tial sensor.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115164791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}