2013 IEEE 13th International Conference on Data Mining: Latest Papers, Page 2

Feature Transformation with Class Conditional Decorrelation

2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.43

Xu-Yao Zhang, Kaizhu Huang, Cheng-Lin Liu

Abstract: The well-known feature transformation model of Fisher linear discriminant analysis (FDA) can be decomposed into an equivalent two-step approach: whitening followed by principal component analysis (PCA) in the whitened space. By proving that whitening is the optimal linear transformation to the Euclidean space in the sense of minimum log-determinant divergence, we propose a transformation model called class conditional decorrelation (CCD). The objective of CCD is to diagonalize the covariance matrices of different classes simultaneously, which is efficiently optimized using a modified Jacobi method. CCD is effective to find the common principal components among multiple classes. After CCD, the variables become class conditionally uncorrelated, which will benefit the subsequent classification tasks. Combining CCD with the nearest class mean (NCM) classification model can significantly improve the classification accuracy. Experiments on 15 small-scale datasets and one large-scale dataset (with 3755 classes) demonstrate the scalability of CCD for different applications. We also discuss the potential applications of CCD for other problems such as Gaussian mixture models and classifier ensemble learning.

Citations: 1
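
The two-step view of FDA described in the abstract above (whitening, then PCA in the whitened space) can be illustrated with a small NumPy sketch. This is not the authors' CCD method or their modified Jacobi optimizer; it only shows the FDA decomposition, under the assumption that the within-class covariance is positive definite. The function names, the `eps` ridge parameter, and the normalization by the sample count are illustrative choices, not taken from the paper.

```python
import numpy as np

def within_class_cov(X, y):
    """Pooled within-class covariance: class scatter matrices summed, divided by n."""
    d = X.shape[1]
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        centered = Xc - Xc.mean(axis=0)
        Sw += centered.T @ centered
    return Sw / len(X)

def fda_via_whitening(X, y, n_components, eps=0.0):
    """Sketch of FDA as whitening followed by PCA on the whitened class means.

    eps is a hypothetical ridge term for ill-conditioned covariances.
    """
    Sw = within_class_cov(X, y) + eps * np.eye(X.shape[1])
    # Step 1: whitening transform W with W.T @ Sw @ W = I
    evals, evecs = np.linalg.eigh(Sw)
    W = evecs / np.sqrt(evals)  # scale each eigenvector column by 1/sqrt(eigenvalue)
    # Step 2: PCA on the (count-weighted) class means in the whitened space
    classes, counts = np.unique(y, return_counts=True)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    M = (means - X.mean(axis=0)) @ W
    Sb = (M * counts[:, None]).T @ M / len(X)
    b_evals, b_evecs = np.linalg.eigh(Sb)
    top = np.argsort(b_evals)[::-1][:n_components]
    return W @ b_evecs[:, top]  # projection matrix of shape (d, n_components)
```

Projecting with `X @ P`, where `P = fda_via_whitening(X, y, k)`, gives features whose pooled within-class covariance is the identity, which is the property the whitening step is meant to provide.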

Statistical Inference of Protein "LEGO Bricks"

2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.73

A. S. Konagurthu, L. Allison, D. Abramson, Peter James Stuckey, A. Lesk

Abstract: Proteins are biomolecules of life. They fold into a great variety of three-dimensional (3D) shapes. Underlying these folding patterns are many recurrent structural fragments or building blocks (analogous to 'LEGO® bricks'). This paper reports an innovative statistical inference approach to discover a comprehensive dictionary of protein structural building blocks from a large corpus of experimentally determined protein structures. Our approach is built on the Bayesian and information theoretic criterion of minimum message length. To the best of our knowledge, this work is the first systematic and rigorous treatment of a very important data mining problem that arises in the cross-disciplinary area of structural bioinformatics. The quality of the dictionary we find is demonstrated by its explanatory power - any protein within the corpus of known 3D structures can be dissected into successive regions assigned to fragments from this dictionary. This induces a novel one-dimensional representation of three-dimensional protein folding patterns, suitable for application of the rich repertoire of character-string processing algorithms, for rapid identification of folding patterns of newly determined structures. This paper presents the details of the methodology used to infer the dictionary of building blocks, and is supported by illustrative examples to demonstrate its effectiveness and utility.

Citations: 3

Efficient Online Sequence Prediction with Side Information

2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.31

Han Xiao, C. Eckert

Citations: 7

Influence and Profit: Two Sides of the Coin

2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.40

Yuqing Zhu, Zaixin Lu, Yuanjun Bi, Weili Wu, Yiwei Jiang, Deying Li

Abstract: Influence maximization problem is to find a set of seeds in social networks such that the cascade influence is maximized. Traditional models assume all nodes are willing to spread the influence once they are influenced, and they ignore the disparity between influence and profit of a product. In this paper by considering the role that price plays in viral marketing, we propose price related (PR) frame that contains PR-I and PR-L models for classic IC and LT models respectively, which is a pioneer work. We find that influence and profit are like two sides of the coin, high price hinders the influence propagation and to enlarge the influence some sacrifice on profit is inevitable. We propose Balanced Influence and Profit (BIP) maximization problem. We prove the NP-hardness of BIP maximization under PR-I and PR-L model. Unlike influence maximization, the BIP objective function is not monotone. Despite the non-monotony, we show BIP objective function is submodular under certain conditions. Two unbudgeted greedy algorithms separately are devised. We conduct simulations on real-world datasets and evaluate the superiority of our algorithms over existing ones.

Citations: 29
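
For readers unfamiliar with the baseline problem the abstract above builds on, the following sketch shows classic greedy seed selection under the independent cascade (IC) model, with spread estimated by Monte Carlo simulation. It is not the paper's PR-I/PR-L models or its BIP algorithms: price and profit are not modeled here. The graph representation, function names, and parameters (`runs`, `seed`) are assumptions made for illustration.

```python
import random

def simulate_ic(graph, seeds, rng):
    """One independent cascade run; graph maps node -> list of (neighbor, probability)."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly active node gets one chance to activate each inactive neighbor
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)

def expected_spread(graph, seeds, runs, rng):
    """Monte Carlo estimate of the expected number of activated nodes."""
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs

def greedy_seeds(graph, k, runs=200, seed=0):
    """Repeatedly add the node whose inclusion gives the largest estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v, _ in nbrs}
    chosen = []
    for _ in range(k):
        candidates = sorted(nodes - set(chosen))
        best = max(candidates,
                   key=lambda n: expected_spread(graph, chosen + [n], runs, rng))
        chosen.append(best)
    return chosen
```

The greedy rule is a natural fit for classic influence maximization because that objective is monotone and submodular; the abstract notes that the BIP objective is not monotone, so this sketch should be read only as background for the unpriced setting.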

The Pairwise Gaussian Random Field for High-Dimensional Data Imputation

2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.149

Zhuhua Cai, C. Jermaine, Zografoula Vagena, Dionysios Logothetis, L. Perez

Abstract: In this paper, we consider the problem of imputation (recovering missing values) in very high-dimensional data with an arbitrary covariance structure. The modern solution to this problem is the Gaussian Markov random field (GMRF). The problem with applying a GMRF to very high-dimensional data imputation is that while the GMRF model itself can be useful even for data having tens of thousands of dimensions, utilizing a GMRF requires access to a sparsified, inverse covariance matrix for the data. Computing this matrix using even state-of-the-art methods is very costly, as it typically requires first estimating the covariance matrix from the data (at a O(nm²) cost for m dimensions and n data points) and then performing a regularized inversion of the estimated covariance matrix, which is also very expensive. This is impractical for even moderately-sized, high-dimensional data sets. In this paper, we propose a very simple alternative to the GMRF called the pairwise Gaussian random field or PGRF for short. The PGRF is a graphical, factor-based model. Unlike traditional Gaussian or GMRF models, a PGRF does not require a covariance or correlation matrix as input. Instead, a PGRF takes as input a set of p (dimension, dimension) pairs for which the user suspects there might be a strong correlation or anti-correlation. This set of pairs defines the graphical structure of the model, with a simple Gaussian factor associated with each of the p (dimension, dimension) pairs. Using this structure, it is easy to perform simultaneous inference and imputation of the model. The key benefit of the approach is that the time required for the PGRF to perform inference is approximately linear with respect to p, where p will typically be much smaller than the number of entries in a m×m covariance or precision matrix.

Citations: 7
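
The cost argument in the abstract above (work proportional to the p user-supplied pairs rather than to a full m×m covariance matrix) can be made concrete with a small sketch. This is not the PGRF inference procedure from the paper; it only estimates bivariate Gaussian statistics for the listed pairs and uses the standard conditional-mean formula to impute one variable from the other. The function names and the NaN convention for missing values are assumptions.

```python
import numpy as np

def pair_stats(X, pairs):
    """Estimate statistics for the listed (i, j) pairs only: O(n * p) work in total.

    Missing values in X are assumed to be encoded as NaN.
    """
    stats = {}
    for i, j in pairs:
        both = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])  # rows where both are observed
        xi, xj = X[both, i], X[both, j]
        cov = np.cov(xi, xj)  # 2x2 sample covariance of the pair
        stats[(i, j)] = (xi.mean(), xj.mean(), cov[0, 0], cov[0, 1])
    return stats

def impute_from_pair(x_i, stat):
    """Conditional mean of x_j given x_i under a bivariate Gaussian."""
    mu_i, mu_j, var_i, cov_ij = stat
    return mu_j + cov_ij / var_i * (x_i - mu_i)
```

A row with dimension j missing could then be filled with `impute_from_pair(row[i], stats[(i, j)])`. When a dimension appears in several pairs, the individual estimates would have to be combined; the paper does this through factor-based inference, which this sketch leaves out.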

Blocking Simple and Complex Contagion by Edge Removal

2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.47

C. Kuhlman, Gaurav Tuli, S. Swarup, M. Marathe, S. Ravi

Citations: 84

A Probabilistic Behavior Model for Discovering Unrecognized Knowledge

2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.65

Takeshi Kurashima, Tomoharu Iwata, Noriko Takaya, H. Sawada

Citations: 0

Compression-Based Graph Mining Exploiting Structure Primitives

2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.56

Jing Feng, Xiao He, N. Hubig, C. Böhm, C. Plant

Abstract: How can we retrieve information from sparse graphs? Traditional graph mining approaches focus on discovering dense patterns inside complex networks, for example modularity-based or cut-based methods. However, most real world data sets are very sparse. Nevertheless, traditional approaches tend to omit interesting sparse patterns like stars. In this paper, we propose a novel graph mining technique modeling the transitivity and the hubness of a graph using structure primitives. We exploit these structure primitives for effective graph compression using the Minimum Description Length Principle. The compression rate is an unbiased measure for the transitivity or hubness and therefore provides interesting insights into the structure of even very sparse graphs. Since real graphs can be composed of subgraphs of different structures, we propose a novel algorithm CXprime (Compression-based exploiting Primitives) for clustering graphs using our coding scheme as an objective function. In contrast to traditional graph clustering methods, our algorithm automatically recognizes different types of subgraphs without requiring the user to specify input parameters. Additionally we propose a novel link prediction algorithm based on the detected substructures, which increases the quality of former methods. Extensive experiments evaluate our algorithms on synthetic and real data.

Citations: 12

Learning, Analyzing and Predicting Object Roles on Dynamic Networks

2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.95

Kang Li, Suxin Guo, Nan Du, Jing Gao, A. Zhang

Abstract: Dynamic networks are structures with objects and links between the objects that vary in time. Temporal information in dynamic networks can be used to reveal many important phenomena such as bursts of activities in social networks and human communication patterns in email networks. In this area, one very important problem is to understand dynamic patterns of object roles. For instance, will a user become a peripheral node in a social network? Could a website become a hub on the Internet? Will a gene be highly expressed in gene-gene interaction networks in the later stage of a cancer? In this paper, we propose a novel approach that identifies the role of each object, tracks the changes of object roles over time, and predicts the evolving patterns of the object roles in dynamic networks. In particular, a probability model is proposed to extract latent features of object roles from dynamic networks. The extracted latent features are discriminative in learning object roles and are capable of characterizing network structures. The probability model is then extended to learn the dynamic patterns and make predictions on object roles. We assess our method on two data sets on the tasks of exploring how users' importance and political interests evolve as time progresses on dynamic networks. Overall, the extensive experimental evaluations confirm the effectiveness of our approach for identifying, analyzing and predicting object roles on dynamic networks.

Citations: 13

A Novel Relational Learning-to-Rank Approach for Topic-Focused Multi-document Summarization

2013 IEEE 13th International Conference on Data Mining Pub Date : 2013-12-01 DOI: 10.1109/ICDM.2013.38

Yadong Zhu, Yanyan Lan, J. Guo, Pan Du, Xueqi Cheng

Abstract: Topic-focused multi-document summarization aims to produce a summary over a set of documents and conveys the most important aspects of a given topic. Most existing extractive methods view the task as a multi-criteria ranking problem over sentences, where relevance, salience and diversity are three typical requirements. However, diversity is a challenging problem as it involves modeling the relationship between sentences during ranking, where traditional methods usually tackle it in a heuristic or implicit way. In this paper, we propose a novel relational learning-to-rank approach (R-LTR) to solve this problem. Relational learning-to-rank is a new learning framework which further incorporates relationships into traditional learning-to-rank in an elegant way. Specifically, the ranking function is defined as the combination of content-based score of individual sentence, and relation-based score between the current sentence and those already selected. On this basis, we propose to learn the ranking function by minimizing the likelihood loss based on Plackett-Luce model, which can naturally model the sequential ranking procedure of candidate sentences. Stochastic gradient descent is then employed to conduct the learning process, and the summary is predicted by the greedy selection procedure based on the learned ranking function. Finally, we conduct extensive experiments on benchmark data sets TAC2008 and TAC2009. Experimental results show that our approach can significantly outperform the state-of-the-art methods from both quantitative and qualitative aspects.

Citations: 13
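
Two ingredients named in the abstract above, the Plackett-Luce likelihood loss and greedy selection that scores a sentence against those already chosen, can be sketched as follows. This is an illustration under stated assumptions, not the R-LTR model: the paper learns its relation-based score, whereas the sketch substitutes a hypothetical fixed penalty (the maximum similarity to any selected sentence, scaled by `weight`). All function and parameter names are illustrative.

```python
import math

def plackett_luce_nll(scores_in_rank_order):
    """Negative log-likelihood of a ranking under the Plackett-Luce model.

    scores_in_rank_order[i] is the model score of the item placed at position i.
    At each position, the chosen item competes with all items not yet placed.
    """
    nll = 0.0
    for i in range(len(scores_in_rank_order)):
        remaining = scores_in_rank_order[i:]
        m = max(remaining)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in remaining))
        nll += log_z - scores_in_rank_order[i]
    return nll

def greedy_summary(content_scores, similarity, k, weight=1.0):
    """Pick k sentences one at a time, trading content score against redundancy.

    The redundancy penalty is an assumed stand-in for a learned relation score.
    """
    selected = []
    candidates = list(range(len(content_scores)))
    while candidates and len(selected) < k:
        def score(i):
            penalty = max((similarity[i][j] for j in selected), default=0.0)
            return content_scores[i] - weight * penalty
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

In a training loop, `plackett_luce_nll` would be evaluated on the scores of the ground-truth ranking and minimized with stochastic gradient descent, as the abstract describes; the gradient computation is omitted here.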