Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining: Latest Articles

Learning Linear Dynamical Systems from Multivariate Time Series: A Matrix Factorization Based Framework

Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining Pub Date : 2016-05-01 DOI: 10.1137/1.9781611974348.91

Zitao Liu, M. Hauskrecht

Abstract: The linear dynamical system (LDS) model is arguably the most commonly used time series model for real-world engineering and financial applications due to its relative simplicity, mathematically predictable behavior, and the fact that exact inference and predictions for the model can be done efficiently. In this work, we propose a new generalized LDS framework, gLDS, for learning LDS models from a collection of multivariate time series (MTS) based on matrix factorization, an approach that differs from traditional EM learning and spectral learning algorithms. In gLDS, each MTS sequence is factorized as a product of a shared emission matrix and sequence-specific (hidden) state dynamics, where an individual hidden state sequence is represented with the help of a shared transition matrix. One advantage of our generalized formulation is that various types of constraints can be easily incorporated into the learning process. Furthermore, we propose a novel temporal smoothing regularization approach for learning the LDS model, which stabilizes the model, its learning algorithm, and the predictions it makes. Experiments on several real-world MTS datasets show that (1) regular LDS models learned with gLDS achieve better time series predictive performance than other LDS learning algorithms; (2) constraints can be directly integrated into the learning process to achieve special properties such as stability and low-rankness; and (3) the proposed temporal smoothing regularization encourages more stable and accurate predictions.

pp. 810-818.

Citations: 23
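The gLDS abstract describes each sequence as an emission matrix times a hidden state sequence, with consecutive states linked by a shared transition matrix. The following is a minimal pure-Python sketch of that factorization view, not the paper's objective: it assumes a noise-free LDS, a squared-error fit term, and a squared penalty on x_{t+1} - A x_t weighted by a hypothetical parameter `lam`. It omits the constraints, the temporal smoothing regularizer, and any solver. The helper names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sq_frob_diff(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm of a - b."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def cols(m: Matrix, start: int, stop: int) -> Matrix:
    """Columns start..stop-1 of m (time runs along columns)."""
    return [row[start:stop] for row in m]


def simulate(C: Matrix, A: Matrix, x0: List[float], T: int):
    """Noise-free LDS x_{t+1} = A x_t, y_t = C x_t; returns (X, Y) with time along columns."""
    x = [[v] for v in x0]
    states = []
    for _ in range(T):
        states.append([row[0] for row in x])
        x = matmul(A, x)
    X = [[s[i] for s in states] for i in range(len(x0))]
    return X, matmul(C, X)


def factorization_loss(Y: Matrix, C: Matrix, X: Matrix, A: Matrix, lam: float) -> float:
    """Hypothetical loss: ||Y - C X||^2 + lam * ||X[:, 1:] - A X[:, :-1]||^2."""
    T = len(X[0])
    fit = sq_frob_diff(Y, matmul(C, X))
    dynamics = sq_frob_diff(cols(X, 1, T), matmul(A, cols(X, 0, T - 1)))
    return fit + lam * dynamics


def forecast(C: Matrix, A: Matrix, x_last: List[float], horizon: int) -> List[List[float]]:
    """Standard LDS prediction y_{T+h} = C A^h x_T for h = 1..horizon."""
    x = [[v] for v in x_last]
    preds = []
    for _ in range(horizon):
        x = matmul(A, x)
        preds.append([row[0] for row in matmul(C, x)])
    return preds
```

With `lam = 0` the loss reduces to an unconstrained low-rank fit of Y; the transition term is what forces all columns of X to follow one shared A, which is also what makes multi-step forecasting possible.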

Binary Classifier Calibration Using an Ensemble of Linear Trend Estimation.

Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining Pub Date : 2016-05-01 DOI: 10.1137/1.9781611974348.30

Mahdi Pakdaman Naeini, Gregory F Cooper

Abstract: Learning accurate probabilistic models from data is crucial in many practical data mining tasks. In this paper we present a new non-parametric calibration method called ensemble of linear trend estimation (ELiTE). ELiTE uses the recently proposed ℓ1 trend filtering signal approximation method [22] to find the mapping from uncalibrated classification scores to calibrated probability estimates. ELiTE is designed to address two key limitations of histogram binning-based calibration methods: (1) the use of a piecewise constant calibration mapping defined by bins, and (2) the assumption that predicted probabilities are independent for instances located in different bins. The method post-processes the output of a binary classifier to obtain calibrated probabilities, so it can be applied with many existing classification models. We demonstrate the performance of ELiTE on real datasets for commonly used binary classification models. Experimental results show that the method outperforms several common binary-classifier calibration methods. In particular, ELiTE often performs statistically significantly better than the other methods, and never worse. Moreover, it improves the calibration power of classifiers while retaining their discrimination power. The method is also computationally tractable for large-scale datasets, as it runs in practically O(N log N) time, where N is the number of samples.

Volume 2016, pp. 261-269.

Citations: 5
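ELiTE is positioned against histogram binning calibration. For contrast, here is a minimal sketch of equal-frequency histogram binning, the baseline and not ELiTE itself, showing the piecewise constant mapping named as limitation (1). ELiTE instead fits the mapping with ℓ1 trend filtering, which this sketch does not implement. The bin count `n_bins` and the function names are illustrative assumptions.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def fit_histogram_binning(scores: Sequence[float], labels: Sequence[int],
                          n_bins: int) -> Tuple[List[float], List[float]]:
    """Equal-frequency binning: returns each bin's upper score edge and positive rate."""
    pairs = sorted(zip(scores, labels))  # sorting dominates the cost: O(N log N)
    n = len(pairs)
    edges: List[float] = []
    rates: List[float] = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # more bins than samples
        edges.append(chunk[-1][0])
        rates.append(sum(y for _, y in chunk) / len(chunk))
    return edges, rates


def calibrate(score: float, edges: List[float], rates: List[float]) -> float:
    """Piecewise constant map: use the positive rate of the first bin whose edge >= score."""
    i = bisect_left(edges, score)
    return rates[min(i, len(rates) - 1)]
```

For example, with scores `[0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]`, labels `[0, 0, 1, 0, 1, 0, 1, 1]`, and two bins, `calibrate(0.40, ...)` returns 0.25 while `calibrate(0.41, ...)` returns 0.75: nearby scores jump between constant levels, the behavior ELiTE is designed to avoid.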

DPClass: An Effective but Concise Discriminative Patterns-Based Classification Framework

Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining Pub Date : 2016-01-01 DOI: 10.1137/1.9781611974348.64

Jingbo Shang, Wenzhu Tong, Jian Peng, Jiawei Han

Abstract: Pattern-based classification was originally proposed to improve accuracy using selected frequent patterns, with much effort devoted to pruning the huge number of non-discriminative frequent patterns. On the other hand, tree-based models have shown strong performance on many classification tasks, since they can easily build high-order interactions between features and handle numerical, categorical, and high-dimensional features. Taking advantage of both modeling methodologies, we propose a natural and effective approach to pattern-based classification that adopts as discriminative patterns the prefix paths from the root to nodes in tree-based models (e.g., random forests). We further reduce the number of discriminative patterns by selecting the most effective pattern combinations that fit into a generalized linear model. As a result, our discriminative pattern-based classification framework (DPClass) performs as well as previous state-of-the-art algorithms, provides strong interpretability by using only a very limited number of discriminative patterns, and predicts new data extremely fast. More specifically, in our experiments, DPClass achieved even better accuracy using only the top-20 discriminative patterns. The resulting framework is very concise and highly explanatory to human experts.

pp. 567-575.

Citations: 7
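DPClass treats each prefix path from the root to a node of a tree model as a discriminative pattern. This minimal sketch assumes a hand-built binary tree with threshold splits; the `Node` class and the feature names are hypothetical. It enumerates those prefix paths and turns them into binary indicator features that a generalized linear model could consume. The random forest training and the pattern selection step from the abstract are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Condition = Tuple[str, str, float]  # (feature, "<=" or ">", threshold)


@dataclass
class Node:
    feature: Optional[str] = None    # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None    # branch taken when x[feature] <= threshold
    right: Optional["Node"] = None   # branch taken when x[feature] > threshold


def prefix_paths(node: Node, prefix: Tuple[Condition, ...] = ()) -> List[Tuple[Condition, ...]]:
    """Every non-empty root-to-node path, each read as a conjunction of conditions."""
    patterns = [prefix] if prefix else []
    if node.feature is not None:
        patterns += prefix_paths(node.left, prefix + ((node.feature, "<=", node.threshold),))
        patterns += prefix_paths(node.right, prefix + ((node.feature, ">", node.threshold),))
    return patterns


def matches(pattern: Tuple[Condition, ...], x: Dict[str, float]) -> bool:
    """True if instance x satisfies every condition in the pattern."""
    return all(x[f] <= t if op == "<=" else x[f] > t for f, op, t in pattern)


def pattern_features(patterns: List[Tuple[Condition, ...]], x: Dict[str, float]) -> List[int]:
    """One 0/1 indicator per pattern: the input a linear model would see."""
    return [int(matches(p, x)) for p in patterns]
```

A tree that splits on `age` at 30 and then on `income` at 50 yields four patterns; an instance with age 40 and income 60 maps to `[0, 1, 0, 1]`.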

Tensor Spectral Clustering for Partitioning Higher-order Network Structures

Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining Pub Date : 2015-02-17 DOI: 10.1137/1.9781611974010.14

Austin R. Benson, D. Gleich, J. Leskovec

Abstract: Spectral graph theory-based methods represent an important class of tools for studying the structure of networks. Spectral methods are based on a first-order Markov chain derived from a random walk on the graph, and thus they cannot take advantage of important higher-order network substructures such as triangles, cycles, and feed-forward loops. Here we propose a Tensor Spectral Clustering (TSC) algorithm that allows for modeling higher-order network structures in a graph partitioning framework. Our TSC algorithm allows the user to specify which higher-order network structures (cycles, feed-forward loops, etc.) should be preserved by the network clustering. Higher-order network structures of interest are represented using a tensor, which we then partition by developing a multilinear spectral method. Our framework can be applied to discovering layered flows in networks as well as to graph anomaly detection, which we illustrate on synthetic networks. In directed networks, a higher-order structure of particular interest is the directed 3-cycle, which captures feedback loops in networks. We demonstrate that our TSC algorithm produces large partitions that cut fewer directed 3-cycles than standard spectral clustering algorithms.

pp. 118-126.

Citations: 108
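The evaluation criterion in the TSC abstract, how many directed 3-cycles a partition cuts, can be computed directly. This small sketch evaluates that criterion; it is not the TSC algorithm, which builds and partitions a tensor. It assumes the graph is given as an edge list and the partition as a hypothetical node-to-cluster dict.

```python
from typing import Dict, Iterable, List, Set, Tuple


def directed_3cycles(edges: Iterable[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Each directed triangle u->v->w->u once, rotated so its smallest node comes first."""
    succ: Dict[int, Set[int]] = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    cycles = []
    for u in succ:
        for v in succ[u]:
            for w in succ.get(v, set()):
                if len({u, v, w}) == 3 and u < v and u < w and u in succ.get(w, set()):
                    cycles.append((u, v, w))
    return sorted(cycles)


def cut_3cycles(edges: Iterable[Tuple[int, int]], cluster: Dict[int, int]) -> int:
    """Number of directed 3-cycles whose three nodes are not all in the same cluster."""
    return sum(1 for c in directed_3cycles(edges) if len({cluster[n] for n in c}) > 1)
```

For two directed triangles joined by one edge, the partition that separates the triangles cuts no 3-cycle, while moving a single triangle node to the other side cuts one; an edge-cut objective would see only one crossing edge in the first case and cannot make this distinction.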

Binary Classifier Calibration Using a Bayesian Non-Parametric Approach

Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining Pub Date : 2015-01-01 DOI: 10.1137/1.9781611974010.24

Mahdi Pakdaman Naeini, G. Cooper, M. Hauskrecht

Citations: 20

Graph Regularized Meta-path Based Transductive Regression in Heterogeneous Information Network

Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining Pub Date : 2015-01-01 DOI: 10.1137/1.9781611974010.103

Mengting Wan, Yunbo Ouyang, Lance M. Kaplan, Jiawei Han

Citations: 18

Classifying Imbalanced Data Streams via Dynamic Feature Group Weighting with Importance Sampling.

Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining Pub Date : 2014-04-01 DOI: 10.1137/1.9781611973440.83

Ke Wu, Andrea Edwards, Wei Fan, Jing Gao, Kun Zhang

Abstract: Data stream classification and imbalanced data learning are two important areas of data mining research. Each has been well studied, with many interesting algorithms developed. However, only a few approaches reported in the literature address the intersection of these two fields, owing to their complex interplay. In this work, we propose an importance sampling driven, dynamic feature group weighting framework (DFGW-IS) for classifying data streams with imbalanced distributions. Two components are tightly incorporated into the proposed approach to address the intrinsic characteristics of concept-drifting, imbalanced streaming data. Specifically, the ever-evolving concepts are tackled by a weighted ensemble trained on a set of feature groups, with each sub-classifier (i.e., a single classifier or an ensemble) weighted by its discriminative power and stability. The uneven class distribution, on the other hand, is countered by the sub-classifier built on a specific feature group, with the underlying distribution rebalanced by the importance sampling technique. We derive a theoretical upper bound on the generalization error of the proposed algorithm. We also study the empirical performance of our method on a set of synthetic and real-world benchmark datasets, where it achieves significant improvements over competing algorithms in terms of standard evaluation metrics and parallel running time. Algorithm implementations and datasets are available upon request.

Volume 2014, pp. 722-730.

Citations: 28
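The DFGW-IS abstract rebalances each sub-classifier's training distribution with importance sampling. This minimal sketch shows generic class-rebalancing importance weights, p_target(y) / p_source(y), followed by weighted resampling of one data chunk. The uniform target class distribution is an assumption, since the abstract does not specify the paper's sampling distribution. The feature-group ensemble and its weighting by discriminative power and stability are not shown, and the function names are illustrative.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def class_balancing_weights(labels: Sequence[int]) -> List[float]:
    """Importance weight p_target(y) / p_source(y), assuming a uniform target over observed classes."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [(1.0 / k) / (counts[y] / n) for y in labels]


def importance_resample(items: Sequence[T], labels: Sequence[int], size: int,
                        seed: int = 0) -> Tuple[List[T], List[int]]:
    """Draw `size` examples with replacement, with probability proportional to the weights."""
    rng = random.Random(seed)
    weights = class_balancing_weights(labels)
    idx = rng.choices(range(len(items)), weights=weights, k=size)
    return [items[i] for i in idx], [labels[i] for i in idx]
```

In a chunk with nine negatives and one positive, the positive receives weight 5.0 and each negative about 0.56. Both classes then carry the same total weight, so a resampled chunk is roughly half positive.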

DuSK: A Dual Structure-preserving Kernel for Supervised Tensor Learning with Applications to Neuroimages.

Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining Pub Date : 2014-01-01 DOI: 10.1137/1.9781611973440.15

Lifang He, Xiangnan Kong, Philip S Yu, Ann B Ragin, Zhifeng Hao, Xiaowei Yang

Abstract: With advances in data collection technologies, tensor data is assuming increasing prominence in many applications, and supervised tensor learning has emerged as a topic of critical significance in the data mining and machine learning community. Conventional methods for supervised tensor learning mainly focus on learning kernels by flattening the tensor into vectors or matrices, which loses the structural information within the tensors. In this paper, we introduce a new scheme to design structure-preserving kernels for supervised tensor learning. Specifically, we demonstrate how to leverage the naturally available structure within the tensorial representation to encode prior knowledge in the kernel. We propose a tensor kernel that preserves tensor structure based on a dual-tensorial mapping. The dual-tensorial mapping function maps each tensor instance in the input space to another tensor in the feature space while preserving the tensorial structure. Theoretically, our approach extends conventional kernels from vector space to tensor space. We apply our kernel in conjunction with SVM to real-world tensor classification problems, including brain fMRI classification for three different diseases (Alzheimer's disease, ADHD, and brain damage caused by HIV). Extensive empirical studies demonstrate that our proposed approach can effectively boost tensor classification performance, particularly with small sample sizes.

Volume 2014, pp. 127-135.

Citations: 74
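One way to realize a structure-preserving tensor kernel is to compare tensors through their CP factors rather than their flattened entries. This minimal sketch follows that idea. It assumes each tensor is already given as a rank-R CP decomposition and applies a Gaussian RBF kernel to each mode's factor vectors, summing the per-mode products over all pairs of rank-one components. Treat it as an illustration in the spirit of the dual-tensorial mapping, not as DuSK's exact definition. The CP decomposition step and the SVM are not shown, and `gamma` is an assumed parameter.

```python
import math
from typing import List, Sequence

# cp[r][n] is the mode-n factor vector of the r-th rank-one component of a tensor.
CPFactors = List[List[Sequence[float]]]


def rbf(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    """Gaussian RBF kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def cp_tensor_kernel(x: CPFactors, y: CPFactors, gamma: float = 1.0) -> float:
    """Sum over all pairs of rank-one components of the product of per-mode RBF kernels."""
    total = 0.0
    for comp_x in x:
        for comp_y in y:
            prod = 1.0
            for u, v in zip(comp_x, comp_y):  # one factor vector per tensor mode
                prod *= rbf(u, v, gamma)
            total += prod
    return total
```

Because each mode is compared separately, the kernel never mixes, for example, a spatial mode with a temporal one, which is the structural information that flattening discards. The resulting Gram matrix could be passed to an SVM that accepts precomputed kernels.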

Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200×.

Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining Pub Date : 2014-01-01 DOI: 10.1137/1.9781611973440.14

Evangelos E Papalexakis, Christos Faloutsos, Tom M Mitchell, Partha Pratim Talukdar, Nicholas D Sidiropoulos, Brian Murphy

Abstract: How can we correlate the neural activity in the human brain as it responds to typed words with properties of these terms (like 'edible' or 'fits in hand')? In short, we want to find latent variables that jointly explain both the brain activity and the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem. Can we accelerate any CMTF solver, so that it runs within a few minutes instead of tens of hours to a day, while maintaining good accuracy? We introduce TURBO-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm by up to 200×, along with an up to 65-fold increase in sparsity, with accuracy comparable to the baseline. We apply TURBO-SMT to BRAINQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, coupled along the nouns dimension. TURBO-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy.

Volume 2014, pp. 118-126.

Citations: 58
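In the BRAINQ setting, the (nouns, voxels, subjects) tensor and the (nouns, properties) matrix share the noun mode. This minimal sketch evaluates the standard CMTF objective, assuming a rank-R CP model [[A, B, C]] for the tensor and a factorization A D^T for the matrix, with the noun factor A shared between them. It only computes the loss. Turbo-SMT is a meta-method that speeds up any CMTF solver, and none of its sampling or parallel machinery is shown. The toy sizes and the names `cp_entry` and `cmtf_loss` are illustrative.

```python
from typing import List

Matrix = List[List[float]]
Tensor3 = List[List[List[float]]]


def cp_entry(A: Matrix, B: Matrix, C: Matrix, i: int, j: int, k: int) -> float:
    """Entry (i, j, k) of the CP model [[A, B, C]]: sum_r A[i][r] * B[j][r] * C[k][r]."""
    return sum(A[i][r] * B[j][r] * C[k][r] for r in range(len(A[0])))


def cmtf_loss(X: Tensor3, Y: Matrix, A: Matrix, B: Matrix, C: Matrix, D: Matrix) -> float:
    """||X - [[A, B, C]]||_F^2 + ||Y - A D^T||_F^2, with the mode-1 (noun) factor A shared."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    tensor_err = sum((X[i][j][k] - cp_entry(A, B, C, i, j, k)) ** 2
                     for i in range(I) for j in range(J) for k in range(K))
    matrix_err = sum((Y[i][p] - sum(A[i][r] * D[p][r] for r in range(len(A[0])))) ** 2
                     for i in range(len(Y)) for p in range(len(Y[0])))
    return tensor_err + matrix_err
```

Because A appears in both terms, a noun's latent representation has to explain its brain-activity slice and its property row at the same time. This coupling is what lets the latent variables jointly explain both data sources.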

An Optimization-based Framework to Learn Conditional Random Fields for Multi-label Classification.

Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining Pub Date : 2014-01-01 DOI: 10.1137/1.9781611973440.113

Mahdi Pakdaman Naeini, Iyad Batal, Zitao Liu, Charmgil Hong, Milos Hauskrecht

Citations: 5