Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining最新文献

筛选
英文 中文
DPClass: An Effective but Concise Discriminative Patterns-Based Classification Framework DPClass:一个有效而简洁的基于判别模式的分类框架
Jingbo Shang, Wenzhu Tong, Jian Peng, Jiawei Han
{"title":"DPClass: An Effective but Concise Discriminative Patterns-Based Classification Framework","authors":"Jingbo Shang, Wenzhu Tong, Jian Peng, Jiawei Han","doi":"10.1137/1.9781611974348.64","DOIUrl":"https://doi.org/10.1137/1.9781611974348.64","url":null,"abstract":"Pattern-based classification was originally proposed to improve the accuracy using selected frequent patterns, where many efforts were paid to prune a huge number of non-discriminative frequent patterns. On the other hand, tree-based models have shown strong abilities on many classification tasks since they can easily build high-order interactions between different features and also handle both numerical and categorical features as well as high dimensional features. By taking the advantage of both modeling methodologies, we propose a natural and effective way to resolve pattern-based classification by adopting discriminative patterns which are the prefix paths from root to nodes in tree-based models (e.g., random forest). Moreover, we further compress the number of discriminative patterns by selecting the most effective pattern combinations that fit into a generalized linear model. As a result, our discriminative pattern-based classification framework (DPClass) could perform as good as previous state-of-the-art algorithms, provide great interpretability by utilizing only very limited number of discriminative patterns, and predict new data extremely fast. More specifically, in our experiments, DPClass could gain even better accuracy by only using top-20 discriminative patterns. The framework so generated is very concise and highly explanatory to human experts.","PeriodicalId":74533,"journal":{"name":"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82143524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Tensor Spectral Clustering for Partitioning Higher-order Network Structures 高阶网络结构的张量谱聚类
Austin R. Benson, D. Gleich, J. Leskovec
{"title":"Tensor Spectral Clustering for Partitioning Higher-order Network Structures","authors":"Austin R. Benson, D. Gleich, J. Leskovec","doi":"10.1137/1.9781611974010.14","DOIUrl":"https://doi.org/10.1137/1.9781611974010.14","url":null,"abstract":"Spectral graph theory-based methods represent an important class of tools for studying the structure of networks. Spectral methods are based on a first-order Markov chain derived from a random walk on the graph and thus they cannot take advantage of important higher-order network substructures such as triangles, cycles, and feed-forward loops. Here we propose a Tensor Spectral Clustering (TSC) algorithm that allows for modeling higher-order network structures in a graph partitioning framework. Our TSC algorithm allows the user to specify which higher-order network structures (cycles, feed-forward loops, etc.) should be preserved by the network clustering. Higher-order network structures of interest are represented using a tensor, which we then partition by developing a multilinear spectral method. Our framework can be applied to discovering layered flows in networks as well as graph anomaly detection, which we illustrate on synthetic networks. In directed networks, a higher-order structure of particular interest is the directed 3-cycle, which captures feedback loops in networks. We demonstrate that our TSC algorithm produces large partitions that cut fewer directed 3-cycles than standard spectral clustering algorithms.","PeriodicalId":74533,"journal":{"name":"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88094205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 108
Binary Classifier Calibration Using a Bayesian Non-Parametric Approach 基于贝叶斯非参数方法的二值分类器标定
Mahdi Pakdaman Naeini, G. Cooper, M. Hauskrecht
{"title":"Binary Classifier Calibration Using a Bayesian Non-Parametric Approach","authors":"Mahdi Pakdaman Naeini, G. Cooper, M. Hauskrecht","doi":"10.1137/1.9781611974010.24","DOIUrl":"https://doi.org/10.1137/1.9781611974010.24","url":null,"abstract":"Learning probabilistic predictive models that are well calibrated is critical for many prediction and decision-making tasks in Data mining. This paper presents two new non-parametric methods for calibrating outputs of binary classification models: a method based on the Bayes optimal selection and a method based on the Bayesian model averaging. The advantage of these methods is that they are independent of the algorithm used to learn a predictive model, and they can be applied in a post-processing step, after the model is learned. This makes them applicable to a wide variety of machine learning models and methods. These calibration methods, as well as other methods, are tested on a variety of datasets in terms of both discrimination and calibration performance. The results show the methods either outperform or are comparable in performance to the state-of-the-art calibration methods.","PeriodicalId":74533,"journal":{"name":"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88387226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 20
Graph Regularized Meta-path Based Transductive Regression in Heterogeneous Information Network 基于图正则元路径的异构信息网络转换回归
Mengting Wan, Yunbo Ouyang, Lance M. Kaplan, Jiawei Han
{"title":"Graph Regularized Meta-path Based Transductive Regression in Heterogeneous Information Network","authors":"Mengting Wan, Yunbo Ouyang, Lance M. Kaplan, Jiawei Han","doi":"10.1137/1.9781611974010.103","DOIUrl":"https://doi.org/10.1137/1.9781611974010.103","url":null,"abstract":"A number of real-world networks are heterogeneous information networks, which are composed of different types of nodes and links. Numerical prediction in heterogeneous information networks is a challenging but significant area because network based information for unlabeled objects is usually limited to make precise estimations. In this paper, we consider a graph regularized meta-path based transductive regression model (Grempt), which combines the principal philosophies of typical graph-based transductive classification methods and transductive regression models designed for homogeneous networks. The computation of our method is time and space efficient and the precision of our model can be verified by numerical experiments.","PeriodicalId":74533,"journal":{"name":"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75447730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 18
A Generalized Mixture Framework for Multi-label Classification. 多标签分类的广义混合框架
Charmgil Hong, Iyad Batal, Milos Hauskrecht
{"title":"A Generalized Mixture Framework for Multi-label Classification.","authors":"Charmgil Hong, Iyad Batal, Milos Hauskrecht","doi":"10.1137/1.9781611974010.80","DOIUrl":"10.1137/1.9781611974010.80","url":null,"abstract":"<p><p>We develop a novel probabilistic ensemble framework for multi-label classification that is based on the <i>mixtures-of-experts</i> architecture. In this framework, we combine multi-label classification models in the <i>classifier chains family</i> that decompose the class posterior distribution <i>P</i>(<i>Y</i><sub>1</sub>, …, <i>Y<sub>d</sub></i> |<b>X</b>) using a product of posterior distributions over components of the output space. Our approach captures different input-output and output-output relations that tend to change across data. As a result, we can recover a rich set of dependency relations among inputs and outputs that a single multi-label classification model cannot capture due to its modeling simplifications. We develop and present algorithms for learning the mixtures-of-experts models from data and for performing multi-label predictions on unseen data instances. Experiments on multiple benchmark datasets demonstrate that our approach achieves highly competitive results and outperforms the existing state-of-the-art multi-label classification methods.</p>","PeriodicalId":74533,"journal":{"name":"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4657574/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74134355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Classifying Imbalanced Data Streams via Dynamic Feature Group Weighting with Importance Sampling. 基于重要性抽样的动态特征组加权分类不平衡数据流。
Ke Wu, Andrea Edwards, Wei Fan, Jing Gao, Kun Zhang
{"title":"Classifying Imbalanced Data Streams via Dynamic Feature Group Weighting with Importance Sampling.","authors":"Ke Wu,&nbsp;Andrea Edwards,&nbsp;Wei Fan,&nbsp;Jing Gao,&nbsp;Kun Zhang","doi":"10.1137/1.9781611973440.83","DOIUrl":"https://doi.org/10.1137/1.9781611973440.83","url":null,"abstract":"<p><p>Data stream classification and imbalanced data learning are two important areas of data mining research. Each has been well studied to date with many interesting algorithms developed. However, only a few approaches reported in literature address the intersection of these two fields due to their complex interplay. In this work, we proposed an importance sampling driven, dynamic feature group weighting framework (DFGW-IS) for classifying data streams of imbalanced distribution. Two components are tightly incorporated into the proposed approach to address the intrinsic characteristics of concept-drifting, imbalanced streaming data. Specifically, the ever-evolving concepts are tackled by a weighted ensemble trained on a set of feature groups with each sub-classifier (i.e. a single classifier or an ensemble) weighed by its discriminative power and stable level. The un-even class distribution, on the other hand, is typically battled by the sub-classifier built in a specific feature group with the underlying distribution rebalanced by the importance sampling technique. We derived the theoretical upper bound for the generalization error of the proposed algorithm. We also studied the empirical performance of our method on a set of benchmark synthetic and real world data, and significant improvement has been achieved over the competing algorithms in terms of standard evaluation metrics and parallel running time. Algorithm implementations and datasets are available upon request.</p>","PeriodicalId":74533,"journal":{"name":"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1137/1.9781611973440.83","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32958341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 28
DuSK: A Dual Structure-preserving Kernel for Supervised Tensor Learning with Applications to Neuroimages. 一个用于监督张量学习的对偶结构保持核及其在神经图像中的应用。
Lifang He, Xiangnan Kong, Philip S Yu, Ann B Ragin, Zhifeng Hao, Xiaowei Yang
{"title":"DuSK: A Dual Structure-preserving Kernel for Supervised Tensor Learning with Applications to Neuroimages.","authors":"Lifang He,&nbsp;Xiangnan Kong,&nbsp;Philip S Yu,&nbsp;Ann B Ragin,&nbsp;Zhifeng Hao,&nbsp;Xiaowei Yang","doi":"10.1137/1.9781611973440.15","DOIUrl":"https://doi.org/10.1137/1.9781611973440.15","url":null,"abstract":"<p><p>With advances in data collection technologies, tensor data is assuming increasing prominence in many applications and the problem of supervised tensor learning has emerged as a topic of critical significance in the data mining and machine learning community. Conventional methods for supervised tensor learning mainly focus on learning kernels by flattening the tensor into vectors or matrices, however structural information within the tensors will be lost. In this paper, we introduce a new scheme to design structure-preserving kernels for supervised tensor learning. Specifically, we demonstrate how to leverage the naturally available structure within the tensorial representation to encode prior knowledge in the kernel. We proposed a tensor kernel that can preserve tensor structures based upon dual-tensorial mapping. The dual-tensorial mapping function can map each tensor instance in the input space to another tensor in the feature space while preserving the tensorial structure. Theoretically, our approach is an extension of the conventional kernels in the vector space to tensor space. We applied our novel kernel in conjunction with SVM to real-world tensor classification problems including brain fMRI classification for three different diseases (<i>i.e</i>., Alzheimer's disease, ADHD and brain damage by HIV). Extensive empirical studies demonstrate that our proposed approach can effectively boost tensor classification performances, particularly with small sample sizes.</p>","PeriodicalId":74533,"journal":{"name":"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1137/1.9781611973440.15","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33263013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 74
Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200×. Turbo-SMT: 200倍加速耦合稀疏矩阵张量分解。
Evangelos E Papalexakis, Christos Faloutsos, Tom M Mitchell, Partha Pratim Talukdar, Nicholas D Sidiropoulos, Brian Murphy
{"title":"Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200×.","authors":"Evangelos E Papalexakis,&nbsp;Christos Faloutsos,&nbsp;Tom M Mitchell,&nbsp;Partha Pratim Talukdar,&nbsp;Nicholas D Sidiropoulos,&nbsp;Brian Murphy","doi":"10.1137/1.9781611973440.14","DOIUrl":"https://doi.org/10.1137/1.9781611973440.14","url":null,"abstract":"<p><p>How can we correlate the neural activity in the human brain as it responds to typed words, with properties of these terms (like 'edible', 'fits in hand')? In short, we want to find latent variables, that jointly explain both the brain activity, as well as the behavioral responses. This is one of many settings of the <i>Coupled Matrix-Tensor Factorization</i> (CMTF) problem. Can we accelerate <i>any</i> CMTF solver, so that it runs within a few minutes instead of tens of hours to a day, while maintaining good accuracy? We introduce TURBO-SMT, a meta-method capable of doing exactly that: it boosts the performance of <i>any</i> CMTF algorithm, by up to <i>200</i>×, along with an up to <i>65 fold</i> increase in sparsity, with comparable accuracy to the baseline. We apply TURBO-SMT to BRAINQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. TURBO-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy.</p>","PeriodicalId":74533,"journal":{"name":"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1137/1.9781611973440.14","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34263891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 58
An Optimization-based Framework to Learn Conditional Random Fields for Multi-label Classification. 基于优化的多标签分类条件随机场学习框架。
Mahdi Pakdaman Naeini, Iyad Batal, Zitao Liu, CharmGil Hong, Milos Hauskrecht
{"title":"An Optimization-based Framework to Learn Conditional Random Fields for Multi-label Classification.","authors":"Mahdi Pakdaman Naeini,&nbsp;Iyad Batal,&nbsp;Zitao Liu,&nbsp;CharmGil Hong,&nbsp;Milos Hauskrecht","doi":"10.1137/1.9781611973440.113","DOIUrl":"https://doi.org/10.1137/1.9781611973440.113","url":null,"abstract":"<p><p>This paper studies multi-label classification problem in which data instances are associated with multiple, possibly high-dimensional, label vectors. This problem is especially challenging when labels are dependent and one cannot decompose the problem into a set of independent classification problems. To address the problem and properly represent label dependencies we propose and study a pairwise conditional random Field (CRF) model. We develop a new approach for learning the structure and parameters of the CRF from data. The approach maximizes the pseudo likelihood of observed labels and relies on the fast proximal gradient descend for learning the structure and limited memory BFGS for learning the parameters of the model. Empirical results on several datasets show that our approach outperforms several multi-label classification baselines, including recently published state-of-the-art methods.</p>","PeriodicalId":74533,"journal":{"name":"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1137/1.9781611973440.113","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33263014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Discriminative Feature Selection for Uncertain Graph Classification. 不确定图分类的判别特征选择。
Xiangnan Kong, Philip S Yu, Xue Wang, Ann B Ragin
{"title":"Discriminative Feature Selection for Uncertain Graph Classification.","authors":"Xiangnan Kong,&nbsp;Philip S Yu,&nbsp;Xue Wang,&nbsp;Ann B Ragin","doi":"10.1137/1.9781611972832.10","DOIUrl":"https://doi.org/10.1137/1.9781611972832.10","url":null,"abstract":"Mining discriminative features for graph data has attracted much attention in recent years due to its important role in constructing graph classifiers, generating graph indices, etc. Most measurement of interestingness of discriminative subgraph features are defined on certain graphs, where the structure of graph objects are certain, and the binary edges within each graph represent the \"presence\" of linkages among the nodes. In many real-world applications, however, the linkage structure of the graphs is inherently uncertain. Therefore, existing measurements of interestingness based upon certain graphs are unable to capture the structural uncertainty in these applications effectively. In this paper, we study the problem of discriminative subgraph feature selection from uncertain graphs. This problem is challenging and different from conventional subgraph mining problems because both the structure of the graph objects and the discrimination score of each subgraph feature are uncertain. To address these challenges, we propose a novel discriminative subgraph feature selection method, Dug, which can find discriminative subgraph features in uncertain graphs based upon different statistical measures including expectation, median, mode and φ-probability. We first compute the probability distribution of the discrimination scores for each subgraph feature based on dynamic programming. Then a branch-and-bound algorithm is proposed to search for discriminative subgraphs efficiently. Extensive experiments on various neuroimaging applications (i.e., Alzheimers Disease, ADHD and HIV) have been performed to analyze the gain in performance by taking into account structural uncertainties in identifying discriminative subgraph features for graph classification.","PeriodicalId":74533,"journal":{"name":"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1137/1.9781611972832.10","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33282393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 48
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信