{"title":"HiMuV: Hierarchical Framework for Modeling Multi-modality Multi-resolution Data","authors":"Jianbo Li, Jingrui He, Yada Zhu","doi":"10.1109/ICDM.2017.36","DOIUrl":"https://doi.org/10.1109/ICDM.2017.36","url":null,"abstract":"Many real-world applications are characterized by temporal data collected from multiple modalities, each sampled with a different resolution. Examples include manufacturing processes and financial market prediction. In these applications, an interesting observation is that within the same modality, we often have data from multiple views, thus naturally forming a 2-level hierarchy: with the multiple modalities on the top, and the multiple views at the bottom. For example, in aluminum smelting processes, the multiple modalities include power, noise, alumina feed, etc; and within the same modality such as power, the different views correspond to various voltage, current and resistance control signals and measured responses. For such applications, we aim to address the following challenge, i.e., how can we integrate such multi-modality multi-resolution data to effectively predict the targets of interest, such as bath temperature in aluminum smelting cell and the volatility in financial market. In this paper, for the first time, we simultaneously model the hierarchical data structure and the multi-resolution property via a novel framework named HiMuV. Different from existing work based on multiple views on a single level or a single resolution, the proposed framework is based on the key assumption that the information from different modalities is complementary, whereas the information within the same modality (across different views) is redundant in terms of predicting the targets of interest. Therefore, we introduce an optimization framework where the objective function contains both the prediction loss and a novel regularizer enforcing the consistency among different views within the same modality. To solve this optimization framework, we propose an iterative algorithm based on randomized block coordinate descent. Experimental results on synthetic data, benchmark data, and various real data sets from aluminum smelting processes, and stock market prediction demonstrate the effectiveness and efficiency of the proposed algorithm.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114565347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DPiSAX: Massively Distributed Partitioned iSAX","authors":"D. Yagoubi, Reza Akbarinia, F. Masseglia, Themis Palpanas","doi":"10.1109/ICDM.2017.151","DOIUrl":"https://doi.org/10.1109/ICDM.2017.151","url":null,"abstract":"Indexing is crucial for many data mining tasks that rely on efficient and effective similarity query processing. Consequently, indexing large volumes of time series, along with high performance similarity query processing, have became topics of high interest. For many applications across diverse domains though, the amount of data to be processed might be intractable for a single machine, making existing centralized indexing solutions inefficient. We propose a parallel indexing solution that gracefully scales to billions of time series, and a parallel query processing strategy that, given a batch of queries, efficiently exploits the index. Our experiments, on both synthetic and real world data, illustrate that our index creation algorithm works on 1 billion time series in less than 2 hours, while the state of the art centralized algorithms need more than 5 days. Also, our distributed querying algorithm is able to efficiently process millions of queries over collections of billions of time series, thanks to an effective load balancing mechanism.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116549006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Mining of Subsample-Stable Graph Patterns","authors":"A. Buzmakov, S. Kuznetsov, A. Napoli","doi":"10.1109/ICDM.2017.88","DOIUrl":"https://doi.org/10.1109/ICDM.2017.88","url":null,"abstract":"A scalable method for mining graph patterns stable under subsampling is proposed. The existing subsample stability and robustness measures are not antimonotonic according to definitions known so far. We study a broader notion of antimonotonicity for graph patterns, so that measures of subsample stability become antimonotonic. Then we propose gSOFIA for mining the most subsample-stable graph patterns. The experiments on numerous graph datasets show that gSOFIA is very efficient for discovering subsample-stable graph patterns.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129738247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-Driven Immunization","authors":"Yao Zhang, A. Ramanathan, A. Vullikanti, L. Pullum, B. Prakash","doi":"10.1109/ICDM.2017.71","DOIUrl":"https://doi.org/10.1109/ICDM.2017.71","url":null,"abstract":"Given a contact network and coarse-grained diagnostic information like electronic Healthcare Reimbursement Claims (eHRC) data, can we develop efficient intervention policies to control an epidemic? Immunization is an important problem in multiple areas especially epidemiology and public health. However, most existing studies focus on developing pre-emptive strategies assuming prior epidemiological models. In practice, disease spread is usually complicated, hence assuming an underlying model may deviate from true spreading patterns, leading to possibly inaccurate interventions. Additionally, the abundance of health care surveillance data (like eHRC) makes it possible to study data-driven strategies without too many restrictive assumptions. Hence, such an approach can help public-health experts take more practical decisions. In this paper, we take into account propagation log and contact networks for controlling propagation. We formulate the novel and challenging Data-Driven Immunization problem without assuming classical epidemiological models. To solve it, we first propose an efficient sampling approach to align surveillance data with contact networks, then develop an efficient algorithm with the provably approximate guarantee for immunization. Finally, we show the effectiveness and scalability of our methods via extensive experiments on multiple datasets, and conduct case studies on nation-wide real medical surveillance data.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115268038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-level Multi-task Learning for Modeling Cross-Scale Interactions in Nested Geospatial Data","authors":"Shuai Yuan, Jiayu Zhou, P. Tan, C. E. Fergus, T. Wagner, P. Soranno","doi":"10.1109/ICDM.2017.154","DOIUrl":"https://doi.org/10.1109/ICDM.2017.154","url":null,"abstract":"Predictive modeling of nested geospatial data is a challenging problem as the models must take into account potential interactions among variables defined at different spatial scales. These cross-scale interactions, as they are commonly known, are particularly important to understand relationships among ecological properties at macroscales. In this paper, we present a novel, multi-level multi-task learning framework for modeling nested geospatial data in the lake ecology domain. Specifically, we consider region-specific models to predict lake water quality from multi-scaled factors. Our framework enables distinct models to be developed for each region using both its local and regional information. The framework also allows information to be shared among the region-specific models through their common set of latent factors. Such information sharing helps to create more robust models especially for regions with limited or no training data. In addition, the framework can automatically determine cross-scale interactions between the regional variables and the local variables that are nested within them. Our experimental results show that the proposed framework outperforms all the baseline methods in at least 64% of the regions for 3 out of 4 lake water quality datasets evaluated in this study. Furthermore, the latent factors can be clustered to obtain a new set of regions that is more aligned with the response variables than the original regions that were defined a priori from the ecology domain.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"46 24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117063919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Doubly Stochastic Affinity Matrix via Davis-Kahan Theorem","authors":"Jiwoong Park, Taejeong Kim","doi":"10.1109/ICDM.2017.47","DOIUrl":"https://doi.org/10.1109/ICDM.2017.47","url":null,"abstract":"Building an ideal graph which reveals the exact intrinsic structure of the data is critical in graph-based clustering. There have been a lot of efforts to construct an affinity matrix satisfying such a need in terms of a similarity measure. A recent approach attracting attention is on using doubly stochastic normalization of the affinity matrix to improve the clustering performance. In this paper, we propose a novel method to build a high-quality affinity matrix via incorporating Davis-Kahan theorem of matrix perturbation theory in the doubly stochastic normalization problem. We interpret the goal of the doubly stochastic normalization problem as minimizing the relative distance between the eigenspaces of the corresponding matrices. Also, for the doubly stochastic normalization problem we include an additional constraint that each eigenvalue be on the unit interval to fully conform to the spectral graph theory. Experiments on our framework present superior performance over various datasets.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126113114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-task Multi-modal Models for Collective Anomaly Detection","authors":"T. Idé, D. Phan, J. Kalagnanam","doi":"10.1109/ICDM.2017.27","DOIUrl":"https://doi.org/10.1109/ICDM.2017.27","url":null,"abstract":"This paper proposes a new framework for anomaly detection when collectively monitoring many complex systems. The prerequisite for condition-based monitoring in industrial applications is the capability of (1) capturing multiple operational states, (2) managing many similar but different assets, and (3) providing insights into the internal relationship of the variables. To meet these criteria, we propose a multi-task learning approach based on a sparse mixture of sparse Gaussian graphical models (GGMs). Unlike existing fused- and group-lasso-based approaches, each task is represented by a sparse mixture of sparse GGMs, and can handle multi-modalities. We develop a variational inference algorithm combined with a novel sparse mixture weight selection algorithm. To handle issues in the conventional automatic relevance determination (ARD) approach, we propose a new ℓ0-regularized formulation that has guaranteed sparsity in mixture weights. We show that our framework eliminates well-known issues of numerical instability in the iterative procedure of mixture model learning. We also show better performance in anomaly detection tasks on real-world data sets. To the best of our knowledge, this is the first proposal of multi-task GGM learning allowing multi-modal distributions.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125427306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recurrent Encoder-Decoder Networks for Time-Varying Dense Prediction","authors":"Tao Zeng, Bian Wu, Jiayu Zhou, I. Davidson, Shuiwang Ji","doi":"10.1109/ICDM.2017.156","DOIUrl":"https://doi.org/10.1109/ICDM.2017.156","url":null,"abstract":"Dense prediction is concerned with predicting a label for each of the input units, such as pixels of an image. Accurate dense prediction for time-varying inputs finds applications in a variety of domains, such as video analysis and medical imaging. Such tasks need to preserve both spatial and temporal structures that are consistent with the inputs. Despite the success of deep learning methods in a wide range of artificial intelligence tasks, time-varying dense prediction is still a less explored domain. Here, we proposed a general encoder-decoder network architecture that aims to addressing time-varying dense prediction problems. Given that there are both intra-image spatial structure information and temporal context information to be processed simultaneously in such tasks, we integrated fully convolutional networks (FCNs) with recurrent neural networks (RNNs) to build a recurrent encoder-decoder network. The proposed network is capable of jointly processing two types of information. Specifically, we developed convolutional RNN (CRNN) to allow dense sequence processing. More importantly, we designed CRNNbottleneck modules for alleviating the excessive computational cost incurred by carrying out multiple convolutions in the CRNN layer. This novel design is shown to be a critical innovation in building very flexible and efficient deep models for timevarying dense prediction. Altogether, the proposed model handles time-varying information with the CRNN layers and spatial structure information with the FCN architectures. The multiple heterogeneous modules can be integrated into the same network, which can be trained end-to-end to perform time-varying dense prediction. Experimental results showed that our model is able to capture both high-resolution spatial information and relatively low-resolution temporal information as compared to other existing models.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129936193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning with Inadequate and Incorrect Supervision","authors":"Chen Gong, Hengmin Zhang, Jian Yang, D. Tao","doi":"10.1109/ICDM.2017.110","DOIUrl":"https://doi.org/10.1109/ICDM.2017.110","url":null,"abstract":"Practically, we are often in the dilemma that the labeled data at hand are inadequate to train a reliable classifier, and more seriously, some of these labeled data may be mistakenly labeled due to the various human factors. Therefore, this paper proposes a novel semi-supervised learning paradigm that can handle both label insufficiency and label inaccuracy. To address label insufficiency, we use a graph to bridge the data points so that the label information can be propagated from the scarce labeled examples to unlabeled examples along the graph edges. To address label inaccuracy, Graph Trend Filtering (GTF) and Smooth Eigenbase Pursuit (SEP) are adopted to filter out the initial noisy labels. GTF penalizes the l_0 norm of label difference between connected examples in the graph and exhibits better local adaptivity than the traditional l_2 norm-based Laplacian smoother. SEP reconstructs the correct labels by emphasizing the leading eigenvectors of Laplacian matrix associated with small eigenvalues, as these eigenvectors reflect real label smoothness and carry rich class separation cues. We term our algorithm as \"Semi-supervised learning under Inadequate and Incorrect Supervision\" (SIIS). Thorough experimental results on image classification, text categorization, and speech recognition demonstrate that our SIIS is effective in label error correction, leading to superior performance to the state-of-the-art methods in the presence of label noise and label scarcity.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129120457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AnySCAN: An Efficient Anytime Framework with Active Learning for Large-Scale Network Clustering","authors":"Weizhong Zhao, Gang Chen, Xiaowei Xu","doi":"10.1109/ICDM.2017.76","DOIUrl":"https://doi.org/10.1109/ICDM.2017.76","url":null,"abstract":"Network clustering is an essential approach to finding latent clusters in real-world networks. As the scale of real-world networks becomes increasingly larger, the existing network clustering algorithms fail to discover meaningful clusters efficiently. In this paper, we propose a framework called AnySCAN, which applies anytime theory to the structural clustering algorithm for networks (SCAN). Moreover, an active learning strategy is proposed to advance the refining procedure in AnySCAN framework. AnySCAN with the active learning strategy is able to find the exactly same clustering result on large-scale networks as the original SCAN in a significantly more efficient manner. Extensive experiments on real-world and synthetic networks demonstrate that our proposed method outperforms existing network clustering approaches.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130594573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}