2017 IEEE International Conference on Data Mining (ICDM) latest papers, page 3

Reputation-Based Ranking Systems and Their Resistance to Bribery

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.139

João Saúde, Guilherme Ramos, Carlos Caleiro, S. Kar

Citations: 17

Scalable Hashing-Based Network Discovery

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.50

Tara Safavi, C. Sripada, Danai Koutra

Abstract: Discovering and analyzing networks from non-network data is a task with applications in fields as diverse as neuroscience, genomics, energy, economics, and more. In these domains, networks are often constructed out of multiple time series by computing measures of association or similarity between pairs of series. The nodes in a discovered graph correspond to time series, which are linked via edges weighted by the association scores of their endpoints. After graph construction, the network may be thresholded such that only the edges with stronger weights remain and the desired sparsity level is achieved. While this approach is feasible for small datasets, its quadratic time complexity does not scale as the individual time series length and the number of compared series increase. Thus, to avoid the costly step of building a fully-connected graph before sparsification, we propose a fast network discovery approach based on probabilistic hashing of randomly selected time series subsequences.
Evaluation on real data shows that our methods construct graphs nearly 15 times as fast as baseline methods, while achieving both network structure and accuracy comparable to baselines in task-based evaluation.
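The abstract contrasts its hashing approach with a quadratic baseline: score every pair of series, then threshold the graph to a target sparsity. Below is a minimal Python sketch of that baseline only. It assumes Pearson correlation as the association measure and uses a hypothetical `keep_fraction` parameter for the sparsity level. It is not the authors' hashing method; it shows the all-pairs step their method is designed to avoid.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series is treated as uncorrelated
    return cov / (sx * sy)


def discover_network(series, keep_fraction):
    """Baseline network discovery.

    Scores all n*(n-1)/2 pairs (the quadratic step), then keeps only the
    strongest edges so that about keep_fraction of all pairs survive.
    Returns {(i, j): weight} with i < j.
    """
    scored = [
        (abs(pearson(series[i], series[j])), i, j)
        for i, j in combinations(range(len(series)), 2)
    ]
    scored.sort(reverse=True)  # strongest associations first
    k = max(1, round(keep_fraction * len(scored)))
    return {(i, j): w for w, i, j in scored[:k]}
```

In the following example, the three perfectly (anti-)correlated series end up linked and the noisy fourth series is pruned:

```python
edges = discover_network([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 4, 3, 2, 1], [1, 3, 2, 5, 1]], 0.5)
```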

Citations: 19

Deep Similarity-Based Batch Mode Active Learning with Exploration-Exploitation

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.67

Changchang Yin, B. Qian, Shilei Cao, Xiaoyu Li, Jishang Wei, Q. Zheng, I. Davidson

Abstract: Active learning aims to reduce manual labeling efforts by proactively selecting the most informative unlabeled instances to query. In real-world scenarios, it is often more practical to query a batch of instances rather than a single one at each iteration. To achieve this, we need to preserve not only the informativeness of the instances but also their diversity. Many heuristic methods have been proposed to tackle batch mode active learning problems; however, they suffer from two limitations which, if addressed, would significantly improve the query strategy. First, the similarity amongst instances is simply calculated using the feature vectors rather than being jointly learned with the classification model. This weakens the accuracy of the diversity measurement. Second, these methods usually exploit the decision boundary by querying the data points close to it. However, this can be inefficient when the labeled set is too small to reveal the true boundary. In this paper, we address both limitations by proposing a deep neural network based algorithm. In the training phase, a pairwise deep network is trained not only to perform classification, but also to project data points into another space where the similarity can be more precisely measured. In the query selection phase, the learner selects a set of instances that are maximally uncertain and minimally redundant (exploitation), as well as instances that are most diverse from the labeled instances (exploration). We evaluate the effectiveness of the proposed method on a variety of classification tasks: MNIST classification, opinion polarity detection, and heart failure prediction.
Our method outperforms the baselines with both higher classification accuracy and faster convergence rate.
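The query-selection phase can be illustrated with a greedy batch builder that trades off three terms: uncertainty, redundancy within the batch (exploitation), and distance from labeled data (exploration). This is a minimal sketch under stated assumptions, not the paper's algorithm:

- The embeddings are assumed to be given. In the paper they come from the learned pairwise network.
- Cosine similarity stands in for the learned similarity.
- Predictive entropy stands in for uncertainty.
- The weights `redundancy_weight` and `exploration_weight` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def entropy(probs):
    """Predictive entropy: higher means the classifier is less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_batch(candidates, labeled, batch_size,
                 redundancy_weight=1.0, exploration_weight=1.0):
    """Greedy batch selection (illustrative scoring, weights are hypothetical).

    candidates: list of (embedding, class_probabilities) for unlabeled points.
    labeled: list of embeddings of already-labeled points.
    Returns candidate indices in the order they were chosen.
    """
    chosen = []
    while len(chosen) < min(batch_size, len(candidates)):
        best, best_score = None, -math.inf
        for i, (emb, probs) in enumerate(candidates):
            if i in chosen:
                continue
            # Exploitation: uncertain points that do not duplicate the batch so far.
            redundancy = max((cosine(emb, candidates[j][0]) for j in chosen), default=0.0)
            # Exploration: points far from everything already labeled.
            novelty = 1.0 - max((cosine(emb, e) for e in labeled), default=0.0)
            score = (entropy(probs)
                     - redundancy_weight * redundancy
                     + exploration_weight * novelty)
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen
```

Consider two identical, maximally uncertain candidates plus one slightly less uncertain candidate pointing elsewhere. The redundancy term makes the second pick skip the duplicate:

```python
cands = [([1, 0], [0.5, 0.5]), ([1, 0], [0.5, 0.5]), ([0, 1], [0.6, 0.4])]
select_batch(cands, [[1, 1]], 2)  # -> [0, 2]
```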

Citations: 32

Bayesian Optimization in Weakly Specified Search Space

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.44

Vu Nguyen, Sunil Gupta, Santu Rana, Cheng Li, S. Venkatesh

Abstract: Bayesian optimization (BO) has recently emerged as a powerful and flexible tool for hyper-parameter tuning and, more generally, for the efficient global optimization of expensive black-box functions. Systems implementing BO have successfully solved difficult problems in automatic design choices and machine learning hyper-parameter tuning. Many recent advances in the methodologies and theories underlying Bayesian optimization have extended the framework to new applications and provided greater insights into the behavior of these algorithms. Still, these established techniques always require a user-defined space in which to perform optimization. This pre-defined space specifies the ranges of hyper-parameter values. In many situations, however, it can be difficult to prescribe such spaces, as prior knowledge is often unavailable. Setting these regions arbitrarily can lead to inefficient optimization: if a space is too large, we can miss the optimum with a limited budget; on the other hand, if a space is too small, it may not contain the optimum point that we want to find. The unknown search space problem is intractable to solve in practice. Therefore, in this paper, we narrow our focus to the setting of a "weakly specified" search space for Bayesian optimization. By weakly specified space, we mean that the pre-defined space is placed in a sufficiently good region so that the optimization can expand and reach the optimum. However, this pre-defined space need not include the global optimum. We tackle this problem by proposing the filtering expansion strategy for Bayesian optimization. Our approach starts from the initial region and gradually expands the search space.
We develop an efficient algorithm for this strategy and derive its regret bound. These theoretical results are complemented by an extensive set of experiments on benchmark functions and two real-world applications, which demonstrate the benefits of our proposed approach.
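The core idea of a weakly specified space is that the optimum may lie outside the initial box, but a search that gradually widens the box can still reach it. Below is a toy Python sketch of that idea only.

The symmetric linear expansion schedule (`growth` per side per iteration) is a hypothetical choice, not the paper's filtering expansion strategy. A plain grid evaluation stands in for the Gaussian-process acquisition step, so this is not Bayesian optimization itself.

```python
def expanding_box_search(objective, lower, upper, iterations, growth, grid_points=41):
    """Maximize a 1-D objective starting from the box [lower, upper].

    After every iteration the box is widened by `growth` on each side
    (hypothetical schedule). Each box is scanned on a uniform grid, which
    stands in for maximizing a BO acquisition function.
    Returns (best_x, best_value).
    """
    best_x, best_y = None, float("-inf")
    for t in range(iterations):
        lo, hi = lower - growth * t, upper + growth * t
        for k in range(grid_points):
            x = lo + (hi - lo) * k / (grid_points - 1)
            y = objective(x)
            if y > best_y:
                best_x, best_y = x, y
    return best_x, best_y
```

Take the objective -(x - 3)^2 with initial box [-1, 1]. With `growth=0` the search is stuck at the box edge x = 1. With `growth=0.5`, the box reaches [-3, 3] on the fifth iteration and finds x = 3.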

Citations: 17

Epidemic Forecasting Framework Combining Agent-Based Models and Smart Beam Particle Filtering

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.145

Farzaneh Tabataba, B. L. Lewis, M. Hosseinipour, F. Tabataba, S. Venkatramanan, Jiangzhuo Chen, D. Higdon, M. Marathe

Abstract: Over the past decades, numerous techniques have been developed to forecast the temporal evolution of epidemic outbreaks. This paper proposes an approach that combines high-resolution agent-based models, which use realistic social contact networks to simulate epidemic evolution, with a particle-filter-based method for assimilation-based forecasting. Agent-based models using realistic social contact networks provide two key advantages:

(i) They capture the causal processes underlying the epidemic and hence are useful for understanding the role of interventions in the course of the epidemic. Time series models typically cannot capture this and, as a result, often do not perform well in such situations.

(ii) They provide detailed forecast information, which allows us to produce forecasts at high levels of temporal, spatial and social granularity.

We also propose a new variation of the particle filter technique called beam search particle filtering. The modification allows us to search the parameter space more efficiently. This is necessary because agent-based techniques are computationally expensive. We illustrate our methodology on the synthetic Ebola dataset provided as part of the NSF/NIH Ebola forecasting challenge.
Our results show the efficacy of the proposed approach and suggest that agent-based causal models can be combined with filtering techniques to yield a new class of assimilation models for infectious disease forecasting.
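To make the filtering idea concrete, here is a toy sequential filter that estimates a single epidemic growth rate from incidence counts. It rests on several assumptions:

- The expensive agent-based simulator is replaced by the hypothetical one-line model I_{t+1} = r * I_t.
- The "beam" step, which keeps the `beam_width` best-matching particles and refills the population by jittering their growth rates, is only one plausible reading of beam search particle filtering, not the authors' algorithm.
- All parameter names are illustrative.

```python
import random


def beam_particle_filter(observations, n_particles=200, beam_width=20,
                         r_range=(0.5, 2.0), jitter=0.05, seed=0):
    """Estimate a growth rate r from counts under the toy model I_{t+1} = r * I_t.

    Each particle is (growth_rate, current_infections). Returns the mean
    growth rate of the final beam.
    """
    rng = random.Random(seed)
    particles = [(rng.uniform(*r_range), observations[0]) for _ in range(n_particles)]
    for obs in observations[1:]:
        # Predict: advance every particle one step with its own growth rate.
        predicted = [(r, r * infections) for r, infections in particles]
        # Weight and select: ranking by absolute error is equivalent to ranking
        # by a Gaussian likelihood with fixed variance; keep the best beam_width.
        predicted.sort(key=lambda p: abs(p[1] - obs))
        beam = predicted[:beam_width]
        # Refill: perturb survivors' growth rates to explore nearby parameters,
        # in place of standard multinomial resampling.
        children = []
        for _ in range(n_particles - beam_width):
            r, infections = rng.choice(beam)
            children.append((r + rng.gauss(0.0, jitter), infections))
        particles = beam + children
    return sum(r for r, _ in particles[:beam_width]) / beam_width
```

On noise-free synthetic counts generated with r = 1.3, the estimate settles close to the true rate. The design point is that only `beam_width` particles survive each step, so expensive simulations are spent near promising parameter values.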

Citations: 14

Risk Control of Best Arm Identification in Multi-armed Bandits via Successive Rejects

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.153

Xiaotian Yu, Irwin King, Michael R. Lyu

Citations: 3

Benchmark Generator for Dynamic Overlapping Communities in Networks

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.51

Neha Sengupta, M. Hamann, D. Wagner

Citations: 9

Multi-task Survival Analysis

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.58

Lu Wang, Yan Li, Jiayu Zhou, D. Zhu, Jieping Ye

Abstract: Collecting labeling information for time-to-event analysis is naturally very time consuming: one has to wait for the occurrence of the event of interest, which may not be observed for every instance. By taking advantage of censored instances, survival analysis methods consider more samples than standard regression methods, which partially alleviates this data insufficiency problem. Whereas most existing survival analysis models focus on a single survival prediction task, when there are multiple related survival prediction tasks we may benefit from the relatedness among tasks. By learning multiple related tasks simultaneously, multi-task learning (MTL) alleviates data insufficiency by bridging data from all tasks and improves the generalization performance of every task involved. Even though MTL has been extensively studied, no existing work investigates MTL for survival analysis. In this paper, we propose a novel multi-task survival analysis framework that takes advantage of both censored instances and task relatedness. Specifically, based on two commonly used task relatedness assumptions, namely the low-rank assumption and the cluster structure assumption, we formulate two concrete models under the proposed framework: COX-TRACE and COX-cCMTL, respectively. We develop efficient algorithms and demonstrate the performance of the proposed multi-task survival analysis models on The Cancer Genome Atlas (TCGA) dataset.
Our results show that the proposed approaches can significantly improve prediction performance in survival analysis and can also discover some inherent relationships among different cancer types.
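Both proposed models are Cox-based. The way censored instances "count" is easiest to see in the Cox negative log partial likelihood:

- Only observed events contribute a term of their own.
- Censored instances still appear in the risk sets of earlier events.

Below is a minimal single-task sketch that assumes no tied event times (no Breslow or Efron correction). A low-rank multi-task model would, for instance, sum this loss over tasks and penalize the rank or trace norm of the stacked coefficient vectors. That coupling is not shown, and this is not the authors' COX-TRACE implementation.

```python
import math


def cox_neg_log_partial_likelihood(beta, X, times, events):
    """Negative log partial likelihood of a Cox model (assumes no tied event times).

    beta: coefficient list; X: list of feature rows; times: observed times;
    events: 1 if the event was observed, 0 if the instance is censored.
    """
    # Linear risk score x_i . beta for every instance.
    risk = [sum(b * x for b, x in zip(beta, row)) for row in X]
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored: no term of its own, but still in risk sets below
        # Risk set: every instance still under observation at time t_i.
        log_denominator = math.log(
            sum(math.exp(risk[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        )
        nll -= risk[i] - log_denominator
    return nll
```

With all coefficients zero and three uncensored instances, the loss is log 3 + log 2 + log 1 = log 6. Censoring the middle instance removes its log 2 term, but that instance still enlarges the first risk set.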

Citations: 31

Synchronization-Inspired Co-Clustering and Its Application to Gene Expression Data

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.141

Junming Shao, Chongming Gao, Weishan Zeng, Jingkuan Song, Qinli Yang

Citations: 10

Dynamic Propagation Rates: New Dimension to Viral Marketing in Online Social Networks

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.132

Tianyi Pan, Alan Kuhnle, Xiang Li, M. Thai

Abstract: Online Social Networks (OSNs) are effective platforms for viral marketing. Due to their importance, viral marketing problems in OSNs have been extensively studied in the past decade. However, none of the existing works can cope with situations in which the propagation rate dynamically increases for popular topics, as they all assume known propagation rates. In this paper, to better describe realistic information propagation in OSNs, we propose a novel model, Dynamic Influence Propagation (DIP), that allows the propagation rate to change during the diffusion. We then define a new research problem, the Threshold Activation Problem under DIP (TAP-DIP), to study the impact of DIP. TAP-DIP adds extra complexity to the already #P-hard TAP problem. Despite its hardness, we are able to approximate TAP-DIP with an O(log|V|) ratio. At the core of our algorithm are the Lipschitz optimization technique and a novel solution to the general version of TAP, the Multi-TAP problem.
Using various real OSN datasets, we experimentally demonstrate the impact of DIP and show that our solution not only generates high-quality seed sets when aware of the rate increase, but is also scalable.
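To illustrate the setting (not the solution), below is a Monte Carlo simulation of an independent cascade whose propagation rate rises as a topic becomes popular. It also includes a check of the threshold-activation goal for a fixed seed set.

The rate rule `min(1, base_rate * (1 + growth * active_fraction))` is a hypothetical stand-in for DIP, whose exact definition is in the paper. The O(log|V|) approximation algorithm is not reproduced.

```python
import random


def simulate_cascade(graph, seeds, base_rate, growth, rng):
    """One independent-cascade run with a popularity-dependent rate.

    graph: list of adjacency lists for nodes 0..n-1 (directed edges).
    Returns the number of activated nodes.
    """
    n = len(graph)
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        # Hypothetical dynamic rate: grows linearly with the active fraction.
        rate = min(1.0, base_rate * (1.0 + growth * len(active) / n))
        next_frontier = []
        for u in frontier:
            for v in graph[u]:
                # Each newly active node gets one chance per inactive neighbor.
                if v not in active and rng.random() < rate:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def meets_threshold(graph, seeds, threshold, base_rate, growth, runs=1000, seed=0):
    """Monte Carlo estimate: does the expected number of activations reach threshold?"""
    rng = random.Random(seed)
    total = sum(simulate_cascade(graph, seeds, base_rate, growth, rng) for _ in range(runs))
    return total / runs >= threshold
```

Consider a 4-node path 0 -> 1 -> 2 -> 3 with `base_rate=0.5`. With no rate increase, the expected spread is only 1 + 0.5 + 0.25 + 0.125 = 1.875 nodes. With `growth=4.0`, the rate saturates at 1.0 immediately and every node activates. This gap is the kind of effect a rate-unaware seed selection would miss.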

Citations: 4