João Saúde, Guilherme Ramos, Carlos Caleiro, S. Kar
{"title":"Reputation-Based Ranking Systems and Their Resistance to Bribery","authors":"João Saúde, Guilherme Ramos, Carlos Caleiro, S. Kar","doi":"10.1109/ICDM.2017.139","DOIUrl":"https://doi.org/10.1109/ICDM.2017.139","url":null,"abstract":"We study bribery resistance properties in two classes of reputation-based ranking systems, where the rankings are computed by weighting the rates given by users with their reputations. In the first class, the rankings are the result of the aggregation of all the ratings, and all users are provided with the same ranking for each item. In the second class, there is a first step that clusters users by their rating pattern similarities, and then the rankings are computed cluster-wise. Hence, for each item, there is a different ranking for distinct clusters. We study the setting where the seller of each item can bribe users to rate the item, if they did not rate it before, or to increase their previous rating on the item. We model bribing strategies under these ranking scenarios and explore under which conditions it is profitable to bribe a user, presenting, in several cases, the optimal bribing strategies. By computing dedicated rankings to each cluster, we show that bribing, in general, is not as profitable as in the simpler without clustering. Finally, we illustrate our results with experiments using real data.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129801800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Hashing-Based Network Discovery","authors":"Tara Safavi, C. Sripada, Danai Koutra","doi":"10.1109/ICDM.2017.50","DOIUrl":"https://doi.org/10.1109/ICDM.2017.50","url":null,"abstract":"Discovering and analyzing networks from non-network data is a task with applications in fields as diverse as neuroscience, genomics, energy, economics, and more. In these domains, networks are often constructed out of multiple time series by computing measures of association or similarity between pairs of series. The nodes in a discovered graph correspond to time series, which are linked via edges weighted by the association scores of their endpoints. After graph construction, the network may be thresholded such that only the edges with stronger weights remain and the desired sparsity level is achieved. While this approach is feasible for small datasets, its quadratic time complexity does not scale as the individual time series length and the number of compared series increase. Thus, to avoid the costly step of building a fully-connected graph before sparsification, we propose a fast network discovery approach based on probabilistic hashing of randomly selected time series subsequences. Evaluation on real data shows that our methods construct graphs nearly 15 times as fast as baseline methods, while achieving both network structure and accuracy comparable to baselines in task-based evaluation.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127870977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Changchang Yin, B. Qian, Shilei Cao, Xiaoyu Li, Jishang Wei, Q. Zheng, I. Davidson
{"title":"Deep Similarity-Based Batch Mode Active Learning with Exploration-Exploitation","authors":"Changchang Yin, B. Qian, Shilei Cao, Xiaoyu Li, Jishang Wei, Q. Zheng, I. Davidson","doi":"10.1109/ICDM.2017.67","DOIUrl":"https://doi.org/10.1109/ICDM.2017.67","url":null,"abstract":"Active learning aims to reduce manual labeling efforts by proactively selecting the most informative unlabeled instances to query. In real-world scenarios, it's often more practical to query a batch of instances rather than a single one at each iteration. To achieve this we need to keep not only the informativeness of the instances but also their diversity. Many heuristic methods have been proposed to tackle batch mode active learning problems, however, they suffer from two limitations which if addressed would significantly improve the query strategy. Firstly, the similarity amongst instances is simply calculated using the feature vectors rather than being jointly learned with the classification model. This weakens the accuracy of the diversity measurement. Secondly, these methods usually exploit the decision boundary by querying the data points close to it. However, this can be inefficient when the labeled set is too small to reveal the true boundary. In this paper, we address both limitations by proposing a deep neural network based algorithm. In the training phase, a pairwise deep network is not only trained to perform classification, but also to project data points into another space, where the similarity can be more precisely measured. In the query selection phase, the learner selects a set of instances that are maximally uncertain and minimally redundant (exploitation), as well as are most diverse from the labeled instances (exploration). We evaluate the effectiveness of the proposed method on a variety of classification tasks: MNIST classification, opinion polarity detection, and heart failure prediction. Our method outperforms the baselines with both higher classification accuracy and faster convergence rate.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117018557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vu Nguyen, Sunil Gupta, Santu Rana, Cheng Li, S. Venkatesh
{"title":"Bayesian Optimization in Weakly Specified Search Space","authors":"Vu Nguyen, Sunil Gupta, Santu Rana, Cheng Li, S. Venkatesh","doi":"10.1109/ICDM.2017.44","DOIUrl":"https://doi.org/10.1109/ICDM.2017.44","url":null,"abstract":"Bayesian optimization (BO) has recently emerged as a powerful and flexible tool for hyper-parameter tuning and more generally for the efficient global optimization of expensive black-box functions. Systems implementing BO has successfully solved difficult problems in automatic design choices and machine learning hyper-parameters tunings. Many recent advances in the methodologies and theories underlying Bayesian optimization have extended the framework to new applications and provided greater insights into the behavior of these algorithms. Still, these established techniques always require a user-defined space to perform optimization. This pre-defined space specifies the ranges of hyper-parameter values. In many situations, however, it can be difficult to prescribe such spaces, as a prior knowledge is often unavailable. Setting these regions arbitrarily can lead to inefficient optimization - if a space is too large, we can miss the optimum with a limited budget, on the other hand, if a space is too small, it may not contain the optimum point that we want to get. The unknown search space problem is intractable to solve in practice. Therefore, in this paper, we narrow down to consider specifically the setting of \"weakly specified\" search space for Bayesian optimization. By weakly specified space, we mean that the pre-defined space is placed at a sufficiently good region so that the optimization can expand and reach to the optimum. However, this pre-defined space need not include the global optimum. We tackle this problem by proposing the filtering expansion strategy for Bayesian optimization. Our approach starts from the initial region and gradually expands the search space. Wedevelop an efficient algorithm for this strategy and derive its regret bound. These theoretical results are complemented by an extensive set of experiments on benchmark functions and tworeal-world applications which demonstrate the benefits of our proposed approach.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"221 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132563881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Farzaneh Tabataba, B. L. Lewis, M. Hosseinipour, F. Tabataba, S. Venkatramanan, Jiangzhuo Chen, D. Higdon, M. Marathe
{"title":"Epidemic Forecasting Framework Combining Agent-Based Models and Smart Beam Particle Filtering","authors":"Farzaneh Tabataba, B. L. Lewis, M. Hosseinipour, F. Tabataba, S. Venkatramanan, Jiangzhuo Chen, D. Higdon, M. Marathe","doi":"10.1109/ICDM.2017.145","DOIUrl":"https://doi.org/10.1109/ICDM.2017.145","url":null,"abstract":"Over the past decades, numerous techniques have been developed to forecast the temporal evolution of epidemic outbreaks. This paper proposes an approach that combines high resolution agent-based models using realistic social contact networks for simulating epidemic evolution with a particle filter based method for assimilation based forecasting. Agent-based modeling using realistic social contact networks provides two key advantages: (i) they capture the causal processes underlying the epidemic and hence are useful to understand the role of interventions on the course of the epidemics – typically time series models cannot capture this and as a result often do not perform well in such situations; (ii) they provide detailed forecast information – this allows us to produce forecast at high levels of temporal, spatial and social granularity. We also propose a new variation of particle filter technique called beam search particle filtering. The modification allows us to more efficiently search the parameter space which is necessitated by the fact that agent-based techniques are computationally expensive. We illustrate our methodology on the synthetic dataset of Ebola provided as a part of the NSF/NIH Ebola forecasting challenge. Our results show the efficacy of the proposed approach and suggest that agent-based causal models can be combined with filtering techniques to yield a new class of assimilation models for infectious disease forecasting.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134190198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Risk Control of Best Arm Identification in Multi-armed Bandits via Successive Rejects","authors":"Xiaotian Yu, Irwin King, Michael R. Lyu","doi":"10.1109/ICDM.2017.153","DOIUrl":"https://doi.org/10.1109/ICDM.2017.153","url":null,"abstract":"Best arm identification in stochastic Multi-Armed Bandits (MAB) has become an essential variant in the research line of bandits for decision-making problems. In previous work, the best arm usually refers to an arm with the highest expected payoff in a given decision-arm set. However, in many practical scenarios, it would be more important and desirable to incorporate the risk of an arm into the best decision. In this paper, motivated by practical applications with risk via bandits, we investigate the problem of Risk Control of Best Arm Identification (RCBAI) in stochastic MAB. Based on the technique of Successive Rejects (SR), we show that the error resulting from the mean-variance estimation is sub-Gamma by setting mild assumptions on stochastic payoffs of arms. Besides, we develop an algorithm named as RCMAB. SR, and derive an upper bound for the probability of error for RCBAI in stochastic MAB. We demonstrate the superiority of the RCMAB. SR algorithm in synthetic datasets, and then apply the RCMAB. SR algorithm in financial data for yearly investments to show its superiority for practical applications.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117297602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Benchmark Generator for Dynamic Overlapping Communities in Networks","authors":"Neha Sengupta, M. Hamann, D. Wagner","doi":"10.1109/ICDM.2017.51","DOIUrl":"https://doi.org/10.1109/ICDM.2017.51","url":null,"abstract":"We describe a dynamic graph generator with overlapping communities that is capable of simulating community scale events while at the same time maintaining crucial graph properties. Such a benchmark generator is useful to measure and compare the responsiveness and efficiency of dynamic community detection algorithms. Since the generator allows the user to tune multiple parameters, it can also be used to test the robustness of a community detection algorithm across a spectrum of inputs. In an experimental evaluation, we demonstrate the generator's performance and show that graph properties are indeed maintained over time. Further, we show that standard community detection algorithms are able to find the generated community structure. To the best of our knowledge, this is the first time that all of the above have been combined into one benchmark generator, and this work constitutes an important building block for the development of efficient and reliable dynamic, overlapping community detection algorithms.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126205489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-task Survival Analysis","authors":"Lu Wang, Yan Li, Jiayu Zhou, D. Zhu, Jieping Ye","doi":"10.1109/ICDM.2017.58","DOIUrl":"https://doi.org/10.1109/ICDM.2017.58","url":null,"abstract":"Collecting labeling information of time-to-event analysis is naturally very time consuming, i.e., one has to wait for the occurrence of the event of interest, which may not always be observed for every instance. By taking advantage of censored instances, survival analysis methods internally consider more samples than standard regression methods, which partially alleviates this data insufficiency problem. Whereas most existing survival analysis models merely focus on a single survival prediction task, when there are multiple related survival prediction tasks, we may benefit from the tasks relatedness. Simultaneously learning multiple related tasks, multi-task learning (MTL) provides a paradigm to alleviate data insufficiency by bridging data from all tasks and improves generalization performance of all tasks involved. Even though MTL has been extensively studied, there is no existing work investigating MTL for survival analysis. In this paper, we propose a novel multi-task survival analysis framework that takes advantage of both censored instances and task relatedness. Specifically, based on two common used task relatedness assumptions, i.e., low-rank assumption and cluster structure assumption, we formulate two concrete models, COX-TRACE and COX-cCMTL, under the proposed framework, respectively. We develop efficient algorithms and demonstrate the performance of the proposed multi-task survival analysis models on the The Cancer Genome Atlas (TCGA) dataset. Our results show that the proposed approaches can significantly improve the prediction performance in survival analysis and can also discover some inherent relationships among different cancer types.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"155 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131961448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junming Shao, Chongming Gao, Weishan Zeng, Jingkuan Song, Qinli Yang
{"title":"Synchronization-Inspired Co-Clustering and Its Application to Gene Expression Data","authors":"Junming Shao, Chongming Gao, Weishan Zeng, Jingkuan Song, Qinli Yang","doi":"10.1109/ICDM.2017.141","DOIUrl":"https://doi.org/10.1109/ICDM.2017.141","url":null,"abstract":"In this paper, we propose a new synchronization-inspired co-clustering algorithm by dynamic simulation, called CoSync, which aims to discover biologically relevant subgroups embedding in a given gene expression data matrix. The basic idea is to view a gene expression data matrix as a dynamical system, and the weighted two-sided interactions are imposed on each element of the matrix from both aspects of genes and conditions, resulting in the values of all element in a co-cluster synchronizing together. Experiments show that our algorithm allows uncovering high-quality co-clusterings embedded in gene expression data sets and has its superiority over many state-of-the-art algorithms.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"147 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129666039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Propagation Rates: New Dimension to Viral Marketing in Online Social Networks","authors":"Tianyi Pan, Alan Kuhnle, Xiang Li, M. Thai","doi":"10.1109/ICDM.2017.132","DOIUrl":"https://doi.org/10.1109/ICDM.2017.132","url":null,"abstract":"Online Social Networks (OSNs) are effective platforms for viral marketing. Due to their importance, viral marketing related problems in OSNs have been extensively studied in the past decade. However, none of the existing works can cope with the situation that the propagation rate dynamically increases for popular topics, as they all assume known propagation rates. In this paper, to better describe realistic information propagation in OSNs, we propose a novel model, Dynamic Influence Propagation (DIP), that allows propagation rate to change during the diffusion. We then define a new research problem: Threshold Activation Problem under DIP (TAP-DIP) to study the impact of DIP. TAP-DIP adds extra complexity on the already #P-hard TAP problem. Despite it hardness, we are able to approximate TAP-DIP with O(log|V|) ratio. Sitting in the core of our algorithm are the Lipschitz optimization technique and a novel solution to the general version of TAP, the Multi-TAP problem. Using various real OSN datasets, we experimentally demonstrate the impact of DIP and that our solution not only generates high-quality seed sets when being aware of the rate increase, but also is scalable.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129680229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}