2014 IEEE International Conference on Data Mining: Latest Publications

Fast and Exact Monitoring of Co-Evolving Data Streams

2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.62

Yasuko Matsubara, Yasushi Sakurai, N. Ueda, Masatoshi Yoshikawa

Abstract: Given a huge stream of multiple co-evolving sequences, such as motion capture and web-click logs, how can we find meaningful patterns and spot anomalies? Our aim is to monitor data streams statistically and find subsequences that have the characteristics of a given hidden Markov model (HMM). For example, consider an online web-click stream, where massive amounts of access logs of millions of users are continuously generated every second. How can we find meaningful building blocks and typical access patterns, such as weekday/weekend patterns, and also detect anomalies and intrusions? In this paper, we propose StreamScan, a fast and exact algorithm for monitoring multiple co-evolving data streams. Our method has the following advantages: (a) it is effective, leading to novel discoveries and surprising outliers; (b) it is exact, and we theoretically prove that StreamScan guarantees the exactness of the output; (c) it is fast, and requires O(1) time and space per time-tick. Our experiments on 67GB of real data illustrate that StreamScan does indeed detect the qualifying subsequence patterns correctly and that it can offer great improvements in speed (up to 479,000 times) over its competitors.

Citations: 18
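
The abstract above describes scoring a stream against a given HMM with constant work per time-tick. As a rough illustration only, the sketch below runs the standard Viterbi recursion one symbol at a time, keeping just one value per state, so the cost per tick does not grow with the stream length. This is not the algorithm proposed in the paper (which also locates where matching subsequences start and end); the function names and the two-state toy model are assumptions made for this example.

```python
import math


def make_scorer(log_init, log_trans, log_emit):
    """Return a step(symbol) function that updates Viterbi scores online.

    log_init[i]     : log P(start in state i)
    log_trans[i][j] : log P(next state j | current state i)
    log_emit[i][x]  : log P(symbol x | state i)
    """
    k = len(log_init)
    best = None  # best[j]: log-probability of the best path ending in state j

    def step(symbol):
        nonlocal best
        if best is None:
            # First symbol: start distribution times emission.
            best = [log_init[j] + log_emit[j][symbol] for j in range(k)]
        else:
            # Later symbols: extend the best path into each state j.
            best = [
                max(best[i] + log_trans[i][j] for i in range(k))
                + log_emit[j][symbol]
                for j in range(k)
            ]
        return max(best)  # score of the best path over everything seen so far

    return step


def logs(probs):
    """Convert a list of probabilities to log-probabilities."""
    return [math.log(p) for p in probs]


# Toy two-state model (hypothetical numbers, for illustration only).
step = make_scorer(
    logs([0.5, 0.5]),
    [logs([0.9, 0.1]), logs([0.1, 0.9])],
    [logs([0.9, 0.1]), logs([0.1, 0.9])],
)
scores = [step(x) for x in [0, 0, 1]]
```

Each call to `step` touches only the `k` stored values and the `k × k` transition table, so for a fixed model the time and memory per tick are constant.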

Towards Scalable and Accurate Online Feature Selection for Big Data

2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1145/2976744

Kui Yu, Xindong Wu, W. Ding, J. Pei

Citations: 143

2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.123

Geoffrey I. Webb

Citations: 17

Janus -- Analytics-Driven Transition Planner

2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.79

Manasi Belhe, K. Shrivastava, M. Natu, V. Sadaphal

Citations: 0

Topic Models with Topic Ordering Regularities for Topic Segmentation

2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.49

Lan Du, John K. Pate, Mark Johnson

Citations: 5

Classification by CUT: Clearance under Threshold

2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.75

Ryan McBride, Ke Wang, Wenyuan Li

Abstract: Identifying bad objects hidden amidst many good objects is important for public safety and decision-making. These problems are complicated in that the cost of leaving a bad object unidentified may not be specified easily, making it difficult to apply existing cost-sensitive classification that depends on knowing a cost matrix or cost distribution. A compelling case for this "illusive cost" issue is presented in our project of identifying contaminated transformers with an industrial partner. To address this problem, we present an alternative formulation of cost-sensitive classification, Clearance Under Threshold (CUT) Classification. Given a training set, CUT classification is to partition the attribute space such that a partition is cleared if the probability of a future object in this partition being bad is less than a user-specified threshold. The goal is to clear many low-risk objects so that users can more effectively target high-risk objects. We present a solution to this problem and evaluate it on a case study for clearing contaminated transformers and on public benchmarks from UC Irvine's Machine Learning Repository. According to the experiments, our algorithms performed far better than the baselines derived from previous classification approaches.

Citations: 2
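
The clearing rule stated in the CUT abstract above (clear a partition when the probability that a future object in it is bad falls below a user-specified threshold) can be illustrated with a small sketch. It assumes the partition is already given by a hypothetical `cell_of` function and estimates the probability with Laplace smoothing; both are choices made for this example, and the paper's method for finding the partition is not reproduced here.

```python
from collections import defaultdict


def cleared_cells(records, cell_of, threshold):
    """Return the set of partition cells whose estimated risk is below threshold.

    records   : iterable of (attributes, is_bad) pairs
    cell_of   : hypothetical function mapping attributes to a partition cell
    threshold : user-specified upper limit on P(bad) for a cleared cell
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [bad count, total count]
    for attrs, is_bad in records:
        entry = counts[cell_of(attrs)]
        entry[0] += int(is_bad)
        entry[1] += 1
    # Laplace-smoothed estimate of P(bad | cell); an assumption of this sketch.
    return {
        cell
        for cell, (bad, total) in counts.items()
        if (bad + 1) / (total + 2) < threshold
    }


# Toy data: one numeric attribute, cells are bins of width 10.
records = [((5,), False)] * 18 + [((15,), True)] * 2 + [((15,), False)] * 2
cleared = cleared_cells(records, lambda attrs: attrs[0] // 10, threshold=0.1)
```

With these toy numbers, cell 0 has an estimated risk of 1/20 and is cleared, while cell 1 has an estimated risk of 3/6 and is left for inspection.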

Anomaly Detection Using the Poisson Process Limit for Extremes

2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.12

Stijn Luca, P. Karsmakers, B. Vanrumste

Citations: 9

Social Role Identification via Dual Uncertainty Minimization Regularization

2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.31

Yu Cheng, Ankit Agrawal, A. Choudhary, Huan Liu, Zhang Tao

Abstract: In this paper, we study the challenging problem of inferring individuals' roles and statuses in a professional social network, which is of central importance in workforce optimization and human capital management. Realizing the natural setting of social nodes associated with dual-view information, i.e., the local node characteristics and the global network influence, we present a novel model that explores graph regularization techniques and integrates such information to achieve improved prediction performance. In particular, our prediction model is built upon the graph transductive learning framework, which encodes an uncertainty regularization term in the conventional empirical risk minimization principle. By taking advantage of the information from both the local profile and the global network characteristics, the final inference of the role or status achieves a minimum empirical loss on the labeled set, as well as a minimum uncertainty on the unlabeled social nodes. We perform an extensive empirical study using real-world data and compare with representative peer approaches. The experimental results on three real social network data sets show that the proposed model greatly outperforms a number of baseline models and is able to infer effectively in a wide range of scenarios.

Citations: 13

Exploiting Heterogeneous Human Mobility Patterns for Intelligent Bus Routing

2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.138

Yanchi Liu, Chuanren Liu, Nicholas Jing Yuan, Lian Duan, Yanjie Fu, Hui Xiong, Songhua Xu, Junjie Wu

Abstract: Optimal planning for public transportation is one of the keys to sustainable development and better quality of life in urban areas. Compared to private transportation, public transportation uses road space more efficiently and produces fewer accidents and emissions. In this paper, we focus on the identification and optimization of flawed bus routes to improve utilization efficiency of public transportation services, according to people's real demand for public transportation. To this end, we first provide an integrated mobility pattern analysis between the location traces of taxicabs and the mobility records in bus transactions. Based on mobility patterns, we propose a localized transportation mode choice model, with which we can accurately predict the bus travel demand for different bus routing. This model is then used for bus routing optimization, which aims to convert as many people from private transportation to public transportation as possible given budget constraints on the bus route modification. We also leverage the model to identify region pairs with flawed bus routes, which are effectively optimized using our approach. To validate the effectiveness of the proposed methods, extensive studies are performed on real-world data collected in Beijing, which contain 19 million taxi trips and 10 million bus trips.

Citations: 47

Naive-Bayes Inspired Effective Pre-Conditioner for Speeding-Up Logistic Regression

2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.53

Nayyar Zaidi, Mark James Carman, J. Cerquides, Geoffrey I. Webb

Abstract: We propose an alternative parameterization of Logistic Regression (LR) for the categorical data, multi-class setting. LR optimizes the conditional log-likelihood over the training data and is based on an iterative optimization procedure to tune this objective function. The optimization procedure employed may be sensitive to scale, and hence an effective pre-conditioning method is recommended. Many problems in machine learning involve arbitrary scales or categorical data (where simple standardization of features is not applicable). The problem can be alleviated by using optimization routines that are invariant to scale, such as (second-order) Newton methods. However, computing and inverting the Hessian is a costly procedure and not feasible for big data. Thus one must often rely on first-order methods such as gradient descent (GD) and stochastic gradient descent (SGD), or on approximate second-order methods such as quasi-Newton (QN) routines, which are not invariant to scale. This paper proposes a simple yet effective pre-conditioner for speeding up LR based on naive Bayes conditional probability estimates. The idea is to scale each attribute by the log of the conditional probability of that attribute given the class. This formulation substantially speeds up LR's convergence. It also provides a weighted naive Bayes formulation, which yields an effective framework for hybrid generative-discriminative classification.

Citations: 15
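
The pre-conditioning idea in the abstract above (scale each attribute by the log of its conditional probability given the class) can be sketched as a feature transform for categorical data. The version below estimates log P(attribute = value | class) with Laplace smoothing and returns one scaled feature vector per class; a logistic regression would then learn one weight per attribute on top of these values. The function names, the smoothing, and the toy data are assumptions of this sketch, and the paper's exact parameterization may differ.

```python
import math
from collections import Counter


def nb_log_features(rows, labels):
    """Build a featurizer that maps a row to per-class naive Bayes log terms.

    rows   : list of tuples of categorical attribute values
    labels : list of class labels, one per row
    """
    classes = sorted(set(labels))
    n_attrs = len(rows[0])
    # Distinct values seen for each attribute (used for smoothing).
    values = [sorted({r[a] for r in rows}) for a in range(n_attrs)]
    class_count = Counter(labels)
    joint = Counter(
        (y, a, r[a]) for r, y in zip(rows, labels) for a in range(n_attrs)
    )

    def log_cond(a, v, y):
        # Laplace-smoothed log P(attribute a = v | class y).
        return math.log((joint[(y, a, v)] + 1) / (class_count[y] + len(values[a])))

    def featurize(row):
        # One vector per class: feature a is log P(row[a] | class).
        return {y: [log_cond(a, row[a], y) for a in range(n_attrs)] for y in classes}

    return featurize


# Toy data with a single categorical attribute (hypothetical).
rows = [("a",), ("a",), ("b",)]
labels = [0, 0, 1]
featurize = nb_log_features(rows, labels)
feats = featurize(("a",))
```

If every attribute weight is fixed at 1 and each class bias is set to the log class prior, summing these features per class gives the usual naive Bayes score, which is why such a transform is a natural starting point for the optimizer.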