2014 IEEE International Conference on Data Mining: Latest Papers

Fast and Exact Monitoring of Co-Evolving Data Streams
2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.62
Yasuko Matsubara, Yasushi Sakurai, N. Ueda, Masatoshi Yoshikawa
Abstract: Given a huge stream of multiple co-evolving sequences, such as motion capture and web-click logs, how can we find meaningful patterns and spot anomalies? Our aim is to monitor data streams statistically, and find subsequences that have the characteristics of a given hidden Markov model (HMM). For example, consider an online web-click stream, where massive amounts of access logs of millions of users are continuously generated every second. How can we find meaningful building blocks and typical access patterns such as weekday/weekend patterns, and also detect anomalies and intrusions? In this paper, we propose StreamScan, a fast and exact algorithm for monitoring multiple co-evolving data streams. Our method has the following advantages: (a) it is effective, leading to novel discoveries and surprising outliers; (b) it is exact, and we theoretically prove that StreamScan guarantees the exactness of the output; (c) it is fast, and requires O(1) time and space per time-tick. Our experiments on 67GB of real data illustrate that StreamScan does indeed detect the qualifying subsequence patterns correctly and that it can offer great improvements in speed (up to 479,000 times) over its competitors.
Citations: 18
Towards Scalable and Accurate Online Feature Selection for Big Data
2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1145/2976744
Kui Yu, Xindong Wu, W. Ding, J. Pei
Abstract: Feature selection is important in many big data applications. There are at least two critical challenges. Firstly, in many applications, the dimensionality is extremely high, in the millions, and keeps growing. Secondly, feature selection has to be highly scalable, preferably in an online manner such that each feature can be processed in a sequential scan. In this paper, we develop SAOLA, a Scalable and Accurate OnLine Approach for feature selection. With a theoretical analysis of a lower bound on the pairwise correlations between features in the currently selected feature subset, SAOLA employs novel online pairwise comparison techniques to address the two challenges and maintain a parsimonious model over time in an online manner. An empirical study using a series of benchmark real data sets shows that SAOLA is scalable on data sets of extremely high dimensionality, and has superior performance over the state-of-the-art feature selection methods.
Citations: 143
Contrary to Popular Belief Incremental Discretization can be Sound, Computationally Efficient and Extremely Useful for Streaming Data
2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.123
Geoffrey I. Webb
Abstract: Discretization of streaming data has received surprisingly little attention. This might be because streaming data require incremental discretization with cut points that may vary over time and this is perceived as undesirable. We argue, to the contrary, that it can be desirable for a discretization to evolve in synchronization with an evolving data stream, even when the learner assumes that attribute values' meanings remain invariant over time. We examine the issues associated with discretization in the context of distribution drift and develop computationally efficient incremental discretization algorithms. We show that discretization can reduce the error of a classical incremental learner and that allowing a discretization to drift in synchronization with distribution drift can further reduce error.
Citations: 17
Janus -- Analytics-Driven Transition Planner
2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.79
Manasi Belhe, K. Shrivastava, M. Natu, V. Sadaphal
Abstract: In this paper, we address the problem of transitioning IT operations from one service provider to another. We present analytics-driven solutions to generate a transition plan while addressing various aspects such as coverage, risk, time, and cost. We model the IT operations as graphs and use well-defined problems in graph theory to build solutions for the transition planner. We demonstrate a proof of concept of the proposed ideas using a real-world case study.
Citations: 0
Topic Models with Topic Ordering Regularities for Topic Segmentation
2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.49
Lan Du, John K. Pate, Mark Johnson
Abstract: Documents from the same domain usually discuss similar topics in a similar order. In this paper we present new ordering-based topic models that use generalised Mallows models to capture this regularity to constrain topic assignments. Specifically, these new models assume that there is a canonical topic ordering shared amongst documents from the same domain, and each document-specific topic ordering is allowed to vary from the canonical topic ordering. Instead of full orderings over a set of all possible topics covered by a domain, we make use of top-t orderings via a multistage ranking process. We show how to reformulate the new models so that a point-wise sampling algorithm from the Bayesian word segmentation literature can be used for posterior inference. Experimental results on several document collections with different properties show that our model performs much better than the other topic ordering-based models, and competitively with other state-of-the-art topic segmentation models.
Citations: 5
Classification by CUT: Clearance under Threshold
2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.75
Ryan McBride, Ke Wang, Wenyuan Li
Abstract: Identifying bad objects hidden amidst many good objects is important for public safety and decision-making. These problems are complicated in that the cost of leaving a bad object unidentified may not be specified easily, making it difficult to apply existing cost-sensitive classification that depends on knowing a cost matrix or cost distribution. A compelling case for this "illusive cost" issue is presented in our project of identifying contaminated transformers with an industrial partner. To address this problem, we present an alternative formulation of cost-sensitive classification, Clearance Under Threshold (CUT) Classification. Given a training set, CUT classification is to partition the attribute space such that a partition is cleared if the probability of a future object in this partition being bad is less than a user-specified threshold. The goal is to clear many low-risk objects so that users can more effectively target high-risk objects. We present a solution to this problem and evaluate it on a case study for clearing contaminated transformers and on public benchmarks from UC Irvine's Machine Learning Repository. According to the experiments, our algorithms performed far better than the baselines derived from previous classification approaches.
Citations: 2
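The clearing rule in the CUT abstract above can be illustrated with a small sketch. It assumes the partitions are already given as counts of bad and total training objects; how the paper actually builds the partitions and estimates the probability is not stated in the abstract, so the function name `clear_partitions` and the add-alpha estimate are hypothetical choices made only for illustration.

```python
def clear_partitions(partitions, threshold, alpha=1.0):
    """Return the sorted names of partitions whose estimated bad rate is below threshold.

    partitions: dict mapping a partition name to (n_bad, n_total) training counts.
    alpha: add-alpha smoothing constant (an assumption, not taken from the paper).
    """
    cleared = []
    for name, (n_bad, n_total) in partitions.items():
        # Smoothed estimate of P(bad | partition); keeps tiny partitions
        # from being cleared just because they contain no bad examples.
        p_bad = (n_bad + alpha) / (n_total + 2 * alpha)
        if p_bad < threshold:
            cleared.append(name)
    return sorted(cleared)
```

With counts `{"A": (0, 98), "B": (5, 18), "C": (1, 198)}` and a threshold of 0.05, the smoothed estimates are 0.01, 0.30, and 0.01, so partitions A and C would be cleared and B would be left for inspection.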
Anomaly Detection Using the Poisson Process Limit for Extremes
2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.12
Stijn Luca, P. Karsmakers, B. Vanrumste
Abstract: Anomaly detection starts from a model of normal behavior and classifies departures from this model as anomalies. This paper introduces a statistical non-parametric approach for anomaly detection that is based on a multivariate extension of the Poisson point process model for univariate extremes. The method is demonstrated on both a synthetic and a real-world data set, the latter being an unbalanced data set of acceleration data collected from movements of 7 pediatric patients suffering from epilepsy that was previously studied in [1]. The positive predictive values improved by up to 12.9% (7% on average) while the sensitivity scores stayed unaltered. The proposed method was also shown to outperform a one-class SVM classifier. Because the Poisson point process model of extremes is able to combine information on the number of excesses over a fixed threshold with that on the excess values, a powerful model to detect anomalies is obtained that can be of high value in many applications.
Citations: 9
Social Role Identification via Dual Uncertainty Minimization Regularization
2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.31
Yu Cheng, Ankit Agrawal, A. Choudhary, Huan Liu, Zhang Tao
Abstract: In this paper, we study the challenging problem of inferring individuals' roles and statuses in a professional social network, which is of central importance in workforce optimization and human capital management. Recognizing that social nodes are naturally associated with dual-view information, i.e., the local node characteristics and the global network influence, we present a novel model that explores graph regularization techniques and integrates such information to achieve improved prediction performance. In particular, our prediction model is built upon the graph transductive learning framework that encodes an uncertainty regularization term in the conventional empirical risk minimization principle. By taking advantage of the information from both the local profile and the global network characteristics, the final inference of the roles or statuses achieves a minimum empirical loss on the labeled set, as well as a minimum uncertainty on the unlabeled social nodes. We perform an extensive empirical study using real-world data and compare with representative peer approaches. The experimental results on three real social network data sets show that the proposed model greatly outperforms a number of baseline models and is able to infer effectively in a wide range of scenarios.
Citations: 13
Exploiting Heterogeneous Human Mobility Patterns for Intelligent Bus Routing
2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.138
Yanchi Liu, Chuanren Liu, Nicholas Jing Yuan, Lian Duan, Yanjie Fu, Hui Xiong, Songhua Xu, Junjie Wu
Abstract: Optimal planning for public transportation is one of the keys to sustainable development and better quality of life in urban areas. Compared to private transportation, public transportation uses road space more efficiently and produces fewer accidents and emissions. In this paper, we focus on the identification and optimization of flawed bus routes to improve utilization efficiency of public transportation services, according to people's real demand for public transportation. To this end, we first provide an integrated mobility pattern analysis between the location traces of taxicabs and the mobility records in bus transactions. Based on mobility patterns, we propose a localized transportation mode choice model, with which we can accurately predict the bus travel demand for different bus routings. This model is then used for bus routing optimization, which aims to convert as many people from private transportation to public transportation as possible given budget constraints on the bus route modification. We also leverage the model to identify region pairs with flawed bus routes, which are effectively optimized using our approach. To validate the effectiveness of the proposed methods, extensive studies are performed on real-world data collected in Beijing, which contain 19 million taxi trips and 10 million bus trips.
Citations: 47
Naive-Bayes Inspired Effective Pre-Conditioner for Speeding-Up Logistic Regression
2014 IEEE International Conference on Data Mining Pub Date : 2014-12-14 DOI: 10.1109/ICDM.2014.53
Nayyar Zaidi, Mark James Carman, J. Cerquides, Geoffrey I. Webb
Abstract: We propose an alternative parameterization of Logistic Regression (LR) for the categorical data, multi-class setting. LR optimizes the conditional log-likelihood over the training data and is based on an iterative optimization procedure to tune this objective function. The optimization procedure employed may be sensitive to scale and hence an effective pre-conditioning method is recommended. Many problems in machine learning involve arbitrary scales or categorical data (where simple standardization of features is not applicable). The problem can be alleviated by using optimization routines that are invariant to scale such as (second-order) Newton methods. However, computing and inverting the Hessian is a costly procedure and not feasible for big data. Thus one must often rely on first-order methods such as gradient descent (GD), stochastic gradient descent (SGD) or approximate second-order methods such as quasi-Newton (QN) routines, which are not invariant to scale. This paper proposes a simple yet effective pre-conditioner for speeding up LR based on naive Bayes conditional probability estimates. The idea is to scale each attribute by the log of the conditional probability of that attribute given the class. This formulation substantially speeds up LR's convergence. It also provides a weighted naive Bayes formulation which yields an effective framework for hybrid generative-discriminative classification.
Citations: 15
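One way to read the scaling idea in the abstract above is as a weighted naive Bayes score: each naive Bayes log-probability is multiplied by a weight, and setting every weight to 1 recovers plain naive Bayes. The sketch below shows only that scoring step for categorical data. The function names, the add-alpha smoothing, and the layout of the `weights` dictionary are assumptions made for illustration; the abstract does not specify them, and the gradient-based learning of the weights is omitted.

```python
import math
from collections import Counter


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Return (classes, log_prior, log_cond) using add-alpha smoothed estimates."""
    classes = sorted(set(labels))
    n_attrs = len(rows[0])
    # Distinct values observed for each attribute position.
    values = [sorted({row[i] for row in rows}) for i in range(n_attrs)]
    class_count = Counter(labels)
    joint = Counter()
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            joint[(i, v, y)] += 1
    log_prior = {y: math.log(class_count[y] / len(labels)) for y in classes}
    log_cond = {}
    for y in classes:
        for i in range(n_attrs):
            denom = class_count[y] + alpha * len(values[i])
            for v in values[i]:
                log_cond[(i, v, y)] = math.log((joint[(i, v, y)] + alpha) / denom)
    return classes, log_prior, log_cond


def class_scores(x, model, weights=None):
    """Score each class as w * log P(y) + sum_i w_i * log P(x_i | y).

    Missing weights default to 1.0, which gives the ordinary naive Bayes
    log-joint. Attribute values unseen in training raise KeyError.
    """
    classes, log_prior, log_cond = model
    weights = weights or {}
    scores = {}
    for y in classes:
        s = weights.get(("prior", y), 1.0) * log_prior[y]
        for i, v in enumerate(x):
            s += weights.get((i, v, y), 1.0) * log_cond[(i, v, y)]
        scores[y] = s
    return scores
```

Because the weights multiply quantities that are already log-probabilities, starting an optimizer at all-ones weights would start it at the naive Bayes solution, which is a plausible intuition for why such a scaling could help convergence.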