2017 IEEE International Conference on Data Mining Workshops (ICDMW): Latest Papers

A Machine Learning Approach to Non-uniform Spatial Downscaling of Climate Variables
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.49
Soukayna Mouatadid, S. Easterbrook, A. Erler
Abstract: This study presents a scalable and robust approach to spatial downscaling in the context of climate downscaling. We explore the ability of four techniques to downscale a climate variable to a given location of interest. As an example, we focus on downscaling daily mean air temperature at twelve stations located across the topographically complex province of British Columbia, Canada. The techniques include multi-linear regression (MLR), artificial neural networks (ANN), extreme learning machines (ELM) and long short-term memory networks (LSTM). Our method based on LSTM generalizes well to different locations and leads to higher downscaling accuracy compared to MLR and ELM. The performance of the models is measured based on statistical metrics, including the coefficient of determination and the root mean square error.
Citations: 13
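The two evaluation metrics named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes the common definition of the coefficient of determination as 1 - SS_res / SS_tot (the paper may instead use the squared correlation), and the temperature values are made up for the example.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical daily mean temperatures in degrees Celsius (illustrative only)
station_obs = [2.0, 4.0, 6.0, 8.0]
model_pred = [3.0, 4.0, 5.0, 8.0]

print("RMSE:", rmse(station_obs, model_pred))
print("R^2:", r_squared(station_obs, model_pred))
```

A lower RMSE and an R^2 closer to 1 both indicate that the downscaled series tracks the station observations more closely.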
An CNN-LSTM Attention Approach to Understanding User Query Intent from Online Health Communities
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.62
Ruichu Cai, Binjun Zhu, Lei Ji, Tianyong Hao, Jun Yan, Wenyin Liu
Abstract: Understanding user query intent is a crucial task in the question-answering area. With the development of online health services, online health communities generate a huge amount of valuable medical question-answering data, from which user intention can be mined. However, the queries posted by common users contain many domain concepts and colloquial expressions, which make understanding user intent very difficult. In this paper, we try to find and predict user intent from realistic medical text queries. A CNN-LSTM attention model is proposed to predict user intents, and an unsupervised clustering method is applied to mine a user intent taxonomy. The CNN-LSTM attention model has a CNN encoder and a Bi-LSTM attention encoder. The two encoders capture both global semantic expression and local phrase-level information from an original medical text query, which helps intent prediction. We also utilize extra knowledge such as part-of-speech tags and named entity tags to enrich feature information. Based on experiments on a health community query intent (HCQI) dataset, we compare our model with baseline models, and the experimental results demonstrate the effectiveness of our model.
Citations: 39
Improving Multivariate Time Series Forecasting with Random Walks with Restarts on Causality Graphs
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.127
Piotr Przymus, Youssef Hmamouche, Alain Casali, L. Lakhal
Abstract: Forecasting models that utilize multiple predictors are gaining popularity in a variety of fields. In some cases they allow constructing more precise forecasting models, leveraging the predictive potential of many variables. Unfortunately, in practice we do not know which observed predictors have a direct impact on the target variable. Moreover, adding unrelated variables may diminish the quality of forecasts. Thus, constructing a set of predictor variables that can be used in a forecast model is one of the greatest challenges in forecasting. We propose a new selection model for predictor variables based on the directed causality graph and a modification of the random walk with restarts model. Experiments conducted using two popular macroeconomic data sets, from the US and Australia, show that this simple and scalable approach performs well compared to other well-established methods.
Citations: 11
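For readers unfamiliar with the random walk with restarts mentioned in the abstract above, the standard (unmodified) version can be sketched as below. The paper's own modification is not described in the abstract and is not reproduced here. The toy graph, its variable names, the edge directions, and the restart probability of 0.15 are all assumptions made for illustration.

```python
def random_walk_with_restart(graph, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Return stationary visit probabilities of a walk that restarts at `seed`.

    `graph` maps each node to a dict {successor: edge_weight}.
    """
    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)
    if seed not in nodes:
        raise ValueError("seed must be a node of the graph")

    scores = {node: 1.0 if node == seed else 0.0 for node in nodes}
    for _ in range(max_iter):
        updated = {node: 0.0 for node in nodes}
        updated[seed] += restart  # restart mass; scores always sum to 1
        for node, mass in scores.items():
            successors = graph.get(node, {})
            total_weight = sum(successors.values())
            if total_weight == 0:
                # Dangling node: send its remaining mass back to the seed
                updated[seed] += (1.0 - restart) * mass
                continue
            for successor, weight in successors.items():
                updated[successor] += (1.0 - restart) * mass * weight / total_weight
        change = sum(abs(updated[n] - scores[n]) for n in nodes)
        scores = updated
        if change < tol:
            break
    return scores


# Hypothetical causality graph (illustrative only)
causality_graph = {
    "gdp": {"unemployment": 1.0, "inflation": 1.0},
    "unemployment": {"gdp": 1.0},
    "inflation": {"gdp": 1.0},
    "rainfall": {},
}

scores = random_walk_with_restart(causality_graph, "gdp")
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 4))
```

In this toy graph the node `rainfall` is unreachable from the seed and scores zero, which mirrors the idea of ranking candidate predictors by their proximity to the target variable.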
Dependency Anomaly Detection for Heterogeneous Time Series: A Granger-Lasso Approach
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.155
Sahar Behzadi, K. Hlaváčková-Schindler, C. Plant
Abstract: The special characteristics of time series data, such as their high dimensionality and complex dependencies between variables, make the problem of detecting anomalies in time series very challenging. Anomalies, and more precisely dependency anomalies, ensue from the temporal causal dependencies. Furthermore, the graphical Granger causal models provide an appropriate environment to capture all the temporal dependencies in Gaussian time series. However, many production systems are characterized by a high degree of complex stochastic processes consisting of heterogeneous time series. Considering this situation, discovery of dependency anomalies would be more challenging since almost all the current algorithms deal with the homogeneous cases. The Granger-Lasso algorithm is a well-known L1 penalization algorithm which copes with temporal causality detection only for Gaussian time series. Inspired by this algorithm and considering the incremental heterogeneous time series generated in many different industries, we propose a modification of the Granger-Lasso algorithm so that it is applicable to a larger class of heterogeneous time series. To introduce this algorithm we are motivated by generalized linear models. Moreover, based on the proposed algorithm for discovering temporal dependencies, we introduce its application in anomaly detection for time series that follow distributions from the exponential family, e.g. the Poisson, binomial or multinomial distribution. The Granger-Lasso procedure is solved by using a least-squares cost function with a Lasso penalty for appropriately transformed input time series. The experimental results illustrate the performance and efficiency of the proposed algorithm on synthetic and other datasets. We evaluated the proposed method on causality testing on different examples.
Citations: 3
High Performance Graph Data Management and Mining with X10
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.135
Miyuru Dayarathna
Abstract: Graph data management and mining in HPC environments has been a widely discussed issue in recent times. In this talk I will describe the use of Partitioned Global Address Space languages for graph data mining and management. I will first discuss the rationale behind X10-based graph libraries and graph database benchmarks, using ScaleGraph and XGDBench as examples. Next, I will take Acacia, which is developed entirely in the X10 language, as an example system and describe our experience with implementing such a high-performance system with X10. I will also describe how RDF processing and streaming extensions have been implemented in Acacia. Finally, I will highlight some of the notable areas which need further attention in future.
Citations: 0
Meta-Morisita Index: Anomaly Behaviour Detection for Large Scale Tracking Data with Spatio-Temporal Marks
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.95
Zhao Yang, N. Japkowicz
Abstract: In this paper, we propose a workflow for processing and analysing large-scale tracking data with spatio-temporal marks that uses an infrastructure for machine learning methods based on a meta-data representation of point patterns. The tracking log (IP address) of cyber security devices usually maps to a geolocation and a timestamp; such data is called spatio-temporal data. Existing spatio-temporal analysis methods do not include a specific mechanism for analysing meta-data (point pattern information) generated from large-scale tracking data with spatio-temporal marks. In this work, we extend a spatial point pattern analysis method (the Morisita Index) with meta-data analysis, which includes anomaly behaviour detection and unsupervised learning to support spatio-temporal data analysis (on both physical and cyber data), and demonstrate its practical use. The resulting workflow has a robust capability to detect anomalies among large-scale tracking data with spatio-temporal marks using meta-data based on point pattern analysis and returns visualized reports to end users.
Citations: 4
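As background for the abstract above, the classical Morisita index of dispersion can be computed from counts of points per quadrat (grid cell). The sketch below shows only that textbook index, I = Q * sum(n_i * (n_i - 1)) / (N * (N - 1)) for Q quadrats with counts n_i and N points in total. It does not implement the paper's meta-data extension, and the quadrat counts are invented for illustration.

```python
def morisita_index(quadrat_counts):
    """Classical Morisita index for a list of per-quadrat point counts."""
    num_quadrats = len(quadrat_counts)
    total_points = sum(quadrat_counts)
    if num_quadrats == 0 or total_points < 2:
        raise ValueError("need at least one quadrat and two points")
    pair_sum = sum(count * (count - 1) for count in quadrat_counts)
    return num_quadrats * pair_sum / (total_points * (total_points - 1))


# Hypothetical quadrat counts (illustrative only)
clustered = [8, 0, 0, 0]  # all points fall in one cell
regular = [2, 2, 2, 2]    # points spread evenly

print("clustered:", morisita_index(clustered))
print("regular:", morisita_index(regular))
```

Values well above 1 indicate clustering, values near 1 a random pattern, and values below 1 a regular pattern.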
Steered Microaggregation: A Unified Primitive for Anonymization of Data Sets and Data Streams
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.141
J. Domingo-Ferrer, Jordi Soria-Comas
Abstract: The data anonymization landscape has become quite complex in the last decades. On the methodology side, the statistical disclosure control methods designed in official statistics have been supplemented by a number of privacy models proposed by computer scientists. On the data side, static data sets now coexist with big data, and particularly data streams. In the quest for a unified and conceptually simple anonymization approach, we present here a primitive called steered microaggregation that can be tailored to enforce various privacy models both on static data sets and on data streams. This type of microaggregation relies on adding artificial attributes that are properly initialized and weighted in order to steer the microaggregation process into meeting certain desired constraints. Although not limited to these, we demonstrate the potential of steered microaggregation by showing how it can be used to achieve t-closeness in the context of static data sets and to achieve k-anonymity of data streams while controlling tuple reordering.
Citations: 13
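To make the term microaggregation in the abstract above concrete, here is a minimal sketch of plain fixed-size microaggregation on a single numeric attribute: values are sorted, grouped into clusters of at least k records, and each value is replaced by its cluster mean. This is not the steered variant proposed in the paper, which additionally introduces weighted artificial attributes. The function name and the sample ages are hypothetical.

```python
from collections import Counter


def microaggregate(values, k):
    """Replace each value by the mean of its size->=k group of sorted neighbours."""
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    # Indices of the records ordered by attribute value
    order = sorted(range(len(values)), key=lambda i: values[i])
    num_groups = len(values) // k
    result = [0.0] * len(values)
    for group in range(num_groups):
        start = group * k
        # The last group absorbs the remainder, so its size is between k and 2k - 1
        end = (group + 1) * k if group < num_groups - 1 else len(values)
        members = order[start:end]
        group_mean = sum(values[i] for i in members) / len(members)
        for i in members:
            result[i] = group_mean
    return result


# Hypothetical ages (illustrative only)
ages = [23, 25, 61, 24, 58, 60, 59]
masked = microaggregate(ages, k=3)
print(masked)
print(Counter(masked))
```

Because every released value is shared by at least k records, the masked attribute satisfies k-anonymity with respect to that attribute alone.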
A Feasible Direction Method for Optimization Problem with Orthogonal Constraint in Feature Selection
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.114
Jianyu Miao, Yong Shi, Lingfeng Niu
Abstract: Feature selection, as a fundamental component of building robust models, plays an important role in many machine learning and data mining tasks. Since acquiring labeled data is particularly expensive in both time and effort, unsupervised feature selection on unlabeled data has recently gained considerable attention. Without label information, unsupervised feature selection needs alternative criteria to define feature relevance. We propose a novel unsupervised feature selection model, which embeds feature selection into nonnegative spectral clustering. A tailored optimization algorithm based on the Alternating Direction Method of Multipliers (ADMM) is designed to solve the proposed model. Many previous unsupervised feature selection methods used singular value decomposition (SVD) to handle the subproblem with the orthogonal constraint. Because the matrices in feature selection are generally very large, computing the SVD can be very slow or even infeasible. To address this issue, we propose to use a feasible direction method to efficiently solve the subproblem with the orthogonal constraint. The experimental study shows that we can obtain better performance compared with state-of-the-art methods.
Citations: 0
Personalized Anonymization for Set-Valued Data by Partial Suppression
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.142
Takuma Nakagawa, Hiromi Arai, Hiroshi Nakagawa
Abstract: Set-valued data consists of records that are sets of items, such as goods purchased by each individual. Methods of publishing and widely utilizing set-valued data while protecting personal information have been extensively studied in the field of privacy-preserving data publishing. Until now, basic models such as k-anonymity or k^m-anonymity could not cope with attribute inference by an adversary with background knowledge of the records. On the other hand, the ρ-uncertainty model makes it possible to prevent attribute inference with a confidence value above a certain level in set-valued data. However, even in that case, there is the problem that items to be protected have to be designated in advance. In this research, we propose a new model that can provide more suitable privacy protection for each individual by protecting different items designated for each record distinctively, and we build a heuristic algorithm to achieve this guarantee using partial suppression. In addition, considering the problem that the computational complexity of the algorithm increases combinatorially with increasing data size, we introduce the concept of probabilistic relaxation of the privacy guarantee. Finally, we show the experimental results of evaluating the performance of the algorithms using real-world datasets.
Citations: 7
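The confidence-based inference that the abstract above associates with ρ-uncertainty can be illustrated with a small check. The sketch below only computes the confidence of inferring a sensitive item from a set of known items and compares it with a threshold ρ. It is not the paper's personalized suppression algorithm, and the records, item names, and threshold are hypothetical.

```python
def rule_confidence(records, known_items, sensitive_item):
    """Fraction of records containing `known_items` that also contain `sensitive_item`."""
    known_items = set(known_items)
    matching = [record for record in records if known_items <= set(record)]
    if not matching:
        return 0.0  # no record supports the antecedent
    hits = sum(1 for record in matching if sensitive_item in record)
    return hits / len(matching)


def violates_rho(records, known_items, sensitive_item, rho):
    """True if the sensitive item can be inferred with confidence above rho."""
    return rule_confidence(records, known_items, sensitive_item) > rho


# Hypothetical purchase records (illustrative only)
purchases = [
    {"bread", "milk", "insulin"},
    {"bread", "milk"},
    {"bread", "insulin"},
    {"milk"},
]

print(rule_confidence(purchases, {"bread"}, "insulin"))
print(violates_rho(purchases, {"bread"}, "insulin", rho=0.6))
print(violates_rho(purchases, {"bread", "milk"}, "insulin", rho=0.6))
```

A suppression-based method would remove items from records until no such check exceeds the threshold for the items each individual wants protected.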
Exploring Transfer Learning for Crime Prediction
2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.165
Xiangyu Zhao, Jiliang Tang
Abstract: Crime prediction plays a crucial role in addressing crime, violence, conflict and insecurity in cities to promote good governance, appropriate urban planning and management. Many efforts have been made to develop crime prediction models by leveraging demographic data, but they failed to capture the dynamic nature of crime in urban areas. Recently, with the development of new techniques for collecting and integrating fine-grained crime-related datasets, there is potential to obtain a better understanding of the dynamics of crime and to advance crime prediction. However, for a city, it is hard to build a uniform framework for all boroughs due to the uneven distribution of data. To this end, in this paper, we exploit spatio-temporal patterns in urban data in one borough of a city, and then leverage transfer learning techniques to reinforce crime prediction in other boroughs. Specifically, we first validate the existence of spatio-temporal patterns in urban crime. Then we extract crime-related features from cross-domain datasets. Finally, we propose a novel transfer learning framework to integrate these features and model spatio-temporal patterns for crime prediction.
Citations: 29