2014 IEEE International Conference on Data Mining Workshop: Latest Articles

Real-time Dynamic Visualization Techniques for Massive Geospatial Data
2014 IEEE International Conference on Data Mining Workshop Pub Date : 2014-12-01 DOI: 10.1109/ICDMW.2014.174
Zhou Ya’nan, H. Xiaodong, Li Jun
Abstract: Visualization is a research hotspot in Digital Earth. Existing Digital Earth platforms, such as Google Earth and World Wind, can only display prebuilt tiles of geospatial data statically, because they cannot handle massive geospatial data (especially remotely sensed imagery) or adapt to diverse visualization configurations in real time. In this paper, we propose a complete technical solution for real-time dynamic visualization of massive geospatial data, focusing on the following aspects: a) on the rendering nodes, we build a pyramid model, the 'Fish File', to store a single remotely sensed image and improve image reading efficiency; b) on the visualization server, we adopt a 'distributed storage, centralized management' strategy to organize and manage massive geospatial data, and introduce a 'data-performance-consistent' scheduling scheme to speed up server response; c) on the clients, a slicing cache storage and cache update mechanism is proposed for rapid switching of map layers and geospatial analysis. Following this solution, we constructed a visualization platform. We compare the performance of our platform with that of the state of the art in node rendering, server response, and platform efficiency, and show screenshots of real-time dynamic visualization.
Citations: 0
Clustering Ensemble and Application in HST Dataset
2014 IEEE International Conference on Data Mining Workshop Pub Date : 2014-12-01 DOI: 10.1109/ICDMW.2014.143
Wenchao Xiao, Yan Yang, Hongjun Wang, Yingge Xu
Abstract: Clustering ensemble is an important part of ensemble learning. It aims to integrate multiple clustering results, obtained from different clustering algorithms or from the same algorithm with different initial parameters, for the same dataset. CHAMELEON is a hierarchical clustering algorithm that can discover natural clusters of different shapes and sizes because its merging decisions adapt dynamically to the characteristics of the clusters being modeled. Inspired by CHAMELEON, the paper proposes a novel clustering ensemble model that includes a semi-supervised method and discusses its application to fault diagnosis of high-speed train (HST) running gear. The model has three phases. Phase 1 constructs a sparse graph from a similarity matrix that aggregates multiple clustering results. Phase 2 partitions the sparse graph (vertex = object, edge weight = similarity) into a large number of relatively small sub-clusters. Phase 3 obtains the final clustering partition by repeatedly merging these sub-clusters. The experimental results demonstrate that our method outperforms some state-of-the-art ensemble algorithms in accuracy and stability and recognizes fault patterns of HST running gear effectively.
Citations: 2
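Phase 1 of the abstract above (a similarity matrix that aggregates multiple clustering results, then a sparse graph) can be illustrated with a co-association matrix. This is only a minimal sketch under stated assumptions: the abstract does not say how the similarity is computed or how the graph is sparsified, so the pairwise agreement fraction, the function names `co_association` and `sparsify`, and the threshold rule are all hypothetical choices, not the authors' method.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of base clusterings that put each pair of objects in the same cluster.

    Assumption: pairwise agreement is used as the similarity; the paper may differ.
    """
    n = len(labelings[0])
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0
    for i, j in combinations(range(n), 2):
        agree = sum(1 for labels in labelings if labels[i] == labels[j])
        sim[i][j] = sim[j][i] = agree / len(labelings)
    return sim


def sparsify(sim, threshold):
    """Keep only edges whose weight reaches the threshold (hypothetical sparsification rule)."""
    n = len(sim)
    return {
        (i, j): sim[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if sim[i][j] >= threshold
    }


# Three base clusterings of four objects; cluster ids need not match across clusterings.
labelings = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
sim = co_association(labelings)
edges = sparsify(sim, 0.5)  # vertex = object, edge weight = similarity
```

Because only label equality within each clustering is compared, the sketch is insensitive to how the individual clusterings number their clusters.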
Identifying Team Style in Soccer Using Formations Learned from Spatiotemporal Tracking Data
2014 IEEE International Conference on Data Mining Workshop Pub Date : 2014-12-01 DOI: 10.1109/ICDMW.2014.167
Alina Bialkowski, P. Lucey, Peter Carr, Yisong Yue, S. Sridharan, I. Matthews
Abstract: Experts with a trained eye can often identify a team from its unique style of play, as reflected in its movement, passing and interactions. In this paper, we present a method that can accurately determine the identity of a team from spatiotemporal player tracking data. We do this by using a formation descriptor found by minimizing the entropy of role-specific occupancy maps. We show that our approach is significantly better at identifying different teams than standard measures (e.g., shots, passes). We demonstrate the utility of our approach using an entire season of Prozone player tracking data from a top-tier professional soccer league.
Citations: 77
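The quantity that the abstract above minimizes, the entropy of an occupancy map, can be sketched as follows. The sketch covers only the entropy measure on a coarse grid, not the role-assignment optimization itself; the grid size, the cell-based position encoding, and the function names `occupancy_map` and `entropy` are assumptions made for illustration.

```python
import math


def occupancy_map(positions, rows, cols):
    """Normalized 2D histogram of the (row, col) cells a role was observed in."""
    counts = [[0] * cols for _ in range(rows)]
    for r, c in positions:
        counts[r][c] += 1
    total = len(positions)
    return [[v / total for v in row] for row in counts]


def entropy(occ):
    """Shannon entropy in bits; empty cells contribute nothing."""
    return -sum(p * math.log2(p) for row in occ for p in row if p > 0)


# A role that stays mostly in one cell versus one spread evenly over the grid.
compact = occupancy_map([(0, 0), (0, 0), (0, 0), (0, 1)], 2, 2)
spread = occupancy_map([(0, 0), (0, 1), (1, 0), (1, 1)], 2, 2)
```

A concentrated map yields lower entropy than a diffuse one, which is why an assignment of players to roles that keeps each role's map compact would score better under this measure.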
Joint Propagation and Refinement for Mining Opinion Words and Targets
2014 IEEE International Conference on Data Mining Workshop Pub Date : 2014-12-01 DOI: 10.1109/ICDMW.2014.66
Qiyun Zhao, Hao Wang, Pin Lv
Abstract: This paper proposes a novel Joint Propagation and Refinement (JPR) method to extract opinion words and targets. We adopt a growing heuristic method to extract new opinion words and targets in two parallel processes: propagation and refinement. In the propagation process, we generate candidate sets of opinion words and targets and construct a Sentiment Graph Model (SGM) to evaluate the relations between them. We employ statistical word co-occurrence and dependency patterns to identify these relations. In addition, we discover new patterns from the newly extracted opinion words and targets, which capture opinion relations more precisely in informal texts. In the refinement process, we prune false results and update the model iteratively. We employ Automatic Rule Refinement (ARR) to refine the extraction rules, that is, to refine the rules that extract false results. With false-result pruning and ARR, we can efficiently alleviate the error propagation problem of traditional bootstrapping-based methods. Experimental results on both English and Chinese datasets demonstrate the effectiveness of our method.
Citations: 5
Discretizing Numerical Attributes in Decision Tree for Big Data Analysis
2014 IEEE International Conference on Data Mining Workshop Pub Date : 2014-12-01 DOI: 10.1109/ICDMW.2014.103
Yiqun Zhang, Yiu-ming Cheung
Abstract: Decision tree induction is a typical machine learning approach that has been extensively applied to data mining and knowledge discovery. For numerical and mixed data, discretization is an essential pre-processing step of decision tree learning. However, most existing discretization approaches are not efficient enough in practice when coping with big data. Accordingly, we propose a new discretization method based on windowing and hierarchical clustering to improve the performance of conventional decision trees for big data analysis. The proposed method not only discretizes numerical attributes faster with competitive classification accuracy, but also reduces the size of the decision tree. Experiments show the efficacy of the proposed method on real data sets.
Citations: 14
GlucoGuide: An Intelligent Type-2 Diabetes Solution Using Data Mining and Mobile Computing
2014 IEEE International Conference on Data Mining Workshop Pub Date : 2014-12-01 DOI: 10.1109/ICDMW.2014.177
Yan Luo, C. Ling, Jody Schuurman, R. Petrella
Abstract: Type-2 Diabetes (T2D) is a dreadful disease that affects hundreds of millions of people worldwide and is linked to, and worsened by, unhealthy lifestyles. However, managing T2D effectively through lifestyle change remains highly challenging for both T2D patients and doctors. In this paper, we propose, build, and evaluate a personalized diabetes recommendation system for T2D patients, called GlucoGuide. GlucoGuide conveniently aggregates a variety of lifestyle data via medical sensors and mobile devices, mines the data with a novel data-mining framework, and outputs personalized and timely recommendations to help patients control their blood glucose level. To evaluate its clinical efficiency, we conducted a three-month clinical trial on human subjects. Because of the high cost and complexity of human trials, a small but representative subject group was involved. Two standard laboratory blood tests for diabetes were used before and after the trial. The results are quite remarkable. Generally speaking, GlucoGuide amounted to turning an early diabetic patient into a pre-diabetic one, and a pre-diabetic patient into a non-diabetic one, in only three months.
Citations: 11
Mining of Training Samples for Multiple Learning Machines in Computer-Aided Detection of Lesions in CT Images
2014 IEEE International Conference on Data Mining Workshop Pub Date : 2014-12-01 DOI: 10.1109/ICDMW.2014.111
Kenji Suzuki
Abstract: Optimal selection of training samples is very difficult when multiple learning machines are used in classification. We investigated an approach to mining training samples for multiple learning machines in computer-aided detection of lesions. Our approach starts from a "weakness" analysis of a seed machine-learning (ML) model trained for a given task. The weakness is analyzed in the receiver-operating-characteristic (ROC) space in classification. The most to least "difficult" samples for the seed model are "mined" by dividing the samples into N groups by their ROC scores. N ML models are then trained with the N mined groups of training samples in an ensemble manner. We tested our approach on classification between 25 lesions and 489 non-lesions. Our ML ensemble trained with the mined samples achieved higher performance than an ML ensemble trained with manually selected training samples.
Citations: 1
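The grouping step in the abstract above (dividing samples into N groups by the seed model's scores) might look roughly like the sketch below. It assumes each sample already has a single scalar score from the seed model and that the groups are near-equal in size; neither detail is given in the abstract, and the name `mine_groups` and the example scores are hypothetical.

```python
def mine_groups(scores, n_groups):
    """Split sample indices into n_groups contiguous groups after sorting by seed-model score.

    Assumption: near-equal group sizes; the paper may choose the boundaries differently.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    size, extra = divmod(len(order), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        # The first `extra` groups take one additional sample each.
        end = start + size + (1 if g < extra else 0)
        groups.append(order[start:end])
        start = end
    return groups


# Made-up seed-model scores for seven samples, split into three groups.
seed_scores = [0.91, 0.15, 0.55, 0.72, 0.08, 0.40, 0.63]
groups = mine_groups(seed_scores, 3)
```

Each resulting group of indices would then serve as the training set for one of the N models; which end of the score range counts as "difficult" depends on the sample's true class and is left out of this sketch.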
Clinical Decision Making: A Framework for Predicting Rx Response
2014 IEEE International Conference on Data Mining Workshop Pub Date : 2014-12-01 DOI: 10.1109/ICDMW.2014.154
Aarti Sathyanarayana, Jyotishman Pathak, R. McCoy, S. Romero-Brufau, Maryam Panaziahar, J. Srivastava
Abstract: Over seventy percent of Americans take at least one prescription medication, and twenty percent take more than five. These numbers emphasize how important it is for clinicians to understand the effects of medications and whether they are effective. In this paper we propose a data-driven framework to predict the effectiveness of a medication for a patient, specifically in the case of diabetes. Our dataset contains claims data from 1.5 million patients. A heuristic was established to evaluate the "effectiveness" of Metformin using a set of three criteria. Decision trees and random forests were used to build prediction models on the training data and to select features. The model correctly predicted whether a patient responded well to the medication with approximately 80% accuracy and an F1-measure of approximately 90%.
Citations: 5
Web Service QoS Prediction Approach in Mobile Internet Environments
2014 IEEE International Conference on Data Mining Workshop Pub Date : 2014-12-01 DOI: 10.1109/ICDMW.2014.27
Lubao Wang, Qibo Sun, Shangguang Wang, You Ma, Jinliang Xu, Jinglin Li
Abstract: Many existing Web service QoS prediction approaches are very accurate in Internet environments; however, they cannot provide accurate predictions in Mobile Internet environments because the QoS values of Web services are highly volatile there. In this paper, we propose an accurate Web service QoS prediction approach that weakens the volatility of QoS data from Web services in Mobile Internet environments. The approach consists of three processes: QoS preprocessing, user similarity computing, and QoS predicting. We implemented the proposed approach and evaluated it experimentally on real-world and synthetic datasets. The results show that our approach outperforms other approaches in Mobile Internet environments.
Citations: 7
Cost-Effective User Monitoring for Popularity Prediction of Online User-Generated Content
2014 IEEE International Conference on Data Mining Workshop Pub Date : 2014-12-01 DOI: 10.1109/ICDMW.2014.72
Mengmeng Yang, Kai Chen, Zhongchen Miao, Xiaokang Yang
Abstract: In this paper, we study popularity prediction for online user-generated content, where high-quality predictions give us much more flexibility and preparation time in deploying limited resources (such as advertising budget or monitoring capacity) to more popular content. However, the high cost of retrieving the data used in prediction is a big challenge because of the large number of users and content items involved. We propose that highly popular user-generated content can be predicted by concentrating on fewer but informative users, since content generated by those users tends to become popular while content generated by the remaining users does not. We develop a cost-effective popularity prediction framework for online prediction. It contains three modules: (a) online data retrieval, (b) informative user selection and (c) popularity prediction. A hybrid user selection algorithm and several popularity prediction algorithms and improvements are presented, and their performance is evaluated and compared using (a) data generated by the selected users and (b) data generated by all users, retrieved from the Sina Weibo microblogging service. The best prediction algorithm reaches 78% accuracy 24 hours after publishing time when the level width Nl equals 500. The best combination of prediction and selection algorithms performs only about 7% worse on a dataset of 2000 users than on the dataset of all users (about 4.46 million).
Citations: 10