{"title":"Real-time Dynamic Visualization Techniques for Massive Geospatial Data","authors":"Zhou Ya’nan, H. Xiaodong, Li Jun","doi":"10.1109/ICDMW.2014.174","DOIUrl":"https://doi.org/10.1109/ICDMW.2014.174","url":null,"abstract":"Visualization is a research hotspot in Digital Earth. Available Digital Earth platforms, such as Google Earth and World Wind could only display the prebuilt tiles of geospatial data statically, due to inability to deal with massive geospatial data (especially the remotely sensed imagery) and to adapt for diversified configurations of visualization in real time. In this paper, we propose a complete technical solution for real-time dynamic visualization of massive geospatial data, and mainly focus on the following aspects: a) On the rendering nodes, we build a pyramid model-'Fish File' to store a single remotely sensed imagery, to improve imagery reading efficiency, b) On the visualization server, we adopt a 'distributed storage, centralized management' strategy to organize and manage massive geospatial data, and further introduce a 'data-performance-consistent' schedule scheme to speed the response of servers, c) On the clients, the slicing cache storage and cache update mechanism are proposed for rapid switchover of map layers and geospatial analysis. Following the solution, we constructed a visualization platform. And we compare the performance of our platform with that of the state-of-the-art in rendering of nodes, in response of servers, and in efficiency of platforms, and show screenshots of real-time dynamic visualization.","PeriodicalId":289269,"journal":{"name":"2014 IEEE International Conference on Data Mining Workshop","volume":"7 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120848720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering Ensemble and Application in HST Dataset","authors":"Wenchao Xiao, Yan Yang, Hongjun Wang, Yingge Xu","doi":"10.1109/ICDMW.2014.143","DOIUrl":"https://doi.org/10.1109/ICDMW.2014.143","url":null,"abstract":"Clustering ensemble is an important part of ensemble learning. It aims to study and integrate multiple clustering results from different clustering algorithms or same algorithm with different initial parameters for the same dataset. CHAMELEON is a hierarchical clustering algorithm which can discover natural clusters of different shapes and sizes as the result of its merging decision dynamically adapts to the different clustering model characterized. Inspired by the idea of CHAMELEON, the paper proposes a novel clustering ensemble model including semi-supervised method and discusses its application in fault diagnosis of high speed train (HST) running gear. The model is divided into three phases. Phase 1 is constructing a sparse graph through similarity matrix which aggregates multiple clustering results. Phase 2 is partitioning the sparse graph (vertex = object, edge weight = similarity) into a large number of relatively small sub-clusters. Phase 3 is obtaining the final clustering partition by merging these sub-clusters repeatedly. The experimental results demonstrate that our method out-performs some of state-of-the-art ensemble algorithms regarding the accuracy and stability and recognizes fault patterns of HST running gear effectively.","PeriodicalId":289269,"journal":{"name":"2014 IEEE International Conference on Data Mining Workshop","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132403806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying Team Style in Soccer Using Formations Learned from Spatiotemporal Tracking Data","authors":"Alina Bialkowski, P. Lucey, Peter Carr, Yisong Yue, S. Sridharan, I. Matthews","doi":"10.1109/ICDMW.2014.167","DOIUrl":"https://doi.org/10.1109/ICDMW.2014.167","url":null,"abstract":"To the trained-eye, experts can often identify a team based on their unique style of play due to their movement, passing and interactions. In this paper, we present a method which can accurately determine the identity of a team from spatiotemporal player tracking data. We do this by utilizing a formation descriptor which is found by minimizing the entropy of role-specific occupancy maps. We show how our approach is significantly better at identifying different teams compared to standard measures (i.e., Shots, passes etc.). We demonstrate the utility of our approach using an entire season of Prozone player tracking data from a top-tier professional soccer league.","PeriodicalId":289269,"journal":{"name":"2014 IEEE International Conference on Data Mining Workshop","volume":"53 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133686500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Propagation and Refinement for Mining Opinion Words and Targets","authors":"Qiyun Zhao, Hao Wang, Pin Lv","doi":"10.1109/ICDMW.2014.66","DOIUrl":"https://doi.org/10.1109/ICDMW.2014.66","url":null,"abstract":"This paper proposes a novel Joint Propagation and Refinement (JPR) method to extract opinion words and targets. We adopt a growing heuristic method to extract new opinion words and targets in two parallel processes: propagation and refinement. In the propagation process, we generate the candidate sets of opinion words and targets and construct Sentiment Graph Model (SGM) to evaluate the relations between opinion words and targets. We employ statistical word co-occurrence and dependency patterns to identify these relations. In addition, we discover new patterns by the newly extracted opinion words and targets, which can capture opinion relations more precisely in the case of informal texts. In the refinement process, we prune false results and update model iteratively. We employ Automatic Rule Refinement (ARR) to refine the rules of extraction, which means to refine the rule to extract false results. By using false results pruning and ARR process, we can efficiently alleviate the error propagation problem in traditional bootstrapping based methods. Experimental results on both English and Chinese datasets demonstrate the effectiveness of our method.","PeriodicalId":289269,"journal":{"name":"2014 IEEE International Conference on Data Mining Workshop","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122154012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discretizing Numerical Attributes in Decision Tree for Big Data Analysis","authors":"Yiqun Zhang, Yiu-ming Cheung","doi":"10.1109/ICDMW.2014.103","DOIUrl":"https://doi.org/10.1109/ICDMW.2014.103","url":null,"abstract":"The decision tree induction learning is a typical machine learning approach which has been extensively applied for data mining and knowledge discovery. For numerical data and mixed data, discretization is an essential pre-processing step of decision tree learning. However, when coping with big data, most of the existing discretization approaches will not be quite efficient from the practical viewpoint. Accordingly, we propose a new discretization method based on windowing and hierarchical clustering to improve the performance of conventional decision tree for big data analysis. The proposed method not only provides a faster process of discretizing numerical attributes with the competent classification accuracy, but also reduces the size of the decision tree. Experiments show the efficacy of the proposed method on the real data sets.","PeriodicalId":289269,"journal":{"name":"2014 IEEE International Conference on Data Mining Workshop","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121856879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GlucoGuide: An Intelligent Type-2 Diabetes Solution Using Data Mining and Mobile Computing","authors":"Yan Luo, C. Ling, Jody Schuurman, R. Petrella","doi":"10.1109/ICDMW.2014.177","DOIUrl":"https://doi.org/10.1109/ICDMW.2014.177","url":null,"abstract":"Type-2 Diabetes (T2D) is a dreadful disease affecting hundreds of millions of people worldwide, and is linked and worsen by unhealthy lifestyles. However, managing T2D effectively with lifestyle change remains highly challenging for both T2D patients and doctors. In this paper, we proposed, built, and evaluated a personalized diabetes recommendation system, called Gluco Guide for T2D patients. Gluco Guide conveniently aggregates a variety of lifestyle data via medical sensors and mobile devices, mines the data with a novel data-mining framework, and outputs personalized and timely recommendations to patients aimed to control their blood glucose level. To evaluate its clinical efficiency, we conducted a three-month clinical trial on human subjects. Due to the high cost and complexity of trials on human, a small but representative subject group was involved. Two standard laboratory blood tests for diabetes were used before and after the trial. The results are quite remarkable. Generally speaking, Gluco Guide amounted to turning an early diabetic patient to be pre-diabetic, and pre-diabetic to non-diabetic, in only 3-months.","PeriodicalId":289269,"journal":{"name":"2014 IEEE International Conference on Data Mining Workshop","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122355207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining of Training Samples for Multiple Learning Machines in Computer-Aided Detection of Lesions in CT Images","authors":"Kenji Suzuki","doi":"10.1109/ICDMW.2014.111","DOIUrl":"https://doi.org/10.1109/ICDMW.2014.111","url":null,"abstract":"Optimal selection of training samples is very difficult when multiple learning machines are used in classification. We investigated an approach to mining of training samples for multiple learning machines in computer-aided detection of lesions. Our approach starts from \"weakness\" analysis of a seed machine-learning (ML) model trained for a given task. The weakness is analyzed in the receiver-operating-characteristic (ROC) space in classification. The most to least \"difficult\" samples for the seed model are \"mined\" by dividing samples into N groups by the ROC scores. N ML models are trained with the mined N groups of training samples in an ensemble manner. We tested our approach in classification between 25 lesions and 489 non-lesions. Our ML ensemble trained with the mined samples achieved a performance higher than did an ML ensemble with manually selected training samples.","PeriodicalId":289269,"journal":{"name":"2014 IEEE International Conference on Data Mining Workshop","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125504754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clinical Decision Making: A Framework for Predicting Rx Response","authors":"Aarti Sathyanarayana, Jyotishman Pathak, R. McCoy, S. Romero-Brufau, Maryam Panaziahar, J. Srivastava","doi":"10.1109/ICDMW.2014.154","DOIUrl":"https://doi.org/10.1109/ICDMW.2014.154","url":null,"abstract":"Over seventy percent of Americans take at least one form of prescription medication, with twenty percent taking more than five. The numbers emphasize how important it is for clinicians to understand the effects of the medication and whether these medications are effective. In this paper we propose a data driven framework to predict the effectiveness of medication on a patient, specifically in the case of diabetes. Our dataset contains claims data from 1.5 million patients. A heuristic was established to evaluate the \"effectiveness\" of Metformin using a set of three criteria. Decision trees and random forests were used to create prediction models on the training data and select features. The model was able to correctly predict whether a patient responded well to the medication with approximately 80% accuracy and an F1-measure of approximately 90%.","PeriodicalId":289269,"journal":{"name":"2014 IEEE International Conference on Data Mining Workshop","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126449795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Web Service QoS Prediction Approach in Mobile Internet Environments","authors":"Lubao Wang, Qibo Sun, Shangguang Wang, You Ma, Jinliang Xu, Jinglin Li","doi":"10.1109/ICDMW.2014.27","DOIUrl":"https://doi.org/10.1109/ICDMW.2014.27","url":null,"abstract":"Existing many Web service QoS prediction approaches are very accurate in Internet environments, however they cannot provide accurate prediction values in Mobile Internet environments since QoS values of Web services have great volatility. In this paper, we propose an accurate Web service QoS prediction approach by weakening the volatility of QoS data from Web services in Mobile Internet environments. This approach contains three process, i.e., QoS preprocessing, user similarity computing, and QoS predicting. We have implemented our proposed approach with experiment based on real world and synthetic datasets. The results show that our approach outperforms other approaches in Mobile Internet environments.","PeriodicalId":289269,"journal":{"name":"2014 IEEE International Conference on Data Mining Workshop","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117322716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cost-Effective User Monitoring for Popularity Prediction of Online User-Generated Content","authors":"Mengmeng Yang, Kai Chen, Zhongchen Miao, Xiaokang Yang","doi":"10.1109/ICDMW.2014.72","DOIUrl":"https://doi.org/10.1109/ICDMW.2014.72","url":null,"abstract":"In this paper, we study on the popularity prediction of online user-generated contents, where high quality predictions give us much more flexibility and preparing time in deploying limited resources (such as advertising budget, monitoring capacity) into more popular contents. However the high retrieval cost of data used in prediction is a big challenge due to the large amount of users and contents involved. We propose a notion that higher popularity user-generated contents can be predicted by concentrating on fewer but informative users, as we notice the fact that contents generated by those users tend to become popular while that which are generated by the rest users do not. We develop a cost-effective popularity prediction framework to fulfil online prediction. It contains 3 modules: (a) online data retrieving, (b) informative users selection and (c) popularity prediction. A hybrid user selection algorithm and several popularity prediction algorithms/improvements are presented, and their performance are evaluated and compared using (a) the selected users' generated data and (b) all users' generated data, retrieved from Sina Weibo Micro blogger. The best prediction algorithm reaches a 78% accuracy at the time of 24 hours after publishing time when level width Nl equals 500. And the best combination of prediction and selection algorithms performs only about 7% worse on dataset of 2000 users than on dataset of all users (about 4.46 million).","PeriodicalId":289269,"journal":{"name":"2014 IEEE International Conference on Data Mining Workshop","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124410639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}