{"title":"Frequent Temporal Pattern Mining for Medical Data Based on Ranged Relations","authors":"S. Hirano, S. Tsumoto","doi":"10.1109/ICDMW.2017.87","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.87","url":null,"abstract":"This paper presents a temporal pattern mining method for medical data. It modifies the mining algorithms proposed by Batal et al. to incorporate with ranged relations. Experimental results demonstrate that the proposed method could generate frequent patterns with abstracted time ranges embedded in their temporal relations.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131866557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discrimination at the Edge of Noise: A Hilbert Space of Stationary Ergodic Processes","authors":"I. Chattopadhyay","doi":"10.1109/ICDMW.2017.129","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.129","url":null,"abstract":"Identifying meaningful signal buried in noise is a problem of interest arising in diverse scenarios of data-driven modeling. We present here a theoretical framework for exploiting intrinsic geometry in data that resists noise corruption, and might be identifiable under severe obfuscation. Our approach is based on uncovering a valid complete inner product on the space of ergodic stationary finite valued processes, providing the latter with the structure of a Hilbert space on the real field. This rigorous construction, based on non-standard generalizations of the notions of sum and scalar multiplication of finite dimensional probability vectors, allows us to meaningfully talk about \"angles\" between data streams and data sources, and, make precise the notion of orthogonal stochastic processes. In particular, the relative angles appear to be preserved, and identifiable, under severe noise, and will be developed in future as the underlying principle for robust classification, clustering and unsupervised featurization algorithms.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130956606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Akori: A Tool Based in Eye-Tracking Techniques for Analyzing Web User Behaviour on a Web Site","authors":"Felipe Vera, Víctor D. Cortés, Gabriel Iturra-Bocaz, J. D. Velásquez, P. Maldonado, Andrés Couve","doi":"10.1109/ICDMW.2017.90","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.90","url":null,"abstract":"As the use of the Internet grows every year, e-commerce's usage does as well. There is tough competition between companies to attract customers to use their services. The design of a website is crucial to retain a customer, and a retained client is more valuable over time, so understanding what attracts the attention of a potential client on a website is really important. This work proposes a novel web platform for understanding the most important features of a website for the user, based on biometric information provided by eye-trackers and electroencephalogram. Akori platform offers three services for understanding the most important part of a web page to the user. The first is the visual attention map, which highlights in different colors the most attractive zones for the user. The second service is a visual attention map too, but it uses a grey-scale gradient instead of colors. The third service uses the salience map to identify the Website Key Objects on a web page and highlight the objects that are predicted as such. Our platform is useful to the telecommunication and advertising industries, as interviews with company managers reveal. Thus, Akori promises to be a fundamental part of planning website design.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126714252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature Selection for the Classification of Longitudinal Human Ageing Data","authors":"Tossapol Pomsuwan, A. Freitas","doi":"10.1109/ICDMW.2017.102","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.102","url":null,"abstract":"We propose a new variant of the Correlation-based Feature Selection (CFS) method for coping with longitudinal data – where variables are repeatedly measured across different time points. The proposed CFS variant is evaluated on ten datasets created using data from the English Longitudinal Study of Ageing (ELSA), with different age-related diseases used as the class variables to be predicted. The results show that, overall, the proposed CFS variant leads to better predictive performance than the standard CFS and the baseline approach of no feature selection, when using Naïve Bayes and J48 decision tree induction as classification algorithms (although the difference in performance is very small in the results for J48). We also report the most relevant features selected by J48 across the datasets.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115606534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Convolutional Neural Network Approach for Mapping Arctic Vegetation Using Multi-Sensor Remote Sensing Fusion","authors":"Zachary L. Langford, J. Kumar, F. Hoffman","doi":"10.1109/ICDMW.2017.48","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.48","url":null,"abstract":"Accurate and high-resolution maps of vegetation are critical for projects seeking to understand the terrestrial ecosystem processes and land-atmosphere interactions in Arctic ecosystems, such as U.S. Department of Energy's Next Generation Ecosystem Experiment (NGEE) Arctic. However, most existing Arctic vegetation maps are at a coarse resolution and with a varying degree of detail and accuracy. Remote sensing-based approaches for mapping vegetation, while promising, are challenging in high latitude environments due to frequent cloud cover, polar darkness, and limited availability of high-resolution remote sensing datasets (e.g., ∼ 5 m). This study proposes a new remote sensing based multi-sensor data fusion approach for developing high-resolution maps of vegetation in the Seward Peninsula, Alaska. We focus detailed analysis and validation study around the Kougarok river, located in the central Seward Peninsula of Alaska. We seek to evaluate the integration of hyper-spectral, multi-spectral, radar, and terrain datasets using unsupervised and supervised classification techniques over a ∼343.72 km² area for generating vegetation classifications at a variety of resolutions (5 m and 12.5 m). We first applied a quantitative goodness-of-fit method, called Mapcurves, that shows the degree of spatial concordance between the public coarse resolution maps and k-means clustering values and relabels the k values based on the best overlap. We develop a convolutional neural network (CNN) approach for developing high resolution vegetation maps for our study region in Arctic. We compare two CNN approaches: (1) breaking up the images into small patches (e.g., 6 x 6) and predicting the vegetation class for each entire patch and (2) semantic segmentation, predicting the vegetation class for every pixel. We also perform accuracy assessments of the developed data products and evaluate varying CNN architectures. The fusion of hyperspectral and optical datasets performed the best, with accuracy values increased from 0.64 to 0.96-0.97 when using a training map produced by unsupervised clustering and Mapcurves labeling for both CNN models.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115721667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deriving Data-Driven Insights from Climate Extreme Indices for the Continental US","authors":"Xinbo Huang, D. Sathiaraj, Lei Wang, B. Keim","doi":"10.1109/ICDMW.2017.46","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.46","url":null,"abstract":"Daily climate data observations from more than 3000 climate measurement sites in the continental U.S. were mined and analyzed to derive insights and trends from climate extreme indices. Daily climate data observations were aggregated by climate divisions and analyzed to derive a new climate extremes indices data set (Threshold Exceedence Frequency, TEF). Each climate division was statistically assessed for the following elements: maximum and minimum temperature, precipitation and snowfall. The climate data time series were divided into 2 time intervals (1946-1980 and 1981-2015) and the occurrence frequencies of various climate extreme indices were statistically examined. Results revealed interesting insights such as an increasing frequency of occurrence of night-time temperatures in South-east US and decreasing frequency of winter temperature and snowfall extremes in northern US. The study also produced a new web-based visualization system to analyze the results of the study. The visualization system included interactive choropleth maps and charts to depict spatiotemporal changes in various climate thresholds over time.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128667201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining Active Learning and Semi-Supervised Learning by Using Selective Label Spreading","authors":"Xu Chen, Tao Wang","doi":"10.1109/ICDMW.2017.154","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.154","url":null,"abstract":"In the literature, a number of methods have been proposed for semi-supervised learning. Recently, graph-based methods of semi-supervised learning have become popular because of their capability of handling large amounts of unlabeled data. However, the existing graph based semi-supervised learning algorithms do not optimize the process of selecting better labeled data. We have developed a new selective semi-supervised learning algorithm, called selective label spreading (SLS) by integrating the active learning model into the label spreading framework. SLS optimizes the process of selecting better labeled data in order to improve classification performance. We applied SLS to the well-known hand-written digits recognition data set and demonstrated that SLS can improve the classification performance. The selective label spreading scheme requires a much smaller number of queries to achieve high accuracy compared with random query selection.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129597553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anomalous User Activity Detection in Enterprise Multi-source Logs","authors":"Qiaona Hu, Baoming Tang, Derek Lin","doi":"10.1109/ICDMW.2017.110","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.110","url":null,"abstract":"Security is one of the top concerns of any enterprise. Most security practitioners in enterprises rely on correlation rules to detect potential threats. While the rules are intuitive to design, each rule is independently defined per log source, unable to collectively address heterogeneity of data from a myriad of enterprise networking and security logs. Furthermore, correlation rules do not look for data events beyond a short time range. To complement the conventional correlation rules-based system, we propose a user activity anomaly detection method. The method first addresses data heterogeneity of multi-source logs by designing a meta data extraction step for event normalization. It then builds user-specific models to flag alerts for users whose currently observed event patterns are sufficiently different from their own patterns in the past.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130022792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Analysis Using Convolutional Neural Networks for Modeling 2D Fracture Propagation","authors":"Robyn L. Miller, B. Moore, H. Viswanathan, G. Srinivasan","doi":"10.1109/ICDMW.2017.137","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.137","url":null,"abstract":"The primary failure mechanism in brittle materials such as ceramics, granite and some metal alloys is through the presence of defects which result in crack formation and propagation under the application of load. We are interested in studying this process of crack propagation, interaction and coalescence, which degrades the strength of the specimen. Traditionally, engineering applications that study these materials employ finite element mesh based methods that require hundreds of hours of processing time on multi-core high performance clusters. We have developed a graph-based reduced order model that captures key geometric and topological features of the dynamic fracture propagation network. We report here the early stages of our study in which deep neural networks will be applied to dynamic directed weighted graphs capturing various metrics of crack-pair interaction strength with the aim of predicting crack lengths, dynamic crack growth/coalescence properties, distributions of these properties over the entire material through time, failure paths and time to failure. Our graph-based representations allow us to consider detailed topology in conjunction with metric geometry to gain insights into the dominant mechanisms that drive the physics in these systems.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125429868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Moderate User Data for News Recommendation","authors":"Dhruv Khattar, Vaibhav Kumar, Vasudeva Varma","doi":"10.1109/ICDMW.2017.104","DOIUrl":"https://doi.org/10.1109/ICDMW.2017.104","url":null,"abstract":"It is crucial for news aggregator websites that are new to the market to actively engage their existing users. A recommendation system would help to tackle such a problem. However, due to the lack of sufficient amount of data, most of the state-of-the-art methods perform poorly in terms of recommending relevant news items to the users. In this paper, we propose a novel approach for Item-based Collaborative filtering for recommending news items using Markov Decision Process (MDP). Due to the sequential nature of news reading, we choose MDP to model our recommendation system as it is based on a sequence optimization paradigm. Further, we also incorporate factors like article freshness and similarity into our system by extrinsically modelling them in terms of reward for the MDP. We compare it with various other state-of-the-art methods. On a moderately low amount of data we see that our MDP-based approach outperforms the other approaches. One of the reasons for this is that the baselines fail to identify the underlying patterns within the sequence in which the articles are read by the users. Hence, the baselines are not able to generalize well.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117197448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}