{"title":"Empirical comparison of correlation measures and pruning levels in complex networks representing the global climate system","authors":"Alex Pelan, K. Steinhaeuser, N. Chawla, D. Pitts, A. Ganguly","doi":"10.1109/CIDM.2011.5949305","DOIUrl":"https://doi.org/10.1109/CIDM.2011.5949305","url":null,"abstract":"Climate change is an issue of growing economic, social, and political concern. Continued rise in the average temperatures of the Earth could lead to drastic climate change or an increased frequency of extreme events, which would negatively affect agriculture, population, and global health. One way of studying the dynamics of the Earth's changing climate is by attempting to identify regions that exhibit similar climatic behavior in terms of long-term variability. Climate networks have emerged as a strong analytics framework for both descriptive analysis and predictive modeling of the emergent phenomena. Previously, the networks were constructed using only one measure of similarity, namely the (linear) Pearson cross correlation, and were then clustered using a community detection algorithm. However, nonlinear dependencies are known to exist in climate, which begs the question whether more complex correlation measures are able to capture any such relationships. In this paper, we present a systematic study of different univariate measures of similarity and compare how each affects both the network structure as well as the predictive power of the clusters.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123809216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"User-guided discovery of declarative process models","authors":"F. Maggi, A. Mooij, Wil M.P. van der Aalst","doi":"10.1109/CIDM.2011.5949297","DOIUrl":"https://doi.org/10.1109/CIDM.2011.5949297","url":null,"abstract":"Process mining techniques can be used to effectively discover process models from logs with example behaviour. Cross-correlating a discovered model with information in the log can be used to improve the underlying process. However, existing process discovery techniques have two important drawbacks. The produced models tend to be large and complex, especially in flexible environments where process executions involve multiple alternatives. This “overload” of information is caused by the fact that traditional discovery techniques construct procedural models explicitly showing all possible behaviours. Moreover, existing techniques offer limited possibilities to guide the mining process towards specific properties of interest. These problems can be solved by discovering declarative models. Using a declarative model, the discovered process behaviour is described as a (compact) set of rules. Moreover, the discovery of such models can easily be guided in terms of rule templates. This paper uses DECLARE, a declarative language that provides more flexibility than conventional procedural notations such as BPMN, Petri nets, UML ADs, EPCs and BPEL. We present an approach to automatically discover DECLARE models. This has been implemented in the process mining tool ProM. Our approach and toolset have been applied to a case study provided by the company Thales in the domain of maritime safety and security.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122322019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local neighbourhood extension of SMOTE for mining imbalanced data","authors":"Tomasz Maciejewski, J. Stefanowski","doi":"10.1109/CIDM.2011.5949434","DOIUrl":"https://doi.org/10.1109/CIDM.2011.5949434","url":null,"abstract":"In this paper we discuss problems of inducing classifiers from imbalanced data and improving recognition of the minority class using focused resampling techniques. We are particularly interested in the SMOTE over-sampling method, which generates new synthetic examples from the minority class between the closest neighbours from this class. However, SMOTE could also overgeneralize the minority class region, as it does not consider the distribution of other neighbours from the majority classes. Therefore, we introduce a new generalization of SMOTE, called LN-SMOTE, which more precisely exploits information about the local neighbourhood of the considered examples. In the experiments we compare this method with the original SMOTE and its two most closely related generalizations, Borderline and Safe-Level SMOTE. All these pre-processing methods are applied together with either decision tree or Naive Bayes classifiers. The results show that the new LN-SMOTE method improves evaluation measures for the minority class.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127071678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data mining driven agents for predicting online auction's end price","authors":"Preetinder Kaur, M. Goyal, Jie Lu","doi":"10.1109/CIDM.2011.5949427","DOIUrl":"https://doi.org/10.1109/CIDM.2011.5949427","url":null,"abstract":"Auctions can be characterized by the distinct nature of their feature space. This feature space may include opening price, closing price, average bid rate, bid history, seller and buyer reputation, number of bids and many more. In this paper, a clustering-based method is used to forecast the end price of an online auction for an autonomous agent-based system. In the proposed model, the input auction space is partitioned into groups of similar auctions by the k-means clustering algorithm. The recurrent problem of finding the value of k in the k-means algorithm is solved by employing the elbow method using one-way analysis of variance (ANOVA). Then k regression models are employed to estimate the forecasted price of an online auction. Based on the transformed data after clustering and the characteristics of the current auction, a bid selector nominates the regression model for the current auction whose price is to be forecasted. Our results show improvements in end-price prediction for each cluster, which supports the proposed clustering-based model for bid prediction in the online auction environment.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121122185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational intelligence methods for processing misaligned, unevenly sampled time series containing missing data","authors":"F. Cismondi, André S. Fialho, S. Vieira, J. Sousa, S. Reti, M. Howell, S. Finkelstein","doi":"10.1109/CIDM.2011.5949447","DOIUrl":"https://doi.org/10.1109/CIDM.2011.5949447","url":null,"abstract":"One consequence of the increasing amount of data stored during acquisition processes is that sampled time series are more prone to be collected in a misaligned, uneven fashion and/or be partly lost or unavailable (missing data). Due to their severe impact on data mining techniques, this work proposes methods to (a) align misaligned unevenly sampled data, (b) differentiate absent values related to low sampling frequencies from those resulting from missingness mechanisms, and (c) classify recoverable and non-recoverable segments of missing data by using statistical and fuzzy modeling approaches. These methods were evaluated against randomly simulated test datasets containing different amounts of missing data. Results show that: (1) using the most frequently sampled variable as a template, combined with cubic interpolation, allowed misaligned uneven data to be unshifted without significant errors; (2) the differentiation of absent values due to low sampling frequencies from those truly missing can be successfully performed using 95% confidence intervals relative to the mean sampling time; (3) fuzzy modeling returned better classification results for recoverable segments, while the statistical approach performed better in classifying non-recoverable segments. All three methods proposed in this work decreased their performance when the amount of missing data was increased in the test datasets.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116033706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using gaming strategies for attacker and defender in recommender systems","authors":"J. Zhan, Lijo Thomas, Venkata Pasumarthi","doi":"10.1109/CIDM.2011.5949304","DOIUrl":"https://doi.org/10.1109/CIDM.2011.5949304","url":null,"abstract":"Ratings are prominent factors in deciding the fate of any product in the present Internet market, and many people follow the ratings in a genuine sense. Unfortunately, Sybil attacks can affect the credibility of genuine products. Influence limiter algorithms in recommender systems have been used extensively to overcome Sybil attacks, but the effort has not reached the safe mark. This paper highlights an approach to generating gaming strategies for the attacker and defender in a recommender system. In a given recommender system environment, attackers and defenders play the most crucial part in a gaming strategy. These game-theoretic strategies represent a sequence of decision rules that an attacker or defender may use to achieve their desired goal. The valid approaches to avoid Sybil attacks from the attackers are efficiently defended by the defenders. In our approach, we define attack graphs, use cases, and misuse cases in our gaming framework to analyze the vulnerabilities and security measures incorporated in a recommender system.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114601792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating materialized views using ant based approaches and information retrieval technologies","authors":"H. Drias","doi":"10.1109/CIDM.2011.5949302","DOIUrl":"https://doi.org/10.1109/CIDM.2011.5949302","url":null,"abstract":"In this paper, a hybrid system combining ant-based approaches and tabu search has been designed for the generation of materialized views in a relational data warehouse environment with the purpose of improving query performance. Two ACO algorithms were adapted for the view generation problem to take up the scalability challenge, and information retrieval technologies are used in the search process. In addition, our approach dynamically manages the storage to include the best views determined by the bio-inspired approach. Experiments have been conducted to validate the designed algorithms, and interesting performance is observed when comparing them with previous related works.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114665143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing precision in Process Conformance: Stability, confidence and severity","authors":"J. Munoz-Gama, J. Carmona","doi":"10.1109/CIDM.2011.5949451","DOIUrl":"https://doi.org/10.1109/CIDM.2011.5949451","url":null,"abstract":"Process Conformance is becoming a crucial area due to the changing nature of processes within an Information System. By confronting specifications against system executions (the main problem tackled in process conformance), both system bugs and obsolete/incorrect specifications can be revealed. This paper presents novel techniques to enrich the process conformance analysis for the precision dimension. The new features of the metric proposed in this paper provide a complete view of the precision between a log and a model. The techniques have been implemented as a plug-in in an open-source Process Mining platform, and experimental results witnessing both the theory and the goals of this work are presented.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130591829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sectors on sectors (SonS): A new hierarchical clustering visualization tool","authors":"J. Martínez-Martínez, Pablo Escandell-Montero, E. Soria-Olivas, J. Martín-Guerrero, M. Martínez-Sober, J. Gómez-Sanchís","doi":"10.1109/CIDM.2011.5949448","DOIUrl":"https://doi.org/10.1109/CIDM.2011.5949448","url":null,"abstract":"Clustering techniques have been widely applied to extract information from high-dimensional data structures in the last few years. Graphs are especially relevant for clustering, but many graphs associated with hierarchical clustering do not give any information about the values of the centroids' attributes and the relationships among them. In this paper, we propose a new visualization approach for hierarchical cluster analysis in which the above-mentioned information is available. The method is based on pie charts. The pie charts are divided into several pie segments or sectors corresponding to each cluster. The radius of each pie segment is proportional to the number of patterns included in each cluster. By means of new divisions in each pie sector and a color bar with as many labels as attributes, we can extract all the existing relationships among centroids' attributes at any hierarchy level. The methodology is tested in one synthetic data set and one real data set. Achieved results show the suitability and usefulness of the proposed approach.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131020493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A banner recommendation system based on web navigation history","authors":"G. Giuffrida, D. Recupero, Giuseppe Tribulato, C. Zarba","doi":"10.1109/CIDM.2011.5949437","DOIUrl":"https://doi.org/10.1109/CIDM.2011.5949437","url":null,"abstract":"We address the problem of selecting a banner advertisement, based on the profile of the online user. The profile consists of the set of webpages opened by the online user, optionally clustered.","PeriodicalId":211565,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128970965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}