2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM): Latest Articles, Page 7

Distributed evolutionary approach to data clustering and modeling

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2014-12-01 DOI: 10.1109/CIDM.2014.7008660

Mustafa H. Hajeer, D. Dasgupta, Alexander Semenov, J. Veijalainen

Abstract: In this article we describe a framework (DEGA-Gen) for applying distributed genetic algorithms to the detection of communities in networks. The framework proposes efficient ways of encoding the network in the chromosomes, which greatly reduce memory use and computation and make the framework scalable. Different objective functions may be used to divide the network into communities. The framework is implemented on Hadoop, an open-source implementation of the MapReduce paradigm. We validate the framework by developing a community detection algorithm that uses modularity as the measure of the division. The result of the algorithm is the network partitioned into non-overlapping communities in such a way that network modularity is maximized. We apply the algorithm to well-known data sets, such as the Zachary karate club, the bottlenose dolphins network, the college football dataset, and the US political books dataset. The framework achieves comparable modularity while using much less memory for the network representation. Further, the framework is scalable and can deal with large graphs, as it was tested on a larger youtube.com dataset.

Citations: 7
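
The modularity measure that the abstract above uses as its objective can be sketched as follows. This is a minimal illustration, assuming the standard Newman-Girvan definition for an undirected, unweighted graph; the abstract does not say which variant the authors use, and the function name, the data layout, and the toy graph are all hypothetical rather than taken from DEGA-Gen.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a non-overlapping partition.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)   # edges with both endpoints inside community c
    degree = defaultdict(int)  # sum of node degrees inside community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Q = sum over communities of (intra-edge fraction - expected fraction)
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph (illustrative only): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}

print(modularity(edges, split))   # one community per triangle
print(modularity(edges, merged))  # everything in one community
```

A search procedure such as a genetic algorithm would call a function like this as its fitness measure and keep the partitions that score highest.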

Two key properties of dimensionality reduction methods

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2014-12-01 DOI: 10.1109/CIDM.2014.7008663

J. Lee, M. Verleysen

Abstract: Dimensionality reduction aims at providing faithful low-dimensional representations of high-dimensional data. Its general principle is to attempt to reproduce in a low-dimensional space the salient characteristics of the data, such as proximities. A large variety of methods exists in the literature, ranging from principal component analysis to deep neural networks with a bottleneck layer. In this cornucopia, it is rather difficult to find out why a few methods clearly outperform others. This paper identifies two important properties that enable some recent methods, like stochastic neighborhood embedding and its variants, to produce improved visualizations of high-dimensional data. The first property is a low sensitivity to the phenomenon of distance concentration. The second one is plasticity, that is, the capability to forget about some data characteristics to better reproduce the other ones. From a manifold learning perspective, breaking some proximities typically allows for a better unfolding of the data. Theoretical developments as well as experiments support our claim that both properties have a strong impact. In particular, we show that equipping classical methods with the missing properties significantly improves their results.

Citations: 22
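
The distance concentration phenomenon named in the abstract above can be illustrated with a small experiment. This sketch is not from the paper: it assumes points drawn uniformly from the unit hypercube and uses the relative contrast (max - min) / min of the distances from one query point, a common way to quantify concentration. The function name and parameters are hypothetical.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Relative contrast of Euclidean distances from one random query point
    to n_points random points, all uniform in the unit hypercube [0, 1]^dim.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    # A small value means nearest and farthest neighbours are almost
    # equally far away, i.e. distances have concentrated.
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    print(dim, relative_contrast(dim))
```

The contrast shrinks as the dimension grows, which is why methods that rely on raw distance differences lose discriminating power in high-dimensional spaces.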

Rule extraction using genetic programming for accurate sales forecasting

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2014-12-01 DOI: 10.1109/CIDM.2014.7008669

Rikard König, U. Johansson

Abstract: The purpose of this paper is to propose and evaluate a method for reducing the inherent tendency of genetic programming to overfit small and noisy data sets. In addition, the use of different optimization criteria for symbolic regression is demonstrated. The key idea is to reduce the risk of overfitting noise in the training data by introducing an intermediate predictive model in the process. More specifically, instead of directly evolving a genetic regression model from labeled training data, the first step is to generate a highly accurate ensemble model. Since ensembles are very robust, the resulting predictions will contain less noise than the original data set. In the second step, an interpretable model is evolved using the ensemble predictions, instead of the true labels, as the target variable. Experiments on 175 sales forecasting data sets from one of Sweden's largest wholesale companies show that the proposed technique obtained significantly better predictive performance than both straightforward use of genetic programming and the standard M5P technique. Naturally, the level of improvement depends critically on the performance of the intermediate ensemble.

Citations: 0
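
The two-step idea in the abstract above (fit an ensemble first, then train the interpretable model on the ensemble's predictions instead of the true labels) can be sketched as follows. This is only an illustration under stated assumptions: a bagged k-nearest-neighbour regressor stands in for the authors' ensemble, and an ordinary least-squares line stands in for the genetic programming model. All names, parameters, and the synthetic data are hypothetical.

```python
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def ensemble_predict(data, x, n_models=25, k=5, seed=0):
    """Step 1: bagging. Average k-NN predictions over bootstrap resamples.
    The fixed seed means every query x sees the same set of resamples."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]
        preds.append(knn_predict(sample, x, k))
    return sum(preds) / n_models


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def extract_rule(data):
    """Step 2: fit the simple model to ensemble predictions, not raw labels."""
    xs = [x for x, _ in data]
    smoothed = [ensemble_predict(data, x) for x in xs]
    return fit_line(xs, smoothed)


# Synthetic noisy data (illustrative only): y = 2x + 1 plus Gaussian noise.
noise = random.Random(1)
data = [(i / 59, 2 * (i / 59) + 1 + noise.gauss(0, 0.2)) for i in range(60)]
slope, intercept = extract_rule(data)
print(slope, intercept)
```

The design point is that the interpretable model never sees the noisy labels directly, so its quality is bounded by how well the intermediate ensemble denoises them.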

High dimensional exploration: A comparison of PCA, distance concentration, and classification performance in two fMRI datasets

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2014-12-01 DOI: 10.1109/CIDM.2014.7008662

J. Etzel, T. Braver

Citations: 0

Discovering cross-organizational business rules from the cloud

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2014-12-01 DOI: 10.1109/CIDM.2014.7008694

M. Bernardi, Marta Cimitile, F. Maggi

Abstract: Cloud computing is rapidly emerging as a new information technology that aims at providing improved efficiency in the private and public sectors, as well as promoting growth, competition, and business dynamism. Today, cloud computing also represents an opportunity from the perspective of business process analytics, since data recorded by process-centered cloud systems can be used to extract information about the underlying processes. Cloud computing architectures can be used in cross-organizational environments in which different organizations execute the same process in different variants and share information about how each variant is executed. If the process is characterized by low predictability and high variability, business rules become the best way to represent the process variants. The contribution of this paper consists in providing: (i) a cloud computing multi-tenancy architecture to support cross-organizational process executions; (ii) an approach for the systematic extraction and composition of distributed data into coherent event logs carrying process-related information for each variant; (iii) the integration of online process mining techniques for the runtime extraction of business rules from event logs representing the process variants running on the infrastructure. The proposed architecture has been implemented and applied to the execution of a real-life process for acknowledging an unborn child, performed in four different Dutch municipalities.

Citations: 9

Aggregating predictions vs. aggregating features for relational classification

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2014-12-01 DOI: 10.1109/CIDM.2014.7008657

O. Schulte, Kurt Routley

Abstract: Relational data classification is the problem of predicting the class label of a target entity given information about the features of the entity, of the related entities (or neighbors), and of the links. This paper compares two fundamental approaches to relational classification: aggregating the features of entities related to a target instance, or aggregating the probabilistic predictions based on the features of each entity related to the target instance. Our experiments compare different relational classifiers on sports, financial, and movie data. We examine the strengths and weaknesses of both score and feature aggregation, both conceptually and empirically. The performance of a single aggregate operator (e.g., average) can vary widely across datasets, for both feature and score aggregation. Aggregate features can be adapted to a dataset by learning with a set of aggregate features. Used adaptively, aggregate features outperformed learning with a single fixed score aggregation operator. Since score aggregation is usually applied with a single fixed operator, this finding raises the challenge of adapting score aggregation to specific datasets.

Citations: 11
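
The contrast between the two approaches in the abstract above can be made concrete with a toy example. This sketch is not the authors' setup: it assumes a single numeric feature per related entity, a hypothetical logistic base classifier with fixed weights, and the average as the aggregate operator in both cases.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def classify(feature, weight=1.0, bias=0.0):
    """Hypothetical base classifier: P(class = 1 | feature)."""
    return sigmoid(weight * feature + bias)


def feature_aggregation(neighbor_features):
    """Aggregate the features first (average), then classify once."""
    mean_feature = sum(neighbor_features) / len(neighbor_features)
    return classify(mean_feature)


def score_aggregation(neighbor_features):
    """Classify each related entity, then aggregate the scores (average)."""
    scores = [classify(f) for f in neighbor_features]
    return sum(scores) / len(scores)


# Feature values of three entities related to one target (illustrative).
neighbors = [0.0, 0.0, 6.0]
print(feature_aggregation(neighbors))  # sigmoid of the mean feature
print(score_aggregation(neighbors))    # mean of the per-entity sigmoids
```

Because the classifier is nonlinear, the two orders of operation give different probabilities for the same neighbors, which is one reason the choice between them matters.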

Patient level analytics using self-organising maps: A case study on Type-1 Diabetes self-care survey responses

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 1900-01-01 DOI: 10.1109/CIDM.2014.7008682

Santosh Tirunagari, N. Poh, K. Aliabadi, David Windridge, Deborah Cooke

Citations: 10

To what extend can we predict students' performance? A case study in colleges in South Africa

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 1900-01-01 DOI: 10.1109/CIDM.2014.7008698

N. Poh, I. Smythe

Abstract: Student performance depends upon factors other than intrinsic ability, such as environment, socio-economic status, personality, and familial context. Capturing these patterns of influence may enable an educator to ameliorate some of these factors, or governments to adjust social policy accordingly. In order to understand these factors, we have undertaken the exercise of predicting student performance, using a cohort of approximately 8,000 South African college students. They all took a number of tests in English and Maths. We show that it is possible to predict English comprehension test results from (1) other test results; (2) covariates about self-efficacy, socio-economic status, and specific learning difficulties (there are 100 survey questions altogether); (3) other test results plus covariates (a combination of (1) and (2)); and (4) a more advanced model similar to (3), except that the covariates are subject to dimensionality reduction (via PCA). Models 1-4 can predict student performance with a standard error of 13-15%. In comparison, a random guess would have a standard error of 17%. In short, it is possible to conditionally predict student performance based on self-efficacy, socio-economic background, learning difficulties, and related academic test results.

Citations: 9