{"title":"Selecting the Right Peer Schools for AACSB Accreditation - A Data Mining Application","authors":"M. Kiang, D. Fisher, Steven A. Fisher, R. Chi","doi":"10.1109/CIDM.2007.368849","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368849","url":null,"abstract":"For a business school, the selection of its peer schools is an important component of its International Association for Management Education (AACSB) (re)accreditation process. A school typically compares itself with other institutions having similar structural and identity-based attributes. The identification of peer schools is critical and can have a significant impact on a business school's accreditation efforts. For many schools the selection of comparable peer schools is a judgmental process. This study offers an alternative means for selection; a quantitative technique called Kohonen's self-organizing map (SOM) network for clustering. SOM as a software agent uses visualization to present information to the school in choosing its peer schools.","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"17 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133847248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cluster Detection with the PYRAMID Algorithm","authors":"Samir Tout, William Sverdlik, Junping Sun","doi":"10.1109/CIDM.2007.368864","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368864","url":null,"abstract":"As databases continue to grow in size, efficient and effective clustering algorithms play a paramount role in data mining applications. Practical clustering faces several challenges including: identifying clusters of arbitrary shapes, sensitivity to the order of input, dynamic determination of the number of clusters, outlier handling, processing speed of massive data sets, handling higher dimensions, and dependence on user-supplied parameters. Many studies have addressed one or more of these challenges. PYRAMID, or parallel hybrid clustering using genetic programming and multi-objective fitness with density, is an algorithm that we introduced in previous research, which addresses some of the above challenges. While leaving significant challenges for future work, such as handling higher dimensions, PYRAMID employs a combination of data parallelism, a form of genetic programming, and a multi-objective density-based fitness function in the context of clustering. This study adds to our previous research by exploring the detection capability of PYRAMID against a challenging dataset and evaluating its independence from user-supplied parameters","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121933979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using concept structures for efficient document comparison and location","authors":"A. Edmonds","doi":"10.1109/CIDM.2007.368879","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368879","url":null,"abstract":"A method is discussed for comparing and locating similar documents in a computationally efficient manner by making use of inferred concept statistics, rather than word frequencies. This novel technique uses natural language structures to create a short 'concept signature' vector, which locates a document in 'concept space'. Similar documents can be located in large corpora in O(log(n)) time by making use of this space for indexing. Results from trials with reference and real world data sets are presented, along with a comparison of the method's document similarity characteristics and the cosine metric","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128607847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fuzzy c-Means Classifier for Relational Data","authors":"H. Ichihashi, Katsuhiro Honda, Yasuhiro Kuramoto, Fumiaki Matsuura","doi":"10.1109/CIDM.2007.368892","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368892","url":null,"abstract":"This paper proposes a relational version of the fuzzy c-means (FCM) classifier in which relational data instead of object data are used. The classifier based on the relational clustering is called \"relational classifier\". The classifier is useful when a feature space has an extremely high dimensionality that exceeds the number of objects and many of the feature values are missing, or when only relational data are available instead of the object data. The relational data is represented by a matrix in terms of distances (dissimilarity) between object data, and is not concerned with the relational database. The clustering algorithm used in the classifier includes, as a special case, the relational dual of FCM proposed by Hathaway, Davenport and Bezdek and can be seen as a simultaneous application of multidimensional scaling and clustering. The computational intensity of the classifier is comparable to Gaussian mixture classifier (GMC). The proposed classifier outperforms well established relational classifier known as k-nearest neighbor (k-NN) on several benchmark datasets from the UCI ML repository","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126679824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Clustering and Fuzzy Neural Network for Sales Forecasting in Printed Circuit Board Industry","authors":"P. Chang, Chen-Hao Liu, C. Fan, Hsiao-Ching Chang","doi":"10.1109/CIDM.2007.368860","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368860","url":null,"abstract":"Reliable prediction of sales can improve the quality of business strategy. This research develops a hybrid model by integrating K-mean cluster and fuzzy back propagation network (KFBPN) to forecast the future sales of a printed circuit board factory. Based on the K-mean clustering technique, the historic data can be classified into different clusters, thus the noise of the original data can be reduced and a more homogeneous region can be established for a more accurate prediction. Numerical data of various affecting factors and actual demand of the past 5 years of the printed circuit board (PCB) factory are collected and input into the hybrid model for future monthly sales forecasting. Experimental results show the effectiveness of the hybrid model when compared with other approaches","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126202360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Search Result Refinement via Machine Learning from Labeled-Unlabeled Data for Meta-search","authors":"I. B. Özyurt, Greg G. Brown","doi":"10.1109/CIDM.2007.368871","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368871","url":null,"abstract":"For a user, retrieving relevant information from search engines involves encoding her intent, at best partially, in search keywords. A small amount of user feedback can be beneficial in refining the results returned by the search engines and aiding exploratory search for scientific literature and data. In this paper, three new variants of the EM method for semi-supervised document classification by K. Nigam et al. (2000) are introduced for biomedical literature meta-search result refinement. The multi-mixture per class EM variant with agglomerative information bottleneck clustering by N. Slonim and N. Tishby (1999), using the Davies-Bouldin cluster validity index by D. Davies and D. Bouldin (1979), has shown retrieval performance rivaling the state-of-the-art transductive support vector machines (TSVM) by T. Joachims (1999) with more than one order of magnitude improvement in execution time","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126486066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Evolutionary Approach for Time Series Forecasting","authors":"T. Ferreira, G. C. Vasconcelos, P. Adeodato","doi":"10.1109/CIDM.2007.368933","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368933","url":null,"abstract":"This work introduces a new method for time series prediction - time-delay added evolutionary forecasting (TAEF) - that carries out an evolutionary search of the minimum necessary time lags embedded in the problem for determining the phase space that generates the time series. The method proposed consists of a hybrid model composed of an artificial neural network (ANN) combined with a modified genetic algorithm (GA) that is capable of evolving the complete network architecture and parameters, its training algorithm and the necessary time lags to represent the series. Initially, the TAEF method finds the most fitted predictor model and then performs a behavioral statistical test in order to adjust time phase distortions that may appear in the representation of some series. An experimental investigation is conducted with the method with some relevant time series and the results achieved are discussed and compared, according to several performance measures, to results found with the multilayer perceptron networks and other works reported in the literature","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"23 8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121051101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Versatile and Efficient Meta-Learning Architecture: Knowledge Representation and Management in Computational Intelligence","authors":"Krzysztof Grabczewski, N. Jankowski","doi":"10.1109/CIDM.2007.368852","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368852","url":null,"abstract":"There are many data mining systems derived from machine learning, neural network, statistics and other fields. Most of them are dedicated to some particular algorithms or applications. Unfortunately, their architectures are still too naive to provide satisfactory background for advanced meta-learning problems. In order to efficiently perform sophisticated meta-level analysis, we need a very versatile, easily expandable system (in many independent aspects), which uniformly deals with different kinds of models and models with very complex structures of models (not only committees but also much more hierarchic models). Meta-level techniques must provide mechanisms facilitating optimization of computation time and memory consumption. This article presents requirements and their motivations for an advanced data mining system, efficient not only in model construction for given data, but also in meta-learning. Some particular solutions to significant problems are presented. The newly proposed advanced meta-learning architecture has been implemented in our new data analysis system.","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121579156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PCGEN: A Practical Approach to Projected Clustering and its Application to Gene Expression Data","authors":"M. Bouguessa, Shengrui Wang","doi":"10.1109/CIDM.2007.368939","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368939","url":null,"abstract":"Clustering samples in gene expression data has always been a major challenge because of the high dimensionality of the input space (typically in the tens of thousands) and the small number of samples (typically less than a hundred). Moreover, clusters may hide in subspaces with very low dimensionalities. Most existing clustering algorithms become substantially inefficient if the required similarity measure is computed between data points in the full-dimensional space. These challenges motivate our effort to propose a new and efficient partitional distance-based projected clustering algorithm for clustering samples in gene expression data. Our algorithm is capable of detecting projected clusters of extremely low dimensionality embedded in a high-dimensional space and avoids the computation of the distance in the full-dimensional space. The suitability of our proposal has been demonstrated through an empirical study using public microarray datasets.","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125661185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Data Mining to Enhance Automated Planning and Scheduling","authors":"J. Frank","doi":"10.1109/CIDM.2007.368881","DOIUrl":"https://doi.org/10.1109/CIDM.2007.368881","url":null,"abstract":"Automated planning is a combinatorial problem that is important to many NASA endeavors, including ground operations and control applications for unmanned and manned space flight. There is significant value to integrating planning and data mining to create better planners. We describe current work in this area, covering uses of data mining to speed up planners, improve the quality of plans returned by planners, and learn domain models for automated planners. The central contribution of this paper is a snap shot of the state of the art in integrating these technologies and a summary of challenges and open research issues","PeriodicalId":423707,"journal":{"name":"2007 IEEE Symposium on Computational Intelligence and Data Mining","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131423442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}