{"title":"Relational data partitioning using evolutionary game theory","authors":"L. Hall, Alireza Chakeri","doi":"10.1109/CIDM.2014.7008656","DOIUrl":"https://doi.org/10.1109/CIDM.2014.7008656","url":null,"abstract":"This paper presents a new approach for relational data partitioning using the notion of dominant sets. A dominant set is a subset of data points satisfying the constraints of internal homogeneity and external inhomogeneity, i.e. a cluster. However, since no proper subset of a dominant set can itself be a dominant set, dominant sets tend to be compact sets. Hence, in this paper, we present a novel approach to enumerate well distributed clusters where the number of clusters need not be known. When the number of clusters is known, in order to search the solution space appropriately, after finding each dominant set, data points are partitioned into two disjoint subsets using spectral graph image segmentation methods to enumerate the other well distributed dominant sets. For the latter case, we introduce a new hierarchical approach for relational data partitioning using a new class of evolutionary game theory dynamics called InImDynamics, which is very fast and whose computational time is linear in the number of data points. In this regard, at each level of the proposed hierarchy, Dunn's index is used to find the appropriate number of clusters. Then the objects are partitioned based on the projected number of clusters using game theoretic relations. The same method is applied to each partition to extract its underlying structure. Although the resulting clusters exist in their equivalent partitions, they may not be clusters of the entire data. Hence, each is checked to determine whether it is an actual cluster and, if it is not, it is extended to an existing cluster of the data. The approach can also be used to assign unseen data to existing clusters.","PeriodicalId":117542,"journal":{"name":"2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126585720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of iPSC colony images using hierarchical strategies with support vector machines","authors":"H. Joutsijoki, J. Rasku, Markus Haponen, Ivan Baldin, Y. Gizatdinova, M. Paci, Jyri Saarikoski, Kirsi Varpa, H. Siirtola, Jorge Avalos-Salguero, Kati Iltanen, J. Laurikkala, K. Penttinen, J. Hyttinen, K. Aalto-Setälä, M. Juhola","doi":"10.1109/CIDM.2014.7008152","DOIUrl":"https://doi.org/10.1109/CIDM.2014.7008152","url":null,"abstract":"In this preliminary research we examine the suitability of hierarchical strategies of multi-class support vector machines for the classification of induced pluripotent stem cell (iPSC) colony images. The iPSC technology offers incredible possibilities for safe and patient-specific drug therapy without any ethical problems. However, growing iPSCs is a sensitive process, and abnormalities may occur during growth. These abnormalities need to be recognized, and the problem reduces to image classification. We have a collection of 80 iPSC colony images, each prelabeled by an expert as bad, good, or semigood. We use intensity histograms as features for classification, computing them from the whole image and from the colony area only, which yields two datasets. We perform two feature reduction procedures for both datasets. In classification, we examine how different hierarchical constructions affect the results. We perform a thorough evaluation; the best accuracy, around 54%, was obtained with the linear kernel function. In many cases, there are no significant differences in results between the hierarchical structures. As a result, intensity histograms are a good baseline for the classification of iPSC colony images, but more sophisticated feature extraction and reduction methods, together with other classification methods, need to be researched in the future.","PeriodicalId":117542,"journal":{"name":"2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"14 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124237592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recommendation for Web services with domain specific context awareness","authors":"B. Kumara, Incheon Paik, K. Koswatte, Wuhui Chen","doi":"10.1109/CIDM.2014.7008679","DOIUrl":"https://doi.org/10.1109/CIDM.2014.7008679","url":null,"abstract":"Construction of Web service recommendation systems for users has become an important issue in the service computing area. Content-based service recommendation is one category of recommendation systems; it recommends services based on their functionality. Current content-based approaches use syntactic or semantic methods to calculate similarity. However, syntactic methods are insufficient for expressing semantic concepts, and semantic content-based methods only consider the basic semantic level. Further, these approaches do not consider the domain-specific context when measuring similarity. Thus, they fail to capture the semantic similarity of Web services within a given domain, which degrades recommendation performance. In this paper, we propose a domain-specific, context-aware recommendation approach that uses a support vector machine and a domain dataset obtained from a search engine in the similarity calculation process. Experimental results show that our approach works efficiently.","PeriodicalId":117542,"journal":{"name":"2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123444618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scaling a Neyman-Pearson subset selection approach via heuristics for mining massive data","authors":"G. Ditzler, M. Austen, G. Rosen, R. Polikar","doi":"10.1109/CIDM.2014.7008701","DOIUrl":"https://doi.org/10.1109/CIDM.2014.7008701","url":null,"abstract":"Feature subset selection is an important step towards producing a classifier that relies only on relevant features, while keeping the computational complexity of the classifier low. Feature selection is also used to make inferences on the importance of attributes, even when classification is not the ultimate goal. For example, in bioinformatics and genomics, feature subset selection is used to make inferences about the variables that best describe multiple populations. Unfortunately, many feature selection algorithms require the subset size to be specified a priori, but knowing how many variables to select is typically a nontrivial task. Other approaches require a specific variable subset selection framework. In this work, we examine an approach to feature subset selection that works with a generic variable selection algorithm and provides statistical inference on the number of relevant features, which may be unknown to the generic variable selection algorithm. This work extends our previous implementation of a Neyman-Pearson feature selection (NPFS) hypothesis test, which acts as a meta-subset selection algorithm. Specifically, we examine the conservativeness of the NPFS approach by biasing the hypothesis test, and examine other heuristics for NPFS. We include results from carefully designed synthetic datasets. Furthermore, we demonstrate NPFS's ability to handle massive-scale data.","PeriodicalId":117542,"journal":{"name":"2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125088442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing datasets by attribute alignment","authors":"Jakub Smíd, Roman Neruda","doi":"10.1109/CIDM.2014.7008148","DOIUrl":"https://doi.org/10.1109/CIDM.2014.7008148","url":null,"abstract":"The metalearning approach to the model selection problem, which exploits the idea that algorithms perform similarly on similar datasets, requires a suitable metric on the dataset space. One common approach compares datasets based on a fixed number of features describing each dataset as a whole. The information based on individual attributes is usually aggregated, taken for the most relevant attributes only, or omitted altogether. In this paper, we propose an approach that aligns the complete sets of attributes of the datasets, allowing for different numbers of attributes. Given a distance between two attributes, one can find the alignment that minimizes the sum of the individual distances between aligned attributes. We present two methods that are able to find such an alignment. They differ in computational complexity and in their assumptions about the supplied distance function between two attributes. Experiments were performed using the proposed methods, and the results were compared with the baseline algorithm.","PeriodicalId":117542,"journal":{"name":"2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132278649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Precision-Recall-Optimization in Learning Vector Quantization Classifiers for Improved Medical Classification Systems","authors":"T. Villmann, M. Kaden, M. Lange, P. Sturmer, W. Hermann","doi":"10.1109/CIDM.2014.7008150","DOIUrl":"https://doi.org/10.1109/CIDM.2014.7008150","url":null,"abstract":"Classification and decision systems in data analysis are mostly based on accuracy optimization. This criterion has only limited informative value if the data are imbalanced or if false positive/negative decisions incur different costs. Therefore, more sophisticated statistical quality measures, like precision and recall, are favored in medicine. However, most classification approaches in machine learning are designed for accuracy optimization. In this paper, we consider variants of learning vector quantizers (LVQs) that explicitly optimize these advanced statistical quality measures while keeping the basic intuitive ingredients of these classifiers, namely the prototype-based principle and Hebbian learning. In this contribution, we focus particularly on precision and recall as important measures for medical applications. We investigate these problems in terms of precision-recall curves as well as receiver operating characteristic (ROC) curves, which are well known in statistical classification and test analysis. With this more general underlying framework, we provide principled alternatives to traditional classifiers, drawing a closer connection to statistical classification analysis.","PeriodicalId":117542,"journal":{"name":"2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133130886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Alzheimer's disease patients classification through EEG signals processing","authors":"G. Fiscon, Emanuel Weitschek, G. Felici, P. Bertolazzi, S. D. Salvo, P. Bramanti, M. C. D. Cola","doi":"10.1109/CIDM.2014.7008655","DOIUrl":"https://doi.org/10.1109/CIDM.2014.7008655","url":null,"abstract":"Alzheimer's Disease (AD) and its preliminary stage, Mild Cognitive Impairment (MCI), are the most widespread neurodegenerative disorders, and their investigation remains an open challenge. Electroencephalography (EEG) is a non-invasive and repeatable technique for diagnosing brain abnormalities. Despite technical advances, the analysis of EEG spectra is usually carried out by experts who must manually perform laborious interpretations. Computational methods may enable a quantitative analysis of these signals and hence a characterization of EEG time series. The aim of this work is to achieve automatic patient classification from the EEG biomedical signals involved in AD and MCI in order to support medical doctors in formulating the correct diagnosis. The analysis of biological EEG signals requires effective and efficient computer science methods to extract relevant information. Data mining, which guides the automated knowledge discovery process, is a natural way to approach EEG data analysis. Specifically, in our work we apply the following analysis steps: (i) pre-processing of EEG data; (ii) processing of the EEG signals by the application of time-frequency transforms; and (iii) classification by means of machine learning methods. We obtain promising results from the classification of AD, MCI, and control samples that can assist medical doctors in identifying the pathology.","PeriodicalId":117542,"journal":{"name":"2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123394979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Novelty detection applied to the classification problem using Probabilistic Neural Network","authors":"Balvant Yadav, V. Devi","doi":"10.1109/CIDM.2014.7008677","DOIUrl":"https://doi.org/10.1109/CIDM.2014.7008677","url":null,"abstract":"A novel pattern is an observation that differs from the rest of the data. The task of novelty detection is to build a model that identifies novel patterns in a data set. This model has to be built such that a pattern distant from the given training data is classified as novel; otherwise, it is classified into one of the given classes. In this paper, we present two such new models for novelty detection, based on the Probabilistic Neural Network. In the first model, we generate negative examples around the target class data and then train the classifier with these negative examples. In the second model, which is incremental, we present a new method to find an optimal threshold for each class; if the output value for a test pattern assigned to a target class is less than that class's threshold, the pattern is classified as novel. We show how the decision boundaries are formed with and without the novelty detection mechanism in our model. We also compare the performance of both approaches.","PeriodicalId":117542,"journal":{"name":"2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"178 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132332234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized kernel framework for unsupervised spectral methods of dimensionality reduction","authors":"Diego Hernán Peluffo-Ordóñez, J. Lee, M. Verleysen","doi":"10.1109/CIDM.2014.7008664","DOIUrl":"https://doi.org/10.1109/CIDM.2014.7008664","url":null,"abstract":"This work introduces a generalized kernel perspective for spectral dimensionality reduction approaches. First, an elegant matrix view of kernel principal component analysis (PCA) is described. We show the relationship between kernel PCA and conventional PCA using a parametric distance. Second, we introduce a weighted kernel PCA framework derived from least-squares support vector machines (LS-SVM). This approach starts with a latent variable that allows a relaxed LS-SVM problem to be written. Such a problem is addressed by a primal-dual formulation. As a result, we provide kernel alternatives to spectral methods for dimensionality reduction such as multidimensional scaling, locally linear embedding, and Laplacian eigenmaps, as well as a versatile framework to explain weighted PCA approaches. Experimentally, we show that incorporating an SVM model improves the performance of kernel PCA.","PeriodicalId":117542,"journal":{"name":"2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114751528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recognizing gym exercises using acceleration data from wearable sensors","authors":"Heli Koskimäki, Pekka Siirtola","doi":"10.1109/CIDM.2014.7008685","DOIUrl":"https://doi.org/10.1109/CIDM.2014.7008685","url":null,"abstract":"Activity recognition approaches can be used for entertainment, to give people information about their own behavior, and to monitor and supervise people through their actions. It is therefore natural that the number of studies based on wearable sensors has increased as well, and new applications of activity recognition are being invented in the process. In this study, gym data including 36 different exercise classes are used, with the future aim of creating automatic activity diaries that reliably show end users how many sets of a given exercise have been performed. The actual recognition is divided into two steps. In the first step, activity recognition is performed over given time intervals, and in the second step a state-machine approach is used to decide when actual events (sets in the gym data) were performed. The results showed that when recognizing different exercise sets from the same occasion (sequential exercise sets), a window-wise true positive rate of over 96 percent can be achieved on average, and moreover, all the exercise events can be discovered using the state-machine approach. When using a separate validation test set, the accuracies decreased significantly for some classes, but even in this case, all the different sets were discovered for 26 different classes.","PeriodicalId":117542,"journal":{"name":"2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124899551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}