{"title":"Max-Coupled Learning: Application to Breast Cancer","authors":"Jaime S. Cardoso, Inês Domingues","doi":"10.1109/ICMLA.2011.93","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.93","url":null,"abstract":"In predictive modeling tasks, a clear distinction is often made between supervised and unsupervised learning problems, the former involving only labeled data (training patterns with known category labels) and the latter involving only unlabeled data. There is growing interest in a hybrid setting, called semi-supervised learning; in semi-supervised classification, the labels of only a small portion of the training data set are available. The unlabeled data, instead of being discarded, are also used in the learning process. Motivated by a breast cancer application, in this work we address a new learning task, in between classification and semi-supervised classification. Each example is described using two different feature sets, not necessarily both observed for a given example. If a single view is observed, then the class is due only to that feature set; if both views are present, the observed class label is the maximum of the two values corresponding to the individual views. We propose new learning methodologies adapted to this learning paradigm and experimentally compare them with baseline methods from the conventional supervised and unsupervised settings. The experimental results verify the usefulness of the proposed approaches.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114776619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MPI-based Parallelization for ILP-based Multi-relational Concept Discovery","authors":"Alev Mutlu, P. Senkul, Y. Kavurucu","doi":"10.1109/ICMLA.2011.98","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.98","url":null,"abstract":"Multi-relational concept discovery is a predictive learning task that aims to discover descriptions of a target concept in the light of past experiences. Parallelization has emerged as a solution to deal with efficiency and scalability issues relating to large search spaces in concept discovery systems. In this work, we describe a parallelization method for the ILP-based concept discovery system called CRIS. CRIS is modified in such a way that steps involving high query processing are reorganized in a data parallel way. To evaluate the performance of the resulting system, called P-CRIS, a set of experiments is conducted.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121989366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Empirical Normalization for Quadratic Discriminant Analysis and Classifying Cancer Subtypes","authors":"M. Kon, Nikolay Nikolaev","doi":"10.1109/ICMLA.2011.160","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.160","url":null,"abstract":"We introduce a new discriminant analysis method (Empirical Discriminant Analysis or EDA) for binary classification in machine learning. Given a dataset of feature vectors, this method defines an empirical feature map transforming the training and test data into new data with components having Gaussian empirical distributions. This map is an empirical version of the Gaussian copula used in probability and mathematical finance. The purpose is to form a feature mapped dataset as close as possible to Gaussian, after which standard quadratic discriminants can be used for classification. We discuss this method in general, and apply it to some datasets in computational biology.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125967549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frequent Substring-Based Sequence Classification with an Ensemble of Support Vector Machines Trained Using Reduced Amino Acid Alphabets","authors":"Charith D. Chitraranjan, Loai Al Nimer, O. Azzam, Saeed Salem, A. Denton, M. Iqbal, S. Kianian","doi":"10.1109/ICMLA.2011.71","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.71","url":null,"abstract":"We propose a frequent pattern-based algorithm for predicting functions and localizations of proteins from their primary structure (amino acid sequence). We use reduced alphabets that capture the higher rate of substitution between amino acids that are physicochemically similar. Frequent substrings are mined from the training sequences, transformed into different alphabets, and used as features to train an ensemble of SVMs. We evaluate the performance of our algorithm using protein sub-cellular localization and protein function datasets. Pair-wise sequence-alignment-based nearest neighbor and basic SVM k-gram classifiers are included as comparison algorithms. Results show that the frequent substring-based SVM classifier demonstrates better performance compared with other classifiers on the sub-cellular localization datasets and it performs competitively with the nearest neighbor classifier on the protein function datasets. Our results also show that the use of reduced alphabets provides statistically significant performance improvements for half of the classes studied.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126045871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Looking Beyond Genres: Identifying Meaningful Semantic Layers from Tags in Online Music Collections","authors":"R. Ferrer, T. Eerola","doi":"10.1109/ICMLA.2011.89","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.89","url":null,"abstract":"A scheme for identifying the semantic layers of music-related tags is presented. Arguments are provided for why the applications of the tags cannot be effectively pursued without a reasonable understanding of their semantic qualities. The identification scheme consists of a set of filters. The first is related to social consensus, user-count ratio, and n-gram properties of tags. The next relies on look-up functions across multiple databases to determine the probable semantic layer of each tag. Examples of the semantic layers with prevalence rates are given based on application of the scheme to a subset of the Million Song Dataset. Finally, a validation of the results was carried out with an independent, smaller hand-annotated dataset, in which high agreement between the identification provided by the scheme and annotations was found.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131556374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incremental Learning Based on Growing Gaussian Mixture Models","authors":"A. Bouchachia, C. Vanaret","doi":"10.1109/ICMLA.2011.79","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.79","url":null,"abstract":"Incremental learning aims at equipping data-driven systems with self-monitoring and self-adaptation mechanisms to accommodate new data in an online setting. The resulting model underlying the system can be adjusted whenever data become available. The present paper proposes a new incremental learning algorithm, called 2G2M, to learn Growing Gaussian Mixture Models. The algorithm is furnished with abilities (1) to accommodate data online, (2) to maintain low complexity of the model, and (3) to reconcile labeled and unlabeled data. To discuss the efficiency of the proposed incremental learning algorithm, an empirical evaluation is provided.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131323968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Embedding of Co-occurrence Data for Query-Based Visualization","authors":"Mohammad Khoshneshin, W. Street, P. Srinivasan","doi":"10.1109/ICMLA.2011.42","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.42","url":null,"abstract":"We propose a generative probabilistic model for visualizing co-occurrence data. In co-occurrence data, there are a number of entities and the data includes the frequency of two entities co-occurring. We propose a Bayesian approach to infer the latent variables. Given the intractability of inference for the posterior distribution, we use approximate inference via variational approaches. The proposed Bayesian approach enables accurate embedding in high-dimensional space which is not useful for visualization. Therefore, we propose a method to embed a filtered number of entities for a query -- query-based visualization. Our experiments show that our proposed models outperform co-occurrence data embedding, the state-of-the-art model for visualizing co-occurrence data.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125521247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Book Recommendation Signage System Using Silhouette-Based Gait Classification","authors":"M. Mikawa, S. Izumi, Kazuyo Tanaka","doi":"10.1109/ICMLA.2011.43","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.43","url":null,"abstract":"Libraries continuously create new services to attract users. This paper presents a new book recommendation digital signage system. The system classifies characteristics such as gender or age of a walking library user, and displays a recommended book on an LCD for him/her. A sequence of silhouette images of a walker, extracted from real-time video, is used for classification with a Support Vector Machine (SVM). Since a silhouette-based classification method requires less computation than three-dimensional model-based classification, it is suitable for real-time classification. We design a classifier with better performance by evaluating several parameters and image features for classification. Experimental results reveal the validity and effectiveness of our proposed signage system.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131925141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Evolution of Convolutional Networks","authors":"Brian Cheung, Carl Sable","doi":"10.1109/ICMLA.2011.73","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.73","url":null,"abstract":"With the increasing trend of neural network models towards larger structures with more layers, we expect a corresponding exponential increase in the number of possible architectures. In this paper, we apply a hybrid evolutionary search procedure to define the initialization and architectural parameters of convolutional networks, one of the first successful deep network models. We make use of stochastic diagonal Levenberg-Marquardt to accelerate the convergence of training, lowering the time cost of fitness evaluation. Using parameters found from the evolutionary search together with absolute value and local contrast normalization preprocessing between layers, we achieve the best known performance on several of the MNIST Variations, rectangles-image and convex image datasets.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128875105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault Detection through Sequential Filtering of Novelty Patterns","authors":"John Cuzzola, D. Gašević, E. Bagheri","doi":"10.1109/ICMLA.2011.69","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.69","url":null,"abstract":"Multi-threaded applications are commonplace in today's software landscape. Pushing the boundaries of concurrency and parallelism, programmers are maximizing performance demanded by stakeholders. However, multi-threaded programs are challenging to test and debug. Prone to their own set of unique faults, such as race conditions, testers need to turn to automated validation tools for assistance. This paper's main contribution is a new algorithm called multi-stage novelty filtering (MSNF) that can aid in the discovery of software faults. MSNF stresses minimal configuration, no domain specific data preprocessing or software metrics. The MSNF approach is based on a multi-layered support vector machine scheme. After experimentation with the MSNF algorithm, we observed promising results in terms of precision. However, MSNF relies on multiple iterations (i.e., stages). Here, we propose four different strategies for estimating the number of the requested stages.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125716944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}