{"title":"Improved Fine-Grained Component-Conditional Class Labeling with Active Learning","authors":"David J. Miller, Chu-Fang Lin, G. Kesidis, Christopher M. Collins","doi":"10.1109/ICMLA.2010.8","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.8","url":null,"abstract":"We have recently introduced new generative semi supervised mixtures with more fine-grained class label generation mechanisms than previous methods. Our models combine advantages of semi supervised mixtures, which achieve label extrapolation over a component, and nearest-neighbor (NN)/nearest-prototype (NP) classification, which achieves accurate classification in the vicinity of labeled samples. Our models are advantageous when within-component class proportions are not constant over the feature space region ``owned by'' a component. In this paper, we develop an active learning extension of our fine-grained labeling methods. We propose two new uncertainty sampling methods in comparison with traditional entropy-based uncertainty sampling. Our experiments on a number of UC Irvine data sets show that the proposed active learning methods improve classification accuracy more than standard entropy-based active learning. The proposed methods are particularly advantageous when the labeled percentage is small. We also extend our semi supervised method to allow variable weighting on labeled and unlabeled data likelihood terms. This approach is shown to outperform previous weighting schemes.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120957159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware Implementation of a Real-Time Neural Network Controller Set for Reactive Power Compensation Systems","authors":"R. Bayindir, Alper Gorgun","doi":"10.1109/ICMLA.2010.107","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.107","url":null,"abstract":"This paper introduces the use of hardware implementation of a real time neural network controller set for reactive power compensation (RPC) systems with synchronous motor. In this study, measurement of parameters required in systems such as current, phase differences, frequency and power are measured by means of a PIC 18F452 microcontroller with high accuracy and then controlled via artificial neural networks;. The performance test based on obtained data using a computer codes written in Visual Basic.Net are implemented. Different ANN controller structures are verified by simulating them on a computer. It is evaluated that the set developed can be easily adapted in real time applications.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126731760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Centroid-based Classification Enhanced with Wikipedia","authors":"Abdullah Bawakid, M. Oussalah","doi":"10.1109/ICMLA.2010.17","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.17","url":null,"abstract":"Most of the traditional text classification methods employ Bag of Words (BOW) approaches relying on the words frequencies existing within the training corpus and the testing documents. Recently, studies have examined using external knowledge to enrich the text representation of documents. Some have focused on using WordNet which suffers from different limitations including the available number of words, synsets and coverage. Other studies used different aspects of Wikipedia instead. Depending on the features being selected and evaluated and the external knowledge being used, a balance between recall, precision, noise reduction and information loss has to be applied. In this paper, we propose a new Centroid-based classification approach relying on Wikipedia to enrich the representation of documents through the use of Wikpedia’s concepts, categories structure, links, and articles text. We extract candidate concepts for each class with the help of Wikipedia and merge them with important features derived directly from the text documents. Different variations of the system were evaluated and the results show improvements in the performance of the system.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"183 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127029623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Learning for Adaptive Programs by Leveraging Program Structure","authors":"Jervis Pinto, Alan Fern, Tim Bauer, Martin Erwig","doi":"10.1109/ICMLA.2010.150","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.150","url":null,"abstract":"We study how to effectively integrate reinforcement learning (RL) and programming languages via adaptation-based programming, where programs can include non-deterministic structures that can be automatically optimized via RL. Prior work has optimized adaptive programs by defining an induced sequential decision process to which standard RL is applied. Here we show that the success of this approach is highly sensitive to the specific program structure, where even seemingly minor program transformations can lead to failure. This sensitivity makes it extremely difficult for a non-RL-expert to write effective adaptive programs. In this paper, we study a more robust learning approach, where the key idea is to leverage information about program structure in order to define a more informative decision process and to improve the SARSA(lambda) RL algorithm. Our empirical results show significant benefits for this approach.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130660994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comparison of Techniques for Handling Incomplete Input Data with a Focus on Attribute Relevance Influence","authors":"M. Millán-Giraldo, J. S. Sánchez, V. Traver","doi":"10.1109/ICMLA.2010.126","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.126","url":null,"abstract":"This work presents a new approach based on support vector regression to deal with incomplete input (unseen) data and compares it to other existing techniques. The empirical analysis has been done over 18 real data sets and using five different classifiers, with the aim of foreseeing which technique can be deemed as more suitable for each classifier. Also, this study tries to devise how the relevance of the missing attribute affects the performance of each pair (handling algorithm, classifier). Experimental results demonstrate that no technique is absolutely better than the others for all classifiers. However, combining the proposed strategy with the nearest neighbor classifier appears as the best choice to face the problem of missing attribute values in the input data.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125122441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Noise Filtering Algorithm for Imbalanced Data","authors":"J. V. Hulse, T. Khoshgoftaar, Amri Napolitano","doi":"10.1109/ICMLA.2010.9","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.9","url":null,"abstract":"Noise filtering is a commonly-used methodology to improve the performance of learners built using low-quality data. A common type of noise filtering is a data preprocessing technique called classification filtering. In classification filtering, a classifier is built and evaluated on the training dataset (typically using cross-validation) and any misclassified instances are considered noisy. The strategies employed with classification filters are not ideal, particularly when learning from class-imbalanced data. To address this deficiency, we propose an alternative method for classification filtering called the threshold-adjusted classification filter. This methodology is compared with the standard classification filter, and the results clearly demonstrate the efficacy of our technique.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134104697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Dependencies Affect the Capability of Several Feature Selection Approaches to Extract the Key Features","authors":"Qin Yang, R. Gras","doi":"10.1109/ICMLA.2010.26","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.26","url":null,"abstract":"The goal of this research is to find how dependencies affect the capability of several feature selection approaches to extract of the relevant features for a classification purpose. The hypothesis is that more dependencies and higher level dependencies mean more complexity for the task. Some experiments are used to intend to discover some limitations of several feature selection approaches by altering the degree of dependency of the test datasets. A new method has been proposed, which uses a pair of pre-designed Bayesian Networks to generate the test datasets with an easy tuning level of complexity for feature selection test. Relief, CFS, NB-GA, NB-BOA, SVM-GA, SVM-BOA and SVM-mBOA are the filter or wrapper model feature selection approaches which are used and evaluated in the experiments. For these approaches, higher level of dependency among the relevant features greatly affect the capability to find the relevant features for classification. For Relief, SVM-BOA and SVM-mBOA, if the dependencies among the irrelevant features are altered, the performance of them changes as well. Moreover, a multi-objective optimization method is used to keep the diversity of the populations in each generation of the BOA search algorithm improving the overall quality of solutions in our experiments.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132147846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational Analysis of Muscular Dystrophy Sub-types Using a Novel Integrative Scheme","authors":"Chen Wang, S. S. Ha, Y. Wang, J. Xuan, E. Hoffman","doi":"10.1109/ICMLA.2010.49","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.49","url":null,"abstract":"To construct biologically interpretable features and facilitate Muscular Dystrophy (MD) sub-types classification, we propose a novel integrative scheme utilizing PPI network, functional gene sets information, and mRNA profiling. The workflow of the proposed scheme includes three major steps: First, by combining protein–protein interaction network structure and gene co-expression relationship into new distance metric, we apply affinity propagation clustering to build gene sub-networks. Secondly, we further incorporate functional gene sets knowledge to complement the physical interaction information. Finally, based on constructed sub-network and gene set features, we apply multi-class support vector machine (MSVM) for MD sub-type classification, and highlight the biomarkers contributing to the sub-type prediction. The experimental results show that our scheme could construct sub-networks that are more relevant to MD than those constructed by conventional approach. Furthermore, our integrative strategy substantially improved the prediction accuracy, especially for those hard-to-classify sub-types.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130492798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting Multi-Task Weak Learners with Applications to Textual and Social Data","authors":"J. Faddoul, Boris Chidlovskii, Fabien Torre, Rémi Gilleron","doi":"10.1109/ICMLA.2010.61","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.61","url":null,"abstract":"Learning multiple related tasks from data simultaneously can improve predictive performance relative to learning these tasks independently. In this paper we propose a novel multi-task learning algorithm called MT-Adaboost: it extends Adaboost algorithm Freund1999Short to the multi-task setting, it uses as multi-task weak classifier a multi-task decision stump. This allows to learn different dependencies between tasks for different regions of the learning space. Thus, we relax the conventional hypothesis that tasks behave similarly in the whole learning space. Moreover, MT-Adaboost can learn multiple tasks without imposing the constraint of sharing the same label set and/or examples between tasks. A theoretical analysis is derived from the analysis of the original Adaboost. Experiments for multiple tasks over large scale textual data sets with social context (Enron and Tobacco) give rise to very promising results.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115352482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Plant Species Classification Using a 3D LIDAR Sensor and Machine Learning","authors":"Ulrich Weiss, P. Biber, Stefan Laible, K. Bohlmann, A. Zell","doi":"10.1109/ICMLA.2010.57","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.57","url":null,"abstract":"In the domain of agricultural robotics, one major application is crop scouting, e.g., for the task of weed control. For this task a key enabler is a robust detection and classification of the plant and species. Automatically distinguishing between plant species is a challenging task, because some species look very similar. It is also difficult to translate the symbolic high level description of the appearances and the differences between the plants used by humans, into a formal, computer understandable form. Also it is not possible to reliably detect structures, like leaves and branches in 3D data provided by our sensor. One approach to solve this problem is to learn how to classify the species by using a set of example plants and machine learning methods. In this paper we are introducing a method for distinguishing plant species using a 3D LIDAR sensor and supervised learning. For that we have developed a set of size and rotation invariant features and evaluated experimentally which are the most descriptive ones. Besides these features we have also compared different learning methods using the toolbox Weka. It turned out that the best methods for our application are simple logistic regression functions, support vector machines and neural networks. In our experiments we used six different plant species, typically available at common nurseries, and about 20 examples of each species. In the laboratory we were able to identify over 98% of these plants correctly.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115581041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}