{"title":"Speeding Up Greedy Forward Selection for Regularized Least-Squares","authors":"T. Pahikkala, A. Airola, T. Salakoski","doi":"10.1109/ICMLA.2010.55","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.55","url":null,"abstract":"We propose a novel algorithm for greedy forward feature selection for regularized least-squares (RLS) regression and classification, also known as the least-squares support vector machine or ridge regression. The algorithm, which we call greedy RLS, starts from the empty feature set, and on each iteration adds the feature whose addition provides the best leave-one-out cross-validation performance. Our method is considerably faster than the previously proposed ones, since its time complexity is linear in the number of training examples, the number of features in the original data set, and the desired size of the set of selected features. Therefore, as a side effect we obtain a new training algorithm for learning sparse linear RLS predictors which can be used for large scale learning. This speed is possible due to matrix calculus based short-cuts for leave-one-out and feature addition. We experimentally demonstrate the scalability of our algorithm compared to previously proposed implementations.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132189739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Parallel Algorithm for Predicting the Secondary Structure of Polycistronic MicroRNAs","authors":"Dianwei Han, G. Tang, Jun Zhang","doi":"10.1109/ICMLA.2010.80","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.80","url":null,"abstract":"MicroRNAs (miRNAs) are newly discovered endogenous small non-coding RNAs (21-25nt) that target their complementary gene transcripts for degradation or translational repression. The biogenesis of a functional miRNA is largely dependent on the secondary structure of the miRNA precursor (pre-miRNA). Recently, it has been shown that miRNAs are present in the genome as the form of polycistronic transcriptional units in plants and animals. It will be important to design methods to predict such structures for miRNA discovery and its applications in gene silencing. In this paper, we propose a parallel algorithm based on the master-slave architecture to predict the secondary structure from an input sequence. First, the master processor partitions the input sequence into subsequences and distributes them to the slave processors. The slave processors will then predict the secondary structure based on their individual task. Afterward, the slave processors will return their results to the master processor. Finally, the master processor will merge the partial structures from the slave processors into a whole candidate secondary structure. The optimal structure is obtained by sorting the candidate structures according to their scores. Our experimental results indicate that the actual speed-ups match the trend of theoretic values.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132600453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extreme Volume Detection for Managed Print Services","authors":"J. Handley, Marie-Luise Schneider, Victor Ciriza, J. Earl","doi":"10.1109/ICMLA.2010.95","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.95","url":null,"abstract":"A managed print service (MPS) manages the printing, scanning and facsimile devices in an enterprise to control cost and improve availability. Services include supplies replenishment, maintenance, repair, and use reporting. Customers are billed per page printed. Data are collected from a network of devices to facilitate management. The number of pages printed per device must be accurately counted to fairly bill the customer. Software errors, hardware changes, repairs, and human error all contribute to “meter reads” that are exceptionally high and are apt to be challenged by the customer were they to be billed. Account managers periodically review data for each device in an account. This process is tedious and time consuming and an automated solution is desired. Exceptional print volumes are not always salient and detecting them statistically is prone to errors owing to nonstationarity of the data. Mean levels and variances change over time and usage is highly auto correlated which precludes simple detection methods based on deviations from an average background. A solution must also be computationally inexpensive and require little auxiliary storage because hundreds of thousands of streams of device data must be processed. We present an algorithm and system for online detection of extreme print volumes that uses dynamic linear models (DLM) with variance learning. A DLM is a state space time series model comprising a random mean level system process and a random observation process. Both components are updated using Bayesian statistics. After each update, a forecasted value and its estimated variance are calculated. A read is flagged as exceptionally high if its value is highly unlikely with respect to a forecasted value and its standard deviation. We provide implementation details and results of a field test in which error rate was decreased from 26.4% to 0.5% on 728 observed meter reads.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"212 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133610229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Unsupervised Clustering over Watershed-Based Clustering","authors":"Sai Venu Gopal Lolla, L. L. Hoberock","doi":"10.1109/ICMLA.2010.44","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.44","url":null,"abstract":"This paper improves upon an existing Watershed algorithm-based clustering method. The existing method uses an experimentally determined parameter to construct a density function. A better method for evaluating the cell/window size (used in the construction of the density function) is proposed, eliminating the need for arbitrary parameters. The algorithm has been tested on both published and unpublished synthetic data, and the results demonstrate that the proposed approach is able to accurately estimate the number of clusters present in the data.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121330345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Projections for Manifold Learning","authors":"H. Strange, R. Zwiggelaar","doi":"10.1109/ICMLA.2010.54","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.54","url":null,"abstract":"Manifold learning is a widely used statistical tool which reduces the dimensionality of a data set while aiming to maintain both local and global properties of the data. We present a novel manifold learning technique which aligns local hyper planes to build a global representation of the data. A Minimum Spanning Tree provides the skeleton needed to traverse the manifold so that the local hyper planes can be merged using parallel projections to build a global hyper plane of the data. We show state of the art results when compared against existing manifold learning algorithm on both artificial and real world image data.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115405944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Inferences and Forecasting in Spatial Time Series Models","authors":"Sung Duck Lee, Duck-Ki Kim","doi":"10.1109/ICMLA.2010.170","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.170","url":null,"abstract":"The spatial time series data can be viewed as a set of time series collected simultaneously at a number of spatial locations with time. For example, The Mumps data have a feature to infect adjacent broader regions in accordance with spatial location and time. Therefore, The spatial time series models have many parameters of space and time. In this paper, We propose the method of bayesian inferences and prediction in spatial time series models with a Gibbs Sampler in order to overcome convergence problem in numerical methods. Our results are illustrated by using the data set of mumps cases reported from the Korea Center for Disease Control and Prevention monthly over the years 2001-2009, as well as a simulation study.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"522 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114623439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multilayer Ferns: A Learning-based Approach of Patch Recognition and Homography Extraction","authors":"Gao Ce, Song Yixu, Jia Pei-fa","doi":"10.1109/ICMLA.2010.36","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.36","url":null,"abstract":"While local patches recognition is a key component of modern approaches to affine transformation detection and object detection, existing learning-based approaches just identify the patches based on a set of randomly picked and combined binary features, which will lose some strong correlations between features and can not provide stable and remarkable identification ability. In this paper, we proposed a method that select and organize the features in a Multilayer Ferns structure, and show that it is both faster in the run-time processing and more powerful in the identification ability than state-of-the-art ad hoc approaches.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129662200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Patient-Specific Seizure Detection from Intra-cranial EEG Using High Dimensional Clustering","authors":"Haimonti Dutta, D. Waltz, Karthik M. Ramasamy, Philip Gross, Ansaf Salleb-Aouissi, H. Diab, Manoj Pooleery, C. Schevon, R. Emerson","doi":"10.1109/ICMLA.2010.119","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.119","url":null,"abstract":"Automatic seizure detection is becoming popular in modern epilepsy monitoring units since it assists diagnostic monitoring and reduces manual review of large volumes of EEG recordings. In this paper, we describe the application of machine learning algorithms for building patient-specific seizure detectors on multiple frequency bands of intra-cranial electroencephalogram (iEEG) recorded by a dense Micro-Electrode Array (MEA). The MEA is capable of recording at a very high sampling rate (30 KHz) producing an avalanche of time series data. We explore subsets of this data to build seizure detectors – we discuss several methods for extracting univariate and bivariate features from the channels and study the effectiveness of using high dimensional clustering algorithms such as K-means and Subspace clustering for constructing the model. Future work involves design of more robust seizure detectors using other features and non-parametric clustering techniques, detection of artifacts and understanding the generalization properties of the models.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125448492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Bayesian Nonparametric Model for Joint Relation Integration and Domain Clustering","authors":"Dazhuo Li, Fahim Mohammad, E. Rouchka","doi":"10.1109/ICMLA.2010.168","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.168","url":null,"abstract":"Relational databases provide unprecedented opportunities for knowledge discovery. Various approaches have been proposed to infer structures over entity types and predict relationships among elements of these types. However, discovering structures beyond the entity type level, e.g. clustering over relation concepts, remains a challenging task. We present a Bayesian nonparametric model for joint relation and domain clustering. The model can automatically infer the number of relation clusters, which is particularly important in novel cases where little prior knowledge is known about the number of relation clusters. The approach is applied to clustering various relations in a gene database. Keywords-relational learning; clustering; Bayesian non- parametric","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125565441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Agent Inverse Reinforcement Learning","authors":"Sriraam Natarajan, Gautam Kunapuli, Kshitij Judah, Prasad Tadepalli, K. Kersting, J. Shavlik","doi":"10.1109/ICMLA.2010.65","DOIUrl":"https://doi.org/10.1109/ICMLA.2010.65","url":null,"abstract":"Learning the reward function of an agent by observing its behavior is termed inverse reinforcement learning and has applications in learning from demonstration or apprenticeship learning. We introduce the problem of multi-agent inverse reinforcement learning, where reward functions of multiple agents are learned by observing their uncoordinated behavior. A centralized controller then learns to coordinate their behavior by optimizing a weighted sum of reward functions of all the agents. We evaluate our approach on a traffic-routing domain, in which a controller coordinates actions of multiple traffic signals to regulate traffic density. We show that the learner is not only able to match but even significantly outperform the expert.","PeriodicalId":336514,"journal":{"name":"2010 Ninth International Conference on Machine Learning and Applications","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125044165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}