X. Xi, Eamonn J. Keogh, C. Shelton, Li Wei, C. Ratanamahatana. "Fast time series classification using numerosity reduction." Proceedings of the 23rd International Conference on Machine Learning (ICML 2006), June 2006. DOI: 10.1145/1143844.1143974.
Abstract: Many algorithms have been proposed for the problem of time series classification. However, it is clear that one-nearest-neighbor with Dynamic Time Warping (DTW) distance is exceptionally difficult to beat. This approach has one weakness, however: it is computationally too demanding for many real-time applications. One way to mitigate this problem is to speed up the DTW calculations; nonetheless, there is a limit to how much this can help. In this work, we propose an additional technique, numerosity reduction, to speed up one-nearest-neighbor DTW. While the idea of numerosity reduction for nearest-neighbor classifiers has a long history, we show here that we can leverage an original observation about the relationship between dataset size and DTW constraints to produce an extremely compact dataset with little or no loss in accuracy. We test our ideas with a comprehensive set of experiments and show that they can efficiently produce extremely fast, accurate classifiers.

Ulf Brefeld, Thomas Gärtner, T. Scheffer, S. Wrobel. "Efficient co-regularised least squares regression." Proceedings of the 23rd International Conference on Machine Learning (ICML 2006), June 2006. DOI: 10.1145/1143844.1143862.
Abstract: In many applications, unlabelled examples are inexpensive and easy to obtain. Semi-supervised approaches try to utilise such examples to reduce the predictive error. In this paper, we investigate a semi-supervised least squares regression algorithm based on the co-learning approach. Similar to other semi-supervised algorithms, our base algorithm has cubic runtime complexity in the number of unlabelled examples. To be able to handle larger sets of unlabelled examples, we devise a semi-parametric variant that scales linearly in the number of unlabelled examples. Experiments show a significant error reduction by co-regularisation and a large runtime improvement for the semi-parametric approximation. Last but not least, we propose a distributed procedure that can be applied without collecting all data at a single site.

{"title":"Collaborative prediction using ensembles of Maximum Margin Matrix Factorizations","authors":"D. DeCoste","doi":"10.1145/1143844.1143876","DOIUrl":"https://doi.org/10.1145/1143844.1143876","url":null,"abstract":"Fast gradient-based methods for Maximum Margin Matrix Factorization (MMMF) were recently shown to have great promise (Rennie & Srebro, 2005), including significantly outperforming the previous state-of-the-art methods on some standard collaborative prediction benchmarks (including MovieLens). In this paper, we investigate ways to further improve the performance of MMMF, by casting it within an ensemble approach. We explore and evaluate a variety of alternative ways to define such ensembles. We show that our resulting ensembles can perform significantly better than a single MMMF model, along multiple evaluation metrics. In fact, we find that ensembles of partially trained MMMF models can sometimes even give better predictions in total training time comparable to a single MMMF model.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125308897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An empirical comparison of supervised learning algorithms","authors":"R. Caruana, Alexandru Niculescu-Mizil","doi":"10.1145/1143844.1143865","DOIUrl":"https://doi.org/10.1145/1143844.1143865","url":null,"abstract":"A number of supervised learning methods have been introduced in the last decade. Unfortunately, the last comprehensive empirical evaluation of supervised learning was the Statlog Project in the early 90's. We present a large-scale empirical comparison between ten supervised learning methods: SVMs, neural nets, logistic regression, naive bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. We also examine the effect that calibrating the models via Platt Scaling and Isotonic Regression has on their performance. An important aspect of our study is the use of a variety of performance criteria to evaluate the learning methods.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125691506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithms for portfolio management based on the Newton method","authors":"A. Agarwal, Elad Hazan, Satyen Kale, R. Schapire","doi":"10.1145/1143844.1143846","DOIUrl":"https://doi.org/10.1145/1143844.1143846","url":null,"abstract":"We experimentally study on-line investment algorithms first proposed by Agarwal and Hazan and extended by Hazan et al. which achieve almost the same wealth as the best constant-rebalanced portfolio determined in hindsight. These algorithms are the first to combine optimal logarithmic regret bounds with efficient deterministic computability. They are based on the Newton method for offline optimization which, unlike previous approaches, exploits second order information. After analyzing the algorithm using the potential function introduced by Agarwal and Hazan, we present extensive experiments on actual financial data. These experiments confirm the theoretical advantage of our algorithms, which yield higher returns and run considerably faster than previous algorithms with optimal regret. Additionally, we perform financial analysis using mean-variance calculations and the Sharpe ratio.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129888826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concept boundary detection for speeding up SVMs","authors":"Navneet Panda, E. Chang, Gang Wu","doi":"10.1145/1143844.1143930","DOIUrl":"https://doi.org/10.1145/1143844.1143930","url":null,"abstract":"Support Vector Machines (SVMs) suffer from an O(n2) training cost, where n denotes the number of training instances. In this paper, we propose an algorithm to select boundary instances as training data to substantially reduce n. Our proposed algorithm is motivated by the result of (Burges, 1999) that, removing non-support vectors from the training set does not change SVM training results. Our algorithm eliminates instances that are likely to be non-support vectors. In the concept-independent preprocessing step of our algorithm, we prepare nearest-neighbor lists for training instances. In the concept-specific sampling step, we can then effectively select useful training data for each target concept. Empirical studies show our algorithm to be effective in reducing n, outperforming other competing downsampling algorithms without significantly compromising testing accuracy.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130384503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pachinko allocation: DAG-structured mixture models of topic correlations","authors":"Wei Li, A. McCallum","doi":"10.1145/1143844.1143917","DOIUrl":"https://doi.org/10.1145/1143844.1143917","url":null,"abstract":"Latent Dirichlet allocation (LDA) and other related topic models are increasingly popular tools for summarization and manifold discovery in discrete data. However, LDA does not capture correlations between topics. In this paper, we introduce the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). The leaves of the DAG represent individual words in the vocabulary, while each interior node represents a correlation among its children, which may be words or other interior nodes (topics). PAM provides a flexible alternative to recent work by Blei and Lafferty (2006), which captures correlations only between pairs of topics. Using text data from newsgroups, historic NIPS proceedings and other research paper corpora, we show improved performance of PAM in document classification, likelihood of held-out data, the ability to support finer-grained topics, and topical keyword coherence.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129343052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic basis function construction for approximate dynamic programming and reinforcement learning","authors":"Philipp W. Keller, Shie Mannor, Doina Precup","doi":"10.1145/1143844.1143901","DOIUrl":"https://doi.org/10.1145/1143844.1143901","url":null,"abstract":"We address the problem of automatically constructing basis functions for linear approximation of the value function of a Markov Decision Process (MDP). Our work builds on results by Bertsekas and Castañon (1989) who proposed a method for automatically aggregating states to speed up value iteration. We propose to use neighborhood component analysis (Goldberger et al., 2005), a dimensionality reduction technique created for supervised learning, in order to map a high-dimensional state space to a low-dimensional space, based on the Bellman error, or on the temporal difference (TD) error. We then place basis function in the lower-dimensional space. These are added as new features for the linear function approximator. This approach is applied to a high-dimensional inventory control problem.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123190435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiclass boosting with repartitioning","authors":"Ling Li","doi":"10.1145/1143844.1143916","DOIUrl":"https://doi.org/10.1145/1143844.1143916","url":null,"abstract":"A multiclass classification problem can be reduced to a collection of binary problems with the aid of a coding matrix. The quality of the final solution, which is an ensemble of base classifiers learned on the binary problems, is affected by both the performance of the base learner and the error-correcting ability of the coding matrix. A coding matrix with strong error-correcting ability may not be overall optimal if the binary problems are too hard for the base learner. Thus a trade-off between error-correcting and base learning should be sought. In this paper, we propose a new multiclass boosting algorithm that modifies the coding matrix according to the learning ability of the base learner. We show experimentally that our algorithm is very efficient in optimizing the multiclass margin cost, and outperforms existing multiclass algorithms such as AdaBoost.ECC and one-vs-one. The improvement is especially significant when the base learner is not very powerful.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"191 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120891548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regression with the optimised combination technique","authors":"J. Garcke","doi":"10.1145/1143844.1143885","DOIUrl":"https://doi.org/10.1145/1143844.1143885","url":null,"abstract":"We consider the sparse grid combination technique for regression, which we regard as a problem of function reconstruction in some given function space. We use a regularised least squares approach, discretised by sparse grids and solved using the so-called combination technique, where a certain sequence of conventional grids is employed. The sparse grid solution is then obtained by addition of the partial solutions with combination co-efficients dependent on the involved grids. This approach shows instabilities in certain situations and is not guaranteed to converge with higher discretisation levels. In this article we apply the recently introduced optimised combination technique, which repairs these instabilities. Now the combination coefficients also depend on the function to be reconstructed, resulting in a non-linear approximation method which achieves very competitive results. We show that the computational complexity of the improved method still scales only linear in regard to the number of data.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125480370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}