{"title":"Generalized spectral bounds for sparse LDA","authors":"B. Moghaddam, Yair Weiss, S. Avidan","doi":"10.1145/1143844.1143925","DOIUrl":"https://doi.org/10.1145/1143844.1143925","url":null,"abstract":"We present a discrete spectral framework for the sparse or cardinality-constrained solution of a generalized Rayleigh quotient. This NP-hard combinatorial optimization problem is central to supervised learning tasks such as sparse LDA, feature selection and relevance ranking for classification. We derive a new generalized form of the Inclusion Principle for variational eigenvalue bounds, leading to exact and optimal sparse linear discriminants using branch-and-bound search. An efficient greedy (approximate) technique is also presented. The generalization performance of our sparse LDA algorithms is demonstrated with real-world UCI ML benchmarks and compared to a leading SVM-based gene selection algorithm for cancer classification.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"56 56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126350326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient MAP approximation for dense energy functions","authors":"Marius Leordeanu, M. Hebert","doi":"10.1145/1143844.1143913","DOIUrl":"https://doi.org/10.1145/1143844.1143913","url":null,"abstract":"We present an efficient method for maximizing energy functions with first and second order potentials, suitable for MAP labeling estimation problems that arise in undirected graphical models. Our approach is to relax the integer constraints on the solution in two steps. First we efficiently obtain the relaxed global optimum following a procedure similar to the iterative power method for finding the largest eigenvector of a matrix. Next, we map the relaxed optimum on a simplex and show that the new energy obtained has a certain optimal bound. Starting from this energy we follow an efficient coordinate ascent procedure that is guaranteed to increase the energy at every step and converge to a solution that obeys the initial integral constraints. We also present a sufficient condition for ascent procedures that guarantees the increase in energy at every step.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134014825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Categorization in multiple category systems","authors":"J. Renders, Éric Gaussier, Cyril Goutte, F. Pacull, G. Csurka","doi":"10.1145/1143844.1143938","DOIUrl":"https://doi.org/10.1145/1143844.1143938","url":null,"abstract":"We explore the situation in which documents have to be categorized into more than one category system, a situation we refer to as multiple-view categorization. More particularly, we address the case where two different categorizers have already been built based on non-necessarily identical training sets, each one labeled using one category system. On the top of these categorizers considered as black-boxes, we propose some algorithms able to exploit a third training set containing a few examples annotated in both category systems. Such a situation arises for example in large companies where incoming mails have to be routed to several departments, each one relying on its own category system. We focus here on exploiting possible dependencies between category systems in order to refine the categorization decisions made by categorizers trained independently on different category systems. After a description of the multiple categorization problem, we present several possible solutions, based either on a categorization or reweighting approach, and compare them on real data. Lastly, we show how the multimedia categorization problem can be cast as a multiple categorization problem and assess our methods in this framework.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"169 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129395553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local distance preservation in the GP-LVM through back constraints","authors":"Neil D. Lawrence, J. Q. Candela","doi":"10.1145/1143844.1143909","DOIUrl":"https://doi.org/10.1145/1143844.1143909","url":null,"abstract":"The Gaussian process latent variable model (GP-LVM) is a generative approach to nonlinear low dimensional embedding, that provides a smooth probabilistic mapping from latent to data space. It is also a non-linear generalization of probabilistic PCA (PPCA) (Tipping & Bishop, 1999). While most approaches to non-linear dimensionality methods focus on preserving local distances in data space, the GP-LVM focusses on exactly the opposite. Being a smooth mapping from latent to data space, it focusses on keeping things apart in latent space that are far apart in data space. In this paper we first provide an overview of dimensionality reduction techniques, placing the emphasis on the kind of distance relation preserved. We then show how the GP-LVM can be generalized, through back constraints, to additionally preserve local distances. We give illustrative experiments on common data sets.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130502166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discriminative unsupervised learning of structured predictors","authors":"Linli Xu, Dana F. Wilkinson, F. Southey, Dale Schuurmans","doi":"10.1145/1143844.1143977","DOIUrl":"https://doi.org/10.1145/1143844.1143977","url":null,"abstract":"We present a new unsupervised algorithm for training structured predictors that is discriminative, convex, and avoids the use of EM. The idea is to formulate an unsupervised version of structured learning methods, such as maximum margin Markov networks, that can be trained via semidefinite programming. The result is a discriminative training criterion for structured predictors (like hidden Markov models) that remains unsupervised and does not create local minima. To reduce training cost, we reformulate the training procedure to mitigate the dependence on semidefinite programming, and finally propose a heuristic procedure that avoids semidefinite programming entirely. Experimental results show that the convex discriminative procedure can produce better conditional models than conventional Baum-Welch (EM) training.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123997286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Practical solutions to the problem of diagonal dominance in kernel document clustering","authors":"Derek Greene, P. Cunningham","doi":"10.1145/1143844.1143892","DOIUrl":"https://doi.org/10.1145/1143844.1143892","url":null,"abstract":"In supervised kernel methods, it has been observed that the performance of the SVM classifier is poor in cases where the diagonal entries of the Gram matrix are large relative to the off-diagonal entries. This problem, referred to as diagonal dominance, often occurs when certain kernel functions are applied to sparse high-dimensional data, such as text corpora. In this paper we investigate the implications of diagonal dominance for unsupervised kernel methods, specifically in the task of document clustering. We propose a selection of strategies for addressing this issue, and evaluate their effectiveness in producing more accurate and stable clusterings.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116527870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical debugging: simultaneous identification of multiple bugs","authors":"A. Zheng, Michael I. Jordan, B. Liblit, M. Naik, A. Aiken","doi":"10.1145/1143844.1143983","DOIUrl":"https://doi.org/10.1145/1143844.1143983","url":null,"abstract":"We describe a statistical approach to software debugging in the presence of multiple bugs. Due to sparse sampling issues and complex interaction between program predicates, many generic off-the-shelf algorithms fail to select useful bug predictors. Taking inspiration from bi-clustering algorithms, we propose an iterative collective voting scheme for the program runs and predicates. We demonstrate successful debugging results on several real world programs and a large debugging benchmark suite.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128344488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Euclidean embedding","authors":"Lawrence Cayton, S. Dasgupta","doi":"10.1145/1143844.1143866","DOIUrl":"https://doi.org/10.1145/1143844.1143866","url":null,"abstract":"We derive a robust Euclidean embedding procedure based on semidefinite programming that may be used in place of the popular classical multidimensional scaling (cMDS) algorithm. We motivate this algorithm by arguing that cMDS is not particularly robust and has several other deficiencies. General-purpose semidefinite programming solvers are too memory intensive for medium to large sized applications, so we also describe a fast subgradient-based implementation of the robust algorithm. Additionally, since cMDS is often used for dimensionality reduction, we provide an in-depth look at reducing dimensionality with embedding procedures. In particular, we show that it is NP-hard to find optimal low-dimensional embeddings under a variety of cost functions.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124715131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A regularization framework for multiple-instance learning","authors":"Pak-Ming Cheung, J. Kwok","doi":"10.1145/1143844.1143869","DOIUrl":"https://doi.org/10.1145/1143844.1143869","url":null,"abstract":"This paper focuses on kernel methods for multi-instance learning. Existing methods require the prediction of the bag to be identical to the maximum of those of its individual instances. However, this is too restrictive as only the sign is important in classification. In this paper, we provide a more complete regularization framework for MI learning by allowing the use of different loss functions between the outputs of a bag and its associated instances. This is especially important as we generalize this for multi-instance regression. Moreover, both bag and instance information can now be directly used in the optimization. Instead of using heuristics to solve the resultant non-linear optimization problem, we use the constrained concave-convex procedure which has well-studied convergence properties. Experiments on both classification and regression data sets show that the proposed method leads to improved performance.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125877486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Totally corrective boosting algorithms that maximize the margin","authors":"Manfred K. Warmuth, Jun Liao, G. Rätsch","doi":"10.1145/1143844.1143970","DOIUrl":"https://doi.org/10.1145/1143844.1143970","url":null,"abstract":"We consider boosting algorithms that maintain a distribution over a set of examples. At each iteration a weak hypothesis is received and the distribution is updated. We motivate these updates as minimizing the relative entropy subject to linear constraints. For example AdaBoost constrains the edge of the last hypothesis w.r.t. the updated distribution to be at most γ = 0. In some sense, AdaBoost is \"corrective\" w.r.t. the last hypothesis. A cleaner boosting method is to be \"totally corrective\": the edges of all past hypotheses are constrained to be at most γ, where γ is suitably adapted.Using new techniques, we prove the same iteration bounds for the totally corrective algorithms as for their corrective versions. Moreover with adaptive γ, the algorithms provably maximizes the margin. Experimentally, the totally corrective versions return smaller convex combinations of weak hypotheses than the corrective ones and are competitive with LPBoost, a totally corrective boosting algorithm with no regularization, for which there is no iteration bound known.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124406406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}