{"title":"使用枢轴索引支持向量机查询","authors":"Arun Qamra, E. Chang","doi":"10.1145/1160939.1160954","DOIUrl":null,"url":null,"abstract":"In many data-mining applications, Support Vector Machines are used to learn query concepts, and then the learned SVM is used to find the corresponding best matches in a given dataset. When the dataset is large, naively scanning the entire dataset to find the instances with the highest classification scores is not practical. An indexing strategy is thus desirable for scalability. In contrast to queries in traditional similarity search scenarios which are in the form of an input space point, SVM queries are hyperplanes in a (kernel function induced) feature space, and the best matches are instances farthest from the hyperplane. Also, the kernel parameters used, and hence the feature space used, may vary with the query. These issues make the problem challenging. In this work, we propose an indexing strategy that uses pivots (selected using PCA or KPCA) to prune irrelevant instances from the dataset, and zoom in on a smaller candidate set, to efficiently answer SVM queries.","PeriodicalId":346313,"journal":{"name":"Computer Vision meets Databases","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Using pivots to index for support vector machine queries\",\"authors\":\"Arun Qamra, E. Chang\",\"doi\":\"10.1145/1160939.1160954\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In many data-mining applications, Support Vector Machines are used to learn query concepts, and then the learned SVM is used to find the corresponding best matches in a given dataset. When the dataset is large, naively scanning the entire dataset to find the instances with the highest classification scores is not practical. An indexing strategy is thus desirable for scalability. In contrast to queries in traditional similarity search scenarios which are in the form of an input space point, SVM queries are hyperplanes in a (kernel function induced) feature space, and the best matches are instances farthest from the hyperplane. Also, the kernel parameters used, and hence the feature space used, may vary with the query. These issues make the problem challenging. 
In this work, we propose an indexing strategy that uses pivots (selected using PCA or KPCA) to prune irrelevant instances from the dataset, and zoom in on a smaller candidate set, to efficiently answer SVM queries.\",\"PeriodicalId\":346313,\"journal\":{\"name\":\"Computer Vision meets Databases\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-06-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Vision meets Databases\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1160939.1160954\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision meets Databases","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1160939.1160954","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Using pivots to index for support vector machine queries
In many data-mining applications, Support Vector Machines (SVMs) are used to learn query concepts, and the learned SVM is then used to find the best matches in a given dataset. When the dataset is large, naively scanning it to find the instances with the highest classification scores is impractical, so an indexing strategy is needed for scalability. Unlike queries in traditional similarity search, which take the form of a point in the input space, SVM queries are hyperplanes in a (kernel-induced) feature space, and the best matches are the instances farthest from the hyperplane. Moreover, the kernel parameters, and hence the feature space, may vary from query to query. These issues make the problem challenging. In this work, we propose an indexing strategy that uses pivots (selected with PCA or KPCA) to prune irrelevant instances and zoom in on a smaller candidate set, so that SVM queries can be answered efficiently.
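The pruning idea described in the abstract can be illustrated with a small sketch. The code below is an assumption-laden approximation, not the authors' algorithm: it uses scikit-learn's PCA to pick pivot directions, caches the projections of all instances offline, and at query time ranks instances by a cheap pivot-space score before computing exact SVM decision values on a shortlist. It assumes a linear kernel so the hyperplane normal can be projected onto the pivots directly; the paper's KPCA-based pivots are what handle kernel-induced feature spaces. The function name top_k_svm_matches and all parameter values are hypothetical.

```python
# Illustrative sketch only (not the paper's exact method): prune with PCA
# "pivot" projections, then score a small candidate set with the SVM.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 32))          # hypothetical indexed dataset

# Offline indexing step: choose pivot directions and cache projections.
pca = PCA(n_components=4).fit(X)
proj = pca.transform(X)                    # shape (n_samples, n_pivots)

def top_k_svm_matches(svm, k=10, candidate_factor=20):
    """Approximate top-k instances farthest from the SVM hyperplane
    (highest decision scores), using cached pivot projections to prune."""
    # Project the hyperplane normal onto the pivot directions
    # (valid for a linear kernel; KPCA pivots would cover kernel queries).
    w = svm.coef_.ravel()
    w_piv = pca.components_ @ w            # normal expressed in pivot space
    # Cheap pivot-space score to shortlist candidates.
    rough = proj @ w_piv
    cand = np.argsort(rough)[-k * candidate_factor:]
    # Exact SVM scores only for the shortlisted instances.
    scores = svm.decision_function(X[cand])
    return cand[np.argsort(scores)[-k:][::-1]]

# Query: an SVM learned from a few labeled examples of the target concept.
y_toy = (X[:, 0] + 0.5 * X[:, 1] > 1.0).astype(int)
svm = SVC(kernel="linear").fit(X[:200], y_toy[:200])
print(top_k_svm_matches(svm, k=5))
```

In this sketch, candidate_factor trades recall for speed: a larger shortlist makes it more likely that the true top-k instances survive the pruning step, at the cost of more exact decision-function evaluations.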