{"title":"使用枢轴索引支持向量机查询","authors":"Arun Qamra, E. Chang","doi":"10.1145/1160939.1160954","DOIUrl":null,"url":null,"abstract":"In many data-mining applications, Support Vector Machines are used to learn query concepts, and then the learned SVM is used to find the corresponding best matches in a given dataset. When the dataset is large, naively scanning the entire dataset to find the instances with the highest classification scores is not practical. An indexing strategy is thus desirable for scalability. In contrast to queries in traditional similarity search scenarios which are in the form of an input space point, SVM queries are hyperplanes in a (kernel function induced) feature space, and the best matches are instances farthest from the hyperplane. Also, the kernel parameters used, and hence the feature space used, may vary with the query. These issues make the problem challenging. In this work, we propose an indexing strategy that uses pivots (selected using PCA or KPCA) to prune irrelevant instances from the dataset, and zoom in on a smaller candidate set, to efficiently answer SVM queries.","PeriodicalId":346313,"journal":{"name":"Computer Vision meets Databases","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Using pivots to index for support vector machine queries\",\"authors\":\"Arun Qamra, E. Chang\",\"doi\":\"10.1145/1160939.1160954\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In many data-mining applications, Support Vector Machines are used to learn query concepts, and then the learned SVM is used to find the corresponding best matches in a given dataset. When the dataset is large, naively scanning the entire dataset to find the instances with the highest classification scores is not practical. An indexing strategy is thus desirable for scalability. In contrast to queries in traditional similarity search scenarios which are in the form of an input space point, SVM queries are hyperplanes in a (kernel function induced) feature space, and the best matches are instances farthest from the hyperplane. Also, the kernel parameters used, and hence the feature space used, may vary with the query. These issues make the problem challenging. 
In this work, we propose an indexing strategy that uses pivots (selected using PCA or KPCA) to prune irrelevant instances from the dataset, and zoom in on a smaller candidate set, to efficiently answer SVM queries.\",\"PeriodicalId\":346313,\"journal\":{\"name\":\"Computer Vision meets Databases\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-06-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Vision meets Databases\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1160939.1160954\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision meets Databases","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1160939.1160954","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Using pivots to index for support vector machine queries
In many data-mining applications, Support Vector Machines (SVMs) are used to learn query concepts, and the learned SVM is then used to find the best matches in a given dataset. When the dataset is large, naively scanning it to find the instances with the highest classification scores is impractical, so an indexing strategy is needed for scalability. Unlike queries in traditional similarity search, which take the form of a point in the input space, SVM queries are hyperplanes in a (kernel-induced) feature space, and the best matches are the instances farthest from the hyperplane. Moreover, the kernel parameters, and hence the feature space, may vary from query to query. These issues make the problem challenging. In this work, we propose an indexing strategy that uses pivots (selected with PCA or KPCA) to prune irrelevant instances and zoom in on a smaller candidate set, so that SVM queries can be answered efficiently.
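The pruning idea described in the abstract can be illustrated with a small sketch. The code below is an assumption-laden approximation, not the authors' algorithm: it uses scikit-learn's PCA to pick pivot directions, caches the projections of all instances offline, and at query time ranks instances by a cheap pivot-space score before computing exact SVM decision values on a shortlist. It assumes a linear kernel so the hyperplane normal can be projected onto the pivots directly; the paper's KPCA-based pivots are what handle kernel-induced feature spaces. The function name top_k_svm_matches and all parameter values are hypothetical.

```python
# Illustrative sketch only (not the paper's exact method): prune with PCA
# "pivot" projections, then score a small candidate set with the SVM.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 32))          # hypothetical indexed dataset

# Offline indexing step: choose pivot directions and cache projections.
pca = PCA(n_components=4).fit(X)
proj = pca.transform(X)                    # shape (n_samples, n_pivots)

def top_k_svm_matches(svm, k=10, candidate_factor=20):
    """Approximate top-k instances farthest from the SVM hyperplane
    (highest decision scores), using cached pivot projections to prune."""
    # Project the hyperplane normal onto the pivot directions
    # (valid for a linear kernel; KPCA pivots would cover kernel queries).
    w = svm.coef_.ravel()
    w_piv = pca.components_ @ w            # normal expressed in pivot space
    # Cheap pivot-space score to shortlist candidates.
    rough = proj @ w_piv
    cand = np.argsort(rough)[-k * candidate_factor:]
    # Exact SVM scores only for the shortlisted instances.
    scores = svm.decision_function(X[cand])
    return cand[np.argsort(scores)[-k:][::-1]]

# Query: an SVM learned from a few labeled examples of the target concept.
y_toy = (X[:, 0] + 0.5 * X[:, 1] > 1.0).astype(int)
svm = SVC(kernel="linear").fit(X[:200], y_toy[:200])
print(top_k_svm_matches(svm, k=5))
```

In this sketch, candidate_factor trades recall for speed: a larger shortlist makes it more likely that the true top-k instances survive the pruning step, at the cost of more exact decision-function evaluations.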