Using pivots to index for support vector machine queries

Arun Qamra, E. Chang
{"title":"使用枢轴索引支持向量机查询","authors":"Arun Qamra, E. Chang","doi":"10.1145/1160939.1160954","DOIUrl":null,"url":null,"abstract":"In many data-mining applications, Support Vector Machines are used to learn query concepts, and then the learned SVM is used to find the corresponding best matches in a given dataset. When the dataset is large, naively scanning the entire dataset to find the instances with the highest classification scores is not practical. An indexing strategy is thus desirable for scalability. In contrast to queries in traditional similarity search scenarios which are in the form of an input space point, SVM queries are hyperplanes in a (kernel function induced) feature space, and the best matches are instances farthest from the hyperplane. Also, the kernel parameters used, and hence the feature space used, may vary with the query. These issues make the problem challenging. In this work, we propose an indexing strategy that uses pivots (selected using PCA or KPCA) to prune irrelevant instances from the dataset, and zoom in on a smaller candidate set, to efficiently answer SVM queries.","PeriodicalId":346313,"journal":{"name":"Computer Vision meets Databases","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Using pivots to index for support vector machine queries\",\"authors\":\"Arun Qamra, E. Chang\",\"doi\":\"10.1145/1160939.1160954\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In many data-mining applications, Support Vector Machines are used to learn query concepts, and then the learned SVM is used to find the corresponding best matches in a given dataset. When the dataset is large, naively scanning the entire dataset to find the instances with the highest classification scores is not practical. An indexing strategy is thus desirable for scalability. In contrast to queries in traditional similarity search scenarios which are in the form of an input space point, SVM queries are hyperplanes in a (kernel function induced) feature space, and the best matches are instances farthest from the hyperplane. Also, the kernel parameters used, and hence the feature space used, may vary with the query. These issues make the problem challenging. 
In this work, we propose an indexing strategy that uses pivots (selected using PCA or KPCA) to prune irrelevant instances from the dataset, and zoom in on a smaller candidate set, to efficiently answer SVM queries.\",\"PeriodicalId\":346313,\"journal\":{\"name\":\"Computer Vision meets Databases\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-06-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Vision meets Databases\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1160939.1160954\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision meets Databases","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1160939.1160954","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2

Abstract

In many data-mining applications, Support Vector Machines are used to learn query concepts, and then the learned SVM is used to find the corresponding best matches in a given dataset. When the dataset is large, naively scanning the entire dataset to find the instances with the highest classification scores is not practical. An indexing strategy is thus desirable for scalability. In contrast to queries in traditional similarity search scenarios which are in the form of an input space point, SVM queries are hyperplanes in a (kernel function induced) feature space, and the best matches are instances farthest from the hyperplane. Also, the kernel parameters used, and hence the feature space used, may vary with the query. These issues make the problem challenging. In this work, we propose an indexing strategy that uses pivots (selected using PCA or KPCA) to prune irrelevant instances from the dataset, and zoom in on a smaller candidate set, to efficiently answer SVM queries.
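The abstract describes pruning with pivots chosen by PCA or KPCA and then scoring only a small candidate set. As a loose illustration of that idea (not the paper's actual indexing algorithm), the following Python sketch uses scikit-learn, restricts itself to a linear-kernel SVM, treats the top PCA directions as pivots, ranks instances by a cheap surrogate score computed from their pivot coordinates, and evaluates exact decision scores only on the shortlisted candidates. The dataset, the labels, and the `candidate_factor` knob are all hypothetical and chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 32))               # dataset to be indexed (synthetic, illustrative)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # toy labels standing in for a query concept

# Offline step: build the pivot index once, using top PCA directions as pivots.
n_pivots = 4
pca = PCA(n_components=n_pivots).fit(X)
pivot_proj = pca.transform(X)                   # each instance summarized by a few pivot coordinates

# Online step: an SVM query arrives (here, a linear SVM fit on a small labeled sample).
sample = rng.choice(len(X), size=200, replace=False)
svm = SVC(kernel="linear").fit(X[sample], y[sample])

# Approximate each instance's decision score from its pivot coordinates alone,
# by expressing the SVM's weight vector in the pivot subspace.
w = svm.coef_.ravel()                           # available because the kernel is linear
w_pivot = pca.components_ @ w                   # weight vector in pivot coordinates
approx_score = pivot_proj @ w_pivot             # cheap, approximate surrogate used only for ranking

# Prune: keep a small candidate set ranked by the surrogate score ("zoom in"),
# then compute exact decision scores on that candidate set only.
k, candidate_factor = 10, 20                    # candidate_factor is an illustrative tuning knob
candidates = np.argsort(-approx_score)[: k * candidate_factor]
exact_score = svm.decision_function(X[candidates])
top_k = candidates[np.argsort(-exact_score)[:k]]

print("Top-k instances farthest from the hyperplane (positive side):", top_k)
```

The pruning here relies on the pivot subspace capturing most of the query hyperplane's weight vector; the paper's setting is more general, since kernel-induced feature spaces (and hence KPCA-selected pivots) vary with the query's kernel parameters.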