Pratyush Pareek, Aaryan Bhardwaj, Sanskar Patro, Anirudh Arora, Muskan Deep Kaur Maini, Bagesh Kumar, O. P. Vyas
DOI: 10.1145/3549206.3549287
Published in: Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing
Publication date: 2022-08-04
Sample Reduction for Support Vector Data Description (SVDD) by Farthest Boundary Point Estimation (FBPE) using Gradients of Data Density
Classification is a quintessential application of machine learning, for which support vector machines (SVMs) are used ubiquitously because of their optimal margins and ease of use. However, they are rarely applied to large datasets because the time complexity of their training process is cubic in the number of samples. This has inspired several papers that reduce either the number of features or the number of training samples in order to shorten SVM training time. This paper proposes a novel approach for reducing the number of training samples for support vector data description (SVDD) while attempting to preserve maximal knowledge of the target class: it selects the most promising candidates for support vectors, namely the farthest boundary points of the data clusters. The proposed algorithm uses the density gradient across the data distribution to detect boundary points uniformly; these points are sampled as potential support vectors, allowing the SVM to be trained in less time without significant loss in accuracy. The proposed algorithm is verified via tests on Human Activity Recognition, Breast Cancer Detection, and Heart Disease Detection datasets.
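The core idea — keep only likely support-vector candidates near the cluster boundary and train an SVDD-style model on that reduced sample — can be sketched as follows. This is a hypothetical illustration, not the authors' FBPE implementation: it uses a simple inverse k-NN-distance density estimate and selects the lowest-density points as a stand-in for boundary points detected via density gradients; `select_boundary_points`, `k`, and `keep_frac` are illustrative names and parameters, and `OneClassSVM` is used as a readily available SVDD-like model.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import OneClassSVM

def select_boundary_points(X, k=10, keep_frac=0.2):
    """Return indices of likely boundary points (lowest local density)."""
    # k+1 neighbors because each point is its own nearest neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dists, _ = nn.kneighbors(X)
    # Local density proxy: inverse of mean distance to the k nearest neighbors.
    density = 1.0 / (dists[:, 1:].mean(axis=1) + 1e-12)
    n_keep = max(1, int(keep_frac * len(X)))
    # Boundary points sit in the lowest-density regions of the target class.
    return np.argsort(density)[:n_keep]

# Toy target class: a 2-D Gaussian blob.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))

idx = select_boundary_points(X)          # keep 20% of samples
svdd = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X[idx])
```

Training on `X[idx]` instead of `X` cuts the sample count fivefold here; since SVM training cost grows cubically with sample count, this is where the speedup in the paper's approach comes from.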