K-means based on active learning for support vector machine
J. Gan, Ang Li, Qian-Lin Lei, Hao Ren, Yun Yang
2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS), 2017-05-24
DOI: 10.1109/ICIS.2017.7960089
In practice, unlabeled data can be collected cheaply and easily from the target domain, whereas obtaining a large amount of labeled data is difficult and expensive. How to use both labeled and unlabeled data to improve learning performance has therefore become a critical issue for many real-world applications. Active learning and semi-supervised learning are two natural answers to this problem and have been studied intensively from different perspectives. Active learning assumes that the learner has access to the entire dataset and can actively choose which instances of the target dataset to query for labels, whereas semi-supervised learning tries to improve the learner's performance by using labeled and unlabeled instances together. In this paper, we propose an active-learning-based SVM approach, KA-SVM. Following the cluster hypothesis, we use k-means to build a pre-selection scheme that picks a subset of important instances as the training set, so that the SVM is trained on this subset rather than on the entire dataset. We evaluate our approach on several benchmark datasets and compare it with similar approaches; the experimental results show that it performs outstandingly in both classification accuracy and computational efficiency.
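The abstract does not say how instances are ranked inside each cluster, so the following is only a minimal sketch of one possible reading of the k-means pre-selection idea, not the authors' KA-SVM. It assumes the cluster hypothesis (points in the same cluster tend to share a label) and assumes that the `per_cluster` points nearest each centroid are the ones whose labels get queried. The names `preselect`, `per_cluster`, `oracle` and `ka_svm_sketch` are hypothetical. The classifier is a plain linear SVM trained with Pegasos-style subgradient descent, swapped in so the example needs only the Python standard library; the paper's kernel, solver and selection rule may all differ.

```python
import random


def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: _sqdist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:  # keep the old centroid if the cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def preselect(points, k, per_cluster=3, seed=0):
    """Hypothetical rule: indices of the points nearest each k-means centroid."""
    centroids, assign = kmeans(points, k, seed=seed)
    chosen = []
    for c in range(k):
        members = [i for i in range(len(points)) if assign[i] == c]
        members.sort(key=lambda i: _sqdist(points[i], centroids[c]))
        chosen.extend(members[:per_cluster])
    return chosen


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    Labels must be +1/-1. The bias is a constant extra feature, so it is
    regularized too (a simplification).
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * _dot(w, Xb[i]) < 1  # margin check with the old w
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer shrink step
            if violates:  # hinge-loss subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, x):
    return 1 if _dot(w, list(x) + [1.0]) >= 0 else -1


def ka_svm_sketch(points, oracle, k=6, per_cluster=3):
    """Query labels only for pre-selected points, then train on that subset."""
    chosen = preselect(points, k, per_cluster)
    X = [points[i] for i in chosen]
    y = [oracle(i) for i in chosen]  # the only labels ever requested
    return train_linear_svm(X, y), chosen


def make_blobs(n_per_class=100, seed=42):
    """Synthetic toy data: two Gaussian blobs labeled +1 and -1."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, (cx, cy) in ((+1, (2.0, 2.0)), (-1, (-2.0, -2.0))):
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
            labels.append(label)
    return points, labels


if __name__ == "__main__":
    pts, true_labels = make_blobs()
    w, chosen = ka_svm_sketch(pts, oracle=lambda i: true_labels[i])
    acc = sum(predict(w, p) == t for p, t in zip(pts, true_labels)) / len(pts)
    print(f"queried {len(chosen)} of {len(pts)} labels, accuracy {acc:.3f}")
```

Here the SVM sees at most `k * per_cluster` labeled points instead of the whole pool. That reduction is the source of the efficiency gain the abstract describes. A different selection rule would only change the sort in `preselect`. One example is also keeping the points farthest from each centroid, which tend to sit near cluster borders where support vectors usually lie.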