Fixed neighborhood sphere and pattern selection in SVDD
Dongyin Pan
2014 IEEE 7th Joint International Information Technology and Artificial Intelligence Conference, December 2014. DOI: 10.1109/ITAIC.2014.7065065
When a dataset is large, a subset of it must be selected to represent the original data. Many existing pattern selection methods are based on k-nearest neighbors (kNN), but the neighbors of a pattern are usually unevenly distributed. In this paper, we define a fixed neighborhood sphere: a pattern located near the boundary of the data distribution has fewer neighbors inside its sphere, while a pattern located in the interior of the distribution has more. By counting the neighbors within each pattern's fixed neighborhood sphere, we can find the patterns located near the boundary of the data distribution. In SVDD (Support Vector Data Description), these boundary patterns carry more information, since they are the patterns that would become support vectors. The FNSPS (fixed neighborhood sphere pattern selection) algorithm selects these boundary patterns. Experimental results show that the performance of SVDD does not deteriorate when it is trained on the selected patterns. Naively identifying the neighbors within the fixed neighborhood sphere takes O(n^2) time, which is lower than the O(n^3) time complexity of SVDD. If a lower threshold is also set, the FNSPS algorithm can additionally be used to remove noise from the target data.
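The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes Euclidean distance in the input space, and the function names `count_neighbors` and `fnsps_select` and the parameters `radius`, `upper`, and `lower` are hypothetical. A pattern whose sphere contains at most `upper` neighbors is kept as a likely boundary pattern (a support vector candidate), and a pattern with fewer than `lower` neighbors is discarded as likely noise. Neighbor counting uses the naive pairwise scan, so it runs in O(n^2) time.

```python
import math
from typing import List, Sequence

# A pattern is a point in input space given as a sequence of coordinates.
Point = Sequence[float]


def count_neighbors(points: Sequence[Point], radius: float) -> List[int]:
    """Count, for every pattern, the other patterns inside its fixed neighborhood sphere.

    Uses the naive pairwise scan, so it performs n*(n-1)/2 distance
    computations, i.e. O(n^2) time. A pattern is not its own neighbor.
    """
    n = len(points)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Euclidean distance in input space (an assumption of this sketch).
            if math.dist(points[i], points[j]) <= radius:
                # The relation is symmetric, so credit both patterns at once.
                counts[i] += 1
                counts[j] += 1
    return counts


def fnsps_select(points: Sequence[Point], radius: float, upper: int, lower: int = 0) -> List[int]:
    """Return indices of patterns whose neighbor count c satisfies lower <= c <= upper.

    Hypothetical thresholds: c <= upper keeps sparse-neighborhood (boundary)
    patterns as support vector candidates; c < lower drops isolated patterns
    as likely noise. lower=0 disables noise removal.
    """
    counts = count_neighbors(points, radius)
    return [i for i, c in enumerate(counts) if lower <= c <= upper]


if __name__ == "__main__":
    # Toy data: a 5x5 grid with unit spacing plus one isolated point.
    grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
    data = grid + [(10.0, 10.0)]
    selected = fnsps_select(data, radius=1.5, upper=5, lower=1)
    # Edge points have 3 or 5 neighbors, interior points 8, the isolated point 0.
    print(len(selected))  # 16
```

In the toy run, the 16 edge points of the grid are selected, the 9 interior points are dropped because their spheres are crowded, and the isolated point is dropped by the lower threshold. The subset would then be passed to an SVDD solver, which is not shown here. How the radius and thresholds should be chosen is not specified in the abstract; the values above are illustrative only.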