{"title":"高维数据分类的随机投影与随机特征选择","authors":"Sachin Mylavarapu, A. Kabán","doi":"10.1109/UKCI.2013.6651321","DOIUrl":null,"url":null,"abstract":"Random projections and random subspace methods are very simple and computationally efficient techniques to reduce dimensionality for learning from high dimensional data. Since high dimensional data tends to be prevalent in many domains, such techniques are the subject of much recent interest. Random projections (RP) are motivated by their proven ability to preserve inter-point distances. By contrary, the random selection of features (RF) appears to be a heuristic, which nevertheless exhibits good performance in previous studies. In this paper we conduct a thorough empirical comparison between these two approaches in a variety of data sets with different characteristics. We also extend our study to multi-class problems. We find that RP tends to perform better than RF in terms of the classification accuracy in small sample settings, although RF is surprisingly good as well in many cases.","PeriodicalId":106191,"journal":{"name":"2013 13th UK Workshop on Computational Intelligence (UKCI)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Random projections versus random selection of features for classification of high dimensional data\",\"authors\":\"Sachin Mylavarapu, A. Kabán\",\"doi\":\"10.1109/UKCI.2013.6651321\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Random projections and random subspace methods are very simple and computationally efficient techniques to reduce dimensionality for learning from high dimensional data. Since high dimensional data tends to be prevalent in many domains, such techniques are the subject of much recent interest. Random projections (RP) are motivated by their proven ability to preserve inter-point distances. By contrary, the random selection of features (RF) appears to be a heuristic, which nevertheless exhibits good performance in previous studies. In this paper we conduct a thorough empirical comparison between these two approaches in a variety of data sets with different characteristics. We also extend our study to multi-class problems. 
We find that RP tends to perform better than RF in terms of the classification accuracy in small sample settings, although RF is surprisingly good as well in many cases.\",\"PeriodicalId\":106191,\"journal\":{\"name\":\"2013 13th UK Workshop on Computational Intelligence (UKCI)\",\"volume\":\"46 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-10-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 13th UK Workshop on Computational Intelligence (UKCI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/UKCI.2013.6651321\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 13th UK Workshop on Computational Intelligence (UKCI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/UKCI.2013.6651321","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Random projections versus random selection of features for classification of high dimensional data
Random projections and random subspace methods are simple, computationally efficient techniques for reducing dimensionality when learning from high dimensional data. Since high dimensional data is prevalent in many domains, such techniques have attracted much recent interest. Random projections (RP) are motivated by their proven ability to preserve inter-point distances. By contrast, random selection of features (RF) appears to be a heuristic, yet it has exhibited good performance in previous studies. In this paper we conduct a thorough empirical comparison of these two approaches on a variety of data sets with different characteristics, and we extend the study to multi-class problems. We find that RP tends to outperform RF in classification accuracy in small-sample settings, although RF is surprisingly good in many cases as well.
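To illustrate the two reduction schemes being compared, the following NumPy sketch maps a data matrix from d dimensions down to k, once by a Gaussian random projection and once by random selection of features. It is a minimal sketch, not the authors' experimental code; the function names, the choice of a Gaussian projection matrix, and the toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_projection(X, k):
    """Project the d-dimensional rows of X onto k random Gaussian directions.

    Entries of R are drawn N(0, 1/k), so squared inter-point distances are
    preserved in expectation (the property that motivates RP). Illustrative only.
    """
    d = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    return X @ R

def random_feature_selection(X, k):
    """Keep k original coordinates chosen uniformly at random without replacement (RF)."""
    idx = rng.choice(X.shape[1], size=k, replace=False)
    return X[:, idx]

# Toy high-dimensional data: 50 samples, 1000 features (hypothetical sizes).
X = rng.normal(size=(50, 1000))
X_rp = random_projection(X, 20)
X_rf = random_feature_selection(X, 20)
print(X_rp.shape, X_rf.shape)  # both (50, 20)
```

Either reduced matrix would then be fed to an ordinary classifier; the paper's comparison concerns which reduction better preserves the information needed for accurate classification, particularly when the number of samples is small.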