A Feature Selection Algorithm of Dynamic Data-Stream Based on Hoeffding Inequality
Authors: Zhichao Yin, Chunyong Yin, Lu Feng
Venue: 2015 4th International Conference on Advanced Information Technology and Sensor Application (AITS)
Published: 2015-08-01
DOI: 10.1109/AITS.2015.32 (https://doi.org/10.1109/AITS.2015.32)
Citations: 0
Abstract
With the rapid development of the Internet, data mining is being applied to Internet data more and more widely. However, the complex features of these data sources make the mining process very inefficient. Feature selection is therefore essential to making data mining simpler and more efficient. This paper proposes a new metric based on mutual information that measures the degree of correlation among the features within the collection, and uses the Hoeffding inequality to construct the HSF algorithm. HSF is compared with BIF (a feature selection algorithm based on mutual information), with the C4.5 classification algorithm used as the test classifier in the experiments. The experiments show that HSF outperforms BIF [1] in classification accuracy and error rate.
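The abstract names two ingredients, mutual information and the Hoeffding inequality, without giving the formulas. The following is a minimal Python sketch of the standard textbook forms of both, not the paper's HSF algorithm or its new metric. The function names `hoeffding_bound`, `mutual_information`, and `confident_best`, and the rule of accepting the top-scoring feature once its lead exceeds the bound (a rule borrowed from Hoeffding-tree style stream learners), are assumptions made for illustration.

```python
import math
from collections import Counter


def hoeffding_bound(value_range, delta, n):
    """Standard Hoeffding epsilon: with probability at least 1 - delta, the
    mean of n samples of a variable with the given range lies within epsilon
    of its true mean (one-sided form)."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def mutual_information(xs, ys):
    """Empirical mutual information, in bits, of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def confident_best(scores, epsilon):
    """Hypothetical decision rule (an assumption, not taken from the paper):
    return the top-scoring feature name if it leads the runner-up by more
    than epsilon, otherwise None to signal that more samples are needed."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0]
    (best, s1), (_, s2) = ranked[0], ranked[1]
    return best if s1 - s2 > epsilon else None


if __name__ == "__main__":
    labels = [0, 1, 0, 1, 0, 1, 0, 1]
    features = {
        "f_copy": [0, 1, 0, 1, 0, 1, 0, 1],   # identical to the labels
        "f_noise": [0, 0, 1, 1, 0, 0, 1, 1],  # independent of the labels
    }
    scores = {name: mutual_information(col, labels)
              for name, col in features.items()}
    # With binary labels, mutual information is at most 1 bit, so range = 1.
    eps = hoeffding_bound(value_range=1.0, delta=0.05, n=len(labels))
    print(scores, eps, confident_best(scores, eps))
```

In a streaming setting the bound shrinks as more samples arrive, so a decision that is deferred on a small window can become confident later without revisiting old data.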