Study on feature selection algorithm in topic tracking
Shengdong Li, Xueqiang Lv, Yuqin Li, Shuicai Shi
The 2nd International Conference on Software Engineering and Data Mining, 2010-06-23
DOI: https://doi.org/10.4156/IJIPM.VOL1.ISSUE1.3
Citations: 11
Abstract
Text classification is the key technology for topic tracking, and the vector space model (VSM) is one of the simplest and most effective models for topic representation. Feature selection in the VSM is an important data pre-processing step: it reduces the dimension of the vector space and improves the generalization ability of the algorithm, so feature selection algorithms merit in-depth and extensive study. We therefore study how the feature space dimension and the choice of feature selection algorithm affect topic tracking. We derive how tracking performance varies with each factor and summarize their optimal values for topic tracking. Finally, evaluation with the TDT methods shows that the optimal topic tracking performance obtained with weight of evidence for text is 8.762% higher than that obtained with mutual information.
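
The two feature selection criteria compared above can be illustrated with a minimal sketch. The abstract does not give the formulas, so this assumes the commonly cited definitions: mutual information MI(t, c) = log(P(t, c) / (P(t) P(c))), taken as the maximum over classes, and weight of evidence for text WET(t) = P(t) Σ_c P(c) |log(odds(c | t) / odds(c))|. The function names, the `eps` smoothing constant, and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def _clamp(p, eps):
    """Keep a probability strictly inside (0, 1) so logs and odds stay finite."""
    return min(max(p, eps), 1.0 - eps)


def score_terms(docs, labels, eps=1e-6):
    """Return (mi, wet): per-term scores estimated from document frequencies.

    docs   -- list of token lists, one per document
    labels -- list of class (topic) labels, aligned with docs
    eps    -- smoothing constant (an assumption of this sketch)
    """
    n = len(docs)
    classes = sorted(set(labels))
    class_count = Counter(labels)
    df = Counter()    # term -> number of documents containing it
    df_c = Counter()  # (term, class) -> number of documents of that class containing it
    for tokens, c in zip(docs, labels):
        for t in set(tokens):
            df[t] += 1
            df_c[(t, c)] += 1

    mi, wet = {}, {}
    for t, n_t in df.items():
        p_t = n_t / n
        best_mi = -math.inf
        wet_sum = 0.0
        for c in classes:
            p_c = class_count[c] / n
            p_tc = df_c[(t, c)] / n
            # MI(t, c) = log(P(t, c) / (P(t) P(c))); keep the maximum over classes.
            best_mi = max(best_mi, math.log((p_tc + eps) / (p_t * p_c)))
            # WET term: P(c) * |log(odds(c | t) / odds(c))|.
            q = _clamp(df_c[(t, c)] / n_t, eps)
            r = _clamp(p_c, eps)
            wet_sum += p_c * abs(math.log(q * (1.0 - r) / (r * (1.0 - q))))
        mi[t] = best_mi
        wet[t] = p_t * wet_sum
    return mi, wet


def select_top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (hypothetical): two topics, four documents.
docs = [
    ["the", "goal", "match"],
    ["the", "goal", "team"],
    ["the", "stock", "market"],
    ["the", "stock", "bank"],
]
labels = ["sports", "sports", "finance", "finance"]

mi, wet = score_terms(docs, labels)
print("top terms by WET:", select_top_k(wet, 2))
print("top terms by MI: ", select_top_k(mi, 2))
```

In this sketch the parameter `k` of `select_top_k` plays the role of the feature space dimension: varying `k` and the scoring function is the kind of comparison the abstract describes, although the paper's actual experimental setup may differ. A term such as "the", which occurs equally in every topic, scores zero under WET because observing it does not change the odds of any class.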