{"title":"Domain Driven Two-Phase Feature Selection Method Based on Bhattacharyya Distance and Kernel Distance Measurements","authors":"Yibing Chen, Lingling Zhang, Jun Li, Yong Shi","doi":"10.1109/WI-IAT.2011.61","DOIUrl":null,"url":null,"abstract":"This paper proposes a two-phase feature selection method specific for bioinformatics domain from classification perspective in data mining. In the first phase, Bhattacharyya distance measurement is used for filtering the majority of irrelevant genes. Upon the basis, we apply floating sequential search method (FSSM) to further select informative gene set using kernel distance as measurement of class separability. The verification of colon tissue dataset using support vector machines (SVMs) proves that informative gene set selected by our method is acceptable for disease identification.","PeriodicalId":128421,"journal":{"name":"2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology","volume":"66 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WI-IAT.2011.61","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6
Abstract
This paper proposes a two-phase feature selection method tailored to the bioinformatics domain, approached from the classification perspective of data mining. In the first phase, the Bhattacharyya distance is used to filter out the majority of irrelevant genes. On that basis, we apply the floating sequential search method (FSSM) to further select an informative gene set, using kernel distance as the measure of class separability. Verification on a colon tissue dataset using support vector machines (SVMs) shows that the informative gene set selected by our method is acceptable for disease identification.
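The two phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes two classes, a univariate Gaussian model per gene for the Bhattacharyya distance, and a kernel distance defined as the squared distance between class means in the feature space of an RBF kernel. The function names, the `gamma` value, the simplified floating search, and the synthetic data are all hypothetical, and the final SVM verification step is omitted.

```python
import numpy as np

def bhattacharyya_scores(X, y):
    """Per-feature Bhattacharyya distance between classes 0 and 1,
    assuming (our assumption) each feature is Gaussian within a class."""
    a, b = X[y == 0], X[y == 1]
    m1, m2 = a.mean(axis=0), b.mean(axis=0)
    v1 = a.var(axis=0) + 1e-12  # small constant avoids division by zero
    v2 = b.var(axis=0) + 1e-12
    mean_term = 0.25 * (m1 - m2) ** 2 / (v1 + v2)
    var_term = 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2)))
    return mean_term + var_term

def kernel_distance(X, y, gamma=0.5):
    """Squared distance between the two class means in RBF feature space
    (one possible kernel separability measure, assumed here)."""
    a, b = X[y == 0], X[y == 1]

    def rbf(u, v):
        sq = ((u[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-gamma * sq)

    return rbf(a, a).mean() + rbf(b, b).mean() - 2.0 * rbf(a, b).mean()

def floating_forward_search(X, y, n_select, score=kernel_distance):
    """Simplified sequential floating forward selection: add the best
    feature, then drop features while that strictly improves on the best
    subset previously seen at the smaller size."""
    selected = []
    best = {}  # best score seen so far for each subset size
    while len(selected) < n_select:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        gains = [score(X[:, selected + [j]], y) for j in remaining]
        selected.append(remaining[int(np.argmax(gains))])
        size = len(selected)
        best[size] = max(best.get(size, -np.inf), max(gains))
        while len(selected) > 2:
            drops = [score(X[:, [f for f in selected if f != j]], y)
                     for j in selected]
            i = int(np.argmax(drops))
            if drops[i] > best.get(len(selected) - 1, -np.inf):
                best[len(selected) - 1] = drops[i]
                selected.pop(i)
            else:
                break
    return selected

# Synthetic stand-in for expression data: 80 samples, 20 "genes",
# of which only genes 0 and 1 differ between the two classes.
rng = np.random.default_rng(0)
n, p = 40, 20
X = rng.normal(size=(2 * n, p))
y = np.repeat([0, 1], n)
X[y == 1, :2] += 3.0

# Phase 1: keep the 5 genes with the largest Bhattacharyya distance.
scores = bhattacharyya_scores(X, y)
kept = np.argsort(scores)[::-1][:5]

# Phase 2: floating search over the kept genes using kernel distance.
chosen = floating_forward_search(X[:, kept], y, n_select=3)
genes = [int(kept[j]) for j in chosen]
print("selected genes:", genes)
```

The cheap per-gene filter runs first so that the more expensive subset search, which evaluates a kernel matrix for every candidate subset, only has to consider a small pool of genes.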