Weizhu Chen, Jun Yan, Benyu Zhang, Zheng Chen, Qiang Yang
Title: Document Transformation for Multi-label Feature Selection in Text Categorization
Venue: Seventh IEEE International Conference on Data Mining (ICDM 2007)
Published: 2007-10-28
DOI: 10.1109/ICDM.2007.18 (https://doi.org/10.1109/ICDM.2007.18)
Citations: 127
Abstract
Feature selection on multi-label documents for automatic text categorization is an under-explored research area. This paper presents a systematic document transformation framework for the multi-label feature selection problem, in which multi-label documents are transformed into single-label documents before standard feature selection algorithms are applied. Under this framework, we undertake a comparative study of four intuitive document transformation approaches and propose a novel approach called entropy-based label assignment (ELA), which assigns label weights to a multi-label document based on label entropy. Three standard feature selection algorithms are used to evaluate the document transformation approaches and to verify their impact on multi-class text categorization problems. Using an SVM classifier and two multi-label benchmark text collections, we show that the choice of document transformation approach can significantly influence the performance of multi-class categorization and that our proposed document transformation approach, ELA, can achieve better performance than all other approaches.
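The transformation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the ELA formula or name the four baseline transformations, so the code below assumes a simple scheme in which a document with k labels is copied once per label with weight 1/k. That weighting, along with the function names `transform_copy_weighted` and `weighted_doc_freq` and the toy documents, is hypothetical and not taken from the paper.

```python
from collections import defaultdict


def transform_copy_weighted(docs):
    """Turn (tokens, labels) pairs into weighted single-label copies.

    Assumption: a document with k labels contributes weight 1/k to each
    of its labels. This is an illustrative weighting, not the ELA formula.
    """
    single_label_docs = []
    for tokens, labels in docs:
        weight = 1.0 / len(labels)
        for label in labels:
            single_label_docs.append((tokens, label, weight))
    return single_label_docs


def weighted_doc_freq(single_label_docs):
    """Weighted count of documents containing each term, per label."""
    df = defaultdict(lambda: defaultdict(float))
    for tokens, label, weight in single_label_docs:
        for term in set(tokens):  # count each term once per document
            df[label][term] += weight
    return df


# Toy corpus (hypothetical): one single-label and one two-label document.
docs = [
    (["stock", "market", "bank"], ["finance"]),
    (["bank", "river", "flood"], ["finance", "weather"]),
]
single = transform_copy_weighted(docs)
df = weighted_doc_freq(single)
```

The per-label weighted term counts in `df` are the kind of statistic that a standard single-label feature selection score would then be computed from; swapping in a different weighting function changes only `transform_copy_weighted`.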