Text Classification Based on a Novel Cost-Sensitive Ensemble Multi-Label Learning Method
Haifeng Hu, Tao Zhang, Jiansheng Wu
Journal of Software Engineering, vol. 10, no. 1, pp. 42-53, 2016. DOI: 10.3923/JSE.2016.42.53 (https://doi.org/10.3923/JSE.2016.42.53)
Text classification is one of the most important tasks in natural language processing. In most cases it is a multi-label learning task, in which three attributes (information gain, document frequency, and chi-square test values) are widely used to describe documents, and the relative importance of each attribute varies with the application. Combining these attributes is therefore a promising way to improve prediction performance. In addition, class imbalance is a widespread problem in multi-label learning. This study proposes CS-EnMLKNN, a novel cost-sensitive ensemble multi-label learning method that combines the three attributes and addresses class imbalance, together with a comprehensive framework for solving text classification problems. Experiments on two classic datasets show that CS-EnMLKNN outperforms most state-of-the-art multi-label learning algorithms on several evaluation criteria.
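The three attributes named in the abstract have standard textbook definitions, and a small worked example may make them concrete. The following is a minimal sketch, assuming binary term presence and a single binary label; the function names and the toy corpus are illustrative and do not come from the paper, and the code does not implement CS-EnMLKNN itself.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)


def term_scores(docs, labels, term):
    """Return (document frequency, chi-square, information gain) of `term`
    with respect to one binary label.

    docs   : list of sets of tokens (hypothetical toy representation)
    labels : list of 0/1 values, one per document
    """
    n = len(docs)
    # 2x2 contingency table: term present/absent vs. label positive/negative.
    a = sum(1 for d, y in zip(docs, labels) if term in d and y)          # present, positive
    b = sum(1 for d, y in zip(docs, labels) if term in d and not y)      # present, negative
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y)      # absent, positive
    e = sum(1 for d, y in zip(docs, labels) if term not in d and not y)  # absent, negative

    # Document frequency: number of documents that contain the term.
    df = a + b

    # Chi-square statistic for the 2x2 table (0 if any margin is empty).
    denom = (a + b) * (c + e) * (a + c) * (b + e)
    chi2 = n * (a * e - b * c) ** 2 / denom if denom else 0.0

    # Information gain: H(label) minus the entropy of the label
    # conditioned on term presence/absence.
    h_label = entropy([a + c, b + e])
    h_cond = ((a + b) / n) * entropy([a, b]) + ((c + e) / n) * entropy([c, e])
    ig = h_label - h_cond

    return df, chi2, ig


# Toy corpus: two "sports" documents (label 1) and two "politics" documents (label 0).
docs = [{"ball", "game"}, {"ball", "team"}, {"vote", "law"}, {"vote", "tax"}]
labels = [1, 1, 0, 0]

ball_scores = term_scores(docs, labels, "ball")
game_scores = term_scores(docs, labels, "game")
```

In this toy corpus "ball" separates the two classes perfectly, so it scores higher than "game" on all three attributes. On real data the three rankings can disagree, which is the situation the abstract describes when it says the importance of each attribute depends on the application.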