Rafia Shaikh, Muhammad Rafi, Naeem Ahmed Ahmed Mahoto, Adel Sulaiman, Asadullah Shaikh
A Filter-based Feature Selection Approach in Multilabel Classification
Machine Learning Science and Technology, published 2023-10-13. DOI: 10.1088/2632-2153/ad035d
Abstract
Multi-label classification is a fast-growing field of machine learning. Recent work has demonstrated applications of multilabel classification in social media, healthcare, biomolecular analysis, and scene and music classification. In multilabel problems, an unseen record is assigned multiple class labels rather than a single one. Feature selection is a preprocessing step that identifies the most relevant features and can thereby improve the accuracy of multilabel classifiers. This study focuses on feature selection for multilabel classification, using filter methods based on the Fisher score, the ANOVA test, and mutual information. A wide range of machine learning algorithms is applied in the modeling phase: Binary Relevance, Classifier Chain, Label Powerset, Binary Relevance KNN, Multi-label Twin Support Vector Machine (MLTSVM), and Multi-label KNN (MLKNN). In addition, ensemble methods based on label space partitioning and majority voting are used, along with Random Forest as a base learner. The experiments are carried out on five multilabel benchmark datasets. The classification models are evaluated using accuracy, precision, recall, F1 score, and Hamming loss. The study demonstrated that the filter methods (specifically, mutual information) retaining the top-weighted 80% to 20% of features provided significant outcomes.
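To make the filter idea concrete, the following is a minimal sketch of mutual-information ranking for a multilabel problem, followed by Hamming loss. It is not the authors' implementation: the abstract does not state how scores are estimated or combined across labels, so the sketch assumes discrete features, averages the mutual information over all labels, and uses hypothetical function names (`mutual_information`, `rank_features`, `select_top`, `hamming_loss`) and toy data.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def rank_features(X, Y):
    """Score each feature by its mean MI over all labels.

    Averaging across labels is an assumption of this sketch.
    """
    n_features, n_labels = len(X[0]), len(Y[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        total = sum(
            mutual_information(column, [row[k] for row in Y])
            for k in range(n_labels)
        )
        scores.append(total / n_labels)
    return scores


def select_top(scores, fraction):
    """Indices of the top `fraction` of features, highest score first."""
    k = max(1, round(fraction * len(scores)))
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


def hamming_loss(Y_true, Y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    wrong = sum(
        t != p
        for y_t, y_p in zip(Y_true, Y_pred)
        for t, p in zip(y_t, y_p)
    )
    return wrong / (len(Y_true) * len(Y_true[0]))


# Toy data: features 0 and 1 determine the two labels,
# feature 2 is independent of both labels, feature 3 is constant.
X = [
    [0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 1, 1, 0],
    [1, 0, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0], [1, 1, 1, 0],
]
Y = [[row[0], row[1]] for row in X]

scores = rank_features(X, Y)
selected = select_top(scores, fraction=0.5)  # keep the top 50% of features
X_reduced = [[row[j] for j in selected] for row in X]
print(selected, scores)
```

The reduced matrix `X_reduced` would then be passed to whichever multilabel classifier is under study, and `fraction` can be swept (for example from 0.8 down to 0.2) to compare retained feature percentages.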
About the journal:
Machine Learning Science and Technology is a multidisciplinary open access journal that bridges the application of machine learning across the sciences with advances in machine learning methods and theory as motivated by physical insights. Specifically, articles must fall into one of the following categories: advance the state of machine learning-driven applications in the sciences or make conceptual, methodological or theoretical advances in machine learning with applications to, inspiration from, or motivated by scientific problems.