{"title":"一种新的文本分类特征排序算法:Brilliant概率特征选择器(BPFS)","authors":"Bekir Parlak","doi":"10.1111/coin.12599","DOIUrl":null,"url":null,"abstract":"Text classification (TC) is a very crucial task in this century of high‐volume text datasets. Feature selection (FS) is one of the most important stages in TC studies. In the literature, numerous feature selection methods are recommended for TC. In the TC domain, filter‐based FS methods are commonly utilized to select a more informative feature subsets. Each method uses a scoring system that is based on its algorithm to order the features. The classification process is then carried out by choosing the top‐N features. However, each method's feature order is distinct from the others. Each method selects by giving the qualities that are critical to its algorithm a high score, but it does not select by giving the features that are unimportant a low value. In this paper, we proposed a novel filter‐based FS method namely, brilliant probabilistic feature selector (BPFS), to assign a fair score and select informative features. While the BPFS method selects unique features, it also aims to select sparse features by assigning higher scores than common features. Extensive experimental studies using three effective classifiers decision tree (DT), support vector machines (SVM), and multinomial naive bayes (MNB) on four widely used datasets named Reuters‐21,578, 20Newsgroup, Enron1, and Polarity with different characteristics demonstrate the success of the BPFS method. For feature dimensions, 20, 50, 100, 200, 500, and 1000 dimensions were used. The experimental results on different benchmark datasets show that the BPFS method is more successful than the well‐known and recent FS methods according to Micro‐F1 and Macro‐F1 scores.","PeriodicalId":55228,"journal":{"name":"Computational Intelligence","volume":null,"pages":null},"PeriodicalIF":1.8000,"publicationDate":"2023-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A novel feature ranking algorithm for text classification: Brilliant probabilistic feature selector (BPFS)\",\"authors\":\"Bekir Parlak\",\"doi\":\"10.1111/coin.12599\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Text classification (TC) is a very crucial task in this century of high‐volume text datasets. Feature selection (FS) is one of the most important stages in TC studies. In the literature, numerous feature selection methods are recommended for TC. In the TC domain, filter‐based FS methods are commonly utilized to select a more informative feature subsets. Each method uses a scoring system that is based on its algorithm to order the features. The classification process is then carried out by choosing the top‐N features. However, each method's feature order is distinct from the others. Each method selects by giving the qualities that are critical to its algorithm a high score, but it does not select by giving the features that are unimportant a low value. In this paper, we proposed a novel filter‐based FS method namely, brilliant probabilistic feature selector (BPFS), to assign a fair score and select informative features. While the BPFS method selects unique features, it also aims to select sparse features by assigning higher scores than common features. 
Extensive experimental studies using three effective classifiers decision tree (DT), support vector machines (SVM), and multinomial naive bayes (MNB) on four widely used datasets named Reuters‐21,578, 20Newsgroup, Enron1, and Polarity with different characteristics demonstrate the success of the BPFS method. For feature dimensions, 20, 50, 100, 200, 500, and 1000 dimensions were used. The experimental results on different benchmark datasets show that the BPFS method is more successful than the well‐known and recent FS methods according to Micro‐F1 and Macro‐F1 scores.\",\"PeriodicalId\":55228,\"journal\":{\"name\":\"Computational Intelligence\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2023-08-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computational Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/coin.12599\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Intelligence","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/coin.12599","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
A novel feature ranking algorithm for text classification: Brilliant probabilistic feature selector (BPFS)
Abstract: Text classification (TC) is a crucial task in this era of high-volume text datasets, and feature selection (FS) is one of the most important stages in TC studies. Numerous FS methods have been proposed for TC in the literature. In the TC domain, filter-based FS methods are commonly used to select a more informative feature subset: each method orders the features with a scoring scheme based on its own algorithm, and classification then proceeds on the top-N features. However, each method's feature ranking is distinct from the others'. A method assigns high scores to the features its algorithm deems important, but it does not necessarily assign low scores to unimportant features. In this paper, we propose a novel filter-based FS method, the brilliant probabilistic feature selector (BPFS), which assigns a fair score to every feature and selects informative ones. While BPFS selects distinctive features, it also aims to select sparse features by assigning them higher scores than common features. Extensive experiments with three effective classifiers, decision tree (DT), support vector machines (SVM), and multinomial naive Bayes (MNB), on four widely used datasets with different characteristics, Reuters-21578, 20Newsgroups, Enron1, and Polarity, demonstrate the success of the BPFS method. Feature dimensions of 20, 50, 100, 200, 500, and 1000 were used. The experimental results on these benchmark datasets show that BPFS outperforms both well-known and recent FS methods in terms of Micro-F1 and Macro-F1 scores.
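The score-then-select-top-N pipeline the abstract describes is the standard filter-based FS recipe and is easy to sketch. The BPFS scoring formula itself is not given in the abstract, so the minimal Python sketch below uses scikit-learn's chi-square score purely as an illustrative stand-in; any per-feature scoring function, BPFS included, would slot into the same SelectKBest step. The sketch runs one of the benchmarks mentioned above (20Newsgroups) with the MNB classifier, sweeping the same feature dimensions and reporting Micro-F1 and Macro-F1.

```python
# Minimal sketch of a filter-based, top-N feature-selection pipeline for TC.
# chi-square scoring stands in for BPFS here, since the abstract does not
# give the BPFS formula. Assumes scikit-learn is installed.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import f1_score

# One of the benchmark corpora from the abstract (20Newsgroups).
train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))

# Bag-of-words term counts, a typical representation for filter-based FS.
vectorizer = CountVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)

# Evaluate at the feature dimensions used in the paper.
for n_features in (20, 50, 100, 200, 500, 1000):
    # Score every feature, then keep only the top-N ranked features.
    selector = SelectKBest(score_func=chi2, k=n_features)
    X_train_sel = selector.fit_transform(X_train, train.target)
    X_test_sel = selector.transform(X_test)

    # Multinomial naive Bayes, one of the three classifiers in the study.
    clf = MultinomialNB().fit(X_train_sel, train.target)
    pred = clf.predict(X_test_sel)

    micro = f1_score(test.target, pred, average="micro")
    macro = f1_score(test.target, pred, average="macro")
    print(f"N={n_features:4d}  Micro-F1={micro:.3f}  Macro-F1={macro:.3f}")
```

To plug in a different ranking scheme, pass SelectKBest any callable that maps the feature matrix and labels to an array of per-feature scores; the rest of the pipeline is unchanged.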
Journal introduction:
This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues - from the tools and languages of AI to its philosophical implications - Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.