Multi-objective Cuckoo Search-based Streaming Feature Selection for Multi-label Dataset

ACM Transactions on Knowledge Discovery from Data (TKDD), Pub Date: 2021-05-19, DOI: 10.1145/3447586

Dipanjyoti Paul, Rahul Kumar, S. Saha, Jimson Mathew


Citations: 12

Abstract

Feature selection is the process of retaining only the relevant features by removing irrelevant or redundant ones from the large number of features used to represent data. Many application domains, especially social media networks, now generate new features continuously at different time stamps. When features arrive online in this way, selection must itself be a continuous process. A streaming feature selection approach is therefore required: every time a new feature or a group of features arrives, the feature selection process is invoked. In recent years, many application domains have also produced multi-label datasets, in which a sample may belong to more than one class. The labels associated with the instances may depend on one another, and finding the correlation among the class labels helps to select features that are discriminative across multiple labels. In this article, we develop streaming feature selection methods for multi-label data in which the labels are reduced to a lower-dimensional space. Similar labels are grouped together before selection is performed, to improve selection quality and to make the model time-efficient. A multi-objective version of cuckoo search is used to select the optimal feature set. Two versions of the streaming feature selection method are developed: (i) for features that arrive individually and (ii) for features that arrive in a batch. The developed methods are tested on multi-label datasets from several domains, such as text, biology, and audio. They are compared with many previous feature selection methods, and the comparison establishes the advantage of using multiple objectives and label correlation in the feature selection process.
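The streaming setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the multi-objective cuckoo search and the label grouping are replaced here by a simple correlation filter, purely to show how selection is re-invoked on each arrival. The function names (`pearson`, `relevance`, `stream_select`) and both thresholds are assumptions made for this illustration. An arrival of size one models features that arrive individually; a larger arrival models a batch.

```python
from math import sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def relevance(feature, labels):
    """Hypothetical relevance score: largest |correlation| with any label column."""
    return max(abs(pearson(feature, column)) for column in labels)


def stream_select(arrivals, labels, min_relevance=0.5, max_redundancy=0.9):
    """Invoke a (stand-in) selection step every time features arrive.

    arrivals: iterable of batches, each a list of (name, values) pairs.
    labels:   list of label columns, one list of 0/1 values per label.
    Returns the names of the kept features in arrival order.
    """
    selected = {}
    for batch in arrivals:
        for name, values in batch:
            # Drop features that are weakly related to every label.
            if relevance(values, labels) < min_relevance:
                continue
            # Drop features that nearly duplicate an already-kept feature.
            if any(abs(pearson(values, kept)) > max_redundancy
                   for kept in selected.values()):
                continue
            selected[name] = values
    return list(selected)


# Toy data: six samples, two labels (assumed values for illustration only).
labels = [[1, 0, 1, 0, 1, 0], [1, 1, 0, 0, 1, 1]]
arrivals = [
    [("f1", [1, 0, 1, 0, 1, 0])],                 # individual arrival
    [("f2", [2, 0, 2, 0, 2, 0]),                  # batch arrival: f2 duplicates f1,
     ("f3", [1, 1, 1, 0, 0, 0])],                 # f3 is weakly related to both labels
    [("f4", [1, 1, 0, 0, 1, 1])],                 # individual arrival
]
print(stream_select(arrivals, labels))
```

In this toy run, `f2` is discarded as redundant with `f1` and `f3` as weakly relevant, while `f4` is kept because it tracks the second label. A method along the lines of the abstract would instead search over candidate feature subsets with several objectives at each arrival.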

