Joint Feature Selection and Classification for Positive Unlabelled Multi–Label Data Using Weighted Penalized Empirical Risk Minimization

IF 1.2 · CAS zone 4 (Computer Science) · JCR Q3 (AUTOMATION & CONTROL SYSTEMS)

International Journal of Applied Mathematics and Computer Science, pp. 311-322 · Published: 2022-06-01 · DOI: 10.34768/amcs-2022-0023

Paweł Teisseyre


Citations: 0

Abstract

We consider the positive-unlabelled multi-label scenario, in which multiple target variables are not observed directly. Instead, we observe surrogate variables indicating whether or not the target variables are labelled. The presence of a label means that the corresponding variable is positive; the absence of a label means that the variable can be either positive or negative. We analyze embedded feature selection methods based on two weighted penalized empirical risk minimization frameworks. In the first approach, we introduce observation weights: the idea is to assign larger weights to observations for which the values of the true target variable and the corresponding surrogate variable are consistent. In the second approach, we consider a weighted empirical risk function that corresponds to the risk function for the true, unobserved target variables. In both methods the weights depend on the unknown propensity score functions, whose estimation is a challenging problem. We propose using simple bounds for the propensity score, which lead to relatively simple forms of the weights. In the experiments, we analyze the predictive power of the considered methods under different labelling schemes.
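The second framework (a weighted empirical risk that targets the risk for the true labels) can be illustrated with a minimal sketch for a single label. It assumes, for illustration only, that the propensity score is a known constant c = P(s = 1 | y = 1) (the SCAR assumption); this is not necessarily the bound proposed in the paper. Labelled observations (s = 1) contribute (1/c)·ℓ(z, 1) + (1 − 1/c)·ℓ(z, 0) and unlabelled ones contribute ℓ(z, 0), where ℓ is the logistic loss; in expectation this equals the logistic risk for the true y. Embedded feature selection comes from an L1 (lasso) penalty, solved here by proximal gradient descent. Labels are fitted independently, so any joint selection across labels is not modelled. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: returns exactly 0.0 when |v| <= t."""
    if abs(v) <= t:
        return 0.0
    return v - t if v > 0 else v + t


def fit_weighted_pu_lasso(
    X: Sequence[Sequence[float]],
    s: Sequence[int],
    c: float,
    lam: float,
    n_iter: int = 500,
) -> Tuple[List[float], float]:
    """Minimise the weighted PU logistic risk plus lam * ||w||_1 for one label.

    Hypothetical helper; assumes a constant propensity c = P(s=1 | y=1).
    Labelled rows:   (1/c) * loss(z, 1) + (1 - 1/c) * loss(z, 0)
    Unlabelled rows: loss(z, 0)
    The intercept b is not penalised.
    """
    if not 0.0 < c <= 1.0:
        raise ValueError("propensity c must lie in (0, 1]")
    n, p = len(X), len(X[0])
    # Step size 1/L: logistic curvature is at most 1/4 and the squared spectral
    # norm of [X, 1] is bounded by its squared Frobenius norm.
    frob_sq = sum(x * x for row in X for x in row) + n
    step = 1.0 / (0.25 * frob_sq / n)
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        gw, gb = [0.0] * p, 0.0
        for xi, si in zip(X, s):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            # d/dz of the weighted loss: sigmoid(z) - 1/c if labelled, else sigmoid(z).
            g = sigmoid(z) - (1.0 / c if si == 1 else 0.0)
            for j in range(p):
                gw[j] += g * xi[j] / n
            gb += g / n
        # Gradient step on the smooth risk, then soft-thresholding for the L1 term.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, gw)]
        b -= step * gb
    return w, b


def fit_multilabel(X, S, cs, lam, n_iter=500):
    """Fit one model per label (column k of S, propensity cs[k]), independently."""
    return [
        fit_weighted_pu_lasso(X, [row[k] for row in S], cs[k], lam, n_iter)
        for k in range(len(cs))
    ]


def selected_features(w: Sequence[float]) -> List[int]:
    """Indices of features with non-zero coefficients (the embedded selection)."""
    return [j for j, wj in enumerate(w) if wj != 0.0]
```

The negative weight 1 − 1/c does not break convexity: since ℓ(z, 0) = ℓ(z, 1) + z, a labelled row's loss equals ℓ(z, 1) + (1 − 1/c)·z, a convex function plus a linear term. Increasing lam sets more coefficients exactly to zero, i.e. selects fewer features.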


Source journal

International Journal of Applied Mathematics and Computer Science (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 4.10
Self-citation rate: 21.10%
Review time: 4.2 months

Journal description: The International Journal of Applied Mathematics and Computer Science is a quarterly journal published in Poland since 1991 by the University of Zielona Góra in partnership with De Gruyter Poland (Sciendo) and Lubuskie Scientific Society, under the auspices of the Committee on Automatic Control and Robotics of the Polish Academy of Sciences. The journal aims to meet the demand for the presentation of interdisciplinary research in various fields related to control theory, applied mathematics, scientific computing and computer science. In particular, it publishes high-quality original research results in the following areas:
- modern control theory and practice,
- artificial intelligence methods and their applications,
- applied mathematics and mathematical optimisation techniques,
- mathematical methods in engineering, computer science, and biology.