Joint Feature Selection and Classification for Positive Unlabelled Multi–Label Data Using Weighted Penalized Empirical Risk Minimization

Impact factor 1.6; Zone 4, Computer Science; Q3, Automation & Control Systems
Paweł Teisseyre
International Journal of Applied Mathematics and Computer Science, pp. 311-322. Published 2022-06-01.
DOI: 10.34768/amcs-2022-0023 (https://doi.org/10.34768/amcs-2022-0023)

Abstract

We consider the positive-unlabelled multi-label scenario, in which multiple target variables are not observed directly. Instead, we observe surrogate variables indicating whether or not the target variables are labelled. The presence of a label means that the corresponding target variable is positive; the absence of a label means that it can be either positive or negative. We analyze embedded feature selection methods based on two weighted penalized empirical risk minimization frameworks. In the first approach, we introduce observation weights. The idea is to assign larger weights to observations for which the value of the true target variable is consistent with that of the corresponding surrogate variable. In the second approach, we consider a weighted empirical risk function that corresponds to the risk function for the true, unobserved target variables. The weights in both methods depend on the unknown propensity score functions, whose estimation is a challenging problem. We propose using very simple bounds for the propensity score, which lead to relatively simple forms of the weights. In the experiments, we analyze the predictive power of the considered methods under different labelling schemes.
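To make the first approach more concrete, the following is a minimal sketch of observation-weighted, L1-penalized logistic regression fitted separately for each label on the surrogate variables. It is an illustration under stated assumptions, not the paper's estimator: the abstract does not give the weight formulas, so the constant `unlabelled_weight` (standing in for a bound-based weight), the function names, the proximal gradient solver, and the toy data are all hypothetical. Labelled observations get weight 1 because a label guarantees a positive target; unlabelled observations get a smaller weight because their surrogate may disagree with the true target.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|; sets small coefficients exactly to zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_weighted_l1_logistic(X, s, w, lam, step=0.1, n_iter=300):
    """Proximal gradient descent for
    (1/n) * sum_i w_i * logloss(s_i, b0 + x_i . b) + lam * ||b||_1.
    The intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, si, wi in zip(X, s, w):
            z = b0 + sum(xj * bj for xj, bj in zip(xi, b))
            r = wi * (sigmoid(z) - si) / n  # weighted residual
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= step * g0
        b = [soft_threshold(bj - step * gj, step * lam) for bj, gj in zip(b, g)]
    return b0, b


def pu_multilabel_fit(X, S, unlabelled_weight, lam):
    """Fit one weighted, penalized model per label.
    S[i][k] = 1 if label k is observed for instance i, else 0.
    ASSUMPTION: a single constant weight for all unlabelled observations."""
    models = []
    for k in range(len(S[0])):
        s_k = [row[k] for row in S]
        w_k = [1.0 if v == 1 else unlabelled_weight for v in s_k]
        models.append(fit_weighted_l1_logistic(X, s_k, w_k, lam))
    return models


# Toy data (hypothetical): 5 features, 2 labels; label k depends only on feature k.
rng = random.Random(0)
n, p = 200, 5
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
# True targets, which would be unobserved in practice.
Y = [[1 if rng.random() < sigmoid(2.0 * x[k]) else 0 for k in range(2)] for x in X]
# Surrogates: a positive is labelled with probability 0.6; negatives are never labelled.
S = [[1 if (y == 1 and rng.random() < 0.6) else 0 for y in row] for row in Y]

models = pu_multilabel_fit(X, S, unlabelled_weight=0.5, lam=0.02)
for k, (b0, b) in enumerate(models):
    selected = [j for j, bj in enumerate(b) if bj != 0.0]
    print(f"label {k}: selected features {selected}")
```

Feature selection is embedded in the fit: the L1 penalty zeroes out coefficients, so the selected features for each label are those with non-zero entries in `b`. Increasing `lam` yields sparser models.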
Source journal
CiteScore: 4.10
Self-citation rate: 21.10%
Articles published: 0
Review time: 4.2 months
About the journal: The International Journal of Applied Mathematics and Computer Science is a quarterly published in Poland since 1991 by the University of Zielona Góra in partnership with De Gruyter Poland (Sciendo) and Lubuskie Scientific Society, under the auspices of the Committee on Automatic Control and Robotics of the Polish Academy of Sciences. The journal strives to meet the demand for the presentation of interdisciplinary research in various fields related to control theory, applied mathematics, scientific computing and computer science. In particular, it publishes high-quality original research results in the following areas: modern control theory and practice; artificial intelligence methods and their applications; applied mathematics and mathematical optimisation techniques; mathematical methods in engineering, computer science, and biology.