Joint Feature Selection and Classification for Positive Unlabelled Multi-Label Data Using Weighted Penalized Empirical Risk Minimization
Paweł Teisseyre
International Journal of Applied Mathematics and Computer Science, pp. 311-322, published 2022-06-01
DOI: https://doi.org/10.34768/amcs-2022-0023
Abstract
We consider the positive-unlabelled multi-label scenario, in which multiple target variables are not observed directly. Instead, we observe surrogate variables indicating whether or not the target variables are labelled. The presence of a label means that the corresponding variable is positive. The absence of a label means that the variable can be either positive or negative. We analyze embedded feature selection methods based on two weighted penalized empirical risk minimization frameworks. In the first approach, we introduce weights of observations. The idea is to assign larger weights to observations for which the value of the true target variable is consistent with that of the corresponding surrogate variable. In the second approach, we consider a weighted empirical risk function which corresponds to the risk function for the true unobserved target variables. The weights in both methods depend on the unknown propensity score functions, whose estimation is a challenging problem. We propose to use very simple bounds for the propensity score, which lead to relatively simple forms of weights. In the experiments, we analyze the predictive power of the considered methods for different labelling schemes.
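The first approach (observation weights combined with a sparsity-inducing penalty) can be illustrated with a minimal sketch for a single label. This is not the paper's implementation: the abstract does not give the weight formulas, so the weight of 0.5 for unlabelled observations, the L1 penalty, the toy data generator and all function names below are assumptions made purely for illustration. Labelled observations get weight 1 because a label guarantees a positive target. In the multi-label setting, one such model could be fitted per target variable.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v, t):
    """Proximal operator of t * |v|; sets small coefficients exactly to zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_weighted_l1_logistic(X, s, w, lam=0.05, step=0.5, n_iter=200):
    """Proximal gradient descent for the weighted penalized empirical risk
    (1/n) * sum_i w_i * logloss(s_i, a + x_i . b) + lam * ||b||_1.
    The intercept a is not penalized."""
    n, p = len(X), len(X[0])
    a, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_a, grad_b = 0.0, [0.0] * p
        for xi, si, wi in zip(X, s, w):
            z = a + sum(xj * bj for xj, bj in zip(xi, b))
            r = wi * (sigmoid(z) - si) / n  # weighted residual
            grad_a += r
            for j in range(p):
                grad_b[j] += r * xi[j]
        a -= step * grad_a
        b = [soft_threshold(bj - step * gj, step * lam)
             for bj, gj in zip(b, grad_b)]
    return a, b


def make_pu_data(n=300, p=6, label_freq=0.5, seed=0):
    """Hypothetical toy data: only feature 0 drives the true target y;
    a positive observation is labelled (s = 1) with probability label_freq."""
    rng = random.Random(seed)
    X, s = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        y = 1 if rng.random() < sigmoid(3.0 * x[0]) else 0
        s.append(1 if (y == 1 and rng.random() < label_freq) else 0)
        X.append(x)
    return X, s


X, s = make_pu_data()
# Assumed weights: 1 for labelled rows (s = 1 implies y = 1), and an
# arbitrary illustrative 0.5 for unlabelled rows, whose true target is unknown.
weights = [1.0 if si == 1 else 0.5 for si in s]
intercept, coef = fit_weighted_l1_logistic(X, s, weights)
selected = [j for j, bj in enumerate(coef) if bj != 0.0]
print("selected features:", selected)
```

Features whose coefficients are shrunk exactly to zero by the soft-thresholding step are treated as not selected, which is what makes the feature selection "embedded" in the fitting procedure. Replacing the constant 0.5 with weights derived from propensity score bounds would bring the sketch closer to the setting described in the abstract.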
About the journal:
The International Journal of Applied Mathematics and Computer Science is a quarterly published in Poland since 1991 by the University of Zielona Góra in partnership with De Gruyter Poland (Sciendo) and Lubuskie Scientific Society, under the auspices of the Committee on Automatic Control and Robotics of the Polish Academy of Sciences.
The journal strives to meet the demand for the presentation of interdisciplinary research in various fields related to control theory, applied mathematics, scientific computing and computer science. In particular, it publishes high quality original research results in the following areas:
- modern control theory and practice,
- artificial intelligence methods and their applications,
- applied mathematics and mathematical optimisation techniques,
- mathematical methods in engineering, computer science, and biology.