{"title":"对正数据和未标记数据的逻辑回归拟合策略的重新审视","authors":"Adam Wawrzenczyk, J. Mielniczuk","doi":"10.34768/amcs-2022-0022","DOIUrl":null,"url":null,"abstract":"Abstract Positive unlabeled (PU) learning is an important problem motivated by the occurrence of this type of partial observability in many applications. The present paper reconsiders recent advances in parametric modeling of PU data based on empirical likelihood maximization and argues that they can be significantly improved. The proposed approach is based on the fact that the likelihood for the logistic fit and an unknown labeling frequency can be expressed as the sum of a convex and a concave function, which is explicitly given. This allows methods such as the concave-convex procedure (CCCP) or its variant, the disciplined convex-concave procedure (DCCP), to be applied. We show by analyzing real data sets that, by using the DCCP to solve the optimization problem, we obtain significant improvements in the posterior probability and the label frequency estimation over the best available competitors.","PeriodicalId":50339,"journal":{"name":"International Journal of Applied Mathematics and Computer Science","volume":"5 1","pages":"299 - 309"},"PeriodicalIF":1.6000,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Revisiting Strategies for Fitting Logistic Regression for Positive and Unlabeled Data\",\"authors\":\"Adam Wawrzenczyk, J. Mielniczuk\",\"doi\":\"10.34768/amcs-2022-0022\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Positive unlabeled (PU) learning is an important problem motivated by the occurrence of this type of partial observability in many applications. The present paper reconsiders recent advances in parametric modeling of PU data based on empirical likelihood maximization and argues that they can be significantly improved. The proposed approach is based on the fact that the likelihood for the logistic fit and an unknown labeling frequency can be expressed as the sum of a convex and a concave function, which is explicitly given. This allows methods such as the concave-convex procedure (CCCP) or its variant, the disciplined convex-concave procedure (DCCP), to be applied. We show by analyzing real data sets that, by using the DCCP to solve the optimization problem, we obtain significant improvements in the posterior probability and the label frequency estimation over the best available competitors.\",\"PeriodicalId\":50339,\"journal\":{\"name\":\"International Journal of Applied Mathematics and Computer Science\",\"volume\":\"5 1\",\"pages\":\"299 - 309\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2022-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Applied Mathematics and Computer Science\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.34768/amcs-2022-0022\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Applied Mathematics and Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.34768/amcs-2022-0022","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Revisiting Strategies for Fitting Logistic Regression for Positive and Unlabeled Data
Abstract Positive unlabeled (PU) learning is an important problem motivated by the occurrence of this type of partial observability in many applications. The present paper reconsiders recent advances in parametric modeling of PU data based on empirical likelihood maximization and argues that they can be significantly improved. The proposed approach is based on the fact that the likelihood for the logistic fit and an unknown labeling frequency can be expressed as the sum of a convex and a concave function, which is explicitly given. This allows methods such as the concave-convex procedure (CCCP) or its variant, the disciplined convex-concave procedure (DCCP), to be applied. We show by analyzing real data sets that, by using the DCCP to solve the optimization problem, we obtain significant improvements in the posterior probability and the label frequency estimation over the best available competitors.
期刊介绍:
The International Journal of Applied Mathematics and Computer Science is a quarterly published in Poland since 1991 by the University of Zielona Góra in partnership with De Gruyter Poland (Sciendo) and Lubuskie Scientific Society, under the auspices of the Committee on Automatic Control and Robotics of the Polish Academy of Sciences.
The journal strives to meet the demand for the presentation of interdisciplinary research in various fields related to control theory, applied mathematics, scientific computing and computer science. In particular, it publishes high quality original research results in the following areas:
-modern control theory and practice-
artificial intelligence methods and their applications-
applied mathematics and mathematical optimisation techniques-
mathematical methods in engineering, computer science, and biology.