{"title":"On the latent distribution of logistic regression — An empirical study on spectroscopic profiling datasets","authors":"Yinsheng Zhang, Mingming He, Haiyan Wang","doi":"10.1016/j.mlwa.2025.100712","DOIUrl":null,"url":null,"abstract":"<div><div>Logistic regression is a simple yet widely used classification model in spectroscopic profiling analysis. Considering the model’s output represents a probability, this paper will investigate its latent distribution assumption, i.e., its inner linear regressor unit follows a standard logistic distribution. An empirical study on five spectroscopic profiling open datasets, i.e., wine, coffee, olive oil, cheese, and milk powder, was conducted to verify this latent distribution assertion. This paper measured the GoF (Goodness of Fit) of each dataset’s latent variable from three aspects, i.e., curve fitting, P–P and Q–Q plots, and K–S test. After hyper-parameter optimization and proper training, the latent variable, as a weighted sum of the original features, has demonstrated a high level of GoF on all the five datasets. This study verifies the suitability of logistic regression in spectroscopic profiling analysis and answers why the model output can be interpreted as a conditional probability.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"22 ","pages":"Article 100712"},"PeriodicalIF":4.9000,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning with applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666827025000957","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Logistic regression is a simple yet widely used classification model in spectroscopic profiling analysis. Considering the model’s output represents a probability, this paper will investigate its latent distribution assumption, i.e., its inner linear regressor unit follows a standard logistic distribution. An empirical study on five spectroscopic profiling open datasets, i.e., wine, coffee, olive oil, cheese, and milk powder, was conducted to verify this latent distribution assertion. This paper measured the GoF (Goodness of Fit) of each dataset’s latent variable from three aspects, i.e., curve fitting, P–P and Q–Q plots, and K–S test. After hyper-parameter optimization and proper training, the latent variable, as a weighted sum of the original features, has demonstrated a high level of GoF on all the five datasets. This study verifies the suitability of logistic regression in spectroscopic profiling analysis and answers why the model output can be interpreted as a conditional probability.