Minimally informed linear discriminant analysis: Training an LDA model with unlabelled data

Nicolas Heintz, Tom Francart, Alexander Bertrand

Signal Processing, Volume 239, Article 110226
DOI: 10.1016/j.sigpro.2025.110226
Published: 2025-08-07
https://www.sciencedirect.com/science/article/pii/S0165168425003408
Citations: 0
Abstract
Linear Discriminant Analysis (LDA) is one of the oldest and most popular linear methods for supervised classification problems. Computing the optimal LDA projection vector requires calculating the average and covariance of the feature vectors of each class individually, which necessitates class labels to estimate these statistics from the data. In this paper, we demonstrate that, if some minor prior information is available, it is possible to compute the exact LDA projection vector from unlabelled data. More precisely, we show that any one of the following three pieces of information is sufficient to compute the LDA projection vector when only unlabelled data are available: (1) the class average of one of the two classes, (2) the difference between both class averages (up to a scaling), or (3) the class covariance matrices (up to a scaling). These theoretical results are validated in numerical experiments, demonstrating that this minimally informed Linear Discriminant Analysis (MILDA) model closely approximates the solution of a supervised LDA model, even on high-dimensional, poorly separated or extremely imbalanced data. Furthermore, we show that the MILDA projection vector can be computed in closed form with a computational cost comparable to LDA and is able to quickly adapt to non-stationary data, making it well-suited for use as an adaptive classifier that is continuously retrained on (unlabelled) streaming data.
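To make the key idea concrete, below is a minimal numerical sketch of case (2), where only the direction of the class-mean difference is assumed known. It relies on the standard fact that the total (label-free) data covariance can replace the within-class covariance in the LDA solution up to a scaling, so the projection direction can be recovered from unlabelled data. The synthetic data, variable names and estimator used here are illustrative assumptions, not the paper's exact MILDA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class Gaussian data (labels are used only to build the
# supervised reference solution and the known mean-difference vector).
d, n = 10, 5000
mu0, mu1 = np.zeros(d), rng.normal(size=d)
A = rng.normal(size=(d, d))
cov = A @ A.T + np.eye(d)                      # shared within-class covariance
y = rng.random(n) < 0.3                        # imbalanced class labels
X = np.where(y[:, None], mu1, mu0) + rng.multivariate_normal(np.zeros(d), cov, n)

# Supervised LDA: w ∝ Σ_w^{-1} (μ1 - μ0), estimated with labels.
Sw = np.cov(X[y].T, bias=True) * y.mean() + np.cov(X[~y].T, bias=True) * (1 - y.mean())
w_lda = np.linalg.solve(Sw, X[y].mean(0) - X[~y].mean(0))

# Label-free variant for case (2): only the class-mean difference (up to
# scaling) is assumed known; the covariance is taken from the pooled,
# unlabelled data (illustrative sketch, not the paper's exact estimator).
delta = mu1 - mu0                              # assumed prior knowledge
Sigma_total = np.cov(X.T, bias=True)           # computable without labels
w_unlabelled = np.linalg.solve(Sigma_total, delta)

# By the Sherman-Morrison identity, Σ_t^{-1}δ is proportional to Σ_w^{-1}δ,
# so both directions should coincide up to scaling and estimation noise.
cos = np.abs(w_lda @ w_unlabelled) / (np.linalg.norm(w_lda) * np.linalg.norm(w_unlabelled))
print(f"cosine similarity between supervised and label-free directions: {cos:.4f}")
```

On this toy example, the printed cosine similarity should be close to 1, reflecting that the supervised and label-free projection directions agree up to a scaling, which is the property the MILDA model exploits.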
About the Journal
Signal Processing incorporates all aspects of the theory and practice of signal processing. It features original research work, tutorial and review articles, and accounts of practical developments. It is intended for a rapid dissemination of knowledge and experience to engineers and scientists working in the research, development or practical application of signal processing.
Subject areas covered by the journal include: Signal Theory; Stochastic Processes; Detection and Estimation; Spectral Analysis; Filtering; Signal Processing Systems; Software Developments; Image Processing; Pattern Recognition; Optical Signal Processing; Digital Signal Processing; Multi-dimensional Signal Processing; Communication Signal Processing; Biomedical Signal Processing; Geophysical and Astrophysical Signal Processing; Earth Resources Signal Processing; Acoustic and Vibration Signal Processing; Data Processing; Remote Sensing; Signal Processing Technology; Radar Signal Processing; Sonar Signal Processing; Industrial Applications; New Applications.