Optimal Sample Acquisition for Optimally Weighted PCA From Heterogeneous Quality Sources
Authors: David Hong; Laura Balzano
IEEE Signal Processing Letters, vol. 32, pp. 1425-1429, published 2025-03-11
DOI: 10.1109/LSP.2025.3550280
URL: https://ieeexplore.ieee.org/document/10921711/
Modern high-dimensional datasets are often formed by acquiring samples from multiple sources having heterogeneous quality, i.e., some sources are noisier than others. Collecting data in this manner raises the following natural question: what is the best way to collect the data (i.e., how many samples should be acquired from each source) given constraints (e.g., on time or energy)? In general, the answer depends on what analysis is to be performed. In this paper, we study the foundational signal processing task of estimating underlying low-dimensional principal components. Since the resulting dataset will be high-dimensional and will have heteroscedastic noise, we focus on the recently proposed optimally weighted PCA, which is designed specifically for this setting. We develop an efficient method for designing sample acquisitions that optimize the asymptotic performance of optimally weighted PCA given resource constraints, and we illustrate the proposed method through various case studies.
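To make the setting concrete, the following is a minimal sketch of weighted PCA on samples drawn from two sources with different noise levels. It is not the authors' method: the weights are plain inverse noise variances, chosen only as a simple illustration rather than the optimal weights the abstract refers to, and the sample counts are arbitrary rather than the result of any acquisition design. All function names, the rank-one signal model, and the parameter values are assumptions made for this example.

```python
import numpy as np

def simulate_sources(u, theta, counts, noise_vars, rng):
    """Draw samples y = theta * z * u + noise, one block of columns per source.

    Assumed toy model: rank-one signal, Gaussian noise whose variance
    depends only on the source the sample came from.
    """
    d = u.shape[0]
    blocks, variances = [], []
    for n, var in zip(counts, noise_vars):
        z = rng.standard_normal(n)                      # random coefficients
        noise = np.sqrt(var) * rng.standard_normal((d, n))
        blocks.append(theta * np.outer(u, z) + noise)
        variances.append(np.full(n, var))               # per-sample noise variance
    return np.hstack(blocks), np.concatenate(variances)

def weighted_pca(Y, weights, k=1):
    """Top-k eigenvectors of the weighted sample covariance sum_j w_j y_j y_j^T."""
    cov = (Y * weights) @ Y.T / weights.sum()           # weights scale the columns of Y
    _, eigvecs = np.linalg.eigh(cov)                    # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]

def recovery(u, U_hat):
    """Squared inner product of true and estimated component (1 means perfect)."""
    return float((u @ U_hat[:, 0]) ** 2)

rng = np.random.default_rng(0)
d = 50
u = rng.standard_normal(d)
u /= np.linalg.norm(u)                                  # true principal component

# Hypothetical acquisition: 200 samples from a clean source, 800 from a noisy one.
Y, sample_vars = simulate_sources(u, theta=3.0, counts=(200, 800),
                                  noise_vars=(1.0, 16.0), rng=rng)

U_unweighted = weighted_pca(Y, np.ones_like(sample_vars))
U_weighted = weighted_pca(Y, 1.0 / sample_vars)         # illustrative weights only

r_unweighted = recovery(u, U_unweighted)
r_weighted = recovery(u, U_weighted)
print(f"unweighted recovery: {r_unweighted:.3f}")
print(f"inverse-variance weighted recovery: {r_weighted:.3f}")
```

Changing `counts` while keeping a fixed budget (for example, a total acquisition time) gives a rough empirical feel for the design question the abstract poses: how the split of samples across sources affects how well the component is recovered.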
About the journal:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.