Sparse Bernoulli mixture modeling with negative-unlabeled data: an approach to identify and characterize long COVID.
Authors: Tingyi Cao, Harrison T Reeder, Andrea S Foulkes
Journal: Biometrics, volume 81, issue 1 (published 2025-01-07)
DOI: https://doi.org/10.1093/biomtc/ujaf021
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11899553/pdf/
SARS-CoV-2-infected individuals have reported a diverse collection of persistent and often debilitating symptoms commonly referred to as long COVID or post-acute sequelae of SARS-CoV-2 (PASC). Identifying PASC and its subphenotypes is challenging because available data are "negative-unlabeled" as uninfected individuals must be PASC negative, but those with prior infection have unknown PASC status. Moreover, feature selection among many potentially informative characteristics can facilitate reaching a concise and easily interpretable PASC definition. Therefore, to characterize PASC and the spectrum of PASC subphenotypes while identifying a minimal set of features, we propose a Bernoulli mixture model with novel parameterization to accommodate negative-unlabeled data and Bayesian priors to induce sparsity. We present an efficient expectation-maximization algorithm for estimation, and a grid search procedure to select the number of clusters and level of sparsity. We evaluate the proposed method with a simulation study and an analysis of data on self-reported symptoms from the ongoing Researching COVID to Enhance Recovery-Adult Cohort study.
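The negative-unlabeled idea in the abstract can be illustrated with a minimal sketch. This is not the authors' parameterization: it is a plain Bernoulli mixture fitted by EM in which uninfected individuals are pinned to a single "negative" component, and it leaves out the sparsity-inducing priors and the grid search entirely. The function name fit_nu_bernoulli_mixture, its arguments, and the convention that component 0 is the negative class are all assumptions made for illustration.

```python
import numpy as np


def fit_nu_bernoulli_mixture(X, infected, n_clusters=2, n_iter=200, seed=0, eps=1e-6):
    """EM for a Bernoulli mixture with negative-unlabeled data (illustrative only).

    X        : (n, p) binary matrix of symptom indicators.
    infected : (n,) boolean; False marks known negatives, True marks unlabeled rows.
    Component 0 is treated as the negative component (an assumption of this sketch).
    Requires at least one uninfected row.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    infected = np.asarray(infected, dtype=bool)
    n, p = X.shape

    # theta[k, j]: probability that symptom j is present in component k.
    theta = rng.uniform(0.25, 0.75, size=(n_clusters, p))
    # Start the negative component at the symptom rates of the known negatives.
    theta[0] = np.clip(X[~infected].mean(axis=0), eps, 1 - eps)
    # pi[k]: mixing weights among infected (unlabeled) individuals only.
    pi = np.full(n_clusters, 1.0 / n_clusters)

    for _ in range(n_iter):
        # E-step: posterior component probabilities, computed in log space.
        log_lik = X @ np.log(theta).T + (1 - X) @ np.log1p(-theta).T
        log_post = log_lik + np.log(pi)
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # Known negatives are not inferred: they belong to component 0.
        resp[~infected] = 0.0
        resp[~infected, 0] = 1.0

        # M-step: weighted symptom frequencies and mixing weights.
        weights = resp.sum(axis=0)
        theta = (resp.T @ X) / np.maximum(weights, eps)[:, None]
        theta = np.clip(theta, eps, 1 - eps)
        pi = np.clip(resp[infected].mean(axis=0), eps, None)
        pi /= pi.sum()

    return theta, pi, resp
```

In this sketch the uninfected rows contribute only to the symptom rates of the negative component, while the mixing weights are estimated from infected rows alone. An infected individual's estimated probability of being positive would then be one minus the first column of the returned responsibilities.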
About the journal:
The International Biometric Society is an international society promoting the development and application of statistical and mathematical theory and methods in the biosciences, including agriculture, biomedical science and public health, ecology, environmental sciences, forestry, and allied disciplines. The Society welcomes as members statisticians, mathematicians, biological scientists, and others devoted to interdisciplinary efforts in advancing the collection and interpretation of information in the biosciences. The Society sponsors the biennial International Biometric Conference, held in sites throughout the world; through its National Groups and Regions, it also sponsors regional and local meetings.