Statistics in Biosciences: Latest Articles

Canopy2: Tumor Phylogeny Inference by Bulk DNA and Single-Cell RNA Sequencing.
IF 0.4
Statistics in Biosciences Pub Date : 2026-01-01 Epub Date: 2025-01-08 DOI: 10.1007/s12561-024-09466-1
Ann Marie K Weideman, Rujin Wang, Joseph G Ibrahim, Yuchao Jiang
Abstract: Tumors are composed of a mixture of distinct cell populations that differ in genetic makeup and function. Such heterogeneity plays a role in the development of drug resistance and the ineffectiveness of targeted cancer therapies. Insight into this complexity can be obtained through the construction of a phylogenetic tree, which illustrates the evolutionary lineage of tumor cells as they acquire mutations over time. We propose Canopy2, a Bayesian framework that uses single nucleotide variants derived from bulk DNA and single-cell RNA sequencing to infer tumor phylogeny and conduct mutational profiling of tumor subpopulations. Canopy2 uses Markov chain Monte Carlo methods to sample from a joint probability distribution involving a mixture of binomial and beta-binomial distributions, specifically chosen to account for the sparsity and stochasticity of the single-cell data. Canopy2 demystifies the sources of zeros in the single-cell data and separates zeros categorized as non-cancerous (cells without mutations), stochastic (mutations not expressed due to bursting), and technical (expressed mutations not picked up by sequencing). Simulations demonstrate that Canopy2 consistently outperforms competing methods and reconstructs the clonal tree with high fidelity, even in situations involving low sequencing depth, poor single-cell yield, and highly advanced and polyclonal tumors. We further assess the performance of Canopy2 through application to breast cancer and glioblastoma data, benchmarking against existing methods. Canopy2 is an open-source R package available at https://github.com/annweideman/canopy2.
Volume 18, Issue 1, pages 68-110. Publication type: Journal Article.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12904911/pdf/
Citations: 0
Multilevel Multivariate Functional Principal Component Analysis of Evoked and Induced Event-Related Spectral Perturbations.
IF 0.4
Statistics in Biosciences Pub Date : 2025-12-29 DOI: 10.1007/s12561-025-09510-8
Mingfei Dong, Donatello Telesca, Abigail Dickinson, Catherine Sugar, Sara J Webb, Shafali Jeste, April R Levin, Frederick Shic, Adam Naples, Susan Faja, Geraldine Dawson, James C McPartland, Damla Şentürk
Abstract: Event-related spectral perturbations (ERSPs) capture dynamic changes in electroencephalography (EEG) power across frequency and trial time. Even though they are obtained at the trial level, they are commonly averaged across trials and analyzed at the subject level to enhance the signal-to-noise ratio. While evoked activity is stimulus-locked, representing the brain's predictable response to stimuli, induced signals that are not strictly locked to stimulus presentation are thought to be generated by higher-order processes, such as attention and integration. Motivated by joint modeling of multilevel (trials nested in subjects) and multivariate (evoked and induced) ERSP data from a visual-evoked potentials (VEP) task, we propose a multilevel multivariate functional principal components analysis (FPCA) for high-dimensional functional outcomes as a function of time and frequency. The proposed estimation procedure utilizes multilevel univariate FPCA decompositions along each variate of the multivariate outcome using fast covariance estimation and incorporates the dependency across outcome variates at each level of the data. Hence, the proposed approach for multilevel multivariate FPCA can efficiently scale up to higher-dimensional functional outcomes and an increasing number of variates in the multivariate functional outcome vector. Extensive simulations show the efficacy of the proposed approach, while applications to VEP data lead to new insights into autism-specific neural activity patterns. The autistic group shows significantly lower evoked and higher induced gamma power compared to the neurotypical group. In addition, while subject-level variation is dominated by variation in the stimulus-locked evoked signal in neurotypical development, it is dominated by induced power in autism.
Publication type: Journal Article.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12834560/pdf/
Citations: 0
Weighted Brier Score - an Overall Summary Measure for Risk Prediction Models with Clinical Utility Consideration.
IF 0.4
Statistics in Biosciences Pub Date : 2025-09-18 DOI: 10.1007/s12561-025-09505-5
Kehao Zhu, Yingye Zheng, Kwun Chuen Gary Chan
Abstract: As advancements in novel biomarker-based algorithms and models accelerate their use in disease risk prediction, it is crucial to evaluate these models within the context of their intended clinical application. Prediction models output the absolute risk of disease; subsequently, patient counseling and shared decision-making are based on the estimated individual risk and cost-benefit assessment. The overall impact of the application is referred to as clinical utility, which has recently received significant attention, along with a desire to incorporate it into model assessment. The classic Brier score is a popular measure of prediction accuracy; however, it is insufficient for effectively assessing clinical utility. To address this limitation, we propose a class of weighted Brier scores that aligns with the decision-theoretic framework of clinical utility. Additionally, we decompose the weighted Brier score into discrimination and calibration components, and we link the weighted Brier score to the H measure, which has been proposed as an alternative to the area under the receiver operating characteristic curve. This theoretical link to the H measure further supports our weighting method and underscores the essential elements of discrimination and calibration in risk prediction evaluation. The practical use of the weighted Brier score as an overall summary is demonstrated using data from a prostate cancer study.
Publication type: Journal Article.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12523994/pdf/
Citations: 0
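For orientation, the classic (unweighted) Brier score named in the abstract above is the mean squared difference between predicted risks and observed binary outcomes. The Python sketch below illustrates only that classic score. The function name and the example values are hypothetical, and the authors' weighted variant is not reproduced because the abstract does not state its formula.

```python
from typing import Sequence


def brier_score(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Classic Brier score: mean of (predicted risk - observed 0/1 outcome)^2."""
    if len(risks) != len(outcomes) or len(risks) == 0:
        raise ValueError("risks and outcomes must be non-empty and of equal length")
    # Each term is the squared error of one subject's predicted risk.
    squared_errors = [(p - y) ** 2 for p, y in zip(risks, outcomes)]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    # Hypothetical predicted risks and observed outcomes for three subjects.
    print(brier_score([0.9, 0.2, 0.7], [1, 0, 0]))
```

A score of 0 means every predicted risk matched its outcome exactly, while an uninformative prediction of 0.5 for every subject gives 0.25 regardless of the outcomes.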
Accounting for Competing Risks in the Assessment of Prognostic Biomarkers' Discriminative Accuracy.
IF 0.4
Statistics in Biosciences Pub Date : 2025-07-07 DOI: 10.1007/s12561-025-09499-0
Xinran Huang, Xinyang Jiang, Ruosha Li, Jing Ning
Abstract: The discriminative performance of biomarkers often changes over time and exhibits heterogeneity across subgroups defined by patient characteristics. Assessing how this performance varies with these factors is crucial for comprehensively evaluating biomarkers and for identifying areas for improvement in sub-populations with poor performance. Additionally, the presence of competing risks complicates the assessment of discriminative performance. Ignoring competing risks can lead to misleading conclusions, as the biomarker's performance for the event of interest, such as disease onset, may be confounded by its performance for competing events, such as death. To address these challenges, we develop a regression model to assess the impact of covariates on the discriminative performance of biomarkers, characterized by the covariate-specific time-dependent area under the curve (AUC) for a specific cause. We construct a pseudo partial-likelihood for estimation and inference and establish the asymptotic properties of the proposed estimators. Through simulation studies, we demonstrate the finite sample performance of these estimators, and we apply the proposed method to data from the African American Study of Kidney Disease and Hypertension (AASK).
Publication type: Journal Article.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12366773/pdf/
Citations: 0
Robust Privacy-Preserving Models for Cluster-Level Confounding: Recognizing Disparities in Access to Transplantation.
IF 0.4
Statistics in Biosciences Pub Date : 2025-07-07 DOI: 10.1007/s12561-025-09496-3
Nicholas Hartman, Kevin He
Abstract: In health services applications where the patients are clustered within common institutions or geographic regions, it is often of interest to estimate the treatment effects of the medical providers after adjusting for confounding risk factors that are related to patients' choices of provider but beyond the providers' control. While most existing risk-adjustment methods are only capable of controlling for patient-level confounding risk factors (e.g., age or comorbidities), there are often important cluster-level confounding variables (e.g., regional or community-level risk factors) that should be accounted for in provider evaluations. These adjustments for cluster-level confounding factors are further complicated by the limited availability of protected patient health data, the inevitable influence of unobservable confounding factors, and the presence of outlying cluster units. To address these issues, we propose a privacy-preserving model and a novel Pseudo-Bayesian inference method to robustly assess the providers' treatment effects with adjustments for observed cluster-level confounders and corrections for overdispersion from unobserved cluster-level confounding factors. We derive theoretical connections between our proposed estimation method and the Correlated Random Effects model, uncovering several advantages in terms of estimation stability, computational efficiency, and privacy preservation. Motivated by efforts to improve equity in transplant care, we apply these methods to evaluate transplant centers while adjusting for observed geographic disparities in donor organ availability and correcting for overdispersion from unobservable confounding factors, such as the complex impact of the COVID-19 pandemic.
Publication type: Journal Article.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12830051/pdf/
Citations: 0
Central Posterior Envelopes for Bayesian Longitudinal Functional Principal Component Analysis.
IF 0.4
Statistics in Biosciences Pub Date : 2025-07-05 DOI: 10.1007/s12561-025-09497-2
Joanna Boland, Qi Qian, Donatello Telesca, Shafali Jeste, Abigail Dickinson, Damla Şentürk
Abstract: Longitudinally observed functional data are commonly encountered in biomedical studies. Under the weak separability assumption of the high-dimensional covariance, the recently proposed Bayesian longitudinal functional principal component analysis (B-LFPCA) achieves the decomposition of the multidimensional signal into highly interpretable lower-dimensional summaries, including eigenfunctions that capture directions of variation in the data along the longitudinal and functional dimensions. B-LFPCA provides uncertainty quantification of the estimated functional decomposition components through simultaneous parametric credible bands formed using the posterior sample. However, these traditional summaries are inherently based on point-wise summaries of the estimated functional components and do not take into account the functional nature of the estimated quantities. We introduce central posterior envelopes (CPEs) for uncertainty quantification of the low-dimensional B-LFPCA decomposition components based on functional depth ordering of the posterior estimates. The proposed CPEs are fully data-driven visualization tools, displaying the most-central regions of the posterior sample at specified α-level percentile contours. Modified band depth and modified volume depth are utilized to order the posterior sample of functional decomposition components, including the mean function and the marginal longitudinal and functional eigenfunctions. The proposed CPEs are applied to analyze the longitudinally observed Event Related Potentials (ERPs) recorded during an implicit learning paradigm, leading to novel insights into longitudinal learning trends across a group of autistic kids and their neurotypical peers. Finally, the effectiveness of the proposed CPEs is demonstrated through extensive simulations that explore different scenarios of increased variability in the longitudinal functional data.
Publication type: Journal Article.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12716410/pdf/
Citations: 0
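To make the depth-ordering idea in the abstract above concrete, the Python sketch below computes one common form of modified band depth (bands formed by pairs of curves) for curves sampled on a shared grid, then takes the pointwise envelope of the deepest curves. It is an illustration under stated assumptions, not the authors' CPE construction: the function names, the `n_keep` parameter, and the toy curves are hypothetical, and the paper's α-level contours and modified volume depth are not reproduced.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def modified_band_depth(curves: Sequence[Sequence[float]]) -> List[float]:
    """For each curve, average over all pairs of curves the fraction of grid
    points at which the curve lies inside the band spanned by that pair."""
    n_points = len(curves[0])
    pairs = list(combinations(curves, 2))
    depths = []
    for x in curves:
        total = 0.0
        for a, b in pairs:
            inside = sum(
                1 for t in range(n_points)
                if min(a[t], b[t]) <= x[t] <= max(a[t], b[t])
            )
            total += inside / n_points
        depths.append(total / len(pairs))
    return depths


def central_envelope(
    curves: Sequence[Sequence[float]], n_keep: int
) -> Tuple[List[float], List[float]]:
    """Pointwise lower and upper bounds of the n_keep deepest curves."""
    depths = modified_band_depth(curves)
    deepest = sorted(range(len(curves)), key=lambda i: depths[i], reverse=True)[:n_keep]
    n_points = len(curves[0])
    lower = [min(curves[i][t] for i in deepest) for t in range(n_points)]
    upper = [max(curves[i][t] for i in deepest) for t in range(n_points)]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical stand-in for posterior draws of a function on a 2-point grid.
    draws = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
    print(modified_band_depth(draws))
    print(central_envelope(draws, n_keep=3))
```

Because the ordering uses whole curves instead of separate pointwise quantiles, the resulting envelope is bounded by draws that are central as functions, which is the motivation the abstract gives for depth-based summaries.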
Bias and Efficiency Comparison between Multiple Imputation and Available-Case Analysis for Missing Data in Longitudinal Models.
IF 0.4
Statistics in Biosciences Pub Date : 2025-06-12 DOI: 10.1007/s12561-025-09493-6
Panpan Zhang, Sharon X Xie
Abstract: In this paper, we compare the performance of available-case analysis (ACA) and several multiple imputation (MI) approaches for handling missing data problems in longitudinal analysis in terms of estimation bias and relative efficiency. When the missingness of covariates depends on observed responses, ACA produces estimation bias, but it is preferred when there are only missing values in longitudinal responses. Multilevel MI methods are not always a solution to longitudinal data analysis. Single-level MI methods, like fully conditional specification (FCS), provide unbiased estimates under a variety of missing data scenarios and improve efficiency in certain scenarios. The missing data mechanism is generally assumed to be missing at random (MAR). We carry out a systematic synthetic data analysis where missing data exist in longitudinal outcomes and/or covariates under different kinds of missing data generation procedures. The analysis model is a linear mixed-effects model. For each of the missing data scenarios, we give our recommendation (between ACA and a specific MI method) based on theoretical justifications and extensive simulations. In addition, a longitudinal neurodegenerative disease dataset is used as a real case study.
Publication type: Journal Article.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12356228/pdf/
Citations: 0
Covariate-Balancing-Aware Interpretable Deep Learning Models for Treatment Effect Estimation.
IF 0.4
Statistics in Biosciences Pub Date : 2025-04-01 Epub Date: 2023-10-28 DOI: 10.1007/s12561-023-09394-6
Kan Chen, Qishuo Yin, Qi Long
Abstract: Estimating treatment effects is of great importance for many biomedical applications with observational data. In particular, interpretability of the treatment effects is preferable for many biomedical researchers. In this paper, we first provide a theoretical analysis and derive an upper bound for the bias of average treatment effect (ATE) estimation under the strong ignorability assumption. Derived by leveraging appealing properties of the weighted energy distance, our upper bound is tighter than what has been reported in the literature. Motivated by the theoretical analysis, we propose a novel objective function for estimating the ATE that uses the energy distance balancing score and hence does not require the correct specification of the propensity score model. We also leverage recently developed neural additive models to improve interpretability of deep learning models used for potential outcome prediction. We further enhance our proposed model with an energy distance balancing score weighted regularization. The superiority of our proposed model over current state-of-the-art methods is demonstrated in semi-synthetic experiments using two benchmark datasets, namely IHDP and ACIC, and is further examined through a study of the effect of smoking on the blood level of cadmium using NHANES.
Volume 17, Issue 1, pages 132-150. Publication type: Journal Article.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11957463/pdf/
Citations: 0
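As background for the energy distance mentioned in the abstract above, the Python sketch below computes the plain (unweighted) sample energy distance between the covariates of two groups, 2·E|X−Y| − E|X−X'| − E|Y−Y'|, using Euclidean distances. It is an assumption-laden illustration only: the function names and toy covariates are hypothetical, and the authors' weighted energy distance and balancing score are not reproduced because the abstract does not define them.

```python
import math
from itertools import product
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def mean_pairwise_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Average Euclidean distance over all pairs (x, y) with x in a and y in b."""
    return sum(math.dist(x, y) for x, y in product(a, b)) / (len(a) * len(b))


def energy_distance(treated: Sequence[Point], control: Sequence[Point]) -> float:
    """Sample energy distance between two groups of covariate vectors.

    Self-pairs (distance 0) are included in the within-group averages.
    """
    return (
        2.0 * mean_pairwise_distance(treated, control)
        - mean_pairwise_distance(treated, treated)
        - mean_pairwise_distance(control, control)
    )


if __name__ == "__main__":
    # Hypothetical two-dimensional covariates for treated and control units.
    treated = [(0.0, 1.0), (1.0, 2.0), (2.0, 1.5)]
    control = [(0.5, 1.0), (3.0, 4.0), (2.5, 3.5)]
    print(energy_distance(treated, control))
```

The quantity is zero when the two samples coincide and grows as their covariate distributions move apart, which is why distances of this kind can serve as a measure of covariate imbalance without fitting a propensity score model.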
DeepBiome: A Phylogenetic Tree Informed Deep Neural Network for Microbiome Data Analysis.
IF 0.4
Statistics in Biosciences Pub Date : 2025-04-01 Epub Date: 2024-06-14 DOI: 10.1007/s12561-024-09434-9
Jing Zhai, Youngwon Choi, Xingyi Yang, Yin Chen, Kenneth Knox, Homer L Twigg, Joong-Ho Won, Hua Zhou, Jin J Zhou
Abstract: Evidence linking the microbiome to human health is rapidly growing. The microbiome profile has potential as a novel predictive biomarker for many diseases. However, tables of bacterial counts are typically sparse, and bacteria are classified within a hierarchy of taxonomic levels, ranging from species to phylum. Existing tools focus on identifying microbiome associations at either the community level or a specific, pre-defined taxonomic level. Incorporating the evolutionary relationship between bacteria can enhance data interpretation. This approach allows for aggregating microbiome contributions, leading to more accurate and interpretable results. We present DeepBiome, a phylogeny-informed neural network architecture, to predict phenotypes from microbiome counts and uncover the microbiome-phenotype association network. It utilizes microbiome abundance as input and employs phylogenetic taxonomy to guide the neural network's architecture. Leveraging phylogenetic information, DeepBiome reduces the need for extensive tuning of the deep learning architecture, minimizes overfitting, and, crucially, enables the visualization of the path from microbiome counts to disease. It is applicable to both regression and classification problems. Simulation studies and real-life data analysis have shown that DeepBiome is both highly accurate and efficient. It offers deep insights into complex microbiome-phenotype associations, even with small to moderate training sample sizes. In practice, the specific taxonomic level at which microbiome clusters tag the association remains unknown. Therefore, the main advantage of the presented method over other analytical methods is that it offers an ecological and evolutionary understanding of host-microbe interactions, which is important for microbiome-based medicine. DeepBiome is implemented using the Python packages Keras and TensorFlow. It is an open-source tool available at https://github.com/Young-won/DeepBiome.
Volume 17, Issue 1, pages 191-215. Publication type: Journal Article.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12395559/pdf/
Citations: 0
Novel Scalar-on-matrix Regression for Unbalanced Feature Matrices.
IF 0.4
Statistics in Biosciences Pub Date : 2025-03-05 DOI: 10.1007/s12561-025-09476-7
Jeremy Rubin, Fan Fan, Laura Barisoni, Andrew R Janowczyk, Jarcy Zee
Abstract: Image features that characterize tubules from digitized kidney biopsies may offer insight into disease prognosis as novel biomarkers. For each subject, we can construct a matrix whose entries are a common set of image features (e.g., area, orientation, eccentricity) that are measured for each tubule from that subject's biopsy. Previous scalar-on-matrix regression approaches, which can predict scalar outcomes using image feature matrices, cannot handle varying numbers of tubules across subjects. We propose the CLUstering Structured laSSO (CLUSSO), a novel scalar-on-matrix regression technique that allows for unbalanced numbers of tubules, to predict scalar outcomes from the image feature matrices. By classifying tubules into one of two different clusters, CLUSSO averages and weights tubular feature values within-subject and within-cluster to create balanced feature matrices that can then be used with structured lasso regression. We develop the theoretical large tubule sample properties for the error bounds of the feature coefficient estimates. Simulation study results indicate that CLUSSO often achieves a lower false positive rate and higher true positive rate for identifying the image features that truly affect outcomes, relative to a naive method that averages feature values across all tubules. Additionally, we find that CLUSSO has lower bias and can predict outcomes with accuracy competitive with the naive approach. Finally, we applied CLUSSO to tubular image features from kidney biopsies of glomerular disease subjects from the Nephrotic Syndrome Study Network (NEPTUNE) to predict kidney function and used subjects from the Cure Glomerulonephropathy (CureGN) study as an external validation set.
Publication type: Journal Article.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12456458/pdf/
Citations: 0