Novel Scalar-on-matrix Regression for Unbalanced Feature Matrices.

IF 0.4 Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY
Jeremy Rubin, Fan Fan, Laura Barisoni, Andrew R Janowczyk, Jarcy Zee
Journal: Statistics in Biosciences
Publication date: 2025-03-05
Publication type: Journal Article
DOI: 10.1007/s12561-025-09476-7 (https://doi.org/10.1007/s12561-025-09476-7)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12456458/pdf/
Citations: 0

Abstract

Novel Scalar-on-matrix Regression for Unbalanced Feature Matrices.

Image features that characterize tubules from digitized kidney biopsies may offer insight into disease prognosis as novel biomarkers. For each subject, we can construct a matrix whose entries are a common set of image features (e.g., area, orientation, eccentricity) measured for each tubule from that subject's biopsy. Previous scalar-on-matrix regression approaches, which predict scalar outcomes from image feature matrices, cannot handle varying numbers of tubules across subjects. We propose the CLUstering Structured laSSO (CLUSSO), a novel scalar-on-matrix regression technique that allows for unbalanced numbers of tubules, to predict scalar outcomes from the image feature matrices. By classifying each tubule into one of two clusters, CLUSSO averages and weights tubular feature values within subject and within cluster to create balanced feature matrices that can then be used with structured lasso regression. We develop theoretical large-tubule-sample properties for the error bounds of the feature coefficient estimates. Simulation results indicate that, relative to a naive method that averages feature values across all tubules, CLUSSO often achieves a lower false positive rate and a higher true positive rate for identifying the image features that truly affect outcomes. Additionally, we find that CLUSSO has lower bias and predicts outcomes with accuracy competitive with the naive approach. Finally, we applied CLUSSO to tubular image features from kidney biopsies of glomerular disease subjects from the Nephrotic Syndrome Study Network (NEPTUNE) to predict kidney function and used subjects from the Cure Glomerulonephropathy (CureGN) study as an external validation set.
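
The balancing step described in the abstract can be illustrated with a rough sketch. It is not the authors' implementation. The abstract does not name the clustering algorithm or the weighting scheme, so the tiny 2-means routine (`two_means`), the proportion-based weights, and the helper name `balanced_matrix` are all assumptions made purely for illustration. The point is only to show how subjects with different numbers of tubules can be collapsed to matrices of one common shape (clusters by features).

```python
import numpy as np

def two_means(X, n_iter=50):
    """Tiny 2-means clustering. A stand-in only: the abstract does not
    say which clustering method CLUSSO uses."""
    # Deterministic start: rows with the smallest and largest feature sums.
    s = X.sum(axis=1)
    centers = X[[s.argmin(), s.argmax()]].astype(float)
    for _ in range(n_iter):
        # Distance from every tubule to each center, then nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def balanced_matrix(features, labels, n_clusters=2):
    """Collapse an (n_tubules, p) matrix to (n_clusters, p).

    Each row is the within-cluster mean scaled by the share of the
    subject's tubules in that cluster (an assumed weighting)."""
    out = np.zeros((n_clusters, features.shape[1]))
    for k in range(n_clusters):
        mask = labels == k
        if mask.any():
            out[k] = mask.mean() * features[mask].mean(axis=0)
    return out

# Synthetic subjects with unbalanced tubule counts (5, 12, 8) and 3 features.
rng = np.random.default_rng(0)
subjects = [rng.normal(size=(n, 3)) for n in (5, 12, 8)]

# Cluster all tubules jointly, then split the labels back per subject.
labels = two_means(np.vstack(subjects))
splits = np.split(labels, np.cumsum([len(s) for s in subjects])[:-1])

balanced = np.stack([balanced_matrix(s, l) for s, l in zip(subjects, splits)])
print(balanced.shape)  # (3, 2, 3): subjects x clusters x features
```

Under this assumed weighting, the two rows of each subject's matrix sum to that subject's plain average over all tubules, so the naive averaging method mentioned in the abstract corresponds to discarding the cluster dimension. The resulting equal-shaped matrices are what a structured lasso regression would then take as input; that fitting step is not sketched here because the abstract does not specify the penalty.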

Source journal
Statistics in Biosciences (Mathematical & Computational Biology)
CiteScore: 2.00
Self-citation rate: 0.00%
Articles published: 28
About the journal: Statistics in Biosciences (SIBS) is published three times a year in print and electronic form. It aims at development and application of statistical methods and their interface with other quantitative methods, such as computational and mathematical methods, in biological and life science, health science, and biopharmaceutical and biotechnological science. SIBS publishes scientific papers and review articles in four sections, with the first two sections as the primary sections. Original Articles publish novel statistical and quantitative methods in biosciences. The Bioscience Case Studies and Practice Articles publish papers that advance statistical practice in biosciences, such as case studies, innovative applications of existing methods that further understanding of subject-matter science, and evaluation of existing methods and data sources. Review Articles publish papers that review an area of statistical and quantitative methodology, software, and data sources in biosciences. Commentaries provide perspectives on research topics or policy issues that are of current quantitative interest in biosciences, reactions to an article published in the journal, and scholarly essays. Substantive science is essential in motivating and demonstrating the methodological development and use for an article to be acceptable. Articles published in SIBS share the goal of promoting evidence-based real-world practice and policy making through effective and timely interaction and communication of statisticians and quantitative researchers with subject-matter scientists in biosciences.