Xueping Zhou, Manqi Cai, Molin Yue, Juan C Celedón, Jiebiao Wang, Ying Ding, Wei Chen, Yanming Li
{"title":"Molecular group and correlation guided structural learning for multi-phenotype prediction.","authors":"Xueping Zhou, Manqi Cai, Molin Yue, Juan C Celedón, Jiebiao Wang, Ying Ding, Wei Chen, Yanming Li","doi":"10.1093/bib/bbae585","DOIUrl":null,"url":null,"abstract":"<p><p>We propose a supervised learning bioinformatics tool, Biological gRoup guIded muLtivariate muLtiple lIneAr regression with peNalizaTion (Brilliant), designed for feature selection and outcome prediction in genomic data with multi-phenotypic responses. Brilliant specifically incorporates genome and/or phenotype grouping structures, as well as phenotype correlation structures, in feature selection, effect estimation, and outcome prediction under a penalized multi-response linear regression model. Extensive simulations demonstrate its superior performance compared to competing methods. We applied Brilliant to two omics studies. In the first study, we identified novel association signals between multivariate gene expressions and high-dimensional DNA methylation profiles, providing biological insights for the baseline CpG-to-gene regulation patterns in a Puerto Rican children asthma cohort. The second study focused on cell-type deconvolution prediction using high-dimensional gene expression profiles. Using Brilliant, we improved the accuracy for cell-type fraction prediction and identified novel cell-type signature genes.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8000,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11562839/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Briefings in bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/bib/bbae585","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
We propose a supervised learning bioinformatics tool, Biological gRoup guIded muLtivariate muLtiple lIneAr regression with peNalizaTion (Brilliant), designed for feature selection and outcome prediction in genomic data with multi-phenotypic responses. Brilliant specifically incorporates genome and/or phenotype grouping structures, as well as phenotype correlation structures, in feature selection, effect estimation, and outcome prediction under a penalized multi-response linear regression model. Extensive simulations demonstrate its superior performance compared to competing methods. We applied Brilliant to two omics studies. In the first study, we identified novel association signals between multivariate gene expressions and high-dimensional DNA methylation profiles, providing biological insights for the baseline CpG-to-gene regulation patterns in a Puerto Rican children asthma cohort. The second study focused on cell-type deconvolution prediction using high-dimensional gene expression profiles. Using Brilliant, we improved the accuracy for cell-type fraction prediction and identified novel cell-type signature genes.
我们提出了一种生物信息学监督学习工具--生物组指导的多变量多反应线性回归(Biological gRoup guIded muLtivariate muLtiple lIneAr regression with peNalizaTion,Brilliant),该工具设计用于具有多表型反应的基因组数据的特征选择和结果预测。Brilliant 特别将基因组和/或表型分组结构以及表型相关结构纳入特征选择、效应估计和受惩罚多反应线性回归模型下的结果预测中。大量的模拟证明,与同类方法相比,Brilliant 的性能更优越。我们将 Brilliant 应用于两项 omics 研究。在第一项研究中,我们在多变量基因表达和高维 DNA 甲基化图谱之间发现了新的关联信号,为波多黎各儿童哮喘队列中的基线 CpG 基因调控模式提供了生物学见解。第二项研究的重点是利用高维基因表达谱进行细胞类型解旋预测。利用 Brilliant,我们提高了细胞类型分数预测的准确性,并确定了新的细胞类型特征基因。
期刊介绍:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.