inMTSCCA: An Integrated Multi-task Sparse Canonical Correlation Analysis for Multi-omic Brain Imaging Genetics

Impact factor 11.5; CAS journal ranking: Zone 2 (Biology); JCR Q1 (GENETICS & HEREDITY)
Lei Du, Jin Zhang, Ying Zhao, Muheng Shang, Lei Guo, Junwei Han, The Alzheimer's Disease Neuroimaging Initiative
Genomics, Proteomics & Bioinformatics, 2023, 21(2): 396-413. Published 2023-04-01. DOI: 10.1016/j.gpb.2023.03.005. Article page: https://www.sciencedirect.com/science/article/pii/S1672022923000943
Citations: 0

Abstract

Identifying genetic risk factors for Alzheimer's disease (AD) is an important research topic. To date, endophenotypes such as imaging-derived and proteomic expression-derived endophenotypes have shown greater value than case–control studies in uncovering risk genes. Biologically, a co-varying pattern across endophenotypes derived from different omics could result from a shared genetic basis. However, existing methods mainly focus on the effect of each endophenotype alone, and the effect of cross-endophenotype (CEP) associations remains largely unexploited. In this study, we used both the multi-omic endophenotypes and their CEP associations to identify genetic risk factors, and proposed two integrated multi-task sparse canonical correlation analysis (inMTSCCA) methods: pairwise endophenotype correlation-guided MTSCCA (pcMTSCCA) and high-order endophenotype correlation-guided MTSCCA (hocMTSCCA). pcMTSCCA employed pairwise correlations between magnetic resonance imaging (MRI)-derived, plasma-derived, and cerebrospinal fluid (CSF)-derived endophenotypes as an additional penalty, whereas hocMTSCCA used high-order correlations among these multi-omic data for regularization. To identify genetic risk factors at both the individual and group levels, as well as altered endophenotypic markers, we introduced sparsity-inducing penalties into both models. We compared pcMTSCCA and hocMTSCCA with three related methods on both simulated and real datasets (the latter consisting of neuroimaging data, proteomic analytes, and genetic data). Our methods obtained better or comparable canonical correlation coefficients (CCCs) and better feature subsets than the benchmarks. Most importantly, the identified genetic loci and heterogeneous endophenotypic markers showed high relevance. Therefore, jointly using multi-omic endophenotypes and their CEP associations is a promising way to reveal genetic risk factors.
The source code and manual of inMTSCCA are available at https://ngdc.cncb.ac.cn/biocode/tools/BT007330.
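The abstract does not spell out the inMTSCCA objective, so the sketch below illustrates only the generic building block it extends: a single-pair sparse CCA fitted by alternating soft-thresholded updates (in the spirit of penalized matrix decomposition), treating the within-view covariances as identity. The helper `cep_correlations` (a hypothetical name) fits each genotype-endophenotype task independently and then reports the pairwise correlations among the endophenotype canonical scores, the kind of cross-endophenotype association pcMTSCCA is described as exploiting; it does not implement the paper's joint penalty or its group-level sparsity. The thresholds `lam_u` and `lam_v`, the sign convention, and the simulated MRI/plasma/CSF stand-in data are all assumptions for illustration.

```python
import numpy as np


def soft_threshold(a, lam):
    """Elementwise soft-thresholding: sign(a) * max(|a| - lam, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def standardize(M):
    """Center each column and scale it to unit variance."""
    M = M - M.mean(axis=0)
    sd = M.std(axis=0)
    sd[sd == 0] = 1.0
    return M / sd


def sparse_unit(a, lam):
    """Soft-threshold a at lam * max|a| (0 <= lam < 1), then rescale to unit norm."""
    w = soft_threshold(a, lam * np.abs(a).max())
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w


def sparse_cca(X, Y, lam_u=0.3, lam_v=0.3, n_iter=200, tol=1e-8):
    """One sparse canonical pair (u, v) for standardized X (n x p) and Y (n x q).

    Generic alternating soft-thresholding SCCA that treats X'X and Y'Y as
    identity (a common simplification). This is NOT the inMTSCCA algorithm.
    """
    C = X.T @ Y                                       # p x q cross-covariance (unscaled)
    v = np.linalg.svd(C, full_matrices=False)[2][0]   # warm start: leading right singular vector
    u = sparse_unit(C @ v, lam_u)
    for _ in range(n_iter):
        v_new = sparse_unit(C.T @ u, lam_v)
        u_new = sparse_unit(C @ v_new, lam_u)
        converged = (np.linalg.norm(u_new - u) < tol
                     and np.linalg.norm(v_new - v) < tol)
        u, v = u_new, v_new
        if converged:
            break
    if u[np.argmax(np.abs(u))] < 0:                   # sign convention: largest |u| entry positive
        u, v = -u, -v
    ccc = np.corrcoef(X @ u, Y @ v)[0, 1]             # canonical correlation coefficient
    return u, v, ccc


def cep_correlations(X, Ys, **kw):
    """Fit each genotype-endophenotype task independently, then return the
    pairwise correlations between the endophenotype canonical scores."""
    fits = [sparse_cca(X, Y, **kw) for Y in Ys]
    scores = [Y @ v for Y, (_, v, _) in zip(Ys, fits)]
    pairs = {(j, k): np.corrcoef(scores[j], scores[k])[0, 1]
             for j in range(len(Ys)) for k in range(j + 1, len(Ys))}
    return fits, pairs


# Simulated toy data (assumption): one latent factor z drives 3 of 50
# SNP-like features and 2 of 20 features in each of three endophenotype views.
rng = np.random.default_rng(0)
n = 300
z = rng.normal(size=n)
X = rng.normal(size=(n, 50))
X[:, :3] += z[:, None]
X = standardize(X)
Ys = []
for _ in range(3):                                    # stand-ins for MRI, plasma, CSF views
    Y = rng.normal(size=(n, 20))
    Y[:, :2] += z[:, None]
    Ys.append(standardize(Y))

fits, pairs = cep_correlations(X, Ys)
for k, (u, v, ccc) in enumerate(fits):
    print(f"task {k}: CCC={ccc:.2f}, selected SNPs={np.flatnonzero(u).tolist()}")
print({key: round(float(r), 2) for key, r in pairs.items()})
```

Raising `lam_u` or `lam_v` toward 1 keeps fewer features. A joint method such as pcMTSCCA builds the cross-endophenotype term into the objective as a penalty, rather than measuring it after independent fits as this sketch does.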

Source journal: Genomics, Proteomics & Bioinformatics
Subject area: Biochemistry, Genetics and Molecular Biology (Biochemistry)
CiteScore: 14.30
Self-citation rate: 4.20%
Articles published: 844
Review time: 61 days
Journal description: Genomics, Proteomics and Bioinformatics (GPB) is the official journal of the Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation and the Genetics Society of China. It aims to disseminate new developments in the field of omics and bioinformatics, publish high-quality discoveries quickly, and promote open access and online publication. GPB welcomes submissions in all areas of life science, biology, and biomedicine, with a focus on large data acquisition, analysis, and curation. Manuscripts covering omics and related bioinformatics topics are particularly encouraged. GPB is indexed/abstracted by PubMed/MEDLINE, PubMed Central, Scopus, BIOSIS Previews, Chemical Abstracts, CSCD, among others.