Lei Du, Jin Zhang, Ying Zhao, Muheng Shang, Lei Guo, Junwei Han, The Alzheimer's Disease Neuroimaging Initiative
{"title":"inMTSCCA:多组脑成像遗传学的综合多任务稀疏典型相关分析。","authors":"Lei Du, Jin Zhang, Ying Zhao, Muheng Shang, Lei Guo, Junwei Han, The Alzheimer's Disease Neuroimaging Initiative","doi":"10.1016/j.gpb.2023.03.005","DOIUrl":null,"url":null,"abstract":"<div><p>Identifying <strong>genetic risk factors</strong> for Alzheimer’s disease (AD) is an important research topic. To date, different endophenotypes, such as imaging-derived endophenotypes and proteomic expression-derived endophenotypes, have shown the great value in uncovering risk genes compared to case–control studies. Biologically, a co-varying pattern of different omics-derived endophenotypes could result from the shared genetic basis. However, existing methods mainly focus on the effect of endophenotypes alone; the effect of <strong>cross-endophenotype</strong> (CEP) associations remains largely unexploited. In this study, we used both endophenotypes and their CEP associations of multi-omic data to identify genetic risk factors, and proposed two integrated multi-task sparse canonical correlation analysis (inMTSCCA) methods, <em>i.e.</em>, pairwise endophenotype correlation-guided MTSCCA (<em>pc</em>MTSCCA) and high-order endophenotype correlation-guided MTSCCA (<em>hoc</em>MTSCCA). <em>pc</em>MTSCCA employed pairwise correlations between magnetic resonance imaging (MRI)-derived, plasma-derived, and cerebrospinal fluid (CSF)-derived endophenotypes as an additional penalty. <em>hoc</em>MTSCCA used high-order correlations among these multi-omic data for regularization. To figure out genetic risk factors at individual and group levels, as well as altered endophenotypic markers, we introduced sparsity-inducing penalties for both models. We compared <em>pc</em>MTSCCA and <em>hoc</em>MTSCCA with three related methods on both simulation and real (consisting of neuroimaging data, proteomic analytes, and genetic data) datasets. The results showed that our methods obtained better or comparable canonical correlation coefficients (CCCs) and better feature subsets than benchmarks. Most importantly, the identified genetic loci and heterogeneous endophenotypic markers showed high relevance. Therefore, jointly using <strong>multi-omic endophenotypes</strong> and their CEP associations is promising to reveal genetic risk factors. The source code and manual of inMTSCCA are available at <span>https://ngdc.cncb.ac.cn/biocode/tools/BT007330</span><svg><path></path></svg>.</p></div>","PeriodicalId":12528,"journal":{"name":"Genomics, Proteomics & Bioinformatics","volume":"21 2","pages":"Pages 396-413"},"PeriodicalIF":11.5000,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"inMTSCCA: An Integrated Multi-task Sparse Canonical Correlation Analysis for Multi-omic Brain Imaging Genetics\",\"authors\":\"Lei Du, Jin Zhang, Ying Zhao, Muheng Shang, Lei Guo, Junwei Han, The Alzheimer's Disease Neuroimaging Initiative\",\"doi\":\"10.1016/j.gpb.2023.03.005\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Identifying <strong>genetic risk factors</strong> for Alzheimer’s disease (AD) is an important research topic. To date, different endophenotypes, such as imaging-derived endophenotypes and proteomic expression-derived endophenotypes, have shown the great value in uncovering risk genes compared to case–control studies. Biologically, a co-varying pattern of different omics-derived endophenotypes could result from the shared genetic basis. However, existing methods mainly focus on the effect of endophenotypes alone; the effect of <strong>cross-endophenotype</strong> (CEP) associations remains largely unexploited. In this study, we used both endophenotypes and their CEP associations of multi-omic data to identify genetic risk factors, and proposed two integrated multi-task sparse canonical correlation analysis (inMTSCCA) methods, <em>i.e.</em>, pairwise endophenotype correlation-guided MTSCCA (<em>pc</em>MTSCCA) and high-order endophenotype correlation-guided MTSCCA (<em>hoc</em>MTSCCA). <em>pc</em>MTSCCA employed pairwise correlations between magnetic resonance imaging (MRI)-derived, plasma-derived, and cerebrospinal fluid (CSF)-derived endophenotypes as an additional penalty. <em>hoc</em>MTSCCA used high-order correlations among these multi-omic data for regularization. To figure out genetic risk factors at individual and group levels, as well as altered endophenotypic markers, we introduced sparsity-inducing penalties for both models. We compared <em>pc</em>MTSCCA and <em>hoc</em>MTSCCA with three related methods on both simulation and real (consisting of neuroimaging data, proteomic analytes, and genetic data) datasets. The results showed that our methods obtained better or comparable canonical correlation coefficients (CCCs) and better feature subsets than benchmarks. Most importantly, the identified genetic loci and heterogeneous endophenotypic markers showed high relevance. Therefore, jointly using <strong>multi-omic endophenotypes</strong> and their CEP associations is promising to reveal genetic risk factors. The source code and manual of inMTSCCA are available at <span>https://ngdc.cncb.ac.cn/biocode/tools/BT007330</span><svg><path></path></svg>.</p></div>\",\"PeriodicalId\":12528,\"journal\":{\"name\":\"Genomics, Proteomics & Bioinformatics\",\"volume\":\"21 2\",\"pages\":\"Pages 396-413\"},\"PeriodicalIF\":11.5000,\"publicationDate\":\"2023-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Genomics, Proteomics & Bioinformatics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1672022923000943\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"GENETICS & HEREDITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genomics, Proteomics & Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1672022923000943","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"GENETICS & HEREDITY","Score":null,"Total":0}
inMTSCCA: An Integrated Multi-task Sparse Canonical Correlation Analysis for Multi-omic Brain Imaging Genetics
Identifying genetic risk factors for Alzheimer’s disease (AD) is an important research topic. To date, different endophenotypes, such as imaging-derived endophenotypes and proteomic expression-derived endophenotypes, have shown the great value in uncovering risk genes compared to case–control studies. Biologically, a co-varying pattern of different omics-derived endophenotypes could result from the shared genetic basis. However, existing methods mainly focus on the effect of endophenotypes alone; the effect of cross-endophenotype (CEP) associations remains largely unexploited. In this study, we used both endophenotypes and their CEP associations of multi-omic data to identify genetic risk factors, and proposed two integrated multi-task sparse canonical correlation analysis (inMTSCCA) methods, i.e., pairwise endophenotype correlation-guided MTSCCA (pcMTSCCA) and high-order endophenotype correlation-guided MTSCCA (hocMTSCCA). pcMTSCCA employed pairwise correlations between magnetic resonance imaging (MRI)-derived, plasma-derived, and cerebrospinal fluid (CSF)-derived endophenotypes as an additional penalty. hocMTSCCA used high-order correlations among these multi-omic data for regularization. To figure out genetic risk factors at individual and group levels, as well as altered endophenotypic markers, we introduced sparsity-inducing penalties for both models. We compared pcMTSCCA and hocMTSCCA with three related methods on both simulation and real (consisting of neuroimaging data, proteomic analytes, and genetic data) datasets. The results showed that our methods obtained better or comparable canonical correlation coefficients (CCCs) and better feature subsets than benchmarks. Most importantly, the identified genetic loci and heterogeneous endophenotypic markers showed high relevance. Therefore, jointly using multi-omic endophenotypes and their CEP associations is promising to reveal genetic risk factors. The source code and manual of inMTSCCA are available at https://ngdc.cncb.ac.cn/biocode/tools/BT007330.
期刊介绍:
Genomics, Proteomics and Bioinformatics (GPB) is the official journal of the Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation and Genetics Society of China. It aims to disseminate new developments in the field of omics and bioinformatics, publish high-quality discoveries quickly, and promote open access and online publication. GPB welcomes submissions in all areas of life science, biology, and biomedicine, with a focus on large data acquisition, analysis, and curation. Manuscripts covering omics and related bioinformatics topics are particularly encouraged. GPB is indexed/abstracted by PubMed/MEDLINE, PubMed Central, Scopus, BIOSIS Previews, Chemical Abstracts, CSCD, among others.